I didn't mean what you thought I did; I was saying that if Tegra 2 doesn't have NEON, then Broadway (the Wii's CPU) has SIMD as an advantage over it.
As Arun has said many times, and I agree with him on this, ~80% of die area devoted to NEON is a waste.
However, as someone who is actually going to be putting NEON to good use (integer NEON at that) I find this to be an unfortunate compatibility issue.
80% of die area for NEON? 80% of what, the space of a single Cortex-A9 (L2 cache obviously not contributing)? Do you actually mean that 80% of a Cortex-A9 with NEON is the NEON part, or that it's 80% larger? The latter sounds much more plausible than the former.
Or maybe you mean that the entire Tegra 2 die would be 80% composed of NEON if they included it ;P
Do you need FCC approval for what is clearly a development platform?
I meant ~80% of the instructions are meant for video decode, which is a pointless extension, as there is fixed-function decode hardware in that space.
Like what?
Because I want to use it for other things.
Sounds pretty niche to me. IOW, not worth the silicon and power imho.
I won't go into detail, but rendering in emulation.
Well, your niche was so small that it didn't even occur to me.
Where was I arguing that it wasn't? I never said that I think this design decision should be made based on MY wants, nor that I think going without NEON is unreasonable for Tegra. I do think there are other uses for integer SIMD as well, but the current software climate is likely not exposing them, especially on ARM. Not including NEON is a tradeoff that they're fairly justified in making.
I just said I was personally unhappy about the compatibility issues it'll create for me, and that Broadway's SIMD is an advantage over Tegra's lack thereof (here we're talking about float SIMD as well). It doesn't really matter whether the advantages cater to a niche or not; an advantage is an advantage.
You said 80% of NEON was only useful for video decoding. I said it's useful to me for something else.
Regarding your die area comment earlier, that was just my misunderstanding. You said "~80% of die area devoted to NEON is a waste." Grammatically speaking, you should have said "~80% of the die area." As it was, I took it to mean that 80% of the total die space of some larger unit would have been dedicated to NEON, and therefore 80% of that die would be wasted.
MfA said: Floating point probably dominates ... even ATI can afford to put some video instructions in their GPU.
The integer stuff is nice to have and practically free if you don't try to do the higher bit multiplies at single cycle throughput.
There's a related domain where AdvSIMD (the official ISA name for NEON, NEON being only the name of the unit implementing it) is very useful: some 2D graphics routines, and this will affect all of the users of some libs such as pixman.
My niche is not THAT small. Emulation on handheld devices is actually pretty popular, with at least hundreds of thousands of people doing it (the real number is probably in the millions).
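To make the 2D-graphics point a bit more concrete, here is a scalar sketch of the kind of per-pixel integer arithmetic that libraries like pixman vectorize: the Porter-Duff OVER operator on premultiplied 8-bit ARGB, with the a*b/255 divide replaced by an add-and-shift. The function names are hypothetical and this is not pixman's API; a NEON version would typically run the same math across many lanes at once using widening 8x8->16-bit multiplies.

```python
def mul_un8(a: int, b: int) -> int:
    """Return round(a * b / 255) for 8-bit a and b without dividing.

    Add-and-shift trick: t = a*b + 128, then (t + (t >> 8)) >> 8.
    """
    t = a * b + 0x80
    return (t + (t >> 8)) >> 8


def over_argb32(src: int, dst: int) -> int:
    """Composite premultiplied ARGB32 `src` OVER `dst`, channel by channel.

    Hypothetical helper for illustration: result = src + dst * (1 - src_alpha).
    """
    inv_alpha = 255 - (src >> 24)       # 255 minus the source alpha
    out = 0
    for shift in (0, 8, 16, 24):        # B, G, R, A channels
        s = (src >> shift) & 0xFF
        d = (dst >> shift) & 0xFF
        c = s + mul_un8(d, inv_alpha)
        # Clamp like a saturating add; only matters for invalid
        # (non-premultiplied) input, where s can exceed the alpha.
        out |= min(c, 255) << shift
    return out


# A 50%-alpha black pixel over opaque white gives mid grey.
print(hex(over_argb32(0x80000000, 0xFFFFFFFF)))
```

Every step is a multiply, add, or shift on small unsigned integers, which is exactly the integer side of NEON rather than its float pipelines.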
You could also share some parts of the VFP and NEON units. AdvSIMD implementations certainly don't have to be huge if you're ready to trade some performance :)
NEON on Cortex-A8 (probably A9 too) has separate FPMUL, FPADD, integer MAC, and integer ALU pipelines, so I'm not sure how much can be shared between the FP multiplier and the integer one, if anything. It can output two single-precision FMULs per cycle, or one 32x32->64 integer multiply per cycle. You would think that'd be a similar multiplier load, so it'd make sense to share it; the FPMUL has two additional pipeline stages, but I figure those are for float-specific work and not the multiply itself.
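A rough partial-product count is consistent with the "similar multiplier load" hunch. This assumes a naive array-multiplier model (real designs use Booth recoding and compression trees, and the count says nothing about what Cortex-A8 actually shares). A single-precision float has a 24-bit significand (23 stored bits plus the implicit leading 1), so:

$$
\underbrace{2 \times (24 \times 24)}_{\text{two FP32 significand multiplies}} = 1152
\qquad\text{vs.}\qquad
\underbrace{32 \times 32}_{\text{one 32x32->64 integer multiply}} = 1024
$$

partial-product bits per cycle, within about 12% of each other.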
The early-Z guys are somewhat overstating the benefit they get from overdraw, i.e. for an early-Z architecture to achieve a 2x effective fill benefit from overdraw, it would require an overdraw of approx. 4x in any normal title. I think that's pushing reality somewhat, with 2.5x being a more reasonable value, so early Z should really only be assuming a ~1.25x overdraw benefit imo. But, I guess it's all just marketing shmarketing anywhoo....
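The 4x-overdraw-for-2x figure appears to follow from a simple model, assumed here since the post doesn't spell one out: one pixel covered by n opaque layers at distinct depths, drawn in uniformly random order against a LESS depth test. The k-th layer drawn is only shaded if it is nearer than all k-1 before it, which happens with probability 1/k, so the expected shaded count is the harmonic number H_n and the fill gain over shading everything is n / H_n.

```python
from fractions import Fraction


def expected_shaded(n: int) -> Fraction:
    """Expected fragments per pixel that pass early Z for n randomly ordered layers.

    The k-th layer passes only if it is the nearest of the first k (probability 1/k),
    so the sum is the harmonic number H_n.
    """
    return sum(Fraction(1, k) for k in range(1, n + 1))


def early_z_gain(n: int) -> float:
    """Fill-rate gain of early Z versus shading all n fragments: n / H_n."""
    return float(n / expected_shaded(n))


for n in (1, 2, 3, 4, 8):
    print(f"overdraw {n}x -> early-Z gain {early_z_gain(n):.2f}x")
```

Under this model 4x overdraw gives 4 / (25/12) = 1.92x, in line with the ~2x-at-4x figure, and the gain grows only slowly (roughly n / ln n) as overdraw increases.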
John.
Isn't that with vertices appearing in completely uniformly/randomly distributed Z order? Seems like a pretty modest high-level sort would change that a bit.
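The effect of draw order can be checked with a toy simulation, again an assumption rather than anything from the thread: one pixel, n opaque layers at distinct depths, LESS depth test. A perfect front-to-back sort shades once per pixel, back-to-front shades every layer, and a random order lands near H_n, so even a coarse sort would be expected to pull the count down toward 1.

```python
import random


def shaded_fragments(depths: list[float]) -> int:
    """Count fragments that pass a LESS depth test when drawn in the given order."""
    nearest = float("inf")   # depth buffer cleared to the far value
    passed = 0
    for z in depths:
        if z < nearest:      # early-Z pass: this fragment gets shaded
            nearest = z
            passed += 1
    return passed


def average_shaded(n: int, trials: int = 20000, seed: int = 1) -> float:
    """Average shaded fragments over many random draw orders of n layers."""
    rng = random.Random(seed)
    layers = [float(z) for z in range(n)]
    total = 0
    for _ in range(trials):
        rng.shuffle(layers)
        total += shaded_fragments(layers)
    return total / trials


n = 4
layers = [float(z) for z in range(n)]  # 0.0 is the nearest layer
print("front-to-back:", shaded_fragments(sorted(layers)))                # 1
print("back-to-front:", shaded_fragments(sorted(layers, reverse=True)))  # 4
print("random order: ", average_shaded(n))  # should be close to H_4 = 25/12
```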