NV got DS2 contract according to BSN

rpg.314 · Mar 31, 2010

Exophase said:
I didn't mean what you thought I did, I was saying that if Tegra 2 doesn't have NEON then Broadway (Wii's CPU) has SIMD as an advantage over it.

As Arun has said many times, and I agree with him on this, ~80% of die area devoted to NEON is a waste.

Exophase · Mar 31, 2010

rpg.314 said:
As Arun has said many times, and I agree with him on this, ~80% of die area devoted to NEON is a waste.

However, as someone who is actually going to be putting NEON to good use (integer NEON at that) I find this to be an unfortunate compatibility issue.

80% of die area for NEON? 80% of what, the space of a single Cortex-A9 (L2 cache obviously not contributing)? Do you actually mean 80% of a Cortex-A9 with NEON is the NEON part, or do you mean that it's 80% larger, because the latter sounds much more plausible than the former.

Or maybe you mean that the entire Tegra 2 die would be 80% composed of NEON if they included it ;P
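For reference, the two readings of "80%" asked about above give very different areas. A minimal sketch of the arithmetic, where the function names are hypothetical and the 80 is only the figure quoted in the thread, not a measured value:

```python
def neon_to_core_ratio(percent_of_total):
    """Reading 1: NEON is `percent_of_total` % of the combined core+NEON area.
    Returns NEON area as a multiple of the non-NEON core area."""
    return percent_of_total / (100 - percent_of_total)


def neon_fraction_of_total(percent_larger):
    """Reading 2: adding NEON makes the core `percent_larger` % bigger.
    Returns NEON area as a fraction of the combined area."""
    return percent_larger / (100 + percent_larger)


print(neon_to_core_ratio(80))                # 4.0: NEON would be 4x the rest of the core
print(round(neon_fraction_of_total(80), 3))  # 0.444: NEON would be about 44% of the total
```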

sfried · May 17, 2010

We have a motherboard:

Image above edited by GAF member defferoo. News story at WirelessGoodness from an FCC filing by Mitsumi.

Now somebody try to spot the Tegra chip in there...

tangey · May 18, 2010

Do you need FCC approval for what is clearly a development platform?

rpg.314 · May 18, 2010

Exophase said:
However, as someone who is actually going to be putting NEON to good use (integer NEON at that) I find this to be an unfortunate compatibility issue.

80% of die area for NEON? 80% of what, the space of a single Cortex-A9 (L2 cache obviously not contributing)? Do you actually mean 80% of a Cortex-A9 with NEON is the NEON part, or do you mean that it's 80% larger, because the latter sounds much more plausible than the former.

Or maybe you mean that the entire Tegra 2 die would be 80% composed of NEON if they included it ;P

I meant ~80% of the instructions are meant for video decode, which is a pointless extension as there is ff decode hw in that space.

rpg.314 · May 18, 2010

tangey said:
Do you need FCC approval for what is clearly a development platform?

It's going to be sold, so afaik, yes.

Exophase · May 18, 2010

rpg.314 said:
I meant ~80% of the instructions are meant for video decode, which is a pointless extension as there is ff decode hw in that space.

I'm not sure I follow you. I've been through the NEON instruction set about a million times and there are very few specialized instructions, and none that I can see that would only be useful for video code. Or do you mean that integer SIMD is only useful for video decode? Because I want to use it for other things.

From the way you describe it you'd think it was full of iDCT and color space conversion or something.

Of course, 80% of instructions and 80% of die area are pretty different things.. and I doubt the integer part is 80% of the die either.

Yes, integer NEON might not be that widely used but sometimes that's more of a symptom of what people are willing to use than what possible uses there are.

rpg.314 · May 18, 2010

Exophase said:
Because I want to use it for other things.

Like what?

Exophase · May 18, 2010

rpg.314 said:
Like what?

I won't go into detail, but rendering in emulation.

You may be thinking that 3D acceleration is suitable for this, but unfortunately OpenGL ES 2 is not well suited to this task, both due to technical incompatibilities and trends towards high overhead for certain tasks (like reading back the framebuffer into CPU visible userspace memory). OpenCL et al may prove differently but I'm not sure how useful it'll be in the most immediate next generation of handhelds.

It's not a game breaker or anything, I'd just prefer to have NEON.

rpg.314 · May 18, 2010

Exophase said:
I won't go into detail, but rendering in emulation.

Sounds pretty niche to me. IOW, not worth the silicon and power imho.

Exophase · May 18, 2010

rpg.314 said:
Sounds pretty niche to me. IOW, not worth the silicon and power imho.

Where was I arguing that it wasn't? I never said that I think this design decision should be made based on MY wants, nor that I think going w/o NEON is unreasonable for Tegra. I do think there are other uses for integer SIMD as well, but current software climate is likely not exposing them, especially on ARM. Not including NEON is a tradeoff that they're fairly justified in making.

I just said I was personally unhappy about the compatibility issues it'll create for me, and I said that Broadway's SIMD is an advantage over Tegra's lack thereof (here we're talking about float SIMD as well). It doesn't really matter if the advantages cater to the niche or not, an advantage is an advantage.

You said 80% of NEON was only useful for video decoding. I said it's useful to me for something else.

Regarding your die area comment earlier, that was just my misunderstanding. You said "~80% of die area devoted to NEON is a waste." Grammatically speaking you should have said "~80% of the die area." As it was I took it to mean that 80% of the total die space of some larger unit would have been dedicated to NEON, and therefore 80% of that die would be wasted.

rpg.314 · May 18, 2010

Exophase said:
Where was I arguing that it wasn't? I never said that I think this design decision should be made based on MY wants, nor that I think going w/o NEON is unreasonable for Tegra. I do think there are other uses for integer SIMD as well, but current software climate is likely not exposing them, especially on ARM. Not including NEON is a tradeoff that they're fairly justified in making.

I just said I was personally unhappy about the compatibility issues it'll create for me, and I said that Broadway's SIMD is an advantage over Tegra's lack thereof (here we're talking about float SIMD as well). It doesn't really matter if the advantages cater to the niche or not, an advantage is an advantage.

You said 80% of NEON was only useful for video decoding. I said it's useful to me for something else.

Regarding your die area comment earlier, that was just my misunderstanding. You said "~80% of die area devoted to NEON is a waste." Grammatically speaking you should have said "~80% of the die area." As it was I took it to mean that 80% of the total die space of some larger unit would have been dedicated to NEON, and therefore 80% of that die would be wasted.

Well, your niche was so small that it didn't even occur to me.

MfA · May 18, 2010

Floating point probably dominates ... even ATI can afford to put some video instructions in their GPU.

The integer stuff is nice to have and practically free if you don't try to do the higher bit multiplies at single cycle throughput.

Exophase · May 18, 2010

rpg.314 said:
Well, your niche was so small that it didn't even occur to me.

My niche is not THAT small. Emulation on handheld devices is actually pretty popular, with at least hundreds of thousands of people doing it (real number is probably in the millions)

MfA said:
Floating point probably dominates ... even ATI can afford to put some video instructions in their GPU.

The integer stuff is nice to have and practically free if you don't try to do the higher bit multiplies at single cycle throughput.

NEON on Cortex-A8 (probably A9 too) has separate FPMUL, FPADD, integer MAC, and integer ALU pipelines.. therefore I'm not sure how much can be shared between the FP multiplier and the integer one, if anything. It can output 2x single-precision FMUL per cycle, or 1x 32x32->64 integer per cycle. You would think that'd be a similar multiplier load, so it'd make sense to share it; the FPMUL has two additional pipeline stages, but I figure those are for float stuff and not multiply stuff.
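The "similar multiplier load" guess above can be checked with rough arithmetic. A minimal sketch, assuming the size of the partial-product array (width times width) is a fair proxy for multiplier hardware; that proxy ignores Booth encoding, exponent handling, and rounding, and the helper name is hypothetical. The throughput figures are the ones quoted in the post:

```python
FP32_SIGNIFICAND_BITS = 24  # 23 stored fraction bits plus the implicit leading 1


def partial_product_bits(width_a, width_b, lanes=1):
    """Crude multiplier-size proxy: bits in the partial-product array,
    summed over `lanes` independent multipliers."""
    return width_a * width_b * lanes


# 2x single-precision multiplies per cycle (significand products only)
fp_load = partial_product_bits(FP32_SIGNIFICAND_BITS, FP32_SIGNIFICAND_BITS, lanes=2)
# 1x 32x32->64 integer multiply per cycle
int_load = partial_product_bits(32, 32)

print(fp_load, int_load, fp_load / int_load)  # 1152 1024 1.125
```

By this measure the two loads are within about 12% of each other, which is consistent with the idea that sharing would be plausible, though it says nothing about whether any real implementation does so.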

Blazkowicz · May 18, 2010

What about emulating the DS? The graphics are said to be done in fixed point.
Nintendo could also basically port most of the Wii's Virtual Console.

Laurent06 · May 19, 2010

Exophase said:
My niche is not THAT small. Emulation on handheld devices is actually pretty popular, with at least hundreds of thousands of people doing it (real number is probably in the millions)

There's a related domain where AdvSIMD (the official ISA name for NEON, NEON being only the name of the unit implementing it) is very useful: some 2d graphics routines, and this will affect all of the users of some libs such as pixman.
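As an illustration of the kind of 2D routine meant here, this is a scalar sketch of a premultiplied-alpha "over" blend using only 8-bit integer math. It is not pixman's code; the function names are hypothetical, and a SIMD version would apply the same steps to many channels at once:

```python
def mul_div_255(a, b):
    """Return round(a * b / 255) for 8-bit a and b using only adds and shifts."""
    t = a * b + 128
    return (t + (t >> 8)) >> 8


def over_premultiplied(src, dst):
    """Composite premultiplied RGBA `src` over `dst`.
    Each pixel is a tuple of four ints in 0..255."""
    inv_alpha = 255 - src[3]
    # min() plays the role of a saturating add, clamping each channel at 255.
    return tuple(min(255, s + mul_div_255(d, inv_alpha)) for s, d in zip(src, dst))


half_red = (128, 0, 0, 128)     # 50% red, premultiplied
opaque_blue = (0, 0, 255, 255)
print(over_premultiplied(half_red, opaque_blue))  # (128, 0, 127, 255)
```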

NEON on Cortex-A8 (probably A9 too) has separate FPMUL, FPADD, integer MAC, and integer ALU pipelines.. therefore I'm not sure how much can be shared between the FP multiplier and the integer one, if anything. It can output 2x single-precision FMUL per cycle, or 1x 32x32->64 integer per cycle. You would think that'd be a similar multiplier load, so it'd make sense to share it; the FPMUL has two additional pipeline stages, but I figure those are for float stuff and not multiply stuff.

You could also share some parts of VFP and NEON units. AdvSIMD implementations certainly don't have to be huge if you're ready to trade some performance :smile:

JohnH · May 19, 2010

The early Z guys are somewhat overstating the benefit they get from overdraw, i.e. for an early Z architecture to achieve a 2x effective fill benefit from overdraw it would require an overdraw of approx 4x in any normal title. I think that's pushing reality somewhat, with 2.5x being a more reasonable value, so early Z should really only be assuming a ~1.25x overdraw benefit imo. But, I guess it's all just marketing shmarketing anywhoo....

John.

Exophase · May 20, 2010

JohnH said:
The early Z guys are somewhat overstating the benefit they get from overdraw, i.e. for an early Z architecture to achieve a 2x effective fill benefit from overdraw it would require an overdraw of approx 4x in any normal title. I think that's pushing reality somewhat, with 2.5x being a more reasonable value, so early Z should really only be assuming a ~1.25x overdraw benefit imo. But, I guess it's all just marketing shmarketing anywhoo....

John.

Isn't that with vertexes appearing in completely uniformly/randomly distributed Z order? Seems like a pretty modest high level sort would change that a bit.
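The random-order case can be worked out directly. A minimal sketch, assuming opaque surfaces, a whole number of layers per pixel, and a uniformly random draw order; none of these assumptions are stated in the thread, and the function names are hypothetical. Under that model the k-th surface drawn survives the depth test only if it is the nearest of the first k, which happens with probability 1/k:

```python
from fractions import Fraction


def expected_shaded(layers):
    """Expected fragments that pass the depth test at a pixel covered by
    `layers` opaque surfaces drawn in uniformly random depth order.
    This is the harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, layers + 1))


def early_z_gain(layers):
    """Fill-rate gain relative to shading every fragment."""
    return float(layers / expected_shaded(layers))


for n in (2, 3, 4):
    print(n, round(early_z_gain(n), 2))
```

With 4 layers this model gives a gain of 1.92, close to the "2x at approx 4x overdraw" figure. At 2 and 3 layers it gives about 1.33 and 1.64, so the ~1.25x figure for 2.5x overdraw presumably rests on further assumptions. Any front-to-back sorting would push the gain above these values.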

sfried · May 20, 2010

Is there a possibility the 3DS could be using a custom Mali chip?

JohnH · May 20, 2010

Exophase said:
Isn't that with vertexes appearing in completely uniformly/randomly distributed Z order? Seems like a pretty modest high level sort would change that a bit.

At the cost of wrecking your state batching, which remains one of the most critical things for pretty much all architectures out there.
