G80's "Multifunction Interpolator": Detailed PDF

Arun


Presented on July 29, 2005 at the 17th IEEE Symposium on Computer Arithmetic
http://arith17.polito.it/foils/11_2.pdf
http://arith17.polito.it/final/paper-164.pdf
If that isn't interesting, I don't know what is.
And it's written by Stuart Oberman, who was the architect of the FPU in AMD's Athlon microprocessor. He's currently a Principal Engineer at NVIDIA.

Uttar
 
Very nice find.

It describes a way to reuse parts of the interpolation unit for special functions (sqrt, sin, cos, …). This allows them to build a shader with fewer transistors.
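The core idea can be sketched in a few lines of Python: the unit splits the operand into upper bits that index small coefficient tables and a remainder xl, then evaluates C0 + C1·xl + C2·xl² on shared multiply-add hardware for every special function. The segment count and the Taylor-at-midpoint coefficients below are my own illustrative assumptions; the paper uses carefully sized tables with an enhanced minimax fit.

```python
# Illustrative sketch of the quadratic-interpolation idea, using
# reciprocal on [1, 2) as the example function. SEGMENTS and the
# Taylor-midpoint coefficients are assumptions for clarity, not the
# paper's actual table sizes or fitted values.

SEGMENTS = 64            # table entries covering the interval [1, 2)
WIDTH = 1.0 / SEGMENTS

def build_tables(f, df, d2f):
    """Quadratic coefficients of f at each segment midpoint."""
    c0, c1, c2 = [], [], []
    for i in range(SEGMENTS):
        m = 1.0 + (i + 0.5) * WIDTH          # segment midpoint
        c0.append(f(m))
        c1.append(df(m))
        c2.append(d2f(m) / 2.0)
    return c0, c1, c2

# Coefficient tables for f(x) = 1/x; other functions (rsqrt, sin, ...)
# would reuse the same evaluation path with different tables.
RCP = build_tables(lambda x: 1 / x, lambda x: -1 / x**2, lambda x: 2 / x**3)

def approx_rcp(x):
    """Approximate 1/x for x in [1, 2): table lookup + one quadratic eval."""
    i = min(int((x - 1.0) / WIDTH), SEGMENTS - 1)  # upper bits -> table index
    xl = x - (1.0 + (i + 0.5) * WIDTH)             # lower bits -> remainder
    c0, c1, c2 = RCP[0][i], RCP[1][i], RCP[2][i]
    return c0 + c1 * xl + c2 * xl * xl             # shared multiply-add path
```

The point of the paper is that this same multiply-add datapath also does attribute interpolation, which is where the transistor savings come from.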
 
Demirug said:
This allows them to build a shader with fewer transistors.

Gosh, wouldn't that be handy when one of your early criticisms of USC was the transistor cost per unit. . .
 
geo said:
Gosh, wouldn't that be handy when one of your early criticisms of USC was the transistor cost per unit. . .

But you can use the saved area to add more non-US units, too.
 
Some very cool stuff. It looks like nVidia is working very hard to decrease the die area of their designs. I wouldn't be surprised if the NV40 was actually the first architecture where they really started trying hard to decrease die area (as a quick example, the NV30 and the NV43 have about the same die area, but the NV43 has four times the FP processing power and the same texture filtering power). This seems to have also contributed to the G7x's favorable showing when compared against the R5xx.

One has to wonder if ATI's putting as much focus on saving die area for their next-gen architecture.
 
Chalnoth said:
One has to wonder if ATI's putting as much focus on saving die area for their next-gen architecture.

Well, it seems clear that ATI's current designs are less dense (transistors per mm²) than NV's to start with. It's less clear why this is so, tho theories get tossed out from time to time. I don't recall that ATI has ever addressed the point directly, tho possibly you could make some inferences from comments on why the ring bus runs around the outer part of the die.
 
the NV30 and the NV43 have about the same die area, but the NV43 has four times the FP processing power and the same texture filtering power
Surely that came from chopping out the dedicated int & fp16 units and replacing them with fp32 units?
Not that hard to quadruple the fp32 power of NV30...

I wonder if we might now get that B3D review of gf7900 focusing on how they managed to do a big clock bump + reduce transistors/die area so substantially?
 
arrrse said:
Surely that came from chopping out the dedicated int & fp16 units and replacing them with fp32 units?
Not that hard to quadruple the fp32 power of NV30...
There were never any dedicated FP16 units in any of the NV3x GPUs. But you're probably right: it's better to compare the NV35 to the NV40. And still, the NV43 doubles the FP processing power of the NV35 (theoretically; in practice the NV43 is also able to get much closer to its theoretical limit).

I wonder if we might now get that B3D review of gf7900 focusing on how they managed to do a big clock bump + reduce transistors/die area so substantially?
Well, er, it was the move to 90nm.
 
I'm sure it was stated at the time of NV40's launch that they brought in cell designs for the ALU/TEX units to aid more optimal layout.
 
Can someone please explain what attribute interpolation involves and why it's necessary? Pretty please :D
 
Interpolation can be used for AA and other things as well. Pretty much, you draw two pixels anywhere and the pixels in between are filled in using an algorithm; the attributes can be color, transparency, etc. That is quite an interesting technique, but I can't see how it won't be very, very expensive ;)
 
trinibwoy said:
Can someone please explain what attribute interpolation involves and why it's necessary? Pretty please :D
Attribute interpolation refers to the interpolation of per-vertex attributes to calculate per-pixel values (such as color, depth, texture coordinates, and normal vectors). See slide 9 of the presentation.
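As a rough sketch (the function names and setup here are mine, not the paper's): for each attribute, triangle setup solves for a plane u(x, y) = a·x + b·y + c from the three vertex values, and the interpolator then evaluates that plane at every covered pixel. Perspective correction (dividing by an interpolated 1/w) is omitted for brevity.

```python
# Hedged sketch of plane-equation attribute interpolation. Given an
# attribute's value at a triangle's three vertices, solve for the plane
# u(x, y) = a*x + b*y + c once per triangle, then evaluate it per pixel.

def plane_coeffs(v0, v1, v2):
    """Each v_i = (x, y, attribute). Returns (a, b, c) of the plane."""
    (x0, y0, u0), (x1, y1, u1), (x2, y2, u2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # 2x signed area
    a = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
    b = ((u2 - u0) * (x1 - x0) - (u1 - u0) * (x2 - x0)) / det
    c = u0 - a * x0 - b * y0
    return a, b, c

def interpolate(coeffs, x, y):
    """Per-pixel attribute value: just a couple of multiply-adds."""
    a, b, c = coeffs
    return a * x + b * y + c
```

The per-pixel work is only multiply-adds on the plane coefficients, which is exactly the hardware the paper shares with special-function evaluation.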
 
Well, er, it was the move to 90nm.
But they dropped the transistor count by 24 million at the same time, with the same features, and that's no mean feat.
Nobody has yet explained it properly to my knowledge.
 
Razor1 said:
The die size shrunk, so they needed less transistors to get the data from point a to point b ;)

I hope you're kidding :oops: The most popular notion seems to be that they removed a few pipeline stages since the move to 90nm allowed them to crank clocks high enough to compensate.
 
Chalnoth said:
Attribute interpolation refers to the interpolation of per-vertex attributes to calculate per-pixel values (such as color, depth, texture coordinates, and normal vectors). See slide 9 of the presentation.

Ah, thanks. Any idea what percentage of transistor allocation in a shader goes to the higher-order function and/or interpolation logic? How much savings will they really see here?
 
trinibwoy said:
I hope you're kidding :oops: The most popular notion seems to be that they removed a few pipeline stages since the move to 90nm allowed them to crank clocks high enough to compensate.
Removing pipeline stages is a possibility, but moving data around takes a significant portion of the die area these days, from what I've been reading.
 
geo said:
Well, it seems clear that ATI's current designs are less dense (transistors per mm²) than NV's to start with. It's less clear why this is so, tho theories get tossed out from time to time. I don't recall that ATI has ever addressed the point directly, tho possibly you could make some inferences from comments on why the ring bus runs around the outer part of the die.

IMO the higher angle-dependency of AF since NV4x, and skipping the extra hardware needed for efficient PS dynamic branching, already save a large chunk of transistors in NV4x/G7x, amongst other things. I don't think you need any "theories" to acknowledge that.
 
Ailuros said:
IMO the higher angle-dependency of AF since NV4x, and skipping the extra hardware needed for efficient PS dynamic branching, already save a large chunk of transistors in NV4x/G7x, amongst other things. I don't think you need any "theories" to acknowledge that.
Oh, I'd be willing to bet that nVidia's texture units take up more total die area than ATI's. Remember that nVidia has 24 of 'em.

The dynamic branching optimizations are likely the most costly. It is for this reason that I suspect that nVidia's next part isn't going to be as good as the R5xx at dynamic branching. The memory controller is also likely a big part.

But still, ATI's R580 is sitting at roughly twice the die area for a part with similar performance and feature set when compared to nVidia's G71. I don't buy that you can account for all of this just by the G71 having relatively fewer features.
 
Very cool trick. Before I read it, I assumed they were using lookup tables plus lerps (reusing the HW used for evaluating the plane equation), but the technique used is a lot more refined and tricky.
 