The Official NVIDIA G80 Architecture Thread

Geo · Nov 22, 2006

Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations.

Don't think about when R520 came out. Think about when R520 was supposed to come out (answer: well in advance of C1/Xenos).

Maybe I'm misreading the data, but it seems to me there are some pretty big jumps in what we all thought were "CPU limited" games, which now appear much more likely to have been vertex limited.

Razor1 · Nov 22, 2006

Hmm, could be. It's very hard to tell because some of the vertex calculations are offloaded to the CPU. That won't happen much anymore, I guess, with unified shaders, or it will happen less. In essence we were correct though lol, they were CPU limited.

INKster · Nov 22, 2006

geo said:
Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations.

Don't want to "rain" on your predictions, geo, but I think he's probably contemplating another day indoors.
Rainy and cold, lately.

But, honestly... holidays in mid-November ?

PeterAce · Nov 22, 2006

geo said:
Don't think about when R520 came out. Think about when R520 was supposed to come out (answer: well in advance of C1/Xenos).

Dave Baumann said:
One thing that I do know is that there are deep divisions in ATI as to whether the R520 architecture should have gone unified or not.

http://www.beyond3d.com/forum/showpost.php?p=746536&postcount=16

Geo, it's hard not to think about it as we were given this little nugget of info back in April!

Twinkie · Nov 22, 2006

geo said:
Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations.

No wonder..

Nice find peter. So they were thinking about it after all.

Geo · Nov 22, 2006

PeterAce said:
http://www.beyond3d.com/forum/showpost.php?p=746536&postcount=16

Geo, it's hard not to think about it as we were given this little nugget of info back in April!

I hadn't seen that, but I don't see it as contradictory either. How do you know that decision wouldn't have come out the other way if they'd known from the beginning they were looking at Oct rather than April?

PeterAce · Nov 22, 2006

geo said:
I hadn't seen that, but I don't see it as contradictory either. How do you know that decision wouldn't have come out the other way if they'd known from the beginning they were looking at Oct rather than April?

Indeed.

I'm sure they had their reasons, I just wish I knew what they were!

Gleaning this from implied comments in past IHV interviews (from B3D and other 3D tech websites) doesn't go very far.

reltham · Nov 22, 2006

I have a couple questions/comments about the G80 numbers on your 3D Tables (when you get the details by clicking G80).

First, the 575M triangles per second number. Is that triangles using 3 vertices each in a list form? nVidia's site doesn't give a vertex rate number for G80, but the G7x stuff is all in the 820-1100M vertices per second range, which is the triangle rate when using indexed strips. So is the number listed here vertices? If so, then why is it so much lower than G7x?

Second, the texture fill rate number given is 18400M/s, while nVidia's site (as well as many other sites) lists 36800M/s. My own testing shows that it's 18400M/s (the numbers I actually got were between 18114M/s and 18192M/s). This is because there are only 32 texture addressing units. nVidia's number comes from the 64 texture filtering units. I think it's confusing the matter somewhat... I think the best rating number here is the fastest rate at which the pixel/fragment shader can receive texture samples (which is 18400M/s, as this site lists).
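For anyone who wants to check the two texture numbers, here is a minimal sketch of the arithmetic. The 575 MHz core clock is an assumption (it is what 18400 / 32 implies), and `peak_rate_m` is just an illustrative helper name, not anything from NVIDIA's documentation.

```python
# Sketch only: the 575 MHz core clock is assumed (18400 / 32 = 575).
CORE_CLOCK_MHZ = 575
TEXTURE_ADDRESS_UNITS = 32   # from the post
TEXTURE_FILTER_UNITS = 64    # from the post


def peak_rate_m(units, clock_mhz=CORE_CLOCK_MHZ):
    """Peak rate in millions/s if every unit produces one result per clock."""
    return units * clock_mhz


addressing_rate = peak_rate_m(TEXTURE_ADDRESS_UNITS)  # samples the shader can receive
filtering_rate = peak_rate_m(TEXTURE_FILTER_UNITS)    # the figure quoted on nVidia's site

# Measured range reported in the post, as a fraction of the addressing peak.
measured = (18114, 18192)
efficiency = [m / addressing_rate for m in measured]

print(addressing_rate, filtering_rate)
print([f"{e:.1%}" for e in efficiency])
```

Under that assumption the measured numbers sit just below the addressing-limited peak, which fits the argument that the addressing units are the practical limit.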

Lastly, I'm really looking forward to the upcoming second and third parts to the G80 architecture article.

Zeross · Nov 22, 2006

reltham said:
I have a couple questions/comments about the G80 numbers on your 3D Tables (when you get the details by clicking G80).

First, the 575M triangles per second number. Is that triangles using 3 vertices each in a list form? nVidia's site doesn't give a vertex rate number for G80, but the G7x stuff is all in the 820-1100M vertices per second range, which is the triangle rate when using indexed strips. So is the number listed here vertices? If so, then why is it so much lower than G7x?

The G80 setup engine is limited to one triangle per cycle (hence the 575M triangles/second); the G7x setup engine is limited to one triangle every two cycles. NVIDIA's site is giving the transform rate.
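The setup figures follow directly from the clock, roughly as sketched here. The 575 MHz G80 clock comes from the 575M figure itself; the 650 MHz G7x clock is an assumption (a 7900 GTX-class part), and `setup_rate_mtris` is a made-up helper name.

```python
def setup_rate_mtris(core_clock_mhz, cycles_per_triangle):
    """Peak triangle setup rate in millions of triangles per second."""
    return core_clock_mhz / cycles_per_triangle


# G80: one triangle per cycle at an assumed 575 MHz core clock.
g80_setup = setup_rate_mtris(575, 1)

# G7x: one triangle every two cycles; 650 MHz is an assumed clock.
g7x_setup = setup_rate_mtris(650, 2)

print(g80_setup, g7x_setup)
```

So the 575M number is not lower than G7x in like-for-like terms; it is a setup rate being compared against a transform rate.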

ClyssaN · Nov 22, 2006

geo said:
Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations.

For the next couple of days, the weather is going to be very bad (windy) here ...

Where is he, Algarve ? I wanna steal his g80 ...

nAo · Nov 22, 2006

Setup rate is overrated and most of the time it's a useless number. Let's test how performance degrades when the VS stage outputs many interpolants. Interpolants must be retrieved from the post-transform cache... you really don't care much about setup rate if you need a lot of cycles just to retrieve all the data that belongs to a single vertex.
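A toy model of that argument, as a sketch only: the fetch width of the post-transform cache (`interpolants_fetched_per_cycle`) and the one-new-vertex-per-triangle figure are hypothetical values chosen for illustration, not measured G80 behaviour.

```python
import math


def effective_mtris(clock_mhz, setup_cycles_per_tri, interpolants_per_vertex,
                    interpolants_fetched_per_cycle, new_vertices_per_tri=1.0):
    """Triangle rate (millions/s) limited by setup or by attribute fetch.

    All fetch parameters are hypothetical; the point is only that the
    slower of the two stages sets the rate.
    """
    fetch_cycles_per_vertex = math.ceil(
        interpolants_per_vertex / interpolants_fetched_per_cycle)
    fetch_cycles_per_tri = new_vertices_per_tri * fetch_cycles_per_vertex
    cycles_per_tri = max(setup_cycles_per_tri, fetch_cycles_per_tri)
    return clock_mhz / cycles_per_tri


for n in (4, 8, 16, 32):
    # 575 MHz clock, 1 cycle/triangle setup, 4 interpolants fetched per cycle
    print(n, effective_mtris(575, 1, n, 4))
```

In this model the setup limit only matters while a vertex's interpolants fit in a single fetch; beyond that the rate falls in proportion to the amount of per-vertex data.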

reltham · Nov 22, 2006

Zeross said:
The G80 setup engine is limited to one triangle per cycle (hence the 575M triangles/second); the G7x setup engine is limited to one triangle every two cycles. NVIDIA's site is giving the transform rate.

Oh right, I should have realized the number was the triangle setup rate. I wonder what the vertex transform number for G80 is then?

pjbliverpool · Nov 22, 2006

reltham said:
Oh right, I should have realized the number was the triangle setup rate. I wonder what the vertex transform number for G80 is then?

10.8 GigaTransforms (if that's the right term) is what I work it out as, compared to 1.3 GigaTransforms for the 7900GTX.
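The post doesn't say how those figures were worked out. One guess that reproduces them, as a sketch: every input below is an assumption (128 scalar ALUs at 1350 MHz for G80, 8 vec4 vertex units at 650 MHz for the 7900GTX, and a transform costing 16 scalar operations or 4 vec4 operations).

```python
def transforms_m_per_sec(alus, clock_mhz, alu_cycles_per_transform):
    """Peak vertex transforms in millions/s for a given ALU count and clock."""
    return alus * clock_mhz / alu_cycles_per_transform


# Assumed: a vec4 x 4x4 matrix transform is 16 scalar multiply-adds,
# i.e. 16 cycles on one scalar ALU or 4 cycles on one vec4 unit.
g80_rate = transforms_m_per_sec(alus=128, clock_mhz=1350, alu_cycles_per_transform=16)
g71_rate = transforms_m_per_sec(alus=8, clock_mhz=650, alu_cycles_per_transform=4)

print(g80_rate / 1000, g71_rate / 1000)  # in giga-transforms per second
```

The G80 figure assumes every ALU is doing vertex work, which a unified design allows in principle but a real workload would rarely do.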

Rolf N · Nov 23, 2006

I keep thinking all this "scalar" information could be a hoax, or to put it more mildly a little marketing spin on a grain of technical truth.

What if a "scalar ALU" is really a 4-wide SIMD ALU that performs an instruction not on four color channels of a single fragment at once but on the red color channels of four fragments? You just rotate the SIMD around, have some trouble with synching up the texture ops, but no longer need to worry about co-issue splitting for scalars, 2Ds and 3Ds.

Code:

R G B A
R G B A
R G B A
R G B A

The four rows should represent four fragments. "Old" SIMD style would iterate rows. Marketing-pseudo-scalar would iterate columns.
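The two iteration orders can be sketched like this. It is a toy model with made-up helper names, and it ignores co-issue entirely; it only counts how many of the four lanes do useful work when an instruction touches fewer than four channels.

```python
LANES = 4  # width of the hypothetical SIMD ALU


def issue_by_rows(num_fragments, channels_used):
    """'Old' style: one issue per fragment; lanes hold that fragment's channels."""
    return [[(frag, ch) for ch in channels_used]
            for frag in range(num_fragments)]


def issue_by_columns(num_fragments, channels_used):
    """Rotated style: one issue per channel; lanes hold different fragments."""
    issues = []
    for ch in channels_used:
        for start in range(0, num_fragments, LANES):
            stop = min(start + LANES, num_fragments)
            issues.append([(frag, ch) for frag in range(start, stop)])
    return issues


def utilization(issues):
    """Fraction of lane slots doing useful work across all issues."""
    busy = sum(len(lanes) for lanes in issues)
    return busy / (len(issues) * LANES)


for channels in ("RGBA", "RGB", "R"):
    rows = issue_by_rows(4, channels)
    cols = issue_by_columns(4, channels)
    print(channels, len(rows), utilization(rows), len(cols), utilization(cols))
```

For a full RGBA operation both orders issue the same number of instructions at full utilization. The difference only shows up for scalar, 2D and 3D operations, where the row order leaves lanes idle unless something else is co-issued.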

psurge · Nov 23, 2006

zeckensack - I think what you are saying is accurate, except that in those terms, a G8x cluster contains 4 4-way SIMD units (instead of 16 scalar units).

Either way it's "spin" - I mean in g8x the 16 "scalar" units are all running the same instruction on a bunch of fragments or vertices (a batch). That in my book is the definition of 16-way SIMD (where D = vertices/fragments). This is in contrast to (from what I understand) Xenos, which uses 16-way SIMD arrays (where D = vertices/fragments) of 4-way SIMD functional units (where D = vec4 channels).

Tridam · Nov 23, 2006

Yes, G80's main ALUs in each shader partition definitely are 16-way SIMD units.
Scalar refers only to the way they work on elements (pixels, vertices etc.) or, if you prefer, to the instruction flow.

Previous quad engines' ALUs were 16-way MIMD units.

Rolf N · Nov 23, 2006

Thanks, it's quite nice to have other people agree with me for a change

Doesn't that dissolve 3dilettante's concerns about extra overhead for more instructions then?
Because there really aren't more instructions for the same amount of work. If you use the same SIMDness and rotate the data model, a dispatched instruction might apply to different data, but it's still the same amount of data.

Bob · Nov 23, 2006

Rolf N said:
What if a "scalar ALU" is really a 4-wide SIMD ALU that performs an instruction not on four color channels of a single fragment at once but on the red color channels of four fragments?

SIMD and scalar are orthogonal. You can have one without the other, or you can have both (or neither).

There's a bunch of reasons why you want to still use quads, and they have little to do with instruction scheduling or vectorization!

Geo · Nov 23, 2006

Bob said:
SIMD and scalar are orthogonal. You can have one without the other, or you can have both (or neither).

There's a bunch of reasons why you want to still use quads, and they have little to do with instruction scheduling or vectorization!

And now that we've determined what they don't have to do with. . .?

psurge · Nov 23, 2006

zeckensack - I don't think so. The way you've organized things, I can imagine some kind of SIMD instruction is being issued to the scalar units and occupying them for multiple cycles, or a sequence of scalar instructions that produce the same effect or even support for both types of instruction. But IMO it doesn't make sense not to support a distinct scalar instruction per cycle with your proposed ALU organization (and with what I think g8x ALU organization is), so 3dilettante's points are valid.

To continue the example - for g80, 1 instruction drives 16 scalar ALUs for 1 cycle. In Xenos, IIRC, 1 instruction drives 64 ALUs (16 * 4-way SIMD) for 4 cycles (batch size is 64). You can see how Xenos can really take its time deciding which instruction to issue next compared to g8x.
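The issue-rate comparison can be put into numbers with a small sketch. It uses only the figures given in the post (taken at face value, including the IIRC Xenos numbers); the function names are made up for illustration.

```python
def cycles_per_instruction(batch_size, elements_per_cycle):
    """Cycles one instruction keeps the array busy (ceiling division)."""
    return -(-batch_size // elements_per_cycle)


def scalar_alus_driven(elements_per_cycle, alu_width):
    """Number of scalar ALU lanes a single instruction controls each cycle."""
    return elements_per_cycle * alu_width


# g8x as described in the post: 16 elements per cycle on scalar ALUs, 1 cycle each.
g8x_cycles = cycles_per_instruction(batch_size=16, elements_per_cycle=16)
g8x_alus = scalar_alus_driven(elements_per_cycle=16, alu_width=1)

# Xenos as described in the post: batch of 64, 16 elements per cycle on vec4 units.
xenos_cycles = cycles_per_instruction(batch_size=64, elements_per_cycle=16)
xenos_alus = scalar_alus_driven(elements_per_cycle=16, alu_width=4)

print(g8x_cycles, g8x_alus, xenos_cycles, xenos_alus)
```

With these figures the g8x scheduler has to pick a new instruction every cycle for 16 lanes, while the Xenos one picks once every 4 cycles for 64 lanes, which is the extra scheduling pressure 3dilettante's point is about.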
