The Official NVIDIA G80 Architecture Thread

Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations. :cool:

Don't think about when R520 came out. Think about when R520 was supposed to come out (answer: well in advance of C1/Xenos).

Maybe I'm misreading the data, but it seems to me there are some pretty big jumps in what we all thought were "CPU limited" games that now appear much more likely to have been vertex limited.
 
Hmm, could be. It's very hard to tell, because some of the vertex calculations are offloaded to the CPU; I guess that won't happen much anymore with unified shaders, or at least it will happen less. In essence we were correct though, lol: they were CPU limited :D
 
Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations. :cool:

Don't want to "rain" on your predictions, geo, but I think he's probably contemplating another day indoors.
Rainy and cold, lately. ;)

But, honestly... holidays in mid-November? :D
 
I hadn't seen that, but I don't see it as contradictory either. How do you know that decision wouldn't have come out the other way if they'd known from the beginning they were looking at Oct rather than April?

Indeed.

I'm sure they had their reasons, I just wish I knew what they were!

Gleaning this from implied comments in past IHV interviews (from B3D and other 3D tech websites) doesn't go very far :).
 
I have a couple questions/comments about the G80 numbers on your 3D Tables (when you get the details by clicking G80).

First, the 575M triangles per second number. Is that triangles using 3 vertices each in a list form? nVidia's site doesn't give a vertex rate number for G80, but the G7x stuff is all in the 820-1100M vertices per second range, which is also the triangle rate when using indexed strips. So is the number listed here vertices? If so, then why is it so much lower than G7x?
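Why a vertex rate and a triangle rate coincide for strips but not for lists comes down to how many fresh vertices each triangle consumes. A minimal Python sketch; the 1,000M vertices/s figure is a round number picked from inside the quoted 820-1100M range for illustration, not a published G7x spec.

```python
def triangles_from_list(num_vertices: int) -> int:
    """Independent triangle list: every triangle consumes three fresh vertices."""
    return num_vertices // 3


def triangles_from_strip(num_vertices: int) -> int:
    """Triangle strip: after the first two vertices, each new vertex closes one triangle."""
    return max(num_vertices - 2, 0)


# Hypothetical vertex rate, chosen only to illustrate the ratio.
vertex_rate = 1_000_000_000
print(triangles_from_list(vertex_rate))   # 333333333
print(triangles_from_strip(vertex_rate))  # 999999998
```

So a list drawn at a given vertex rate yields roughly a third as many triangles as a strip, which is why a per-vertex figure and a per-triangle figure can't be compared directly.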

Second, the texture fill rate number given is 18400M/s while nVidia's site (as well as many other sites) has 36800M/s listed. My own testing shows that it's 18400M/s (the numbers I actually got were between 18114M/s and 18192M/s). This is because there are only 32 texture addressing units; nVidia's number comes from the 64 texture filtering units. I think it's confusing the matter somewhat... I think the best number to rate by here is the fastest rate at which the pixel/fragment shader can receive texture samples (which is 18400M/s, as this site lists).
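Both headline numbers fall out of the same multiplication, just with a different unit count. A sketch assuming each unit delivers one result per clock at the 575 MHz core clock (the clock implied by the 575M triangles/s figure); it reproduces the arithmetic, not NVIDIA's own derivation.

```python
CORE_CLOCK_HZ = 575e6          # assumed G80 core clock
TEXTURE_ADDRESS_UNITS = 32     # per the post above
TEXTURE_FILTER_UNITS = 64      # per the post above


def texel_rate(units: int, clock_hz: float) -> float:
    """Peak texels/s if every unit produces one result per clock."""
    return units * clock_hz


addressed = texel_rate(TEXTURE_ADDRESS_UNITS, CORE_CLOCK_HZ)
filtered = texel_rate(TEXTURE_FILTER_UNITS, CORE_CLOCK_HZ)
print(f"{addressed / 1e6:.0f} M/s")  # 18400 M/s
print(f"{filtered / 1e6:.0f} M/s")   # 36800 M/s
```

The measured 18114-18192M/s sits just under the addressing-limited figure, which is consistent with the argument that the addressing units are the real cap on samples delivered to the shader.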

Lastly, I'm really looking forward to the upcoming second and third parts to the G80 architecture article.
 
I have a couple questions/comments about the G80 numbers on your 3D Tables (when you get the details by clicking G80).

First, the 575M triangles per second number. Is that triangles using 3 vertices each in a list form? nVidia's site doesn't give a vertex rate number for G80, but the G7x stuff is all in the 820-1100M vertices per second range, which is also the triangle rate when using indexed strips. So is the number listed here vertices? If so, then why is it so much lower than G7x?

G80's setup engine is limited to one triangle per cycle (hence the 575M triangles/second); G7x's setup engine is limited to one triangle every two cycles. NVIDIA's site is giving the transform rate.
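The setup figures are just clock divided by cycles per triangle. A minimal sketch; the 650 MHz G7x clock is an assumption (the 7900 GTX core clock), used only to show what "one triangle every two cycles" means in absolute terms.

```python
def setup_rate(clock_hz: float, cycles_per_triangle: int) -> float:
    """Peak triangles/s for a setup engine accepting one triangle every N clocks."""
    return clock_hz / cycles_per_triangle


g80 = setup_rate(575e6, 1)  # one triangle per clock
g7x = setup_rate(650e6, 2)  # one triangle every two clocks; 650 MHz is assumed
print(f"G80: {g80 / 1e6:.0f}M tri/s")  # G80: 575M tri/s
print(f"G7x: {g7x / 1e6:.0f}M tri/s")  # G7x: 325M tri/s
```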
 
Rys is deeply contemplating parts II and III. . . .on a beach in Portugal. We all look forward to the fruit of his contemplations. :cool:

For the next couple of days, the weather is going to be very bad (windy) here...

Where is he, Algarve? I wanna steal his G80... :)
 
Setup rate is overrated and most of the time it's a useless number. Let's test how performance degrades when the VS stage outputs many interpolants. Interpolants must be retrieved from the post-transform cache... you really don't care much about setup rate if you need a lot of cycles just to retrieve all the data that belongs to a single vertex.
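The point can be modelled as a simple max(): a triangle costs whichever is slower, the setup slot or the cycles spent reading its vertices' interpolants back out of the post-transform cache. A rough sketch; `interpolants_per_clock = 4` is a hypothetical read width, not a known G80 figure, and the model assumes each triangle reads all three vertices' outputs.

```python
import math


def cycles_per_triangle(interpolants: int, setup_cycles: int = 1,
                        vertices_per_triangle: int = 3,
                        interpolants_per_clock: int = 4) -> int:
    """Clocks per triangle: the slower of setup and post-transform-cache reads.

    interpolants_per_clock is a hypothetical read width chosen for illustration.
    """
    read_cycles = math.ceil(vertices_per_triangle * interpolants / interpolants_per_clock)
    return max(setup_cycles, read_cycles)


CLOCK_HZ = 575e6  # assumed core clock, matching the 575M tri/s setup figure
for n in (1, 4, 8, 16):
    c = cycles_per_triangle(n)
    print(f"{n:2d} interpolants: {c} clk/tri, {CLOCK_HZ / c / 1e6:.0f}M tri/s")
```

Under these assumptions the one-triangle-per-clock setup rate only matters for the thinnest vertex formats; once a vertex carries a handful of interpolants, attribute reads dominate.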
 
G80's setup engine is limited to one triangle per cycle (hence the 575M triangles/second); G7x's setup engine is limited to one triangle every two cycles. NVIDIA's site is giving the transform rate.

Oh right, I should have realized the number was the triangle setup rate. I wonder what the vertex transform number for G80 is then?
 
Oh right, I should have realized the number was the triangle setup rate. I wonder what the vertex transform number for G80 is then?

10.8 GigaTransforms (if that's the right term) is what I work it out as, compared to 1.3 GigaTransforms for the 7900GTX.
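One way to reach both figures, assuming a "transform" is a 4x4 matrix times a 4-component position: that is 16 scalar multiply-adds on G80's 128 scalar ALUs at a 1.35 GHz shader clock, or 4 vec4 operations on G7x's 8 vertex units. The 650 MHz vertex clock for the 7900GTX is an assumption (it is the rate that makes the 1.3 figure work); this is a reconstruction of the arithmetic, not the poster's stated method.

```python
def transforms_per_second(alus: int, clock_hz: int, ops_per_transform: int) -> int:
    """Peak 4x4-matrix vertex transforms per second, one op per ALU per clock."""
    return alus * clock_hz // ops_per_transform


# G80: 128 scalar ALUs, 16 scalar MADs per transform (4x4 matrix x vec4).
g80 = transforms_per_second(alus=128, clock_hz=1_350_000_000, ops_per_transform=16)
# G7x (7900GTX): 8 vec4 vertex units, 4 vec4 ops (e.g. DP4s) per transform; clock assumed.
g7x = transforms_per_second(alus=8, clock_hz=650_000_000, ops_per_transform=4)
print(g80 / 1e9, g7x / 1e9)  # 10.8 1.3
```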
 
I keep thinking all this "scalar" information could be a hoax, or to put it more mildly a little marketing spin on a grain of technical truth.

What if a "scalar ALU" is really a 4-wide SIMD ALU that performs an instruction not on four color channels of a single fragment at once but on the red color channels of four fragments? You just rotate the SIMD around, have some trouble with synching up the texture ops, but no longer need to worry about co-issue splitting for scalars, 2Ds and 3Ds.

Code:
R G B A
R G B A
R G B A
R G B A
The four rows should represent four fragments. "Old" SIMD style would iterate rows. Marketing-pseudo-scalar would iterate columns.
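The row-versus-column idea can be checked by counting issue slots for a small, hypothetical op mix on a 4-wide ALU. Iterating rows spends one instruction per fragment per op and leaves lanes idle on scalar or vec3 ops (co-issue is ignored here); iterating columns spends one instruction per channel touched, each covering all four fragments. The op widths below are made up for illustration.

```python
LANES = 4      # width of the hypothetical SIMD ALU from the post
FRAGMENTS = 4  # rows of the grid: four fragments


def row_issues(op_widths: list[int]) -> int:
    """'Old' style: one instruction per fragment per op, regardless of op width."""
    return FRAGMENTS * len(op_widths)


def column_issues(op_widths: list[int]) -> int:
    """Rotated style: one instruction per channel an op touches, spanning all fragments."""
    return sum(op_widths)


def utilization(op_widths: list[int], issues: int) -> float:
    """Fraction of issued lane-slots doing useful work."""
    return FRAGMENTS * sum(op_widths) / (issues * LANES)


shader = [4, 3, 1, 1]  # hypothetical mix: a vec4 op, a vec3 op, two scalar ops
for name, count in (("rows", row_issues(shader)), ("columns", column_issues(shader))):
    print(f"{name}: {count} issues, {utilization(shader, count):.0%} lane utilization")
# rows: 16 issues, 56% lane utilization
# columns: 9 issues, 100% lane utilization
```

For an all-vec4 shader both layouts issue the same number of instructions; the rotated layout only pulls ahead when narrower ops appear, which is exactly the co-issue problem it sidesteps.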
 
zeckensack - I think what you are saying is accurate, except that in those terms, a G8x cluster contains 4 4-way SIMD units (instead of 16 scalar units).

Either way it's "spin". I mean, in G8x the 16 "scalar" units are all running the same instruction on a bunch of fragments or vertices (a batch). That in my book is the definition of 16-way SIMD (where D = vertices/fragments). This is in contrast to (from what I understand) Xenos, which uses 16-way SIMD arrays (where D = vertices/fragments) of 4-way SIMD functional units (where D = vec4 channels).
 
Yes G80's main ALUs in each shader partition definitely are 16-way SIMD units.
Scalar refers only to the way they work on elements (pixels, vertices etc) or if you prefer to the instructions flow.

Previous quad engines' ALUs were 16-way MIMD units.
 
Thanks, it's quite nice to have other people agree with me for a change :rolleyes: :)
Doesn't that dissolve 3dilettante's concerns about extra overhead for more instructions, then?
Because there really aren't more instructions for the same amount of work. If you use the same SIMDness and rotate the data model, a dispatched instruction might apply to different data, but it's still the same amount of data.
 
What if a "scalar ALU" is really a 4-wide SIMD ALU that performs an instruction not on four color channels of a single fragment at once but on the red color channels of four fragments?
SIMD and scalar are orthogonal. You can have one without the other, or you can have both (or neither).

There's a bunch of reasons why you want to still use quads, and they have little to do with instruction scheduling or vectorization!
 
SIMD and scalar are orthogonal. You can have one without the other, or you can have both (or neither).

There's a bunch of reasons why you want to still use quads, and they have little to do with instruction scheduling or vectorization!

And now that we've determined what they don't have to do with. . .? ;)
 
zeckensack - I don't think so. The way you've organized things, I can imagine some kind of SIMD instruction is being issued to the scalar units and occupying them for multiple cycles, or a sequence of scalar instructions that produce the same effect or even support for both types of instruction. But IMO it doesn't make sense not to support a distinct scalar instruction per cycle with your proposed ALU organization (and with what I think g8x ALU organization is), so 3dilettante's points are valid.

To continue the example: for G80, 1 instruction drives 16 scalar ALUs for 1 cycle. In Xenos, IIRC, 1 instruction drives 64 ALUs (16 × 4-way SIMD) for 4 cycles (batch size is 64). You can see how Xenos can really take its time deciding which instruction to issue next compared to G8x.
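The scheduling window can be written as one division: an instruction keeps the SIMD array busy for batch size / lanes clocks, and that is how long the scheduler has to pick the next one. The sketch plugs in the numbers exactly as the post states them (64-element batches on 16 lanes for Xenos; 16 elements on 16 lanes for G80); they are not independently verified.

```python
import math


def cycles_per_instruction(batch_size: int, simd_lanes: int) -> int:
    """Clocks one instruction occupies the SIMD array while sweeping its batch."""
    return math.ceil(batch_size / simd_lanes)


# Figures as described in the post above.
xenos = cycles_per_instruction(batch_size=64, simd_lanes=16)
g80 = cycles_per_instruction(batch_size=16, simd_lanes=16)
print(xenos, g80)  # 4 1
```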
 