The Official NVIDIA G80 Architecture Thread

Arun

Unknown.
Moderator
Legend
It's out there. It's big, it's green, it's mean.
First things first, you're supposed to read our review, and get a good grasp of the whole paradigm. Hey, it's the biggest architectural change for NVIDIA since the NV40... I mean the NV5... I mean, gah, let's just say since the original Riva 128 damnit! And no, that doesn't mean it supports quadratic surfaces natively.

We've got an architecture piece on G80 ready today, plus Image Quality and performance pieces that'll put the whole thing in perspective in a few days maximum. And they're written by our friendly boss and overlord, aka Ryszard, so you'll really have no excuse not to read them all from start to finish once they're out. I mean, if they were polluted by my stupidly annoying prose, I'd see your point, but now, you've really got no excuse! ;) Comments aimed at the review specifically may also be put in the related thread for it in the articles forum, and in fact you're encouraged to try making that one very active too!

You may discuss both the architecture and its performance here, as well as our specific review. Comments aimed specifically at other reviews, of which there'll be plenty we're sure, should be put in the appropriate thread in the other forum. Oh, and as a side comment, we plan to publish our in-depth analysis of a few specific aspects of the chip (including CSAA and some other things) in the coming days, so stay tuned.

And now that you've read it all, here's an extremely concise but highly complete (in terms of architecture) summary, in the form of a SmartDraw diagram. It might be cluttered, but on the plus side of things, I think I/we can honestly claim it's more complete, precise and interesting than some entire multi-page pieces out there, if you're technically oriented! :)

The NVIDIA G80 Unofficial Technical Diagram
('The Cheat Sheet')


You may link to this diagram and other parts of our analysis on other forums/sites
as much as you want to, as long as proper credit/linkage is given to us, that is!

Our review, images and conclusions are the results of extensive analysis by me and Rys through synthetics and our own personal tests/shaders, as well as heavy discussions among ourselves and site staff overall. No guarantee of perfect accuracy, as there is no such thing, but it should be as near as can be for now. There still will be things we and many others will discover in the coming days, weeks and months, so the fun definitely doesn't end here, considering how new of an architecture it is, we think!

And now, you may stop listening to my needlessly long introduction post, and...
COMMENT, DISCUSS, SPECULATE! NOW! :)
 
I just want to jump in to say that the diagram is one of my favorite things ever. Grats and thanks to Rys and Uttar for busting tail to get this done.

Now, when's CUDA showing up again? :D
 
The reviewer's guide, already part-posted by CJ, indicates a batch size for G7x as 880 pixels - not the 1024 stated in the article. So, which is it?...

Jawed
 
The batch size on G7x is actually variable. Prior to that doc, the only number I ever saw for the maximum was 1024, so yeah, I was a bit surprised when I saw 880 there. It's a rough number anyway, as it can (and will) go below that depending on several factors (mostly the number of registers). So I don't think it really matters, and 1024 helps for comprehension's sake.
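To make the register dependence concrete, here is a minimal sketch, assuming the whole batch shares one fixed-size register file. The `register_file_slots` value of 4096 is a made-up number chosen only so the arithmetic is easy to follow, and `batch_size` is a hypothetical helper; neither comes from NVIDIA documentation. Only the 1024 ceiling is the figure discussed in this thread.

```python
def batch_size(regs_per_pixel, register_file_slots=4096, max_batch=1024):
    """Estimate how many pixels fit in one batch.

    Assumption: every pixel in the batch needs `regs_per_pixel` slots
    from a shared register file of `register_file_slots` slots
    (a hypothetical capacity, not a measured G7x figure).
    """
    if regs_per_pixel < 1:
        raise ValueError("regs_per_pixel must be at least 1")
    # The batch can never exceed the hardware maximum, and it shrinks
    # once the shader needs more registers than the file can supply.
    return min(max_batch, register_file_slots // regs_per_pixel)


if __name__ == "__main__":
    for regs in (2, 4, 5, 8):
        print(regs, "registers per pixel ->", batch_size(regs), "pixels per batch")
```

Under these assumed numbers, a shader using up to 4 registers per pixel runs at the full 1024, while heavier shaders get proportionally smaller batches.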

Personally, I think the batch size is 1024 and the maximum number of fragments in flight in the pipeline is 880 for G7x. I wouldn't put my hand in the fire over it though, but I'm pretty sure tests have been done in the past that confirmed it was 1024 - so I think that doc is wrong, and for the time being we'll keep 1024 in the review.

What an odd time to speak of G70 though, hehe :) But yeah, that's good proof you'll never fully know an architecture, even after it's "outdated". Ah well, we strive to know it as much as we can, anyway! ;)


Uttar
 
CUDA is something I'm highly interested in. Hopefully B3D does a full article about it!
Baron will kill us if we don't ;) Good point about the Linux port Chalnoth, we'll definitely have to add that to our question list. Hopefully they'll be able to comment on it... *cough*

Uttar
 
What about the explanation of Demirug's highway analogy/diagram? If it has already been disclosed, I have missed it.
 
Oh, yeah, me too. But there has to be a linux port, or I won't see any importance in it.
One exists. CUDA seems to be basically a combination of CTM and Brook. The higher-level API is exposed using a C library (I think), while the lower-level communication with the chip is handled by what NV calls NVasc (which I assume is their analogue to CTM). I am taking this two-tier route to mean that, like CTM, the runtime is driver-independent. Apparently, though, it is possible to run both GPGPU and regular GPU applications simultaneously and the chip will schedule the threads appropriately.

We don't have that much information on CUDA yet, but it will be available to all on the NV dev site. No idea when it will be available (they said "soon").

EDIT: I lied. Registered developers only and apparently available now... (it's at the end)
Feh.
 
Now, to try to get a discussion started on something else than CUDA... :)
Some interesting old David Kirk unification quotes: http://www.beyond3d.com/forum/showthread.php?t=30014
D. Kirk: Our DirectX 10 GPU may be Unified-Shader, or not. Everyone thinks I said "we won't go there (Unified-Shader)." But what I said is just you can't know it until (our GPU) debuts.
D. Kirk: In the logical diagram of D3D 10, Vertex Shader, Geometry Shader and Pixel Shader are placed side by side. What happens if they are placed in the same box? Each Shader is a different part. If they get unified they become wasteful.

Besides, it requires more I/O (wires) because all connections with memory concentrate on the box. Registers and constants are put in a single box too. It's because you have to keep all vertex states, pixel states and geometry states together while doing load balancing. A bigger register array requires more ports.
I find it interesting how, in retrospect, you can notice he actually managed to bypass the question (in that specific interview) by criticizing aspects of Xenos that they were going to handle differently in G80... ;)

nelg: Nope, sadly. I'd note that even G7x and R58x have "schedulers" though. IMO, the scheduling ought to be quite expensive compared to G7x's rudimentary (but efficient for its purposes) so-called "scheduling", but if you compare the way it works to R580's PS schedulers, I'd speculate that the local schedulers probably aren't much more expensive. You've still got the extra costs of the global scheduler/dispatcher/whatever though, and I can't really imagine how expensive that would be. I'd guess "A bit, but not dramatically so." but I really don't know - does anyone have an opinion on this?


Uttar
 

Too many variables. We don’t know the maximum number of threads in flight. Additionally, we don’t know exactly how many clocks there are between two ALU/FPU injections. Another question is the number of different priority stages.
 
Too many variables. We don’t know the maximum number of threads in flight.
gimme some hw and I'll compute it :)

Additionally, we don’t know exactly how many clocks there are between two ALU/FPU injections.
please define what you mean by ALU injection :) if it's what I think it is the answer is 42..emh..2 :)
Another question is the number of different priority stages.
what? :)
 
Anyone have a guess on why G80 loves Oblivion so? It's stupendously super fast!
 