The Official NVIDIA G80 Architecture Thread

Considering that G71 has 220 clock cycles of latency hiding, at 650MHz, you'd expect G80 to require in the region of 400+ clock cycles to hide latency at 1350MHz.
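A quick back-of-the-envelope check of that scaling, assuming the memory latency in nanoseconds stays roughly constant and only the clock it's measured in changes (a sketch, not a measured figure):

# Back-of-the-envelope: if G71 hides ~220 cycles of memory latency at 650MHz,
# the same wall-clock latency measured in G80 shader clocks (1350MHz) scales
# with the clock ratio.
g71_cycles = 220
g71_clock_mhz = 650
g80_shader_clock_mhz = 1350

latency_ns = g71_cycles / g71_clock_mhz * 1000.0          # ~338ns of latency to hide
g80_cycles = latency_ns * g80_shader_clock_mhz / 1000.0   # ~457 shader clocks
print(round(g80_cycles))                                   # -> 457, i.e. "400+"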

Jawed, I think this may be the root problem/difference. As the ALU clock rises relative to the tex units, the relative cost of doing a tex lookup rises, but no amount of latency hiding can mask throughput issues.

There is a limit to the throughput of the tex units, and a maximum speed for the ALUs. If you want to keep both operating at peak, that implies a ratio between tex and ALU ops. Since the tex units in each cluster can apparently produce 4 Vec4 lookups per tex clock, while the ALUs can perform 16 scalar ops per shader clock, the ratio of ops looks to be roughly the same as the clock ratio, ~2-1/3. Of course, the ALUs are likely busy about a quarter of the time doing perspective correction for addressing, so call it roughly a 2:1 ratio. So, assuming full utilization, with each batch running ~2 ALU ops per tex lookup, hiding ~200+ cycles works out to something like 128 scalar ops, or 32 Vec4-sized ops.
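A rough sketch of that arithmetic in Python; the per-clock throughputs and the 575/1350MHz clocks are the assumptions from the paragraph above, not measured figures:

# Per-cluster throughput assumptions from the post above.
tex_vec4_per_tex_clock = 4        # assumed: 4 Vec4 texture lookups per tex clock
alu_scalar_per_alu_clock = 16     # assumed: 16 scalar ALU ops per shader clock
tex_clock_mhz = 575.0
alu_clock_mhz = 1350.0

# 4 Vec4 lookups = 16 scalar components, matching the 16 scalar ALU ops, so the
# op ratio reduces to the clock ratio.
per_clock_match = tex_vec4_per_tex_clock * 4 == alu_scalar_per_alu_clock   # True
clock_ratio = alu_clock_mhz / tex_clock_mhz        # ~2.35, the ~2-1/3 above

# Knock off roughly a quarter of the ALU clocks for perspective correction of
# texture addresses, leaving roughly a 2:1 useful ALU:tex ratio.
useful_ratio = clock_ratio * 0.75                  # ~1.76, call it ~2:1
print(per_clock_match, round(clock_ratio, 2), round(useful_ratio, 2))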
 
Lord CorDox said:
Greetings
Yes, hello and welcome to B3D! The results on R580 are broken, you (if you're the OP at R3D) completely misunderstood the results presented, and G80 effectively branches for free. You* were told this by Demi, DC and others over at Rage3D, and they were correct then. They'll be correct here again if you* really want to ask again......
btw: I think the penalty for not believing Demi about something 3D or DC about math is running an S3 Virge for life
Moving your 3D request to the programming forum so as not to pollute this thread.
 
Matas said:
One question: how do 128 streaming processors work in DX9 games if they lack unified shader support?
They just do. A unified instruction set has nothing to do with unified hardware execution units. You can always have either one without the other.
 
One question: how do 128 streaming processors work in DX9 games if they lack unified shader support?

Well, that's your problem right there. Who says they lack it? More importantly, who says they require support from anywhere external (presumably you're thinking of Microsoft) in the first place? David Kirk pointed out repeatedly that the two concepts of a unified programming model and unified hardware assets were independent. . . G80 proves it is true --just not quite the way we all thought he meant it at the time. . . :LOL:
 
So the streaming processors are split, for example, 80 pixel and 48 vertex (on the 8800GTX) in DX9 games?

Well, it would be a lot more granular and variable than that. It would be happening per frame. I don't know if we've decided yet if there would be any min/max stops, or if it might go 128/0 or 0/128 in spots.

Actually, "per frame" isn't even granular enough if we take the usage graphs that both ATI and NV have shown literally, based on B3D member RoOoBo's work.
 
Unification of shader hardware is controlled by the drivers and control silicon in the chip; it has nothing to do with the software running on top of the hardware (API or application).
 
The way I expect the unified pipelines work is that you just have N threads in flight at any given time, some of which will be pixel threads, while others will be vertex threads. nVidia will have a scheduler that decides which threads to execute during any given clock. Thus, in essence, you just have a cluster of processing units that can be assigned a thread to work on each and every clock, independent of whether it's a pixel or vertex thread. The load balancing would thus be perfect, and would be an improvement even for DX9 games, as you don't have to worry about small triangles eating up your vertex units, or large triangles eating up your pixel units while hardly touching the vertex units.
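As a toy illustration of that idea (every name and number here is invented for the sketch; it is not how NVIDIA's actual scheduler works):

import random
from collections import deque

# A single pool of in-flight threads, each tagged pixel or vertex; each clock
# the scheduler hands whichever threads are next in line to the free clusters,
# without caring what kind of work they carry.
class Thread:
    def __init__(self, kind, remaining_ops):
        self.kind = kind                  # "pixel" or "vertex"
        self.remaining_ops = remaining_ops

def simulate(threads, num_clusters=8, max_clocks=1000):
    pool = deque(threads)
    for clock in range(max_clocks):
        for _ in range(num_clusters):
            if not pool:
                return clock              # all work drained
            t = pool.popleft()
            t.remaining_ops -= 1          # issue one op for this thread
            if t.remaining_ops > 0:
                pool.append(t)            # still in flight, rotate back in
    return max_clocks

# Mixed workload: small-triangle (vertex-heavy) and large-triangle (pixel-heavy)
# frames both keep every cluster busy, since the pool is shared.
work = [Thread("pixel", random.randint(1, 8)) for _ in range(40)] + \
       [Thread("vertex", random.randint(1, 4)) for _ in range(20)]
print("drained after", simulate(work), "clocks")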
 
Unification of shader hardware is controlled by the drivers and control silicon in the chip; it has nothing to do with the software running on top of the hardware (API or application).
Actually, you're doing something massively wrong if the drivers have anything to say about it when running real-world applications. And it balances the workload constantly, not just per frame. I'm not going to write an introduction to the subject, however, as plenty of people have summarized it excellently in the past already.


Uttar
 
Would there be anything stopping NVIDIA or ATI from producing cut-down unified chips solely to make DX9 games faster and more efficient, saving transistor cost by excluding DX10 requirements that aren't needed? And if nothing does, it would obviously have to happen sooner rather than later; but since DX9 games will still be around for a considerable length of time, would it be worth it to either IHV?
 
No, they'll keep the current DX9 crop alive in some form as long as needed. Would be a waste of money to design new DX9 stuff.
 
Not that I've seen anyone provide evidence of. Why do you think so?

I always thought unified shaders needed to be supported at the software level as well as at the hardware level, hence DX10 / unified shader architecture. Now it could be assumed that the scheduler in G80 is dynamically allocating the shaders per frame.

So, theoretically, a unified shader architecture will be more efficient than a non-unified (split vertex/pixel) architecture in DX9 games? But if so, why didn't ATi make a derivative of R500 for the PC market to face nVIDIA (7800GTX), instead of spending considerable R&D on another GPU that became known as R520?

I'm presuming they went for the safer route instead, or that the decision was made a long time ago, even before the R500 project took place.

When is the second part of the B3D review of G80 coming? If there is one :p
 