ATi presentation on future GPU design

991060 · Mar 30, 2004

The screens are captured from a leaked ppt (once available from ATi.com, but it was later deleted). I found something interesting you may want to have a look at.

http://bbs.gzeasy.com/uploads/post-28-1080642030.png

more available at:
http://bbs.gzeasy.com/uploads/post-28-1080642064.png
http://bbs.gzeasy.com/uploads/post-28-1080642109.png
http://bbs.gzeasy.com/uploads/post-28-1080642147.png
http://bbs.gzeasy.com/uploads/post-28-1080642180.png
http://bbs.gzeasy.com/uploads/post-28-1080642232.png
http://bbs.gzeasy.com/uploads/post-28-1080642273.png

Edit: Image > File by 56k Nazi

Zaphod · Mar 30, 2004

There is a seven page thread on this below: http://www.beyond3d.com/forum/viewtopic.php?t=11153

991060 · Mar 30, 2004

Some questions floating in my brain:

1. SM3.0 requires FP32 all the way down, is there a confirmation from MS?
2. what could ATi's "own SM3.0 part" be?
3. flow control hit, is it unavoidable in this generation of hardware?

991060 · Mar 30, 2004

Zaphod said:
There is a seven page thread on this below: http://www.beyond3d.com/forum/viewtopic.php?t=11153

Errr, didn't spot that before posting. Please lock the thread then.

Dave Baumann · Mar 30, 2004

That thread was somewhat hijacked by the revelation of the notes. I think there is room to talk about the ideas behind a unified shader architecture if people wish to here.

radar1200gs · Mar 30, 2004

Just noticed the following on the http://bbs.gzeasy.com/uploads/post-28-1080642147.png slide:

Also mention that our VPUs are 5D. The vertex units can parallelise a 4D op with a 1D op - giving 5 parallel results! [NV can't do this]

Anyone care to explain this a little more? Could this be the basis of an "Extreme" pipeline?

Also on that slide they state:

Branches - VPUs don't like branching...

Surely they meant ATi VPUs don't like branching - nVidia VPUs don't have a problem with it.

Dave Baumann · Mar 30, 2004

1 - As I mentioned in the other thread, ATI's VS can handle both a full vector op and scalar simultaneously.

2 - Presently all graphics hardware has issues with branching relative to the speed at which a CPU can branch. The message is not to expect branching in graphics to be fast just because you see it's fast on CPUs: CPUs have large sections of logic dedicated to efficient handling of branching, whereas graphics chips have very little or none. Basically, just because a chip may be able to do it, don't necessarily expect it to be good; be careful where and when you use it.

Luminescent · Mar 30, 2004

I believe the R3xx VS architecture can concurrently handle a full vector op alongside a scalar one [per unit].

davepermen · Mar 30, 2004

DaveBaumann said:
2 - Presently all graphics hardware has issues with branching relative to the speed at which a CPU can branch. The message is not to expect branching in graphics to be fast just because you see it's fast on CPUs: CPUs have large sections of logic dedicated to efficient handling of branching, whereas graphics chips have very little or none. Basically, just because a chip may be able to do it, don't necessarily expect it to be good; be careful where and when you use it.

Depends on the processor, too. P4s aren't that fast at it. Or, to reformulate: they are extremely variable in performance, depending on whether they guessed right.

Evildeus · Mar 30, 2004

Is there a processor/GPU that does it fast almost always?

davepermen · Mar 30, 2004

AMD Athlons, the old ones as well as the new ones. They run at much the same (very good) performance, independent of what code you feed in, be it SIMD, be it branching, be it some other stuff. They aren't heavily pipelined, so they don't have big issues.

And that will be the only way GPUs can do fast branching: by having tons of units working in parallel that aren't deeply pipelined. But how that can be done, or whether it can be done at all, is another question.

Tim · Mar 30, 2004

radar1200gs said:
Surely they meant ATi VPUs don't like branching - nVidia VPUs don't have a problem with it.

So nVidia is lying when they claim that dynamic branching is "expensive"? Just to help ATi and make their own product look bad?

davepermen · Mar 30, 2004

radar1200gs said:
Surely they meant ATi VPUs don't like branching - nVidia VPUs don't have a problem with it.

nVidia states that branching can hurt performance and should only be used to skip big blocks of data, or for early outs.

This is for NV40: http://developer.nvidia.com

Evildeus · Mar 30, 2004

davepermen said:
AMD Athlons, the old ones as well as the new ones. They run at much the same (very good) performance, independent of what code you feed in, be it SIMD, be it branching, be it some other stuff. They aren't heavily pipelined, so they don't have big issues.

And that will be the only way GPUs can do fast branching: by having tons of units working in parallel that aren't deeply pipelined. But how that can be done, or whether it can be done at all, is another question.

Thanks

akira888 · Mar 30, 2004

radar - the reason that dynamic branching is almost necessarily slow is simply that you could end up calculating four times as many fragments as you would without it. For a thorough explanation, search this site for "dynamic branching."
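
To make the cost argument concrete, here is a minimal sketch of a toy model, not a description of any real chip. It assumes fragments are processed in lockstep groups of four and that a group pays for every side of a branch that at least one of its fragments takes. The function name and the cycle costs are hypothetical.

```python
def lockstep_cost(taken, cost_if, cost_else):
    """Cycles spent by one lockstep group on an if/else (toy model).

    taken     -- one bool per fragment: does it take the 'if' side?
    cost_if   -- hypothetical cycle cost of the 'if' side
    cost_else -- hypothetical cycle cost of the 'else' side
    """
    cycles = 0
    if any(taken):        # at least one fragment needs the 'if' side
        cycles += cost_if
    if not all(taken):    # at least one fragment needs the 'else' side
        cycles += cost_else
    return cycles


# A coherent group pays for one side only; a divergent group pays for both.
coherent = lockstep_cost([False, False, False, False], cost_if=10, cost_else=2)
divergent = lockstep_cost([True, False, False, False], cost_if=10, cost_else=2)
```

In this model a single fragment taking the expensive side makes the whole group pay for it, which is the sense in which branching only helps when large, coherent blocks of fragments can skip work.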

Dave Baumann · Mar 30, 2004

We're talking about the VS here. The fragment shaders can issue either one full vector op, or one vector op of fewer than 4 components plus a scalar; the vertex shader, however, can cope with one full vector op and a scalar simultaneously (in R300).
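
A rough sketch of why co-issue matters, under strong assumptions: every op is independent of the others, each op takes one issue slot, and the unit can pair at most one 4-component vector op with one scalar op per cycle. The function and the op tags are hypothetical and only count issue cycles.

```python
def issue_cycles(ops, co_issue):
    """Count issue cycles for independent ops tagged "vec4" or "scalar" (toy model)."""
    vec = sum(1 for op in ops if op == "vec4")
    scalar = sum(1 for op in ops if op == "scalar")
    if co_issue:
        # One vec4 and one scalar can share a cycle, so the longer list dominates.
        return max(vec, scalar)
    # Without co-issue every op occupies its own cycle.
    return vec + scalar


ops = ["vec4", "scalar", "vec4", "scalar"]
with_pairing = issue_cycles(ops, co_issue=True)
without_pairing = issue_cycles(ops, co_issue=False)
```

Real shader code has dependencies between instructions, so the gain in practice depends on how often the compiler can find a scalar op to pair with each vector op.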

radar1200gs · Mar 30, 2004

While NV3x branching may not be the absolute greatest, it beats R3xx branching (there is none; yes, there is the F-buffer thing, but that isn't part of any spec, whereas NV3x's branching does follow the DX9 spec).

Also, you don't know yet whether nVidia has improved NV40 branching over NV3x branching.

Dave Baumann · Mar 30, 2004

a.) R300 has constant branching IIRC.

b.) The point still stands - developers shouldn't expect branching to be anything near a CPU's branching capabilities, and both ATI and NVIDIA are urging developers to be circumspect with its use.

arjan de lumens · Mar 30, 2004

Evildeus said:
Is there a processor/GPU that does it fast almost always?

The massively multithreaded Cray MTA processor. When it detects a branch (or any instruction that might have a >1 cycle latency, such as a memory load) it swaps execution to another thread; by juggling around 100+ threads, it can easily sustain >98% of maximum theoretical IPC even on code heavily loaded with branches or memory loads, despite the fact that it has no branch predictor or data cache.
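
The latency-hiding idea can be sketched with a toy simulation. This is an assumption-laden model, not the MTA's actual scheduler: every instruction is treated as having the same fixed latency, one instruction issues per cycle, and a thread cannot issue again until its previous instruction has completed. The latency value and thread counts below are made up for illustration.

```python
def utilization(num_threads, latency, cycles=10_000):
    """Fraction of cycles in which some thread could issue (toy model)."""
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    issued = 0
    for cycle in range(cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                # Issue from the first ready thread, then park it for `latency` cycles.
                ready_at[t] = cycle + latency
                issued += 1
                break
    return issued / cycles


single = utilization(num_threads=1, latency=20)    # one thread stalls most of the time
few = utilization(num_threads=10, latency=20)      # half the latency is hidden
many = utilization(num_threads=128, latency=20)    # enough threads to hide it all
```

In this model utilization grows with the thread count until there are at least as many threads as cycles of latency, after which the issue slot is never idle, with no branch prediction involved.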

radar1200gs · Mar 30, 2004

DaveBaumann said:
a.) R300 has constant branching IIRC.

b.) The point still stands - developers shouldn't expect branching to be anything near a CPU's branching capabilities, and both ATI and NVIDIA are urging developers to be circumspect with its use.

It's been a long time since I last looked at branching, but I seem to remember a digit-life article saying that R3xx did its limited branching through the drivers unrolling the code; in other words, the CPU, not the GPU, is really doing the branching.

nVidia's is all in hardware on the GPU.
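
For readers unfamiliar with the idea, resolving constant-controlled branches ahead of time can be sketched as below. This is a generic illustration on a made-up instruction list, not a description of what any ATi or nVidia driver actually does; the program format, the constant name `b0`, and the function name are all hypothetical.

```python
def flatten(program, constants):
    """Resolve branches on known constants into a straight-line program.

    program   -- list of ops; an op is either an instruction string or a
                 tuple ("if", const_name, then_ops, else_ops)
    constants -- dict mapping boolean constant names to their values
    """
    flat = []
    for op in program:
        if isinstance(op, tuple) and op[0] == "if":
            _, name, then_ops, else_ops = op
            # The constant is known before the shader runs, so pick a side now.
            chosen = then_ops if constants[name] else else_ops
            flat.extend(flatten(chosen, constants))
        else:
            flat.append(op)
    return flat


program = [
    "mul r0, v0, c0",
    ("if", "b0", ["add r0, r0, c1"], ["mov r0, c2"]),
    "mov oPos, r0",
]
straight_line = flatten(program, {"b0": True})
```

The flattened program contains no branch at all, so the work of choosing a path is done on the host side each time the constant changes rather than on the graphics chip.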
