ATi presentation on future GPU design

Some questions floating in my brain:

1. SM3.0 requires FP32 all the way down; is there confirmation from MS?
2. What could ATi's "own SM3.0 part" be?
3. The flow control hit: is it unavoidable in this generation of hardware?
 
That thread was somewhat hijacked by the revelation of the notes. I think there is room here to talk about the ideas behind a unified shader architecture, if people wish.
 
Just noticed the following on the http://bbs.gzeasy.com/uploads/post-28-1080642147.png slide:
Also mention that our VPUs are 5D. The vertex units can parallelise a 4D op with a 1D op - giving 5 parallel results! [NV can't do this]

Anyone care to explain this a little more? Could this be the basis of an "Extreme" pipeline?

Also on that slide they state:
Branches - VPUs don't like branching...
Surely they meant ATi VPUs don't like branching - nVidia VPUs don't have a problem with it.
 
1 - As I mentioned in the other thread, ATI's VS can handle both a full vector op and scalar simultaneously.

2 - Presently all graphics hardware has issues with branching relative to the speed at which a CPU can branch. The message is not to expect branching in graphics to be fast just because you see it's fast on CPUs: CPUs have large sections of logic dedicated to handling branches efficiently, whereas graphics chips have very little or none. Basically, just because a chip may be able to do it, don't necessarily expect it to be good at it; be careful where and when you use it.
 
DaveBaumann said:
2 - Presently all graphics hardware has issues with branching relative to the speed at which a CPU can branch. The message is not to expect branching in graphics to be fast just because you see it's fast on CPUs: CPUs have large sections of logic dedicated to handling branches efficiently, whereas graphics chips have very little or none. Basically, just because a chip may be able to do it, don't necessarily expect it to be good at it; be careful where and when you use it.

depends on the processor, too.. p4s aren't that fast at it. or.. let me rephrase.. extremely variable in performance, depending on whether they guessed right.
 
amd athlons.. the old ones as well as the new ones. they run at about the same (very good) performance, independent of what code you feed in. be it SIMD, be it branching, be it some other stuff. they aren't heavily pipelined, so they don't have big issues.

and that will be the only way gpus can do fast branching: by having tons of units working in parallel that don't have deep pipelines.. but how that can be done, or if at all, that's another question..
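
To put rough numbers on the "depends on whether they guessed right" point, here is a minimal sketch of the usual back-of-envelope model: a correctly predicted branch costs about one cycle, and a mispredicted one throws away roughly a pipeline's worth of work. The depths of 20 and 10 stages and the miss rates are illustrative assumptions for a "deep" and a "shallow" design, not measurements of any particular chip.

```python
def avg_cycles_per_branch(pipeline_depth: int, mispredict_rate: float) -> float:
    """Average branch cost, assuming a mispredict flushes the whole pipeline.

    Simplification: the flush penalty is taken to equal the pipeline depth.
    """
    return 1.0 + mispredict_rate * pipeline_depth


# Illustrative (assumed) depths: a deep pipeline vs. a shallow one.
for name, depth in (("deep", 20), ("shallow", 10)):
    best = avg_cycles_per_branch(depth, 0.0)   # predictor always right
    worst = avg_cycles_per_branch(depth, 0.5)  # predictor no better than a coin flip
    print(f"{name:8s} depth={depth:2d}  best={best:.1f}  worst={worst:.1f} cycles/branch")
```

Under these assumptions the deep design swings between 1 and 11 cycles per branch while the shallow one swings between 1 and 6, which is the "extremely variable" behaviour described above.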
 
radar1200gs said:
Surely they meant ATi VPUs don't like branching - nVidia VPUs don't have a problem with it.

So nVidia is lying when they claim that dynamic branching is "expensive"? Just to help ATi and make their own product look bad?
 
davepermen said:
amd athlons.. the old ones as well as the new ones. they run at about the same (very good) performance, independent of what code you feed in. be it SIMD, be it branching, be it some other stuff. they aren't heavily pipelined, so they don't have big issues.

and that will be the only way gpus can do fast branching: by having tons of units working in parallel that don't have deep pipelines.. but how that can be done, or if at all, that's another question..
Thanks
 
radar - the reason that dynamic branching is almost necessarily slow is simply that you could end up calculating four times as many fragments as you would without it. For a thorough explanation, search this site for "dynamic branching."
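
A minimal sketch of where a "four times" figure can come from, assuming fragments are processed in lockstep groups of four (a 2x2 quad) so that every lane pays for any path that at least one lane takes. The group size, the lockstep behaviour and the cycle costs are assumptions for illustration; the function names are made up.

```python
from typing import Sequence


def lane_cycles_independent(takes_branch: Sequence[bool], cost_if: int, cost_else: int) -> int:
    """Work done if every fragment could follow only its own path."""
    return sum(cost_if if taken else cost_else for taken in takes_branch)


def lane_cycles_lockstep(takes_branch: Sequence[bool], cost_if: int, cost_else: int) -> int:
    """Work done if the whole group executes every path that any lane needs."""
    per_lane = 0
    if any(takes_branch):       # someone needs the 'if' side, so all lanes run it
        per_lane += cost_if
    if not all(takes_branch):   # someone needs the 'else' side, so all lanes run it
        per_lane += cost_else
    return per_lane * len(takes_branch)


# One fragment in the quad needs the expensive path, the other three could skip it.
quad = [True, False, False, False]
print(lane_cycles_independent(quad, cost_if=100, cost_else=0))  # 100
print(lane_cycles_lockstep(quad, cost_if=100, cost_else=0))     # 400
```

When all four fragments agree on the branch, the two functions return the same number, so under this model the penalty only shows up where the branch condition varies within a group.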
 
We're talking about the VS here. The fragment shaders could do either one full vector op, or one op on fewer than 4 components plus a scalar; the vertex shader, however, can cope with one full vector op and a scalar simultaneously (in R300).
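
As a rough illustration of the "4D op plus 1D op" idea, here is a toy model of one issue slot that produces four vector lane results and one scalar result together. It is only a sketch: `co_issue` and `issue_slots` are hypothetical names, and the slot-counting assumes the vector and scalar ops are independent of each other, which real shader code will not always allow.

```python
from typing import Callable, Sequence, Tuple


def co_issue(vec_op: Callable[..., float], vec_args: Sequence[Sequence[float]],
             scalar_op: Callable[..., float], scalar_args: Sequence[float]
             ) -> Tuple[Tuple[float, ...], float]:
    """Model one issue slot: a 4-wide vector op and a scalar op side by side."""
    vec_result = tuple(vec_op(*lanes) for lanes in zip(*vec_args))  # 4 results
    scalar_result = scalar_op(*scalar_args)                         # +1 result
    return vec_result, scalar_result


def issue_slots(n_vec_ops: int, n_scalar_ops: int, can_co_issue: bool) -> int:
    """Slots needed for independent ops, with or without pairing."""
    if can_co_issue:
        return max(n_vec_ops, n_scalar_ops)
    return n_vec_ops + n_scalar_ops


a = (1.0, 2.0, 3.0, 4.0)
b = (0.5, 0.5, 0.5, 0.5)
vec, scal = co_issue(lambda x, y: x * y, (a, b), lambda w: 1.0 / w, (4.0,))
print(vec, scal)                    # five results from one slot
print(issue_slots(10, 10, True))    # 10
print(issue_slots(10, 10, False))   # 20
```

The best case in this model is a 2x saving in issue slots when the instruction mix is half vector and half scalar; with few scalar ops to pair up, the benefit shrinks accordingly.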
 
While NV3x branching may not be the absolute greatest, it beats R3xx branching (there is none; yes, there is the F-buffer thingo, but that isn't part of any spec, whereas NV3x's branching does follow the DX9 spec).

Also, you don't know yet whether nVidia has improved NV40 branching over NV3x branching.
 
a.) R300 has constant branching IIRC.

b.) The point still stands - developers shouldn't expect branching to be anything near a CPU's branching capabilities, and both ATI and NVIDIA are urging developers to be circumspect with its use.
 
Evildeus said:
Is there a processor/GPU that does it fast almost always?
The massively multithreaded Cray MTA processor. When it detects a branch (or any instruction that might have a >1 cycle latency, such as a memory load) it swaps execution to another thread; by juggling around 100+ threads, it can easily sustain >98% of maximum theoretical IPC even on code heavily loaded with branches or memory loads, despite the fact that it has no branch predictor or data cache.
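
The latency-hiding idea can be sketched with a toy simulation: each cycle the core issues one instruction from any thread that is ready, and a thread that has just issued is parked until its instruction completes. This is not a model of the actual MTA; the 100-cycle latency, the round-robin pick and the worst-case assumption that every instruction is long-latency are all illustrative choices.

```python
def utilization(num_threads: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which some thread was ready to issue."""
    ready_at = [0] * num_threads   # cycle at which each thread may issue again
    issued = 0
    start = 0                      # round-robin starting point
    for cycle in range(cycles):
        for i in range(num_threads):
            t = (start + i) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # park the thread until its op completes
                issued += 1
                start = (t + 1) % num_threads
                break                          # only one issue per cycle
    return issued / cycles


print(utilization(1, 100))    # a single thread mostly waits
print(utilization(50, 100))   # half the latency covered
print(utilization(128, 100))  # more threads than latency cycles
```

In this model the issue slot stays full once the thread count reaches the latency in cycles, with no branch prediction or caching involved, which is the trade the post describes: throughput is bought with lots of thread state rather than with speculation.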
 
DaveBaumann said:
a.) R300 has constant branching IIRC.

b.) The point still stands - developers shouldn't expect branching to be anything near a CPU's branching capabilities, and both ATI and NVIDIA are urging developers to be circumspect with its use.

It's been a long time since I last looked at branching, but I seem to remember a digit-life article saying that R3xx did its limited branching through the drivers unrolling the code; in other words the CPU, not the GPU, is really doing the branching.

nVidia's is all in hardware on the GPU.
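
For anyone wondering what "the driver unrolling the code" could look like, here is a minimal sketch of resolving constant (static) branches on the CPU before a shader is handed to the hardware. It uses a made-up mini instruction list loosely styled on shader assembly; `resolve_static_branches` is a hypothetical helper and none of this is taken from a real driver.

```python
from typing import Dict, List


def resolve_static_branches(program: List[str], bool_consts: Dict[str, bool]) -> List[str]:
    """Return a branch-free program for one fixed set of boolean constants."""
    out: List[str] = []
    active: List[bool] = []   # one entry per enclosing if/else region
    for instr in program:
        parts = instr.split()
        if parts[0] == "if":
            active.append(bool_consts[parts[1]])
        elif parts[0] == "else":
            active[-1] = not active[-1]
        elif parts[0] == "endif":
            active.pop()
        elif all(active):     # keep only instructions on the live path
            out.append(instr)
    return out


shader = [
    "mul r0, v0, c0",
    "if b0",
    "add r0, r0, c1",
    "else",
    "sub r0, r0, c1",
    "endif",
    "mov oPos, r0",
]
for line in resolve_static_branches(shader, {"b0": True}):
    print(line)
```

The output contains no branch instructions at all, so the chip only ever sees straight-line code; the cost is that the CPU has to redo this whenever the constants change, and it cannot help with conditions that depend on per-vertex or per-pixel data.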
 