David Kirk of NVIDIA talks about Unified-Shader (Goto's article @ PC Watch)

one

Following his article about DirectX 10 / Common Shader Model and the Unified-Shader GPUs from ATI and S3, Hiroshige Goto covers the position of NVIDIA, which seems reluctant to adopt a US architecture.

http://pc.watch.impress.co.jp/docs/2006/0419/kaigai262.htm

The article covers how David Kirk of NVIDIA evaluated the US architecture and the Xbox 360 GPU, probably at GDC 2006, plus a comment from an ATI executive, so I've picked out all of those quotes and left out Goto's own commentary.

D. Kirk: Our DirectX 10 GPU may be Unified-Shader, or it may not. Everyone thinks I said "we won't go there (Unified-Shader)." But all I said is that you can't know until (our GPU) debuts.
D. Kirk: When is the right time for Unified-Shader hardware? That's the problem. I agree that in the future GPUs will become simpler, with fewer kinds of processors. Different hardware pieces such as the Vertex Shader, Pixel Shader, ROP, frontend processor and Tessellator will one day merge into a single piece that can do everything. But that takes time and can't be done all at once. The change will happen progressively.
D. Kirk: The cost (of US) is huge. For example, (an updated architecture of) G71 can support the "Unified" programming model, but (even in that case) execution is not Unified. The performance/mm² (die size) of G71 is very high. On the other hand, the performance/mm² of the Xbox 360 GPU (with Unified-Shader) (Xenos) is lower. Which do you prefer?
Rick Bergman (Senior Vice President, PC Business Unit, ATI Technologies): Supporting DirectX 10 requires 30-40% more logic (circuits).
(ATI GPUs, which are adding components in preparation for US, are getting bigger)
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_2l.gif

(A slide from the NVIDIA presentation at Graphics Hardware 2005, Aug. 2005)
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_3.jpg
D. Kirk: It's true that Unified-Shader is flexible, but it's more flexible than is actually needed. It's like a 200-inch belt. A 200-inch belt fits you no matter how overweight you are, but if you're not overweight it's just wasted.

One of the arguments in favor of Unified-Shader is that it enables better load balancing. You can assign Shaders to pixel processing when required, and to vertex processing too. But in the end, in most cases it's pixel processing that is required. For example, you may render 100 million pixels but not 100 million polygons, even if the setup unit is capable of 100 million polygons.
D. Kirk: In the logical diagram of D3D 10, the Vertex Shader, Geometry Shader and Pixel Shader are placed side by side. What happens if they are placed in the same box? Each Shader is a different part. If they are unified, they become wasteful.

Besides, it requires more I/O (wires), because all connections to memory are concentrated in that one box. Registers and constants are put in a single box too, because you have to keep all vertex state, pixel state and geometry state together while doing load balancing. A bigger register array requires more ports.
D. Kirk: Let's take a look at the computation trend. A simple CPU of 20 years ago had only one functional unit. In other words, it was a Unified-Shader. (laugh) But today even Intel doesn't design CPUs that way.

Complicated operations always give us the opportunity to run many operations in parallel. So we've evolved the GPU by keeping different pieces busy at the same time in a pipelined approach. If you spread (a pipeline) across 20 operations, each piece handles its own operation and all 20 run in parallel. But if everything is Unified, you have to run those 20 operations on 20 processors (Shaders).

I'm not saying Unified-Shader is not a good idea. But enabling (a single Shader) to do everything is a lot more difficult than expected. So I think it will happen progressively.
D. Kirk: Even though they say it's a unified pipeline, I think it's a hybrid and not completely unified. It may be an incomplete Unified-Shader, with some parts unified and other parts shared.

I don't have proof of that, but it should be the right decision for them. They are clever, so I don't think they waste anything in their Unified-Shader.
D. Kirk: We want to remove special-purpose units from the GPU. On the other hand, we also want to run (special graphics functions) really fast. If you remove all special-purpose implementations from a GPU, it's just a Pentium.
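
To put some rough numbers on the load-balancing argument above, here's a toy back-of-the-envelope model (my own sketch, not from the article; the unit counts are made up). It compares a fixed split of dedicated vertex and pixel units against a single unified pool of the same total size, with frame time set by whichever pool is the bottleneck:

Code:

def frame_time_fixed(vertex_work, pixel_work, vs_units=8, ps_units=24):
    # Fixed split: the more heavily loaded dedicated pool sets the frame time.
    return max(vertex_work / vs_units, pixel_work / ps_units)

def frame_time_unified(vertex_work, pixel_work, units=32):
    # Unified pool: any unit can take either kind of work (ideal load balancing).
    return (vertex_work + pixel_work) / units

# Work amounts are in arbitrary "units of shader work per frame".
for name, v, p in [("pixel-heavy", 10, 90), ("balanced", 50, 50), ("vertex-heavy", 90, 10)]:
    print(f"{name:12s}  fixed: {frame_time_fixed(v, p):5.2f}   unified: {frame_time_unified(v, p):5.2f}")

The unified pool never loses in this idealised model, but its advantage shrinks as the workload mix approaches the hardware's built-in VS:PS ratio; Kirk's argument is essentially that real workloads sit near that ratio, while the per-unit cost of a do-everything processor is higher than this toy model assumes.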
 
I always love it when really bright guys speculate on the other fellow's architectural decisions based on what they know about their own. :smile:

Now, it isn't only because I get a certain enjoyment out of really bright people looking stupid once in a while.

At least, not only. :LOL:
 
I think his initial statement on unified shaders came out a bit harsher than he would have liked. There is undoubtedly some benefit to it, and it is clear to him that there is. I think what he is trying to say is: hey, it's a good idea, but it does have its drawbacks, and we are going to move a lot slower than ATI towards it.

I especially like his comments about performance per square millimeter. I am almost beginning to think that flops are going away and fps/die size is our new measuring stick. ;)
 
D. Kirk: The cost (of US) is huge. For example, (an updated architecture of) G71 can support the "Unified" programming model, but (even in that case) execution is not Unified. The performance/mm² (die size) of G71 is very high. On the other hand, the performance/mm² of the Xbox 360 GPU (with Unified-Shader) (Xenos) is lower. Which do you prefer?
that should be peak performance/mm²
 
Gateway2 said:
David Kirk: It's a stupid idea. However, we will do it eventually.

:smile:

He never said "stupid", rather "expensive" :)

(also supported by the statement from R. Bergman that it requires ~40% more core logic)
 
I can understand unified shaders and I can understand separate shaders, but I don't have enough graphics know-how to understand how you have a combination of the two. Does this mean Kirk is suggesting they will have a small set of vertex shaders, a large set of pixel shaders and a small set of shaders that can do both when needed, i.e. firefighting duties?

Or is he suggesting something else ?
 
one said:
Following his article about DirectX 10 / Common Shader Model and the Unified-Shader GPUs from ATI and S3, Hiroshige Goto covers the position of NVIDIA, which seems reluctant to adopt a US architecture.

I was directed here from the G80 rumours thread, as I posted this same article there yesterday :cool:

Thank you for the kind invitation to join here, but I think this content relates more to the PC side and the G80 stuff, since both the X360 and RSX (console parts) are not DX10 parts (or I might be wrong again). Anyway, it's nice to read a better translation than the one BabelFish gave me yesterday ;). All in all, I still think this article fits better in the PC forum, since it's about Kirk discussing DX10 and a US implementation on NVIDIA's part; the X360 is just one example Kirk commented on, as it's currently the only US chip even though it isn't DX10, and the RSX was never said to be US either.
 
Kirk said:
Complicated operations always give us the opportunity to run many operations in parallel. So we've evolved the GPU by keeping different pieces busy at the same time in a pipelined approach. If you spread (a pipeline) across 20 operations, each piece handles its own operation and all 20 run in parallel. But if everything is Unified, you have to run those 20 operations on 20 processors (Shaders).

I don't really get this way of thinking. Either his idea of US is not pipelined anymore :rolleyes:, or he just pretends not to understand the advantages of US!

This interview feels a bit like the one about AA and floating point blending.
 
I believe it's still years too early for a true unified shader model. Until cards reach a certain performance level and demand, specialisation is going to win out performance-wise (this is why I believe the ATI cards hyped as unified shader aren't really 100% unified; ATI isn't dumb).
 
Gateway2 said:
Yes, Nvidia has been really hammering that lately... maybe because they currently have an edge there?

That's fine for the manufacturer, but the end user doesn't care if his die is a little bigger and a little less efficient per square mm if it means he gets a big performance hike over the competing product.
 
Bouncing Zabaglione Bros. said:
That's fine for the manufacturer, but the end user doesn't care if his die is a little bigger and a little less efficient per square mm if it means he gets a big performance hike over the competing product.

Of course the end user cares, because this is what makes great products like the GF7600GT possible for the end user. Where is ATI's competing product to this one? There is none. Don't tell me that stupid X1800GTO: it's only leftover R520 inventory and won't sell in anywhere near the quantities of the GF7600GT. It's not even in the same price spot.

Performance per sq mm is very important when it comes to a balanced, good and complete product lineup. That is something ATI has been lagging on since the R300 days.
 
Gateway2 said:
David Kirk: It's a stupid idea. However, we will do it eventually.

And then he'll say it's the best thing ever.

He's right that a non-unified shader (NUS) architecture is faster than a US, but that only holds for existing workloads. The ratio of VS to PS work has been fairly constant over the years, and workloads reflect this.

It's guaranteed that developers will suck up any performance that is there. A unified shader model will present new possibilities and hence produce different workloads, which will run less well on a traditional architecture.

Cheers
 
dizietsma said:
I can understand unified shaders and I can understand separate shaders, but I don't have enough graphics know-how to understand how you have a combination of the two. Does this mean Kirk is suggesting they will have a small set of vertex shaders, a large set of pixel shaders and a small set of shaders that can do both when needed, i.e. firefighting duties?

Or is he suggesting something else ?
I think the easiest route for "partial" unification is to unify the vertex shaders and geometry shaders while still keeping the pixel shaders separate: all the geometry operation types would be unified, but there would still be a separation between geometry and pixel ops.
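
Just to make that concrete, here's a purely hypothetical sketch of such a partial unification (my own illustration; the names and unit counts are made up and aren't based on any real hardware or driver interface): one pool accepts both vertex and geometry work, while pixel work goes to its own dedicated pool.

Code:

from collections import deque

class ShaderPool:
    """A pool of identical shader units that accepts only certain work types."""
    def __init__(self, name, units, accepts):
        self.name = name
        self.units = units
        self.accepts = set(accepts)
        self.queue = deque()

    def submit(self, work_type, batch):
        if work_type not in self.accepts:
            raise ValueError(f"{self.name} pool cannot run {work_type} work")
        self.queue.append((work_type, batch))

# Geometry-side pool is unified across vertex and geometry shading;
# pixel shading keeps its own dedicated pool (unit counts are invented).
geometry_pool = ShaderPool("geometry", units=16, accepts=["vertex", "geometry"])
pixel_pool = ShaderPool("pixel", units=32, accepts=["pixel"])

def dispatch(work_type, batch):
    # Route work to whichever pool is allowed to run it.
    pool = geometry_pool if work_type in geometry_pool.accepts else pixel_pool
    pool.submit(work_type, batch)

dispatch("vertex", "patch 0")
dispatch("geometry", "patch 0")
dispatch("pixel", "tile 0")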

_xxx_ said:
(also supported by the statement from R. Bergman that it requires ~40% more core logic)
Errr, the quoted statement above pertains to the requirements of DX10, not unification.
 
The problem with these kinds of pieces is that you just know Kirk would be saying exactly the opposite if G80 had a full US.

Just another PR-led piece spreading some FUD that sounds good to investors.

I await the ATI rebuttal which will contain plenty of PR-led FUD about non-unified architectures! ;)
 
Dave Baumann said:
Errr, the quoted statement above pertains to the requirements of DX10, not unification.

Well, the discussion so far points to unified being the "nicer" way to go with DX10, without of course being a hard requirement. I know one doesn't imply the other, but the increase in core logic is coming either way, since even if nV don't go unified they'll have to invest lots of die space to support all the DX10 features with some muscle, hopefully (meaning, not just having all the checkboxes ticked, but actually being able to deliver speed-wise as well).

EDIT: my point being, is that die space better invested in additional logic for US, or in a few extra "classic" pipes? I guess we'll have to wait until we have actual parts for comparison. My guess out of the blue: non-US will have higher peak performance, US will have less variation in fps (or "steadier" framerates, but with lower overall peaks).
 
NocturnDragon said:
I don't really get this way of thinking. Either his idea of US is not pipelined anymore :rolleyes:, or he just pretends not to understand the advantages of US!

In general, the idea stands on its own, but it's all a matter of POV. US will exceed "specialized" pipelines under certain conditions, or vice versa. If the load can be managed and distributed properly, US will be the more efficient architecture under increased load. But he's questioning whether we're there yet, not whether it's better as such; at least that's how I read it.
 