David Kirk of NVIDIA talks about Unified-Shader (Goto's article @ PC Watch)

one · Apr 19, 2006

After an article about DirectX 10/Common Shader Model and the Unified-Shader (US) GPUs from ATI and S3, Hiroshige Goto covers the position of NVIDIA, which seems reluctant to adopt a US architecture.

http://pc.watch.impress.co.jp/docs/2006/0419/kaigai262.htm

The article reports how David Kirk of NVIDIA assessed the US architecture and the Xbox 360 GPU, probably at GDC 2006, plus a comment from an ATI guy, so I've picked out all of those and left out Goto's commentary.

D. Kirk: Our DirectX 10 GPU may be Unified-Shader, or not. Everyone thinks I said "we won't go there (Unified-Shader)." But all I said was that you can't know until (our GPU) debuts.

D. Kirk: When is the right time for Unified-Shader hardware? That's the problem. I agree that in the future GPUs will be simpler, with fewer kinds of processors. Different hardware pieces such as the Vertex Shader, Pixel Shader, ROP, frontend processor and Tessellator will one day change into a single piece that can do everything. But it takes time and can't be done all at once. The change will happen progressively.

D. Kirk: The cost (of US) is huge. For example, (an updated architecture of) G71 can support "Unified" programming model, but (even in that case) execution is not Unified. The performance/mm^2 (die size) of G71 is very high. On the other hand, the performance/mm^2 of Xbox 360 GPU (with Unified-Shader) (Xenos) is lower. Which do you prefer?

Rick Bergman (Senior Vice President, PC Business Unit, ATI Technologies): Supporting DirectX 10 requires 30-40% more logic (circuits).

(ATI GPUs that are preparing components for US are getting bigger)
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_2l.gif

(A slide from the NVIDIA presentation at Graphics Hardware 2005, Aug. 2005)
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_3.jpg

D. Kirk: It's true that Unified-Shader is flexible, but it's more flexible than is actually needed. It's like a 200-inch belt. A 200-inch belt fits you however overweight you are, but if you're not overweight it's useless.

One of the arguments for Unified-Shader is that it enables better load balancing. You can assign shaders to pixel processing when required, and to vertex processing too. But in the end, in most cases it's pixel processing that is required. For example, you may render 100 million pixels but not 100 million polygons, even if the setup unit could of course draw 100 million polygons.
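As a rough illustration of the load-balancing argument above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the unit counts (8 vertex, 24 pixel, 32 unified), the abstract work amounts, and the `efficiency` factor standing in for the extra cost Kirk attributes to unification. It is a toy model, not a description of any real GPU.

```python
def frame_time_split(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: the frame waits for whichever stage finishes last."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def frame_time_unified(vertex_work, pixel_work, total_units, efficiency=1.0):
    """Unified pool: any unit takes any work; efficiency < 1 models overhead."""
    return (vertex_work + pixel_work) / (total_units * efficiency)


# Hypothetical workloads as (vertex_work, pixel_work) in arbitrary units.
workloads = {
    "matches the 8:24 split": (80, 240),
    "pixel heavy": (10, 310),
    "vertex heavy": (160, 160),
}

for name, (v, p) in workloads.items():
    split = frame_time_split(v, p, vertex_units=8, pixel_units=24)
    ideal = frame_time_unified(v, p, total_units=32)
    costly = frame_time_unified(v, p, total_units=32, efficiency=0.8)
    print(f"{name}: split={split:.2f} unified={ideal:.2f} unified@0.8={costly:.2f}")
```

In this toy model the ideal unified pool only pulls ahead when the workload drifts away from the ratio the dedicated units were sized for, and with the assumed 0.8 efficiency it is slower than the split design whenever the workload matches that ratio. How often real workloads drift, and how large the real overhead is, are exactly the points in dispute.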

D. Kirk: In the logical diagram of D3D 10, Vertex Shader, Geometry Shader and Pixel Shader are placed side by side. What happens if they are placed in the same box? Each Shader is a different part. If they get unified they become wasteful.

Besides, it requires more I/O (wires) because all connections with memory concentrate on the box. Registers and constants are put in a single box too. It's because you have to keep all vertex states, pixel states and geometry states together while doing load balancing. A bigger register array requires more ports.

D. Kirk: Let's take a look at the trend in computing. A simple CPU of 20 years ago had only one functional unit. In other words, it was Unified-Shader. (laugh) But now even Intel doesn't design such a CPU.

Complicated operations always give us the possibility to make many operations parallel. So we've evolved the GPU by keeping different pieces busy at the same time in a pipelined approach. If you distribute (a pipeline) to 20 operations each piece can do 20 operations by processing them in parallel. But if all are Unified you have to do 20 operations on 20 processors (Shaders).

I'm not saying Unified-Shader is not a good idea. But to enable (a single Shader) to do everything is a lot more difficult than expected. So I think it will go progressively.

D. Kirk: Even though they say it's a unified pipeline I think it's a hybrid and not completely unified. It's possible that it's an incomplete Unified-Shader with some parts unified but other parts shared.

It's not that I have proof of that. But it would be the right decision for them. I don't think they would be wasteful with Unified-Shader, since they are clever.

D. Kirk: We want to remove special-purpose units from GPU. On the other hand, we also want to run (special graphics functions) really fast. If you remove all special-purpose implementations from GPU it's just a Pentium.

Geo · Apr 19, 2006

I always love it when really bright guys speculate on the other fellow's architectural decisions based on what they know about their own. :smile:

Now, it isn't only because I get a certain enjoyment out of really bright people looking stupid once in a while.

At least, not only.

superguy · Apr 19, 2006

David Kirk: It's a stupid idea. However, we will do it eventually.

:smile:

ondaedg · Apr 19, 2006

I think his initial statement on unified shaders came out a bit harsher than he would have liked. There is undoubtedly some benefit to it and it is clear to him that there is. I think what he is trying to say is hey, it's a good idea, but it does have its drawbacks and we are going to move a lot slower than ATI towards it.

I especially like his comments about performance per square millimeter. I am almost beginning to think that flops are going away and fps/die size is our new measuring stick.

superguy · Apr 19, 2006

I especially like his comments about performance per square millimeter

Yes Nvidia has been really hammering that lately..maybe because they have a current edge there?

oeLangOetan · Apr 19, 2006

D. Kirk: The cost (of US) is huge. For example, (an updated architecture of) G71 can support "Unified" programming model, but (even in that case) execution is not Unified. The performance/mm^2 (die size) of G71 is very high. On the other hand, the performance/mm^2 of Xbox 360 GPU (with Unified-Shader) (Xenos) is lower. Which do you prefer?

that should be peak performance/mm²

Moloch · Apr 19, 2006

*awaits an ATI employee offering a rebuttal*

_xxx_ · Apr 19, 2006

Gateway2 said:
David Kirk: It's a stupid idea. However, we will do it eventually.

:smile:

He never said "stupid", rather "expensive"

(also supported by the statement from R. Bergman that it requires ~40% more core logic)

dizietsma · Apr 19, 2006

I can understand unified shaders and I can understand separate shaders, but I do not have enough graphics know-how to understand how you have a combination of the two. Does this mean Kirk is suggesting they will have a small set of vertex shaders, a large set of pixel shaders and a small set of shaders that can do both when needed, i.e. firefighting duties?

Or is he suggesting something else?

satein · Apr 19, 2006

one said:
After an article about DirectX 10/Common Shader Model and the Unified-Shader (US) GPUs from ATI and S3, Hiroshige Goto covers the position of NVIDIA, which seems reluctant to adopt a US architecture.

I was directed here from the G80 rumours thread, where I posted this same article yesterday.

Thank you for the kind invitation to join here, but I think this content relates more to the PC side and the G80, as both the X360 GPU and RSX (the console parts) are not DX10 parts (or I might be wrong again). Anyway, it's nice to read a better translation than the one BabelFish gave me yesterday.

All in all, I still think this article is more suitable for the PC section, as it is about Kirk talking about DX10 and US implementation on NVIDIA's parts. The X360 is just one of the examples Kirk commented on, since it is currently the only US chip even though it is not DX10, and the RSX was never said to be US either.

NocturnDragon · Apr 19, 2006

Kirk said:
Complicated operations always give us the possibility to make many operations parallel. So we've evolved the GPU by keeping different pieces busy at the same time in a pipelined approach. If you distribute (a pipeline) to 20 operations each piece can do 20 operations by processing them in parallel. But if all are Unified you have to do 20 operations on 20 processors (Shaders).

http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_3.jpg

I don't really get this way of thinking. Either his idea of US is not pipelined anymore, or he's just pretending not to understand the advantages of US!

This interview feels a bit like the one about AA and floating point blending.

zed · Apr 19, 2006

I believe it's still years too early for a true unified shader model; until cards reach a certain performance level + demand, specialisation is gonna win out performance-wise (this is why I believe the ATI cards hyped as unified shader actually ain't really 100% unified, ATI ain't dumb)

Bouncing Zabaglione Bros. · Apr 19, 2006

Gateway2 said:
Yes Nvidia has been really hammering that lately..maybe because they have a current edge there?

That's fine for the manufacturer, but the end user doesn't care if his die is a little bigger and a little less efficient per square mm if it means he gets a big performance hike over the competing product.

Richthofen · Apr 19, 2006

Gateway2 said:
Yes Nvidia has been really hammering that lately..maybe because they have a current edge there?

a rather huge edge if you ask me.

Richthofen · Apr 19, 2006

Bouncing Zabaglione Bros. said:
That's fine for the manufacturer, but the end user doesn't care if his die is a little bigger and a little less efficient per square mm if it means he gets a big performance hike over the competing product.

Of course the end user cares, because this is what makes great products like the GF7600GT possible for the end user. Where is ATI's competing product to this one? There is none. Don't tell me that stupid X1800GTO. It's only leftover R520 inventory and won't sell in anywhere near the quantities of the GF7600GT. It's not even in the same price spot.

Performance per sq mm is very important when it comes to a balanced, good and complete product lineup. That is something ATI has been lagging in since the R300 days.

Gubbi · Apr 19, 2006

Gateway2 said:
David Kirk: It's a stupid idea. However, we will do it eventually.

And then he'll say it's the best thing ever.

He's right that a non-unified shader (NUS) architecture is faster than a US, but that is for existing workloads. The ratio of VS to PS has been fairly constant over the years, and workloads reflect this.

It's guaranteed that developers will suck up any performance that is there. A unified shader model will present new possibilities and hence will produce different workloads that will run less well on a traditional architecture.

Cheers

Dave Baumann · Apr 19, 2006

dizietsma said:
I can understand unified shaders and I can understand separate shaders, but I do not have enough graphics know-how to understand how you have a combination of the two. Does this mean Kirk is suggesting they will have a small set of vertex shaders, a large set of pixel shaders and a small set of shaders that can do both when needed, i.e. firefighting duties?

Or is he suggesting something else?

I think the easiest route for "partial" unification is to unify the vertex shaders and geometry shaders, while still keeping the pixel shaders separate - all the geometry operation types will be unified, but still a separation between geometry and pixel ops.
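To make the "partial" unification idea concrete, here is a small Python sketch comparing fully separate vertex, geometry and pixel units against a design where vertex and geometry work share one pool while pixel units stay separate. The unit counts and work amounts are hypothetical, chosen only for illustration, and the function names are made up for this sketch.

```python
def frame_time_separate(vs_work, gs_work, ps_work, vs_units, gs_units, ps_units):
    """Three dedicated unit types: the slowest stage sets the frame time."""
    return max(vs_work / vs_units, gs_work / gs_units, ps_work / ps_units)


def frame_time_partial(vs_work, gs_work, ps_work, geometry_units, ps_units):
    """Vertex and geometry work share one pool; pixel units stay separate."""
    geometry_time = (vs_work + gs_work) / geometry_units
    return max(geometry_time, ps_work / ps_units)


# Hypothetical frames as (vs_work, gs_work, ps_work) in arbitrary units.
frames = {
    "vertex heavy, light geometry shading": (60, 20, 120),
    "no geometry shader use": (60, 0, 120),
    "pixel bound": (20, 20, 480),
}

for name, (vs, gs, ps) in frames.items():
    separate = frame_time_separate(vs, gs, ps, vs_units=4, gs_units=4, ps_units=24)
    partial = frame_time_partial(vs, gs, ps, geometry_units=8, ps_units=24)
    print(f"{name}: separate={separate:.2f} partial={partial:.2f}")
```

Under these assumptions the shared geometry pool helps whenever vertex and geometry work are unevenly split, and makes no difference once the frame is pixel bound, since the pixel units are untouched.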

_xxx_ said:
(also supported by the statement from R. Bergman that it requires ~40% more core logic)

Errr, the quoted statement above pertains to the requirements of DX10, not unification.

Mariner · Apr 19, 2006

The problem with this kind of piece is that you just know that Kirk would be saying exactly the opposite if G80 had a full US.

Just another PR-led piece spreading some FUD around which sounds good to investors.

I await the ATI rebuttal which will contain plenty of PR-led FUD about non-unified architectures!

_xxx_ · Apr 19, 2006

Dave Baumann said:
Errr, the quoted statement above pertains to the requirements of DX10, not unification.

Well, the discussion so far points to unified being the "nicer" way to go with DX10, of course without being a hard requirement. I know the one doesn't imply the other, but the increase in core logic is coming one way or the other, since even if nV don't go unified they'll have to invest lots of die space to support all DX10 features with some muscle, hopefully (meaning, not just having all the checkboxes filled but actually being able to deliver speed-wise as well).

EDIT: my point being, is that die space better invested in additional logic for the US or a few extra "classic" pipes? I guess we'll have to wait till we have actual parts for comparison. My guess out of the blue: non-US will have higher peak performance, US will have less variations in fps (or "more steady" framerates, but with lower overall peaks).
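The guess above (higher peaks for non-US, steadier framerates for US) can be illustrated with a toy Python model. All numbers are assumptions for illustration only: 8 vertex plus 24 pixel units versus 32 unified units, a fixed amount of work per frame, frame time treated as milliseconds, and a hypothetical 0.8 efficiency factor for the unified design.

```python
def fps_split(vertex_share, vertex_units=8, pixel_units=24, work=320.0):
    """FPS with dedicated units; vertex_share is the fraction of vertex work."""
    vertex_work = work * vertex_share
    pixel_work = work * (1.0 - vertex_share)
    frame_time = max(vertex_work / vertex_units, pixel_work / pixel_units)
    return 1000.0 / frame_time


def fps_unified(vertex_share, units=32, efficiency=0.8, work=320.0):
    """FPS with a unified pool; the vertex/pixel mix does not matter here."""
    frame_time = work / (units * efficiency)
    return 1000.0 / frame_time


# Hypothetical range of vertex/pixel mixes across different scenes.
shares = [0.05, 0.15, 0.25, 0.35, 0.5]
split = [fps_split(s) for s in shares]
unified = [fps_unified(s) for s in shares]

print(f"split:   min={min(split):.1f} max={max(split):.1f}")
print(f"unified: min={min(unified):.1f} max={max(unified):.1f}")
```

With these made-up numbers the split design peaks higher when the scene matches its fixed ratio but swings widely as the mix changes, while the unified design stays flat at a lower peak. Whether real parts behave this way depends on the real overhead, which this sketch does not know.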

_xxx_ · Apr 19, 2006

NocturnDragon said:
I don't really get this way of thinking. Either his idea of US is not pipelined anymore, or he's just pretending not to understand the advantages of US!

In general, the idea stands on its own. But that's all a matter of POV. US will outperform "specialized" pipelines under certain conditions, or vice versa. If the load can be managed/distributed properly, US will be the more efficient architecture under increased load. But he questions whether we're there yet, not whether it's better as such, at least that's how I read it.
