Following an article about DirectX 10/Common Shader Model and the Unified-Shader GPUs from ATI and S3, Hiroshige Goto now covers the position of NVIDIA, which seems reluctant to adopt a Unified-Shader (US) architecture.
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai262.htm
This article reports how David Kirk of NVIDIA assessed the US architecture and the Xbox 360 GPU, probably at GDC 2006, along with a comment from someone at ATI. I've picked out all of those quotes and left out Goto's own commentary.
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_2l.gif
(A slide from the NVIDIA presentation at Graphics Hardware 2005, Aug. 2005)
http://pc.watch.impress.co.jp/docs/2006/0419/kaigai_3.jpg
D. Kirk: Our DirectX 10 GPU may be Unified-Shader, or it may not. Everyone thinks I said "we won't go there (Unified-Shader)." But what I actually said is that you won't know until (our GPU) debuts.
D. Kirk: The question is when the right time for Unified-Shader hardware is. I agree that in the future GPUs will be simpler, with fewer kinds of processors. One day, different hardware pieces such as the Vertex Shader, Pixel Shader, ROP, front-end processor and Tessellator will turn into a single piece that can do everything. But that takes time and can't be done all at once. The change will happen gradually.
D. Kirk: The cost (of US) is huge. For example, (an updated architecture of) G71 could support a "Unified" programming model, but (even then) execution would not be Unified. The performance per mm^2 (of die area) of G71 is very high. On the other hand, the performance per mm^2 of the Xbox 360 GPU (Xenos, with Unified-Shader) is lower. Which do you prefer?
(ATI GPUs, which are adding components for US, are getting bigger) Rick Bergman (Senior Vice President, PC Business Unit, ATI Technologies): Supporting DirectX 10 requires 30-40% more logic (circuitry).
D. Kirk: It's true that Unified-Shader is flexible, but it's more flexible than is actually needed. It's like a 200-inch belt. A 200-inch belt fits you no matter how overweight you are, but if you're not overweight it's useless.
One argument for Unified-Shader is that it enables better load balancing. You can assign Shaders to pixel processing when required, and to vertex processing too. But in the end, most of the time it's pixel processing that's required. For example, you may render 100 million pixels, but not 100 million polygons, even if the setup unit is capable of drawing 100 million polygons.
D. Kirk: In the logical diagram of D3D 10, the Vertex Shader, Geometry Shader and Pixel Shader are placed side by side. What happens if they are all put into the same box? Each Shader is a different part, and if they get unified they become wasteful.
Besides, it requires more I/O (wiring), because all the connections to memory converge on that one box. Registers and constants have to go into a single box too, because you need to hold all the vertex, pixel and geometry state together while load balancing. A bigger register array requires more ports.
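Why more ports hurts can be sketched with a common first-order rule of thumb (my assumption, not a figure from Kirk or the article): each extra port adds roughly one wordline and one bitline per storage cell, so register file area grows roughly with the square of the port count. The entry counts and port counts below are hypothetical.

```python
def relative_regfile_area(entries: int, ports: int) -> int:
    """Relative area in arbitrary units under the ports-squared rule of thumb."""
    return entries * ports ** 2


# Hypothetical sizes: three dedicated files (vertex, geometry, pixel state),
# 256 entries and 4 ports each.
separate_area = sum(relative_regfile_area(256, 4) for _ in range(3))

# One unified file holding the same 768 entries; assume it needs 12 ports
# so that all three kinds of work can access it at the same time.
unified_area = relative_regfile_area(768, 12)

print(separate_area, unified_area, unified_area / separate_area)  # 12288 110592 9.0
```

Real designs bank and multiplex register files to avoid this blow-up, so treat the 9x figure as an illustration of the trend Kirk describes, not a measurement.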
D. Kirk: Let's look at the trend in computing. A simple CPU of 20 years ago had only one functional unit. In other words, it was Unified-Shader. (laughs) But nowadays even Intel doesn't design a CPU like that.
Complex workloads always give us the opportunity to run many operations in parallel. So we've evolved the GPU by keeping different pieces busy at the same time, in a pipelined approach. If you split (a pipeline) into 20 operations, each piece handles its own operation and all 20 proceed in parallel. But if everything is Unified, you have to do those 20 operations on 20 processors (Shaders).
I'm not saying Unified-Shader is a bad idea. But enabling (a single Shader) to do everything is a lot more difficult than people expect. So I think it will happen gradually.
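One way to read the "20 operations on 20 processors" remark is with this simplified timing model (my own interpretation, not from the article): every work item needs all N operations, each taking one cycle. A pipeline of N specialized stages completes one item per cycle once full, while a general processor that runs all N operations of an item by itself finishes one item every N cycles, so you need about N such processors, each able to do every operation, to keep up.

```python
def pipeline_cycles(items: int, stages: int) -> int:
    """Pipeline of `stages` specialized units, one operation each.
    After `stages` cycles to fill, one item completes every cycle."""
    if items == 0:
        return 0
    return stages + items - 1


def unified_cycles(items: int, ops_per_item: int, processors: int) -> int:
    """Each of `processors` general units runs all `ops_per_item`
    operations of an item by itself, one operation per cycle."""
    rounds = -(-items // processors)  # ceiling division
    return rounds * ops_per_item


if __name__ == "__main__":
    N, ITEMS = 20, 1000
    print("20-stage pipeline:", pipeline_cycles(ITEMS, N))  # 1019
    for p in (1, 5, 20):
        print(f"unified, {p} processors:", unified_cycles(ITEMS, N, p))
    # 20000, 4000, 1000: only ~20 general processors match the pipeline
```

The throughput ends up similar, but the 20 pipeline stages each implement one operation while each of the 20 unified processors must implement all of them, which is where the extra hardware cost comes from in this model.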
D. Kirk: Even though they call it a unified pipeline, I think it's a hybrid rather than completely unified. It may well be an incomplete Unified-Shader, with some parts unified and other parts shared.
I don't have proof of that. But it would be the right decision for them. They're clever, so I think they avoid building waste into their Unified-Shader.
D. Kirk: We want to remove special-purpose units from the GPU. On the other hand, we also want to run (special graphics functions) really fast. If you remove every special-purpose implementation from the GPU, it's just a Pentium.