ophirv, despite some of the seemingly demeaning comments in this thread, you brought up some very good points. While some of the other members brought up valid counter-arguments, they do not make up for the huge difference in performance per transistor.
Guys, consider this: if ATI made a part with the new memory controller, added FP blending, and used their old pipeline architecture, they'd have a faster-performing part for the same die size.
All the other points in this thread are pretty moot.
-Regarding the memory controller:
Chalnoth said:
Well, if you look at pictures of the R520 die, you'll notice that a huge portion is taken up by the new memory controller. This new memory controller may well be one reason why the performance per clock per transistor in no AA/AF scenarios seems rather low. But, the memory controller does have its benefits. In particular, I believe it is responsible for the very small performance hit from enabling AA that the R5xx architecture enjoys.
We've been through this before. The memory controller is only 8% of the die. That's not a huge portion, and it could (AFAIK) easily have been used in a design that had R300 style pixel shaders.
-FP32 is not as expensive as you guys are making it out to be. ATI added 32 FP32 math units to R520 with only 60M transistors. True, maybe they culled out unneeded areas or tweaked some parts to be smaller, but clearly this shows that it's a small part of the transistor jump between R420 and R520.
-"SM3.0 shaders" in just about every current game amounts to FP blending. Not including this feature was far and away ATI's biggest mistake last generation. I think it'll take nearly a year before you see something in games that truly needs PS3.0 or runs notably faster with it. This holds even more so for NVidia. FP24 and SM2.0 have lots of room left to make prettier games, and this is the point that ophirv is getting at.
-The "dispatcher" which keeps getting mentioned was not implemented primarily to improve efficiency, as the cost greatly outweighs the benefit. The main reason for the new dispatcher is for good dynamic branching performance. This is where the majority of the die space was consumed. You can see that NVidia's design packs higher performance per transistor in normal pixel shading scenarios, even versus R580. However, G70 will easily be 1/2 or even 1/10th the speed of ATI when dynamic branching is involved. In these cases, ATI has the performance per transistor advantage.
This last point pretty much sums it up. If ATI weren't going for good dynamic branching performance, they'd have come up with a much more compact design.
Anyways, ophirv, in the end I think you're right. ATI will pay for ditching its traditional architecture. If NVidia can get their AA/AF performance hit and quality up to ATI's level, then they will have a notable performance advantage with the same transistor count and clock speed. 90nm is the only thing saving ATI right now, and both R300/R420 and NV40/G70 have shader designs that make more business sense.
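P.S. For anyone who wants the FP32 point above in rough numbers, here's a back-of-envelope sketch in Python. It uses only the two figures I quoted (60M transistors, 32 FP32 math units). The 16-unit baseline and the function names are just my own illustrative assumptions, and it pins the whole 60M on the math units, so if anything it overstates the per-unit cost.

```python
# Back-of-envelope estimate using only the figures quoted in the post.
ADDED_TRANSISTORS = 60_000_000  # from the post: transistors spent on the extra units
ADDED_FP32_UNITS = 32           # from the post: FP32 math units added


def transistors_per_unit(transistors, units):
    """Average transistor cost per unit, charging everything to the units."""
    return transistors / units


def cost_of_units(units, per_unit):
    """Estimated transistor cost for a given number of units."""
    return units * per_unit


per_unit = transistors_per_unit(ADDED_TRANSISTORS, ADDED_FP32_UNITS)

# ASSUMPTION: 16 units is a hypothetical baseline chosen for illustration,
# not a number taken from the post.
BASELINE_UNITS = 16
baseline_cost = cost_of_units(BASELINE_UNITS, per_unit)

print(f"{per_unit / 1e6:.3f}M transistors per FP32 unit")
print(f"{baseline_cost / 1e6:.1f}M transistors for {BASELINE_UNITS} units")
```

So even charging every one of those transistors to the math units, it works out to under 2M per unit, which is the basis for saying FP32 is not the expensive part.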