Ostsol said:
Doesn't ATI have 2 ALUs per pixel pipeline? 16x2 + 6 = 38. . .

One full ALU and one "mini-ALU", whatever that means.
Also, the VS pipes are vec4 + scalar currently.
R5x0 will reportedly have 48 "full" 3+1 ALUs, and I *guess* there will be 24 TMUs, or maybe 32.
All in all, I think 2.5 times the shader performance of R420 is an optimistic estimate (on average, that is; VS-bound scenes should fly).
As for unified shader pipelines, I think they most likely won't be worth it until WGF, from a pure peak-performance POV. Currently you only have to balance two loads, PS and VS, and you're most likely PS bound (on the PC platform at least). In WGF, you'll have to balance pre-tessellation and post-tessellation VS, geometry shader and PS. And you'll be able to write intermediate output streams, i.e. some passes are going to bypass the later stages.
E.g., if you tried to replace the VS and PS of NV40 with unified pipelines, you'd have to implement the special abilities of the VS as well as those of the PS, and on top of that the control logic to distribute the work loads. Overall, you might end up with 16 "unified" pipelines taking up the same transistor count as the 16+6 separate pipelines. The pipelines would be able to do a bit more per clock, but you most likely would end up with less performance on average.
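To put rough numbers on that trade-off, here is a toy model in Python. It is only a sketch under loud assumptions: work is counted in abstract "pipeline-clocks", the unified design distributes work perfectly with no scheduling overhead, and every pipeline of either kind finishes one unit of work per clock. The function names and the workload figures are made up for illustration; only the 16+6 versus 16 unit counts come from the example above.

```python
def clocks_separate(ps_work, vs_work, ps_units=16, vs_units=6):
    """Clocks per frame with dedicated PS and VS pipelines.

    The two pools run in parallel, so the slower pool sets the pace
    and the other one sits partly idle.
    """
    return max(ps_work / ps_units, vs_work / vs_units)


def clocks_unified(ps_work, vs_work, units=16):
    """Clocks per frame with unified pipelines.

    Assumes ideal load balancing: any unit can take any work,
    so total work is simply spread across all units.
    """
    return (ps_work + vs_work) / units


# Hypothetical workloads, in pipeline-clocks per frame.
ps_bound = (160, 12)   # typical PC case: lots of pixel work, little vertex work
vs_bound = (32, 60)    # geometry-heavy case

for name, (ps, vs) in (("PS bound", ps_bound), ("VS bound", vs_bound)):
    print(name, clocks_separate(ps, vs), clocks_unified(ps, vs))
```

With these made-up numbers the PS-bound frame takes 10 clocks on 16+6 and 10.75 on 16 unified, while the VS-bound frame takes 10 versus 5.75. So the unified design only pulls ahead when the load is far from what the fixed split was tuned for, which is the point about being PS bound most of the time.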
But there might be other reasons for unified pipelines, like scalability and ease of design. And, of course, different requirements for different platforms.