ATI's idea on transistor budgets

superguy

Banned
I violently trashed ATI for their whole decoupled-TMU strategy. The performance of a 48-pipe part did not seem to be there, and I blamed a lack of TMUs. However, a huge silver lining just occurred to me.

It seems they saved a massive number of transistors.

They went from 16 pixel shader pipes to 48 with just 63m more transistors.
Nvidia took 68 million to go up just 8 pipes.
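
As a rough back-of-the-envelope check of the two figures above, here is a minimal Python sketch. It uses only the numbers quoted in this post. Attributing the whole transistor increase to the added shader units is a simplifying assumption, since other changes to each chip also count toward the totals, and the variable names are purely illustrative.

```python
# Figures quoted in the post. Attributing the entire increase to the
# added shader units is a simplifying assumption, not a measured fact.
ati_extra_transistors = 63e6
ati_extra_units = 48 - 16   # 16 -> 48 pixel shader units
nv_extra_transistors = 68e6
nv_extra_units = 8          # "just 8 pipes"

ati_per_unit = ati_extra_transistors / ati_extra_units
nv_per_unit = nv_extra_transistors / nv_extra_units

print(f"ATI:    {ati_per_unit / 1e6:.2f}M transistors per added unit")
print(f"Nvidia: {nv_per_unit / 1e6:.2f}M transistors per added pipe")
print(f"Ratio:  {nv_per_unit / ati_per_unit:.1f}x")
```

Under that assumption the per-unit cost comes out at roughly 2M transistors for ATI against 8.5M for Nvidia, though the two kinds of unit are not directly equivalent.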

I'm thinking ATI can now scale math power at will at a much smaller transistor cost, perhaps due at least in part to the fact that they can target it narrowly. Going forward, this will only increase the relative strength of the X1900 series.

Smart, perhaps very smart.
 
I'm glad that you finally invented boiled water (for yourself). :D

And the story doesn't end with the plain transistor count; it extends to the general marchitecture approach. ;)
 
Mentioned where? I had not seen it. I would be interested in that discussion.

Anyways, are we sure there's a huge size difference between ATI and Nvidia pipes? The only difference is the non-decoupled TMU on an Nvidia pipe. And, well, I guess the mini-ALUs may actually be a loss in performance per transistor for Nvidia, not a win, when contrasted with just adding more pipes.
 
Daryl said:
Mentioned where? I had not seen it. I would be interested in that discussion.

It was mentioned in some of the countless R520/R580 threads. There were also some interviews with ATI where they said that the new memory controller is the best thing since sliced bread, since it will enable them to expand on the architecture, and that it was their "investment in the future", etc. You may try searching, but not much more than that was mentioned.
 
So let me ask this in this thread then: Is the 7900 series release by NVidia essentially the last high-end DX9 release or will ATI come back with an R590 release as a refresh on the X1900 series? Or is the R590 only for new lower-end cards like an X1900XL and perhaps an X1800GTO?
 
Daryl said:
I violently trashed ATI for their whole decoupled-TMU strategy. The performance of a 48-pipe part did not seem to be there, and I blamed a lack of TMUs. However, a huge silver lining just occurred to me.

It seems they saved a massive number of transistors.

They went from 16 pixel shader pipes to 48 with just 63m more transistors.
Nvidia took 68 million to go up just 8 pipes.

I'm thinking ATI can now scale math power at will at a much smaller transistor cost, perhaps due at least in part to the fact that they can target it narrowly. Going forward, this will only increase the relative strength of the X1900 series.

Smart, perhaps very smart.

ATI doesn't have 48 shader pipelines. It has 16. There are now 48 pixel shader PROCESSORS, i.e. three pixel shader processors per pipe.
 
rwolf said:
ATI doesn't have 48 shader pipelines. It has 16. There are now 48 pixel shader PROCESSORS, i.e. three pixel shader processors per pipe.
Define shader pipeline.
 
Hrm, here's one thing I've been wondering: are the three shader processors in ATI's pixel shader pipelines independent of one another? That is to say, do you need three ALU ops in a row to keep them all full? Would Tex, ALU, Tex, ALU force two-thirds of the ALU pipelines to remain idle (assuming the ALU ops are dependent on the texture results, of course), or can ATI fill the units independently from different threads?
 
Ah, thanks. Well, then, it's rather pointless to talk about pixel pipelines at all with the R5xx architecture. It just doesn't have them.

It has arrays of units, each of which is pipelined, of course, but independently-addressable.

A pipeline is a different beast entirely: data flows through a pipeline in sequence, with various bits of work done along the way.

It really is time to throw all of this nomenclature out the window and just look at performance.
 
Chalnoth said:
Hrm, here's one thing I've been wondering: are the three shader processors in ATI's pixel shader pipelines independent of one another? That is to say, do you need three ALU ops in a row to keep them all full? Would Tex, ALU, Tex, ALU force two-thirds of the ALU pipelines to remain idle (assuming the ALU ops are dependent on the texture results, of course), or can ATI fill the units independently from different threads?
Same thread, different/parallel pixels.
 
Let's not forget that the R580 features larger Z-buffers as well, which is also in the transistor count.

EDITED for typo
 
3dcgi said:
Same thread, different/parallel pixels.
This thread is shared between 12 pixel shaders, not 16, right? That was the impression I got from Beyond3D's review: four dispatch processors, each routing to 12 shader cores, which themselves are grouped as quads.
 
Yup; in truth R580 does not have 3 ALUs per TMU-ROP pipe, but rather 12 quads. The 3:1 ratio is purely total ALUs to total TMU-ROPs.
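
To make the arithmetic behind that ratio explicit, here is a minimal Python sketch. The quad count comes from this post and the 16 comes from the pipeline count given earlier in the thread; the assumption that a quad is a 2x2 block of four pixels is mine, as are the variable names.

```python
# Unit counts for R580 as discussed in this thread.
PIXELS_PER_QUAD = 4      # assumption: a quad is a 2x2 block of pixels
r580_shader_quads = 12   # "12 quads" from the post above
r580_tmus = 16           # the "16" pipes cited earlier in the thread

r580_alus = r580_shader_quads * PIXELS_PER_QUAD
alu_to_tmu_ratio = r580_alus / r580_tmus

print(f"{r580_alus} ALUs : {r580_tmus} TMUs = {alu_to_tmu_ratio:.0f}:1")
```

The 3:1 figure falls out of the chip-wide totals, not out of any per-pipe grouping.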
 
JF_Aidan_Pryde said:
This thread is shared between 12 pixel shaders, not 16, right? That was the impression I got from Beyond3D's review: four dispatch processors, each routing to 12 shader cores, which themselves are grouped as quads.
R580 has 3x the thread/batch size of R520. I think that's what you're asking.
 
3dcgi said:
R580 has 3x the thread/batch size of R520. I think that's what you're asking.
I meant to confirm: given a thread, how many shader cores are working on it at once. For the R580, it should be 12 shader cores (3 quads). For the G70 it's 4 shader cores (1 quad).

So the R580 has four threads active at any time, with a maximum of 512 in flight.
The G70 has six threads active at a time, with a maximum of 'hundreds' (according to NV).
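
The two active-thread counts above follow from dividing each chip's shader cores by the number of cores working on one thread. A minimal sketch, assuming 48 shader cores for R580 and 24 for G70 (the 16 plus 8 pipes mentioned at the top of the thread); the helper name is hypothetical.

```python
def active_threads(shader_cores, cores_per_thread):
    """Threads that can execute at once if each thread occupies a fixed
    group of shader cores (a simplification of the real scheduling)."""
    return shader_cores // cores_per_thread

# R580: 3 quads (12 cores) work on one thread.
r580_active = active_threads(shader_cores=48, cores_per_thread=12)
# G70: 1 quad (4 cores) works on one thread; 24 cores is an assumption.
g70_active = active_threads(shader_cores=24, cores_per_thread=4)

print(f"R580 active threads: {r580_active}")
print(f"G70 active threads:  {g70_active}")
```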

Both are SIMD architectures; for a given clock, all active threads are executing the same shader program.

Am I interpreting the two architectures correctly?
 
JF_Aidan_Pryde said:
So the R580 has four threads active at any time, with a maximum of 512 in flight.
Yes. Although if you include texturing, then it's possible for each shader core's corresponding texture pipe to be working on a different thread, hence 8 concurrent threads are possible.

The G70 has six threads active at a time, with a maximum of 'hundreds' (according to NV).
Six threads, yes, but it's always just six. There are 1024 fragments in each thread.

Both are SIMD architectures; for a given clock, all active threads are executing the same shader program.
Each of R5xx's quad-shader cores has its own shader state and runs independent of the other quad-shader cores.

While each of G70's quads has its own shader state, the scan conversion assigns fragment-quads to shader-quads in a round-robin fashion. So two adjacent fragment-quads on a single triangle will be shaded by two different shader-quads:

11223344
11223344
556611
556611
2233
2233
44
44

Though the pattern is prolly more fiendish than that! I don't know how to take account of G70's ability to shade multiple triangles lumped together in one thread. It prolly walks one triangle at a time though.
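
The simple round-robin pattern drawn above can be reproduced with a short Python sketch. It assumes six shader-quads, 2x2-pixel fragment-quads, and a triangle whose rows shrink from four quads wide to one; the function name and the row widths are illustrative only, and the real hardware pattern may well differ.

```python
def round_robin_pattern(quads_per_row, num_shader_quads=6):
    """Return text rows showing which shader-quad shades each 2x2
    fragment-quad when quads are handed out in strict round-robin order."""
    lines = []
    next_id = 0
    for width in quads_per_row:
        ids = []
        for _ in range(width):
            ids.append(next_id % num_shader_quads + 1)
            next_id += 1
        # Each fragment-quad is 2 pixels wide and 2 pixels tall,
        # so every digit is doubled and every row is emitted twice.
        row = "".join(str(i) * 2 for i in ids)
        lines.extend([row, row])
    return lines

# Triangle that is 4, 3, 2, then 1 fragment-quads wide.
pattern = round_robin_pattern([4, 3, 2, 1])
print("\n".join(pattern))
```

Adjacent fragment-quads always land on different shader-quads in this toy version, which is the point of the diagram.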

Most of the time, with shaders that have no dynamic branching, all G70's shader-quads will progress together.

Jawed
 