RXXX Series Roadmap from AnandTech

Dave Baumann said:
Note that I'm not saying what constitutes a pipeline; I'm saying what is now commonly being referred to in terms of "pipes" elsewhere, and then consider how that might relate to the two bits of information (1. Anand's article, 2. the stuff seen here).

Does the bloody notation at least mean the same thing for all four cards? They didn't switch notation going from high-end to mid/low? Cause that first column is a killer if it means the same thing for all four cards.
 
geo said:
Does the bloody notation at least mean the same thing for all four cards? They didn't switch notation going from high-end to mid/low? Cause that first column is a killer if it means the same thing for all four cards.

It means the same thing.
 
caboosemoose said:
It means the same thing.

Well, that's a pretty flat statement. Going into the tease business? Competition is pretty fierce around here. . . ;)
 
Well, all I will say is this: where did the X-X-X-X information originally come from? I have no idea what it means, but I have good reason to believe it is likely to be consistent.
 
Note that I'm not saying what constitutes a pipeline; I'm saying what is now commonly being referred to in terms of "pipes" elsewhere, and then consider how that might relate to the two bits of information (1. Anand's article, 2. the stuff seen here).
In the NV30 timeframe, that would have been Z-or-Color-ONLY-outputs, but this isn't accepted by the media; on the other hand, the NV31/NV34/NV36 pipeline configuration (4x1 only without loopback; 2x2 otherwise) was generally accepted as a "4 pipelines design".

Another media-accepted configuration is having more "shading" pipelines than "output" pipelines, à la G70; in that case, the "shading" pipeline count is taken. Still, that doesn't really give me any way to choose between the two possibilities, if it isn't something else entirely.


Uttar
 
I believe the value of 3 for R580 vs the 1 for R520, indicates texturing or shader performance.

for a long time now, ATI has had only 1 texture unit per pipeline, ever since R300. perhaps they've decided to dramatically beef up this figure and beat Nvidia, even if Nvidia's value in the same area is 2.


remember the Matrox Parhelia? it had 4 TMU per pipeline. remember ATI's own R100? it had 3 TMUs per pipeline.

I'm not saying that that 3 in R580 means TMUs in the traditional sense, but some sort of texture-shader ALU unit, or performance figure.


hey, I might be WAY off in what I'm saying here. just offering up my thoughts to the board..
 
caboosemoose said:
Indeed, very probably, given that RV350's RPI (relative performance index) vs R520 is rated at 0.5x(+). By contrast, R580 is 1.5(+)...

Note that I'm not saying what constitutes a pipeline; I'm saying what is now commonly being referred to in terms of "pipes" elsewhere, and then consider how that might relate to the two bits of information (1. Anand's article, 2. the stuff seen here).

The only other thing I kept from Anand's article is the clock speeds.

Let's put this on another level, especially considering caboosemoose's note:

I expect to see with "today's" terminology from the desktop R5xx/RV5xx product line:

high end: 4 quads (with the possibility of 3 operational quad variants for the lowest part of this market segment)

mainstream: 2 quads

value: 1 quad

As for the mystical "3" I have severe doubts that between R520 and R580 any kind of physical units that take up large transistor counts are being tripled for a high end refresh. Any kind of OPs are an entirely different story though. All of course IMHLO.

In any case it's been noted over and over again that pixel fill-rates are losing importance in a relative sense, while internal shader fill-rate is gaining importance. However, the "gap" between 2400 MPixels/s and a hypothetical 7200 MTexels/s for RV530 sounds too big to me, and the latter also sounds too high in relation to R520's expected <=10000 MTexels/s. I'm just borrowing texel fill-rates as a comparative measure here (albeit not an ideal paradigm).
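To put rough numbers on that gap, here is a minimal sketch of the arithmetic. The 600 MHz clock and the 4 and 12 units-per-clock figures are assumptions, picked only because they reproduce the 2400 and 7200 figures above; they are not confirmed specs, and the R520 ceiling is just the upper bound quoted above.

```python
# Rough fill-rate arithmetic. The clock and unit counts are ASSUMPTIONS
# chosen to reproduce the 2400 / 7200 figures, not confirmed specs.

ASSUMED_RV530_CLOCK_MHZ = 600      # hypothetical: 2400 / 4
ASSUMED_PIXELS_PER_CLOCK = 4       # hypothetical
ASSUMED_TEXELS_PER_CLOCK = 12      # hypothetical: 7200 / 600
R520_TEXEL_CEILING = 10000         # upper bound quoted in the post (MTexels/s)


def fill_rate(units_per_clock: int, clock_mhz: int) -> int:
    """Peak rate in millions of units per second: units per clock x MHz."""
    return units_per_clock * clock_mhz


rv530_pixel_rate = fill_rate(ASSUMED_PIXELS_PER_CLOCK, ASSUMED_RV530_CLOCK_MHZ)
rv530_texel_rate = fill_rate(ASSUMED_TEXELS_PER_CLOCK, ASSUMED_RV530_CLOCK_MHZ)

# How far apart the two RV530 figures are, independent of the clock chosen.
texel_to_pixel_ratio = rv530_texel_rate / rv530_pixel_rate

# How close the hypothetical RV530 texel rate gets to the R520 ceiling.
share_of_r520 = rv530_texel_rate / R520_TEXEL_CEILING

print(f"RV530 pixel rate: {rv530_pixel_rate} MPixels/s")
print(f"RV530 texel rate: {rv530_texel_rate} MTexels/s")
print(f"Texel:pixel ratio: {texel_to_pixel_ratio:.1f}")
print(f"Share of R520 ceiling: {share_of_r520:.0%}")
```

Whatever the real clock turns out to be, the ratio between the two RV530 figures stays at 3, which is why reading the "3" as texture units looks doubtful to me.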
 
More texture units per shader ALU make no sense right now and even less in the future. Fewer texture units per shader ALU make a lot of sense.

The near future is likely to bring fewer ROPs and texture units per shader ALU. Both pure fillrate and multitextured fillrate are becoming less important, to the point that 3DMark01 or any other pure fillrate benchmark is already meaningless. We already have examples of such GPUs; expect more in the future.
 
If RV530 is 8 fragment shader pipelines and only 4 TMUs, what is 22.4GB/s of bandwidth going to be doing?

If RV530 is 4 fragment shader pipelines and 8 TMUs, boy those are some extreme extreme fragment shader pipelines!

Jawed
 
Ailuros said:
Xmas' input helped a bit speculating any further, but I think we're still scratching around in the dark.

The information is in plain sight in this thread. If you use common sense, knowledge of what ATI have publicly said about ALU:TEX ratio in next-gen hardware, given where shader-happy games are going, it's blindingly obvious.

RV530 has 12 ALUs available for shader ops, but only one fragment quad of processors with a texture sampler each, IMHO. Not 2 quads and 4 TMUs (?!?!, not sure where Jawed is pulling that from), nor 1 quad and 8 TMUs (?!?! again, given the ALU:TEX ratio comments in publicly available presentations). 4 fragment units, each with: 1 texture sampler, 3 fragment ALUs, 2 ROP engines. 4-1-3-2.

Same thing with R580. 16 fragment units, each with: 1 texture sampler, 3 fragment ALUs, 1 ROP engine. 16-1-3-1.
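A small sketch of how that reading of the X-X-X-X notation expands into totals. The field names and the `parse_config` / `totals` helpers are my own labels for illustration, and the per-unit interpretation is only the guess described above, not anything ATI has confirmed.

```python
from typing import NamedTuple


class FragmentConfig(NamedTuple):
    """One speculative reading of X-X-X-X (field names are assumptions)."""
    fragment_units: int
    samplers_per_unit: int
    alus_per_unit: int
    rops_per_unit: int


def parse_config(notation: str) -> FragmentConfig:
    """Split a string such as '4-1-3-2' into its four integer fields."""
    parts = [int(p) for p in notation.split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {len(parts)}: {notation!r}")
    return FragmentConfig(*parts)


def totals(cfg: FragmentConfig) -> dict:
    """Multiply each per-unit count by the number of fragment units."""
    return {
        "texture_samplers": cfg.fragment_units * cfg.samplers_per_unit,
        "fragment_alus": cfg.fragment_units * cfg.alus_per_unit,
        "rops": cfg.fragment_units * cfg.rops_per_unit,
    }


rv530 = totals(parse_config("4-1-3-2"))
r580 = totals(parse_config("16-1-3-1"))

for name, result in (("RV530", rv530), ("R580", r580)):
    print(name, result)
```

Under this reading RV530 comes out at 12 fragment ALUs behind 4 texture samplers, and R580 at 48 ALUs behind 16 samplers, so both keep the same 3:1 ALU:TEX ratio.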
 
Rys said:
Same thing with R580. 16 fragment units, 1 texture sampler, 3 fragment ALUs per fragment unit, 1 ROP engine per fragment unit. 16-1-3-1.

Does that mean ATI have switched from a vec-3 + scalar arrangement to vec-4 ALUs for pixel processing?

If R520 has only a single ALU per pipe wouldn't it be at a disadvantage in shader power per clock to R420/480 with two Vec-3 + Scalar pairs per pipe, not to mention G70 with two vec-4's?
 
Shogun said:
Does that mean ATI have switched from a vec-3 + scalar arrangement to vec-4 ALUs for pixel processing?

If R520 has only a single ALU per pipe wouldn't it be at a disadvantage in shader power per clock to R420/480 with two Vec-3 + Scalar pairs per pipe, not to mention G70 with two vec-4's?

The ALU is, presumably, the same vec3+scalar pair (plus texture sampler). The ALU config isn't changed, I'd bet, it's just beefier in terms of caps. 4D vectors should still be the order of the day for fragment processing, in R5 and RV5.
 
Unfortunately, Rys, the concept falls down a little in saying that it's just one fragment quad, as that would indicate that the ALUs are deep, i.e. operating on a single pixel with multiple instructions, which would go against the idea of "12 pipelines". I know where you are coming from, but in this case it's more likely that the "pipelines" are no longer single quad pipelines, but pipelines that can handle multiple simultaneous quads.
 
Dave Baumann said:
Unfortunately, Rys, the concept falls down a little in saying that it's just one fragment quad, as that would indicate that the ALUs are deep, i.e. operating on a single pixel with multiple instructions, which would go against the idea of "12 pipelines". I know where you are coming from, but in this case it's more likely that the "pipelines" are no longer single quad pipelines, but pipelines that can handle multiple simultaneous quads.

I had thought about new fragment dispatch and scheduling, but I didn't think that'd make it in there. If that's the case, and RV530 and R580 can push more fragments down the pipeline, then all the better for those parts. Even more so if it can fall back and use the ALUs to also process a single pixel.
 
If Xenon has anything to say about the future of GPU architectures, ATI isn't going to follow the 'more ALUs in cascade' path. Personally I don't think more than 2 ALUs in cascade makes sense either; in fact I doubt that 2 ALUs in cascade are better than twice the number of shader ALUs available for different fragment threads. For performance reasons, if all fragments are independent and there are thousands of fragments roaming around, ALUs in cascade make little sense. However, I think there are small implementation benefits, like fewer read/write ports or less state per shader scheduler, that may make the choice more difficult.

What if the R5xx shader processor is the same architecture as the Xenon shader processor? The more vector-like architecture introduced for Xenon is independent of unified shading. You can still put a few old-style vertex shaders at the top of the geometry pipeline and only feed fragments to the lower shader processor.

BTW, 16 x 3 is 48 which is a number that I have seen somewhere ... What transistor difference could we expect from the R520 to the R580?

If the length/number of ALUs of the vector-like shader processor were parametrizable, R520 could be only one processor/scheduler with 16 ALUs, RV530 three processors with 4 ALUs each (what current GPUs seem to implement), RV515 one processor with 4 ALUs, and R580 three processors/schedulers with 16 ALUs each for a total of 48. But I doubt they mean that, and in any case it doesn't explain the other two numbers.
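Just to tabulate that guess, a quick sketch. The processor and ALU counts are only the hypothetical split described above, and the names in the code are made up for illustration.

```python
# Hypothetical split: (shader processors/schedulers, ALUs per processor).
# These counts are speculation from this post, not known specs.
SPECULATED_LAYOUT = {
    "R520": (1, 16),
    "RV530": (3, 4),
    "RV515": (1, 4),
    "R580": (3, 16),
}


def total_alus(processors: int, alus_per_processor: int) -> int:
    """Total shader ALUs if every processor has the same width."""
    return processors * alus_per_processor


alu_totals = {
    chip: total_alus(procs, width)
    for chip, (procs, width) in SPECULATED_LAYOUT.items()
}

for chip, total in alu_totals.items():
    procs, width = SPECULATED_LAYOUT[chip]
    print(f"{chip}: {procs} x {width} = {total} ALUs")
```

That reproduces the 12 for RV530 and the 48 for R580, but as said it leaves the other two numbers in the notation unexplained.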
 