AMD: R7xx Speculation

Status
Not open for further replies.
ROP count isn't tied to the MIMD width of the ALU array in this case.

Point taken, but I wonder what that number will be. Wouldn't 32 be overkill and 16 seem limited?


On a side note, I wonder what marketing point they'll go after. 1.25 TFLOPS per card (~780 MHz)? A 25% increase in peak shader power over the 3870 X2 per chip (825 MHz)?

I know FLOPS aren't everything, but it just sounds absurd in contrast, considering the die size difference.
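
For anyone who wants to check the numbers above, here is a rough sketch of the usual peak-throughput arithmetic. It assumes the rumoured 800 stream processors and one multiply-add (2 FLOPs) per stream processor per clock; `peak_gflops` is just an illustrative helper name, not anything official.

```python
def peak_gflops(stream_processors, clock_mhz, flops_per_clock=2):
    """Theoretical peak in GFLOPS, assuming one multiply-add
    (counted as 2 FLOPs) per stream processor per clock."""
    return stream_processors * flops_per_clock * clock_mhz / 1000.0

# Rumoured RV770 (assumption: 800 SPs) at the two clocks mentioned above.
rv770_780 = peak_gflops(800, 780)
rv770_825 = peak_gflops(800, 825)

# HD 3870 X2: two chips of 320 SPs each at 825 MHz.
hd3870x2 = peak_gflops(2 * 320, 825)

print(f"RV770 @ 780 MHz: {rv770_780:.0f} GFLOPS")
print(f"RV770 @ 825 MHz: {rv770_825:.0f} GFLOPS")
print(f"HD 3870 X2:      {hd3870x2:.0f} GFLOPS")
print(f"One RV770 @ 825 MHz vs the X2: {rv770_825 / hd3870x2:.2f}x")
```

These are theoretical peaks only; they say nothing about how well the units are kept busy.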
 
I don't think you've understood it correctly.

That's (4+1) * 32 * 5 arrays = 800 stream processors.
HD3870/HD3850 is (4+1) * 64 = 320 stream processors.

Thanks, but my point was that RV770's huge count of 800 stream processors does not mean it is 6.25 times faster than Nvidia's G80 or G92.

Edit: I was also wondering which would be more efficient, 4 arrays or 5 arrays?
4 arrays of 40 shaders totaling 160 shaders, = (4+1) * 40 * 4 arrays = 800 stream processors.
5 arrays of 32 shaders totaling 160 shaders, = (4+1) * 32 * 5 arrays = 800 stream processors.
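
The counting in the posts above can be sketched like this. It only reproduces the arithmetic of the rumour, assuming every shader unit exposes 5 lanes (the "4+1") and taking 128 stream processors for G80/G92, which is where the 6.25 figure comes from. The function name is made up for illustration.

```python
def stream_processors(arrays, units_per_array, lanes_per_unit=5):
    """Marketing 'stream processor' count: arrays x units x lanes (4+1 = 5)."""
    return arrays * units_per_array * lanes_per_unit

rumour_5x32 = stream_processors(5, 32)   # 5 arrays of 32 units
rumour_4x40 = stream_processors(4, 40)   # 4 arrays of 40 units
rv670 = stream_processors(1, 64)         # HD 3870/3850: 64 units in total

g80_sps = 128                            # G80/G92 stream processor count
raw_ratio = rumour_5x32 / g80_sps        # ratio of raw counts, not of speed

print(rumour_5x32, rumour_4x40, rv670, raw_ratio)
```

Both layouts give the same total, so the raw count cannot tell them apart, and the ratio of counts against G80 is not a performance ratio.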
 
Edit: I was also wondering which would be more efficient, 4 arrays or 5 arrays?
4 arrays of 40 shaders totaling 160 shaders, = (4+1) * 40 * 4 arrays = 800 stream processors.
5 arrays of 32 shaders totaling 160 shaders, = (4+1) * 32 * 5 arrays = 800 stream processors.
More arrays should be more efficient, but more complex to implement. More units per array means your branching granularity increases. With 40 units per array, given how the texture units are connected, you'd also need 40 texture units, not 32.
That said, I don't believe this rumor for a second...
 
Hmmmm, it does seem to be another case where adding more shader power wasn't that expensive. 800 SPs are going to put up some serious numbers in theoretical tests. Hopefully the TMU array has enough muscle to feed the beast.

ATi seems to like round numbers so I vote for either 16 or 32 ROPs on RV770. I figure they're going to continue with the better pixels > more pixels philosophy and stick to 16 ROPs.
 
Maybe "R700" has 800 SPs, which means RV770 will have 400. That seems more likely if there are also other changes and the die only grows to 250 mm².

VR-Zone's rumours are from Chiphell's thread drawing info from Cho (from PCInLife I suppose, and same guy on B3D?)

A 5*32 cluster can't be for the X2.

Also:
400 would be utterly dull for a supposedly post-momentum-revival architecture.

The game where they need ALUs the most happens to be that game ;). So they will try to pile up as many as possible in a best-compromise manner, like R580, even when it supposedly doesn't help too much.

OTOH the TMUs should instead be specced with workload in mind (while ATI shot much lower with R600, I believe 32 TMUs and much better Z are all that is really needed), compared to nVidia's G80 (Edit: G92/94) safety net.
 
I'm correcting a clear error in people's thinking on the current architecture that appears to have proliferated through quite a number of posts in this thread.
 
I'm gritting my teeth, as I think this rumour is one of those dreams, but 160:32 is a 5:1 ALU:TEX ratio :LOL:

Jawed
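
The 5:1 figure is just the reduced ratio of shader units to texture units. A quick sketch, assuming the rumoured 160 units and 32 texture units, plus the 4 x 40 variant with the 40 texture units that an earlier post says it would need:

```python
from fractions import Fraction

def alu_tex_ratio(shader_units, texture_units):
    """ALU:TEX ratio as a reduced fraction of unit counts."""
    return Fraction(shader_units, texture_units)

rumour_ratio = alu_tex_ratio(160, 32)    # 5 arrays of 32, 32 texture units
four_by_forty = alu_tex_ratio(160, 40)   # 4 arrays of 40, 40 texture units

print(f"5 x 32 layout: {rumour_ratio}:1")
print(f"4 x 40 layout: {four_by_forty}:1")
```

So, on those assumptions, the 5 x 32 layout is the one that pushes the ratio up to 5:1, while 4 x 40 would sit at 4:1.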
 
I'm correcting a clear error in people's thinking on the current architecture that appears to have proliferated through quite a number of posts in this thread.

What gave you that impression? I think most people acknowledge that R600's ALU structure is essentially 5 independent scalars. However, they also acknowledge that getting good throughput hinges on the compiler's ability to pack independent instructions into those five slots, which may not be easy or even feasible in many situations.

And for what it's worth I interpreted (4+1) as 4 mini + 1 uber ALU. Not a vec4 + scalar. You guys are really sensitive :LOL:
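
To make the packing point concrete, here is a toy sketch of greedy scheduling into 5-wide bundles. It is not how any real shader compiler works: it ignores the fact that only one slot handles transcendentals, and the op names and the `pack_bundles` / `utilization` helpers are invented for illustration. It only shows how dependencies between instructions leave slots empty.

```python
def pack_bundles(deps, width=5):
    """Greedy list scheduling.

    deps maps each op name to the set of ops it depends on. An op may issue
    only once all of its dependencies sit in earlier bundles; at most
    `width` ops issue per bundle.
    """
    done = set()
    remaining = dict(deps)
    bundles = []
    while remaining:
        ready = [op for op, needs in remaining.items() if needs <= done]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        for op in bundle:
            del remaining[op]
    return bundles

def utilization(bundles, width=5):
    """Fraction of issue slots actually filled."""
    issued = sum(len(b) for b in bundles)
    return issued / (len(bundles) * width)

# Five fully independent ops fit in one bundle.
independent = {name: set() for name in "abcde"}

# Five ops in a serial chain need one bundle each.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}

independent_bundles = pack_bundles(independent)
chain_bundles = pack_bundles(chain)

print(len(independent_bundles), utilization(independent_bundles))
print(len(chain_bundles), utilization(chain_bundles))
```

Real shader code sits somewhere between those two extremes, which is why the peak numbers depend so much on the compiler.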
 
No, it's not. It's 5 scalars for all R6xx products. The only unified core so far that was 4+1 was Xenos.


Only "so far" ? ;)

Anyway, if it's 5 arrays of 32 shader processors, they could perfectly well be more decoupled ALUs, for example 2 scalars per shader processor, and this way they would approach G80's ALU organization.
 
Why is that relevant?
Because if all 5 units had the same capabilities the compiling/scheduling would be easier.

Could it be that RV770 is using something more like a GeForce 8800-style scalar setup?
32 * 5 * (1 special + 1 MADD) = 320 units
 
Because if all 5 units had the same capabilities the compiling/scheduling would be easier.
This is moronic. There's no point in making all five support transcendental functions, for a start. It would hugely increase the size while bringing no appreciable performance benefit.

Could it be that RV770 is using something more like a GeForce 8800-style scalar setup?
32 * 5 * (1 special + 1 MADD) = 320 units
In case you haven't noticed, G80 ALUs are not scalar - compilation is complicated by having to schedule a co-issue across two units.

Jawed
 