AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks: 1 vote (0.6%)
  • Within a month: 5 votes (3.2%)
  • Within a couple of months: 28 votes (18.1%)
  • Very late this year: 52 votes (33.5%)
  • Not until next year: 69 votes (44.5%)

  • Total voters: 155
  • Poll closed.

I'm going to have to find those estimates you made on diesize/percentages for the different types of units in RV770.

My rough speculation was a 1600SP(32spx10c) part w/ 32 ROPs, 80 TMUs, and a 256-bit bus w/ 6.3GHz GDDR5 (~200GB/s), clocked at 650MHz or more to make 2 TFLOPS. Fitting into a die size around RV770's, ~250mm2 w/ ~1.3b transistors.
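As a quick sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes the usual convention of 2 FLOPs per stream processor per clock (one multiply-add) and treats "6.3GHz" as an effective 6.3 Gbit/s per pin; the function names are made up for illustration.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def peak_gflops(stream_processors: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak single-precision GFLOPS, assuming one multiply-add (2 FLOPs) per SP per clock."""
    return stream_processors * flops_per_clock * clock_mhz / 1000


if __name__ == "__main__":
    # Speculated part: 256-bit bus, 6.3 Gbit/s GDDR5, 1600 SPs at 650 MHz.
    print(f"bandwidth: {memory_bandwidth_gb_s(256, 6.3):.1f} GB/s")
    print(f"peak math: {peak_gflops(1600, 650):.0f} GFLOPS")
```

With these assumptions the bus works out to roughly 200 GB/s and 650MHz lands just over 2 TFLOPS, which matches the figures quoted in the post.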


But I don't expect them to stay at 10 SIMD arrays, though I do expect them to increase the number of ALUs per array. I was thinking more along the lines of an 8:1 ALU:TEX ratio with 20 arrays. Yes, that is a considerable amount (640 ALUs vs 160, and 120 TUs vs 40), but even with DX11 and 32 RBEs, I figure less than 400mm2 on 40nm. The way I see it, the ALU:TEX disparity is going to keep growing every generation. Perhaps 16:1 by RV970.
 
I hope AMD's wary of going higher than 4:1 as without a radical change in architecture, >64-way divergence penalty is going to be nasty.

I think there was some recent discussion of the "oddball" datastructures that underlie any kind of tessellation, i.e. patches will have some irregular number of points/vertices, and what does that mean for batching? Patches are allowed to have up to 32 vertices. I don't know how this compares with the handling of vertex strips etc. :???:

Jawed
 
As long as they share the same topology they can properly put together more than one patch per batch.
 
With a size like that they would probably increase the bus to 384-bit and go with 1.5GB of GDDR5, without needing 6GHz+.

Doubling the SPs per cluster and doubling the clusters seems insane.
Maybe double the SPs per cluster and increase the clusters to 12, for 1920 SPs, but doing both leaves a huge theoretical gap between RV770/790 and RV870.

Why would they want to go over 300 mm2 this time?

I agree, at the very least I expect them to stay around 300mm2.
 
but I thought I remembered one of your posts breaking down the different units even more...
Maybe it's safest to do this in terms of area. The 10 clusters of RV770 are 40.7% of the die. RV740 has 8 clusters of nominally the same design (missing double-precision, added burst-fetch, I think).

The real die size of RV770 is 264mm², in comparison with 137mm² for RV740. So RV740 is 51.9% of RV770's area, with 80% of the clusters, 100% of the RBEs (seemingly) and 50% of the memory channels. What scaling factor to use for 40nm?

A straight 53% scaling for 55-40nm implies that the 8 clusters are 0.8 * 0.53 = 42.4% of the size of the 10 clusters in RV770, i.e. 42.4% of 104mm² = 44mm², which is 32% of RV740. That leaves 93mm² for MCs/RBEs/L2s (assuming everything else is negligible).

If RV870 has 2x RV740's MCs/RBEs/L2s (256-bit bus, 32 RBEs), that's 186mm². So, ahem, with 110mm² for clusters, you could fit in a total of 20 clusters :LOL: making a die of 296mm² :LOL:
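The estimate above can be re-run as a small script. This is only a sketch of the same back-of-envelope method: it takes the 104mm² cluster area for RV770 and the 53% shrink factor as stated in the post, lumps everything that is not a cluster into one "uncore" figure (MCs/RBEs/L2s), and uses made-up names throughout.

```python
RV770_CLUSTER_AREA_MM2 = 104.0  # area of RV770's 10 clusters, as stated in the post
RV770_CLUSTERS = 10
RV740_DIE_MM2 = 137.0           # RV740 die size, as stated in the post
RV740_CLUSTERS = 8
SHRINK_55_TO_40 = 0.53          # assumed straight area scaling from 55nm to 40nm


def cluster_area_40nm() -> float:
    """Area of one cluster at 40nm under the assumed shrink factor."""
    return RV770_CLUSTER_AREA_MM2 / RV770_CLUSTERS * SHRINK_55_TO_40


def rv740_uncore_area() -> float:
    """Everything on RV740 that is not a cluster (MCs/RBEs/L2s, rest ignored)."""
    return RV740_DIE_MM2 - RV740_CLUSTERS * cluster_area_40nm()


def estimate_die_mm2(clusters: int, uncore_multiplier: float) -> float:
    """Die estimate: N clusters plus a multiple of RV740's non-cluster area."""
    return clusters * cluster_area_40nm() + uncore_multiplier * rv740_uncore_area()


if __name__ == "__main__":
    print(f"RV740 clusters: {RV740_CLUSTERS * cluster_area_40nm():.0f} mm2")
    print(f"RV740 uncore:   {rv740_uncore_area():.0f} mm2")
    print(f"20 clusters, 2x uncore: {estimate_die_mm2(20, 2):.0f} mm2")
```

Changing `clusters` or `uncore_multiplier` shows how sensitive the total is to the non-cluster area, which this method cannot split any further without a die photo.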

These things never work, look what happened with RV770 :p

With a bit of luck we'll get a die photo for RV740...

Jawed
 
Thanks, I appreciate it.
Wasn't expecting you to do it again but I went through your posts up until Jan and couldn't find the one I was specifically looking for, at least what I thought I was looking for.

So my diesize estimate looks to be a bit off but like you said you never know.
 
Makes sense for their usual x2 philosophy I suppose (doubling RV740). Personally, with the exception of the bus and RBEs, I would rather they took it even further by tripling the shaders and TUs over RV770 (not RV740). More is at stake in this next round of the graphics wars, especially on the GPGPU front, which is where I think the real battle is going to be.
 
Did you include spare die for DX11 compliance?
 
Nope, nor PCI Express, nor UVD, nor control processor. It seems to me that "best case", 2x RV770 in everything except memory bus, would be pretty big.

Do you have any ideas on what aspects of D3D11 will cost a noticeable amount of die space?

Jawed
 
I don't think RV870 will be "2× RV770". AMD is doing the sweet spot strategy again, and I think that will land somewhere between 20 and 50 percent above RV770.
 
20% is too little. RV790 is already at least 10% faster. I think 40-45% over RV790 is possible with a 256-bit bus and today's GDDR5. Maybe over 50% with faster VRAM. There are 2 questions:

1. Can we expect significantly faster GDDR5 modules for R8xx launch?
2. We speculate on a 300mm2 GPU, which is highly dependent on the availability of ultra-fast GDDR5 modules, which still aren't available (I think). An additional 100mm2 would make a 512-bit bus implementation possible. Is 100mm2 really too much?
 
Well, they just did a 150% increase in ALUs with their sweet spot strategy with RV770 while staying on the same process. I expect big things from AMD this time. And like Jawed said, 20 SIMDs is a distinct possibility. I am very sure that they won't increase the number of ALUs per SIMD. That would mean twice the branch granularity, which is bad news whichever way you slice it.
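To put a rough number on the branch granularity point, here is a toy sketch. It assumes each SIMD issues one instruction across its lanes over 4 cycles (so 16 lanes give the 64-wide batch mentioned earlier in the thread), and it models every thread as taking a branch independently with the same probability, which real shaders will not do; both function names are made up.

```python
def wavefront_size(lanes_per_simd: int, issue_cycles: int = 4) -> int:
    """Threads that must branch together: lanes x cycles per instruction (assumed 4)."""
    return lanes_per_simd * issue_cycles


def divergence_probability(p_taken: float, width: int) -> float:
    """Chance that a batch of `width` threads splits at a branch.

    Toy model: each thread takes the branch independently with probability
    p_taken, so the batch stays coherent only if all take it or none do.
    """
    return 1.0 - p_taken ** width - (1.0 - p_taken) ** width


if __name__ == "__main__":
    for lanes in (16, 32):  # current per-SIMD width vs a hypothetical doubling
        width = wavefront_size(lanes)
        p = divergence_probability(0.01, width)
        print(f"{lanes} lanes -> {width}-wide batch, P(diverge) = {p:.2f}")
```

Under this model a rarely taken branch splits a wider batch noticeably more often, and a split batch has to execute both sides of the branch.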
 
1. Can we expect significantly faster GDDR5 modules for R8xx launch?
Qimonda first announced GDDR5 samples to customers back in Nov '07. 4870 released in June '08.
Hynix announced 7Gbps GDDR5 in Nov '08. RV870 is slated for Q3 '09.

Trying to download Hynix's 2008 Financial Report but it... just finished. (good timing I guess)

Edit- Report doesn't mention any specific, or implied, timelines other than the year. 54nm production to increase and mature throughout 2008/2009.
 
LordEC911: Thanks.
rpg.314: Wouldn't it be sufficient to increase ALU:TEX to e.g. 6:1? Anyway, I wouldn't be surprised even if the ratio stays unchanged. ALU:TEX on RV770 was in fact a bit higher than on RV670/R600 due to the weaker texturing units (half-speed FP16 and removal of sampling units).
 
I don't think RV870 will be "2× RV770", AMD is doing the sweet spot strategy again and I think that will be somewhere between 20 and 50 percent above RV770.
I don't think that RV770 was "a sweet spot" strategy, more like a "put everything we can in a die less than 300mm^2" strategy.
The strategy for RV870 will probably turn out to be the same but the question is -- what die size will they set as a limit this time?
I think that +50% RV770 math power is a given and I hope that it'll be closer to +100%.
 
If the CrossFire-on-a-stick strategy is still in ATi's mind for a flagship SKU, then I think some considerations will favour a sub-300 mm² design.
Speaking of that, what about the side-port thingy? It is obvious by now that the bandwidth it adds to the existing bridge interconnection is just not enough to bother about, so why not just use the extra [strike]PCIe[/strike] port for a bridge-less X2 setup in a kind of master-slave configuration, and save some pennies by ditching the "third wheel"?
 