AMD: R7xx Speculation

Status
Not open for further replies.
But a 16 TMU/480 SP part might actually explain it.. such part would just see a small boost based on clock and more shader strength, maybe fixed ROPS, maybe 10-30%, maybe then two on RV770 get you to 50%.
If RV770 has 4x Z per clock with 16 TUs and 400 SPs I expect the increase in Z performance alone will make a massive difference. That's because 4x MSAA is the default benchtest these days (as it should be) and Z also affects shadow rendering and Z prepass performance.

So, RV770 will mostly catch-up with G92 and GT200 will be ~2x G92 performance and AMD will be trying to use 2xRV770 to match GT200.

Edit: The above speculation would also fit in with the rumored small die sizes of Rv770,
I think that's the killer-punch - there's no magic required to explain the rumoured small increase in die size. Increasing the Z rate is prolly quite a substantial task.

as well as the idea that AMD is abandoning single high-end chips in favor of multi-GPU configurations at the high end (perhaps AMD figures a small, cleaned-up, shader-beefed-up, faster-clocked, 16 TMU chip is just fine since its principal high-end use would be as a building block for multi-GPU anyway)... which of course would be a horrible road considering the drawbacks those suffer... but I wouldn't be surprised because it's AMD...
I also think it's a horrible road and the current state of ATI drivers for D3D10:

http://enthusiast.hardocp.com/article.html?art=MTQ4MCw1LCxoZW50aHVzaWFzdA

is shocking - though it has to be said even a single RV670 is an abortion in D3D10:

http://enthusiast.hardocp.com/article.html?art=MTQ4MCw2LCxoZW50aHVzaWFzdA

Jawed
 
IMHO this does not fit well with the rumored transistor count and die size of RV770 (830 million, 230-240 mm^2); in this case we would have more than 150 million extra transistors just for obtaining 4 Z samples per clock in the ROPs and 16 more 5-way ALUs.
It's just a guesstimate, but I reckon 48M transistors for an extra 16-wide SIMD.
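
For what it's worth, the budget argument can be put into a quick back-of-the-envelope sketch. The 830M figure is the rumoured RV770 count quoted above and 48M is the guesstimate for one extra 16-wide SIMD; the 666M baseline for RV670 is the commonly quoted figure and is an assumption here, not something stated in this thread.

```python
# Back-of-the-envelope transistor budget (all figures in millions).
RUMOURED_RV770 = 830   # rumoured count quoted in this thread
ASSUMED_RV670 = 666    # assumption: commonly quoted RV670 count
GUESS_PER_SIMD = 48    # guesstimate from this thread for one 16-wide SIMD

def remaining_budget(total, baseline, simd_cost, extra_simds=1):
    """Transistors left over after paying for the extra SIMDs."""
    return total - baseline - simd_cost * extra_simds

extra = RUMOURED_RV770 - ASSUMED_RV670
left = remaining_budget(RUMOURED_RV770, ASSUMED_RV670, GUESS_PER_SIMD)
print(f"extra over baseline: {extra}M, left after one SIMD: {left}M")
```

On those numbers roughly 116M would be left over for the Z-rate work and anything else.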

Extra Z rate is costly because it affects a lot of hardware:
  • rasteriser generating Zs
  • hierarchical-Z updates generated by rasteriser + responding to Z queries by rasteriser and RBE
  • Z buffer cache
  • Z compression encoding/decoding
  • compression tag tables
  • bandwidth twixt shaders and RBEs (may not need increasing)
OK, there could be other architectural improvements (i.e. more cache, and so on), but IMO it makes more sense from a performance perspective to have more texture power along with more shader power. 32 TUs seems a bit too high, however, so IMHO we could see a 4-SIMD GPU with 20 5-way SPs per SIMD, and 20 TUs. This IMO makes a bit more sense, because we would have a slight dynamic branching penalty compared to RV670, but a decided improvement in all other scenarios.
What do you think about this?
I'm basing my current position on the expected bandwidth of RV770, ~70GB/s. I think a doubling of Z rate is most likely to fit within the rumoured die-size and that an increase in TU count wouldn't be able to make use of that bandwidth properly without also increasing the ALU count by at least as much.

32 TUs just seems like magic. 24 seemed feasible for a while, requiring 4 ALU SIMDs each 24-wide (I reckon about 96M extra transistors for just the ALUs). That would come to ~950M transistors I reckon.

Refresh rumours always seem to suffer from wild inflation. 400/16/doubled-Z with extra stuff to make X2 work better sounds sensible to me. Its performance won't be very exciting and the X2's drivers will take months and months to become shipshape.

Jawed
 
I came across an interesting post on XS:

' Based on discussions with people in Taiwan, RV770 seems to employ 5:1 ratio and 160 shaders. Not that this is a fact, but it does look a lot more believable than anything else I've heard. '

//Andreas
 
If RV770 has 4x Z per clock with 16 TUs and 400 SPs I expect the increase in Z performance alone will make a massive difference. That's because 4x MSAA is the default benchtest these days (as it should be) and Z also affects shadow rendering and Z prepass performance.

So, RV770 will mostly catch-up with G92 and GT200 will be ~2x G92 performance and AMD will be trying to use 2xRV770 to match GT200.


I think that's the killer-punch - there's no magic required to explain the rumoured small increase in die size. Increasing the Z rate is prolly quite a substantial task.

If that's the case it's just terrible: AMD will have had the same number of TMUs for three generations now and be 500% behind their enemy. It's past ridiculous now and has entered clinical insanity.

As far as I'm concerned RV670 is entirely TMU limited, so simply increasing to 32 doubles your performance (especially given the increase in shaders to 480 as well, which apparently don't take much die). I can't imagine 16 more TMUs take THAT much more die, especially compared to how much larger GT200 will likely be.

And competing with a dual chip is foolhardy; multi-GPU is still pretty much a disaster (and getting worse, according to Carmack's recent comments about how games are programmed in ways that make SLI more and more difficult going forward). And what's to stop Nvidia from simply slapping two of their vastly superior single chips on a die then anyway? Maybe heat and power at first, but not forever. It's the same as why Intel defeats AMD: if the number of cores is the same, it comes down to which one is much faster per core.

But if things are as we speculate, one GT200 will crush two RV770 (given again the horrible scaling of even two multi-GPU) anyway so Nvidia wouldn't even need to bother. Banking on multi-GPU will be a horribly inefficient path (as far as power/performance per transistor/heat/expense and every other relevant metric) in every way, and just cause AMD to be even worse off than they could have been.
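
To put rough numbers on that, here is a minimal sketch using the relative figures speculated earlier in the thread (RV770 roughly equal to G92, GT200 roughly 2x G92). The scaling efficiencies are purely hypothetical values picked for illustration, not measurements.

```python
# Relative performance, normalised to G92 = 1.0 (thread speculation, not data).
RV770 = 1.0
GT200 = 2.0

def dual_gpu_perf(single, scaling):
    """Performance of a two-GPU card where the second GPU adds `scaling` of one GPU."""
    return single * (1.0 + scaling)

for scaling in (0.5, 0.7, 1.0):  # hypothetical scaling efficiencies
    x2 = dual_gpu_perf(RV770, scaling)
    print(f"scaling {scaling:.0%}: 2xRV770 = {x2:.2f}, GT200 = {GT200:.2f}")
```

Under those assumptions the X2 only draws level with GT200 at perfect 100% scaling, and falls short at anything less.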
 
I came across an interesting post on XS:

' Based on discussions with people in Taiwan, RV770 seems to employ 5:1 ratio and 160 shaders. Not that this is a fact, but it does look a lot more believable than anything else I've heard. '

//Andreas

So...32 TU? 800 "sp?"

Sounds like kind of a monster, but if they can manage that, it seems they won't need any help against G200.

Those seem to be one of the several commonly quoted sets of specs. We'll just have to wait I guess.
 
So...32 TU? 800 "sp?"

Sounds like kind of a monster, but if they can manage that, it seems they won't need any help against G200.

Those seem to be one of the several commonly quoted sets of specs. We'll just have to wait I guess.

What's half of 32 TUs and 800 SPs (160)? You get 16 TUs and 400 SPs (80). Could it just be that this 32 TU/800 SP monster is R700?
 
If RV770 has 80 SPs (400 SPs) and 16 TMUs, this card will be a big failure and won't be able to compete with NV's GT200-based GPUs in the performance segment, a.k.a. D10-30 (9900GT) etc.
 
Who knows how fast these cards can go, once the AA-thingie gets fixed? In some titles (without AA), they are already quite competitive.

Plus, AMD wouldn't want to compete with GT200 using a single GPU. And plus, if these specs are true, they'll be pretty small and thus cheap to manufacture.
 
So...32 TU? 800 "sp?"

Sounds like kind of a monster, but if they can manage that, it seems they won't need any help against G200.

Those seem to be one of the several commonly quoted sets of specs. We'll just have to wait I guess.

If RV770 is supposedly 32TMUs/800SPs then R7x0 would be 64TMUs/1600SPs right?

RV770 is rumoured to have an estimated die size of ~240-250mm^2 @55nm; don't you think it'll be fairly impossible to squeeze as many units/transistors into only that much die space?

The 5 clusters * 16 ALUs = 80 ALUs scenario seems to make the most sense of all the suggestions I've heard so far.

I'm wondering if they ever thought of using fast-trilinear TMUs instead of plain bilinear. Granted they should be a tad more expensive than bi-TMUs, but at least they'd get single cycle trilinear which would speed up AF performance quite a bit IMHLO.
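
The various rumoured configurations in this thread boil down to simple multiplication, sketched below. It assumes each ALU is 5-way as in RV670, so "SPs" = ALUs x 5; the layout labels are just names for the rumours discussed above.

```python
# SP counts for the layouts floated in this thread, assuming 5-way ALUs.
LANES_PER_ALU = 5

def counts(simds, alus_per_simd):
    """Return (ALUs, SPs) for a given SIMD layout."""
    alus = simds * alus_per_simd
    return alus, alus * LANES_PER_ALU

layouts = {
    "RV670 baseline (4 x 16)": (4, 16),
    "5 clusters x 16": (5, 16),
    "4 SIMDs x 20": (4, 20),
    "4 SIMDs x 24": (4, 24),
}
for name, (simds, width) in layouts.items():
    alus, sps = counts(simds, width)
    print(f"{name}: {alus} ALUs, {sps} SPs")
```

Both the 5 x 16 and 4 x 20 layouts land on 80 ALUs / 400 SPs, while the 800 SP rumour would need 160 ALUs.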
 
Who knows how fast these cards can go, once the AA-thingie gets fixed? In some titles (without AA), they are already quite competitive.
I agree and since most new games have shadowing I expect to see shadowing get a big boost from enhanced Z fillrate. So there'll be a double-whammy there.

Jawed
 
Another 16-wide SIMD, a 50 MHz faster core clock(?), more Z cache, and double the Z fill and AA fill seem to point perfectly to Fudo's 50% faster figure, and a not-so-big die size increase.

EDIT: Should beat a 8800GT out and be around the GTS.
 
Depends where the limiting factors in each application lie. If you count 80 ALUs at just 850 MHz you already get over 700 GFLOPS. At 900 MHz it's already close to twice the rate of a 3870.
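
As a rough check on those numbers, peak ALU throughput can be estimated as ALUs x 5 lanes x 2 FLOPs (one multiply-add) x clock. The 5-way MAD assumption and the HD 3870 figures (64 ALUs at 775 MHz) are assumptions brought in for this sketch, and the result is theoretical peak only.

```python
def peak_gflops(alus, clock_mhz, lanes=5, flops_per_lane=2):
    """Theoretical peak GFLOPS: ALUs x lanes x FLOPs per clock x clock (GHz)."""
    return alus * lanes * flops_per_lane * clock_mhz / 1000.0

hd3870 = peak_gflops(64, 775)  # assumed HD 3870 configuration
for clock in (850, 900):
    rate = peak_gflops(80, clock)
    print(f"80 ALUs @ {clock} MHz: {rate:.0f} GFLOPS ({rate / hd3870:.2f}x HD 3870)")
```

Under these assumptions 850 MHz comes out at 680 GFLOPS and 900 MHz at 720 GFLOPS, roughly 1.4x to 1.5x the 3870, so somewhat below the figures quoted, which may be counting differently.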
 
similar to 2900XT

http://forums.vr-zone.com/showthread.php?t=261753

[Two attached images]
 
It's just a guesstimate, but I reckon 48M transistors for an extra 16-wide SIMD.

Are you talking per core? 'Cause a 16-wide FP32 SIMD unit shouldn't take more than a million transistors no matter how you are going to do it. The RF will take roughly 4k transistors per vector register file.
 
Well, that's how big the die is going to be. I wonder how much bigger that is over RV670?

Judging by the die area on the base and the little heatsink in there, this doesn't look very exciting. Pretty small. RV770 is undoubtedly designed for an X2 card. The cooler looks similar to what came with my 3850.


card1.jpg

cooler1.jpg
 
Are you talking per core? 'Cause a 16-wide FP32 SIMD unit shouldn't take more than a million transistors no matter how you are going to do it. The RF will take roughly 4k transistors per vector register file.

He is talking about a 5th SIMD array... I think, I can never be so sure with you guys. :eek:
 
Judging by the die area on the base and the little heatsink in there, this doesn't look very exciting. Pretty small. RV770 is undoubtedly designed for an X2 card. The cooler reminds me of what came with my 3850.

It looks about as big as a 25-cent piece, maybe a little smaller. Either way, it's a little hard to judge just by looking at it. If I'm right, that's a pretty good increase over RV670. FYI, the reference coolers for the HD 3850 and the HD 3870 don't look like they can hold a candle to this one. Well, the HD 3850 is supposed to be single-slot, so that is not really a fair comparison. The one on RV770 looks to be closely related to the one found on R600, which cools substantially better than the one on the HD 3870 (although with a good deal of noise, lol). So it's kind of interesting that they are bringing it back for RV770. It may say something about performance.
 