AMD: R7xx Speculation

True, raw SP counts don't mean anything, but with all the rumors swirling that it was just 480 SPs, pulling 800 out of the hat must certainly be a nice little surprise to observers and rivals.

I don't believe ATi will adopt CUDA and PhysX, and I don't even believe what Fudzilla says about nVidia offering those technologies to ATi. A few weeks ago I spoke to an nVidia PR guy and he told me CUDA and PhysX are two things their cards will have and the competition won't, giving nVidia a clear advantage. Now there's the question - would it be better for them to risk keeping it to themselves, or should they play it safe and let the competition support it as well, so that no developers will be afraid of using it?

Well, discrete GPUs currently see about a 3 to 1 ratio of Nvidia cards to ATI cards in market share. Given the current trend of game companies going the "safe" route - that is, making console games and then porting to PC - I'd say they're likely to play it safe here too and offer the same features to all cards. A 3 to 1 ratio is a big advantage for Nvidia, but even so, that's still 25% of the market you'd alienate... and if DX10.1 is any indication, developers are wary.
 
:LOL: I went to bed last night thinking "if it's really 800:32, that's amazing."

I wake up and find it's definitely 800 and extremely likely to be 32 and I'm gobsmacked :LOL:

I'd already concluded, based upon RV635->RV670 scaling where 60% extra die-space delivers 2-3x performance, that the scalability of this architecture is good - but this is outrageous.

We're looking at ~1.1 billion transistors for RV770?

With that much ALU it seems inevitable that it'll generally be TU:RBE limited. Crysis performance on a single RV770 is going to be an eye-opener because HD3870X2 scales really badly.
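
As a rough sanity check on what 800:32 would mean next to RV670 (the RV770 numbers here are the rumoured ones, RV670's are shipping specs):

[code]
# Napkin math: rumoured RV770 (800 SPs, 32 TUs, ~1.1B transistors)
# versus shipping RV670 (320 SPs, 16 TUs, ~666M transistors).
rv670 = {"sp": 320, "tu": 16, "transistors_m": 666}
rv770 = {"sp": 800, "tu": 32, "transistors_m": 1100}   # rumoured figures

alu_scale = rv770["sp"] / rv670["sp"]                          # 2.5x the ALUs
tu_scale = rv770["tu"] / rv670["tu"]                           # 2.0x the TUs
die_scale = rv770["transistors_m"] / rv670["transistors_m"]    # ~1.65x transistors

print(f"ALU: {alu_scale:.1f}x, TU: {tu_scale:.1f}x, transistors: {die_scale:.2f}x")
print(f"SP:TU ratio: {rv670['sp'] // rv670['tu']}:1 -> {rv770['sp'] // rv770['tu']}:1")
[/code]

2.5x the ALUs and 2x the TUs out of only ~65% more transistors is why the scalability looks so good - and also why the chip looks set to be TU/RBE-limited.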

Jawed
 
Then again, you still don't know if a G8x/G9x SP is as powerful as a G200 SP or not, so that raw count is meaningless...
I guess the increased ALU:TEX ratio + prodigal MUL will deliver ~2.5x G92 performance per clock, perhaps more, when G92 is shader limited (I don't think the lowered bilinear will have a large impact).

The issue is that G92 is essentially bandwidth bound. So it would seem that GT200, with 2x more bandwidth, is going to be generally 2x faster.

This is still far ahead of RV770, which at best will be bandwidth bound at around 88% of GT200.
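
For what it's worth, the ~88% falls out of the rumoured bandwidth figures, assuming GT200 keeps a 512-bit GDDR3 bus at roughly 1100MHz and RV770 lands at the ~123GB/s we've been discussing - both are assumptions at this point:

[code]
# Where "bandwidth bound at around 88% of GT200" comes from.
# Assumed GT200: 512-bit bus, ~1100MHz (2200MT/s) GDDR3 -> ~140.8 GB/s.
# Assumed RV770: ~123GB/s GDDR5 (rumoured).  9800 GTX is the G92 reference.
gt200_bw = (512 // 8) * 2.2    # bytes per transfer * GT/s = 140.8 GB/s
rv770_bw = 123.2               # GB/s, rumoured
g92_bw = 70.4                  # GB/s, 9800 GTX (256-bit, 1100MHz GDDR3)

print(f"GT200 ~{gt200_bw:.1f} GB/s = {gt200_bw / g92_bw:.1f}x a 9800 GTX")
print(f"RV770 {rv770_bw} GB/s = {rv770_bw / gt200_bw:.1%} of GT200")
[/code]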

Jawed
 
R6xx was obviously texture- and z-fillrate bound; now that the math processing power has increased to 2.5x what it was, shouldn't the texture fillrate have increased by more than three times? I think it only increased by less than 2x, as the texture units doubled but their clockrate actually went down.
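
To put rough numbers on the "less than 2x" part - HD3870 runs 16 TUs at 775MHz, and the rumoured RV770 parts pair 32 TUs with a lower core clock (roughly 625-750MHz in the figures floating around, none of it confirmed):

[code]
# Peak bilinear texel rate = TUs * core clock.  RV770 clocks are rumours.
def texel_rate(tus, mhz):
    return tus * mhz / 1000.0          # GTexels/s

rv670 = texel_rate(16, 775)            # HD3870: 12.4 GT/s
rv770_lo = texel_rate(32, 625)         # rumoured lower-clocked part
rv770_hi = texel_rate(32, 750)         # rumoured higher-clocked part

print(f"RV670: {rv670:.1f} GT/s")
print(f"RV770: {rv770_lo:.1f}-{rv770_hi:.1f} GT/s "
      f"({rv770_lo / rv670:.2f}x-{rv770_hi / rv670:.2f}x RV670)")
[/code]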
 
I guess the increased ALU:TEX ratio + prodigal MUL will deliver ~2.5x G92 performance per clock, perhaps more, when G92 is shader limited (I don't think the lowered bilinear will have a large impact).

The issue is that G92 is essentially bandwidth bound. So it would seem that GT200, with 2x more bandwidth, is going to be generally 2x faster.

This is still far ahead of RV770, which at best will be bandwidth bound at around 88% of GT200.

Jawed

In most real-world cases, I've found that a G92-based 9800 GTX is about 10~15% faster than a G92-based 8800 GT, despite having a little over 20% more memory bandwidth, 16 extra SPs, 8 more TMUs, etc.
That performance delta is almost entirely diluted when we compare another G92-based card, the 8800 GTS 512MB, to the 9800 GTX.

So, I can still see a 55nm G92 (perhaps with a cheaper version of the 8800 GTS 512MB or 8800 GT PCB) battling at least the HD4850 without much issue.
The HD4870 is another matter. It has a significant edge in both clockspeed and bandwidth, an issue I don't see G92b addressing anytime soon (or any better than G92 did, BTW).
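
For reference, the deltas quoted above come straight from the reference specs of the two cards (shipping numbers, nothing speculative):

[code]
# Reference specs of the two G92 boards being compared.
gt_8800 = {"sp": 112, "tmu": 56, "mem_mhz": 900, "bus_bits": 256}
gtx_9800 = {"sp": 128, "tmu": 64, "mem_mhz": 1100, "bus_bits": 256}

def bandwidth(card):                   # GB/s for DDR-signalling GDDR3
    return card["bus_bits"] / 8 * card["mem_mhz"] * 2 / 1000

print(f"8800 GT: {bandwidth(gt_8800):.1f} GB/s, 9800 GTX: {bandwidth(gtx_9800):.1f} GB/s "
      f"(+{bandwidth(gtx_9800) / bandwidth(gt_8800) - 1:.0%})")
print(f"Extra SPs: {gtx_9800['sp'] - gt_8800['sp']}, extra TMUs: {gtx_9800['tmu'] - gt_8800['tmu']}")
[/code]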
 
R6xx was obviously texture- and z-fillrate bound; now that the math processing power has increased to 2.5x what it was, shouldn't the texture fillrate have increased by more than three times? I think it only increased by less than 2x, as the texture units doubled but their clockrate actually went down.
Yeah, if you assume that RV670, in DX9 games, was performing as though it had around 50GB/s, then RV770 with 32 TUs and presumably 16 RBEs with 4xZ and ~123GB/s is really looking at least as bottlenecked as RV670 - i.e. no change there.

At least HD4850 is going to be bandwidth-bound :smile:

I say "in DX9 games" because there's still every chance that with advanced D3D10(.1) code the bandwidth usage of RV670 is much higher, justifying the spec. But that code still seems to be a long way off...

Jawed
 
Yeah, if you assume that RV670, in DX9 games, was performing as though it had around 50GB/s, then RV770 with 32 TUs and presumably 16 RBEs with 4xZ and ~123GB/s is really looking at least as bottlenecked as RV670 - i.e. no change there.

At least HD4850 is going to be bandwidth-bound :smile:

I say "in DX9 games" because there's still every chance that with advanced D3D10(.1) code the bandwidth usage of RV670 is much higher, justifying the spec. But that code still seems to be a long way off...

Jawed

All of which means that slide could be fake, just fabricated by someone who saw the Computex clocks we've seen screenshots of.
 
In most real-world cases, I've found that a G92-based 9800 GTX is about 10~15% faster than a G92-based 8800 GT, despite having a little over 20% more memory bandwidth, 16 extra SPs, 8 more TMUs, etc.
That performance delta is almost entirely diluted when we compare another G92-based card, the 8800 GTS 512MB, to the 9800 GTX.
So what you're saying is that the 9800 GTX is running out of scalability? 20% more bandwidth and 29% more shader throughput result in a 10-15% performance gain.

So, I can still see a 55nm G92 (perhaps with a cheaper version of the 8800 GTS 512MB or 8800 GT PCB) battling at least the HD4850 without much issue.
The HD4870 is another matter. It has a significant edge in both clockspeed and bandwidth, an issue I don't see G92b addressing anytime soon (or any better than G92 did, BTW).
If the 9900GTX/G92b is equipped with up to 1300MHz GDDR3 (<83.2GB/s), it should easily see off HD4850, with its <64GB/s.

HD4870 is clocked only 20% higher than HD4850, so it's really a matter of how bandwidth-bound HD4850 is. If HD4870 comes out 30% faster, then that would make HD4870 and the 9900GTX/G92b basically equal - except that the scalability of G92, based on the 9800 GTX numbers, is a bit poor.

EDIT: These numbers:

http://www.techreport.com/articles.x/14524/5

imply that G92 is fillrate bound (9800 GTX is 12.5% faster than 8800 GT) if we take your 10-15% at face value. But in that review, for example, Call of Duty is 24% faster at 1920 with 16xAF/4xAA.
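
For clarity, here's where those percentages come from, using the reference clocks (the 12.5% is simply the core-clock/pixel-fillrate delta between the two cards, since both have 16 ROPs):

[code]
# 8800 GT: 600MHz core / 1500MHz shader / 900MHz GDDR3, 112 SPs, 16 ROPs.
# 9800 GTX: 675MHz core / 1688MHz shader / 1100MHz GDDR3, 128 SPs, 16 ROPs.
gt = {"core": 600, "shader": 1500, "mem": 900, "sp": 112, "rop": 16}
gtx = {"core": 675, "shader": 1688, "mem": 1100, "sp": 128, "rop": 16}

shader_gain = (gtx["sp"] * gtx["shader"]) / (gt["sp"] * gt["shader"]) - 1    # ~29%
bw_gain = gtx["mem"] / gt["mem"] - 1                                         # ~22%
fill_gain = (gtx["rop"] * gtx["core"]) / (gt["rop"] * gt["core"]) - 1        # 12.5%

print(f"Shader: +{shader_gain:.0%}, bandwidth: +{bw_gain:.0%}, pixel fill: +{fill_gain:.1%}")
[/code]

A 10-15% real-world gain tracks the pixel-fillrate/core-clock delta far more closely than the shader or bandwidth deltas, hence the "fillrate bound" reading.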

Jawed
 
Er...

[Image: amdhd4000finaldocumentch5.jpg]
 
Yeah, maybe if I had to die next week anyway, but then I wouldn't be spending my time in this forum. I can't tell you how I know, but I can assure you that this one is 100% a fake. Just trust me with this one.
 