AMD: R7xx Speculation

Status
Not open for further replies.
While R580 got a fair boost from adding ALUs, they were also math-bound at the time. I'm not sure I'd go as far as calling RV670 limited by shader power.

They actually got a terribly weak boost. I remember being all excited that R580 was going to have 3x the shader pipes, and arguing with guys on [H] that it was really going to have 48 shader pipes (this made it sound really fast because the 7800GTX had 24 pipes). Then it came out and it was like 10% faster than R520.
 
God knows where you're getting 10% from. Day zero reviews showed huge gains depending on the game and other bottlenecks. If you saw 10%, it wasn't representative of the overall picture. Benchmark them now on modern games and R520 would look very slow indeed comparatively speaking.
 
Another thing that has been bothering me is this "terascale" term that people seem to be basing the "must be 1 teraflop" performance rumours on.

The term is tera SCALE. If we take crossfire into the equation then the terascale makes sense even if the RV770 itself can't hit 1 teraflop (just add more cards - hence the scale part).

Also, if we assume RV670 is heavily TMU-bound, the way R520 was ALU-bound, then it makes sense that having only 160 more "stream processors" than the previous generation could increase performance by 50+ percent, provided the primary bottleneck was the TMUs and they have been doubled (assuming that is true, of course), simply because the 320 SPs in RV670 were being held back by a lack of TMU power.

Something like this perhaps...

Currently we assume it's the following

RV670: 16 TMUs / 320 SPs

TMU ratio: [____]
ALU ratio: {________________}

RV770: 32 TMUs / 800 SPs

TMU ratio: [________]
ALU ratio: {________________________________________}

This fits in with ATi's "we must increase ALU to TMU ratio at all costs" mantra.

It does not fit with the performance figures, which suggest a minimum 50% gain.
If RV670 was indeed TMU-bound by such a margin that adding an extra 16 TMUs would have brought it to parity with the shader array* (excluding other factors), then RV770 must be at least 100% faster.

*I know I'm not taking ROP-limited, Z-fillrate-limited or bandwidth-limited scenarios into account here.
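To make the "at least 100%" step concrete, here is a toy bottleneck model. It is an assumption for illustration, not how any real GPU schedules work: at equal clocks a frame takes as long as its slowest unit type, so time = max(texture work / TMUs, ALU work / SPs). The workload numbers are made up, and ROP, Z-fill and bandwidth limits are ignored, as in the footnote.

```python
# Toy bottleneck model (hypothetical): at equal clocks, a frame takes as long as
# its slowest unit type. ROP, Z-fill and bandwidth limits are ignored.

def frame_time(tex_work, alu_work, tmus, sps):
    """Clocks per frame if texture and ALU work overlap perfectly."""
    return max(tex_work / tmus, alu_work / sps)


def speedup(tex_work, alu_work, old, new):
    """Ratio of old to new frame time; old/new are (TMUs, SPs) tuples."""
    return frame_time(tex_work, alu_work, *old) / frame_time(tex_work, alu_work, *new)


RV670 = (16, 320)         # TMUs, SPs
RV770_RUMOUR = (32, 800)  # rumoured configuration from the post

# Made-up workloads: one where RV670 is texture-limited, one where it is ALU-limited
tex_limited = speedup(1600, 16000, RV670, RV770_RUMOUR)
alu_limited = speedup(400, 32000, RV670, RV770_RUMOUR)
print(tex_limited, alu_limited)  # 2.0 2.5
```

Under this model any mix of texture and ALU work speeds up by between 2.0x (the TMU scale) and 2.5x (the ALU scale) on a 32/800 part, which is where the "at least 100%" figure comes from.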

So let's take the other rumours into consideration.

If it is only 480 SPs at equal clock speeds, how can we achieve 50% or greater performance?

Perhaps:

RV670:
TMU ratio: [____]
ALU ratio: {________________}

RV770: 24 TMUs / 480 SPs
TMU ratio: [______]
ALU ratio: {________________________}


*There is 1 underscore per quad of TMUs; the ALU bars are based on a 4:1 ratio.
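The bar convention can be reproduced from the unit counts. This sketch assumes our reading of "4:1": RV670's ALU bar is four times as long as its TMU bar, so each ALU underscore stands for 20 SPs.

```python
# Rebuild the ASCII bars: one underscore per TMU quad, and (our reading of "4:1")
# ALU bars scaled so RV670's 320 SPs span 4x its TMU bar, i.e. 20 SPs per underscore.
SPS_PER_MARK = 320 // (4 * 4)  # 16 ALU underscores for RV670 -> 20 SPs each


def bars(tmus, sps):
    tmu_bar = "[" + "_" * (tmus // 4) + "]"
    alu_bar = "{" + "_" * (sps // SPS_PER_MARK) + "}"
    return tmu_bar, alu_bar


for name, tmus, sps in [("RV670", 16, 320),
                        ("RV770 32/800", 32, 800),
                        ("RV770 24/480", 24, 480)]:
    tmu_bar, alu_bar = bars(tmus, sps)
    print(f"{name:13} TMU {tmu_bar}")
    print(f"{'':13} ALU {alu_bar}")
```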

So if it's 24 TMUs, and RV670 was TMU-limited to such a degree that giving it 50% more TMU power produces a linear increase in overall performance, maybe RV770 isn't 800 SPs.
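A quick unit-by-unit comparison of the two rumoured layouts against RV670 shows why 24/480 fits a flat ~50% figure better. The configurations are the rumours from the post, not confirmed specs.

```python
# Compare each rumoured RV770 configuration with RV670, unit by unit.
RV670 = (16, 320)  # TMUs, SPs


def compare(tmus, sps, base=RV670):
    """Return (TMU scale, ALU scale, SPs per TMU) relative to the base chip."""
    return tmus / base[0], sps / base[1], sps / tmus


for label, cfg in {"32 TMU / 800 SP": (32, 800), "24 TMU / 480 SP": (24, 480)}.items():
    tmu_x, alu_x, ratio = compare(*cfg)
    print(f"{label}: TMUs x{tmu_x}, ALUs x{alu_x}, {ratio:.0f} SPs per TMU")
```

The 24/480 layout keeps RV670's 20 SPs per TMU and scales both unit types by 1.5x, so a ~50% gain falls straight out of linear scaling. The 32/800 layout moves to 25 SPs per TMU, in line with the "raise the ALU to TMU ratio" approach.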

Please note that I am insane.

thanks.
 
God knows where you're getting 10% from. Day zero reviews showed huge gains depending on the game and other bottlenecks. If you saw 10%, it wasn't representative of the overall picture. Benchmark them now on modern games and R520 would look very slow indeed comparatively speaking.

It increased somewhat over time, I'm sure as games became more shader intensive, but it simply wasn't much faster.

When you heard "48" pipes back then, you assumed twice as fast, because prior to that chip (and always with Nvidia chips) the rest of the GPU had always scaled accordingly, so doubling the shaders gave you double the speed.

With R580 you got nothing of the sort.
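The "double the shaders, double the speed" expectation only holds if every other unit scales too. R580 tripled the pixel shader ALUs (16 to 48) while keeping the same TMU and ROP counts, so only the shader-limited part of each frame got faster. A sketch using Amdahl's law, with the ALU-bound fraction as a made-up parameter:

```python
def amdahl_speedup(alu_fraction, alu_scale):
    """Amdahl's law: only the ALU-limited share of frame time gets faster.

    alu_fraction: hypothetical share of R520 frame time limited by shader math (0..1)
    alu_scale:    how much faster that share runs (3.0 for 16 -> 48 pixel shader ALUs)
    """
    return 1.0 / ((1.0 - alu_fraction) + alu_fraction / alu_scale)


for f in (0.1, 0.3, 0.5, 1.0):
    print(f"{f:.0%} of frame time ALU-bound -> {amdahl_speedup(f, 3.0):.2f}x")
```

Even with half the frame ALU-limited the model gives only 1.5x, and a game that is mostly texture- or CPU-limited sees close to nothing, which fits the wide spread of results posted in this thread.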

http://techreport.com/articles.x/9310/1 Here is a review. It was the first one I found on Google, so I didn't cherry-pick it.
 
If you make assumptions about performance based on a single number, you almost deserve to be disappointed. R580 showed that it was possible to scale a chip out differently in terms of performance while sticking to the same basic architecture, and thank goodness for that. Again, 10% wasn't representative of the overall picture on launch day. Go check out all the old reviews, it's plain as day.
 
Another thing that has been bothering me is this "terascale" term that people seem to be basing the "must be 1 teraflop" performance rumours on.

The term is tera SCALE. If we take crossfire into the equation then the terascale makes sense even if the RV770 itself can't hit 1 teraflop (just add more cards - hence the scale part).

Because the 3870x2 is already more than 1 teraflop?
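The point can be checked with the usual peak-throughput arithmetic: SPs x 2 FLOPs per clock (one multiply-add) x clock x number of GPUs. The clocks below are assumptions from the retail specs as we recall them, and real shaders rarely reach peak.

```python
def peak_gflops(sps, clock_mhz, gpus=1, flops_per_sp_per_clock=2):
    """Theoretical peak: each SP retires one multiply-add (2 FLOPs) per clock."""
    return gpus * sps * flops_per_sp_per_clock * clock_mhz / 1000.0


x2 = peak_gflops(320, 825, gpus=2)   # HD 3870 X2, assuming an 825 MHz core clock
single = peak_gflops(320, 775)       # one HD 3870, assuming a 775 MHz core clock
print(x2, single, x2 >= 1000)
```

Two RV670s already clear 1000 GFLOPS on paper, so "terascale" alone says little about a single RV770.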
 
http://techreport.com/articles.x/9310/1 Here is a review. It was the first one I found on Google, so I didn't cherry-pick it. According to this review, 10% might have been kind.

Well quickly checking the Half-Life 2 results (I chose HL2 because it's my favorite game, hopefully not cherry picking :D) from that review shows the X1900XT 30% faster than the X1800XT at the highest settings.
 

TMU issues seem to occur most when AF is on, but the basis of your argument is that increasing TMUs means performance will scale accordingly.

If anything, there seems to be a point of diminishing returns: G80 doesn't have the same fill rate as G92, which has 64 TMUs, and yet G80 certainly performs as well as, if not better than, G92 (at higher settings).
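Peak texel rate on paper is just texture units times clock, which is why it makes a poor predictor once texturing stops being the bottleneck. The unit counts and clocks below are approximate retail figures from memory, used only for illustration.

```python
def texel_rate(tmus, clock_mhz):
    """Peak bilinear texel rate in GTexels/s: texture units x core clock."""
    return tmus * clock_mhz / 1000.0


# Unit counts and clocks are approximate retail figures, for illustration only
g80 = texel_rate(32, 575)   # 8800 GTX: 32 texture address units (64 filter units)
g92 = texel_rate(64, 650)   # 8800 GTS 512: 64 texture units
print(f"G80 {g80:.1f} GT/s, G92 {g92:.1f} GT/s, ratio {g92 / g80:.2f}x")
```

On paper G92 has more than twice G80's texel rate, the kind of gap that, per the post above, does not show up in high-setting benchmarks.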

In fact, GTX280 having 80 TMUs (a small increase from G92's 64) despite having nearly 2x the shaders suggests that the new trend is to increase shader count by a large amount rather than increasing TMU count in the same ratio (G80-style ROPs are tied to the memory bus anyway, so the 32 ROPs were a necessary consequence of the 512-bit memory bus).
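The ROP count follows from how G80-family chips are partitioned. This sketch assumes 64-bit memory partitions with 4 ROPs each, which matches G80's 24 ROPs on a 384-bit bus.

```python
def rop_count(bus_width_bits, bits_per_partition=64, rops_per_partition=4):
    """G80-style layout: each 64-bit memory partition carries its own block of 4 ROPs."""
    if bus_width_bits % bits_per_partition:
        raise ValueError("bus width must be a whole number of partitions")
    return (bus_width_bits // bits_per_partition) * rops_per_partition


for chip, bus in [("G92", 256), ("G80", 384), ("GT200", 512)]:
    print(f"{chip}: {bus}-bit bus -> {rop_count(bus)} ROPs")
```

Widening the bus to 512 bits adds two partitions over G80, and the ROPs come along with them whether or not the design needed more.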

As for RV770 performing 50% better: I've heard everything from 50% better to twice as fast as RV670, but I'd bet all of that depends on settings. A 50% *average* increase across a wide range of games is likely, much like many are saying that the GTX280 might be at most a 50% *average* increase over the 9800GX2.

But I'm sure there are certain games (such as shader-intensive ones) and settings (such as high AF) where RV770 can pull far ahead and consistently double performance, much like GTX280 can (such as in games where SLI doesn't scale well, or at higher resolutions and settings).
 
They actually got a terribly weak boost. I remember being all excited that R580 was going to have 3x the shader pipes, and arguing with guys on [H] that it was really going to have 48 shader pipes (this made it sound really fast because the 7800GTX had 24 pipes). Then it came out and it was like 10% faster than R520.
Give R580 something meaty to do:

http://www.computerbase.de/artikel/...geforce_8800_gtx/16/#abschnitt_call_of_juarez

Also, read the 7900GT off those charts; it's about the same performance as the 7800GTX, if I remember right (the 7800GTX does appear on some pages).

Jawed
 
http://techreport.com/articles.x/9310/1 Here is a review. It was the first one I found on Google, so I didn't cherry-pick it.
1280*1024 AA/AF ~ 27%
1600*1200 AA/AF ~ 21%

(computerbase)

1600*1200 HDR ~ 30%
1920*1200 HDR ~ 33%

(behardware)

It's irrelevant that R580 performed only 5-10% faster in undemanding situations, where even R520 was fast enough. R580 removed bottlenecks for the worst-case scenarios, and 30% on average is a very nice result for those cases. As I remember, AoE3 was about twice as fast on R580.
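Speedups are usually averaged as ratios rather than raw percentages; a geometric mean keeps one outlier setting from dominating. A sketch over the four figures quoted in this post:

```python
import math

# R580 vs R520 gains quoted above (computerbase AA/AF, behardware HDR), in percent
gains_pct = [27, 21, 30, 33]

arithmetic = sum(gains_pct) / len(gains_pct)
ratios = [1 + g / 100 for g in gains_pct]
geometric = (math.prod(ratios) ** (1 / len(ratios)) - 1) * 100
print(f"arithmetic mean {arithmetic:.1f}%, geometric mean {geometric:.1f}%")
```

Both means come out a little under 30% for these demanding settings.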
 
If you make assumptions about performance based on a single number, you almost deserve to be disappointed. R580 showed that it was possible to scale a chip out differently in terms of performance while sticking to the same basic architecture, and thank goodness for that. Again, 10% wasn't representative of the overall picture on launch day. Go check out all the old reviews, it's plain as day.

Well, Rangers could also have pointed you to Dave B's excellent review here at B3D: http://www.beyond3d.com/content/reviews/2/13
The URL points to the first game benchmark, Far Cry, showing pretty much no advantage for the X1900 product, and the remaining pages show that Rangers' 10% estimate is actually quite generous. (And Dave Baumann used the most demanding games of the time. Then, as now, the lion's share of games were not bleeding edge in terms of graphics demands.)

So - did Dave do a crap review, or are you engaging in some revisionism, for whatever reason?
 
The URL points to the first game benchmark, Far Cry, showing pretty much no advantage for the X1900 product, and the remaining pages show that Rangers' 10% estimate is actually quite generous. (And Dave Baumann used the most demanding games of the time. Then, as now, the lion's share of games were not bleeding edge in terms of graphics demands.)

http://www.beyond3d.com/content/reviews/2/14 (look at the last graph with HDR and AA/AF)

Yeah a 20% gain is "pretty much no advantage". Do people even look at their own links?
 
http://www.beyond3d.com/content/reviews/2/14 (look at the last graph with HDR and AA/AF)

Yeah a 20% gain is "pretty much no advantage". Do people even look at their own links?

One has to wonder, as the link he provided has Dave B. mentioning a 33% lead for the X1900 XTX over the X1800 XT in Far Cry HDR. Up to a 36% advantage in SC:CT. In F.E.A.R., up to a 28% increase in performance. It drops to "only" about a 15% advantage in Doom 3... and about here I stopped reading...

Entropy said:
The URL points to the first game benchmark, Far Cry, showing pretty much no advantage for the X1900 product, and the remaining pages show that Rangers' 10% estimate is actually quite generous.

Erm... I'm guessing someone desperate to prove a point didn't bother reading the entire article. ;) I think I'll go with Rhys on this one. :)

Rhys said:
Again, 10% wasn't representative of the overall picture on launch day. Go check out all the old reviews, it's plain as day.

Regards,
SB
 
Who's Rhys?

:p
 
Well, Rangers could also have pointed you to Dave B's excellent review here at B3D: http://www.beyond3d.com/content/reviews/2/13
The URL points to the first game benchmark, Far Cry, showing pretty much no advantage for the X1900 product
From that page:
Rhys--er said:
Even with 4x FSAA and 8x AF enabled all the X1000 configurations are CPU limited in all the resolutions tested
The next three pages--Far Cry HDR, Splinter Cell, and FEAR--all show a 30% improvement at the higher resolutions from R520XT to R580XTX, even with AA+AF. Far from underwhelming for a refresh part that was released so quickly after the original, even considering this isn't quite clock-for-clock (as with that BeHardware page I linked).
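Since the comparison isn't quite clock-for-clock, the core clock difference can be divided out. The clocks here (625 MHz for X1800 XT, 650 MHz for X1900 XTX) are assumptions from memory of the retail specs, and memory clock differences are ignored.

```python
def per_clock_speedup(observed_speedup, old_clock_mhz, new_clock_mhz):
    """Divide out the part of the gain explained by the core clock bump alone."""
    return observed_speedup / (new_clock_mhz / old_clock_mhz)


# Assumed core clocks: X1800 XT (R520) at 625 MHz, X1900 XTX (R580) at 650 MHz
print(round(per_clock_speedup(1.30, 625, 650), 3))
```

A 30% measured gain becomes roughly 25% per clock under these assumptions.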

As for that review being excellent, well, let's not get swept away in a wave of revisionist fantasy. =D

Point taken, though--tripling the math power doesn't translate into a similar performance boost, and math doesn't seem to be R600's easiest target for improvement. Considered as an increase in die size, though, you got your transistors' worth.
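Working backwards under an Amdahl's-law assumption (every non-ALU part of the frame runs at the same speed on both chips), a ~30% gain from tripled math implies that only about a third of frame time was ALU-limited. The 1.30 input is the rough figure from the reviews discussed above.

```python
def implied_alu_fraction(observed_speedup, alu_scale):
    """Invert Amdahl's law S = 1 / ((1 - f) + f / k) to solve for f."""
    return (1 - 1 / observed_speedup) / (1 - 1 / alu_scale)


f = implied_alu_fraction(1.30, 3.0)   # ~30% gain from 3x the shader ALUs
print(f"{f:.0%} of frame time would have to be ALU-limited")
```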
 