G94 vs RV670 - Which one is more future-proof?

AnarchX · Feb 22, 2008

Since yesterday both GPUs have been facing each other. Benchmarks show a 20% advantage for the 9600GT over the 3850 and a tie with the 3870.

But if you look at the specs of these cards, you'll see that they follow very different philosophies:

BW:
- 3850 ~ 9600GT
- 3870 ~30% advantage
-> nearly equal

Z-fill:
- 38x0: ~22GZix/s
- 9600GT: ~80GZix/s
-> over 3 times advantage for G94

tex-fill:
- 38x0: ~12GTex/s
- 9600GT: ~20GTex/s
-> nearly two times advantage for G94

arithmetic power:
- 38x0: 400-500GFLOPs MADD (Vec5)
- 9600GT: ~200GFLOPs MADD (scalar)
-> over 2 times advantage for RV670

geometry power:
- benchmarks also show a big advantage for RV670, since it has suitable Vec5 ALUs and setup is 1 tri/clock

To summarize, G94 has big advantages in fill-rate-limited situations such as Z or texture fill, while RV670 is ahead in shader-limited situations where its Vec5 ALUs can be kept fed, and where geometry is an important factor.

In which direction do you think upcoming games will develop over the next 1-1.5 years, and which of these competitors could be the winner?
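
As a quick sanity check on the ratios listed above, here is a minimal sketch that recomputes them. The Z-fill and tex-fill numbers are the rough figures from this post. The ALU counts and clocks used for the MADD peaks (320 ALUs at 670/775 MHz for the 3850/3870, 64 ALUs at 1625 MHz for the 9600GT) are commonly quoted reference specs and are an assumption here, not something stated in the thread.

```python
# Rough spec comparison using the approximate figures quoted in the post.
# ALU counts and clocks are assumed reference specs, not taken from the thread.

def madd_gflops(alus: int, clock_mhz: float) -> float:
    """Peak MADD rate: one multiply-add (2 flops) per ALU per clock."""
    return alus * 2 * clock_mhz / 1000.0

def ratio(a: float, b: float) -> float:
    """Advantage of a over b, rounded to two decimals."""
    return round(a / b, 2)

specs = {
    # name: (Z-fill in GZix/s, tex-fill in GTex/s, MADD GFLOPs)
    "HD3850": (22.0, 12.0, madd_gflops(320, 670)),
    "HD3870": (22.0, 12.0, madd_gflops(320, 775)),
    "9600GT": (80.0, 20.0, madd_gflops(64, 1625)),
}

g94_z, g94_tex, g94_flops = specs["9600GT"]
for name in ("HD3850", "HD3870"):
    z, tex, flops = specs[name]
    print(f"9600GT vs {name}: "
          f"Z-fill x{ratio(g94_z, z)} and tex-fill x{ratio(g94_tex, tex)} for G94, "
          f"MADD x{ratio(flops, g94_flops)} for RV670")
```

These are paper numbers only. They say nothing about how often each unit is actually the bottleneck.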

fellix · Feb 22, 2008

Given that the majority of contemporary shader code tends to be quite texture-lookup bound, and there is still no clear path to implementing heavy geometry amplification (and other DX10 specifics), a TMU-centric architecture like NV's is still preferred.

mczak · Feb 22, 2008

I've got some feeling rv670 could be a bit more future-proof than G94, but probably not enough that it would matter. It seems alu/tex ratio just isn't increasing that fast.
People basically said the same thing for r580 vs G71, and I think to some extent it has come true. For instance if you now look at UT3, even the mighty 7900GTX can't keep up with the (much cheaper) x1950pro (which is even only rv570 and not r580), despite having way more pixel fillrate, more than twice the texel fillrate and more memory bandwidth - I guess it's lacking in alus (or hampered by large branching granularity).

wingless · Feb 22, 2008

RV670

You two both make excellent points about the architectural strengths of both of these cards. I, however, want to bring up the driver support that is backing up these two architectures. I believe the RV670 will be more future proof because of its flexibility. Catalyst 8.3 will allow better multi-GPU management and even allow mismatched cards like a 3850+3870, or 3870X2+3870 to work together in crossfire. This gives AMD offerings much more flexibility over Nvidia. Also Intel Chipsets enjoy full crossfire support. Even with the slight disparity in performance and price to Nvidia's offering, the AMD options will look more attractive to many consumers. The DX10.1 capabilities are nice too.

To sum it all up, I believe the DX10.1 and flexible multi-GPU options will give the RV670 staying power.

AnarchX · Feb 22, 2008

@mczak
G7x is more of a special case, since its ALUs and TMUs were coupled, which often leads to stalls that keep these GPUs from reaching their theoretical numbers. This was also the reason why they were destroyed in newer games by R5xx, which did not have this problem.
But G9x and RV6xx both have decoupled TMUs.

@wingless
I doubt that many users consider an upgrade via CF, and more than two GPUs are very questionable for midrange users, since there are many problems.

mczak · Feb 22, 2008

AnarchX said:
@mczak
G7x is more of a special case, since its ALUs and TMUs were coupled, which often leads to stalls that keep these GPUs from reaching their theoretical numbers. This was also the reason why they were destroyed in newer games by R5xx, which did not have this problem.
But G9x and RV6xx both have decoupled TMUs.

I merely used this as an example that the more future-proof looking solution indeed might turn out to be more future-proof - some people would argue that newer games would just pretty much perform the same as current games, just slower, on any architecture.
So if you think that the alu:tex ratio indeed is going to increase (and assuming those alu ops aren't going to be all-dependent scalar ops) then rv670 should have some advantage compared to G94. Well in theory (think about something like the perlin noise test). And only against G94, not against G92 (whose raw ALU peak rate is too close to really say one's got more ALU power than the other, given the large differences in how those ALUs are organized).
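
To make the alu:tex argument a bit more concrete, here is a toy bottleneck model. It assumes a shader issues `r` MADDs per texture fetch and that a chip is limited by whichever unit saturates first. The rates are the rough figures from the first post, with 450 GFLOPs taken as an assumed midpoint for RV670. Utilisation of the Vec5 units, interpolation cost and bandwidth are all ignored, so treat the result as an illustration rather than a prediction.

```python
# Toy model: sustained texture fetch rate for a shader with ALU:TEX ratio r.
# Rates are approximate figures from the thread; 450 GFLOPs is an assumed midpoint.

def fetch_rate(alu_gmadd: float, tex_gtex: float, r: float) -> float:
    """Fetches per second (in G/s) when either the TMUs or the ALUs saturate."""
    return min(tex_gtex, alu_gmadd / r)

def crossover(a: dict, b: dict) -> float:
    """Ratio r where texture-bound chip a matches ALU-bound chip b.

    Only meaningful if a is still texture-bound and b already ALU-bound at r.
    """
    return b["alu_gmadd"] / a["tex_gtex"]

RV670 = {"alu_gmadd": 450 / 2, "tex_gtex": 12.0}  # 1 MADD = 2 flops
G94 = {"alu_gmadd": 200 / 2, "tex_gtex": 20.0}

for r in (2, 5, 8, 10, 20):
    rv = fetch_rate(r=r, **RV670)
    g = fetch_rate(r=r, **G94)
    print(f"ALU:TEX {r:>2}: RV670 {rv:5.1f} G/s, G94 {g:5.1f} G/s")

print("RV670 pulls ahead above ALU:TEX of about", round(crossover(RV670, G94), 1))
```

Under these assumptions G94 stays ahead until a shader needs roughly eight or more MADDs per fetch, which fits the "in theory" caveat above.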

Arun · Feb 22, 2008

I think excluding even the debate you guys are having here, it's obvious that as performance requirements go up, you won't be able to activate AA/AF as often. And since the 9600 GT benefits from both of these compared to RV670, obviously that's another thing you can't get away from and need to consider.

Jawed · Feb 22, 2008

mczak said:
So if you think that the alu:tex ratio indeed is going to increase (and assuming those alu ops aren't going to be all-dependent scalar ops) then rv670 should have some advantage compared to G94. Well in theory (think about something like the perlin noise test). And only against G94, not against G92 (whose raw ALU peak rate is too close to really say one's got more ALU power than the other, given the large differences in how those ALUs are organized).

The perlin noise test from 3DMk06 is interesting because it runs at 93% scalar utilisation on R6xx (197 instruction slots - 916 scalar operations - 4.65 scalars per instruction slot) - ignoring the TEX instructions, that is.
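
The arithmetic behind those figures, as a minimal sketch (the variable names are mine; the numbers are the ones quoted above):

```python
# Figures quoted above for the 3DMark06 Perlin noise shader on R6xx.
instruction_slots = 197  # VLIW instruction slots, TEX instructions ignored
scalar_ops = 916         # scalar operations packed into those slots
lanes_per_slot = 5       # x, y, z, w and t

scalars_per_slot = scalar_ops / instruction_slots
utilisation = scalars_per_slot / lanes_per_slot

print(f"{scalars_per_slot:.2f} scalars per instruction slot")
print(f"{utilisation:.0%} lane utilisation")
```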

As far as I can tell, 8800GTS-512 and HD3870 run this code at about the same rate, ~170fps. 9600GT runs at 97fps.

I'm doubtful most shaders run at such high utilisation on R6xx, but then again, you can prolly argue that most shaders are not ALU-limited in R6xx, so utilisation doesn't matter.

G9x, in contrast suffers from having to run all attribute interpolation in the ALUs. So texturing-intensive shaders could easily bottleneck in the ALUs. Put another way, G9x spends some of its "serial scalar efficiency" gain on doing work that R6xx has dedicated hardware for (interpolators).

Of course, the ALU-interpolation configuration of G9x is forward-looking because increased shader complexity will reduce the percentage of interpolations. e.g. looking at the code for the perlin noise test, I think there's no interpolations at all - it's all dependent texturing (48 fetches from the same texture). But, ahem, I'm a noob with D3D assembly code.

Anyway, the bottom line is that G9x has an in-built "upwards-adjustment" to the ALU:TEX ratio as shader code increases in complexity. Increases in ALU:TEX "double-count", as it were.
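
One way to read the "double-count" remark is the toy model below. It assumes each attribute interpolation costs one ALU slot on G9x (an assumption for illustration, not a measured cost) and uses made-up shaders with a fixed number of interpolations and fetches but a growing amount of math.

```python
# Toy model: G9x runs attribute interpolation on the shader ALUs, while R6xx
# has dedicated interpolators. One ALU slot per interpolation is an assumption.

def g9x_alu_load_per_tex(alu_ops: int, interps: int, tex: int) -> float:
    """ALU slots per texture fetch on G9x, counting math plus interpolation."""
    return (alu_ops + interps) / tex

def interp_share(alu_ops: int, interps: int) -> float:
    """Fraction of G9x ALU work spent on interpolation."""
    return interps / (alu_ops + interps)

# Hypothetical shaders: 8 interpolations and 4 fetches each, growing math.
INTERPS, TEX = 8, 4
for alu_ops in (8, 32, 128):
    print(f"math ops {alu_ops:>3}: "
          f"nominal ALU:TEX {alu_ops / TEX:5.1f}, "
          f"G9x ALU load per TEX {g9x_alu_load_per_tex(alu_ops, INTERPS, TEX):5.1f}, "
          f"interpolation share {interp_share(alu_ops, INTERPS):.0%}")
```

As the math grows, the nominal ALU:TEX ratio rises and, at the same time, the share of ALU time lost to interpolation shrinks, so G9x benefits twice.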

Jawed

Twinkie · Feb 22, 2008

Quick performance numbers between a stock 9600GT, 9600GT OC and HD3870.

http://www.anandtech.com/video/showdoc.aspx?i=3235&p=2

Future proof as in being able to handle DX10 much better than its competition? If that was the case, nVIDIA surprisingly enough is doing better in DX10 benchmarks compared to AMD/ATi.

IMO discussing which card is future-proof never makes sense in the first place. By then, new faster cards will already be out. Like the G71 vs R580. The R580 was a more forward-looking architecture, but how long did it take to show this? By then, a lot of other next-gen cards were out that made those cards obsolete in both performance and price.

fellix · Feb 22, 2008

Twinkie said:
If that was the case, nVIDIA surprisingly enough is doing better in DX10 benchmarks compared to AMD/ATi.

Then we should probably question the true DX10-degree in the current "upgraded" GFX engines.

I hope 3DMark Vantage can show us some believable proof, regarding DX10.

mczak · Feb 22, 2008

Twinkie said:
Future proof as in being able to handle DX10 much better than its competition? If that was the case, nVIDIA surprisingly enough is doing better in DX10 benchmarks compared to AMD/ATi.

Well "future proof" (ok call it forward looking) isn't really about the API - much more about how it's expected to get used. So far I haven't seen any numbers which would show that those DX10 titles have a higher alu:tex ratio (which is what AMD apparently is expecting and which might come true to some degree at some time in the future).

IMO discussing which card is future-proof never makes sense in the first place. By then, new faster cards will already be out. Like the G71 vs R580. The R580 was a more forward-looking architecture, but how long did it take to show this? By then, a lot of other next-gen cards were out that made those cards obsolete in both performance and price.

Yes, there is no such thing as future-proof really. There obviously will be cards which will be faster and cheaper in the future. This is more about how long you're able to use a card you buy now. Though granted, people willing to buy pretty high-end cards probably won't really hesitate to upgrade them every once in a while. But if you have that card based on G71 or R580 and are not willing to buy a new card yet (after all, both are still quite competent for today's games if you're willing to sacrifice some quality), the one based on R580 now sure looks like it was the better option.

Zengar · Feb 23, 2008

Well, I don't think ATI's vector architecture can utilize all its power in most situations (even if it looks nice on paper). I prefer the nVidia scalar design, to be honest.

fellix · Feb 23, 2008

Zengar said:
Well, I don't think ATI's vector architecture can utilize all its power...

Actually, it's not vector but rather parallel scalar.

compres · Feb 24, 2008

fellix said:
Actually, it's not vector but rather parallel scalar.

... it's VLIW.

mczak · Feb 24, 2008

compres said:
... it's VLIW.

Yes, that's pretty much "parallel scalar". True, the term "vec5" is a bit misleading, since that's what you'd usually use for SIMD units.

Arnold Beckenbauer · Feb 24, 2008

I'm the next one: It's superscalar. And VLIW.

Zengar · Feb 24, 2008

Well, it still performs a single operation on a packed vec4 value, doesn't it? So if you call it SIMD or parallel scalar or vector, it doesn't really matter.

P.S. Please correct me if I am wrong, possibly I am misinterpreting something...

Jawed · Feb 24, 2008

Zengar said:
Well, it still performs a single operation on a packed vec4 value, doesn't it? So if you call it SIMD or parallel scalar or vector, it doesn't really matter.

P.S. Please correct me if I am wrong, possibly I am misinterpreting something...

Here's a single instruction slot in R6xx:

Code:

216 x: CNDGE R123.x, R7.z, 1.0f, 0.0f 
    y: CNDGE R126.y, R11.x, 1.0f, 0.0f 
    z: MAX R123.z, R11.y, 0.0f 
    w: ADD R123.w, R6.x, -R2.w VEC_120 
    t: MUL R124.x, R2.w, C2.w

Meanwhile in G80 you get MAD + special function (SIN, RCP etc.) co-issued (or MAD + interpolation). G80 is complicated because SF instructions sometimes take 2 effective clocks instead of a single clock for all other instruction types.

In R600 an ALU is 16-wide, so each of those five operations (x, y, z, w or t) has its own 16-wide ALU. Whereas in G80 each MAD ALU is 8-wide, while the SF (interpolator) ALU is 2-wide.
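
To spell out why that slot is not a single operation on a packed vec4, here is a small sketch that treats the listing above as data. The dictionary layout and names are just for illustration. The 16-wide figure is the one given in this post for R600.

```python
# The instruction slot from the listing above, one entry per lane.
slot = {
    "x": "CNDGE R123.x, R7.z, 1.0f, 0.0f",
    "y": "CNDGE R126.y, R11.x, 1.0f, 0.0f",
    "z": "MAX R123.z, R11.y, 0.0f",
    "w": "ADD R123.w, R6.x, -R2.w VEC_120",
    "t": "MUL R124.x, R2.w, C2.w",
}

LANES = ("x", "y", "z", "w", "t")
ALU_WIDTH = 16  # per the post: each lane has its own 16-wide ALU on R600

# First token of each lane's instruction is its opcode.
opcodes = [slot[lane].split()[0] for lane in LANES if lane in slot]
filled = len(opcodes)

print("distinct opcodes in this slot:", sorted(set(opcodes)))
print("lanes filled:", filled, "of", len(LANES))
print("scalar results per issued slot across the ALU width:", filled * ALU_WIDTH)
```

Four different opcodes reading unrelated registers share one slot, which is the VLIW / "parallel scalar" behaviour discussed above rather than one SIMD operation on a vec4.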

Jawed

Zengar · Feb 24, 2008

Ah, thank you for the explanation, Jawed...

mczak · Feb 25, 2008

Jawed said:
In R600 an ALU is 16-wide, so each of those five operations (x, y, z, w or t) has its own 16-wide ALU. Whereas in G80 each MAD ALU is 8-wide, while the SF (interpolator) ALU is 2-wide.

It's worth noting that the width of 16 is only true for R600 (and rv670), not for the lower-end units (12 or 8 there). G80 doesn't look scalable that way. I thought G80 was 16-wide too, though; I don't think the half-clusters can run different instructions, or can they? I never saw what the purpose of the 8x2 internal cluster arrangement was. And I thought the SF ALU has the same width as the normal shader unit (it pretty much has to, otherwise you couldn't co-issue mad+mul every clock) and just needs 4 cycles for most functions (sin, cos etc.). I'm not sure what the interpolation rate is.
