G70 Vs X1800 Efficiency

Sxotty said:
You guys seem very worried by this. No one is saying it is inferior. If it brings better performance at stock speeds then groovy, but it is in no way evil to test this; quite the contrary, this is a very interesting thing for them to have tested.

I'm not worried about X1800's ability to compete. I'm more worried about the average gamer being able to understand what these results actually mean. When "configured similarly", the G70 may indeed have higher IPC throughput in these tests, but claiming this is a measure of "efficiency" is wrong in my book, since they will never be "configured similarly" when you find them in the stores, so it's certainly not something I'd advise anyone to take into account when making purchasing decisions. If anything, this is an IPC test, not an efficiency test.
 
Here is an interesting thing...

I believe there is a basic FLAW in the assumption that this article made -

that both cards are ACTUALLY running at the clocks that DriverHeaven thinks it set them to.

Looking at the fillrate numbers and available bandwidth, it appears that the X1800 is running at the clocks that the authors claim it is running at.

But the G70 appears to be running faster than the claimed clocks - especially the memory clock, definitely.

Didn't someone find that the G70 always has dynamic clocking enabled? Did someone figure out a way to disable it?
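To make that cross-check concrete, here is a minimal sketch of the arithmetic, assuming one bilinear texel per pipe per clock and a 256-bit DDR bus; the measured figures in it are invented for illustration and are not taken from the DriverHeaven article:

```python
# Rough cross-check of the kind described above: compare measured fillrate
# and bandwidth against what the claimed clocks should theoretically allow.
# Assumptions: one bilinear texel per pipe per clock, 256-bit DDR memory bus.
# All numbers below are illustrative, not the article's.

def theoretical_fillrate_mtexels(core_mhz, active_pipes):
    """Peak single-texture fillrate in Mtexels/s."""
    return core_mhz * active_pipes

def theoretical_bandwidth_gb(mem_mhz, bus_bits=256):
    """Peak memory bandwidth in GB/s for DDR (two transfers per clock)."""
    return mem_mhz * 2 * bus_bits / 8 / 1000

def implied_core_mhz(measured_mtexels, active_pipes):
    """Invert the fillrate formula: what clock do the measured numbers imply?"""
    return measured_mtexels / active_pipes

# A "450MHz, 16-pipe" configuration should top out around 7200 Mtexels/s:
print(theoretical_fillrate_mtexels(450, 16))   # 7200.0
# If a fillrate tester reports, say, ~7350 Mtexels/s instead...
print(implied_core_mhz(7350, 16))              # ~459 -> running faster than set
# And 600MHz DDR on a 256-bit bus works out to ~38.4 GB/s:
print(theoretical_bandwidth_gb(600))           # 38.4
```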
 
Pete said:
hugh and Tokelil, I think nV has had very high single-texture fillrate figures in 3DMark since NV40. I remember how shocked we all were when the first ST fillrate figures were leaked, seeing how close they were to the theoretical ones compared to previous cards.
The difference still shouldn't be as huge as it is here (25%) when both have the same theoretical fillrate and the same bandwidth, IMO.
What could be the cause of this? Cache efficiency?
 
This test is funny. Take an nV GeForce FX 5900U and an ATi Radeon 9800PRO, set the same clocks for both, lock one of the R350's two quads (so both chips will be single-quad), don't care about pixel pipeline architecture, don't care about ROP architecture, don't care about image quality, and run some SM1.x-class benchmarks (3DMark01, UT2003, Doom3).

Which solution will offer better performance? Surely the FX. And does that mean that the R350's architecture is inferior to the NV35's architecture? :LOL:
 
no-X said:
This test is funny. Take an nV GeForce FX 5900U and an ATi Radeon 9800PRO, set the same clocks for both, lock one of the R350's two quads (so both chips will be single-quad), don't care about pixel pipeline architecture, don't care about ROP architecture, don't care about image quality, and run some SM1.x-class benchmarks (3DMark01, UT2003, Doom3).

Which solution will offer better performance? Surely the FX. And does that mean that the R350's architecture is inferior to the NV35's architecture? :LOL:
No, it will mean that the FX has an architecture better suited, per quad, to DX8-style games.

:p , fanATics all over the world unite, the X1xxx is losing in one custom-made benchmark, omg what a disaster .....

btw, DriverHeaven DID screw up the test setup, according to Unwinder:
Two of the domains (shader/ROP) step in crystal increments (27MHz). The third is practically one-to-one with the target frequency.

A frequency of 450 set via PS/RT/the driver should have resulted in 459/459/490 being applied.
In other words, setting the frequency to 450 in RT should give 459/459/490 as the actual frequencies on the 7800GTX.
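If Unwinder's explanation holds, the requested clock gets snapped to a 27MHz crystal step for the shader/ROP domains. A tiny illustrative sketch of that rounding (the nearest-multiple rule is my assumption based on his description):

```python
# Illustrative: G70 shader/ROP clock domains step in 27MHz crystal increments,
# so a requested clock gets snapped to the nearest multiple of 27MHz.
# (Per the quote above, the third domain tracks the target almost exactly.)
CRYSTAL_MHZ = 27

def snapped_clock(requested_mhz):
    """Round a requested clock to the nearest 27MHz crystal step."""
    return round(requested_mhz / CRYSTAL_MHZ) * CRYSTAL_MHZ

print(snapped_clock(450))   # 459 -> the shader/ROP domains end up ~2% high
print(snapped_clock(432))   # 432 -> a request sitting on a crystal step stays put
```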
 
Sxotty said:
Walt, what exactly do you think the problem was with these specific tests that makes them invalid/unreliable? What do you take issue with? I see little discussion actually going on with regard to the methodology, only a tiny snippet about the % change in bandwidth and fillrate. Other than that it seems to be a bunch of folks saying "this is unfair, I don't like it".
There has been some discussion:

- G70 may not actually be clocked as low as it claims.

- G70 may not actually have its extra quad disabled.

- R520 may be having its (important) memory controller screwed up because of the nonstandard clock speeds.

- the test may not be measuring "efficiency". It looks to be trying to measure work per quad per clock cycle.

- the exercise is artificial in the extreme as you will never find R520s clocked that low in the real world. You're just removing one of the main design parameters and architectural advantages of the R520, so it's not a valid test. What if R520 is more efficient at high speed and you've made it less efficient by downclocking it?

- there are no parameters defining "efficiency" (screen size, AA, AF, IQ, SM2/SM3 branching, pipeline stalling etc).


IMO, there are quite a lot of problems in trying to do this test by so drastically changing the operation of the cards and then trying to draw conclusions out of it. I just don't believe the evidence can support the claims, or that the testing methodology is sound. I'm not even sure what kind of usefulness this information can have in the real world, as you won't ever get these cards performing in this fashion.
 
Bouncing Zabaglione Bros. said:
There has been some discussion:

- G70 may not actually be clocked as low as it claims.

- G70 may not actually have its extra quad disabled.

- R520 may be having its (important) memory controller screwed up because of the nonstandard clock speeds.

- the test may not be measuring "efficiency". It looks to be trying to measure work per quad per clock cycle.

- the exercise is artificial in the extreme as you will never find R520s clocked that low in the real world. You're just removing one of the main design parameters and architectural advantages of the R520, so it's not a valid test. What if R520 is more efficient at high speed and you've made it less efficient by downclocking it?

- there are no parameters defining "efficiency" (screen size, AA, AF, IQ, SM2/SM3 branching, pipeline stalling etc).


IMO, there are quite a lot of problems in trying to do this test by so drastically changing the operation of the cards and then trying to draw conclusions out of it. I just don't believe the evidence can support the claims, or that the testing methodology is sound. I'm not even sure what kind of usefulness this information can have in the real world, as you won't ever get these cards performing in this fashion.
I suggest we never make comparisons with overclocked cards because:

-maybe the card is not clocked as expected
-it may be having its (important) memory controller screwed up because of the nonstandard clock speeds.
-the exercise is artificial in the extreme as you will never find a card clocked that high in the real world. You're just removing one of the main design parameters and architectural advantages of the card, so it's not a valid test.
-IMO, there are quite a lot of problems in trying to do such a test by drastically changing the operation of the cards and then trying to draw conclusions out of it. I'm not even sure what kind of usefulness this information can have in the real world, as you won't ever get these cards performing in this fashion.

/sarcasm
 
chavvdarrr said:
I suggest we never make comparisons with overclocked cards because:

/sarcasm
Despite your sarcasm, I'd agree with that. The same things that hold true about underclocking cards hold true about overclocking them. You can't just upclock a card, break the expected timings for things like the memory controller, and then claim it shows you something significant about "efficiency". The testing methodology doesn't stand up unless you can be sure you know what is going on and why, or (as in this case) that your testing is even running the way you think it is.

Modern graphics cards are just too complex for such a simplistic approach.
 
Maybe they should make it so each card is using the same number of ALUs, or something even more ridiculous like the same number of transistors. I have no idea what it would prove beyond the obvious, but it would get them some more hits :smile:

Anyway, couldn't you calculate the average IPC of each pipe when running a real shader without giving the cards equal clocks or the same number of pipes? And wouldn't that be a better indication of who has the more powerful pipes, clock for clock?
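Something along those lines would be a back-of-the-envelope "per pipe, per clock" normalization. A rough sketch of the idea, assuming stock configurations of 24 pipes at 430MHz for the 7800 GTX and 16 pipes at 625MHz for the X1800 XT; the scores here are invented purely for illustration:

```python
# Back-of-the-envelope "per pipe, per clock" normalization: take a measured
# shader-test score at stock settings and divide out pipe count and clock.
# Assumed stock configs: 7800 GTX = 24 pipes @ 430MHz, X1800 XT = 16 @ 625MHz.
# The scores themselves are made up purely for illustration.

def per_pipe_per_mhz(score, pixel_pipes, core_mhz):
    """Normalize a benchmark score by pipe count and core clock."""
    return score / (pixel_pipes * core_mhz)

g70  = per_pipe_per_mhz(score=86.0, pixel_pipes=24, core_mhz=430)
r520 = per_pipe_per_mhz(score=81.0, pixel_pipes=16, core_mhz=625)

print(f"G70  : {g70:.5f} score per pipe-MHz")
print(f"R520 : {r520:.5f} score per pipe-MHz")
```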
 
- G70 may not actually have its extra quad disabled.

The G70 does have its quad disabled. There's no question about this unless the people who disabled it just didn't know what they were doing. People have been doing this for months now on the G70, and I have even done some performance tests disabling quads on the G70. Besides, any fillrate tester will show it works.
 
The question, IMHO, should be not whether the review was done perfectly right, but rather whether this view is INTERESTING.
Is it?
It is for me. And it seems it's interesting for you too.
 
All of this is pretty pointless since it's fairly obvious that they have different pipelines and different structures. Per pipeline it's obvious that G70 has a higher theoretical instruction rate per clock, and (other factors aside) is probably able to make better use of that in many cases. ATI's pipelines are still essentially the same as R300's, with some tweaks to the ALUs; they have a lower theoretical IPC per pipe, but have a higher flexibility in many cases that should enable them to have a higher utilisation of the ALUs across a range of cases. Current tests are also not going to be making use of some of the other elements such as branch processing.
 
Ragemare said:
Anyway, couldn't you calculate the average IPC of each pipe when running a real shader without giving the cards equal clocks or the same number of pipes? And wouldn't that be a better indication of who has the more powerful pipes, clock for clock?
But what would that show? That Nvidia designed more powerful pipes that can't run as fast, and that ATI designed less powerful pipes because it would mean they could run them faster?

Okay, so that would be a bit of info, but what does it mean in real terms? It's just an indication of different design philosophies, it doesn't actually tell us which one is more efficient. I'm not even sure it's valid because the whole point of the ATI design is that it sacrifices some "pipe-power" in order to run those pipes at a higher clockrate. Take those higher clocks out of the equation, and you remove one of the design targets.

You've left the compromise part of the design (simpler pipes) and taken away the reason that compromise was made (faster clocks). You've left the cons and taken out the pros.
 
Dave Baumann said:
All of this is pretty pointless since it's fairly obvious that they have different pipelines and different structures.
Of no real practical use, perhaps. Pointless though? Allow me to disagree; and to further strengthen my point of view...
...Per pipeline it's obvious that G70 has a higher theoretical instruction rate per clock, and (other factors aside) is probably able to make better use of that in many cases. ATI's pipelines are still essentially the same as R300's, with some tweaks to the ALUs; they have a lower theoretical IPC per pipe, but have a higher flexibility in many cases that should enable them to have a higher utilisation of the ALUs across a range of cases. Current tests are also not going to be making use of some of the other elements such as branch processing.
...you proceed to explain what Veridian more or less observed with his (some say limited, but getting a sample from many games was never his intent, I believe) testing. What you call "obvious" may be obvious to you, or me, or others, but not to the whole readership - after all, not all people are 3D proficient. So Veridian's observations do have a point, as long as they are not taken the wrong way (for instance, condemning the X1800 architecture just because G70 does more per clock, which is totally irrelevant really).
 
Bouncing Zabaglione Bros. said:
But what would that show? That Nvidia designed more powerful pipes that can't run as fast, and that ATI designed less powerful pipes because it would mean they could run them faster?

Okay, so that would be a bit of info, but what does it mean in real terms? It's just an indication of different design philosophies, it doesn't actually tell us which one is more efficient. I'm not even sure it's valid because the whole point of the ATI design is that it sacrifices some "pipe-power" in order to run those pipes at a higher clockrate. Take those higher clocks out of the equation, and you remove one of the design targets.

You've left the compromise part of the design (simpler pipes) and taken away the reason that compromise was made (faster clocks). You've left the cons and taken out the pros.

Well, I don't think ATi's plan originally was to clock this thing to the sky. I would like to see the comparison between the R520 and the NV40; I think efficiency-wise in SM2.0/3.0 they will be very similar per clock, other than dynamic branching.

Philosophies change depending on competitors' products. It's almost as if ATi had no choice but to clock it up this high.
 
Kombatant said:
Of no real practical use, perhaps. Pointless though? Allow me to disagree; and to further strengthen my point of view...

...you proceed to explain what Veridian more or less observed with his (some say limited, but getting a sample from many games was never his intent, I believe) testing. What you call "obvious" may be obvious to you, or me, or others, but not to the whole readership - after all, not all people are 3D proficient. So Veridian's observations do have a point, as long as they are not taken the wrong way (for instance, condemning the X1800 architecture just because G70 does more per clock, which is totally irrelevant really).

Thanks Komb,

Pretty much summed up a lot of what I was going to post.

Basically this whole article came from me seeing a few posts (quite possibly on this site) wondering about clock-for-clock performance, as well as some conversations I'd had with the DH owner. I was interested to see what happened and originally planned a one-pager.

As it turned out I had a little more time to run more than one test, and so that's how we ended up with four.

As for configurations, the clock rate was noted as 450MHz to keep it simple for the non-technical readers out there; however, care was taken to make sure the three internal clocks were as comparable with the R520's as possible. Also, we confirmed the pipelines were disabled.

Just a bit of fun, basically, which I felt was worth publishing. If you found it interesting, great. If not...no probs.
 
Just looking at 'peak' 32-bit programmable FLOPs for G70 and R520:

VS units:

G70 ~ 10 FLOPs/cycle (vec4+scalar, MADDs)
R520 ~ 10 FLOPs/cycle (vec4+scalar, MADDs)

PS units:

G70 ~ 16 FLOPs/cycle (vec2+vec2, vec2+vec2, all MADDs)
R520 ~ 12 FLOPs/cycle (vec3+scalar, MADDs; vec3+scalar, non-MADDs)

Just looking at these figures for raw shading power, per clock cycle, you'd expect VS power to be pretty much identical for both G70 and R520. However, PS power would favor G70 and the test more or less confirms this...
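Scaling those per-cycle figures up to whole-chip peaks is straightforward arithmetic; a quick sketch, where the per-cycle numbers are the ones quoted above and the unit counts and stock clocks are my assumptions (8 VS + 24 PS at 430MHz for G70, 8 VS + 16 PS at 625MHz for R520):

```python
# Scale the per-unit, per-cycle figures above to whole-chip peaks.
# Assumed unit counts / stock clocks: G70 = 8 VS + 24 PS @ 430MHz,
# R520 = 8 VS + 16 PS @ 625MHz. Per-cycle FLOPs are the figures quoted above.

def peak_gflops(flops_per_cycle, units, clock_mhz):
    return flops_per_cycle * units * clock_mhz / 1000

print("PS ratio per pipe per clock:", 16 / 12)              # ~1.33 in G70's favour
print("G70  PS peak:", peak_gflops(16, 24, 430), "GFLOPS")  # ~165
print("R520 PS peak:", peak_gflops(12, 16, 625), "GFLOPS")  # 120
print("G70  VS peak:", peak_gflops(10, 8, 430), "GFLOPS")   # ~34
print("R520 VS peak:", peak_gflops(10, 8, 625), "GFLOPS")   # 50
```

At the test's equal-clock, 16-pipe-each configuration, the PS gap collapses to just that 16:12 per-pipe ratio, while at stock clocks the picture shifts because R520 trades per-pipe width for clock speed.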
 
wireframe said:
I think we can assume designs will become more efficient and clock higher
NetBurst. From a technical perspective, I think it's very interesting to see what changes have been made per pipe per cycle, though I agree with those who suggest an R520-vs-R420 comparo would have been more enlightening.
 