G8x/G9x Power Draw under Peak ALU Load

Jawed

More detailed testing of power draw in RV670 and G92 has thrown up an interesting "anomaly":

http://arstechnica.com/journals/har...-consumption-of-the-ati-radeon-hd-3800-series

There is, however, a slight twist to the 8800GT and 8800GTS' power consumption. When running the games I tested (or the other tests in 3DMark 2006), total system power draw peaks at the "typical" level described above. In 3DMark 2006's Pixel Shader test, however, we see a noticeable rise in power consumption from the 8800GT in both single-card and SLI mode. The 8600GTS' power consumption also rises slightly, though not by nearly the same degree. Most interesting is the fact that the Radeon cards from both the HD 2000 and HD 3000 series run the exact same test without even approaching their respective load power results.
So, is it reasonable to surmise that this test (actually, I am not 100% sure which test) is exercising the NVidia GPUs' ALUs considerably harder than in most (all) games?

Or is it really that the ALUs/TMUs are effectively running "flat out" in NVidia hardware?

Does anyone know of performance results for these tests? I'd like to see the "performance per watt" for this test - the large disparity in power draw when comparing the two GPUs may not be anomalous at all.

Jawed
 
Ars said:
Although it's still unclear exactly why the G92 and RV670 behave differently in this particular test, I want to emphasize that the 8800GT's higher power consumption is a curiosity, rather than a flaw. Even at peak load, the 8800GT does not exceed its 105W TDP, and was perfectly happy to loop the PS3 test for several hours.

Honestly, i don't care about a slight power spike on a single test of a single synthetic benchmark suite.
If it was a more widespread anomaly it would be worth looking into with further detail, but as it stands...
 
Honestly, i don't care about a slight power spike on a single test of a single synthetic benchmark suite.
If it was a more widespread anomaly it would be worth looking into with further detail, but as it stands...
What makes you think this contribution to a technical thread is warranted?

Jawed
 
What makes you think this contribution to a technical thread is warranted?

Jawed

Because technically, there's nothing wrong with the card as it is running within the previously defined parameters... ?
In fact, if anything, that review at Ars served to demonstrate that power draw (at peak, at least) isn't that much different from RV670, despite the thinner process technology of the latter.
 
Because technically, there's nothing wrong with the card as it is running within the previously defined parameters... ?
In fact, if anything, that review at Ars served to demonstrate that power draw (at peak, at least) isn't that much different from RV670, despite the thinner process technology of the latter.
So we should care about your opinion in a tech thread? This section is for stuff like this, to get all geeky; you're making it a fan thread. Sit down for a while.
 
What's the thread about in the first place? They clearly state that it never exceeded its given 105W maximum, so what is wrong there in your opinion?

Btw it was the PS3 Test in 3D Mark, so surely the shaders get hit a bit more heavily than usual.
 
What's the thread about in the first place? They clearly state that it never exceeded its given 105W maximum, so what is wrong there in your opinion?

Precisely what i tried to say, even by using a quote from that very same article to mark my point of view.
But apparently anything that is said that might be somewhat "dissonant" with the perhaps negative tone expected for the thread automatically gets stamped as "fanboyism"...

I'm ok with that, everyone can think what they wish of me because it's still a democracy (within geographic limits :D), but the quote from Ars is still valid because it's the bottom line of their article.
It's not like i was asking just a few days ago for a personal opinion on buying 2 HD3870's anyway, right ?
 
The topic does not suggest that there is anything wrong with the GT or that RV670 is a superior design - it merely points out an interesting difference between the two architectures, which suggests that ALU power draw might be responsible for the bulk of G8x power consumption OR that a particular feature of the test taxes G8x ALUs more than RV670 ALUs. Even if the GT had a max TDP of 2W and RV670 had a TDP of 800W, the question regarding the power draw would still be valid (since it's relative).
 
The topic does not suggest that there is anything wrong with the GT or that RV670 is a superior design - it merely points out an interesting difference between the two architectures, which suggests that ALU power draw might be responsible for the bulk of G8x power consumption OR that a particular feature of the test taxes G8x ALUs more than RV670 ALUs. Even if the GT had a max TDP of 2W and RV670 had a TDP of 800W, the question regarding the power draw would still be valid (since it's relative).


The point i tried to make in my first post was that, with only a single manifestation of the spike in a small part of 3DMark 2006 (the pixel shader 3.0 test), and with the card still well within TDP specs, it will be very difficult to pinpoint this unquestionably to either a driver issue or perhaps even a hardware issue.
I said nothing about either architecture being superior, as i actually believe both of them are extremely well matched on a price/performance level.
 
The question remains: Why does it happen in this test?

If you have nothing further to contribute to this thread, then leave.
 
More detailed testing of power draw in RV670 and G92 has thrown up an interesting "anomaly":

http://arstechnica.com/journals/har...-consumption-of-the-ati-radeon-hd-3800-series


So, is it reasonable to surmise that this test (actually, I am not 100% sure which test) is exercising the NVidia GPUs' ALUs considerably harder than in most (all) games?

Or is it really that the ALUs/TMUs are effectively running "flat out" in NVidia hardware?

....
Jawed

My own opinion would be that this is a byproduct of the methods used to test "power draw" in this particular case--something that generally isn't easy to do, imo. The synthetic test in 3dmk06 does a specific thing (tests pixel shaders) and that is all it is doing for the specific amount of time it is running. The test is looking for maximum throughput, so I think it's safe to say that the 3dmk06 test is asking for and getting the maximum performance possible for that software test, and doing so in a sustained fashion that lends itself to more accurate power draw readings under those conditions. Just running games isn't going to stress the hardware in anywhere nearly as consistent a fashion, of course, since the gpu will be doing lots of things to display the game aside from running the pixel shaders flat out--which may be something that a particular game doesn't require at all, or only requires occasionally. Without knowing the game code well enough to know exactly when the pixel shaders might be running flat out I doubt it would be possible to compare the power draws between "running a game" and that particular pixel shader test in 3dmk06. It's this kind of specific testing that makes synthetics like 3dMK interesting, imo. Hence both power draw observations would naturally be different under different software conditions, and both would be correct.

This topic reminds me of how overclocking gpus can produce different results depending on the nature of the 3d software being run. Some games will run forever at a particular gpu clock setting, while other games may fail quickly when the gpu is running the same clocks, or else may fail in predictable places in a particular game at the same clocks. Different portions of the gpus are exercised at different times by the software that is running, hence one piece of software pulls less power to run, while another may either consistently pull more power, or pull more power only at certain times--which can set up a thermal situation that might cause the gpu to fail in that game at that time.

My *guess* here would be that either the ATi cards always run "flat out", or else that when they do run flat out as opposed to not, the smaller process design inherently draws less power than the process used by the nV card tested under the same software conditions, either way.
 
Off the top of my head:

1) Perhaps Ars should rename 3dmark to quak.exe, just in case...

2) If we could pick apart the shaders that make up that subsection, we might find which ones in particular cause a ramp-up. It's plausible there are power spikes with shaders in the other test, but other parts compensate for it.

3) Some kind of random confluence of factors that allows G92 to schedule better than normal. I'm curious of the exact scores the cards got on that subtest for Ars's power numbers.

4) RV670 has finer granularity when it comes to power management and it is able to capture some opportunities for idling units G92 cannot.

5) G92 is simply doing more work per given number of operations, perhaps some kind of register read or PDC gymnastics that it is usually able to avoid.

6) RV670 is able to offload some portion of the work to dedicated (and presumably more power-efficient) hardware, while G92 is tasking the ALUs for it.
If the tests are simple enough, even a minor discrepancy might show up, whereas other factors dwarf the difference in other code.
 
1) Perhaps Ars should rename 3dmark to quak.exe, just in case...

2) If we could pick apart the shaders that make up that subsection, we might find which ones in particular cause a ramp-up. It's plausible there are power spikes with shaders in the other test, but other parts compensate for it.
I'm moderately hopeful that CUDA-based testing would produce more insight here.

3) Some kind of random confluence of factors that allows G92 to schedule better than normal. I'm curious of the exact scores the cards got on that subtest for Ars's power numbers.
I've been looking but so far failed to find performance figures for these cards in this test. For all we know G92 could be running the test 2-4x faster than RV670...

4) RV670 has finer granularity when it comes to power management and it is able to capture some opportunities for idling units G92 cannot.
What could (or should?) be idling, bearing in mind that G92's peak power consumption in this test is higher than in other measured scenarios?

5) G92 is simply doing more work per given number of operations, perhaps some kind of register read or PDC gymnastics that it is usually able to avoid.
Which is getting at the nub of why this subject could be interesting. e.g. this shader might have a high register count, or it might be doing a lot of dynamic branching or ...

6) RV670 is able to offload some portion of the work to dedicated (and presumably more power-efficient) hardware, while G92 is tasking the ALUs for it.
If the tests are simple enough, even a minor discrepancy might show up, whereas other factors dwarf the difference in other code.
We know G92 uses the ALUs for attribute interpolation whereas RV670 has dedicated hardware - so is this test doing a lot of attribute interpolation?

I believe this is the test in question:

http://www.futuremark.com/products/3dmark06/tests/

Perlin Noise
This test computes six octaves of 3-dimensional Perlin simplex noise using a combination of arithmetic instructions and texture lookups. Perlin noise is a basic building block in many procedural texturing and modeling techniques, which are expected to increase in popularity in future games due to both reduced memory and bandwidth requirements as well as the increasing computation power in graphics hardware. This test requires SM3.0.
This, I believe, is the major pixel shader code for it:

Code:
ps_3_0
 
def c1 , -0.500000000000000000000000, 0.039999999105930328000000, 0.025000000372529030000000, 0.333333343267440800000000
def c2 , 2.000000000000000000000000, -1.000000000000000000000000, 0.166666671633720400000000, 0.003906250000000000000000
def c3 , 0.600000023841857910000000, 0.005859375000000000000000, 16.000000000000000000000000, 32.000000000000000000000000
def c4 , 0.001953125000000000000000, 0.000000000000000000000000, 1.000000000000000000000000, -1.500000000000000000000000
def c5 , -2.000000000000000000000000, 3.000000000000000000000000, 0.000000000000000000000000, 0.000000000000000000000000
def c6 , 2.000000000000000000000000, 4.000000000000000000000000, 0.050000000745058060000000, 8.000000000000000000000000
def c7 , 4.000000000000000000000000, 8.000000000000000000000000, 0.100000001490116120000000, 0.000000000000000000000000
def c8 , 8.000000000000000000000000, 16.000000000000000000000000, 0.200000002980232240000000, 0.000000000000000000000000
def c9 , 16.000000000000000000000000, 32.000000000000000000000000, 0.400000005960464480000000, 0.000000000000000000000000
def c10 , 32.000000000000000000000000, 64.000000000000000000000000, 0.800000011920928960000000, 0.000000000000000000000000
dcl_color0  v0.y 
dcl_color1  v1.xy 
dcl_2d s0 
add r0.xy , v1 , c1.xxxx 
add r1.xy , r0 , r0 
mov r0.yz , c1 
mul r0.xy , r0.yzzw , c0.xxxx 
add r3.x , r1.xxxx , r0.xxxx 
add r0.w , r1.yyyy , r3.xxxx 
add r0.w , r0.yyyy , r0.wwww 
mad r3.y , v1.yyyy , c2.xxxx , c2.yyyy 
mul r3.z , r0.zzzz , c0.xxxx 
mad r0.xyz , r0.wwww , c1.wwww , r3 
frc r1.xyz , r0 
add r2.xyz , r0 , -r1 
add r0.w , r2.yyyy , r2.xxxx 
add r0.w , r2.zzzz , r0.wwww 
mad r0.xyz , r0.wwww , -c2.zzzz , r2 
add r4.xyz , r3 , -r0 
add r0 , -r4.xxyy , r4.yzxz 
add r1.xy , -r4.zzzz , r4 
cmp r0 , r0 , c4.yyyy , c4.zzzz 
cmp r5.xy , r1 , c4.yyyy , c4.zzzz 
add r1.xy , r0.ywzw , r0.xzzw 
add r1.z , r5.yyyy , r5.xxxx 
add r0.xyz , r1 , c4.wwww 
cmp r0.xyz , r0 , c4.zzzz , c4.yyyy 
add r5.xyz , r4 , -r0 
add r7.xyz , r5 , c2.zzzz 
mul r5.xyz , r2 , c2.wwww 
dp3 r0.w , r7 , r7 
add r1.w , -r0.wwww , c3.xxxx 
add r6.xyz , r1 , c1.xxxx 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r0 , c2.wwww 
mul r0.w , r0.wwww , r0.wwww 
mul r3.w , r0.wwww , r0.wwww 
add r0.xyz , r5 , c4.xxxx 
add r2.xy , r1 , r0 
add r1.y , r1.zzzz , r0.zzzz 
texld r2 , r2 , s0 
mov r1.x , r2.wwww 
texld r1 , r1 , s0 
dp3 r0.w , r4 , r4 
add r1.w , -r0.wwww , c3.xxxx 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
max r0.w , r1.wwww , c4.yyyy 
dp3 r1.w , r1 , r7 
mul r0.w , r0.wwww , r0.wwww 
mul r3.w , r3.wwww , r1.wwww 
mul r2.w , r0.wwww , r0.wwww 
texld r1 , r0 , s0 
mov r0.w , r1.wwww 
texld r1 , r0.wzzw , s0 
cmp r6.xyz , r6 , c4.zzzz , c4.yyyy 
add r2.xyz , r4 , -r6 
add r2.xyz , r2 , c1.wwww 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
dp3 r1.w , r2 , r2 
dp3 r0.w , r1 , r4 
add r1.w , -r1.wwww , c3.xxxx 
mad r2.w , r2.wwww , r0.wwww , r3.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r6 , c2.wwww 
mul r3.w , r0.wwww , r0.wwww 
add r1.xy , r0 , r1 
add r0.y , r0.zzzz , r1.zzzz 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r3.wwww , r3.wwww 
dp3 r0.z , r0 , r2 
add r4.xyz , r4 , c1.xxxx 
mad r2.w , r0.wwww , r0.zzzz , r2.wwww 
add r1.xy , r5 , c3.yyyy 
add r0.y , r5.zzzz , c3.yyyy 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
add r3.y , v1.yyyy , c1.xxxx 
mov r3.z , c0.xxxx 
mul r1.xyz , r3 , c6 
add r0.w , r1.yyyy , r1.xxxx 
add r0.w , r1.zzzz , r0.wwww 
mad r2.xyz , r0.wwww , c1.wwww , r1 
frc r5.xyz , r2 
add r2.xyz , r2 , -r5 
add r0.w , r2.yyyy , r2.xxxx 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
add r0.w , r2.zzzz , r0.wwww 
dp3 r1.w , r0 , r4 
mad r0.xyz , r0.wwww , -c2.zzzz , r2 
dp3 r3.w , r4 , r4 
add r4.xyz , r1 , -r0 
add r0 , -r4.xxyy , r4.yzxz 
add r1.xy , -r4.zzzz , r4 
cmp r0 , r0 , c4.yyyy , c4.zzzz 
cmp r5.xy , r1 , c4.yyyy , c4.zzzz 
add r1.xy , r0.ywzw , r0.xzzw 
add r1.z , r5.yyyy , r5.xxxx 
add r3.w , -r3.wwww , c3.xxxx 
add r0.xyz , r1 , c4.wwww 
max r0.w , r3.wwww , c4.yyyy 
cmp r0.xyz , r0 , c4.zzzz , c4.yyyy 
mul r0.w , r0.wwww , r0.wwww 
add r5.xyz , r4 , -r0 
mul r0.w , r0.wwww , r0.wwww 
add r7.xyz , r5 , c2.zzzz 
mad r3.w , r0.wwww , r1.wwww , r2.wwww 
dp3 r0.w , r7 , r7 
mul r5.xyz , r2 , c2.wwww 
add r1.w , -r0.wwww , c3.xxxx 
add r6.xyz , r1 , c1.xxxx 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r0 , c2.wwww 
mul r0.w , r0.wwww , r0.wwww 
mul r4.w , r0.wwww , r0.wwww 
add r0.xyz , r5 , c4.xxxx 
add r2.xy , r1 , r0 
add r1.y , r1.zzzz , r0.zzzz 
texld r2 , r2 , s0 
mov r1.x , r2.wwww 
texld r1 , r1 , s0 
dp3 r0.w , r4 , r4 
add r1.w , -r0.wwww , c3.xxxx 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
max r0.w , r1.wwww , c4.yyyy 
dp3 r1.w , r1 , r7 
mul r0.w , r0.wwww , r0.wwww 
mul r4.w , r4.wwww , r1.wwww 
mul r2.w , r0.wwww , r0.wwww 
texld r1 , r0 , s0 
mov r0.w , r1.wwww 
texld r1 , r0.wzzw , s0 
cmp r6.xyz , r6 , c4.zzzz , c4.yyyy 
add r2.xyz , r4 , -r6 
add r2.xyz , r2 , c1.wwww 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
dp3 r1.w , r2 , r2 
dp3 r0.w , r1 , r4 
add r1.w , -r1.wwww , c3.xxxx 
mad r2.w , r2.wwww , r0.wwww , r4.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r6 , c2.wwww 
mul r4.w , r0.wwww , r0.wwww 
add r1.xy , r0 , r1 
add r0.y , r0.zzzz , r1.zzzz 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r4.wwww , r4.wwww 
dp3 r0.z , r0 , r2 
add r4.xyz , r4 , c1.xxxx 
mad r2.w , r0.wwww , r0.zzzz , r2.wwww 
add r1.xy , r5 , c3.yyyy 
add r0.y , r5.zzzz , c3.yyyy 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mul r1.xyz , r3 , c7 
add r0.w , r1.yyyy , r1.xxxx 
add r0.w , r1.zzzz , r0.wwww 
mad r2.xyz , r0.wwww , c1.wwww , r1 
frc r5.xyz , r2 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
add r2.xyz , r2 , -r5 
dp3 r1.w , r0 , r4 
add r0.z , r2.yyyy , r2.xxxx 
dp3 r0.w , r4 , r4 
add r0.z , r2.zzzz , r0.zzzz 
add r0.w , -r0.wwww , c3.xxxx 
mad r0.xyz , r0.zzzz , -c2.zzzz , r2 
max r4.w , r0.wwww , c4.yyyy 
add r4.xyz , r1 , -r0 
add r0 , -r4.xxyy , r4.yzxz 
add r1.xy , -r4.zzzz , r4 
cmp r0 , r0 , c4.yyyy , c4.zzzz 
cmp r5.xy , r1 , c4.yyyy , c4.zzzz 
add r1.xy , r0.ywzw , r0.xzzw 
add r1.z , r5.yyyy , r5.xxxx 
mul r0.w , r4.wwww , r4.wwww 
add r0.xyz , r1 , c4.wwww 
mul r0.w , r0.wwww , r0.wwww 
cmp r0.xyz , r0 , c4.zzzz , c4.yyyy 
mad r0.w , r0.wwww , r1.wwww , r2.wwww 
add r5.xyz , r4 , -r0 
mul r0.w , r0.wwww , c3.zzzz 
add r7.xyz , r5 , c2.zzzz 
mad r3.w , r3.wwww , c3.wwww , r0.wwww 
dp3 r0.w , r7 , r7 
mul r5.xyz , r2 , c2.wwww 
add r1.w , -r0.wwww , c3.xxxx 
add r6.xyz , r1 , c1.xxxx 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r0 , c2.wwww 
mul r0.w , r0.wwww , r0.wwww 
mul r4.w , r0.wwww , r0.wwww 
add r0.xyz , r5 , c4.xxxx 
add r2.xy , r1 , r0 
add r1.y , r1.zzzz , r0.zzzz 
texld r2 , r2 , s0 
mov r1.x , r2.wwww 
texld r1 , r1 , s0 
dp3 r0.w , r4 , r4 
add r1.w , -r0.wwww , c3.xxxx 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
max r0.w , r1.wwww , c4.yyyy 
dp3 r1.w , r1 , r7 
mul r0.w , r0.wwww , r0.wwww 
mul r4.w , r4.wwww , r1.wwww 
mul r2.w , r0.wwww , r0.wwww 
texld r1 , r0 , s0 
mov r0.w , r1.wwww 
texld r1 , r0.wzzw , s0 
cmp r6.xyz , r6 , c4.zzzz , c4.yyyy 
add r2.xyz , r4 , -r6 
add r2.xyz , r2 , c1.wwww 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
dp3 r1.w , r2 , r2 
dp3 r0.w , r1 , r4 
add r1.w , -r1.wwww , c3.xxxx 
mad r2.w , r2.wwww , r0.wwww , r4.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r6 , c2.wwww 
mul r4.w , r0.wwww , r0.wwww 
add r1.xy , r0 , r1 
add r0.y , r0.zzzz , r1.zzzz 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r4.wwww , r4.wwww 
dp3 r0.z , r0 , r2 
add r4.xyz , r4 , c1.xxxx 
mad r2.w , r0.wwww , r0.zzzz , r2.wwww 
add r1.xy , r5 , c3.yyyy 
add r0.y , r5.zzzz , c3.yyyy 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mul r1.xyz , r3 , c8 
add r0.w , r1.yyyy , r1.xxxx 
add r0.w , r1.zzzz , r0.wwww 
mad r2.xyz , r0.wwww , c1.wwww , r1 
frc r5.xyz , r2 
add r2.xyz , r2 , -r5 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
add r0.w , r2.yyyy , r2.xxxx 
dp3 r1.w , r0 , r4 
add r0.z , r2.zzzz , r0.wwww 
dp3 r0.w , r4 , r4 
mad r0.xyz , r0.zzzz , -c2.zzzz , r2 
add r4.w , -r0.wwww , c3.xxxx 
add r4.xyz , r1 , -r0 
add r0 , -r4.xxyy , r4.yzxz 
add r1.xy , -r4.zzzz , r4 
cmp r0 , r0 , c4.yyyy , c4.zzzz 
cmp r5.xy , r1 , c4.yyyy , c4.zzzz 
add r1.xy , r0.ywzw , r0.xzzw 
add r1.z , r5.yyyy , r5.xxxx 
max r0.w , r4.wwww , c4.yyyy 
add r0.xyz , r1 , c4.wwww 
mul r0.w , r0.wwww , r0.wwww 
cmp r0.xyz , r0 , c4.zzzz , c4.yyyy 
mul r0.w , r0.wwww , r0.wwww 
add r5.xyz , r4 , -r0 
mad r0.w , r0.wwww , r1.wwww , r2.wwww 
add r7.xyz , r5 , c2.zzzz 
mad r3.w , r0.wwww , c6.wwww , r3.wwww 
dp3 r0.w , r7 , r7 
mul r5.xyz , r2 , c2.wwww 
add r1.w , -r0.wwww , c3.xxxx 
add r6.xyz , r1 , c1.xxxx 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r0 , c2.wwww 
mul r0.w , r0.wwww , r0.wwww 
mul r4.w , r0.wwww , r0.wwww 
add r0.xyz , r5 , c4.xxxx 
add r2.xy , r1 , r0 
add r1.y , r1.zzzz , r0.zzzz 
texld r2 , r2 , s0 
mov r1.x , r2.wwww 
texld r1 , r1 , s0 
dp3 r0.w , r4 , r4 
add r1.w , -r0.wwww , c3.xxxx 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
max r0.w , r1.wwww , c4.yyyy 
dp3 r1.w , r1 , r7 
mul r0.w , r0.wwww , r0.wwww 
mul r4.w , r4.wwww , r1.wwww 
mul r2.w , r0.wwww , r0.wwww 
texld r1 , r0 , s0 
mov r0.w , r1.wwww 
texld r1 , r0.wzzw , s0 
cmp r6.xyz , r6 , c4.zzzz , c4.yyyy 
add r2.xyz , r4 , -r6 
add r2.xyz , r2 , c1.wwww 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
dp3 r1.w , r2 , r2 
dp3 r0.w , r1 , r4 
add r1.w , -r1.wwww , c3.xxxx 
mad r2.w , r2.wwww , r0.wwww , r4.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r6 , c2.wwww 
mul r4.w , r0.wwww , r0.wwww 
add r1.xy , r0 , r1 
add r0.y , r0.zzzz , r1.zzzz 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r4.wwww , r4.wwww 
dp3 r0.z , r0 , r2 
add r4.xyz , r4 , c1.xxxx 
mad r2.w , r0.wwww , r0.zzzz , r2.wwww 
add r1.xy , r5 , c3.yyyy 
add r0.y , r5.zzzz , c3.yyyy 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mul r1.xyz , r3 , c9 
add r0.w , r1.yyyy , r1.xxxx 
add r0.w , r1.zzzz , r0.wwww 
mad r2.xyz , r0.wwww , c1.wwww , r1 
frc r5.xyz , r2 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
add r2.xyz , r2 , -r5 
dp3 r1.w , r0 , r4 
add r0.z , r2.yyyy , r2.xxxx 
dp3 r0.w , r4 , r4 
add r0.z , r2.zzzz , r0.zzzz 
add r0.w , -r0.wwww , c3.xxxx 
mad r0.xyz , r0.zzzz , -c2.zzzz , r2 
max r4.w , r0.wwww , c4.yyyy 
add r5.xyz , r1 , -r0 
add r0 , -r5.xxyy , r5.yzxz 
add r1.xy , -r5.zzzz , r5 
cmp r0 , r0 , c4.yyyy , c4.zzzz 
cmp r4.xy , r1 , c4.yyyy , c4.zzzz 
add r1.xy , r0.ywzw , r0.xzzw 
add r1.z , r4.yyyy , r4.xxxx 
mul r0.w , r4.wwww , r4.wwww 
add r0.xyz , r1 , c4.wwww 
mul r0.w , r0.wwww , r0.wwww 
cmp r0.xyz , r0 , c4.zzzz , c4.yyyy 
mad r0.w , r0.wwww , r1.wwww , r2.wwww 
add r4.xyz , r5 , -r0 
mad r4.w , r0.wwww , c6.yyyy , r3.wwww 
add r7.xyz , r4 , c2.zzzz 
mul r3.xyz , r3 , c10 
dp3 r0.w , r7 , r7 
mul r4.xyz , r2 , c2.wwww 
add r1.w , -r0.wwww , c3.xxxx 
add r6.xyz , r1 , c1.xxxx 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r0 , c2.wwww 
mul r0.w , r0.wwww , r0.wwww 
mul r3.w , r0.wwww , r0.wwww 
add r0.xyz , r4 , c4.xxxx 
add r2.xy , r1 , r0 
add r1.y , r1.zzzz , r0.zzzz 
texld r2 , r2 , s0 
mov r1.x , r2.wwww 
texld r1 , r1 , s0 
dp3 r0.w , r5 , r5 
add r1.w , -r0.wwww , c3.xxxx 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
max r0.w , r1.wwww , c4.yyyy 
dp3 r1.w , r1 , r7 
mul r0.w , r0.wwww , r0.wwww 
mul r3.w , r3.wwww , r1.wwww 
mul r2.w , r0.wwww , r0.wwww 
texld r1 , r0 , s0 
mov r0.w , r1.wwww 
texld r1 , r0.wzzw , s0 
cmp r6.xyz , r6 , c4.zzzz , c4.yyyy 
add r2.xyz , r5 , -r6 
add r2.xyz , r2 , c1.wwww 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
dp3 r1.w , r2 , r2 
dp3 r0.w , r1 , r5 
add r1.w , -r1.wwww , c3.xxxx 
mad r2.w , r2.wwww , r0.wwww , r3.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r6 , c2.wwww 
mul r3.w , r0.wwww , r0.wwww 
add r1.xy , r0 , r1 
add r0.y , r0.zzzz , r1.zzzz 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r3.wwww , r3.wwww 
dp3 r0.z , r0 , r2 
add r2.xyz , r5 , c1.xxxx 
mad r2.w , r0.wwww , r0.zzzz , r2.wwww 
add r1.xy , r4 , c3.yyyy 
add r0.y , r4.zzzz , c3.yyyy 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
add r0.w , r3.yyyy , r3.xxxx 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
add r0.w , r3.zzzz , r0.wwww 
dp3 r1.w , r0 , r2 
mad r0.xyz , r0.wwww , c1.wwww , r3 
dp3 r0.w , r2 , r2 
frc r1.xyz , r0 
add r2.z , -r0.wwww , c3.xxxx 
add r1.xyz , r0 , -r1 
max r0.w , r2.zzzz , c4.yyyy 
add r0.z , r1.yyyy , r1.xxxx 
mul r0.w , r0.wwww , r0.wwww 
add r0.z , r1.zzzz , r0.zzzz 
mul r0.w , r0.wwww , r0.wwww 
mad r0.xyz , r0.zzzz , -c2.zzzz , r1 
mad r3.w , r0.wwww , r1.wwww , r2.wwww 
add r4.xyz , r3 , -r0 
add r0 , -r4.xxyy , r4.yzxz 
add r2.xy , -r4.zzzz , r4 
cmp r0 , r0 , c4.yyyy , c4.zzzz 
cmp r2.xy , r2 , c4.yyyy , c4.zzzz 
add r5.xy , r0.ywzw , r0.xzzw 
add r5.z , r2.yyyy , r2.xxxx 
mul r3.xyz , r1 , c2.wwww 
add r0.xyz , r5 , c4.wwww 
dp3 r0.w , r4 , r4 
cmp r0.xyz , r0 , c4.zzzz , c4.yyyy 
add r1.w , -r0.wwww , c3.xxxx 
add r1.xyz , r4 , -r0 
max r0.w , r1.wwww , c4.yyyy 
add r6.xyz , r1 , c2.zzzz 
mul r1.w , r0.wwww , r0.wwww 
dp3 r0.w , r6 , r6 
mul r5.w , r1.wwww , r1.wwww 
add r1.w , -r0.wwww , c3.xxxx 
mul r1.xyz , r0 , c2.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r0.w , r0.wwww , r0.wwww 
add r0.xyz , r3 , c4.xxxx 
add r2.xy , r1 , r0 
add r1.y , r1.zzzz , r0.zzzz 
texld r2 , r2 , s0 
mov r1.x , r2.wwww 
texld r1 , r1 , s0 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
mul r0.w , r0.wwww , r0.wwww 
dp3 r1.w , r1 , r6 
add r2.xyz , r5 , c1.xxxx 
mul r2.w , r0.wwww , r1.wwww 
texld r1 , r0 , s0 
mov r0.w , r1.wwww 
texld r1 , r0.wzzw , s0 
cmp r5.xyz , r2 , c4.zzzz , c4.yyyy 
add r2.xyz , r4 , -r5 
add r2.xyz , r2 , c1.wwww 
mad r1.xyz , c2.xxxx , r1 , c2.yyyy 
dp3 r1.w , r2 , r2 
dp3 r0.w , r1 , r4 
add r1.w , -r1.wwww , c3.xxxx 
mad r2.w , r5.wwww , r0.wwww , r2.wwww 
max r0.w , r1.wwww , c4.yyyy 
mul r1.xyz , r5 , c2.wwww 
mul r5.w , r0.wwww , r0.wwww 
add r1.xy , r0 , r1 
add r0.y , r0.zzzz , r1.zzzz 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r5.wwww , r5.wwww 
dp3 r0.z , r0 , r2 
add r2.xyz , r4 , c1.xxxx 
mad r2.w , r0.wwww , r0.zzzz , r2.wwww 
add r1.xy , r3 , c3.yyyy 
add r0.y , r3.zzzz , c3.yyyy 
texld r1 , r1 , s0 
mov r0.x , r1.wwww 
texld r0 , r0 , s0 
dp3 r0.w , r2 , r2 
add r1.w , -r0.wwww , c3.xxxx 
max r0.w , r1.wwww , c4.yyyy 
mad r0.xyz , c2.xxxx , r0 , c2.yyyy 
mul r0.w , r0.wwww , r0.wwww 
dp3 r0.y , r0 , r2 
mul r0.z , r0.wwww , r0.wwww 
mad r0.w , c2.xxxx , r3.wwww , r4.wwww 
mad r0.z , r0.zzzz , r0.yyyy , r2.wwww 
add r0.w , r0.wwww , r0.zzzz 
mad_sat r0.z , r0.wwww , -c1.xxxx , -c1.xxxx 
mad r0.w , r0.zzzz , c5.xxxx , c5.yyyy 
mul r0.y , v0.yyyy , v0.yyyy 
mul r0.z , r0.zzzz , r0.zzzz 
mul r0.y , r0.yyyy , v0.yyyy 
mad oC0.xy , r0.wwww , r0.zzzz , -r0.yyyy 
mov oC0.zw , -c2.yyyy

This is what GPUSA says:

Code:
Shader Version = 3.0
Instruction Count = 508
ALU Instructions = 447, Texture Instructions = 48, ALU:Texture Ratio = 9.31
Constant Register Count = 5130
Temp Register Count = 8, Sampler Register Count = 16, Input Register Count = 10, Output Register Count = 5
Requires PS3.0
Uses Arbitrary Swizzle

On R600 this uses 16 registers, compiles to 244 instruction slots, of which 48 are texture instructions and runs at 241 MPixels per second.

Jawed
 
My guess, and this is merely a guess, is that NVIDIA has the capability to shut down ALUs, but little else in the chip. Maybe they can shut down entire clusters too, but my guess is that, at least on desktop parts, they won't power off the TMUs when the ALUs are idle.

What this means is that when you stress the ALUs, none of the chip is powered off. When you stress the TMUs, the ALUs are powered off. When you stress ROPs or memory bandwidth, entire clusters are powered off.

So if this is right (and I won't claim it perfectly is for sure!) it would explain why ALUs are a worst-case scenario for NVIDIA and not ATI. R6xx can likely save power when the TMUs are idling, especially RV670 given its laptop-like power saving features! :)
 
So you're suggesting that the other PS and VS tests merely don't do as good a job of testing pure ALU power?

I'm in no position to do that, but it may be interesting to compare the ratio of ALU instructions to total instructions in the other Shader tests... The Perlin test seems to be about 88% ALU operations.
 
So you're suggesting that the other PS and VS tests merely don't do as good a job of testing pure ALU power?
I guess other tests are bottlenecked by other things.

I've found some results:

http://forums.vr-zone.com/showthread.php?t=198459

8800GT - 145.33
8800GTS - 98.889

with scaling from 8800GTS that is in line with the theoretical ALU capabilities (46%).

R600 - 173:

http://www.vr-zone.com/?i=4946&s=14

So RV670 should be in the region of 180 I guess.
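
A minimal sketch of that scaling arithmetic follows. The scores are the ones quoted above; the unit counts and clocks are assumptions for illustration only (112 ALUs at 1500 MHz for the 8800GT, 96 ALUs at 1200 MHz for the 8800GTS, and core clocks of 742 MHz for R600 and 775 MHz for RV670), and the RV670 figure simply treats it as a clock-bumped R600.

```python
# Rough scaling check. Scores are the ones quoted above; unit counts and
# clocks are assumptions for illustration, not figures from the thread.

def pct_gain(new, old):
    """Percentage increase of new over old."""
    return (new / old - 1.0) * 100.0

# Measured scores (from the vr-zone results quoted above)
score_8800gt = 145.33
score_8800gts = 98.889
score_r600 = 173.0

# Assumed theoretical ALU rate = scalar ALU count * shader clock (MHz)
alu_8800gt = 112 * 1500   # assumed 8800GT configuration
alu_8800gts = 96 * 1200   # assumed 8800GTS configuration

measured_gain = pct_gain(score_8800gt, score_8800gts)
theoretical_gain = pct_gain(alu_8800gt, alu_8800gts)

# Assumed core clocks (MHz); RV670 is treated as a clock-bumped R600 here
r600_clock = 742
rv670_clock = 775
rv670_estimate = score_r600 * rv670_clock / r600_clock

print(f"measured 8800GT gain over 8800GTS: {measured_gain:.1f}%")
print(f"theoretical ALU gain:              {theoretical_gain:.1f}%")
print(f"RV670 estimate from clock scaling: {rv670_estimate:.1f}")
```

If the assumed configurations are right, the measured and theoretical gains land within about a percentage point of each other, which is what you would expect from a test that is limited by ALU throughput.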

I'm in no position to do that, but it may be interesting to compare the ratio of ALU instructions to total instructions in the other Shader tests... The Perlin test seems to be about 88% ALU operations.
On R600 it's 197:48 instruction slots (I was 1 short in my earlier posting), that's ~4:1 ALU:TEX, which, because the R600 is a 4:1 architecture, comes out as 1.03:1. Though the conditional statements in the code can increase that ratio somewhat (though not on R5xx for some reason? and hardly at all on RV630 and not at all on RV610).

Assuming 8800GT issues 168B pixel shader instructions per second (112*1500MHz), the 190M pixels per second fillrate (145*1280*1024) means the shader is around 880 instruction slots (similar calculation for R600 produces about 6% error and driver versions aren't the same).
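
The estimate above can be reproduced with a short sketch. It assumes, as the post does, 112 scalar ALUs at 1500 MHz each issuing one instruction per clock, a 1280x1024 render target, and every pixel being shaded exactly once per frame; the function name is just for illustration.

```python
# Back-of-envelope estimate of executed instruction slots per pixel.
# Assumptions: peak issue rate on every ALU every clock, and each pixel
# of a 1280x1024 frame shaded exactly once.

def slots_per_pixel(alus, shader_clock_hz, fps, width, height):
    instr_per_sec = alus * shader_clock_hz    # peak scalar issue rate
    pixels_per_sec = fps * width * height     # shaded pixels per second
    return instr_per_sec / pixels_per_sec

estimate = slots_per_pixel(112, 1500e6, 145.33, 1280, 1024)
print(f"~{estimate:.0f} instruction slots per pixel")
```

This is an upper bound rather than a measurement, since any stall or idle cycle on the ALUs lowers the real count. It is not directly comparable with the 447 ALU instructions GPUSA reports either, because a scalar design needs several slots for each vector instruction in the D3D assembly.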

---

I've discovered:

Feature test - Pixel Shader
One of the more complex materials in the graphics tests is the rock face shader. This is separated to a feature test, showing the lighting change on the rough surface. There are no dynamic shadows, which makes some space for additional instructions compared to the similar shader in Canyon Flight. There is also no water surface, just the rock face. Filling the screen with a rock face is naturally fairly fast, compared to the graphics test showing huge amounts of that rock face in addition to water, the air ship and sea monster. This test will be somewhat bandwidth dependent, since any game like material with a complex shader like this will also have a number of lookups to large textures. It seems like most PC games will mainly utilize to normal color maps that have been made during development, instead of burdening the pixel shader with creating procedural textures.
which doesn't show much sign of being an ALU test - it scales by 33% from 8800GTS to 8800GT.

I think this is the code for it:

Code:
ps_3_0
 
def c10 , 2.000000000000000000000000, -1.000000000000000000000000, 0.250000000000000000000000, 16.000000000000000000000000
dcl_texcoord0  v0.xy 
dcl_texcoord1  v1.xyz 
dcl_texcoord2  v2.xyz 
dcl_texcoord3  v3.xyz 
dcl_texcoord4  v4.xyz 
dcl_texcoord7  v5.xyz 
dcl_2d s0 
dcl_2d s1 
dcl_2d s2 
dcl_2d s3 
dcl_2d s4 
dcl_2d s5 
dcl_2d s6 
dcl_2d s7 
dcl_2d s8 
dcl_2d s9 
dcl_cube s10 
dcl_2d s11 
mul r0.xy , v0 , c2 
texld r0 , r0 , s0 
mul r1.xy , v0 , c3 
texld r1 , r1 , s1 
add r0 , r0 , r1 
mul r1.xy , v0 , c4 
texld r1 , r1 , s2 
add r0 , r0 , r1 
mul r1.xy , v0 , c5 
texld r1 , r1 , s3 
mul r2.xy , v0 , c6 
texld r2 , r2 , s4 
mad_pp r3.xyz , c10.xxxx , r2.wyzw , c10.yyyy 
mul r2.xy , v0 , c7 
texld r2 , r2 , s5 
mad_pp r2.xyz , c10.xxxx , r2.wyzw , r3 
add_pp r3.xyz , r2 , c10.yyyy 
mul r2.xy , v0 , c8 
texld r2 , r2 , s6 
mad_pp r2.xyz , c10.xxxx , r2.wyzw , r3 
add_pp r3.xyz , r2 , c10.yyyy 
mul r2.xy , v0 , c9 
texld r2 , r2 , s7 
mad_pp r2.xyz , c10.xxxx , r2.wyzw , r3 
add_pp r2.xyz , r2 , c10.yyyy 
add r0 , r0 , r1 
mul_pp r6.xyz , r2 , c10.zzzz 
mul_pp r2 , r0 , c10.zzzz 
dp3_pp r0.x , r6 , v1 
dp3_pp r0.y , r6 , v2 
dp3_pp r0.z , r6 , v3 
texld_pp r0 , r0 , s10 
texld_pp r1 , v0 , s9 
mul_pp r0 , r0 , r1.xxxx 
mul_pp r0 , r2 , r0 
nrm_pp r3.xyz , v5 
dp3_sat_pp r1.z , r6 , r3 
add_sat_pp r1.w , r3.zzzz , r3.zzzz 
mul_pp r1.z , r1.zzzz , r1.wwww 
nrm_pp r7.xyz , v4 
mov r4.xyz , c0 
mul_pp r5.xyz , r4 , c1.xxxx 
add_pp r8.xyz , r3 , r7 
mul_pp r4.xyz , r1.zzzz , r5 
nrm_pp r3.xyz , r8 
dp3_pp r8.x , r6 , r3 
dp3_pp r8.y , r6 , r6 
texld_pp r3 , r8 , s11 
rsq_pp r1.z , r8.yyyy 
mul_pp r6.xyz , r6 , r1.zzzz 
dp3_sat_pp r1.z , r6 , r7 
mul_pp r1.w , r1.wwww , r3.xxxx 
add_pp r1.z , -r1.zzzz , -c10.yyyy 
mul_pp r2.xyz , r2 , r4 
mul_pp r1.w , r1.wwww , r1.zzzz 
mul_sat_pp r4.xyz , r1.xxxx , r2 
mul_pp r3.xyz , r5 , r1.wwww 
texld r2 , v0 , s8 
mul_pp r5.xyz , r2 , c10.wwww 
add r2.xyz , r0 , r4 
mul_pp r3.xyz , r3 , r5 
mad r2.xyz , r4 , -r0 , r2 
mul_sat_pp r0.xyz , r1.xxxx , r3 
add r1.xyz , r2 , r0 
mov_pp oC0.w , r0.wwww 
mad oC0.xyz , r0 , -r2 , r1

GPUSA says:

Code:
Shader Version = 3.0
Instruction Count = 85
ALU Instructions = 54, Texture Instructions = 12, ALU:Texture Ratio = 4.50
Constant Register Count = 4471
Temp Register Count = 9, Sampler Register Count = 16, Input Register Count = 14, Output Register Count = 5
Requires PS3.0
Uses Partial Precision
Uses Arbitrary Swizzle

It's 45 instruction slots, 12 texture instructions, so an effective ratio of 8.25:12 ALU:TEX (though dynamic branching worst case, according to GPUSA, brings that up to almost 1:1).
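
For comparison, here is a minimal sketch of the ALU:TEX arithmetic for both shaders on a 4:1 design, using only the slot counts and GPUSA summaries quoted in this thread. The helper name is hypothetical, and the calculation ignores the dynamic branching worst case.

```python
# ALU:TEX balance on a 4:1 architecture, using the slot counts quoted in
# the thread. The function name and structure are illustrative only.

def effective_alu_tex(total_slots, tex_instructions, alu_per_tex=4):
    """Return (ALU slots, normalised ALU:TEX ratio) for an N:1 design."""
    alu_slots = total_slots - tex_instructions
    # Divide by the hardware ratio: 1.0 means ALU and TEX are evenly loaded
    return alu_slots, (alu_slots / alu_per_tex) / tex_instructions

# Perlin noise shader on R600: 197 ALU + 48 TEX slots (the corrected count)
perlin_alu, perlin_ratio = effective_alu_tex(197 + 48, 48)

# Rock face shader: 45 slots, 12 of them texture instructions
rock_alu, rock_ratio = effective_alu_tex(45, 12)

# Share of ALU instructions in the D3D assembly, from the GPUSA summaries
perlin_alu_share = 447 / 508
rock_alu_share = 54 / 85

print(f"Perlin:    {perlin_alu} ALU slots, ratio {perlin_ratio:.2f}:1, "
      f"{perlin_alu_share:.0%} ALU instructions")
print(f"Rock face: {rock_alu} ALU slots, ratio {rock_ratio:.2f}:1, "
      f"{rock_alu_share:.0%} ALU instructions")
```

A normalised ratio below 1 means the texture units are the busier side on a 4:1 part, which is the case for the rock face shader (8.25:12) but not for Perlin noise.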

So, if that's the test showing "anomalous" power consumption, then we could be simply looking at the higher performance of NVidia GPUs (guessing ~30%), with the ATI GPUs perhaps being texture rate limited...

Jawed
 
I can tell you that in practice in real games that my 8800GTX's heat output varies a lot. I just listen to the fan. It's interesting; depending on what you are looking at in the game, the fan will change speed. :) Some games will make it run full speed almost the entire time, while others may not get it to ramp up much at all. I assume this is all caused by the ALU load....
 