3d Mark - do NVidia shift more load onto the CPU?

g__day

I just got to wondering about something NVidia said recently. Does anyone know - when you benchmark a GPU using 3DMark 2003 v340, comparing top-end ATi and NVidia cards - whether NVidia manage to offload more "traditional" GPU workloads onto the CPU than ATi do? Is there a way of measuring this?

As 3DMark 2003 is GPU-heavy, a big CPU might at times be running at less than full capacity. Has NVidia realised this and found a way to migrate some of the intended GPU workload to the CPU? When you run the test on ATi vs NVidia, does the CPU reach higher utilisation when benchmarking NVidia, or is it impossible to tell?

Do Futuremark have any position on IHVs altering the split of what gets done by the CPU and what gets assigned to the GPU? If you could re-balance 5-10% of the workload when your CPU isn't maxed out, that might give you quite a nice edge, I thought.
 
g__day said:
I just got to wondering about something NVidia said recently. Does anyone know - when you benchmark a GPU using 3DMark 2003 v340, comparing top-end ATi and NVidia cards - whether NVidia manage to offload more "traditional" GPU workloads onto the CPU than ATi do? Is there a way of measuring this?

As 3DMark 2003 is GPU-heavy, a big CPU might at times be running at less than full capacity. Has NVidia realised this and found a way to migrate some of the intended GPU workload to the CPU? When you run the test on ATi vs NVidia, does the CPU reach higher utilisation when benchmarking NVidia, or is it impossible to tell?

Do Futuremark have any position on IHVs altering the split of what gets done by the CPU and what gets assigned to the GPU? If you could re-balance 5-10% of the workload when your CPU isn't maxed out, that might give you quite a nice edge, I thought.
On a Radeon 9700 there would probably be relatively little to gain because the vertex shading performance is extremely high - on parts with weaker vertex shaders there could be some performance gains in 3DMark03 by shifting some geometry processing from the VPU to the CPU. Since 3DMark03 is focussed on trying to do as much work as possible on the VPU (and since you have a known, guaranteed workload) there are certainly opportunities to use some slack time on the CPU. In addition to this, by moving some geometry on to the CPU you might be able to bypass some repeated workload completely (like vertex skinning in shadow volume generation) by performing it statically. Again, on a 9700 any benefits from this would probably be pretty intangible because the vertex shading throughput is very high in general.
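
To illustrate the "performing it statically" idea with a minimal sketch (purely illustrative - this isn't any real driver code, and all the names are made up): because the benchmark's animation is a known sequence, you could skin each mesh once per frame on the CPU and let every shadow-volume and lighting pass reuse the result, instead of repeating the skinning maths in the vertex shader for each pass.

Code:
// A minimal sketch of CPU-side vertex skinning done once per frame.
// Every subsequent shadow-volume / lighting pass reads the pre-skinned
// positions, so the VPU never repeats the skinning work.
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };          // hypothetical 4x4 bone matrix

struct SkinnedVertex {
    Vec3  position;
    int   bone[4];                       // bone indices (unused slots weight 0)
    float weight[4];                     // blend weights
};

static Vec3 transform(const Mat4& m, const Vec3& v) {
    return { m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3],
             m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3],
             m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3] };
}

// Skin the whole mesh once on the CPU for this frame's bone matrices.
std::vector<Vec3> SkinOnCpu(const std::vector<SkinnedVertex>& mesh,
                            const std::vector<Mat4>& bones)
{
    std::vector<Vec3> out(mesh.size());
    for (std::size_t i = 0; i < mesh.size(); ++i) {
        Vec3 p = {0, 0, 0};
        for (int j = 0; j < 4; ++j) {
            Vec3 t = transform(bones[mesh[i].bone[j]], mesh[i].position);
            p.x += t.x * mesh[i].weight[j];
            p.y += t.y * mesh[i].weight[j];
            p.z += t.z * mesh[i].weight[j];
        }
        out[i] = p;
    }
    return out;
}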

Load balancing optimisations are relatively easy to do if you choose a special case - in the more general case it is hard to load-balance effectively. The workload from frame to frame is not known, and the amount of CPU slack time may vary wildly with the amount of physics/AI being processed at the time. Unless you are very careful you could end up with significant performance pops, or jerky frame rates as you alter the amount of work done on the CPU. Alternatively you might simply hurt your performance by picking a poor distribution of work.
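
As a toy sketch of why the general case is awkward (entirely hypothetical - no driver is claimed to work this way): the only information you have about this frame's CPU slack is last frame's, so a sudden change in physics/AI load makes the split flip between the two paths and the frame rate pops.

Code:
// Hypothetical per-frame load-balance decision - a sketch of the problem,
// not a description of any real driver. The slack estimate is a frame old,
// so a physics/AI spike makes the work bounce between CPU and VPU.
enum class GeometryPath { VPU, CPU };

GeometryPath ChooseGeometryPath(double cpuSlackMsLastFrame,
                                double estimatedCpuSkinningMs)
{
    // Naive rule: use the CPU only if last frame's slack would have covered it.
    if (cpuSlackMsLastFrame > estimatedCpuSkinningMs)
        return GeometryPath::CPU;   // fine until an AI/physics spike appears...
    return GeometryPath::VPU;       // ...then the work snaps back to the VPU
}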
 
andypski - totally agree. Given NVidia have the opportunity to monitor exact CPU/GPU loads with their cards in this benchmark - probably at better than second-by-second resolution - I wonder if their team found slack spots on big CPUs during some of the tests and used a clever driver to spot those shaders and transfer the workload to the CPU?
 
andypski said:
Load balancing optimisations are relatively easy to do if you choose a special case - in the more general case it is hard to load-balance effectively. The workload from frame to frame is not known, and the amount of CPU slack time may vary wildly with the amount of physics/AI being processed at the time. Unless you are very careful you could end up with significant performance pops, or jerky frame rates as you alter the amount of work done on the CPU. Alternatively you might simply hurt your performance by picking a poor distribution of work.

Aren't there surprising drops in GFFX framerates in some games? Is it conceivable that what you mentioned above is the reason for this?
 
AFAIK most 3DMark cheats this time come from vertex shader replacements. I doubt that they would replace a shader to have it run on the CPU, and looking at workstation performance they have pretty good geometry units. Which brings up a nice question: are they DX9 compliant?
There was a question some time ago about 5200 waivers (spelling?) from Microsoft regarding DX9 compliance, but we (I) don't know if they relate to the whole nv3x line. And maybe, when asked for compliance, Nvidia implemented certain functions in software, as that is an easy way out when dealing with geometry.
For "benchmarking purposes" of course they replaced those shaders with something "hand-tuned" to their architecture, running them on hardware. Since v340 disabled shader replacements (the unified compiler technology) they had to run them in software. I think there was a statement of sorts from Gainward, just that they didn't realise they were talking about vertex shaders.
Of course everybody laughed about running PS on the CPU.....
 
vb said:
AFAIK most 3DMark cheats this time come from vertex shader replacements. I doubt that they would replace a shader to have it run on the CPU, and looking at workstation performance they have pretty good geometry units.
Workstation tests measure fixed-function T&L performance, not vertex shader performance. The two cases are not necessarily equivalent.
 
andypski said:
Workstation tests measure fixed-function T&L performance, not vertex shader performance. The two cases are not necessarily equivalent.

Vertex shader units have high performance in a very specific situation. I doubt they are highly programmable, and I have a hunch they are not very DX9 compliant. nv1x and nv2x only used their geometry units to full capacity as workstation cards; they had no reason to "tune" the vertex units for DX performance. All this just asks for vertex shader replacement.
 
andypski said:
Workstation tests measure fixed-function T&L performance, not vertex shader performance. The two cases are not necessarily equivalent.

Actually, that's a very good point. Some T&L and VS NV30/NV35/NV38 numbers never made sense in my book.
Either they've got some additional fixed T&L units, or they've got some amazing VS power in certain circumstances (fixed T&L among them) that isn't particularly easy to expose automatically - and then that'd mean they could do some replacements with hardware microcode, improving VS speed quite a bit...

I'd suspect extra T&L units, although maybe it's possible to program the hardware manually so the T&L units are used for specific things which could traditionally also be done in fixed T&L (so if part of a shader can be done in fixed T&L and part can't, T&L+VS power is used for the part which can, and VS power only is used for the other part).

Just speculating here. Plus, anyway, I shouldn't be posting all this - gotta begin monitoring how much time I waste on these forums now ;) :p :)


Uttar
 
Tridam said:
I checked this last week: no CPU usage difference between v330 and v340 in Mother Nature.

if(Tridam has done the testing correctly)
{
Thank You;
}
else
{
BOOOOOOOO! HISSSSSSSS! Go back and redo the testing! :mad:
}

:D

So how did you do the testing?
Don't tell me all you had done was run 3dmark in the background and view the task manager's performance tab?

That sounds like something I would do. :LOL:
 
What amount of overhead does the driver's compiler place on the CPU?

When it comes to analyzing instruction and register usage, is most of that done by the processor, or is some of the work done on the GPU?
 
3dilettante said:
What amount of overhead does the driver's compiler place on the CPU?

When it comes to analyzing instruction and register usage, is most of that done by the processor, or is some of the work done on the GPU?
I'm not too sure how that's relevant, since it doesn't do this at runtime, but when the shader is handed to the driver, which is during the scene's initialization. Anyway, it's probably all done in software. Having such a compiler in silicon would be rather wasteful.
 
K.I.L.E.R said:
Tridam said:
I checked this last week: no CPU usage difference between v330 and v340 in Mother Nature.

if(Tridam has done the testing correctly)
{
Thank You;
}
else
{
BOOOOOOOO! HISSSSSSSS! Go back and redo the testing! :mad:
}

:D

So how did you do the testing?
Don't tell me all you had done was run 3dmark in the background and view the task manager's performance tab?

That sounds like something I would do. :LOL:

:LOL: You won't get any result with the task manager. I mean it will show you 100% CPU usage whatever you're doing in 3dmark. (yeah, I tried this some months ago :LOL:)

My testing was this: forcing 3dmark to skip pixel shading and running Mother Nature in 320x200. The result? A 4% difference between v330 (68.1) and v340 (65.4). It would have been stupid to move some work to the CPU just to improve the score by 4% at 320x200 (-> 1 more point in the 1024x768 overall score).
 
May I suggest something? 8)

Measure CPU scaling. I.e. take a rig with a fast CPU, run the benchmark, drop in a slow CPU, repeat, compare - voila: scaling. A multiplier-unlocked AMD rig would do the job nicely. Preferably at 640x480.

Then you need a reference to compare to ... ummm ATI? :p
 
Testing this wouldn't be simple, because I am wondering whether NVidia use more of the available CPU power than ATi do, and it's hard to measure in real time what ATi or NVidia are actually consuming. Yes, scaling the CPU for NVidia and ATi would show you the performance curve or line - but not necessarily the CPU load. Measuring CPU temperature might even be more accurate.
 
So it's not 640x480. BOOHOO. The driver is 53.03, by the way. It was installed before, and it doesn't really matter (or at least I don't THINK it does... maybe it does, because this score is 24 points lower than my 52.16 score).

Anyway, without further ado, 53.03 with a 5700 Ultra and a Barton at 2.2GHz:
Code:
Game Tests
GT1 - Wings of Fury	154.0 fps
GT2 - Battle of Proxycon	25.9 fps
GT3 - Troll's Lair	21.2 fps
GT4 - Mother Nature	21.7 fps

CPU Tests
CPU Test 1	78.4 fps
CPU Test 2	12.8 fps

Feature Tests
Fill Rate (Single-Texturing)	1155.2 MTexels/s
Fill Rate (Multi-Texturing)	1524.0 MTexels/s
Vertex Shader	17.1 fps
Pixel Shader 2.0	20.9 fps
Ragtroll	14.0 fps

Now, 53.03 with a Barton set to 5 (the lowest multiplier possible) x 200, so 1GHz:

Code:
Game Tests
GT1 - Wings of Fury	109.8 fps
GT2 - Battle of Proxycon	24.4 fps
GT3 - Troll's Lair	18.9 fps
GT4 - Mother Nature	20.1 fps

CPU Tests
CPU Test 1	42.2 fps
CPU Test 2	7.1 fps

Feature Tests
Fill Rate (Single-Texturing)	1155.9 MTexels/s
Fill Rate (Multi-Texturing)	1524.3 MTexels/s
Vertex Shader	16.0 fps
Pixel Shader 2.0	20.9 fps
Ragtroll	13.4 fps

Total score: 3918 vs. 3372.
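
As a quick sketch (just re-deriving percentages from the numbers I posted above, nothing more), the per-test drop from the 2.2GHz run to the 1GHz run works out like this:

Code:
// Back-of-the-envelope on the two runs above: percentage drop per test
// when going from the Barton at 2.2GHz to 1GHz (figures copied from the post).
#include <cstdio>

int main() {
    struct { const char* name; double fast, slow; } tests[] = {
        { "GT1 - Wings of Fury",      154.0, 109.8 },
        { "GT2 - Battle of Proxycon",  25.9,  24.4 },
        { "GT3 - Troll's Lair",        21.2,  18.9 },
        { "GT4 - Mother Nature",       21.7,  20.1 },
        { "CPU Test 1",                78.4,  42.2 },
        { "CPU Test 2",                12.8,   7.1 },
    };
    for (const auto& t : tests)
        std::printf("%-26s %.1f%% slower at 1GHz\n",
                    t.name, (t.fast - t.slow) / t.fast * 100.0);
    return 0;
}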

Since I did this, do me a favor and try it on an ATI card before drawing any conclusions. :p I'm off to go add that 1200Mhz back...
 
Best I could do on my K7N2-L for a low was 7x; the highest I run her at is 12x, so I'll give you both. (Barton 2500+, 175FSB......Bubbles. ;) )

12x: http://service.futuremark.com/compare?2k3=1625924


GT1 - Wings of Fury 127.2
GT2 - Battle of Proxycon 21.3
GT3 - Troll's Lair 19.5
GT4 - Mother Nature 19.7

CPU Test 1 65.0
CPU Test 2 10.3

Fill Rate (Single-Texturing) 952.6
Fill Rate (Multi-Texturing) 1542.6


Vertex Shader 10.4
Pixel Shader 2.0 28.7
Ragtroll 13.9

No sounds 40.2
24 sounds 32.6
60 sounds Not Supported

7x: http://service.futuremark.com/compare?2k3=1625986

GT1 - Wings of Fury 113.5
GT2 - Battle of Proxycon 21.3
GT3 - Troll's Lair 19.0
GT4 - Mother Nature 19.7

CPU Test 1 42.8
CPU Test 2 6.7

Fill Rate (Single-Texturing) 952.5
Fill Rate (Multi-Texturing) 1542.6

Vertex Shader 10.4
Pixel Shader 2.0 28.7
Ragtroll 13.7

No sounds 30.9
24 sounds 22.5
60 sounds Not Supported

I gotta go re-boot, I can't handle the thought of running at 1.225Ghz....me wife's rig is faster! (Hmmm...mebbe I should do a quick 3dm2k1se run just for giggles... :| )
 
DW, the first link doesn't work.

Looking at the results, GT1 showing a difference isn't really surprising given its DX7 background (Dave even mentions this test is more CPU-dependent in his 3DMark article http://www.beyond3d.com/articles/3dmark03/intro/).

However, the fact that all three of the other tests give slightly different results on the 5700 is a little suspicious, since Digi's 9600 scores don't change by more than 0.5 of a frame. Since these tests are more shader-bound, they really shouldn't be affected by CPU speed (at least in theory).

I would tentatively propose that either ATi's cards or drivers are less affected by system variables when the GPU/VPU is being stressed, or there's something to the whole ForceWare-offloading-to-the-CPU theory.

If someone could do the same or similar tests with the 52.16, we'll know if the drivers are responsible for the fluctuations.
 
Is offloading to the CPU wrong? Well, as long as nvidia are doing it at a low priority (which probably wouldn't work because of the high latency of the thread switch) I don't think there is a problem - it's like saying all software T&L cards should be abolished. Of course there is one more little condition: they would have to do it for all D3D apps.

The obvious way to look into this is also to set up a performance log on the CPU (at least this can be done in XP) as well as the other tests that have been done.
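
Something like this is what I mean by a performance log - just a sketch using the standard PDH counters (the counter path and the 500ms interval are only examples), and keep in mind Tridam's point that the CPU already reads ~100% during the run, so the absolute numbers may not tell you much on their own.

Code:
// Sketch of logging CPU usage on Windows (XP and up) with the PDH API while
// the benchmark runs. Counter path and interval are illustrative only.
// Build with: cl cpulog.cpp pdh.lib
#include <windows.h>
#include <pdh.h>
#include <cstdio>

int main() {
    PDH_HQUERY   query   = NULL;
    PDH_HCOUNTER counter = NULL;

    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Processor Time"),
                  0, &counter);
    PdhCollectQueryData(query);                      // prime the counter

    std::FILE* log = std::fopen("cpu_log.csv", "w");
    for (int sample = 0; sample < 600; ++sample) {   // ~5 minutes of samples
        Sleep(500);
        PdhCollectQueryData(query);
        PDH_FMT_COUNTERVALUE value;
        PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
        std::fprintf(log, "%d,%.1f\n", sample, value.doubleValue);
    }
    std::fclose(log);
    PdhCloseQuery(query);
    return 0;
}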

Just something of interest: the ATI boards generally perform worse in the CPU tests, so if anything lowering the CPU frequency should have a larger effect on ATI cards.
 
bloodbob said:
Is offloading to the CPU wrong? Well, as long as nvidia are doing it at a low priority (which probably wouldn't work because of the high latency of the thread switch) I don't think there is a problem - it's like saying all software T&L cards should be abolished. Of course there is one more little condition: they would have to do it for all D3D apps.

Lightening the workload of the GPU is outlawed by Futuremark, if I remember correctly.
 