MDolenc's Fillrate Tester: GF FX FP32/FP16 Performance

Dave -> it seems that by default NVIDIA decreases precision with the GeForce FX 5800/5600/5200 but not with the FX 5900. Maybe you have not benchmarked the same thing? So the difference between the FX 5800 and FX 5900 could be bigger?

Another possibility: maybe the new units of the FX 5900 have some limitations, so the shader in MDolenc's fillrate tester might not be able to use them.

I think you should use Humus' Mandelbrot demo to make sure that the precision is the same :)
 
And I'd suggest you use my Dawn patches with FRAPS ;) Yes, yes, I know, I'm annoying.

But IMO, it's a better test than all those DX9 things, because of all the doubts about precision with the DetFX in DX9.
With Dawn, however, it's all proprietary extensions. And nVidia obviously wouldn't cheat in their own demos.

Although discovering exactly what happens to precision in general DX9 programs would be a good thing, too...


Uttar
 
Uttar said:
And I'd suggest you use my Dawn patches with FRAPS ;) Yes, yes, I know, I'm annoying.

But IMO, it's a better test than all those DX9 things, because of all the doubts about precision with the DetFX in DX9.
With Dawn, however, it's all proprietary extensions. And nVidia obviously wouldn't cheat in their own demos.

;)

Using NVIDIA's own stuff to discover what they don't really want us to know is 8)
 
Very true, Uttar; why wouldn't Nvidia use the most optimal code and extensions for its processors?

By the way, you just won't let go of Dawn will you ;) .

It is also true that Nvidia seems to default to lower precision on anything less than NV35. If these 5800 Ultra scores are obtained with the driver defaulting to FX12 or FP16 precision, the 5900's scores are much more impressive.
 
MDolenc said:
Dave: Yes this test does use 2 texture lookups (and 3 registers)...

Well, looking at the texture performance differences relative to fillrate, the added bandwidth of NV35 may account for a small increase in the PS tests as well, though not a great one. However, I think better register handling may be one of the reasons for the increased performance.

MDolenc said:
What does PS_2_0 - Simple test say?

Wassat you say :?:

kid_crisis said:
Dave, were the 5800 and 5900 tested at stock clocks, i.e. 500 and 450 MHz respectively?

Yes.

Tridam said:
Dave -> it seems that by default NVIDIA decreases precision with the GeForce FX 5800/5600/5200 but not with the FX 5900. Maybe you have not benchmarked the same thing? So the difference between the FX 5800 and FX 5900 could be bigger?

From a marketing standpoint that makes zero sense. NV30 is dead and buried and they are trying their hardest to forget it - if you want to show that the 5900 is 2X better than its predecessor then you would want to do entirely the opposite.

Tridam said:
Another possibility: maybe the new units of the FX 5900 have some limitations, so the shader in MDolenc's fillrate tester might not be able to use them.

Well, I can think of no other architecture where you add more units and hardly get much of a performance increase - would there be much point in implementing units that are so limited that applications can't take advantage of them?

Tridam said:
I think you should use Humus' Mandelbrot demo to make sure that the precision is the same :)

If I can, I think I will.
 
I think Humus' Mandelbrot demo comes in both OpenGL and DirectX versions, so you can see how shader precision is handled on the FX cards in both APIs.
 
Here are some numbers from Wavey's run of MDolenc's application which we can use to evaluate NV35:
             5800 Ultra    5900 Ultra    % difference (stock)    % difference (clock for clock)
FP PS 2.0    121.043259    149.738754    23.71                   33.71
PP PS 2.0    163.160095    181.771698    11.41                   21.41
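
For anyone who wants to check or extend these figures, here is a minimal Python sketch of how they can be derived (my addition, not part of the thread; it assumes the 500/450 MHz stock clocks confirmed above, and treats a per-MHz normalisation as one reasonable reading of "clock for clock"):

Code:
# Scores from Wavey's run of MDolenc's fillrate tester, as tabulated above.
scores = {
    "FP PS 2.0": (121.043259, 149.738754),   # (5800 Ultra, 5900 Ultra)
    "PP PS 2.0": (163.160095, 181.771698),
}
clk_5800, clk_5900 = 500.0, 450.0            # stock clocks in MHz

for test, (nv30, nv35) in scores.items():
    stock   = (nv35 / nv30 - 1) * 100                            # advantage at stock clocks
    per_mhz = ((nv35 / clk_5900) / (nv30 / clk_5800) - 1) * 100  # advantage per MHz
    print(f"{test}: +{stock:.2f}% at stock, +{per_mhz:.2f}% per MHz")

This reproduces the stock percentages (23.71% and 11.41%). Note that the clock-for-clock column above simply adds the 10% clock gap, while a strict per-MHz comparison comes out a little higher (roughly +37% and +24%).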
 
I guess I should mention that my precision test program works only with AA & AF turned off (actually AF shouldn't matter because I don't use textures in the precision test, but AA can cause trouble since R300's AA is gamma corrected). And I have no idea why there are results like 0/8 bits or 245 bits :)

The 16-bit result on the Radeon 9700 is correct because the Radeon's 24-bit FP format is s7e16. 10 bits on the GeForce FX means it uses 16-bit FP (s5e10). I remember a friend tested it on an NV35 and got 23 bits, which means it uses 32-bit FP (s8e23). However, all NV30/31/34 cards seem to report 10 bits in both normal and _pp modes. Driver version is also important. Has anyone tested it with the latest Detonator?
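
For readers wondering how a rendered output can report "mantissa bits" at all, here is a minimal CPU-side sketch of the general idea in Python (an illustration of the technique, not pcchen's actual shader): keep halving an epsilon until 1 + eps rounds back to 1, and count how many halvings still produce a distinguishable result.

Code:
import numpy as np

def mantissa_bits(dtype):
    # Halve eps until 1 + eps rounds back to 1; the count of distinguishable
    # steps equals the mantissa width of the format.
    one, eps, bits = dtype(1), dtype(0.5), 0
    while one + eps != one:
        bits += 1
        eps = dtype(eps / 2)
    return bits

print(mantissa_bits(np.float16))  # 10 -> s5e10, i.e. FP16
print(mantissa_bits(np.float32))  # 23 -> s8e23, i.e. FP32
# FP24 (s7e16) has no host-side equivalent, but the same probe run in a pixel
# shader is what reports 16 bits on the Radeon 9700, as noted above.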
 
DaveBaumann said:
Tridam said:
Dave -> it seems that by default NVIDIA decreases precision with the GeForce FX 5800/5600/5200 but not with the FX 5900. Maybe you have not benchmarked the same thing? So the difference between the FX 5800 and FX 5900 could be bigger?

From a marketing standpoint that makes zero sense. NV30 is dead and buried and they are trying their hardest to forget it - if you want to show that the 5900 is 2X better than its predecessor then you would want to do entirely the opposite.

If you only think about NV30, you're right. But NV30/31/34 use the same pipeline and the same "optimizations". NVIDIA doesn't want the entire FX line to look desperate, and for this reason they had to decrease the precision. Every detail I have points to this being the case. If you run some precision tests on NV30 and NV35, you'll see it. DirectX 9's default precision is FP24 (-> FP32 on the GeForce FX boards). NVIDIA changes it to FP16 or FX12 (maybe FX12 with a tweaked range).
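
To put a number on what such a downgrade means, here is a small Python sketch comparing the three precisions Tridam mentions. It assumes FX12 is NV30's 12-bit signed fixed-point format with a roughly [-2, 2) range (about 10 fractional bits); treat those format details as my assumption, not something stated in this thread.

Code:
import numpy as np

def to_fx12(x, frac_bits=10, lo=-2.0, hi=2.0 - 2.0 ** -10):
    # Quantise to the assumed FX12 layout: 12-bit signed fixed point, ~[-2, 2) range.
    step = 2.0 ** -frac_bits
    return float(np.clip(np.round(x / step) * step, lo, hi))

x = 1.0 / 3.0
print(to_fx12(x))            # ~0.3330    (steps of ~0.001; values outside [-2, 2) clamp)
print(float(np.float16(x)))  # ~0.33325   (FP16: roughly 3 significant decimal digits)
print(float(np.float32(x)))  # ~0.3333333 (FP32: roughly 7 significant decimal digits)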

DaveBaumann said:
Tridam said:
Another possibility: maybe the new units of the FX 5900 have some limitations, so the shader in MDolenc's fillrate tester might not be able to use them.

Well, I can think of no other architecture where you add more units and hardly get much of a performance increase - would there be much point in implementing units that are so limited that applications can't take advantage of them?

Maybe there are dependency issues? Maybe the new units (if they exist) can't execute every instruction?
 
Can someone test the 5800 Ultra with pcchen's precision app? If the precision app indicates 10 bits, would that mean that, regardless of full precision being requested (i.e. in MDolenc's fillrate app), the driver forces FP16?
 
If the precision app reports 10 bits, would that mean that, even when full precision is specified (i.e. in MDolenc's fillrate app), the precision remains FP16?

Don't bet on that for this app.
 
DaveBaumann said:
From a marketing standpoint that makes zero sense. NV30 is dead and buried and they are trying their hardest to forget it - if you want to show that the 5900 is 2X better than its predecessor then you would want to do entirely the opposite.

I gotta agree with you there. Of course, they may want to try to make people believe the NV35 got higher IQ too or some stuff like that, but they obviously haven't, so yes, it makes zero sense.

From my understanding, the NV35 cannot really do FX12 operations in the fragment shader. Only FP16/FP32. That means if they defaulted to FX12, you might get FP16 on the NV35: and that would result in increased IQ.

Frankly, it's hard to say exactly what nVidia is doing for DX9 / OpenGL ARB Paths. Heck, they could be doing insane stuff like forcing FX12 when in Performance and forcing FP16 in Quality. There are a LOT of possibilities. I insist that to discover the true performance of FX12/FP16/FP32 on the NV30 & NV35, you've got to use nVidia's proprietary OpenGL extensions.


Uttar
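
For what it's worth, here is a rough sketch of the per-instruction precision control Uttar is talking about: in NVIDIA's NV_fragment_program OpenGL extension, the suffix on each instruction selects the precision it runs at (R = FP32, H = FP16, X = FX12). The listing is written from memory as a string constant, so treat the exact syntax as an assumption rather than copy-paste-ready code.

Code:
# Rough, from-memory sketch of NV30-class NV_fragment_program assembly; the suffix on
# each instruction picks the precision it executes at (R = FP32, H = FP16, X = FX12).
NV_FP_SKETCH = """
!!FP1.0
TEX  R0, f[TEX0], TEX0, 2D;   # texture fetch into a full-precision register
MULR R1, R0, R0;              # multiply at FP32
MULH H0, R0, R0;              # the same multiply at FP16 (half-precision H registers)
MULX R2, R0, R0;              # the same multiply at FX12 fixed point
MOVR o[COLR], R1;             # write one of the results out as the fragment colour
END
"""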
 
Uttar said:
Frankly, it's hard to say exactly what nVidia is doing for DX9 / OpenGL ARB Paths. Heck, they could be doing insane stuff like forcing FX12 when in Performance and forcing FP16 in Quality. There are a LOT of possibilities. I insist that to discover the true performance of FX12/FP16/FP32 on the NV30 & NV35, you've got to use nVidia's proprietary OpenGL extensions.

To get the true performance you need a shader that a) isn't bandwidth limited and b) makes it visually obvious what precision the shader is really running at. That's why I like Humus' Mandelbrot shader for this purpose.
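
To illustrate MDolenc's point, here is a minimal CPU-side sketch in plain NumPy (not Humus' actual shader): the Mandelbrot loop is pure arithmetic with no texture traffic, and because z -> z^2 + c amplifies rounding error on every step, running the same loop at FP16, FP32 and FP64 typically ends up in noticeably different places - which is exactly what turns into banding and blockiness on screen.

Code:
import numpy as np

def orbit(c_real, dtype, n=100):
    # Classic escape-time loop z -> z^2 + c, evaluated entirely at the given precision.
    # Returns (iterations survived, final |z|^2).
    c = dtype(c_real)
    x, y = dtype(0), dtype(0)
    for i in range(n):
        x, y = x * x - y * y + c, dtype(2) * x * y
        if x * x + y * y > dtype(4):
            return i, float(x * x + y * y)   # escaped, bail out like the shader would
    return n, float(x * x + y * y)

# c = -1.8 sits in the chaotic part of the set, so every rounding error is amplified on
# each iteration and the low-precision orbits typically drift well away from the FP64 one.
for dt in (np.float16, np.float32, np.float64):
    print(dt.__name__, orbit(-1.8, dt))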
 
Why doesn't someone with a 5800 Ultra, a 5900 Ultra, or both try out Humus' demo and record the fps with FRAPS for a comparison?
 
FYI, here are some results from Humus' Mandelbrot demo in OGL mode on a 5600 (standard) & a 9600 Pro:

5600 / OGL : Good quality (except the red in the background), 31 fps
9600 Pro / OGL : Good quality, 59 fps
 
Here are some other screenshots:

http://www.hardware.fr/marc/megazoom.rar

I used a very big zoom to get this shot (it would be cool if we could have a zoom counter, Humus :) ). As you can see, the GFX has better precision in OGL than the ATI solution; in fact I think that's FP32 vs FP24. Please don't look at the fps counter on this one, because I used a 9700 for the ATI side (easier with 2 PCs at the same time for this sort of zoom).
 