MDolenc's Fillrate Tester: GF FX FP32/FP16 Performance

Reverend · May 30, 2003

Let's re-visit this topic (specifically wrt NV35) when Det50.xx are released.

pcchen · May 30, 2003

Uttar said:
Pcchen: Just a quick question.
AFAIK, both FP16 and FX12 got a 10-bit mantissa. Can your program differentiate both by not working well on FX or something? Or could we be getting sometimes FP16 results, and sometimes FX12 ones, when seeing 10-bit precision?

Are you sure FX12 has a 10-bit mantissa?
Actually, my shader may not work with FX12 if it has a 10-bit mantissa, since I use many large numbers (well, just 1 to 24 for "bits"). If FX12 has a 10-bit mantissa, its range will be just -2 ~ +2, and it won't be able to store the number "10".
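
To make the range argument concrete, here is a small Python sketch of a 12-bit fixed-point format with 10 fractional bits. The two's-complement layout, the saturating behaviour, and the names `FRAC_BITS`, `TOTAL_BITS` and `to_fx12` are assumptions for illustration only; they are not a description of how the NV3x hardware actually stores FX12 values.

```python
FRAC_BITS = 10                      # fractional bits, as discussed above
TOTAL_BITS = 12                     # FX12: 12 bits in total
SCALE = 1 << FRAC_BITS              # 1024 steps per unit

# Assumed two's-complement layout: raw integer codes run from -2048 to 2047.
RAW_MIN = -(1 << (TOTAL_BITS - 1))
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1


def to_fx12(x: float) -> float:
    """Quantize x to the assumed FX12 grid, saturating at the range limits."""
    raw = round(x * SCALE)
    raw = max(RAW_MIN, min(RAW_MAX, raw))
    return raw / SCALE


print(RAW_MIN / SCALE, RAW_MAX / SCALE)   # representable range, just under +/-2
print(to_fx12(0.3))                       # nearest grid point to 0.3
print(to_fx12(10.0))                      # saturates: 10 does not fit
```

With 10 of the 12 bits spent on the fraction, only two bits remain for the sign and the integer part, which is why a constant such as 10 cannot be stored under these assumptions.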

Of course, there are still some methods to differentiate fixed point and floating point. The simplest one is to divide by 2 n times, and check when the value becomes zero. If it's fixed point, it will quickly become zero for small n.
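
The halving test described above can be sketched on the CPU in Python. This is only a simulation, not the shader used by the tester: `as_fp16` rounds through IEEE 754 half precision using the standard `struct` module, and `as_fixed10` is a hypothetical truncating fixed-point format with 10 fractional bits. Real hardware may flush denormals to zero, which would make the floating-point count smaller than the one produced here.

```python
import struct


def as_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def as_fixed10(x: float) -> float:
    """Truncate x to a fixed-point grid with 10 fractional bits (assumed)."""
    return int(x * 1024) / 1024


def halvings_until_zero(quantize, start: float = 1.0, limit: int = 200) -> int:
    """Count how many times start can be halved before it reads as zero."""
    value = quantize(start)
    for n in range(1, limit + 1):
        value = quantize(value / 2)
        if value == 0:
            return n
    return limit


print("fixed point:", halvings_until_zero(as_fixed10))
print("FP16:", halvings_until_zero(as_fp16))
```

In this simulation the fixed-point value reaches zero after 11 halvings, while the half-precision value survives until the 25th, so a small threshold on n is enough to tell the two apart.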

Marc · May 30, 2003

Ante P said:
I just realized what it is that differs from mine and hardware.fr results:

I'm using the 44.10 driver.
These are primarily workstation drivers and this is probably why nVidia didn't dare to do any such optimizations. Or perhaps future driver revisions will also have this "fix".
(I'm testing this stuff on my gaming installation so I didn't realize that I had updated to 44.10, still using 44.03 on the test setup)
Sorry about that.

GeForce FX 5800 Ultra

Direct3D 9 Fillrate tester V0.3 [beta]
Copyright(c) 2003 Ping-Che Chen

Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s

Errrrrrrrr
Can you check with 44.03 please ? :]

Arun · May 30, 2003

Reverend said:
Let's re-visit this topic (specifically wrt NV35) when Det50.xx are released.

Hmm, are you saying that DetFX != Det50 ?

Uttar

Reverend · May 30, 2003

"DetonatorFX" is a marketing name for NVIDIA's drivers with the debut of their GeForceFX and CineFX architecture. 44.03 drivers are, for instance, already labelled as "DetonatorFX".

Internally, 50.xx version drivers will be interesting wrt FP precision (compared to the latest public WHQL 44.03s) but I am, ATM, unclear if it applies specifically to NV35 or the entire NV3x lineup (although I am currently trying to find out).

Tridam · May 30, 2003

pcchen said:
Uttar said:

Pcchen: Just a quick question.
AFAIK, both FP16 and FX12 got a 10-bit mantissa. Can your program differentiate both by not working well on FX or something? Or could we be getting sometimes FP16 results, and sometimes FX12 ones, when seeing 10-bit precision?

Are you sure FX12 has a 10-bit mantissa?
Actually, my shader may not work with FX12 if it has a 10-bit mantissa, since I use many large numbers (well, just 1 to 24 for "bits"). If FX12 has a 10-bit mantissa, its range will be just -2 ~ +2, and it won't be able to store the number "10".

NVIDIA stated that FX12's range is (-2,2], so it has a 10-bit mantissa.

pcchen said:
Of course, there are still some methods to differentiate fixed point and floating point. The simplest one is to divide by 2 n times, and check when the value becomes zero. If it's fixed point, it will quickly become zero for small n.

It would be nice if you could add something like that to your utility, and maybe use it not only to differentiate FX / FP but also to find the range.
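
A range probe along the same lines could double a value until the stored result stops doubling. The Python sketch below only simulates the idea: `as_fp16` uses IEEE 754 half precision via `struct`, and `as_fx12` is a hypothetical saturating 12-bit fixed-point format with 10 fractional bits. Both helpers and `largest_doubling` are illustrative names, not part of any existing tool.

```python
import struct


def as_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision; overflow becomes infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return float("inf")


def as_fx12(x: float) -> float:
    """Saturating 12-bit fixed point with 10 fractional bits (assumed layout)."""
    raw = max(-2048, min(2047, int(x * 1024)))
    return raw / 1024


def largest_doubling(quantize, limit: int = 200) -> float:
    """Double 1.0 until the stored value stops doubling; return the last good value."""
    value = quantize(1.0)
    for _ in range(limit):
        doubled = quantize(value * 2)
        if doubled != value * 2:       # saturated or overflowed
            return value
        value = doubled
    return value


print("FP16 largest power of two:", largest_doubling(as_fp16))
print("FX12 largest power of two:", largest_doubling(as_fx12))
```

The probe only brackets the range to within a factor of two; a finer search between the last good value and its double would be needed to pin down the exact limit.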

Neeyik · May 30, 2003

This still doesn't explain why you said you saw banding in the rthdribl demo, if the 44.10 driver stays in high precision

Without having seen the original statement/question, I can't say whether this explains it or not, but the rthdribl demo uses _pp hints for all shaders. On top of that, AFAIK the 44.10s don't offer support for FP render targets, which the demo requires - if they're not available, you end up with visual/performance losses from scaling up & down from an integer render target.

Marc · May 30, 2003

Hi

So ... with 44.10 (even on XP French) the quality is OK; NVIDIA uses FP32 by default in D3D. I can force partial precision with a home-made mande.fsh using _pp. The quality is then the same as with 44.03 by default ... and the performance is the same too!

As for the HDR demo, it uses pp as you said.

Clootie · May 30, 2003

Beware: NVIDIA is watching you!

Back to MDolenc's Fillrate Tester. I ran a few small experiments on a GF FX 5600 / WinXP / 44.03.

Initial figures:

Code:

Fillrate Tester
--------------------------
Display adapter: NVIDIA GeForce FX 5600
Driver version: 6.14.10.4403
Display mode: 1024x768x32bpp
--------------------------

Color writes enabled, z-writes disabled:
FFP - Pure fillrate - 1210.959473M pixels/sec
FFP - Single texture - 980.372620M pixels/sec
FFP - Dual texture - 517.444458M pixels/sec
FFP - Triple texture - 241.109070M pixels/sec
FFP - Quad texture - 228.018875M pixels/sec
PS_2_0 - Per pixel lighting - 32.004135M pixels/sec
PS_2_0 PP - Per pixel lighting - 45.682018M pixels/sec
PS_1_1 - Simple - 279.485901M pixels/sec
PS_1_4 - Simple - 287.335602M pixels/sec
PS_2_0 - Simple - 279.499847M pixels/sec

Color writes disabled, z-writes enabled:
FFP - Pure fillrate - 1207.889160M pixels/sec
FFP - Single texture - 1202.526611M pixels/sec
FFP - Dual texture - 1202.577271M pixels/sec
FFP - Triple texture - 1199.467651M pixels/sec
FFP - Quad texture - 1196.579346M pixels/sec
PS_2_0 - Per pixel lighting - 1146.265625M pixels/sec
PS_2_0 PP - Per pixel lighting - 1146.285034M pixels/sec
PS_1_1 - Simple - 1208.223145M pixels/sec
PS_1_4 - Simple - 1208.140381M pixels/sec
PS_2_0 - Simple - 1208.115723M pixels/sec

1) Slightly modify SimplePS20.psh pixel shader:

Code:

ps_2_0
dcl v0
dcl v1

def c0, 0.3f, 0.7f, 0.2f, 0.4f

add_pp r0, c0, -v0   ; was: "add r0, c0, -v0"
add r0, r0, v1
mov oC0, r0

So this shader is potentially faster than the previous one; now look at the performance:

Code:

PS_2_0 - Simple - 185.735550M pixels/sec

Hey, where did my 33% of performance go?!

The same can be done with the PS 1.4 shader.

2) Slightly modify PerPixelPS20.psh / PerPixelPS20PP.psh pixel shader:

Code:

ps_2_0

dcl t0.xy
dcl t1.xyz
dcl t2.xyz
dcl_2d s0
dcl_2d s1

// Normalize light vector
;dp3_pp r0.w, t1, t1
;rsq_pp r0.w, r0.w
;mul_pp r0.xyz, t1, r0.w
mov_pp r0.xyz, t1

// Normalize halfway vector
dp3_pp r1.w, t2, t2
rsq_pp r1.w, r1.w
mul_pp r1.xyz, t2, r1.w

// Load normal
texld_pp r2, t0, s1

// Specular light
dp3_pp r1, r1, r2
pow_pp r1.w, r1.w, c0.w
mul_pp r1.xyz, c0, r1.w

// Diffuse light
dp3_pp r2, r2, r0

// Load base texture
texld_pp r0, t0, s0
mad_pp r2, r0, r2, r1

mov_pp oC0, r2

You can see that under "Normalize light vector" three instructions are commented out and one is added so as not to break the code. So the shader is two instructions shorter than the original. It should run faster, shouldn't it? Results:

Code:

PS_2_0 - Per pixel lighting - 32.257240M pixels/sec
PS_2_0 PP - Per pixel lighting - 32.257969M pixels/sec

So the PS_2_0 result is as predicted, but what happened to PS_2_0 PP? I've lost roughly 30% of the performance (45.68 down to 32.26 Mpix/s)!!!

Conclusion: every single high-precision pixel shader benchmark is being targeted. Beware, and look around often!
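
As a sanity check on the percentages, the relative changes can be recomputed from the figures posted above. The helper name `percent_change` is just for illustration; the numbers are copied from the results in this post.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = slower)."""
    return (after - before) / before * 100.0


# Figures copied from the results posted above (M pixels/sec).
simple_ps20 = percent_change(279.499847, 185.735550)
ppl_full = percent_change(32.004135, 32.257240)
ppl_pp = percent_change(45.682018, 32.257969)

print(f"PS_2_0 Simple with add_pp:              {simple_ps20:+.1f}%")
print(f"PS_2_0 per pixel lighting, modified:    {ppl_full:+.1f}%")
print(f"PS_2_0 PP per pixel lighting, modified: {ppl_pp:+.1f}%")
```

By these numbers the modified simple shader runs about 33.5% slower, while the modified partial-precision lighting shader drops by about 29.4% and lands at the full-precision rate.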

Joe DeFuria · May 30, 2003

Reverend said:
Let's re-visit this topic (specifically wrt NV35) when Det50.xx are released.

Of course we will. That's what we do here.

I hope you're not implying that we shouldn't be visiting this topic with the current drivers though...

Internally, 50.xx version drivers will be interesting wrt FP precision (compared to the latest public WHQL 44.03s)

Yeah, well, EVERY driver release from nVidia has been, uh, "interesting." Why break the trend?

but I am, ATM, unclear if it applies specifically to NV35 or the entire NV3x lineup (although I am currently trying to find out).

Hint: If your contact wasn't clear about it impacting more than NV35, chances are, it doesn't.

Luminescent · May 30, 2003

AnteP said:
Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s

Ante, is that the fillrate for the 5800 ultra using partial precision or full precision? The results seem a little weird in comparison to these:

Brent said:
I got this on the 5900 Ultra 44.03 drivers

Direct3D 9 Fillrate tester V0.3 [beta]
Copyright(c) 2003 Ping-Che Chen

Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 28.561 Mpix/s

Can someone run the precision app on the 5900 ultra with the 44.10 dets?

Ante P · May 30, 2003

Luminescent said:
AnteP said:

Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s

Ante, is that the fillrate for the 5800 ultra using partial precision or full precision?

Can someone run the precision app on the 5900 ultra with the 44.10 dets?

I just ran this:
fillrate9.exe precision.txt

I dunno anything else about it.
Let me know what to try and I'll do it later.

Marc · May 30, 2003

Luminescent said:
AnteP said:

Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s

Ante, is that the fillrate for the 5800 ultra using partial precision or full precision?

Can someone run the precision app on the 5900 ultra with the 44.10 dets?

Well, with precision.txt it seems that the only interesting number is the precision one.

To test fillrate you have to call the PS 1.1 shader or the PS 2.0 shader via a home-made txt file (you can edit the shader to ask for partial precision). If you don't specify any txt, it seems that the program uses the config.txt file (a mix of PS2 and other things).

9700 Pro 3.4

Fillrate: 826.273 Mpix/s PS 1.1
Fillrate: 830.140 Mpix/s PS 2.0
Fillrate: 829.875 Mpix/s PS 2.0_PP
Fillrate: 860.974 Mpix/s (config.txt)

5800 Ultra 44.10

Fillrate: 993.601 Mpix/s PS 1.1
Fillrate: 310.966 Mpix/s PS 2.0
Fillrate: 310.962 Mpix/s PS 2.0_PP
Fillrate: 304.155 Mpix/s (config.txt)

By the way, PC Chen, your test.bmp is absolutely wonderful

Luminescent · May 30, 2003

Please, can someone run the exact same command as Marc on the 5900 ultra? Here it is again:

Marc said:
To test fillrate you have to call the PS 1.1 shader or the PS 2.0 shader via a home-made txt file (you can edit the shader to ask for partial precision). If you don't specify any txt, it seems that the program uses the config.txt file (a mix of PS2 and other things).

It would be nice to determine fp fillrate, with PS 2.0 full and partial precision, in comparison to the 5800 ultra, with pcchen's benchmark. Maybe this, along with MDolenc's app, will clue us in some more as to the number of fp units in comparison to NV35.

Marc · May 30, 2003

For the ps 1.1 / 2.0 / 2.0_pp the command is
fillrate9 config11.txt
In config11.txt just call for simple.pss for PS2.0 or simple11.pss for PS1.1
To have pp, edit the simple.pss and put the _pp tag on mov/mad/mul/texld/add/mov

MDolenc · May 30, 2003

Ok, I have just finished a new version of Fillrate Tester.

You can get it here.
For the reference:

Code:

Fillrate Tester
--------------------------
Display adapter: RADEON 9700 PRO
Driver version: 6.14.10.6334
Display mode: 1280x1024 A8R8G8B8 85Hz
Z-Buffer format: D24S8
--------------------------

FFP - Pure fillrate - 2437.309082M pixels/sec
FFP - Z pixel rate - 2215.062256M pixels/sec
FFP - Single texture - 2296.575439M pixels/sec
FFP - Dual texture - 1207.759521M pixels/sec
FFP - Triple texture - 672.142212M pixels/sec
FFP - Quad texture - 544.908936M pixels/sec
PS 1.1 - Simple - 1280.206055M pixels/sec
PS 1.4 - Simple - 1280.600342M pixels/sec
PS 2.0 - Simple - 1280.112915M pixels/sec
PS 2.0 PP - Simple - 1280.137207M pixels/sec
PS 2.0 - Longer - 643.497742M pixels/sec
PS 2.0 PP - Longer - 643.524292M pixels/sec
PS 2.0 - Longer 4 Registers - 643.548096M pixels/sec
PS 2.0 PP - Longer 4 Registers - 643.501831M pixels/sec
PS 2.0 - Per Pixel Lighting - 135.891296M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 135.904160M pixels/sec

And the Per Pixel Lighting shader is a bit more complex than it was.

Marc · May 30, 2003

Can we have some tests with a mix of FFP / PS 1.1 / PS 2.0 / PS 2.0 PP?

Marc · May 30, 2003

Code:

Fillrate Tester
--------------------------
Display adapter: NVIDIA GeForce FX 5800 Ultra
Driver version: 6.14.10.4410
Display mode: 1280x1024 A8R8G8B8 85Hz
Z-Buffer format: D24S8
--------------------------

FFP - Pure fillrate - 1989.336548M pixels/sec
FFP - Z pixel rate - 3548.378174M pixels/sec
FFP - Single texture - 1650.513184M pixels/sec
FFP - Dual texture - 1435.574341M pixels/sec
FFP - Triple texture - 799.460815M pixels/sec
FFP - Quad texture - 785.500244M pixels/sec
PS 1.1 - Simple - 989.660278M pixels/sec
PS 1.4 - Simple - 624.859375M pixels/sec
PS 2.0 - Simple - 628.441650M pixels/sec
PS 2.0 PP - Simple - 628.443176M pixels/sec
PS 2.0 - Longer - 378.415833M pixels/sec
PS 2.0 PP - Longer - 378.290619M pixels/sec
PS 2.0 - Longer 4 Registers - 378.291901M pixels/sec
PS 2.0 PP - Longer 4 Registers - 378.287720M pixels/sec
PS 2.0 - Per Pixel Lighting - 60.525784M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 67.041069M pixels/sec
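
For comparing runs like the one above, the result lines can be parsed and the partial-precision rate set against the full-precision rate for each test. The sketch below is a hypothetical helper, not part of Fillrate Tester: the regular expression and the names `parse_results` and `pp_speedups` are assumptions based only on the output format shown in this thread.

```python
import re

# Matches lines such as "PS 2.0 PP - Longer - 378.290619M pixels/sec".
LINE_RE = re.compile(r"^(?P<name>.+?) - (?P<rate>[\d.]+)M pixels/sec$")


def parse_results(text):
    """Map each test name to its fillrate in M pixels/sec."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("name")] = float(match.group("rate"))
    return results


def pp_speedups(results):
    """Ratio of the PP rate to the full-precision rate for each PS 2.0 test."""
    prefix = "PS 2.0 PP - "
    ratios = {}
    for name, rate in results.items():
        if name.startswith(prefix):
            test = name[len(prefix):]
            full = results.get("PS 2.0 - " + test)
            if full:
                ratios[test] = rate / full
    return ratios


# Two lines copied from the GeForce FX 5800 Ultra run above.
sample = """\
PS 2.0 - Per Pixel Lighting - 60.525784M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 67.041069M pixels/sec
"""

for test, ratio in pp_speedups(parse_results(sample)).items():
    print(f"{test}: PP is {ratio:.3f}x the full-precision rate")
```

A ratio close to 1.0 means the _pp hint made no measurable difference for that test on that driver.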

MDolenc · May 30, 2003

Why would you want to mix everything into one test?

YeuEmMaiMai · May 30, 2003

Radeon 9500Pro clocked at 366/306 running on an AMD Athlon XP 200+ machine with 512MB RAM

Direct3D 9 Fillrate tester V0.3 [beta]
Copyright(c) 2003 Ping-Che Chen

percision.txt is not found.
Initialize: 640x480x24
No shader file specified. Perform accuracy test.
Precision: 16 bits
PP Precision: 16 bits
Fillrate: 89.915 Mpix/s

Fillrate Tester
--------------------------
Display adapter: RADEON 9700 PRO
Driver version: 6.14.10.6343
Display mode: 1024x768x32bpp
--------------------------

Color writes enabled, z-writes disabled:
FFP - Pure fillrate - 2009.819946M pixels/sec
FFP - Single texture - 1209.113037M pixels/sec
FFP - Dual texture - 857.301331M pixels/sec
FFP - Triple texture - 435.024872M pixels/sec
FFP - Quad texture - 328.841187M pixels/sec
PS_2_0 - Per pixel lighting - 208.357681M pixels/sec
PS_2_0 PP - Per pixel lighting - 206.452332M pixels/sec
PS_1_1 - Simple - 1401.218018M pixels/sec
PS_1_4 - Simple - 1398.612061M pixels/sec
PS_2_0 - Simple - 1385.526855M pixels/sec

Color writes disabled, z-writes enabled:
FFP - Pure fillrate - 2697.751953M pixels/sec
FFP - Single texture - 2663.690186M pixels/sec
FFP - Dual texture - 2621.514893M pixels/sec
FFP - Triple texture - 2610.928955M pixels/sec
FFP - Quad texture - 2561.338867M pixels/sec
PS_2_0 - Per pixel lighting - 2414.073242M pixels/sec
PS_2_0 PP - Per pixel lighting - 2402.789795M pixels/sec
PS_1_1 - Simple - 2717.369629M pixels/sec
PS_1_4 - Simple - 2726.308838M pixels/sec
PS_2_0 - Simple - 2725.478516M pixels/sec
