Uttar said: Pcchen: Just a quick question.
AFAIK, both FP16 and FX12 got a 10-bit mantissa. Can your program differentiate both by not working well on FX or something? Or could we be getting sometimes FP16 results, and sometimes FX12 ones, when seeing 10-bit precision?
Errrrrrrrr
Ante P said: I just realized what it is that differs between my results and hardware.fr's:
I'm using the 44.10 driver.
These are primarily workstation drivers, which is probably why nVidia didn't dare to do any such optimizations. Or perhaps a future driver revision will also have this "fix".
(I'm testing this stuff on my gaming installation, so I didn't realize that I had updated to 44.10; I'm still using 44.03 on the test setup.)
Sorry about that.
GeForce FX 5800 Ultra
Direct3D 9 Fillrate tester V0.3 [beta]
Copyright(c) 2003 Ping-Che Chen
Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s
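For anyone wondering what "23 bits / 10 bits" means here: one common way to estimate mantissa width is to find the largest n for which 1 + 2^-n can still be told apart from 1. The sketch below does that on the CPU in Python, using the standard struct module to round values to IEEE single ("f") and half ("e") precision. It is only an illustration under that assumption and not necessarily how the tester's shader measures precision; the helper names round_to and mantissa_bits are made up.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round x to the nearest value representable in the given struct format."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def mantissa_bits(fmt: str, limit: int = 64) -> int:
    """Largest n such that 1 + 2**-n is still distinguishable from 1."""
    bits = 0
    for n in range(1, limit + 1):
        if round_to(fmt, 1.0 + 2.0 ** -n) == 1.0:
            break  # the added term was rounded away
        bits = n
    return bits

fp32_bits = mantissa_bits("<f")  # IEEE single precision
fp16_bits = mantissa_bits("<e")  # IEEE half precision
print("FP32 mantissa bits:", fp32_bits)
print("FP16 mantissa bits:", fp16_bits)
```

On IEEE formats this gives 23 for single and 10 for half precision, which lines up with the two numbers the tester prints.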
Reverend said: Let's re-visit this topic (specifically wrt NV35) when Det50.xx are released.
pcchen said: Uttar said: Pcchen: Just a quick question.
AFAIK, both FP16 and FX12 got a 10-bit mantissa. Can your program differentiate both by not working well on FX or something? Or could we be getting sometimes FP16 results, and sometimes FX12 ones, when seeing 10-bit precision?
Are you sure FX12 has a 10-bit mantissa?
Actually, my shader may not work with FX12 if it has a 10-bit mantissa, since I use many large numbers (well, just 1 to 24 for "bits"). If FX12 has a 10-bit mantissa, its range will be just -2 ~ +2, and it won't be able to store the number "10".
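The range argument can be checked with a few lines of Python. This is a sketch assuming FX12 is a 12-bit two's-complement fixed-point number with 10 fractional bits (that layout is an assumption for illustration, not something confirmed in this thread); fixed_point_range and in_range are hypothetical helper names.

```python
def fixed_point_range(total_bits: int = 12, frac_bits: int = 10):
    """Smallest and largest values of a two's-complement fixed-point format."""
    scale = 1 << frac_bits                      # value of one integer step
    lowest = -(1 << (total_bits - 1)) / scale   # most negative code
    highest = ((1 << (total_bits - 1)) - 1) / scale  # most positive code
    return lowest, highest

def in_range(value: float, total_bits: int = 12, frac_bits: int = 10) -> bool:
    """True if value lies inside the representable range of the format."""
    lowest, highest = fixed_point_range(total_bits, frac_bits)
    return lowest <= value <= highest

lowest, highest = fixed_point_range()
print("assumed FX12 range:", lowest, "to", highest)
print("can hold 10?", in_range(10.0))
print("can hold 1.5?", in_range(1.5))
```

Under that assumed layout the range is -2 up to just below +2, so a constant like 10 cannot be stored, while 1.5 can.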
pcchen said: Of course, there are still some methods to differentiate fixed point and floating point. The simplest one is to divide by 2 n times and check when the value becomes zero. If it's fixed point, it will become zero quickly, for small n.
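The divide-by-2 idea can be simulated on the CPU. The sketch below is not the shader itself: it assumes a fixed-point format with 10 fractional bits that truncates, and IEEE half precision (with denormals) for FP16, emulated through Python's struct module. The function names are made up for this example.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fixed(x: float, frac_bits: int = 10) -> float:
    """Truncate x to a fixed-point grid with frac_bits fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return int(x * scale) / scale

def halvings_until_zero(quantize, start: float = 1.0, limit: int = 64):
    """Halve the value repeatedly and return the first n at which it becomes zero."""
    x = quantize(start)
    for n in range(1, limit + 1):
        x = quantize(x / 2)
        if x == 0.0:
            return n
    return None  # never reached zero within the limit

fixed_n = halvings_until_zero(to_fixed)
half_n = halvings_until_zero(to_fp16)
print("fixed point reaches zero after", fixed_n, "halvings")
print("FP16 reaches zero after", half_n, "halvings")
```

With these assumptions the fixed-point value dies after 11 halvings and the half-precision value after 25. Hardware that flushes denormals to zero would give up earlier in the floating-point case, but still later than a 10-fractional-bit fixed-point format.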
Without having seen the original statement/question, I can't say whether this explains it or not, but the rthdribl demo uses _pp hints for all shaders. On top of that, AFAIK the 44.10s don't offer support for FP render targets, which the demo requires. If they're not available, you end up with visual/performance losses as it scales up and down from an integer render target.
This still doesn't explain why you said you saw banding in the rthdribl demo, if the 44.10 driver stays in high precision.
Fillrate Tester
--------------------------
Display adapter: NVIDIA GeForce FX 5600
Driver version: 6.14.10.4403
Display mode: 1024x768x32bpp
--------------------------
Color writes enabled, z-writes disabled:
FFP - Pure fillrate - 1210.959473M pixels/sec
FFP - Single texture - 980.372620M pixels/sec
FFP - Dual texture - 517.444458M pixels/sec
FFP - Triple texture - 241.109070M pixels/sec
FFP - Quad texture - 228.018875M pixels/sec
PS_2_0 - Per pixel lighting - 32.004135M pixels/sec
PS_2_0 PP - Per pixel lighting - 45.682018M pixels/sec
PS_1_1 - Simple - 279.485901M pixels/sec
PS_1_4 - Simple - 287.335602M pixels/sec
PS_2_0 - Simple - 279.499847M pixels/sec
Color writes disabled, z-writes enabled:
FFP - Pure fillrate - 1207.889160M pixels/sec
FFP - Single texture - 1202.526611M pixels/sec
FFP - Dual texture - 1202.577271M pixels/sec
FFP - Triple texture - 1199.467651M pixels/sec
FFP - Quad texture - 1196.579346M pixels/sec
PS_2_0 - Per pixel lighting - 1146.265625M pixels/sec
PS_2_0 PP - Per pixel lighting - 1146.285034M pixels/sec
PS_1_1 - Simple - 1208.223145M pixels/sec
PS_1_4 - Simple - 1208.140381M pixels/sec
PS_2_0 - Simple - 1208.115723M pixels/sec
ps_2_0
dcl v0
dcl v1
def c0, 0.3f, 0.7f, 0.2f, 0.4f
add_pp r0, c0, -v0 ; was: "add r0, c0, -v0"
add r0, r0, v1
mov oC0, r0
PS_2_0 - Simple - 185.735550M pixels/sec
ps_2_0
dcl t0.xy
dcl t1.xyz
dcl t2.xyz
dcl_2d s0
dcl_2d s1
// Normalize light vector
;dp3_pp r0.w, t1, t1
;rsq_pp r0.w, r0.w
;mul_pp r0.xyz, t1, r0.w
mov_pp r0.xyz, t1
// Normalize halfway vector
dp3_pp r1.w, t2, t2
rsq_pp r1.w, r1.w
mul_pp r1.xyz, t2, r1.w
// Load normal
texld_pp r2, t0, s1
// Specular light
dp3_pp r1, r1, r2
pow_pp r1.w, r1.w, c0.w
mul_pp r1.xyz, c0, r1.w
// Diffuse light
dp3_pp r2, r2, r0
// Load base texture
texld_pp r0, t0, s0
mad_pp r2, r0, r2, r1
mov_pp oC0, r2
PS_2_0 - Per pixel lighting - 32.257240M pixels/sec
PS_2_0 PP - Per pixel lighting - 32.257969M pixels/sec
Reverend said: Let's re-visit this topic (specifically wrt NV35) when Det50.xx are released.
Internally, 50.xx version drivers will be interesting wrt FP precision (compared to the latest public WHQL 44.03s), but I am, ATM, unclear if it applies specifically to NV35 or the entire NV3x lineup (although I am currently trying to find out).
Ante, is that the fillrate for the 5800 ultra using partial precision or full precision? The results seem a little weird in comparison to these:
AnteP said: Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s
Can someone run the precision app on the 5900 ultra with the 44.10 dets?
Brent said: I got this on the 5900 Ultra 44.03 drivers
Direct3D 9 Fillrate tester V0.3 [beta]
Copyright(c) 2003 Ping-Che Chen
Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 28.561 Mpix/s
Well, with precision.txt it seems that the only interesting number is the precision one.
Luminescent said: Ante, is that the fillrate for the 5800 ultra using partial precision or full precision?
AnteP said: Initialize: 1024x768x24
No shader file specified. Perform accuracy test.
Precision: 23 bits
PP Precision: 10 bits
Fillrate: 33.306 Mpix/s
Can someone run the precision app on the 5900 ultra with the 44.10 dets?
It would be nice to determine fp fillrate, with PS 2.0 full and partial precision, in comparison to the 5800 ultra, with pcchen's benchmark. Maybe this, along with MDolnec's app, will clue us in some more as to the number of fp units in comparison to NV35.
Marc said: To test fillrate, you have to call the ps 1.1 shader or the ps 2.0 shader via a home-made txt file (you can edit it to ask for partial precision). If you don't specify any txt, it seems that the program calls the config.txt file (a mix of PS2 and other things).
Fillrate Tester
--------------------------
Display adapter: RADEON 9700 PRO
Driver version: 6.14.10.6334
Display mode: 1280x1024 A8R8G8B8 85Hz
Z-Buffer format: D24S8
--------------------------
FFP - Pure fillrate - 2437.309082M pixels/sec
FFP - Z pixel rate - 2215.062256M pixels/sec
FFP - Single texture - 2296.575439M pixels/sec
FFP - Dual texture - 1207.759521M pixels/sec
FFP - Triple texture - 672.142212M pixels/sec
FFP - Quad texture - 544.908936M pixels/sec
PS 1.1 - Simple - 1280.206055M pixels/sec
PS 1.4 - Simple - 1280.600342M pixels/sec
PS 2.0 - Simple - 1280.112915M pixels/sec
PS 2.0 PP - Simple - 1280.137207M pixels/sec
PS 2.0 - Longer - 643.497742M pixels/sec
PS 2.0 PP - Longer - 643.524292M pixels/sec
PS 2.0 - Longer 4 Registers - 643.548096M pixels/sec
PS 2.0 PP - Longer 4 Registers - 643.501831M pixels/sec
PS 2.0 - Per Pixel Lighting - 135.891296M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 135.904160M pixels/sec
Fillrate Tester
--------------------------
Display adapter: NVIDIA GeForce FX 5800 Ultra
Driver version: 6.14.10.4410
Display mode: 1280x1024 A8R8G8B8 85Hz
Z-Buffer format: D24S8
--------------------------
FFP - Pure fillrate - 1989.336548M pixels/sec
FFP - Z pixel rate - 3548.378174M pixels/sec
FFP - Single texture - 1650.513184M pixels/sec
FFP - Dual texture - 1435.574341M pixels/sec
FFP - Triple texture - 799.460815M pixels/sec
FFP - Quad texture - 785.500244M pixels/sec
PS 1.1 - Simple - 989.660278M pixels/sec
PS 1.4 - Simple - 624.859375M pixels/sec
PS 2.0 - Simple - 628.441650M pixels/sec
PS 2.0 PP - Simple - 628.443176M pixels/sec
PS 2.0 - Longer - 378.415833M pixels/sec
PS 2.0 PP - Longer - 378.290619M pixels/sec
PS 2.0 - Longer 4 Registers - 378.291901M pixels/sec
PS 2.0 PP - Longer 4 Registers - 378.287720M pixels/sec
PS 2.0 - Per Pixel Lighting - 60.525784M pixels/sec
PS 2.0 PP - Per Pixel Lighting - 67.041069M pixels/sec
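To put the PP vs. full-precision question in numbers, here is a small Python sketch that computes the partial-precision speedup for the per-pixel lighting shader. The fillrates are copied from the runs quoted in this thread (the FX 5600 figures are the ones from the "color writes enabled" run); the dictionary layout and the pp_speedup helper are just for illustration.

```python
# Per-pixel lighting fillrates in M pixels/sec: (full precision, partial precision).
results = {
    "GeForce FX 5600 (44.03)": (32.004135, 45.682018),
    "GeForce FX 5800 Ultra (44.10)": (60.525784, 67.041069),
    "RADEON 9700 PRO": (135.891296, 135.904160),
}

def pp_speedup(full: float, partial: float) -> float:
    """Ratio of partial-precision fillrate to full-precision fillrate."""
    return partial / full

speedups = {card: pp_speedup(*rates) for card, rates in results.items()}
for card, ratio in speedups.items():
    print(f"{card}: PP runs at {ratio:.3f}x the full-precision rate")
```

A ratio close to 1.0 means the _pp hint made no measurable difference in that run.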