DX Precision Benchmark...

Dave Baumann

Is anyone up to the challenge of making a DX Pixel Shader precision benchmark? I'd like a test where I can select which precision is being used on the shaders, so it will default to 'high precision' but I can select 'low' -- the low precision would use the _PP DX precision hint. It might be interesting to see the performance difference between FP16 and FP32.

(Oh, and if it would be at all possible to make it such that the precision differences were visible, that would be a big boon, although I understand this would be difficult given that both FP16 and FP32 offer FAR greater precision than we've seen before.)
 
I was thinking about modifying my fillrate test program to accept pixel shaders. It can test the performance (fillrate) of a certain shader on a certain video card. However, that means I have to modify the program to use DX9, and I am not sure how many people are interested in such a "theoretical" benchmark...
 
pcchen said:
However, that means I have to modify the program to use DX9, and I am not sure how many people are interested in such a "theoretical" benchmark...

I think you'd be surprised!

But we would for one! :)
 
Actually showing the precision issues shouldn't be too hard....

The biggest issue would be trying to test it, no card honours partial precision AFAIK.

Something like a Mandelbrot should visually expose the precision it's running at.
 
One quick thing, however:
For shader throughput, it would probably be best to try to make a relatively long shader that doesn't use much memory bandwidth (I'd say 10-50 instructions should do it pretty well). This should test the computational power much better than if you merely translate the fixed-function fillrate test over to the pixel shaders.

In order to attempt to remove driver inefficiencies from the equation (which the FX certainly appears to have many of at the moment), how about a benchmark with a power of 2 number of instructions (shouldn't make any difference, but what the heck? Might as well make sure), and pretty much nothing but madd's.
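
A rough sketch of how such a mad-only test shader could be produced, written as a small Python generator rather than by hand. Everything here is an assumption for illustration: the function name `make_mad_shader`, the constants in `c0`, and the 62-mad ceiling (taken from the 64 arithmetic slots ps.2.0 allows, minus the two movs). The output has not been run through an assembler.

```python
def make_mad_shader(n_mads=32, partial_precision=False):
    """Build ps.2.0 assembly text containing a chain of n_mads dependent mads.

    Hypothetical helper: each mad reads the result of the previous one, so
    the chain cannot be shortened by simply dropping unused instructions.
    """
    # Assumed budget: 64 arithmetic slots in ps.2.0, two of them used by movs.
    if not 1 <= n_mads <= 62:
        raise ValueError("n_mads must be between 1 and 62")

    op = "mad_pp" if partial_precision else "mad"
    lines = [
        "ps.2.0",
        "",
        "dcl\t\tt0",
        "",
        # x -> 0.5 * x + 0.25 stays bounded, so values never overflow.
        "def\t\tc0, 0.5, 0.25, 0.0, 0.0",
        "",
        "mov\t\tr0, t0",
    ]
    for i in range(n_mads):
        # Ping-pong between r0 and r1 to keep every mad dependent on the last.
        src, dst = f"r{i % 2}", f"r{(i + 1) % 2}"
        lines.append(f"{op}\t\t{dst}, {src}, c0.x, c0.y")
    lines.append(f"mov\t\toC0, r{n_mads % 2}")
    return "\n".join(lines)


if __name__ == "__main__":
    # 32 mads: a power of two, as suggested above.
    print(make_mad_shader(32, partial_precision=True))
```

Generating the text makes it easy to sweep the instruction count (8, 16, 32) and to emit matching full-precision and _pp variants of the same shader.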
 
Mandelbrot is exactly a low bandwidth / high computational power example, so that's cool.
Actually you could use Humus's Mandelbrot demo directly.
 
Joe DeFuria said:
I am not sure how many people are interested in such a "theoretical" benchmark...

<Raises Hand!>


That would be great. Preferably have it ready by the beginning of next week so that we can use it when we benchmark the xxxxxx ;)
 
And while you smartie-brains are at it, include a proggie that lets us know how many pixel shader ops (vectors, scalars) can be executed per clock per pipeline.
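
One way such a figure could be derived from a fillrate-style measurement is plain arithmetic: shader instructions issued per second, divided by the clock cycles available across all pipelines. A minimal sketch, assuming the tester reports pixels per second for a shader of known length; the function name and the numbers in the example are invented for illustration and are not measurements of any card.

```python
def ops_per_clock_per_pipe(pixels_per_second, instructions_per_pixel,
                           core_clock_hz, pipelines):
    """Average shader instructions retired per clock in each pipeline.

    Hypothetical helper: all four inputs must come from the benchmark run
    and the card's published specs.
    """
    instructions_per_second = pixels_per_second * instructions_per_pixel
    clocks_per_second_all_pipes = core_clock_hz * pipelines
    return instructions_per_second / clocks_per_second_all_pipes


if __name__ == "__main__":
    # Invented example: 400 Mpixels/s with an 8-instruction shader
    # on a 400 MHz chip with 8 pipelines.
    print(ops_per_clock_per_pipe(400e6, 8, 400e6, 8))
```

Separating vector from scalar throughput would need separate shaders (one built only from full-vector ops, one only from scalar ops), each fed through the same calculation.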
 
Good ideas. Also, include a benchmark that can tell whether a company is "creatively fudging" its specs. You can have an "overall" score of "Fudgey" or "Holds Water." ;)

/"Pete"
 
Chalnoth said:
For shader throughput, it would probably be best to try to make a relatively long shader that doesn't use much memory bandwidth (I'd say 10-50 instructions should do it pretty well). This should test the computational power much better than if you merely translate the fixed-function fillrate test over to the pixel shaders.

... or even better yet: make 4 shaders while 'we' are at it:

One as Chalnoth suggests for computational power with a fairly long shader with very few texel/data fetches from memory.

One for computational power/bandwidth balance test with a medium long shader with a reasonable number of texel/data fetches from memory.

One for computational power/bandwidth balance test with a medium long shader with some heavy high precision data fetches from memory (like 16bit normal maps etc).

And one for computational power/FP precision test with a shader that demands FP 16/24/32 with a low number of texel/data fetches from memory.
 
DaveBaumann said:
Is anyone up to the challenge of making a DX Pixel Shader precision benchmark? I'd like a test where I can select which precision is being used on the shaders, so it will default to 'high precision' but I can select 'low' -- the low precision would use the _PP DX precision hint. It might be interesting to see the performance difference between FP16 and FP32.

(Oh, and if it would be at all possible to make it such that the precision differences were visible, that would be a big boon, although I understand this would be difficult given that both FP16 and FP32 offer FAR greater precision than we've seen before.)

ShaderMark supports high- and low-precision DX9 2.0 shaders. However, I don't have a GeForce FX, so I couldn't test whether it makes a difference. (It has no effect on the Radeon 9700, but there is still a rendering difference between the Radeon 9700 and the reference rasterizer, and it is not the driver bug in the last shader.) I'm currently working on 3D-Analyze 2.1 with an option to force existing apps/shaders (2.0 and higher) to change the precision by adding the _pp modifier when the shader is created.

Regards,
Thomas
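
To illustrate the idea of adding the _pp modifier to an existing shader, here is a minimal sketch that works on ps.2.0 assembly source text. This is not how 3D-Analyze does it (that is not described here, and a tool hooking shader creation would more likely see compiled bytecode). The opcode list is a partial, assumed set of arithmetic instructions, and the modifier order it emits (e.g. `dp3_sat_pp`) is also an assumption that has not been checked against an assembler.

```python
import re

# Assumed, non-exhaustive set of ps.2.0 arithmetic opcodes to tag.
ARITHMETIC_OPS = {
    "abs", "add", "cmp", "crs", "dp2add", "dp3", "dp4", "exp", "frc",
    "log", "lrp", "mad", "max", "min", "mov", "mul", "nrm", "pow",
    "rcp", "rsq", "sub",
}


def add_partial_precision(shader_text):
    """Append _pp to every listed arithmetic opcode that lacks it."""
    out = []
    for line in shader_text.splitlines():
        match = re.match(r"(\s*)(\S+)(.*)", line)
        if match:
            indent, opcode, rest = match.groups()
            base, *modifiers = opcode.split("_")
            # Leave declarations, texture loads and already-tagged ops alone.
            if base in ARITHMETIC_OPS and "pp" not in modifiers:
                line = f"{indent}{opcode}_pp{rest}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "\n".join([
        "ps.2.0",
        "dcl\t\tt0",
        "mad\t\tr0, t0, t0, t0",
        "dp3_sat\t\tr0, r0, r0",
        "mov\t\toC0, r0",
    ])
    print(add_partial_precision(sample))
```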
 
My idea is a "theoretical" shader performance tester, just like my fillrate tester. So it should be like what Reverend described. I want to see how video chips handle some shaders, such as dependent texture access.
 
I hope someone, Dave or anyone with an FX tries this.

Humus' Mandelbrot demo is a shader-compute-bound (a lot of vector multiply/adds, only one texture lookup) demonstration program, written for standard DX9.0. It's pretty easy to tell what precision it is actually running at by looking (i.e., it will show more detail at FP32 than at FP24, and more at FP24 than at FP16).

It would be great to run it on an FX with current drivers to compare performance with a 9700 and to see if current DX9 drivers force shaders to 16 bits (if it has less detail than 9700 it is forcing FP16).

If it is running with FP32, then you can try running it with FP16 by using this version of the shader instead. Just copy this modified file over the mandel.fsh in Humus' demo. This version uses the DX9 partial precision hint to force all the calcs to FP16. If it is really running at FP16, there should be visibly less detail than the FP32 version. Then you can compare framerates and get an idea of how much faster FP16 runs than FP32 on the FX.

Code:
ps.2.0

dcl		t0
dcl_2d	s0

def		c0, 0.0, 1.0, 8.0, 0.0

mad_pp		r2.xy, t0.x, t0, t0
mad_pp		r1.x, -t0.y, t0.y, r2.x
mad_pp		r1.y,  t0.x, t0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y

mad_pp		r2.xy, r1.x, r1, t0
mad_pp		r0.x, -r1.y, r1.y, r2.x
mad_pp		r0.y,  r1.x, r1.y, r2.y

mad_pp		r2.xy, r0.x, r0, t0
mad_pp		r1.x, -r0.y, r0.y, r2.x
mad_pp		r1.y,  r0.x, r0.y, r2.y
mov_pp		r1.z, c0.x

dp3_sat		r0, r1, r1
mul_pp		r0.x, r0.x, c0.z

exp_pp		r0.x, -r0.x
sub_pp		r0, c0.y, r0

texld	r0, r0, s0

mov		oC0, r0
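
Each three-mad group in the shader above performs one step of z = z*z + c on the texture coordinate, and there are 19 such groups. To get a feel for why fewer mantissa bits should show up as lost detail, the same recurrence can be emulated on the CPU with each mad result rounded to a chosen mantissa width. This is only a sketch under stated assumptions: the mantissa widths are the commonly cited ones (10 bits for FP16, 16 for the Radeon 9700's FP24, 23 for FP32), rounding is applied once per mad, exponent range is not limited, and real hardware may round differently. The names `round_mantissa`, `mad` and `iterate` are made up for this example.

```python
import math

# Explicit mantissa bits per format (assumed layouts, see the note above).
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}


def round_mantissa(x, bits):
    """Round x to `bits` explicit mantissa bits; exponent range is unlimited."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)       # implicit leading bit plus explicit bits
    return math.ldexp(round(m * scale) / scale, e)


def mad(a, b, c, bits):
    """Emulated mad: a * b + c, rounded once to the given precision."""
    return round_mantissa(a * b + c, bits)


def iterate(cx, cy, bits, iterations=19):
    """Mirror the shader's mad groups: z = z*z + c, starting from z = c."""
    x, y = cx, cy
    for _ in range(iterations):
        tx = mad(x, x, cx, bits)    # x*x + cx
        ty = mad(x, y, cy, bits)    # x*y + cy
        x, y = mad(-y, y, tx, bits), mad(x, y, ty, bits)
    return x, y


if __name__ == "__main__":
    c = (-0.75, 0.1)                # arbitrary point near the set's boundary
    for name, bits in MANTISSA_BITS.items():
        print(name, iterate(c[0], c[1], bits))
```

Rounding errors are re-squared on every step, so small differences between formats grow over the 19 iterations. That growth is what should make the precision visible in the rendered image.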
 