ShaderMark 2.0 tomorrow!

I benched this yesterday at lunchtime.

It looks to be a good test of pixel shader performance.

Here is my test on an ATI 9800np (stock clocks) with ATI CAT 3.7s.


Code:
ShaderMark v2.0 - DirectX 9 HLSL Pixel Shader Benchmark - ToMMTi-Systems ([url]http://www.tommti-systems.com[/url])

video mode / device info
(1024x768) X8R8G8B8 (D24X8) vram used 137363456
HAL (pure hw vp): RADEON 9800 (Anti-Detect-Mode: off, gamma correction: DAC)



options

pixel shader version: 2_0
partial precision: off
number of render targets: 1

results:
shader  1 (                                           Pixel Shader Precision Test): pixel shader 1.1 full precision: s16e7
          (                                           Pixel Shader Precision Test): pixel shader 1.4 full precision: s16e7
          (                                           Pixel Shader Precision Test): pixel shader 2.0 full precision: s16e7
          (                                           Pixel Shader Precision Test): pixel shader 2.0 partial precision: s16e7

shader  2 (                                            Per Pixel Diffuse Lighting):	 199 fps	  5.0230 mspf	     996 rendered frames
shader  3 (                            Per Pixel Directional Light Shader (Phong)):	 137 fps	  7.2943 mspf	     686 rendered frames
shader  4 (                                  Per Pixel Point Light Shader (Phong)):	 140 fps	  7.1498 mspf	     700 rendered frames
shader  5 (                                   Per Pixel Spot Light Shader (Phong)):	 112 fps	  8.9585 mspf	     559 rendered frames
shader  6 (                                        Per Pixel Anisotropic Lighting):	 146 fps	  6.8533 mspf	     730 rendered frames
shader  7 (                                         Per Pixel Fresnel Reflections):	 127 fps	  7.9020 mspf	     633 rendered frames
shader  8 (                             Per Pixel BRDF-Phong/Anisotropic Lighting):	 109 fps	  9.1533 mspf	     547 rendered frames
shader  9 (                                          Per Pixel Car Surface Shader):	  99 fps	 10.0682 mspf	     497 rendered frames
shader 10 (                                         Per Pixel Environment Mapping):	 204 fps	  4.9077 mspf	    1019 rendered frames
shader 11 (                                    Per Pixel Environment Bump Mapping):	 177 fps	  5.6466 mspf	     886 rendered frames
shader 12 (                                                Per Pixel Bump Mapping):	 120 fps	  8.3587 mspf	     599 rendered frames
shader 13 (                                       Per Pixel Shadowed Bump Mapping):	  76 fps	 13.2301 mspf	     378 rendered frames
shader 14 (                                        Per Pixel Veined Marble Shader):	  77 fps	 12.9118 mspf	     388 rendered frames
shader 15 (                                                 Per Pixel Wood Shader):	 109 fps	  9.2055 mspf	     544 rendered frames
shader 16 (                                                 Per Pixel Tile Shader):	  65 fps	 15.4056 mspf	     325 rendered frames
shader 17 (                                  Fur Shader With Anisotropic Lighting):	  12 fps	 86.5773 mspf	      58 rendered frames
shader 18 (        Per Pixel Refraction and Reflection Shader with Phong Lighting):	  89 fps	 11.2108 mspf	     446 rendered frames
shader 19 (  Dual Depth Shadow Mapping With 3x3 Bilinear Percentage Closer Filter):	  23 fps	 42.6377 mspf	     118 rendered frames
shader 20 (                                High Dynamic Range Shader (cross blur)):	  30 fps	 33.4049 mspf	     150 rendered frames
shader 21 (                             High Dynamic Range Shader (gaussian blur)):	  31 fps	 31.7630 mspf	     158 rendered frames
shader 22 (                          Per Pixel Edge Detection And Hatching Shader):	  30 fps	 32.9012 mspf	     152 rendered frames
shader 23 (                                         Per Pixel Water Colour Shader):	  41 fps	 24.1876 mspf	     207 rendered frames
 
When I have AF enabled in the control panel I get massive corruption in the last test. Is my card broken? All other apps work without any problems.

Using CAT 3.6 on a 9700 Pro.
 
Your card is okay. The last shader is an experimental shader: it reads and writes to the same texture at the same time, which could cause problems. The next thing is, it uses a floating-point render target, which only supports point sampling on the R3xx VPUs if you want numerically correct results, which are required for summed area tables.

Regards,
Thomas
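The read-while-write hazard described above can be sketched outside graphics entirely. If a pass reads neighbours from the same buffer it is writing, the result depends on processing order, which is exactly what a deep pixel pipeline does not guarantee. A toy illustration in plain Python (nothing here is ShaderMark code; the function names are made up):

```python
def add_left_neighbour_inplace(buf, order):
    # In-place pass: each element adds its left neighbour. Whether it
    # reads the old or the already-updated neighbour depends on the
    # order the elements are processed in, i.e. the result is
    # order-dependent, like sampling a texture you are rendering to.
    for i in order:
        if i > 0:
            buf[i] += buf[i - 1]
    return buf

def add_left_neighbour_twobuffer(src):
    # Two-buffer pass: reads always see the original values, so the
    # result is the same regardless of processing order.
    return [src[i] + (src[i - 1] if i > 0 else 0) for i in range(len(src))]

data = [1, 2, 3, 4]
print(add_left_neighbour_inplace(data[:], [0, 1, 2, 3]))  # [1, 3, 6, 10]
print(add_left_neighbour_inplace(data[:], [3, 2, 1, 0]))  # [1, 3, 5, 7]
print(add_left_neighbour_twobuffer(data))                 # [1, 3, 5, 7]
```

Left-to-right processing happens to produce a running sum; right-to-left matches the two-buffer version. Neither in-place answer is "wrong": it is simply undefined which one you get.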
 
Sideshow said:
When I have AF enabled in the control panel I get massive corruption in the last test. Is my card broken? All other apps work without any problems.

Using CAT 3.6 on a 9700 Pro.

I have the same problem on a 9600p with CAT 3.7; nice to hear it's not a hardware problem :p

I also have problems when AF is enabled in the control panel with the rthdribl PS 2.0 demo ( http://www.daionet.gr.jp/~masa/rthdribl/ ); I get small black squares on the surfaces :(

If you have this demo, let me know if you get the same black dots when AF is enabled via the CP.

Thanks in advance

Sorry for my English :!:
 
tb said:
Your card is okay. The last shader is an experimental shader: it reads and writes to the same texture at the same time, which could cause problems.
Is that even legal? :oops:
I'd be more inclined to think that the source and destination would have to be separate entities (well, apart from the special case of blends to the framebuffer).
 
Simon F said:
tb said:
Your card is okay. The last shader is an experimental shader: it reads and writes to the same texture at the same time, which could cause problems.
Is that even legal? :oops:
I'd be more inclined to think that the source and destination would have to be separate entities (well, apart from the special case of blends to the framebuffer).

Yes. R3xx VPUs and Microsoft's refrast produce the same output, and the same way of doing it was also mentioned by NVIDIA.

Thomas
 
tb said:
Simon F said:
tb said:
Your card is okay. The last shader is an experimental shader: it reads and writes to the same texture at the same time, which could cause problems.
Is that even legal? :oops:
I'd be more inclined to think that the source and destination would have to be separate entities (well, apart from the special case of blends to the framebuffer).

Yes. R3xx VPUs and Microsoft's refrast produce the same output, and the same way of doing it was also mentioned by NVIDIA.

Thomas

No, it's undefined, i.e. use at your own risk. MS have been threatening to catch it in the debug runtime for ages, but we keep moaning at them to let us keep it.
It is illegal, though, and so could disappear at any point, and/or any card could do whatever it likes at that point (indeed, if you don't get the texel/pixel alignment perfect you will get different results on existing cards).

A search on DIRECTXDEV will find the various threads over the years.

No game or application should rely on this behaviour. Have a fallback method ready for when it stops working.

IHVs hate it, and indeed even when their dev-relations have recommended it (NVIDIA did, for some fancy filtering) they have later changed their minds and advised against it. Yes, they know it works at the moment, but they don't want to commit to it working in the future.
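For what it's worth, the conventional fallback is ping-pong rendering: allocate two targets, read one and write the other each pass, then swap. A CPU sketch of the pattern in plain Python (the `box3` step is a made-up 1D filter, not anything from ShaderMark):

```python
def ping_pong(initial, step, passes):
    # Each pass reads src and writes dst, then the roles swap.
    # No pass ever samples the buffer it is rendering to.
    src = list(initial)
    dst = [0] * len(src)
    for _ in range(passes):
        for i in range(len(src)):
            dst[i] = step(src, i)
        src, dst = dst, src
    return src

def box3(buf, i):
    # Toy 3-tap box filter with clamped edges.
    left = buf[max(i - 1, 0)]
    right = buf[min(i + 1, len(buf) - 1)]
    return (left + buf[i] + right) / 3.0

print(ping_pong([0.0, 3.0, 0.0], box3, 1))  # [1.0, 1.0, 1.0]
```

On a GPU this would mean two textures and a render-target swap per pass, at the cost of extra memory and state changes, but with fully defined results.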
 
DeanoC said:
No, it's undefined, i.e. use at your own risk. MS have been threatening to catch it in the debug runtime for ages, but we keep moaning at them to let us keep it.

Thanks for that Dean, I thought I was right. That's the 2nd one I owe you.

tb: FWIW you just have to consider all the concurrency issues in HW (given the ultra-deep pipelines) to realise what a pain it would be to safely support read/write to the same texture.
 
DeanoC said:
No, it's undefined, i.e. use at your own risk. MS have been threatening to catch it in the debug runtime for ages, but we keep moaning at them to let us keep it.
It is illegal, though, and so could disappear at any point, and/or any card could do whatever it likes at that point (indeed, if you don't get the texel/pixel alignment perfect you will get different results on existing cards).

A search on DIRECTXDEV will find the various threads over the years.

No game or application should rely on this behaviour. Have a fallback method ready for when it stops working.

IHVs hate it, and indeed even when their dev-relations have recommended it (NVIDIA did, for some fancy filtering) they have later changed their minds and advised against it. Yes, they know it works at the moment, but they don't want to commit to it working in the future.

It's only one shader, and if it stops working in the future I'll release a patch. By the way, does anyone know a different way (here is the ShaderMark v2.0 way: http://www.opengl.org/developers/code/gdc2003/GDC03_SummedAreaTables.ppt ) to calculate a summed area table on the VPU?

Thomas
 
tb said:
It's only one shader, and if it stops working in the future I'll release a patch.
Judging from comments made at Graphics Hardware 2003 I can't imagine many of the hardware vendors would want to support that sort of thing.
By the way, does anyone know a different way (here is the ShaderMark v2.0 way: http://www.opengl.org/developers/code/gdc2003/GDC03_SummedAreaTables.ppt ) to calculate a summed area table on the VPU?
Obviously, for a start, you should use 2 textures (one the source and one the destination). :D

I was certainly surprised by the presentation's ordering of the summation. I'm only really estimating, but their solution seems to require N+N rendering passes for an NxN texture.

I'm guessing that you'd be able to compute the summed area table (SAT) in less than that by doing a binary divide and conquer approach.

If you just consider one row, all the right-half pixels need to add the sum of all their left-hand neighbours. If you recurse down, you can compute those sums, etc. Might be worth a try. Come to think of it, if you compute a MIP map (summing but not averaging) you could use that to generate the SAT very quickly... I think :)
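One way to make the recursion above concrete is a Hillis-Steele style scan: each of ceil(log2 N) ping-pong passes adds the value 2^k positions to the left, giving the full prefix sum of a row; scan rows and then columns and you have the SAT in roughly 2 log2 N passes instead of N+N. A CPU sketch in plain Python (one reading of the suggestion, not the actual ShaderMark implementation):

```python
def scan(row):
    # Inclusive prefix sum in ceil(log2(n)) passes; each pass builds a
    # fresh list, i.e. reads one buffer and writes another.
    out = list(row)
    offset = 1
    while offset < len(out):
        out = [out[i] + (out[i - offset] if i >= offset else 0)
               for i in range(len(out))]
        offset *= 2
    return out

def summed_area_table(img):
    # SAT(x, y) = sum of all texels at positions <= (x, y):
    # scan every row, then scan every column of the result.
    rows = [scan(r) for r in img]
    cols = [scan(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

print(scan([1, 2, 3, 4]))                    # [1, 3, 6, 10]
print(summed_area_table([[1, 1], [1, 1]]))   # [[1, 2], [2, 4]]
```

Note that each pass still touches every pixel, so the win is in pass count, not in pixels rendered per pass.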
 
Simon F said:
tb said:
It's only one shader, and if it stops working in the future I'll release a patch.
Judging from comments made at Graphics Hardware 2003 I can't imagine many of the hardware vendors would want to support that sort of thing.
By the way, does anyone know a different way (here is the ShaderMark v2.0 way: http://www.opengl.org/developers/code/gdc2003/GDC03_SummedAreaTables.ppt ) to calculate a summed area table on the VPU?
Obviously, for a start, you should use 2 textures (one the source and one the destination). :D

I was certainly surprised by the presentation's ordering of the summation. I'm only really estimating, but their solution seems to require N+N rendering passes for an NxN texture.

I'm guessing that you'd be able to compute the summed area table (SAT) in less than that by doing a binary divide and conquer approach.

If you just consider one row, all the right-half pixels need to add the sum of all their left-hand neighbours. If you recurse down, you can compute those sums, etc. Might be worth a try. Come to think of it, if you compute a MIP map (summing but not averaging) you could use that to generate the SAT very quickly... I think :)

I think you're right in terms of the number of passes, but if you do this sort of divide-and-conquer rendering, I'm pretty sure you'll be rendering a lot more pixels. This method only does N pixels each pass.

However, I don't think rendering a line is the optimal thing to be doing, seeing as how most pixel pipelines are arranged in a 2x2 fashion, but this may not matter considering the high bandwidth requirements of floating-point (or 16-bit integer) texture reading and writing. My guess is it would take a fair amount of experimenting to figure out the optimal method, taking into consideration the render-state changes due to copying back and forth (provided you don't want to do any undefined operations). I don't think the optimisations would really be worth it, though.
 
Mintmaster said:
I think you're right in terms of the number of passes, but if you do this sort of divide-and-conquer rendering, I'm pretty sure you'll be rendering a lot more pixels. This method only does N pixels each pass.
Unless my doodles are wrong, divide and conquer will do fewer pixels in each pass (as well as performing fewer passes!).
 
tb said:
DeanoC said:
No, it's undefined, i.e. use at your own risk. MS have been threatening to catch it in the debug runtime for ages, but we keep moaning at them to let us keep it.
It is illegal, though, and so could disappear at any point, and/or any card could do whatever it likes at that point (indeed, if you don't get the texel/pixel alignment perfect you will get different results on existing cards).
It's only one shader, and if it stops working in the future I'll release a patch.
My unofficial theory on this would be:
1. Keep your primitives as large as possible.
2. Avoid writing to the locations you're reading; whether you get 'before', 'after', or 'something else' is not predictable.

But that's really just educated guesswork. As you say, it may break at any time.
 
Can someone be so kind as to benchmark an

R9800 Pro stock and R9800XT downclocked? or

R9800 Pro overclocked and R9800XT stock?

Yes, I know it's the same, but I dunno what you guys have available ;)

I just want to know if there are any differences that ATi might have added with regard to the XT.

Many thanks guys :)
 
Dio said:
tb said:
DeanoC said:
No, it's undefined, i.e. use at your own risk. MS have been threatening to catch it in the debug runtime for ages, but we keep moaning at them to let us keep it.
It is illegal, though, and so could disappear at any point, and/or any card could do whatever it likes at that point (indeed, if you don't get the texel/pixel alignment perfect you will get different results on existing cards).
It's only one shader, and if it stops working in the future I'll release a patch.
My unofficial theory on this would be:
1. Keep your primitives as large as possible.
2. Avoid writing to the locations you're reading; whether you get 'before', 'after', or 'something else' is not predictable.

But that's really just educated guesswork. As you say, it may break at any time.

The "current pixel", or the one that it appears you are currently reading from, is the only pixel which you can really gaurentee to be safe, and then only with large poly's (or small poly's supplied in a constrained order). The reason why you can't use before or after is that you don't know either the order in which pixels are processed or the depth of the pipeline. Current pixel is only safe as the pipelining invariable means that its long since been read at the point you write it back.

John.
 