Games and Pixel Shader 2.0

Energy

We have two video card brands, each with different PS 2.0 floating point precision capabilities and performance. The GFX has FP32 and FP16, but the R9500/9700/9800 has only FP24.

GFX FP32 is slower than Radeon FP24, but GFX FP16 is faster than Radeon FP24.

How is this situation going to affect PS 2.0 effects quality in games? Both video cards' FP capabilities should be fully used, but that requires different PS 2.0 code for each card, right?
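For example, is the idea that you write one shader and just mark the parts where FP16 is good enough? Something like the sketch below is what I have in mind (just my own guess, with made-up names, written as a C++ string the way you'd feed it to D3DX) - the half type becomes the _pp hint in ps_2_0, which the Radeons would ignore since they always run FP24, while the GFX could drop those ops to FP16:

```cpp
// Just a guess at how a "precision hinted" PS 2.0 shader might look, not code
// from any real game. 'half' maps to the _pp modifier when compiled to ps_2_0:
// R9500/9700/9800 ignore it (everything runs at FP24), while a GeForce FX is
// free to run those operations at FP16. g_DiffuseMap and the tint are placeholders.
const char* kPixelShaderSrc =
    "sampler2D g_DiffuseMap;                                              \n"
    "float4 main(float2 uv : TEXCOORD0) : COLOR                           \n"
    "{                                                                    \n"
    "    half4 albedo = tex2D(g_DiffuseMap, uv); // FP16 is fine for colour \n"
    "    half3 tint   = half3(0.9, 0.95, 1.0);                            \n"
    "    return half4(albedo.rgb * tint, albedo.a);                       \n"
    "}                                                                    \n";
```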
 
To put it simply, and in good ole marketing speech...
"Currently, nVidia's GeForce FX family's Pixel Shading speed is inferior to ATI's Radeon product family's Pixel Shading speed, from top to bottom."


Uttar
 
There is a certain DX9 game that will be benchmarked in my forthcoming Triplex Radeon 9600PRO review. It features floating point for a certain effect.

Either Dave or I will be studying the quality of this effect utilizing float textures soonish, comparing ATI's and NVIDIA's offerings.
 
Uttar said:
To put it simply, and in good ole marketing speech...
"Currently, nVidia's GeForce FX family's Pixel Shading speed is inferior to ATI's Radeon product family's Pixel Shading speed, from top to bottom."
Except that's incorrect.

To put it simply, the GeForce FX line is capable of more pixel shader operations per clock, but has many more limitations that prevent it from reaching its peak performance.

Pre-NV35 (everything below the GeForce FX 5900) has to use FP very sparingly for decent performance. This doesn't mean that only a few shaders should use any FP, but rather that every shader should use only a little FP. Depending on the calculations being done, this may or may not be a problem.

Unfortunately, due to Microsoft's spec, there is no option for integer processing in PS 2.0. This means that everything less than the FX 5900 is basically screwed in Direct3D. In OpenGL, there is the option to use integer processing, such that the FX architecture (prior to FX 5900) can flex its muscle.

Anyway, all of this aside, the simple fact is that the FX architecture isn't inherently slower than ATI's. Microsoft's spec is holding it back, which means that the FX parts below the 5900 will need to sacrifice quality for any kind of speed (with no integer format, the drivers cannot possibly detect when the integer format can be safely used, and it will have to be used anyway for speed).
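To make the 'sacrifice quality for speed' part concrete, here's a rough sketch (mine, not anything out of a driver) of the only knob an application really has: force the whole shader to partial precision when compiling it. Note there is no analogous switch for an integer path, because ps_2_0 simply has no integer format to fall back to.

```cpp
// Sketch only (DX9 SDK, d3dx9.lib). Compile the same HLSL source twice: once
// as-is, once with D3DXSHADER_PARTIALPRECISION, which marks every op _pp
// (FP16 on a GeForce FX, still FP24 on a Radeon). kPixelShaderSrc is just a
// placeholder for any ps_2_0-compatible source string.
#include <d3dx9.h>
#include <cstring>
#include <cstdio>

extern const char* kPixelShaderSrc;

bool CompileBothWays()
{
    const DWORD flags[2] = { 0, D3DXSHADER_PARTIALPRECISION };

    for (int i = 0; i < 2; ++i)
    {
        LPD3DXBUFFER code = NULL, errors = NULL;
        HRESULT hr = D3DXCompileShader(kPixelShaderSrc,
                                       (UINT)strlen(kPixelShaderSrc),
                                       NULL, NULL, "main", "ps_2_0",
                                       flags[i], &code, &errors, NULL);
        if (FAILED(hr))
        {
            if (errors) std::printf("%s\n", (char*)errors->GetBufferPointer());
            if (errors) errors->Release();
            return false;
        }
        // 'code' now holds the ps_2_0 token stream; in a real app you'd pass
        // it to IDirect3DDevice9::CreatePixelShader.
        code->Release();
        if (errors) errors->Release();
    }
    return true;
}
```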

Other than this, the FX line can be very fast, faster than ATI's in many cases, through optimization. And the FX line's shader performance is very, very complex. One cannot make any simple statement about it, except to look at peak performance and say it's harder to get that peak performance.
 
Other than this, the FX line can be very fast, faster than ATI's in many cases, through optimization. And the FX line's shader performance is very, very complex. One cannot make any simple statement about it, except to look at peak performance and say it's harder to get that peak performance.

How do you figure that?
 
Microsoft's spec is holding it back,

Interesting placement of blame. Especially considering all the hype Nvidia were pumping out. "A dawn of cinematic computing." Not sure whether to laugh or cry.

It was Nvidia's choice to put such an unbalanced design together.
 
Chalnoth said:
Other than this, the FX line can be very fast, faster than ATI's in many cases, through optimization. And the FX line's shader performance is very, very complex. One cannot make any simple statement about it, except to look at peak performance and say it's harder to get that peak performance.

This is the problem. You can have all the theoretical high performance you like, but if you need a lot of hard work to get there in practice, chances are no developers will do it. On the whole, developers will simply not jump through hoops, especially to support unusual, problematic, or unique/minority features. Devs like to write their code to the spec, and have the hardware run it without lots of custom tweaking.
 
I think I just proved, under closed, scientifically-accepted conditions, that marketing speech encourages hatred ;)

I'd make a comment on this whole darn thing, but I've said this stuff so many times you guys aren't gonna force me to explain it all again, are you?


Uttar
 
Chalnoth said:
Anyway, all of this aside, the simple fact is that the FX architecture isn't inherently slower than ATI's.
When it comes to following the spec, which means floating point all the way through, it sure seems so.
Microsoft's spec is holding it back, which means that the FX parts below the 5900 will need to sacrifice quality for any kind of speed (with no integer format, the drivers cannot possibly detect when the integer format can be safely used, and it will have to be used anyway for speed).
MS wanted a full floating point pipeline; there's no room for integer. Sure, nvidia can sacrifice quality for speed, but that's outside the spec and should not be condoned. Allowing partial precision is enough of a compromise, IMO.

You can't blame MS for nvidia's architecture decisions.
 
On the other hand, the OpenGL 2.0 shading language fully supports integers in pixel shaders, upcasting to float if the hardware doesn't have integer support.

So now we have a dichotomy. OGL2.0 will support both 16-bit integer and 32-bit floating point types in the pipeline (compiled by the driver), while DX9 only supports 16-bit and 32-bit floating point. Although the DX9 HLSL supports declaring double (64-bit FP) and int types, there is no way to pass this information through to the driver in ps2.0 or ps3.0.
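A quick side-by-side of what I mean (both shaders are throwaway examples I just typed, not from any real title): the GLSL version can genuinely declare integer work and leave the representation up to the driver, while the DX9 HLSL version is perfectly legal source, but once it's compiled for ps_2_0 the int is simply emulated in float, because the profile has no integer registers to hand it to.

```cpp
// Two equivalent little blur-ish shaders, written purely as illustrations.
// The GLSL one keeps 'int' as a real int for the driver's compiler to map to
// whatever the hardware has; the HLSL one compiles to ps_2_0 with the int
// lowered/unrolled into float ops, since ps_2_0 has no integer format.
const char* kGlslFragment =
    "uniform sampler2D tex;                                              \n"
    "void main() {                                                       \n"
    "    int taps = 4;                                                   \n"
    "    vec4 sum = vec4(0.0);                                           \n"
    "    for (int i = 0; i < taps; ++i)                                  \n"
    "        sum += texture2D(tex, gl_TexCoord[0].xy + float(i) * 0.001);\n"
    "    gl_FragColor = sum / float(taps);                               \n"
    "}                                                                   \n";

const char* kHlslPixelShader =
    "sampler2D tex;                                                      \n"
    "float4 main(float2 uv : TEXCOORD0) : COLOR {                        \n"
    "    int taps = 4;                                                   \n"
    "    float4 sum = 0;                                                 \n"
    "    for (int i = 0; i < taps; ++i)                                  \n"
    "        sum += tex2D(tex, uv + i * 0.001);                          \n"
    "    return sum / taps;                                              \n"
    "}                                                                   \n";
```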
 
Chalnoth said:
Pre-NV35 (everything below the GeForce FX 5900) has to use FP very sparingly for decent performance. This doesn't mean that only a few shaders should use any FP, but rather that every shader should use only a little FP. Depending on the calculations being done, this may or may not be a problem.

Theoretically the NV35 should be twice as fast, according to nVidia, yes.
But in the few actual DX9 games/demos I've tried out -
Aquanox 2, 3DMark03, rthdribl, ShaderMark, Humus' Mandelbrot demo (and oh, the half-assed DX9 game GunMetal too ;) ), etc. - there does not seem to be any major performance increase at all over the 5800 Ultra. (Especially interesting in the simpler demos, which are not affected by the extra 128 MB or the added bandwidth.)

Of course I don't know much about the specifics of the shaders in Aquanox 2.
I do know, however, that I can run the game at 1280x960 with 6x FSAA and 16x Quality Aniso on a 9800 Pro, and it runs even a tiny bit smoother than my 5900 Ultra at the same res but with 4x AA and 8x AF.
(That's a 128 MB 9800 Pro and a 256 MB 5900 Ultra, btw.)
 
Theoretically, NV35 will be faster than NV30 on fp shaders with more than a few instructions and may approach twice the speed on complex shaders. Remember (brings back memories, doesn't it Uttar), NV35 contains two fp units arranged serially per pipeline (and 4 pipelines). Even though it houses a total of 8 fp units, each pipeline can only write one color result at a time.
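Rough back-of-the-envelope on what that buys you; the clocks (500 MHz 5800 Ultra, 450 MHz 5900 Ultra) and the one-FP-unit-per-pipe figure for NV30 are my assumptions, so take the numbers as ballpark rather than gospel:

```cpp
// Peak FP op rates under the assumptions above: 4 pipelines on both chips,
// 1 FP unit/pipe on NV30 vs 2 on NV35, 500 MHz vs 450 MHz core clocks.
// Real shaders are limited by register usage, texturing and the single
// colour write per pipe, so these peaks are rarely reached.
#include <cstdio>

int main()
{
    const double nv30 = 4.0 /*pipes*/ * 1.0 /*fp unit*/  * 500e6; // ~2.0 G ops/s
    const double nv35 = 4.0 /*pipes*/ * 2.0 /*fp units*/ * 450e6; // ~3.6 G ops/s

    std::printf("NV30 peak: %.1f G fp ops/s\n", nv30 / 1e9);
    std::printf("NV35 peak: %.1f G fp ops/s\n", nv35 / 1e9);
    std::printf("NV35/NV30: %.2fx\n", nv35 / nv30); // ~1.8x, not quite double
    return 0;
}
```

So even on paper it comes out closer to 1.8x than 2x, and anything texture- or bandwidth-bound won't show even that.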
 