Dawn FP16/FX12 VS FP32 performance - numbers inside

Anode said:
The only people who will be using these will be the developers. And at this point it makes more sense to give them something which represents the tech when their particular app is released than something else. Then they can do all the optimizations for a fp32 based architecture and then their results will come to realization when their game gets released 2 years from now and people can play them on the latest hardware which will run them in their full glory.
If you start developing a new game right now, I doubt you'd want to use shaders which actually *require* FP32. Plus, you could most likely develop this easily with a card only capable of FP24; it would just look slightly worse in the worst case, but there really shouldn't be much of a difference for developing. I also can't really see what "optimizations for a fp32 based architecture" would be. Being able to have longer shaders is probably a feature developers like, however; that's why ATI has included the F-buffer.
And even if it's true that somehow FP32 (even slow) might be appealing to developers, I think Nvidia makes more money from selling cards to gamers than from giving them to developers :) (not to say this is not important, but if you want to make money now, those interesting-for-developers-only features just don't cut it).
I'd agree that pretty much all you need right now is fast DX8 performance for current games. But for this you could just use a GF4 Ti chip as well. And, more importantly, ATI shows that you can actually have very good performance for BOTH FX12 and FP24, and with a considerably smaller (about 20%) transistor count too (true for both R350 vs. NV35 and RV350 vs. NV31). I don't know the die sizes for these chips, though; since die size is what primarily determines the cost of a chip AFAIK, that would be more helpful than transistor count.
 
It seems to me that the issue of FP32 vs. FP24 was settled for gaming situations with the DX9 spec requiring only FP24. Certainly one can come up with shaders where accumulated error will start showing artifacts in FP24 but not FP32. But the thing is, because of what the spec says and because ATI only supports FP24, you'll never find any such shaders in a DX9 game.
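
To make the accumulated-error point concrete, here is a minimal Python sketch that rounds every intermediate result to a given number of mantissa bits. It assumes FP16 = 10, FP24 = 16 (the R300-style s16e7 layout) and FP32 = 23 explicit mantissa bits, and it ignores exponent range and denormals. `round_to_mantissa` and `accumulate` are made-up helper names for illustration, not anything from DX9 or a driver.

```python
import math

def round_to_mantissa(x, bits):
    """Round x to `bits` explicit mantissa bits (plus the implicit leading 1)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << (bits + 1)         # keep bits+1 significant binary digits
    return math.ldexp(round(m * scale), e - (bits + 1))

def accumulate(step, count, bits):
    """Add `step` to a running total `count` times, rounding after every add."""
    step = round_to_mantissa(step, bits)
    total = 0.0
    for _ in range(count):
        total = round_to_mantissa(total + step, bits)
    return total

# Assumed mantissa widths (not taken from any spec quoted in this thread)
FORMATS = {"fp16": 10, "fp24": 16, "fp32": 23}

errors = {name: abs(accumulate(0.1, 1000, bits) - 100.0)
          for name, bits in FORMATS.items()}

for name, err in errors.items():
    print(f"{name}: absolute error after 1000 adds = {err:.6f}")
```

Only the trend matters here: the error grows as the mantissa shrinks, which is the kind of thing a long chain of dependent shader operations can expose.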

Now, if DX10, as rumored, moves from separate PS and VS models to a unified shader model, then presumably that will require FP32 precision throughout. And so, in two years or so when DX10 comes out, you can expect ATI to support FP32 fragment shaders, too. Of course such support will only be in DX10, and as today's NV3x cards won't be able to run those DX10 shaders, their FP32 support seems rather moot.
 
It doesn't seem the 5900 FX is as slow when using fp32 precision as others make it out to be. It achieves 1/2 to 2/3 of the fp performance of R350 (in the new builds of futuremark and rightmark, at least, which seem to force high precision) with fp32 component precision and no register optimizations, which is no slouch, not to mention that fp16 runs at almost twice the speed of fp32 (most times). The stupid thing is that the performance hit from fp32, as opposed to fp24, isn't because of the fp execution units but because of register usage penalties.
 
previously, it's been said that the 5900 will run the ARB path in Doom 3 just as fast as the NVIDIA path. does this all mean that's not true???
 
Josiah said:
previously, it's been said that the 5900 will run the ARB path in Doom 3 just as fast as the NVIDIA path. does this all mean that's not true???

Does ARB_fragment_program support a partial precision hint like PS2.0? If so, is it a hint that applies to the entire shader or can it be used with instruction-level granularity?

(Josiah: if the answers are that it does support partial precision, and on the instruction level, then this means NV35 should be able to run the ARB2 path fine. If not, maybe not...)
 
http://www.cs.ubc.ca/~xgranier/OpenGL/ext/ARB/fragment_program.pdf

Code:
(1) Should we provide precision queries?
RESOLVED: We've decided not to include precision queries.  Implementations are expected to meet or exceed the precision guidelines set forth in the core GL spec, section 2.1.1, p. 6, as ammended by this extension.

To summarize section 2.1.1, the maximum representable magnitude of colors must be at least 2^10, while the maximum representable magnitude of other floating-point values must be at least 2^32. The individual results of floating-point operations must be accurate to about 1 part in 10^5.  Here are the reasons why precision queries were not included:

1. It is unclear what the queries should be:
a) min, max, [0,1) granularity
b) min +, max +, min -, max -, [0,1) granularity
c) IEEE mantissa bits, IEEE exponent bits

2. Due to instruction emulation, there is no way to query the actual precision that can be expected. Should the query return the best-case or worst-case precision?

3. Implementations may support multiple precisions, on a per-instruction basis or across the board. How would this be exposed?

4. Current implementations are able to meet the minimum requirements specified in the core GL, thanks to its sufficiently loose wording "... so that the individual results of floating-point operations are accurate to ABOUT 1 part in 10^5." (Emphasis added.)

5. A conformance test can act as watchdog to ensure implementations are not cutting corners on precision. 

6. Adding precision queries would require a new entrypoint.
 
Interestingly, there should be a significant difference (~7-8 fps) between the 5800 ultra's fp16 performance and the 5900 ultra's (according to the info from Uttar and Ante P). Given that the 5900 ultra is all fp, forcing fp16 should yield significantly better performance than fp32. What strengthens this view is the fact that NV30 suffers a great performance loss when it switches from the originally intended mixed precision to forced fp16 or fp32 (30 fps vs. 21 fps) and NV35 does not (29 fps vs. 27 fps). I believe the reason the gap between NV30 and NV35 at fp32 is so small (~1 fps) is that NV35 is stuck with the same number of registers as NV30, even though it contains more than twice the fp units.

Any thoughts?
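
For anyone who wants the relative hit rather than raw fps, here is a quick Python sketch of the arithmetic using the frame rates quoted in this post. `percent_drop` is just an illustrative helper name.

```python
def percent_drop(before_fps, after_fps):
    """Relative performance loss, in percent, going from before_fps to after_fps."""
    return (before_fps - after_fps) / before_fps * 100.0

# Frame rates quoted in the post above
nv30_drop = percent_drop(30, 21)
nv35_drop = percent_drop(29, 27)

print(f"NV30: {nv30_drop:.1f}% slower")
print(f"NV35: {nv35_drop:.1f}% slower")
```

That works out to roughly a 30% loss on NV30 against about 7% on NV35.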
 
I just wish someone would implement a damn FP16 framebuffer and a 12-bit DAC :/ we are using all this high-precision internal rendering but I still see colour banding in 32-bit mode :/ (of course generally only on alpha blending ops and fog).
 
BTW the loading screen for Vulcan just cracks me up:

[image: demosN.gif]

(And yeah, that is the real loading screen :) )
 
Dave H said:
Does ARB_fragment_program support a partial precision hint like PS2.0? If so, is it a hint that applies to the entire shader or can it be used with instruction-level granularity?

See

http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.txt

Yes, the partial precision hints are ARB_precision_hint_nicest and ARB_precision_hint_fastest. They are mandatory program options (but they may be ignored).

They apply to the entire shader.

See Issue 22 and 3.11.4.5.2 in the ARB_fragment_program specification.
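
Since the hint applies to the whole program, requesting it is just one OPTION line after the header. Below is a minimal Python sketch that adds the line to a program string before it is handed to the driver. The sample program and the `with_precision_hint` helper are made up for illustration, and the "only one hint per program" check reflects my reading of the spec rather than anything tested here.

```python
HEADER = "!!ARBfp1.0"
HINTS = ("ARB_precision_hint_fastest", "ARB_precision_hint_nicest")

def with_precision_hint(source, hint):
    """Return the program text with an OPTION line added right after the header."""
    if hint not in HINTS:
        raise ValueError(f"unknown precision hint: {hint}")
    lines = source.strip().splitlines()
    if lines[0].strip() != HEADER:
        raise ValueError("not an ARB fragment program")
    if any(h in source for h in HINTS):
        # Assumption: as I read the spec, only one precision hint is allowed.
        raise ValueError("program already has a precision hint")
    return "\n".join([lines[0], f"OPTION {hint};"] + lines[1:]) + "\n"

# Trivial example program: sample a texture and write it out.
PROGRAM = """!!ARBfp1.0
TEMP col;
TEX col, fragment.texcoord[0], texture[0], 2D;
MOV result.color, col;
END
"""

fast_program = with_precision_hint(PROGRAM, "ARB_precision_hint_fastest")
print(fast_program)
```

Whether the hint actually changes anything is up to the implementation, since it is allowed to ignore it.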



On precision in OpenGL.

Precision of operations is described in "The OpenGL Graphics System: A Specification (Version 1.4)" in 2.1.1.

"...floating-point operations are accurate to about 1 part in 10^5."

ARB_fragment_program edits this section to promote texture coordinates to a larger magnitude (at least 2^32), colors remaining lower magnitude (at least 2^10).

On a historical note, ISO/IEC C FLT_EPSILON is less than or equal to 1E-5. (And IEEE 754/IEC 60559 value is 1.19209290E-07F.)

(Finally, "fp24" operations are accurate to about 1 part in 10^5.)

-mr. bill

(edit - fix quote block)
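
The precision figures quoted above can be checked with a few lines of Python. This sketch assumes fp16 has 10, fp24 has 16 (the s16e7 layout) and fp32 has 23 explicit mantissa bits; the mantissa widths are my assumption, not something the GL spec states.

```python
# Explicit mantissa bits per format (fp24 assumed to be the s16e7 layout)
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}

# Relative spacing of values just above 1.0 (the "epsilon" of each format)
epsilon = {name: 2.0 ** -bits for name, bits in MANTISSA_BITS.items()}

for name, eps in epsilon.items():
    print(f"{name}: epsilon = {eps:.8e}, i.e. 1 part in {round(1 / eps)}")
```

Under those assumptions fp24 comes out at 1 part in 65536, which is in the neighbourhood of the "about 1 part in 10^5" wording, fp32 reproduces the 1.19209290E-07 value, and fp16 at 1 part in 1024 falls well short.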
 
UPDATE

Okay, so I finally gave up on the whole COLR thing. Simply using it in the shader files makes the whole thing crash. I tried and tried, and nothing could fix this.

I doubt the NV3x really cares anyway: heck, on the NV30, as AnteP's results showed, FP16 in FP32 registers had the same performance as full FP32 - and the FP16 in FP32 thingy doesn't have the COLR problem.

My guess is really that the only difference is that it makes FP32 framebuffers completely useless if you use COLH. Not 100% sure, but very likely.

Anyway, this version also includes the Full FX12 patch (which also modified the Cg files to use Fixed) - so rejoice! :)


www.notforidiots.com/DawnQuality.zip


Uttar
 
Here's my version of full fp32 precision shaders for dawn. Someone with FX willing to give it a shot? ;)

BTW Uttar: Do you want to know why your fp32 shaders didn't work? 8) The finalLeavesTranslTranspFR.fp30 shader is writing out color on two different locations and you forgot to change one COLH to COLR, meaning you were using both which is illegal.
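
A rough way to catch that kind of leftover is to scan each .fp30 file for writes to both color outputs. The Python sketch below does that with a regex. The sample shader text and the helper names are hypothetical, and the pattern only looks at one instruction per line, so treat it as a quick sanity check rather than a real parser.

```python
import re

# Matches an instruction whose destination is o[COLR] or o[COLH]
COLOR_WRITE = re.compile(r"^\s*[A-Z0-9_]+\s+o\[(COL[RH])\]", re.MULTILINE)

def color_outputs_written(shader_text):
    """Return the set of color output registers the shader writes to."""
    return set(COLOR_WRITE.findall(shader_text))

def mixes_color_outputs(shader_text):
    """True if the shader writes both o[COLR] and o[COLH]."""
    return color_outputs_written(shader_text) == {"COLR", "COLH"}

# Hypothetical shader with one leftover COLH write
BROKEN = """!!FP1.0
MOVR o[COLR], R0;
MOVH o[COLH], H1;
END
"""

# Same shader with the leftover write converted to COLR
FIXED = BROKEN.replace("MOVH o[COLH], H1;", "MOVR o[COLR], R1;")

print("broken mixes outputs:", mixes_color_outputs(BROKEN))
print("fixed mixes outputs:", mixes_color_outputs(FIXED))
```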
 