Dawn FP16/FX12 VS FP32 performance - numbers inside

Discussion in 'Architecture and Products' started by Arun, May 25, 2003.

  1. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I doubt that, if you started developing a new game right now, you would want to use shaders which actually *require* FP32. Plus, you could most likely develop it on a card only capable of FP24; it would just look slightly worse in the worst case, and there really shouldn't be much of a difference for development. I also can't really see what "optimizations for a fp32 based architecture" would be. Being able to have longer shaders is probably a feature developers like, however; that's why ATI has included the F-buffer.
    And even if it's true that FP32 (even slow FP32) might somehow be appealing to developers, I think Nvidia makes more money from selling cards to gamers than from giving them to developers :) (not to say this is not important, but if you want to make money now, those interesting-for-developers-only features just don't cut it).
    I'd agree that pretty much all you need right now is fast DX8 performance for current games. But for this you could just as well use a GF4 Ti chip. And, more importantly, ATI shows that you can actually have very good performance for BOTH FX12 and FP24, and with a considerably smaller (about 20%) transistor count too (true for both R350 vs. NV35 and RV350 vs. NV31). I don't know the die sizes for these chips, though; since die size is what primarily determines the cost of a chip AFAIK, that would be more helpful than transistor count.
     
  2. Dave H

    Regular

    Joined:
    Jan 21, 2003
    Messages:
    564
    Likes Received:
    0
    It seems to me that the issue of FP32 vs. FP24 was settled for gaming situations with the DX9 spec requiring only FP24. Certainly one can come up with shaders where accumulated error will start showing artifacts in FP24 but not FP32. But the thing is, because of what the spec says and because ATI only supports FP24, you'll never find any such shaders in a DX9 game.
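
A rough way to see the accumulated-error point is to round a running sum to a given mantissa width after every add. This is only a sketch: the helper names are made up, exponent range is ignored, and the fp24 width assumes ATI's 1 sign / 7 exponent / 16 mantissa layout, which the thread itself does not state.

```python
import math

# Explicit mantissa bits per format. The fp24 figure assumes ATI's
# 1 sign / 7 exponent / 16 mantissa layout (an assumption, not from the thread).
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}


def round_to_mantissa(x: float, bits: int) -> float:
    """Round x to `bits` explicit mantissa bits (exponent range is ignored)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)       # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def accumulate(step: float, count: int, bits: int) -> float:
    """Add `step` to a running total `count` times, rounding after every add."""
    step = round_to_mantissa(step, bits)
    total = 0.0
    for _ in range(count):
        total = round_to_mantissa(total + step, bits)
    return total


def accumulation_error(bits: int, step: float = 0.001, count: int = 10000) -> float:
    """Absolute error of the rounded running sum against step * count."""
    return abs(accumulate(step, count, bits) - step * count)


if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        print(f"{name}: abs error after 10000 adds = {accumulation_error(bits):.6f}")
```

Real shaders rarely run ten thousand dependent adds, so this exaggerates the effect; it only shows the direction of the trade-off between the three widths.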

    Now, if DX10, as rumored, moves from separate PS and VS models to a unified shader model, then presumably that will require FP32 precision throughout. And so, in two years or so when DX10 comes out, you can expect ATI to support FP32 fragment shaders, too. Of course such support will only be in DX10, and as today's NV3x cards won't be able to run those DX10 shaders, their FP32 support seems rather moot.
     
  3. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    It doesn't seem the 5900 FX is as slow when using fp32 precision as others make it out to be. Achieving 1/2 to 2/3 of the fp performance of R350 (at least in the new builds of Futuremark and RightMark, which seem to force high precision) with fp32 component precision and no register optimizations is no slouch, not to mention that fp16 runs at almost twice the speed of fp32 (most of the time). The stupid thing is that the performance hit from fp32, as opposed to fp24, isn't because of the fp execution units but because of register usage penalties.
     
  4. Josiah

    Newcomer

    Joined:
    Mar 23, 2003
    Messages:
    224
    Likes Received:
    0
    Previously, it's been said that the 5900 will run the ARB path in Doom 3 just as fast as the NVIDIA path. Does this all mean that's not true?
     
  5. Dave H

    Regular

    Joined:
    Jan 21, 2003
    Messages:
    564
    Likes Received:
    0
    Does ARB_fragment_program support a partial precision hint like PS2.0? If so, is it a hint that applies to the entire shader or can it be used with instruction-level granularity?

    (Josiah: if the answers are that it does support partial precision, and on the instruction level, then this means NV35 should be able to run the ARB2 path fine. If not, maybe not...)
     
  6. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
  7. Neeyik

    Neeyik Homo ergaster
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,231
    Likes Received:
    45
    Location:
    Cumbria, UK
    http://www.cs.ubc.ca/~xgranier/OpenGL/ext/ARB/fragment_program.pdf

    Code:
    (1) Should we provide precision queries?
    RESOLVED: We've decided not to include precision queries.  Implementations are expected to meet or exceed the precision guidelines set forth in the core GL spec, section 2.1.1, p. 6, as amended by this extension.
    
    To summarize section 2.1.1, the maximum representable magnitude of colors must be at least 2^10, while the maximum representable magnitude of other floating-point values must be at least 2^32. The individual results of floating-point operations must be accurate to about 1 part in 10^5.  Here are the reasons why precision queries were not included:
    
    1. It is unclear what the queries should be:
    a) min, max, [0,1) granularity
    b) min +, max +, min -, max -, [0,1) granularity
    c) IEEE mantissa bits, IEEE exponent bits
    
    2. Due to instruction emulation, there is no way to query the actual precision that can be expected. Should the query return the best-case or worst-case precision?
    
    3. Implementations may support multiple precisions, on a per-instruction basis or across the board. How would this be exposed?
    
    4. Current implementations are able to meet the minimum requirements specified in the core GL, thanks to its sufficiently loose wording "... so that the individual results of floating-point operations are accurate to ABOUT 1 part in 10^5." (Emphasis added.)
    
    5. A conformance test can act as watchdog to ensure implementations are not cutting corners on precision. 
    
    6. Adding precision queries would require a new entrypoint.
    
     
  8. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    Interestingly, there should be a significant difference (~7-8 fps) between the 5800 Ultra's fp16 performance and the 5900 Ultra's (according to the info from Uttar and Ante P). Since the 5900 Ultra is all fp, forcing fp16 should yield significantly better performance than fp32. What strengthens this view is that NV30 suffers a great performance loss when it switches from the originally intended mixed precision to forced fp16 or fp32 (30 fps vs. 21 fps) and NV35 does not (29 fps vs. 27 fps). I believe the reason the performance difference between NV30 and NV35 at fp32 is so small (~1 fps) is that NV35 is stuck with the same number of registers as NV30, even though it contains more than twice as many fp units.

    Any thoughts?
     
  9. bloodbob

    bloodbob Trollipop
    Veteran

    Joined:
    May 23, 2003
    Messages:
    1,630
    Likes Received:
    27
    Location:
    Australia
    I just wish someone would implement a damn FP16 framebuffer and 12-bit DAC :/ We are using all this high-precision internal rendering but I still see colour banding in 32-bit mode :/ (of course, generally only on alpha blending ops and fog).
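
One way such artifacts can arise is that every blend pass writes its result back into an 8-bit channel, so contributions smaller than half a step are simply lost. A minimal sketch of that effect, assuming additive blending of faint layers; the function names are hypothetical and the fp16 store only models a 10-bit mantissa, not the full format.

```python
import math


def quantize8(v: float) -> float:
    """Store v in an 8-bit channel: clamp to [0, 1] and snap to 256 levels."""
    return round(min(max(v, 0.0), 1.0) * 255) / 255


def quantize_fp16(v: float) -> float:
    """Store v with 10 explicit mantissa bits (fp16-like, exponent range ignored)."""
    if v == 0.0:
        return v
    m, e = math.frexp(v)
    return math.ldexp(round(m * 2048) / 2048, e)


def accumulate_layers(contribution: float, layers: int, store) -> float:
    """Additively blend `layers` faint layers, storing the result after each pass."""
    value = store(0.0)
    for _ in range(layers):
        value = store(value + contribution)
    return value


if __name__ == "__main__":
    # 60 layers, each adding 0.001 (about a quarter of one 8-bit step).
    for name, store in (("8-bit", quantize8), ("fp16", quantize_fp16), ("float", float)):
        total = accumulate_layers(0.001, 60, store)
        print(f"{name:>5}: {total:.5f} -> displayed level {round(total * 255)}")
```

With the 8-bit store each pass rounds back to zero, so the layers never add up; the wider stores keep the running value close to 0.06.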
     
  10. Doomtrooper

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,328
    Likes Received:
    0
    Location:
    Ontario, Canada
    How ya doing
     
  11. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    I don't exactly know, or do I?
     
  12. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
  13. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    CRASH BOOM BANG :(
     
  14. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    BTW the loading screen for Vulcan just cracks me up:

    [Image: Vulcan loading screen]

    (And yeah, that is the real loading screen :) )
     
  15. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Okay, so let's try this AGAIN...

    www.notforidiots.com/Dawn3.zip

    Psst, AnteP, don't insist too much on that or nVidia will realize you got a *beta* version and kill you for sharing that image ;)


    Uttar
     
  16. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    nah I just spoke to them and posting pics and videos is fine as long as I don't post the actual demo =)
     
  17. mrbill

    Newcomer

    Joined:
    Feb 24, 2003
    Messages:
    36
    Likes Received:
    1
    Location:
    Marlborough, MA
    See

    http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.txt

    Yes, the partial precision hints are ARB_precision_hint_nicest and ARB_precision_hint_fastest. They are mandatory program options (but they may be ignored).

    They apply to the entire shader.

    See Issue 22 and 3.11.4.5.2 in the ARB_fragment_program specification.
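
Because the hint is a whole-program option, it goes on an OPTION line directly after the program header. A small sketch of a helper that adds it to a program string; the helper name and the example program are made up for illustration, and refusing a second hint is a conservative choice of this sketch.

```python
HEADER = "!!ARBfp1.0"
HINTS = ("ARB_precision_hint_fastest", "ARB_precision_hint_nicest")


def with_precision_hint(program: str, hint: str) -> str:
    """Return `program` with an OPTION line for `hint` placed right after the header."""
    if hint not in HINTS:
        raise ValueError(f"unknown precision hint: {hint}")
    lines = program.strip().splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("expected an ARB fragment program starting with !!ARBfp1.0")
    if any(h in line for line in lines for h in HINTS):
        raise ValueError("program already carries a precision hint")
    return "\n".join([lines[0].strip(), f"OPTION {hint};"] + lines[1:]) + "\n"


# Made-up example program: modulate a texture sample by the interpolated color.
EXAMPLE = """\
!!ARBfp1.0
TEMP texel;
TEX texel, fragment.texcoord[0], texture[0], 2D;
MUL result.color, texel, fragment.color;
END
"""

if __name__ == "__main__":
    print(with_precision_hint(EXAMPLE, "ARB_precision_hint_fastest"))
```

Since the option covers the entire program, there is no way in this scheme to mark single instructions as partial precision.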



    On precision in OpenGL.

    Precision of operations is described in "The OpenGL Graphics System: A Specification (Version 1.4)" in 2.1.1.

    "...floating-point operations are accurate to about 1 part in 10^5."

    ARB_fragment_program edits this section to promote texture coordinates to a larger magnitude (at least 2^32), colors remaining lower magnitude (at least 2^10).

    On a historical note, ISO/IEC C FLT_EPSILON is less than or equal to 1E-5. (And IEEE 754/IEC 60559 value is 1.19209290E-07F.)

    (Finally, "fp24" operations are accurate to about 1 part in 10^5.)
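
The figures above can be checked with a few lines of arithmetic. This sketch assumes 10, 16 and 23 explicit mantissa bits for fp16, fp24 and fp32; the fp24 width is ATI's 16-bit mantissa layout, which is an assumption not stated in the thread.

```python
# Explicit mantissa bits per format. The fp24 entry assumes ATI's
# 1 sign / 7 exponent / 16 mantissa layout (an assumption, not from the thread).
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}


def epsilon(bits: int) -> float:
    """Gap between 1.0 and the next representable value for `bits` mantissa bits."""
    return 2.0 ** -bits


if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        eps = epsilon(bits)
        print(f"{name}: epsilon = {eps:.8e}, about 1 part in {round(1 / eps)}")
```

Under these assumptions fp32 reproduces the IEEE 754 value quoted above, fp24 lands at 1 part in 65536 (close to the "about 1 part in 10^5" wording), and fp16 falls well short of it.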

    -mr. bill

    (edit - fix quote block)
     
  18. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    crash...
     
  19. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    UPDATE

    Okay, so I finally gave up on the whole COLR thing. Simply using it in the shader files makes the whole thing crash. I tried and tried, and nothing could fix this.

    I doubt the NV3x really cares anyway: heck, on the NV30, as AnteP's results showed, FP16 in FP32 registers had the same performance as full FP32 - and the FP16 in FP32 thingy doesn't have the COLR problem.

    My guess is really that the only difference is that it makes FP32 framebuffers completely useless if you use COLH. Not 100% sure, but very likely.

    Anyway, this version also includes the Full FX12 patch (which also modified the Cg files to use Fixed) - so rejoice! :)


    www.notforidiots.com/DawnQuality.zip


    Uttar
     
  20. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    Here's my version of full fp32 precision shaders for Dawn. Is someone with an FX willing to give it a shot? :wink:

    BTW Uttar: Do you want to know why your fp32 shaders didn't work? 8) The finalLeavesTranslTranspFR.fp30 shader writes out color in two different places and you forgot to change one COLH to COLR, meaning you were using both, which is illegal.
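
A missed register like this is easy to catch mechanically before loading the shader. A minimal sketch that scans a .fp30-style program for both color outputs; the function names and the sample shader are made up (it is not the real finalLeavesTranslTranspFR.fp30), and it relies on output registers being write-only so any mention counts as a write.

```python
import re

# NV_fragment_program color outputs: o[COLR] (fp32) and o[COLH] (fp16).
COLOR_OUTPUT = re.compile(r"o\[(COL[RH])\]")


def color_outputs(shader: str) -> dict:
    """Map each color output register to the 1-based line numbers that mention it."""
    found = {}
    for number, line in enumerate(shader.splitlines(), start=1):
        code = line.split("#", 1)[0]          # drop trailing comments
        for register in COLOR_OUTPUT.findall(code):
            found.setdefault(register, []).append(number)
    return found


def mixes_color_outputs(shader: str) -> bool:
    """True when the shader mentions both COLR and COLH."""
    return len(color_outputs(shader)) == 2


# Made-up shader for illustration only.
BROKEN = """\
!!FP1.0
TEX R0, f[TEX0], TEX0, 2D;
MOVR o[COLR], R0;
MULR R1, R0, f[COL0];
MOVH o[COLH], R1;   # left over from the fp16 version
END
"""

if __name__ == "__main__":
    print(color_outputs(BROKEN))
    print(mixes_color_outputs(BROKEN))
```

Running it over every shader file after a bulk COLH to COLR replacement would flag any write that was left behind.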
     