FP16 and market support

Discussion in 'Architecture and Products' started by radar1200gs, Dec 19, 2003.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Is that New Year in Turkey or New Year with a turkey?

    I most certainly do not tend to hang out with my dinner :lol:
     
  2. radar1200gs

    Regular

    Joined:
    Nov 30, 2002
    Messages:
    900
    Likes Received:
    0
    http://www.beyond3d.com/forum/viewtopic.php?t=9982
Given the insistence by some members of this forum that FP16 is not a valid part of the DX9 spec, I'd like to hear opinions on why, if that were the case, Microsoft would bother doing anything with it, and why they didn't instead apply the changes to FP24 (since, if you listen to the fanboys, FP24 is the logical format, obviously superior to anything else out there)?
     
  3. akira888

    Regular

    Joined:
    Jul 15, 2003
    Messages:
    652
    Likes Received:
    11
    Location:
    Houston
    Heh. Good one. :lol:
     
  4. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
Because what is being talked about here is the external floating point formats, which have always been FP16 and FP32 in DX9 - nothing new in that. Also, people aren't arguing that FP16 isn't part of the spec (or at least they shouldn't be) - when the _pp hint is set, the spec defines FP16 as the minimum precision. What it isn't legal for a driver to do is to use FP16 when the _pp hint isn't set, under any circumstances, whether it believes it will affect the output noticeably or not.

    FP24 is the standard internal high-precision format of the shaders - in terms of external memory accesses it makes sense to restrict the widths of the data to powers of two, but internally you have much more freedom, and so you can be more flexible in tradeoffs between silicon cost and overall precision.

What Dany is talking about here is a move towards treating external floating point formats as first-class citizens - i.e. having all the same capabilities as current integer formats with respect to blending and filtering, whereas in current hardware floating point formats can typically only be point-sampled, not blended. Dany states that he initially expects to see this for FP16 formats, because it is cheaper to do the filtering and blending on FP16 than FP32.
     
  5. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
What about strength reductions or substitutions that won't affect the output at all? There are probably a few cases where the compiler can make conservative substitutions, especially on short shaders that do hardly anything, deal only with integer textures, and write to normal integer framebuffers.


Agreed, but it is nice when HW supports an extended FP32 precision if you are doing scientific rendering or other non-games stuff. I coded up some scientific algorithms on the GPU a few months ago as an experiment, and it kinda sucked that my accuracy turned out a lot worse than a C program's because my parameters were getting truncated.

BTW, ATI's support of MRT rocks, but adding NVidia's pack/unpack instructions would be even more awesome when combined with ATI's MRT. Sometimes you want a pixel shader that can shove a whole bunch of variables into a FB and pull them in on the next pass. With MRT you can write 16 different scalar values, or 4 vectors (more if you count oDepth), but with pack/unpack you can double or quadruple that amount if you only want to store some 16-bit or 8-bit values. This works really nicely if you want to store lots of loop constants or temporaries.



Of course, it's a waste of silicon today, but hopefully in a few years the GPUs will have enough silicon space to be 100% all-the-way-through compliant with the external formats. If internally FP is implemented completely differently, but at equivalent precision, that's fine. What "sucks" (relatively, since it's not really an issue for most games) is sending in an IEEE float and having the HW truncate it with no options.

    Hopefully, R500/NV50 or whatever will support FP32 if you "request" it.
     
  6. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
Whether that is acceptable or not is really up to Microsoft to define, as the owners of the API. Without guidance on this, the only thing you can say is that _pp can be run at partial precision, and non-_pp must be run at FP24 or higher. Perhaps Microsoft would be inclined to allow such substitutions - but I expect the situations where this is permitted would need to be very clearly defined; any lower-precision substitution that could in any way affect the output value is obviously invalid.

Personally I would say that it is not within the purview of the driver to attempt such optimisations. Since the _pp hint is provided, it is up to the author to decide whether to make use of it to take advantage of any possible performance gains. The compiler or driver should not be making such substitutions - it is akin to a compiler making lower-precision substitutions despite the fact that you turned optimisations off.

    Perhaps there should be some global hinting mechanism to allow Microsoft's HLSL compiler to make such optimisations - I would be more comfortable with this than placing the requirements for legality of substitutions on the drivers of IHVs who always have a vested interest in making their hardware look faster whether by means of legal substitutions or not. There's just too much temptation to play a bit fast and loose with the rules.

    Yes - naturally we will see support for higher precision formats coming along as VPUs become used more frequently in applications outside of entertainment, and also just as a natural consequence of advances in technology.
     
  7. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
I agree. I brought up the issue of needing a "hinting" mechanism for the driver a while ago, because drivers now contain optimizers, and sometimes you need to switch optimizations off, especially if the optimizer is doing something bad on some pathological case.


After all, I can pass lots of command line arguments to the optimizers in GCC and many other C compilers or virtual machines, so why not? Inlining heuristics, when to choose branch vs. predicate, etc. will all become very important in PS3.0+.
     
  8. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Yes, I'm really hoping that we get full FP16 framebuffer (and texture) support.

But I do have to say that there'd be no reason to do the same for FP32. Remember that the operations we want added for FP16 framebuffers and textures are operations that assume color data. Color data won't need to be stored at greater than FP16 precision unless you're going to do lots of passes (>16 or so), and the longer, more general shaders that are becoming available should avoid any issues with that sort of thing (though of course it would be desirable to have high-speed, high-precision internal FP support moving into the future).
     
  9. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Yeah, it would be really nice to be able to just tell the compiler, "Just use whatever precision you think won't affect the output," and not worry about it. Then just switch some optimizations off if it starts to look ugly.

    An even better solution might be compiler options that have different settings for how conservative the optimizations must be, not just an on/off switch (of course, you'd need well-defined behavior for each setting for the settings to be useful in an environment like we're talking about, which may be difficult to accomplish).
     
  10. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    Heh. I've used a few compilers like that. Except you have to switch off some optimizations at random times when the compiler doesn't like whatever construct you've come up with.

From an engineering/programming standpoint, that sucks. It should work as designed, per spec, all the time - not produce indeterminate output.
     
  11. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    It's not like it's something you'd be forced to use. Remember that this would be a substitute for paying close attention to which precisions are needed where.
     
  12. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
Agreed. It can be just as hairy when a particular CPU architecture (actually, let's cut straight to the chase - it's the x86) decides it's going to use higher precision just because you've given the C compiler more opportunity to optimise the code.

I've mentioned this before, but I had no end of problems when I optimised/rewrote some floating-point-intensive code used in the texture compressor. When you are trying to determine whether something converges, and sometimes it's computed at 32-bit and other times at 80-bit, you get inconsistent answers. A complete nightmare.
     
  13. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    The x86 will always use 80-bit FP calculations unless you're using some of the SIMD instructions. But that shouldn't be a problem unless you're using the equality operator, and not using the equality operator on floats was one of the first things I learned in programming classes. It's just not something you do.
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
    Chalnoth, I suggest you try writing an SVD routine and see what happens :)

    Apart from ==0.0 (for special cases), I'm not using equality tests. The problem occurs because, "randomly" (i.e. at the discretion of the compiler), sometimes a variable remains on the FPU stack (and thus at 80-bit precision) and other times it gets saved out to cache/memory and hence is converted to 32-bit. The calculations will thus change dramatically depending on the luck of the draw with the compiler's variable allocation. It gets even more frustrating because the tendency to use the FPU stack increases with optimised builds and thus a bug in the code will vanish when you try to debug it.

    On a sensible CPU, these problems simply do not occur.
     
  15. Hyp-X

    Hyp-X Irregular
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,170
    Likes Received:
    5
    Wrong.
You can set the calculation precision of the FPU in the control word to 32, 64 or 80 bits.
For example, D3D sets 32 bit (for the entire program!!!).
MSVC sets the flags to 64 bit by default.
     
  16. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    And wrong again.
As SimonF says, the precision is always 80 bit for 'most' operations; the control word precision is mainly used for divides. Long (64 or 80 bit) divides are expensive, whereas long multiplies aren't (on Intel x86). So you can tell the processor to stop division calculations at the appropriate stage (i.e. the 23rd bit for floats), but it doesn't change other operations.

This means that the truncation to the specified length occurs whenever the compiler decides to flush something from a floating point register out to memory.

    Code:
For all FPU operations on x86 these two code sequences can produce different results in Reg0:

Program 1
Reg2 = Reg0 + Reg1        ; held at 80-bit on the FPU stack
Store Reg2 to memory      ; truncated to 32/64-bit here
Read memory to Reg2
Reg0 = Reg2 + Reg1

Program 2
Reg2 = Reg0 + Reg1        ; never leaves the 80-bit stack
Reg0 = Reg2 + Reg1
    
    
Most compilers have a mode which ensures that a float really is a float, by writing every float/double out to memory and reading it back at each operation.

x86 can be a real pain for numerically sensitive operations, as it's largely impossible to determine what precision it is actually operating at (because that will change based on instruction order, phase of the moon, etc.).
     
  17. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
Well, even on sensible CPUs, you can get different results depending on the optimizer, due to code motion and reordering, which can change between recompiles of your program. ANSI C/C++ "banned" associativity optimizations (e.g. (a+b)+c cannot be evaluated as a+(b+c)), but there are still compilers that offer this, other languages don't have similar bans, and even in the ANSI C case there are still some pathological optimizer issues (the compiler is still allowed to compute A, B, and C in any order and cache or move the results).

Java added the "strictfp" keyword to the language to resolve this issue. In "strictfp" mode, all operations are done according to the IEEE standard - no under- or over-precision is allowed, and evaluation order has to be maintained. Otherwise, upcasting to 80-bit is allowed, as well as some "harmless" reordering optimizations.
     
  18. sonix666

    Regular

    Joined:
    Mar 31, 2003
    Messages:
    595
    Likes Received:
    3
But still, most compilers enable 'aggressive' floating point optimizations by default. The simple rule is to include error margins in any comparison when using floating point. Someone suggested they'd be better off removing the equality and inequality operators from the language, to prevent 'stupid' programmers from fucking up because they have no clue that floating point doesn't have unlimited precision. ;)
     
  19. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
Just for the record, for those interested: that is an option I wish to avoid, as gprof tells me that ~60% of the run time is in this function, so it would rather defeat my efforts to optimise that piece of code :)

Well, I'm using GCC (on at least 2-3 different platforms), which is "quite ANSI", so incorrect associativity kludges are unlikely.
     
  20. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    are you absolutely sure about the above?

     