FP16? But it's the current year!

Discussion in 'Architecture and Products' started by Markus, Oct 27, 2016.

  1. Markus

    Newcomer

    Joined:
    Jul 26, 2016
    Messages:
    12
    Likes Received:
    6
    I've heard FP16 throughput numbers quoted for both the PlayStation 4 Pro and Vega as if they were a meaningful metric.

    Suddenly, in 2016, it has become a feature for full-precision shaders to also be able to run in FP16 mode at twice the throughput. I don't get it.

    Are we talking about some new whizz-bang compute shader application like neural networks or whatever? Or are we talking about re-enabling the GeForce FX shader path?
     
  2. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,606
    Likes Received:
    5,711
    Location:
    ಠ_ಠ
    NVIDIA should do a Titan FX 5200.
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,120
    Likes Received:
    2,867
    Location:
    Well within 3d
    It comes down to how much precision is sufficient.
    Even when the transition to 32-bit registers began, it was recognized that some workloads did not need that much precision. ATI went with an intermediate 24-bit precision for pixel shaders for a time.
    There are processing steps and workloads that are fine with 16 bits or fewer, such as various postprocessing steps on targets that are already lower-precision, neural network training (with inference even going lower), or algorithms that will iteratively approach a sufficiently accurate result.
    Some things, such as addressing larger memory spaces, position data, or algorithms that accumulate data (and error) from multiple sources, benefit from more bits in the representation.

    That would need to be weighed against the costs of having higher precision: power, hardware, storage, bandwidth.

    For a period of time, the benefit of matching the precision used by other programmable architectures, supporting larger memories, and enabling more advanced algorithms was high enough to justify bumping everything to a consistent and sufficient internal precision. Then limited bandwidth growth, power consumption, and poorer silicon density and cost gains gave designers a reason to revisit that initial trade-off.

    We may not necessarily be done at FP16, as some workloads can get away with less and there are low-power concepts with FPUs that dynamically vary their precision to shave things down further.
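    To make the accumulation point above concrete, here is a small numpy sketch (illustrative values, not tied to any particular GPU): summing a long stream of small increments in FP16 stalls once the running total grows large relative to the increment, while FP32 stays close to the true result.

```python
import numpy as np

# Sum 4096 increments of 0.1, once in fp16 and once in fp32.
# fp16 has an 11-bit significand: once the running total reaches 256,
# the spacing between representable fp16 values there (0.25) exceeds
# the increment, so each addition rounds away and the sum stalls.
def accumulate(dtype, n=4096, step=0.1):
    total = dtype(0)
    for _ in range(n):
        total = dtype(total + dtype(step))
    return float(total)

sum16 = accumulate(np.float16)  # stalls far short of the true 409.6
sum32 = accumulate(np.float32)  # tracks 409.6 closely
```

    The usual fixes are exactly the ones that keep FP32 attractive for accumulation: widen just the accumulator, or restructure the sum (e.g. pairwise) so no single total grows large relative to its addends.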
     
  4. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    2,912
    Likes Received:
    479
    I've been similarly confused about this sudden clamour for 16-bit :|
    I guess that makes sense, but I can't help feeling it's an odd regression.
     
  5. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,909
    Likes Received:
    2,450
    It'd be a regression if it came at the expense of 32-bit FP, but it doesn't. It's granularity, and granularity is good. Quality isn't going to suffer unless the developer wants it to, and mostly, 16 bits will be used where it makes no difference, so there's no loss.
    Of course, when GPUs went 32-bit, their vendors said it was absolutely necessary and we couldn't live without it, because that's what sales people do. But nothing is ever as black and white as marketing talk makes it out to be.
     
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,780
    Likes Received:
    4,431
    Why did we go from 24bit in DX9 to 32bit in DX10/11, by the way?
     
  7. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,285
    Likes Received:
    8,481
    Location:
    Cleveland
    Precision. You can't use FP16 for everything. There are actually only limited situations where it fits.
     
    Razor1 and I.S.T. like this.
  8. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,142
    Likes Received:
    1,673
    Location:
    Winfield, IN USA
    :lol2::lol2::lol2::lol2::lol2::lol2::lol2::lol2::lol2::lol2:

    I swear, FX jokes just never get old for me!

    :lol2::lol2::lol2::lol2::lol2::lol2::lol2::lol2::lol2::lol2:
     
    swaaye, I.S.T., BRiT and 2 others like this.
  9. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,780
    Likes Received:
    4,431
    But 24-bit. Was 24-bit enough?
     
  10. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    It wasn't. This was prior to unified shaders, and even back then vertex shaders were full FP32.
     
    BRiT and pharma like this.
  11. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,432
    Likes Received:
    261
    I remember 3Dlabs saying they used more than FP32 for their vertex shaders. FP36, I think. Others used FP32, as MDolenc said.

    In addition to unified shaders, FP32 enabled more GPGPU usage.
     
    Markus likes this.
  12. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    I've heard some people suggest 10- or 12-bit integer as a good compromise between 8-bit and 16-bit for neural networks. NV30 supported an FX12 4-way dot product (sadly without an extra accumulate, but that'd be easy to add). Therefore, NV30 is the future. /s

    I think the benefit of FP16 was forgotten because of:
    - NVIDIA pushing it inappropriately, giving it a bad reputation.
    - Possibly as a consequence of the above, Microsoft pushing FP32 more in DX10.
    - Both NVIDIA and AMD aggressively pushing GPGPU, which obviously required FP32.

    Then FP16 was reintroduced on mobile, which made people realise the power benefit and how useful it is in general. And now neural network training also benefits from FP16, which is generating even more interest.
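    For reference, FX12 was (as far as I recall) a 12-bit fixed-point format covering roughly [-2, 2), i.e. a sign bit, one integer bit, and 10 fraction bits. A rough Python sketch of what an FX12-style 4-way dot product amounts to — the exact format parameters here are my assumption, not a spec quote:

```python
import numpy as np

SCALE = 1 << 10  # 10 fractional bits, so the grid step is 1/1024

def fx12(x):
    # Quantize to the 12-bit fixed-point grid, clamping to [-2, 2).
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * SCALE), -2048, 2047)
    return q.astype(np.int32)

def dp4_fx12(a, b):
    # 4-way integer multiply-accumulate, rescaled back to a float result.
    return float(np.dot(fx12(a), fx12(b))) / (SCALE * SCALE)

# These inputs are exactly representable in FX12, so the result is exact.
approx = dp4_fx12([0.5, -0.25, 1.0, 0.125], [1.0, 1.0, 0.5, -1.0])
exact = 0.5 * 1.0 - 0.25 * 1.0 + 1.0 * 0.5 + 0.125 * -1.0  # 0.625
```

    The point of the missing accumulate is visible here: each dp4 result has to be rounded back into the register format before the next one can be added, instead of summing in a wider accumulator.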
     
    pharma, Lightman, milk and 1 other person like this.
  13. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    All existing games (except a few HDR games) output the image at 8 bits per channel (RGB8). Input textures are also commonly 8 bits per channel (and BC compressed = lower quality than plain 8 bit).

    As your input and output data have only 8 bits of precision, you don't need to calculate all intermediate math at 32 bits. Games don't store intermediate buffers as 32-bit floats either. RGBA16F is commonly used for HDR data, and RGB10 and RGBA8 for other intermediate buffers. 16-bit float processing is fine for most math in games. The results cannot be distinguished by the naked eye from a full 32-bit float pipeline, as long as the developer knows what he/she is doing. Especially if temporal AA is used.

    Unfortunately, writing good mixed fp16/fp32 code requires good knowledge of floating point behaviour and some basic numeric range analysis (of inputs/outputs and intermediate values). It is possible to write math in a way that minimizes floating point issues, allowing you to use fp16 more often. Of course, if you use fp16 the wrong way, you get banding and other artifacts.
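    A small numpy illustration of that banding risk (a contrived example, not from any real shader): a smooth gradient quantized to 8-bit output survives straight fp16 math untouched, but fp16 math done the wrong way — here, carrying a large constant offset that wastes the 11-bit significand — produces severe banding.

```python
import numpy as np

# A smooth 0..1 gradient, as it might exist in an intermediate buffer.
x = np.linspace(0.0, 1.0, 256, dtype=np.float32)

def to_rgb8(v):
    # Quantize to an 8-bit output channel.
    return np.round(np.clip(v, 0.0, 1.0) * 255.0).astype(np.uint8)

# Plain fp16: rounding error (<= half an fp16 ulp) is far below one
# 8-bit output step, so the quantized result matches fp32 exactly.
good16 = to_rgb8(x.astype(np.float16).astype(np.float32))

# "Wrong" fp16: add a large constant and subtract it again. Around 512
# the fp16 grid spacing is 0.5, so the whole gradient collapses to the
# three values {0.0, 0.5, 1.0} -- heavy visible banding.
bad16 = to_rgb8(((x.astype(np.float16) + np.float16(512))
                 - np.float16(512)).astype(np.float32))

ref32 = to_rgb8(x)
```

    This is the numeric range analysis point in practice: it is not the final 8-bit output that breaks, but intermediate values whose magnitude is far from the precision you actually need.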
     
    chris1515, Malo, Pixel and 5 others like this.
  14. milk

    Veteran Regular

    Joined:
    Jun 6, 2012
    Messages:
    2,909
    Likes Received:
    2,450
    I remember you once said that if a programmer knows what he is doing, even lower-bit-depth integers can do the job just fine for a lot of workloads. Maybe, as the cheap wins in silicon become harder to achieve, architectural changes that open more doors for devs to shave bits and bytes off their code might be a great part of the performance gains of the future. All layman speculation over here, though.
     
  15. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    I could see pathfinding, audio mixing, and possibly geometry culling benefiting from FP16. With the geometry levels we may be seeing with DX12/Vulkan, being able to cull at twice the speed could be significant, and accuracy should be less of an issue there as well. In the case of ASW, if near and far objects are in separate render targets, there may be some areas where lower precision is practical. Even FP64 for geometry at extreme (celestial) distances might be useful in that scenario, although stratifying the depth buffers likely addresses that concern. I know some of the space games had issues reprojecting celestials because of distance hacks.

    For a lot of workloads, probably, but not all. Too much precision isn't really an issue beyond performance. 24 bits also isn't a size that packs efficiently. Supporting 24-bit over 32-bit helps transistor count, but that's about it. I'm not sure anyone has ever used a card with 24-bit memory channels? At least not recently. Maybe on an FPGA with 8-bit registers or something.
     
  16. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Wow, it's been a very long time since I've heard of FX12. IIRC that's in Pixel Shader 1.4 and supported by the Radeon 8500/9000/9200. It's also what you likely wanted to run on a GeForce FX, if a renderer code path was available.
    I think Doom 3 ran at 60 fps with FP32 shaders on the FX 5800 Ultra, but that was very peculiar.
     
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,716
    Likes Received:
    6,006
    This conversation is making me cringe just slightly. Nvidia crippled FP16 on the 1070 and 1080 to a fraction of full rate, IIRC. And if future consoles are going to be full tilt on FP16... lol. Oh man, I need to consider selling.
     
    BRiT likes this.
  18. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    Why? What makes you think the driver will even expose half precision to DX? As it stands now, this is a feature for CUDA. For DX, low-precision hints are simply ignored.
     
  19. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I am curious how this will be done effectively for multi-platform game engines, or when it comes to porting a game that makes use of a mix of FP16/FP32 in its core and post-processing effects.

    Cheers
     
  20. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    It was listed on a slide for SM6.0 at GDC.

    It shouldn't be difficult for the compiler to promote FP16 to FP32. Performance is obviously lower, but the compiler would have determined that to be the better solution. Even at half rate there should still be some bandwidth and memory savings.
     
    #20 Anarchist4000, Oct 30, 2016
    Last edited: Oct 30, 2016