Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Not so sure about that. It rather looks like they might just be using those to manage their inventory, and make a premium at the same time. While I obviously cannot be sure, I would guess there are no new GM200/204 coming out of TSMC any more.
     
  2. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, I think they will initially continue with GM200 even when they go into production with Tesla GP102, as the performance gap should be large enough for one to be a competitively priced Tesla and the other to be more about performance at a price.
    One to watch out for, IMO, is when they stop some of the Kepler K-series Tesla products, such as the K80.

    Cheers
     
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    And whatever is supported natively on GTX 1080: AotS is a DirectX title and as such has to make do with what's exposed there. And FP16 isn't exposed on Pascal, nor is fine-grained preemption, I might add.
     
    #1283 CarstenS, Jun 4, 2016
    Last edited: Jun 4, 2016
  4. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    GM200 is on the roadmap (as M40) for a long time still (far into 2017). I have no inside information on Nvidia's production schedules, but at the moment it is much easier to buy an M40 than a P100, and I expect that to be true at least until Q1 2017.
     
    Lightman likes this.
  5. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Well, seems the odds are against me. :)
     
  6. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    So do any of the publication members here know whether their magazine/site discussed with Oxide the FP16 pipe they implemented and use in AotS, back when they reviewed the game or interviewed them?
    Cheers
     
  7. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I thought the conclusion from the discussion/hypotheticals was that FP16 was reduced on 1080 but full on Tesla P100; what was never concluded was how this is done on 1080.
    Do you think fine-grained preemption is also missing generally, or is your context more around AotS?

    Did I miss something in the threads or on a site/publication?
    Thanks
     
  8. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    I am only stating what's exposed by the driver in DirectX (see DX Caps Viewer), not making any statements about the hardware.
     
  9. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    OK, here is another being-a-pain question :) : if fp16 calculations currently have no benefit on existing hardware, then can someone ask Oxide why they went ahead and used them in a way that seems pretty core to the procedural map generation/rendering?
    Or, again, is this something AMD hardware can take advantage of (this would still tie into your driver/DirectX comment), or something related to render target/texture filtering?
    Cheers

    Edit:
    Context being the Stardock statement and the terrain image issues associated with the 1080FE:
    https://www.reddit.com/r/pcgaming/c...ng_the_aots_image_quality_controversy/d3t9ml4
     
    #1289 CSI PC, Jun 4, 2016
    Last edited: Jun 4, 2016
    pharma likes this.
  10. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    745
    Likes Received:
    39
    Location:
    Copenhagen
    Even though you don't have double-speed fp16 ALUs, 16-bit operands will ease register pressure and maybe bandwidth. At least GCN takes advantage of this, and I would assume it's the same for Maxwell.
     
  11. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, that is a good point; I wonder whether that is the sole reason they did this or whether their scope was wider.
    It will be interesting as well to see the implications this has for 1080.
    Cheers
     
  12. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    There is a general advantage to using fp16 data (memory size, bandwidth, ...), but this has been around since floating point was first introduced to the GPU pipeline.
    Of current architectures, only GCN3 can reduce register pressure by using fp16 (that is, from min precision hints in HLSL). This presumably applies to GCN4 (Polaris) and Pascal as well.
    Now, it shouldn't need pointing out, but since I'm starting to have some serious doubts regarding all this I'll do it anyway: taking 3 fp16 registers and doing an actual fp16 multiply-add on them can produce significantly different results than taking 3 fp16 registers and actually doing an fp32 multiply-add on them.
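
    As a rough illustration of that last point, here is a minimal Python sketch that emulates both paths on the CPU using the standard `struct` module's half-precision format. It assumes an unfused multiply-add (the product is rounded before the add); a fused fp16 operation would behave differently. The helper names and the three operand values are made up for the example and are chosen to provoke cancellation, so this shows a worst case rather than typical shader behaviour.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mad_fp16(a: float, b: float, c: float) -> float:
    """Multiply, then add, rounding to fp16 after each step (assumed unfused)."""
    return to_f16(to_f16(a * b) + c)

def mad_fp32_on_fp16_regs(a: float, b: float, c: float) -> float:
    """Same fp16 operands, math done in fp32, only the final result stored as fp16."""
    return to_f16(to_f32(to_f32(a * b) + c))

# Three operands that are exactly representable in fp16 (hypothetical values).
a = b = 1.0 + 2.0 ** -10
c = -(1.0 + 2.0 ** -9)

narrow = mad_fp16(a, b, c)             # product loses its 2**-20 term before the add
wide = mad_fp32_on_fp16_regs(a, b, c)  # product keeps it, so the add leaves 2**-20
print(narrow, wide)
```

    With these operands the all-fp16 path returns exactly zero while the fp32 path returns 2**-20, which is still representable in fp16 as a subnormal.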

    P.S.: Forgot about Tegra X1, that's current too. :)
     
  13. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    I've been using FP16 on Maxwell to reduce memory allocations and off-chip bandwidth. It works pretty well, but the conversion instructions run quite a bit slower than FP32 ops. That makes it hard to use for reducing register pressure.
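
    A host-side sketch of the storage saving described here, in plain Python rather than CUDA: values are kept as 16-bit halves in a buffer and converted back on load. The function names and sample data are hypothetical; the explicit pack/unpack calls only stand in for the manual conversions mentioned above and say nothing about their cost on real hardware.

```python
import struct

def pack_fp32(values):
    """Store each value as a 4-byte IEEE 754 binary32."""
    return struct.pack(f"<{len(values)}f", *values)

def pack_fp16(values):
    """Store each value as a 2-byte IEEE 754 binary16 (explicit conversion on write)."""
    return struct.pack(f"<{len(values)}e", *values)

def unpack_fp16(blob):
    """Convert a buffer of halves back to Python floats (explicit conversion on read)."""
    count = len(blob) // 2
    return list(struct.unpack(f"<{count}e", blob))

samples = [0.5, 1.0, 0.1, 1000.0]   # hypothetical data
wide = pack_fp32(samples)
narrow = pack_fp16(samples)
restored = unpack_fp16(narrow)

print(len(wide), len(narrow))        # the fp16 buffer is half the size
print(restored)                      # 0.1 comes back slightly off, the rest exactly
```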
     
  14. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    I'm sure Oxide did their very best to optimize that and keep losses in check.
     
  15. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Well, it would be amusing if they went this route solely looking at GCN3 :)
    But then maybe Nvidia should have engaged better *shrug*.
    And yeah, I am also curious about the performance benefit/penalty this has for both manufacturers and their various cards, along with the potential implications we see with the terrain differences using the 1080FE.
    Cheers
     
  16. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    Just to add here: this is CUDA and requires manual conversions. It's not exposed to DX. But then again CarstenS says Pascal doesn't expose min precision either.
     
    CSI PC and CarstenS like this.
  17. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,056
    Likes Received:
    1,020
    I may not understand you fully, but are you talking about 32-bit operations on 16-bit operands producing differences in results beyond what would be expected from that change of precision?
    Otherwise, numerical differences are to be expected; the question is whether those differences produce significant issues in the real-life use case.

    For someone who mainly belongs to another field, this would seem to be one of the nice things about interactive graphics programming: if it looks fine, then it IS fine.
     
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Correct. I was talking about DX, since AotS is a DX game and it has to use what the API exposes here. In CUDA, things are different (also from OpenCL and OpenGL, where FP16 doesn't seem to be exposed currently either).
     
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Full fp16 (ALU + registers) vs an fp32 ALU running on fp16 registers (a 32-bit register split into upper and lower halves) should only result in slight additional rounding errors, assuming of course that the result is stored to and loaded from a 16-bit register after each operation. DX allows 1 ULP of error. A 32-bit ALU with proper rounding at output to a 16-bit register produces ~0.5 ULP max error (a mantissa cut would be 1 ULP). A native fp16 ALU results in 1 ULP max error (assuming it follows the DX spec). We are talking about a 0.5 ULP difference per instruction at most. So GCN3 vs Pascal should be almost identical (assuming Pascal is fp16 ALU and GCN3 is fp16 storage + fp32 ALU).
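
    To make the ULP figures concrete, here is a small Python sketch that stores a higher-precision result into fp16 in two ways, round-to-nearest and mantissa cut, and measures the error in fp16 ULPs. A Python double stands in for the wider ALU result, all helper names are invented for the example, and the ULP helper only handles the normal fp16 range, so treat it as an approximation of the argument rather than a model of any particular GPU.

```python
import math
import struct

def round_to_f16(x: float) -> float:
    """Store x as fp16 with round-to-nearest."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def ulp_f16(x: float) -> float:
    """Spacing between adjacent fp16 values near x (normal range, x != 0)."""
    _, exponent = math.frexp(abs(x))    # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 11)       # fp16 has an 11-bit significand

def truncate_to_f16(x: float) -> float:
    """Store x as fp16 by dropping the extra mantissa bits (the 'mantissa cut')."""
    step = ulp_f16(x)
    return math.trunc(x / step) * step

def error_in_ulps(stored: float, exact: float) -> float:
    """Distance between the stored value and the exact one, in fp16 ULPs."""
    return abs(stored - exact) / ulp_f16(exact)

exact = 1.0 + 0.75 * 2.0 ** -10         # hypothetical wider-precision ALU result
rounded_err = error_in_ulps(round_to_f16(exact), exact)
cut_err = error_in_ulps(truncate_to_f16(exact), exact)
print(rounded_err, cut_err)
```

    For this input, rounding lands 0.25 ULP away and the cut lands 0.75 ULP away, consistent with the bounds of 0.5 ULP for rounding and just under 1 ULP for truncation.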

    The shader compiler is not allowed to reorder floating point math freely. This is especially important to know when writing numerically stable fp16 code. Here is a good article about things that compilers are not doing: http://www.humus.name/index.php?page=Articles&ID=6

    Of course, GPUs that do not support fp16 at all have significantly higher precision for math done on min16float variables. But if this results in notable differences in a shipping application, it is most likely the developer's fault. You should always check your fp16 code on both fp16 and fp32 to ensure that the image looks the same. #ifdef the type attribute (this allows you to disable fp16 in all shaders with a single-line code change). Every rendering programmer who has worked with PS3 knows how to deal with this. But fp16 support on modern PC hardware is still very limited, meaning that many developers don't yet have a full hardware matrix to test it on.
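
    The same "one switch, compare both paths" idea can be sketched outside HLSL. The Python below is only an analogy to #ifdef-ing the type: a single flag selects fp16 or fp32 rounding for every intermediate of a made-up shading function, and the two outputs are compared against an assumed tolerance of one 8-bit display step. The function, its constants and the tolerance are all hypothetical.

```python
import struct

def _to_f16(x: float) -> float:
    return struct.unpack("<e", struct.pack("<e", x))[0]

def _to_f32(x: float) -> float:
    return struct.unpack("<f", struct.pack("<f", x))[0]

def shade(t: float, use_fp16: bool) -> float:
    """Hypothetical stand-in for a shader: lerp between two constants, then square."""
    q = _to_f16 if use_fp16 else _to_f32    # the single precision switch
    a, b = q(0.2), q(0.9)
    v = q(a + q(q(b - a) * q(t)))           # round after every operation
    return q(v * v)

def max_difference(steps: int = 64) -> float:
    """Largest output gap between the fp16 and fp32 paths over t in [0, 1]."""
    return max(
        abs(shade(k / steps, True) - shade(k / steps, False))
        for k in range(steps + 1)
    )

DISPLAY_STEP = 1.0 / 255.0                  # assumed "looks the same" tolerance
worst = max_difference()
looks_same = worst < DISPLAY_STEP
print(worst, looks_same)
```

    A real check would compare rendered images on fp16-capable and fp32-only hardware; this only shows the shape of the test.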
     
    #1299 sebbbi, Jun 5, 2016
    Last edited: Jun 5, 2016
    firstminion, Clukos, Razor1 and 6 others like this.
  20. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    So it looks like G-Sync and FastSync are integral to each other, though of course it may just be early driver/technology issues or teething problems.
    A member on another site has reported that for G-Sync to behave correctly with his 1080, he also had to enable FastSync.
    http://www.overclock.net/t/1601922/anybody-having-problems-with-gtx-1080

    Are any publications likely to test or investigate G-Sync/FastSync/etc. with Pascal cards and also Maxwell 2?
    Cheers
     