Nvidia Pascal Reviews [1080XP, 1080ti, 1080, 1070ti, 1070, 1060, 1050, and 1030]

Discussion in 'Architecture and Products' started by Love_In_Rio, May 17, 2016.

  1. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,163
    Likes Received:
    1,453
    Location:
    Beyond3D HQ
    Wouldn't that show up in the SASS though? Or are you saying that an emulated HFMA2 would be transparent via the JIT?

    If I disassemble the binaries that nvcc generates for my code, it's just HFMA2s, and the code doesn't even run on pre-sm_53 targets as-is, so I'm not seeing transparent emulation.

    I'll check and make sure I'm not building with stripped PTX.
     
  2. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,163
    Likes Received:
    1,453
    Location:
    Beyond3D HQ
    Yep, that's right.
     
  3. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah, I appreciate that the only Maxwell GPU to support compute capability 5.3 is the Tegra X1.
    I am talking more about SgemmEx with its partial support for FP16 (and also int8).

    I think this was updated to HGEMM with regard to true FP16 and compute 5.3.
    Still wondering where I read the 1:1 ratio; maybe I am confusing it with AMD *Shrug*.

    Edit:
    So why did NVIDIA bother artificially changing this to 1/6th, when one can just use SgemmEx instead (yeah, I appreciate it is more limited)?
    Rys, any chance you could test and compare both operations on the 1080?
    Cheers
     
    #283 CSI PC, Jun 1, 2016
    Last edited: Jun 1, 2016
  4. pixelio

    Newcomer

    Joined:
    Feb 17, 2014
    Messages:
    47
    Likes Received:
    75
    Location:
    Seattle, WA
    You are definitely seeing an HFMA2 op in the SASS.

    I was just pointing out that the 1080 throughput you report seems very close to the throughput of an FMA built out of the half/half2 data conversion intrinsics which are available on pre-sm_53 GPUs.

    So perhaps it's microcoded?
     
    CSI PC likes this.
  5. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    365
    Likes Received:
    319
    You are still assuming that the instruction is intentionally rate limited. As I said before, it's more likely that Nvidia just included 1-2 "new" CUDA cores per SMM, which are handling the new instructions, and the remainder is just "old" (Maxwell?) cores.
    It would show up.

    But so far your test only reveals bare throughput, not the latency of HFMA2. So you can't tell yet whether the instruction itself is slower, or if there are just fewer cores available for that instruction to be scheduled to.

    Can you please run it with only a single wavefront with only 2 threads, and compare that to FP32 throughput? If HFMA2 is now faster, or at least running at the same speed, then we know for sure that the limitation is not a lack of a fast path, but an asymmetric configuration of the CUDA cores.
     
  6. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I am not the only one assuming it is artificially limited, though. Then again, we are all making assumptions here, whichever side one takes, until further tests are done.

    Cheers
     
  7. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    This conversation should also take into account ScottGray showing what seemed to be full-throughput Int8 instructions (sm_61+) on the 1080.
    Cheers
     
    pharma likes this.
  8. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,348
    Likes Received:
    1,970
    #288 pharma, Jun 1, 2016
    Last edited: Jun 1, 2016
    Razor1 likes this.
  9. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    729
    Likes Received:
    221
    Location:
    india
    Pretty pathetic overclocks even from the Founder's Edition.
     
  10. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,348
    Likes Received:
    1,970
    With regard to MSI's GTX 1080 offerings, I think the card reviewed is their middle grade, or one step above the FE.

    http://www.techpowerup.com/223002/msi-gaming-z-and-gaming-x-differentiated-some-more

    The only card that comes to mind with truly pathetic overclocks is not an Nvidia card.
     
    #290 pharma, Jun 1, 2016
    Last edited: Jun 1, 2016
  11. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    729
    Likes Received:
    221
    Location:
    india
    It's barely a third of what the 980 Ti Gaming could do, so it does look pretty pathetic. I doubt a better cooler is going to make that big a difference unless MSI suddenly gimped their Gaming series and are taking Nvidia's FE approach. Maybe Pascal does better with voltage than Maxwell did.
     
  12. A1xLLcqAgt0qc2RyMz0y

    Veteran Regular

    Joined:
    Feb 6, 2010
    Messages:
    1,060
    Likes Received:
    388
    The MSI GeForce GTX 1080 GAMING X 8G is already factory overclocked.

    GTX 1080 FE clocks:

    Base: 1607 MHz
    Boost: 1733 MHz
    Memory: 5005/10010 MHz

    MSI GTX 1080 GAMING X clocks:

    Base: 1709 MHz - 6.3% higher than FE
    Boost: 1848 MHz - 6.6% higher than FE
    Memory: 5005/10010 MHz - same as FE

    MSI GTX 1080 GAMING X Overclocked clocks:

    Base: 1789 MHz - 11.3% higher than FE
    Boost: 1955~2067 MHz - 12.8%-19.3% higher than FE
    Memory: 5622/11244 MHz - 12.3% higher than FE
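
The percentages above come from dividing each clock by the matching FE clock. A minimal Python sketch that reproduces them from the figures listed in this post (the helper name `pct_over` and the dictionary layout are made up for illustration):

```python
# Clocks in MHz, copied from the lists above.
FE = {"base": 1607, "boost": 1733, "memory": 5005}
GAMING_X = {"base": 1709, "boost": 1848, "memory": 5005}
# The overclocked boost is quoted as a range, so both ends are kept.
GAMING_X_OC = {"base": 1789, "boost_low": 1955, "boost_high": 2067, "memory": 5622}


def pct_over(clock_mhz: float, reference_mhz: float) -> float:
    """Percentage by which clock_mhz exceeds reference_mhz, to one decimal."""
    return round((clock_mhz / reference_mhz - 1.0) * 100.0, 1)


factory = {
    "base": pct_over(GAMING_X["base"], FE["base"]),
    "boost": pct_over(GAMING_X["boost"], FE["boost"]),
    "memory": pct_over(GAMING_X["memory"], FE["memory"]),
}
overclocked = {
    "base": pct_over(GAMING_X_OC["base"], FE["base"]),
    "boost_low": pct_over(GAMING_X_OC["boost_low"], FE["boost"]),
    "boost_high": pct_over(GAMING_X_OC["boost_high"], FE["boost"]),
    "memory": pct_over(GAMING_X_OC["memory"], FE["memory"]),
}

print("factory:", factory)
print("overclocked:", overclocked)
```

Note that the manual overclock adds only about 5 to 12 percentage points of boost on top of the factory overclock, which is the gap the earlier posts are arguing about.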

     
    BRiT likes this.
  13. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Well, it looks like the clock speeds are already too close to the power/temperature limits (even at stock speed), so it's not very surprising. That could change with a special BIOS that allows running outside those limits.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,537
    Likes Received:
    589
    Location:
    New York
    What does overclocking Nvidia cards actually do? Is there a set limit to the number of boost bins above the set core clock?

    If there were no limit, I imagine the card would always boost to the same level regardless of the "base" clock.
     
  15. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I thought maybe I was thinking of Kepler or Tesla, so I did a bit of checking, but only on Tesla cards.

    Peak workload figures:
    The M40 has the same figure for both FP32 and FP16: 6,844 GFLOPS
    The Tegra K1 had the same figure for both FP32 and FP16: 365 GFLOPS
    The Tesla K40 also had the same figure for both FP32 and FP16: 5,040 GFLOPS
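
These figures line up with the usual FP32 peak estimate of 2 FLOPs (one FMA) per CUDA core per clock, which would suggest the quoted FP16 numbers are just the FP32 peak restated rather than a separate fast path. A rough Python check of that arithmetic; the core counts and boost clocks below are not from this thread and are assumed from Nvidia's published specifications:

```python
def peak_gflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak: 2 FLOPs (one fused multiply-add) per core per clock."""
    return 2 * cuda_cores * clock_mhz / 1000.0


# ASSUMPTION: (CUDA cores, boost clock in MHz) taken from public spec sheets,
# not from the posts in this thread.
ASSUMED_SPECS = {
    "Tesla M40": (3072, 1114),
    "Tegra K1": (192, 950),
    "Tesla K40": (2880, 875),
}

estimates = {
    name: peak_gflops(cores, mhz) for name, (cores, mhz) in ASSUMED_SPECS.items()
}

for name, gflops in estimates.items():
    print(f"{name}: {gflops:,.0f} GFLOPS")
```

If the assumed specs are right, the rounded results match the three figures quoted above, so a 1:1 FP16:FP32 ratio on these parts is consistent with FP16 being executed at FP32 rate.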

    So was this only applicable to Tesla and, say, the Tegra?
    The context is my earlier point about peak performance and the 1:1 ratio: does that mean some older cards may be faster than the 1080?
    Cheers
     
  16. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,348
    Likes Received:
    1,970
    You are correct with regard to the BIOS. My EVGA 780s come with 2 different BIOSes, and the GTX 1080 Classified will have 3.
     
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    I tried to find some links stating FP16 numbers for older Tesla GPUs but couldn't find anything. Any links?
     
  18. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    I want to double-check the couple of sources I had, to be 100% sure about the upper Tesla models, as maybe that is what had me wrong in the first place.

    The only one that I think would be accepted for now relates to the Tegra K1, and that is in the Nvidia whitepaper: http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf
    Page 13 shows the figure for the K1 (same for FP16 and FP32) and also for the X1 (where FP16 is now doubled).

    Cheers
     
  19. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,348
    Likes Received:
    1,970
    Total War WARHAMMER DX12: PC graphics performance benchmark review w/GTX 1080/1070
    http://www.guru3d.com/articles_page..._graphics_performance_benchmark_review,1.html
     
  20. Clukos

    Clukos Bloodborne 2 when?
    Veteran Newcomer

    Joined:
    Jun 25, 2014
    Messages:
    4,474
    Likes Received:
    3,826