Nvidia Turing Product Reviews and Previews: (Super, TI, 2080, 2070, 2060, 1660, etc)

Discussion in 'Architecture and Products' started by Ike Turner, Aug 21, 2018.

  1. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    332
    Likes Received:
    87
    Ouch, price points. That's really what Nvidia should be concentrating on for the next arch, rather than new features. There's so much overlap of function in the silicon here.

    But, well, at least it's something in the $2XX price range. So they've got that going for them.
     
    vipa899 likes this.
  2. vipa899

    Regular Newcomer

    Joined:
    Mar 31, 2017
    Messages:
    922
    Likes Received:
    354
    Location:
    Sweden
    Agreed, price is the only problem with Nvidia's GPUs. They need competition from AMD and Intel; if those two come out with products at about the same performance and features for reasonable prices, Nvidia will have to adjust.
     
  3. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    611
    Likes Received:
    1,052
    Location:
    PCIe x16_1
    Thanks for pointing that out. I forgot to edit that after NVIDIA confirmed the dedicated FP16 cores and how they work.

    There are numerous good reasons to have the FP16 rate be 2x the FP32 rate, even when using tensor cores. This includes register file bandwidth and pressure, and consistency with Turing parts that don't get tensor cores (since NV has to lay down dedicated FP16 cores on those parts).

    IMO, the whitepaper didn't do a very good job of explaining it. But according to NVIDIA, for TU102/104/106, general (non-tensor) FP16 operations are definitely done on the tensor cores. They are part of the SMs, after all.
     
  4. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Big Turing doing FP16 as part of the tensor core is rather intriguing, but makes sense I suppose. It's just a bunch of FP16 multipliers and adders, after all. For non-matrix operations you basically only need 1/4 of them, without any of the complex cross-lane wiring.
    In that sense, dedicated FP16 cores would really just be the remains of the tensor cores.
    I'm wondering, though, which FP16 operations Turing can actually do at twice the single-precision rate; that is, can it do more than mul/add/FMA? Obviously for the tensor operations you don't really need anything else, but otherwise things like comparisons would be quite desirable.
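
    To make the matrix vs. per-lane contrast concrete, here's a rough CUDA sketch of my own (illustrative only, needs sm_70 or newer, not taken from any NVIDIA doc) of the warp-level matrix FMA that the WMMA API exposes for the tensor cores; a plain per-lane FP16 FMA would exercise only a fraction of those multiplier/adder lanes and none of the cross-lane accumulation:

    Code:
    // One warp computes D = A * B + 0 for a 16x16x16 FP16 tile on the tensor cores.
    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    __global__ void tensor_fma_16x16x16(const half* A, const half* B, float* D) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

        wmma::fill_fragment(acc_frag, 0.0f);                  // start the accumulator at zero
        wmma::load_matrix_sync(a_frag, A, 16);                // leading dimension 16
        wmma::load_matrix_sync(b_frag, B, 16);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);   // the actual tensor-core matrix FMA
        wmma::store_matrix_sync(D, acc_frag, 16, wmma::mem_row_major);
    }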
     
    Heinrich4 likes this.
  5. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,789
    Likes Received:
    2,596
    I am still quite lost on this. Let's take an example: Far Cry 5 supports RPM. Vega does it on the ALUs; does the 2080 Ti do it on the tensor cores? If so, then how is it able to maintain a 2x FP32 rate? Are the tensor cores capable of such a feat?
     
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,998
    Likes Received:
    4,571
    The tensor cores in "Big Turing" can do "linear" (non-matrix) FP16 at 1/4 of their matrix-op rate.
    It looks like the dedicated FP16 units in TU116 are stripped-down tensor units.

    AFAIK Far Cry 5 doesn't support RPM per se; it just uses FP16 pixel shaders. Vega (and GP100/GV100) uses RPM to process FP16 at 2x the FP32 rate; Turing does it differently.
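
    For illustration, here's what that packed 2x FP16 path looks like at the source level: a rough CUDA sketch of my own (hypothetical kernel, not Far Cry 5's actual shader code). Vega's RPM and GP100/GV100 get their 2x by packing two halves per 32-bit register and issuing one FMA per pair; per the above, Big Turing reportedly runs the same packed math on the tensor cores' FP16 path instead of the FP32 ALUs:

    Code:
    #include <cuda_fp16.h>

    // Each thread does two FP16 multiply-adds per instruction (lanes .x and .y of a __half2).
    __global__ void fp16x2_fma(const __half2* a, const __half2* b,
                               const __half2* c, __half2* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            out[i] = __hfma2(a[i], b[i], c[i]);
        }
    }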
     
    entity279, pharma and Ryan Smith like this.
  7. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,789
    Likes Received:
    2,596
    Thanks. But if Big Turing uses only the tensor cores for FP16, and the tensor cores do it at a quarter of their matrix capability, then Turing isn't really capable of 2x FP32.
     
  8. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,229
    Likes Received:
    422
    Location:
    Romania
    Depends on just how many tensor cores there are, right?
     
  9. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,702
    Likes Received:
    117
    Yeah, seems like a decent enough card, but nobody wants to buy a 6 GB card in 2019 for $280 regardless of what benchmarks show. Pretty out of touch...
     
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,789
    Likes Received:
    2,596
    Precisely my point.
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,184
    Likes Received:
    1,841
    Location:
    Finland
    How so? Big Turing's tensor-op rate is 8x FP32; doing FP16 on those at a quarter of matrix speed would still result in 2x FP32.
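
    Quick back-of-the-envelope with the 2080 Ti's published reference specs (unit counts and boost clock from NVIDIA's spec sheet; the 1/4 factor is the non-matrix FP16 rate discussed above), sketched as a tiny host program:

    Code:
    #include <cstdio>

    int main() {
        const double clk_ghz      = 1.545;  // reference boost clock, GHz
        const double fp32_cores   = 4352;   // CUDA cores on the 2080 Ti's TU102 config
        const double tensor_cores = 544;

        // FMA counts as 2 ops; each Turing tensor core does 64 FP16 FMAs per clock.
        double fp32_tflops   = fp32_cores   *      2 * clk_ghz / 1000.0;  // ~13.4
        double tensor_tflops = tensor_cores * 64 * 2 * clk_ghz / 1000.0;  // ~107.6, i.e. ~8x FP32
        double fp16_linear   = tensor_tflops / 4.0;                       // ~26.9, i.e. ~2x FP32

        printf("FP32 %.1f | tensor FP16 %.1f | non-matrix FP16 %.1f TFLOPS\n",
               fp32_tflops, tensor_tflops, fp16_linear);
        return 0;
    }

    So even at a quarter of the matrix rate, the non-matrix FP16 throughput still lands at twice the FP32 rate.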
     
    DavidGraham likes this.
  12. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,157
    Likes Received:
    5,092
    Hmmm, so the 1660 Ti is basically similar to a 1070 in performance (sometimes a little faster, sometimes a little slower), with slightly lower power consumption and slightly higher noise levels? Oh, and 2 GB less memory (6 GB vs. 8 GB).

    Not bad, although you can still occasionally find 1070s at 299 USD (there's one on Newegg right now), which may or may not be a better deal. Of course, those will eventually all disappear, leaving just the 1660 Tis.

    Regards,
    SB
     
    BRiT likes this.
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,789
    Likes Received:
    2,596
    It seems I somehow missed that fact. Though this has the implication of limiting DLSS performance in games that heavily utilize FP16 shaders.
     
    #753 DavidGraham, Feb 23, 2019
    Last edited: Feb 23, 2019
  14. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    120
    Likes Received:
    181
    No, tensor operations will always run alone. DLSS is post-processing AA, which runs after the frame has been created.
     
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,184
    Likes Received:
    1,841
    Location:
    Finland
    I believe this would be correct.
    Not sure how that changes anything, though: the time the tensor cores spend on DLSS post-processing is time they could otherwise already be spending crunching FP16 shaders for the next frame - it all depends on the loads.
     
  16. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    120
    Likes Received:
    181
    The next frame's workload would be overlapping with the current frame's creation.
     
  17. jlippo

    Veteran Regular

    Joined:
    Oct 7, 2004
    Messages:
    1,343
    Likes Received:
    443
    Location:
    Finland
    Didn't Jensen imply that the rest of the GPU would idle when the tensor cores are active?
     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,184
    Likes Received:
    1,841
    Location:
    Finland
    If my memory serves me correctly, this only applies to DXR denoising, not tensor cores in general?
     