Nvidia Turing Speculation thread [2018]

Discussion in 'Architecture and Products' started by Voxilla, Apr 22, 2018.

Thread Status:
Not open for further replies.
  1. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Worth noting that the changes in Volta went beyond a minor process/boost bump and include instructions per cycle, cache structure and the compiler; if an application can utilise those improvements, the benefit exceeds pure core scaling by a notable margin.
    An example is Amber, which is 63-75% faster on V100 than on the Tesla GP102, even though the FP32 core count increased by only 42%.
    Amber was one of the applications Nvidia and its developers looked at for such acceleration improvements with Volta.

    For reference, the Tesla GP102 P40 (7% more cores than the P100, but a different cache/SM structure and GDDR5) has around the same performance as the 16GB SXM 300W P100 accelerator in Amber with FP32 Solvent.
    But not every application sees such scaling with V100, and games generally land well below the ALU scaling because of the front end (GPC, Polymorph Engine, etc.), although some do work well.
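To put the Amber figures in perspective, here is a minimal sketch of the arithmetic, using only the percentages quoted in this post. The function name `relative_scaling` is made up for illustration; a result above 1.0 means performance grew faster than the core count.

```python
def relative_scaling(speedup_pct: float, core_increase_pct: float) -> float:
    """Measured speedup divided by core-count scaling (1.0 = performance tracks cores)."""
    return (1 + speedup_pct / 100) / (1 + core_increase_pct / 100)


CORE_INCREASE_PCT = 42  # FP32 core increase quoted in the post

for speedup_pct in (63, 75):  # Amber speedup range quoted in the post
    ratio = relative_scaling(speedup_pct, CORE_INCREASE_PCT)
    print(f"{speedup_pct}% faster -> {ratio:.2f}x the gain expected from cores alone")
```

On those numbers Amber gains roughly 15-23% beyond what the extra cores would explain by themselves.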
     
    Cat Merc, pharma and Picao84 like this.
  2. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    3,984
    Likes Received:
    34
    I'm not familiar with the instruction mix of the workload you cite - does it benefit from the presence of tensor cores on Volta which are absent on Pascal? If so, I'm not so sure that it is a good analogy for Turing, which I expect to not feature tensor cores. As you say, gaming workloads tend to make use of other fixed function units of GPUs so arithmetic scaling from one GPU SKU to another does not yield linear performance gains, especially between architectures. As an example of this, AMD has maintained an FP32 (let alone FP16 or 64) lead over Nvidia for quite some time now, yet their graphics cards continually fall behind NV's in the majority of gaming workloads.
     
  3. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Amber Solvent is straight-up FP32 (for the 'official' benchmarks anyway) and, importantly, does not use Tensor cores. It is one of the applications that benefits from the design of Volta beyond core scaling, without Tensor cores, for the factors I briefly mentioned.
    Cache/register/SM design does have a benefit, as can be seen when comparing the P100 to the GP102 in such applications: the Tesla GP102, with 7% more cores, has comparable performance to the 16GB SXM P100.
    Anyway, the gains seen with V100 go quite a bit beyond that when weighing the factors involved, even allowing for the cache architecture/simplification improvements (in the context of L1/L2 with Volta).
     
    #123 CSI PC, Jun 20, 2018
    Last edited: Jun 20, 2018
  4. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    People like to declare a primary bottleneck, like the front end, without proof. The reality is likely that the bottleneck shifts multiple times per frame and any areas that don't perfectly scale compound each other. Sometimes performance doesn't scale with ALU count because there's not enough bandwidth to feed the ALUs or the system can't make 100% use of the ALUs for various reasons like waiting on memory when there are some spare ALU cycles.
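A toy model of that point, as a rough sketch only: a frame is split into phases, each phase takes as long as its slowest resource, and only the ALU cost shrinks when ALUs are scaled. Every number and name below is invented for illustration and is not a measurement of any GPU or game.

```python
# Hypothetical per-phase costs in milliseconds for one frame (invented numbers).
PHASES = [
    {"name": "shadow maps", "alu": 1.0, "bandwidth": 1.5, "front_end": 2.0},
    {"name": "g-buffer", "alu": 2.0, "bandwidth": 3.0, "front_end": 2.5},
    {"name": "lighting", "alu": 5.0, "bandwidth": 3.0, "front_end": 0.5},
    {"name": "post-process", "alu": 2.0, "bandwidth": 2.5, "front_end": 0.2},
]


def phase_costs(phase, alu_scale=1.0):
    """Per-resource cost of one phase after speeding up only the ALUs."""
    return {
        "alu": phase["alu"] / alu_scale,
        "bandwidth": phase["bandwidth"],
        "front_end": phase["front_end"],
    }


def frame_time(phases, alu_scale=1.0):
    """Each phase takes as long as its slowest resource; phases run back to back."""
    return sum(max(phase_costs(p, alu_scale).values()) for p in phases)


def limiters(phases, alu_scale=1.0):
    """Name of the limiting resource in each phase, in frame order."""
    result = []
    for p in phases:
        costs = phase_costs(p, alu_scale)
        result.append(max(costs, key=costs.get))
    return result


base = frame_time(PHASES)
scaled = frame_time(PHASES, alu_scale=1.42)
print(limiters(PHASES))
print(f"frame speedup from 1.42x ALU throughput: {base / scaled:.2f}x")
```

With these made-up costs the limiter changes three times within the frame, and a 1.42x ALU increase yields far less than 1.42x overall, without any single unit being "the" bottleneck.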
     
    Kej, Silent_Buddha and Lightman like this.
  5. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    The closest to this was the testing Arun did with his geometry performance tool, which historically showed a 1:1 relationship between SM, TPC and Polymorph Engine, including with Pascal. More recent testing with V100 indicated this ratio has now dropped, which makes sense considering how much the architecture is being scaled up while maintaining the same front end, and that was even allowing for the SM structure with 64 CUDA cores instead of the 128-core design.
    Although I agree that for 100% proof it would be great if Arun could test the P100 to see how the 64 CUDA core design affects the relationship (in theory the geometry tool result should still be a 1:2 ratio or better, but it was worse than this for V100).
    Somewhere in the Pascal or Volta thread (I think it was the Volta one) you can find the discussion on this.

    This is further backed up by what we see with games, where the performance gain varies from 5% to 35%, with the average in the 20s and only very rarely over 40%.
    Drivers could be a factor for the very lowest results, as could the use of 64 CUDA cores per SM and all it entails (I mentioned in the Volta thread that I remember an Nvidia engineer saying it is not ideal for gaming for now), but the trend is still well below the 40% scaling of the architecture, even for the games that generally work well.
     
    #125 CSI PC, Jun 21, 2018
    Last edited: Jun 21, 2018
  6. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I think 3dcgi's whole point is that there is nothing to prove, because the type of workload changes multiple times per frame and thus the location of the bottleneck changes just the same.

    It only makes sense to say: x% of the time the bottleneck is here, and y% of the time it is somewhere else.
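One way to make the "x% of the time" framing concrete is an Amdahl-style estimate. This is only a sketch under a strong assumption: the part of the frame that is not ALU-bound does not get faster at all. The function name and the example fractions are hypothetical; the 1.42 factor is the core scaling figure quoted earlier in the thread.

```python
def frame_speedup(alu_bound_fraction: float, alu_scale: float) -> float:
    """Amdahl-style speedup when only the ALU-bound share of frame time gets faster."""
    if not 0.0 <= alu_bound_fraction <= 1.0:
        raise ValueError("alu_bound_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - alu_bound_fraction) + alu_bound_fraction / alu_scale)


# Hypothetical shares of frame time spent ALU-bound.
for fraction in (0.25, 0.50, 0.75, 1.00):
    print(f"{fraction:.0%} ALU-bound -> {frame_speedup(fraction, 1.42):.2f}x")
```

Only a frame that is ALU-bound 100% of the time would see the full 1.42x.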
     
    3dcgi likes this.
  7. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    There is: you can look at the ratio of SM, TPC, Polymorph Engine and Raster Engine against actual game frame behaviour.
    Historically this has been a 1:1 performance relationship (see Arun's tool), but as the SM/CUDA cores scale while other aspects remain static, more pressure lands on the front end, IF you take the idea of the architecture scaling from, say, Pascal to Volta: a 42% increase, but the 1:1 relationship in the context of geometry is now broken.
    This is further seen in games measured with PresentMon or another frame-time-based solution, where you can see the influence on frames.

    3dcgi was picking up on my post as speculation with no foundation; actually it does have a foundation and is backed up by what is seen with nearly every game so far on Titan V. Instead of a 42% improvement in games, we are at an average of 18-25% or mostly below, with a very rare few either in the low 30s or at times over 40%.
    Look at Arun's tool and what was discussed, then look at games monitored from a frame behaviour perspective.
    If one wants to argue semantics, then one can say there are no bottlenecks anywhere, as the workload changes for anything; the point is that the context was a response about scaling of compute/TFLOPs/cores and gaming (geometry aspects that can be shown to be lower than before in terms of ratio, with the architecture fundamental to Nvidia).
    And that then leads to this: by your context you might as well say games are fine on Titan V and scaling well if we look at the 1% of the time it is fine over Y period, rather than at the real world and how it behaves 98% of the time in the game. In reality games are not scaling well, and so far (no other explanation identified) it comes back to what Arun has identified with his tool and what was discussed in that thread.

    But like I mentioned, to be 100% satisfied with the results from Arun's tool we need to see the behaviour on P100 because of its SM/CUDA structure; like I said, in theory the tool should identify it as 1:2 or better, and for V100 it is quite a lot worse than that.
    Still, this gives us some indication (Arun's tool showing that the front end performance ratio has dropped) when combined with the game behaviour trends we are seeing on Titan V, where the cores scaled by 42%.
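As a rough sanity check on the figures quoted above (42% core scaling, 18-25% average game gains), Amdahl's law can be inverted to ask what share of frame time would have had to scale with the cores. This is a simplification and not how Arun's tool works: it assumes everything outside that share did not speed up at all and ignores clock and bandwidth differences between the cards. The function name is made up for illustration.

```python
def implied_alu_bound_fraction(observed_speedup: float, alu_scale: float) -> float:
    """Invert Amdahl's law: share of baseline frame time that scaled with the ALUs."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / alu_scale)


CORE_SCALE = 1.42  # 42% more cores, as quoted in the post

for gain_pct in (18, 25, 42):  # game gains quoted in the post, plus the ideal case
    fraction = implied_alu_bound_fraction(1 + gain_pct / 100, CORE_SCALE)
    print(f"{gain_pct}% game gain -> about {fraction:.0%} of frame time scaled with cores")
```

Under that assumption, the 18-25% average corresponds to roughly half to two thirds of frame time scaling with the cores, with the remainder limited elsewhere. The model cannot say where, which is why the front-end ratio measurements matter.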

    Edit:
    Worth noting as well that even with the reduced ROPs, in bandwidth-hungry compute applications such as Amber the Titan V still achieves over 40% performance increase, so when comparing scaling against, say, GP102 it is fair to say bandwidth is still not a limitation relative to the core scaling.
    That said, it would be even higher with the full HBM2 bus width/bandwidth, but it is not limiting performance to below the core scaling.
     
    #127 CSI PC, Jun 22, 2018
    Last edited: Jun 22, 2018
  8. Jupiter

    Veteran Newcomer

    Joined:
    Feb 24, 2015
    Messages:
    1,391
    Likes Received:
    921
  9. Jupiter

    Veteran Newcomer

    Joined:
    Feb 24, 2015
    Messages:
    1,391
    Likes Received:
    921
  10. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,297
    Likes Received:
    464
    676 mm^2? LOL if true.
     
  11. McHuj

    Veteran Regular Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,416
    Likes Received:
    534
    Location:
    Texas
    I think that die size is certainly in the realm of possibility for a GT102 type GPU. For comparison:

    GV100 815 mm^2

    GP100 610 mm^2
    GP102 471 mm^2
    GP104 314 mm^2
     
  12. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    322
    Likes Received:
    82
    Well it's definitely 12nm if that's true. Besides, Nvidia seems to love large dies recently, and this would explain the rumored $1k pricepoint for the lower end version.

    Not sure how much money they'd expect to make off that of course, but hell maybe it's yet another non gaming focused chip and gamers are SOL again. Why bother serving them after all if the current lineup still sells and there's plenty of buyers for AI and HPC stuff?
     
  13. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,128
    Likes Received:
    903
    Location:
    still camping with a mauler
    GT102 weaksauce I had a GT200 9 years ago.
     
    snarfbot, ImSpartacus, Newguy and 3 others like this.
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,166
    Likes Received:
    1,836
    Location:
    Finland
    https://www.digitimes.com/news/a20180716PD211.html

    "In addition, shipments for Nvidia's new-generation GPUs will play another driver of TSMC's revenue growth in the fourth quarter, the sources identified."
    I know Digitimes sources are hit and miss, but if true that would mean holiday season at the earliest
     
  15. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,907
    Likes Received:
    1,607
    GeForce GTX 1180 announced late August - 1180+, 1170 and 1160 to follow
    http://www.guru3d.com/news-story/ge...-late-august-11801170-and-1160-to-follow.html
     
    ImSpartacus likes this.
  16. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,166
    Likes Received:
    1,836
    Location:
    Finland
    entity279 likes this.
  17. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,376
    Likes Received:
    8,594
    Location:
    Cleveland
    Or some stock room guy at Best Buy showed a POS inventory screen.
     
  18. Babel-17

    Veteran Regular

    Joined:
    Apr 24, 2002
    Messages:
    1,004
    Likes Received:
    245
    What's the thinking, irrespective of unconfirmed leaks, Founders Edition first?
     
  19. McHuj

    Veteran Regular Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,416
    Likes Received:
    534
    Location:
    Texas
    That would be my guess. If the leak has any validity, it reads like the Founders Edition of the 1180 releases on 8/30, followed by the non-Founders boards a month later on 9/30.

    Had this been launching 6 months ago in Q1, I would have definitely been buying one. Now for some odd reason, I'm not that hyped. I'll either wait for a bundle with games or see how the 1180Ti shapes up (maybe that will be a 7nm product?)
     
    Babel-17 and pharma like this.
  20. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,907
    Likes Received:
    1,607
    HWInfo Diagnostic tool adds new NVIDIA GPUs Support - Volta

    http://www.guru3d.com/news-story/hwinfo-diagnostic-tool-adds-new-nvidia-gpus-support-volta.html
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.