Nvidia Volta Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.

  1. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    #861 CSI PC, Dec 8, 2017
    Last edited: Dec 8, 2017
    Shortbread and pharma like this.
  2. rcf

    rcf
    Regular

    Joined:
    Nov 6, 2013
    Messages:
    430
    Likes Received:
    355
    Found this on reddit, don't know how accurate it is:
    Code:
    GPU             Price (USD)   FP64 (GFLOPS)   FP64 GFLOPS per $
    Titan V         $2,999        6900            2.301
    Titan Black     $999          1707            1.709
    Quadro GP100    $6,999        5200            0.743
    Tesla P100      $7,999        5200            0.650
    1060 3GB        $199          123             0.618
    1070 Ti         $449          256             0.570
    1060 6GB        $249          137             0.550
    1070            $379          202             0.533
    1080 Ti         $699          354             0.506
    1080            $549          277             0.505
    Titan Xp        $1,200        380             0.317
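
For anyone who wants to re-check the last column, here is a minimal sketch that recomputes it from the other two. It assumes the prices are US dollars and the FP64 figures are GFLOPS (the post gives no unit); the numbers are copied from the table and not independently verified, and the function names are just illustrative.

```python
# Recompute FP64 throughput per dollar from the table quoted in the post.
# Assumption: prices are USD and FP64 figures are GFLOPS (the post gives no unit).
CARDS = [
    ("Titan V", 2999, 6900),
    ("Titan Black", 999, 1707),
    ("Quadro GP100", 6999, 5200),
    ("Tesla P100", 7999, 5200),
    ("1060 3GB", 199, 123),
    ("1070 Ti", 449, 256),
    ("1060 6GB", 249, 137),
    ("1070", 379, 202),
    ("1080 Ti", 699, 354),
    ("1080", 549, 277),
    ("Titan Xp", 1200, 380),
]


def gflops_per_dollar(price_usd, fp64_gflops):
    """FP64 GFLOPS obtained per dollar of list price."""
    return fp64_gflops / price_usd


def ranked(cards):
    """Return (name, ratio) pairs sorted from best to worst value."""
    rows = [(name, gflops_per_dollar(price, gflops)) for name, price, gflops in cards]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in ranked(CARDS):
        print(f"{name:<14}{ratio:.3f}")
```

Sorting by the ratio reproduces the order of the table, so the quoted column is at least internally consistent with the quoted prices and FP64 figures.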
     
    Shortbread, Grall and pharma like this.
  3. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    Hm, I gotta say that an NVLink bridge costing 600 fricken dollars for a piece of board with some plastic and a few connectors on it is beyond monster cable pricing territory. Never mind that it is aimed at pro users and doesn't work with the Titan V; a thing like that should come with the card itself, not be a silly expensive extra.
     
    Shortbread and Lightman like this.
  4. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    87
    Likes Received:
    48
    Judging by the spec, I have a hard time believing this Titan V can achieve 110 TFLOPS for DL. The memory bandwidth on V100 is barely sufficient to feed the mixed-precision computation, and now they have cut a quarter of it off.
     
  5. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    How many bytes are required per DL OP?
     
  6. Dayman1225

    Newcomer

    Joined:
    Sep 9, 2017
    Messages:
    77
    Likes Received:
    169
  7. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    If you look at the real-world workload of ResNet-50, you see a speed-up factor of 2.4x when using tensor cores for training (compared to P100 FP32) and 3.7x for inference (compared to P100 FP16).

    Titan V will be slower, of course, but it should be enough of a speedup to be worth it, especially since P100 was never available as a 'cheap' Titan product to begin with.
     
    Shortbread likes this.
  8. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    87
    Likes Received:
    48
    According to their own doc, they actually do the DL thing through a 256x256 matrix multiply within a warp. For a mixed-precision multiply, note that there is insufficient storage in either the registers (warp-wide) or shared memory to hold the temporary results, so they have to write the results back to main memory.

    Which means that, even in the best-case scenario, they can only achieve 256*256*256*2/(256*256*4) = 128 DL ops/byte (2*256^3 operations for 256*256 results of 4 bytes each).

    Therefore, to achieve 110 TFLOPS, you will need roughly 1 TB/s of memory bandwidth (110/128 is about 0.86 TB/s), although the L2 cache can reduce that requirement a little. How much depends a lot on the access pattern, but given Nvidia's previous GPU archs, I doubt it will help much in most GEMM cases, since you usually need a much larger matrix in the first place to achieve high efficiency.
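
The estimate above can be sketched in a few lines. This follows the poster's model only: it counts just the write-back of a 256x256 result tile at 4 bytes per element and ignores reads of the input matrices and any cache effects. The function names are illustrative, not from any Nvidia document.

```python
def gemm_ops_per_byte(n, bytes_per_result=4):
    """Ops per byte written for an n x n x n matrix multiply.

    Assumption (from the post): only the n*n results are written to memory,
    each taking bytes_per_result bytes; input traffic is ignored.
    """
    ops = 2 * n ** 3                      # one multiply and one add per inner-product term
    bytes_written = n * n * bytes_per_result
    return ops / bytes_written


def required_bandwidth_tb_s(target_tflops, ops_per_byte):
    """Memory bandwidth in TB/s needed to sustain target_tflops at a given intensity."""
    return target_tflops / ops_per_byte   # (1e12 ops/s) / (ops/byte) = 1e12 bytes/s


if __name__ == "__main__":
    intensity = gemm_ops_per_byte(256)
    print(f"{intensity:.0f} ops/byte")
    print(f"{required_bandwidth_tb_s(110, intensity):.2f} TB/s for 110 TFLOPS")
```

Under these assumptions the intensity is simply n/2, so a larger tile held on chip would lower the bandwidth requirement proportionally.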
     
  9. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    If I'm not mistaken, the graphics score is about the same or slightly better than a factory OC 1080 Ti.
    But physics score seems less than half, good bit lower than 980 Ti even.
     
  10. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Isn't the physics score CPU dependent?
     
  11. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    Yes I believe that is correct.
    I looked at systems with the same Skylake 6700K that this mystery 'Generic VGA' score uses.
     
  12. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    RAM speed and single vs dual channel... other tasks running in the background.

    But you confused me, since you said 'lower than 980 Ti even'; I thought you were saying it was GPU dependent.
     
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    According to WCCFTech, the card reaches a 1.9GHz boost clock and overclocks easily to 2.0GHz, just like Pascal. Some benchmarks:

    FireStrike:
    TitanV (stock): 32K
    TitanXp(stock): 28K

    Unigine Superposition:
    TitanV (stock): 9431
    TitanXp(stock): ~6000
    1080Ti (OC'ed to 2.6GHz): 8642

    https://wccftech.com/nvidia-titan-v-volta-gaming-benchmarks/

    Some gaming comparisons:

    Gears of War 4:
    Titan V OC - 166 fps
    1080 Ti OC - 124 fps
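
To put the figures above on a common footing, here is a small sketch that turns each pair into a speedup ratio. The inputs are the scores as quoted (32K and 28K are rounded, and the Titan Xp Superposition score is approximate), so the ratios are only indicative; the labels and function name are illustrative.

```python
def speedup(new, old):
    """Ratio of the newer card's score to the older card's score."""
    return new / old


# (Titan V score, comparison score), as quoted in the post.
RESULTS = {
    "FireStrike, stock vs Titan Xp stock": (32_000, 28_000),
    "Superposition, stock vs Titan Xp stock": (9431, 6000),
    "Superposition, stock vs 1080 Ti OC": (9431, 8642),
    "Gears of War 4, OC vs 1080 Ti OC": (166, 124),
}

if __name__ == "__main__":
    for label, (titan_v, other) in RESULTS.items():
        print(f"{label}: {speedup(titan_v, other):.2f}x")
```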
     
    #873 DavidGraham, Dec 9, 2017
    Last edited: Dec 9, 2017
    Shortbread, Kyyla, nnunn and 4 others like this.
  14. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Worth noting, though, that part of the performance limitation is the interconnect, meaning it is a bit more difficult to really know the limitation of the HBM2 memory in this context: NVLink2 125 TFLOPS, PCIe3 112 TFLOPS, both with the same HBM2 bandwidth.
    But I agree that 110 TFLOPS still seems optimistic with the HBM2 bandwidth/bus reduction on the Titan V.

    Edit:
    Good grief can tell I am hung over and brain on planet Elsewhere, yeah different core GPU clocks sigh :)
    Still shows the full bus/BW manages 125 TFLOPs with full spec core clocks (NVLink2 card) and we cannot tell what the limitation is at that.
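
The clock dependence mentioned in the edit can be sketched by working backwards from the quoted peak figures. This assumes 640 tensor cores, each doing 64 fused multiply-adds (2 ops each) per clock; those counts are not stated in this thread and are taken as assumptions here. The implied clocks are derived values, not official specifications.

```python
TENSOR_CORES = 640          # assumption: tensor core count for V100 / Titan V
FMA_PER_CORE_PER_CLOCK = 64 # assumption: FMAs per tensor core per clock
OPS_PER_FMA = 2             # one multiply plus one add


def ops_per_clock():
    """Peak tensor operations issued per clock cycle across the chip."""
    return TENSOR_CORES * FMA_PER_CORE_PER_CLOCK * OPS_PER_FMA


def implied_clock_mhz(peak_tflops):
    """Core clock in MHz that would yield the given peak tensor TFLOPS."""
    return peak_tflops * 1e12 / ops_per_clock() / 1e6


if __name__ == "__main__":
    for label, tflops in [("V100 NVLink", 125), ("V100 PCIe", 112), ("Titan V", 110)]:
        print(f"{label}: {tflops} TFLOPS -> ~{implied_clock_mhz(tflops):.0f} MHz")
```

On these assumptions the gap between the 125 and 112 TFLOPS figures is accounted for by clock speed alone, which says nothing either way about where memory bandwidth becomes the limit.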
     
    #874 CSI PC, Dec 9, 2017
    Last edited: Dec 9, 2017
  15. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Estimates put it at around $600 to $1K, and that is just the hardware-related cost. Jensen, I think, confirmed it was more than most thought and put it closer to the upper $1K estimates.
     
  16. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
    VideoCardz have posted OCed synthetic benchmarks, but barring Superposition 1080p Extreme, where it's ~50% faster, it doesn't show much improvement over a 1080 Ti OC, :-?

    https://videocardz.com/74382/overclocked-nvidia-titan-v-benchmarks-emerge

    Superposition 1080p Extreme is different only in shader quality, so maybe that's doing something. They mention it showing clocks over 2GHz with overclocked HBM, so it is probably not throttling enough to explain the small difference.
     
  17. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Worth remembering, though, that this is more of a science/dev GPU Titan than a gaming/visual card; a lot of die space will be taken up by FP64 and some by the Tensor cores.
    While not a great indicator for gaming, it is worth noting that the Titan V has 13.8 TFLOPS FP32 while the Titan Xp had 12.1 TFLOPS FP32, with both having a similar GPC/Polymorph count; there would be a much greater performance difference across the board if it were not for the fact that it is a huge mixed-precision GPU.
    I would need to find the documentation, but I also thought the higher SM count of these HPC models is not that efficient for gaming workloads; it basically doubles the SMs per GPC and TPC or, to look at it another way, has 64 CUDA cores per SM rather than the 128 found in all other recent Nvidia GPUs.
     
    Malo likes this.
  18. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    It's 14.8 TFLOPS for the Titan V. And even that number does not hold when the chip is running close to 2.0GHz; that effectively makes it a 20 TFLOPS GPU.
     
    Shortbread, pharma and nnunn like this.
  19. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    You can say the same about the Titan Xp when it comes to clock specs and OCing.
    You are probably right, but some reports have it as 13.8 TFLOPS, such as Anandtech *shrug* - also, to break over 14 TFLOPS by a fair bit it may need to go up to the 300W spec of the NVLink model rather than the 250W spec it is sold at; possible, but I assume it means changing power settings.
    Anyway, the point is that it is not massively ahead of the Titan Xp (21.5% more relevant CUDA cores with the Titan V), which is reflected in some of those scores for the reason I mentioned. The scores with larger gaps can possibly make better use of some of the arch changes, but like I mentioned, the doubled number of SMs per GPC/TPC may be detrimental to gaming-type workloads.

    It is a really great card for the price do not get me wrong, but it is not necessarily perfect for most prosumers.
     
    #879 CSI PC, Dec 9, 2017
    Last edited: Dec 9, 2017
  20. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    629
    Likes Received:
    1,131
    Location:
    PCIe x16_1
    Those are the numbers NV gave me. Their perf figures seem to be calculated against a ~1350MHz clockspeed, rather than the boost clock.
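
As a rough check on how the different FP32 figures in this thread relate to clock speed, here is a minimal sketch using the usual peak formula of 2 ops (one FMA) per CUDA core per clock. The 5120 CUDA core count for the Titan V is an assumption brought in from outside this thread, and the function name is illustrative.

```python
TITAN_V_CUDA_CORES = 5120   # assumption: not stated in this thread


def fp32_tflops(cuda_cores, clock_mhz):
    """Peak FP32 TFLOPS, counting one fused multiply-add (2 ops) per core per clock."""
    return 2 * cuda_cores * clock_mhz * 1e6 / 1e12


if __name__ == "__main__":
    for clock_mhz in (1350, 2000):
        tflops = fp32_tflops(TITAN_V_CUDA_CORES, clock_mhz)
        print(f"{clock_mhz} MHz -> {tflops:.1f} TFLOPS")
```

With that core count, ~1350MHz gives about 13.8 TFLOPS and 2.0GHz gives about 20.5 TFLOPS, so the 13.8 and "20 TFLOPS" figures quoted earlier would differ only in the clock they assume.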
     
    Shortbread and CSI PC like this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.