Nvidia Volta Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.

  1. nnunn

    Newcomer

    Joined:
    Nov 27, 2014
    Messages:
    28
    Likes Received:
    23
Guessing the CompuBench 2 "Ocean Surface Simulation" test involves bandwidth-limited kernels?
    Just noticed a V100-PCIE-16GB gives a nice boost: https://compubench.com/result.jsp

    Tesla_V100_PCIE.jpg
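Whether a kernel like the ocean-surface simulation is bandwidth-limited can be sanity-checked with a back-of-the-envelope roofline model: compare the kernel's arithmetic intensity (FLOPs per byte moved) against the GPU's ratio of peak compute to peak bandwidth. A minimal sketch, using published V100-PCIE FP32/HBM2 peaks; the kernel intensity below is a purely illustrative assumption, not a measurement:

```python
# Rough roofline check: is a kernel compute-bound or bandwidth-bound?
# V100-PCIE-16GB published peaks: ~14 TFLOP/s FP32, ~900 GB/s HBM2.
PEAK_FLOPS = 14e12      # FP32 FLOP/s
PEAK_BW = 900e9         # bytes/s

def attainable_flops(flops_per_byte):
    """Roofline model: performance is capped by the lower of the
    compute ceiling and the bandwidth ceiling."""
    return min(PEAK_FLOPS, PEAK_BW * flops_per_byte)

# Machine balance point: kernels below this intensity are bandwidth-bound.
balance = PEAK_FLOPS / PEAK_BW   # ~15.6 FLOPs per byte

# Hypothetical streaming kernel doing few FLOPs per byte moved:
kernel_intensity = 2.0           # assumed FLOPs/byte
print(attainable_flops(kernel_intensity) / 1e12)  # attainable TFLOP/s
print(kernel_intensity < balance)                  # True -> bandwidth-limited
```

On this model, anything below ~15.6 FLOPs/byte on a V100 scales with memory bandwidth rather than compute, which would explain why a bandwidth-heavy test benefits so visibly from HBM2.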
     
    pharma likes this.
  2. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    322
    Likes Received:
    82
It does feel like it. For training, GPUs are already fairly great as they are. Nvidia will no doubt keep optimizing for AI training, as will AMD, and most likely Intel with its new GPUs. There's already heavy competition there on price and advancement. The TPU doesn't feel like a particularly necessary project.
     
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
It probably all depends on cost. Google will likely want to install tens of thousands of TPU racks.
    If they can produce them at a much lower cost than a DGX appliance, then it might still make sense?
     
  4. BoMbY

    Newcomer

    Joined:
    Aug 31, 2017
    Messages:
    68
    Likes Received:
    31
Ehh, yeah. They are probably not paying more than $50-$100 per TPU, but $5000-$10000 per Nvidia V100 ... So 100x TPU or 1x V100 - you choose.
     
  5. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,056
    Likes Received:
    1,020
    We’ll see about the TPU/siblings market. There are a number of interested parties at different tiers.
I’d like to remind everyone how, when Intel decided to enter the mobile market, many thought they would soon come to dominate it completely. They had enormous CPU design experience, the best fabs, coffers deep enough to soak up several billion dollars of losses (and they did) in order to crack the market and gain dominance, the full weight of the x86 software stack, and the ability to strike cross-market deals with players like ASUS, Lenovo, Acer... Yet they never cracked the market, and eventually withdrew.
    I think strong predictions about the TPU market are a bit premature.
     
  6. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
I expect that, apples to apples, Google’s TPU cost will be much higher than that, since you’re quoting V100 system prices (enclosure, Xeon server, InfiniBand, ...). So let’s say 10x instead of 100x.

    Also, while R&D costs are something Wall Street doesn’t care about, it may be something Google takes into account internally, because there is an alternative that they can simply buy.
     
    pharma and A1xLLcqAgt0qc2RyMz0y like this.
  7. BoMbY

    Newcomer

    Joined:
    Aug 31, 2017
    Messages:
    68
    Likes Received:
    31
What do you think a 330 mm² chip on 28nm costs these days? Or a small board, and 2x4 GB of DDR3 memory? My $50-$100 guess already includes the development cost write-off.
     
  8. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
But the 2nd-gen Google TPU is not a single or basic PC part; it is designed for massive scale-out, high-bandwidth tensor computation, so the costs are quite different.
    Worth noting the board also carries 4 TPU processors rather than just a single accelerator.

Separately, here is a photo from Google of how it is installed, to give a sense of the scale and integration required; that design influences manufacturing cost.

[IMG]
     
  9. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I don’t know. But I don’t think the question is very relevant.

    One of the requirements for training is memory bandwidth, so that kind of memory just isn’t going to cut it.

    And the thing must run in a system. So either you compare chip (plus memory) to chip or system to system.

    A $10k V100 is part of a complex system. It makes no sense to compare that against a $100 chip.

After all, we already know that you can buy a V100 for (almost) $3k.
     
  10. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    81
    Likes Received:
    47
Just got my 2 Titan Vs today and have tested a few kernels. The results are good, but the boost clock seems overrated: in my tests the GPU boost clock only reaches 1355 MHz, vastly lower than my GP102, which can reach 1850+ MHz.

    The most interesting part is the GEMM test with CUBLAS_TENSOR_OP_MATH enabled:

    With tensor cores enabled for GEMM with fp16 x fp16 = fp32, Titan V can reach 83 TFLOP/s, which is quite impressive, especially considering it only has 3/4 of the bandwidth of V100.

    And the most unexpected result: when tensor cores are enabled, they seem to accelerate SGEMM as well, for reasons yet unknown:

    Without tensor cores, the SGEMM test on Titan V gets just ~12 TFLOP/s.

    But with tensor cores enabled, SGEMM on Titan V can reach 30-40 TFLOP/s.

    I don't know how this is possible; maybe Nvidia forgot to mention their tensor cores can accelerate SGEMM as well? I just hope this is a hidden feature, instead of a bug in CUDA 9.1.
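For reference, TFLOP/s figures like these fall straight out of the standard GEMM operation count: an M x N x K matrix multiply performs 2*M*N*K floating-point operations (one multiply and one add per inner-product element). A minimal sketch; the matrix size and timing below are illustrative assumptions chosen to land near the ~83 TFLOP/s figure, not measurements from this post:

```python
def gemm_tflops(m, n, k, seconds):
    """Effective TFLOP/s for an M x N x K GEMM:
    2*M*N*K FLOPs divided by the elapsed time."""
    return 2.0 * m * n * k / seconds / 1e12

# Hypothetical run: an 8192^3 FP16 tensor-core GEMM finishing in ~13.2 ms
# would correspond to roughly the 83 TFLOP/s reported above.
print(gemm_tflops(8192, 8192, 8192, 13.2e-3))
```

The same formula applied to a ~92 ms run of the same size gives the ~12 TFLOP/s plain-SGEMM ballpark, which is how the speedup ratios above are derived.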
     
    #1030 LiXiangyang, Feb 26, 2018
    Last edited: Feb 26, 2018
    iMacmatician and Grall like this.
  11. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Interesting.
The SGEMM result with it enabled looks more like the FP16 CUDA mixed-precision result one could expect (as with P100 and V100/Titan V), but I assume you are not using that.
     
  12. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    81
    Likes Received:
    47
Never mind. I just checked the CUDA 9.1 documentation; it seems that cublasSgemm will simply convert FP32 to FP16 when tensor op math is enabled.
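That conversion is not free in accuracy terms: FP16 has an 11-bit significand, so integers above 2048 (and most FP32 fractions) can no longer be represented exactly. A quick illustration of the FP32 -> FP16 round-trip using Python's half-precision pack format, no GPU required:

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision, mimicking
    what an FP32 -> FP16 input conversion does to each matrix element."""
    return struct.unpack('e', struct.pack('e', x))[0]

print(to_fp16(2048.0))   # 2048.0 - still exactly representable
print(to_fp16(2049.0))   # 2048.0 - rounds away: spacing is 2 above 2048
print(to_fp16(0.1))      # ~0.0999756 - nearest representable half value
```

This is why a "tensor core SGEMM" that downconverts inputs can look dramatically faster while quietly returning less accurate results than a true FP32 GEMM.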

     
    Alexko, Bondrewd, Grall and 1 other person like this.
  13. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
Kinda makes sense for a mixed-precision function, although I can appreciate it changes the concept of SGEMM; tensor cores are blurring the boundaries.

    BTW, if you get the time, please consider trying HGEMM to compare its performance against SGEMM with tensor ops enabled.
     
    #1033 CSI PC, Feb 27, 2018
    Last edited: Feb 27, 2018
  14. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,909
    Likes Received:
    1,607
    @LiXiangyang
I'm curious whether you have "Above 4G Decoding" enabled in the motherboard BIOS for 64-bit decoding above the 4GB address space?
     
  15. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
What exactly does that option do? I have it in my current main rig's UEFI, and the setup screen doesn't explain it (which isn't a surprise, because ASUS documentation fairly sucks).
     
  16. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,909
    Likes Received:
    1,607
Supposedly you should set it to Enabled when using cards like Tesla or Quadro, to allow memory-mapped I/O to use more than 4GB of address space for a 64-bit PCIe card/device. Not sure if it does anything, but I noticed the BIOS option on my ASUS board as well.

    Edit:
    Tomshardware mentioned it recently in one of their mining articles:
    http://www.tomshardware.com/news/msi-bios-cryptocurrency-mining-performance,34972.html

    Interesting fact ... enabling the 4GB address option on my motherboard removed some of the OS exclamation marks within Device Manager for some of my devices. Similar to what happened in the Tomshardware article.
     
    #1036 pharma, Feb 27, 2018
    Last edited: Feb 27, 2018
    Grall likes this.
  17. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    81
    Likes Received:
    47
Contacted a local Nvidia guy; it seems the boost on Titan V is just that low (1335 MHz for my two cards, and the boost is much less flexible than on the average GeForce, more like Tesla/Quadro, so maybe the Titan V should be renamed Tesla V80 instead). When playing games, though, the card can boost to 1800 MHz or so.

    I suspect it must have something to do with the FP64 thing; the original Titan would also downclock significantly when full-speed FP64 was enabled.

    It's a shame the driver can no longer disable full-speed FP64 on Titan V.

    I always leave that option enabled.
     
    #1037 LiXiangyang, Feb 28, 2018
    Last edited: Feb 28, 2018
    Lightman, Grall, pharma and 2 others like this.
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
Depending on which API you use, the boost can go well beyond that point. In OpenCL I saw 1335 MHz as the boost as well, but in D3D (which also has a compute part) it went up to Pascal-card levels; until, of course, the cooler could no longer get rid of the heat, which is also Pascal-like.
     
    DavidGraham and pharma like this.
  19. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,909
    Likes Received:
    1,607
    ImSpartacus, BRiT, Lightman and 2 others like this.
  20. Babel-17

    Veteran Regular

    Joined:
    Apr 24, 2002
    Messages:
    1,004
    Likes Received:
    245
Pure speculation: if Nvidia releases a series of new gaming cards that offer poor mining bang for the buck, it could create an interesting situation. At current prices, existing owners of GTX 1050s/1060s/1070s who don't mine could sell their cards for what they paid, or more, and use that cash to subsidize the upgrade to the new, hypothetical gaming card.

    On the other hand, Nvidia would be foolish not to sell new cards to miners, so they'd likely release something that would have the effect of lowering the value of existing cards to miners. Interesting times indeed, waiting to see how this all turns out. Interesting times in a bad way for those who need/want a new card.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.