Nvidia Volta Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.

  1. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    AFAIK, Kepler (GK110) was much less efficient than Maxwell at sustained transfers from shared memory, despite its high theoretical rates. I have to check my sources for exact numbers.

    Update: Dissecting GPU Memory Hierarchy through Microbenchmarking
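
    For concreteness, the general shape of such a shared-memory throughput test (my own rough sketch of the technique, not the paper's code; the 1024-float buffer, the access pattern, and the one-1024-thread-block assumption are all arbitrary):

    // CUDA sketch: stream loads through shared memory and time the kernel;
    // achieved bytes/cycle is then compared against the theoretical rate.
    __global__ void smem_stream(float* out, int iters)
    {
        __shared__ float buf[1024];
        buf[threadIdx.x] = (float)threadIdx.x;            // assumes blockDim.x == 1024
        __syncthreads();

        float acc = 0.0f;
        for (int i = 0; i < iters; ++i)
            acc += buf[(threadIdx.x + i) & 1023];         // repeated shared-memory loads
        out[blockIdx.x * blockDim.x + threadIdx.x] = acc; // keep the loads live
    }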
     
    #961 fellix, Dec 19, 2017
    Last edited: Dec 20, 2017
    pharma and CarstenS like this.
  2. Dayman1225

    Newcomer

    Joined:
    Sep 9, 2017
    Messages:
    57
    Likes Received:
    78
  3. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,029
    Likes Received:
    3,101
    Location:
    Pennsylvania
    lol a page for "Can it run Crysis", thanks @Ryan Smith :)

    4K, 4xSSAA, and 60 fps
     
  4. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    372
    Likes Received:
    309
  5. HKS

    HKS
    Newcomer

    Joined:
    Apr 26, 2007
    Messages:
    31
    Likes Received:
    14
    Location:
    Norway
    And we also confirmed that GV100 supports fast FP16 math, even though it is currently only exposed in CUDA.
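
    For illustration, this is roughly what that CUDA path looks like (a minimal sketch using the packed-FP16 intrinsics from cuda_fp16.h; the axpy kernel itself is just a made-up example, not anything Nvidia ships):

    #include <cuda_fp16.h>

    // Each __hfma2 issues two FP16 fused multiply-adds per lane, which is
    // where the doubled FP16 rate comes from.
    __global__ void fp16_axpy(const __half2* x, __half2* y, __half2 a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = __hfma2(a, x[i], y[i]);  // y = a*x + y, two halves at once
    }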
     
    pharma, Ryan Smith, fellix and 2 others like this.
  6. firstminion

    Newcomer

    Joined:
    Aug 7, 2013
    Messages:
    217
    Likes Received:
    46
    Why not make a specialized chip with just those tensor cores? Google's approach makes more sense.
     
  7. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    There is the half-length V100 at 150W; that is the closest one would get to a dedicated tensor GPU, IMO. The design is still integral to Nvidia's core CUDA GPC/SM compute and instruction model.
     
  8. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,170
    Likes Received:
    576
    Location:
    France
    Maybe they can or will too. But I'm sure an "all-around" card is needed and wanted as well.

    Plus, I guess they get more feedback this way.
     
  9. HKS

    HKS
    Newcomer

    Joined:
    Apr 26, 2007
    Messages:
    31
    Likes Received:
    14
    Location:
    Norway
    I guess they are. On the new Xavier SoC, they have a hardware block dedicated to machine learning, called the NVIDIA Deep Learning Accelerator (NVDLA).
    They have also open-sourced this implementation.

    http://nvdla.org/
     
  10. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,566
    Likes Received:
    400
    Location:
    Earth
    Not all workloads are tensor workloads. A big driver for Volta was supercomputer deals requiring more general computing than tensor ops.
     
    xpea likes this.
  11. rcf

    rcf
    Regular Newcomer

    Joined:
    Nov 6, 2013
    Messages:
    398
    Likes Received:
    322
    Could tensor cores be used for graphics?
     
  12. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,566
    Likes Received:
    400
    Location:
    Earth
    Very likely not in the traditional sense. On the other hand, could there be new types of algorithms that use neural networks to enhance graphics?
     
  13. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Nvidia did a presentation showing how it can be used professionally in that context. One example involved improving the rendering quality of a car image (I cannot remember much about it, as it was a little while ago, and I'm not sure if it was just anti-aliasing, but I thought it involved more). They also talk a lot about AI rendering in general:
    https://blogs.nvidia.com/blog/2017/07/31/nvidia-research-brings-ai-to-computer-graphics/
     
    pharma and manux like this.
  14. firstminion

    Newcomer

    Joined:
    Aug 7, 2013
    Messages:
    217
    Likes Received:
    46
    I mean, even if it's nice for a first release, there will soon be a faster, more power-efficient, and much cheaper product. There's so much cruft there.

    A dedicated product would make denser, cheaper nodes possible.

    But that's for another market; we can't put those in the datacenter.
     
  15. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    You do not see the appeal of a densely populated 150W solution from Nvidia competing against other products in the next 12 months?
    Which current product do you see matching this?

    Sure, it will be superseded by the next generation from Nvidia, but then so is every generation.

    The profit margins, cost/logistics, and R&D probably make it more sensible for Nvidia to continue with half-length 150W GPUs going forward, targeting the DL aspects where clients do not require the full hybrid mixed-precision implementation; though there is a large market of HPC/science that requires the full hybrid, especially as AI/DL matures.
    There is also the Tegra solution; who knows what will happen down the line with ARM tech as a server solution, as Nvidia has never given up that HPC research.
     
    #975 CSI PC, Dec 21, 2017
    Last edited: Dec 21, 2017
  16. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,566
    Likes Received:
    400
    Location:
    Earth
    These supercomputers would not have happened if Volta weren't Volta:

    https://blogs.nvidia.com/blog/2017/...uters-to-supercharge-ai-scientific-discovery/

    Amazon & co. would have a hard time selling GPU cloud if Volta were tensor-only. FP64/HPC performance matters. I guess the misconception is people thinking of Volta as a DNN-only chip, which it isn't.

    There really isn't tensor-optimized software yet. Even Google isn't providing the TPU to anybody outside Google. What is the publicly available and popular tensor-only accelerator that Volta competes with today? A year from now the market could be different, but Volta is out today.
     
    Grall, xpea, DavidGraham and 3 others like this.
  17. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    360
    Likes Received:
    252
    They can be used for graphics via neural networks, which are universal function approximators. Since graphics is all about functions (input data -> transform via some function -> output), CNNs can approximate any pixel shader: http://deep-shading-datasets.mpi-inf.mpg.de/
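
    As a toy illustration of the idea (entirely hypothetical layer sizes and weights; a real deep-shading network is a full CNN rather than this tiny per-pixel MLP):

    // CUDA sketch: a tiny trained MLP standing in for a pixel shader.
    // W1/W2/b1/b2 would come from offline training; here they are assumed
    // to have been uploaded to constant memory already.
    #define IN  8    // per-pixel inputs: normals, depth, albedo, ...
    #define HID 16   // hidden units

    __constant__ float W1[HID][IN], b1[HID];
    __constant__ float W2[3][HID],  b2[3];

    __global__ void mlp_shade(const float* gbuf, float* rgb, int npix)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= npix) return;

        float h[HID];
        for (int j = 0; j < HID; ++j) {                // hidden layer, ReLU
            float a = b1[j];
            for (int i = 0; i < IN; ++i)
                a += W1[j][i] * gbuf[p * IN + i];
            h[j] = fmaxf(a, 0.0f);
        }
        for (int c = 0; c < 3; ++c) {                  // linear output -> RGB
            float a = b2[c];
            for (int j = 0; j < HID; ++j)
                a += W2[c][j] * h[j];
            rgb[p * 3 + c] = a;
        }
    }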
     
    Grall and pharma like this.
  18. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I was about to comment on how enjoyable it is to read a well-written piece of journalism. And then I stumbled into a repeat of this thing:
     
    tinokun, MDolenc and OlegSH like this.
  19. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    360
    Likes Received:
    252
    The same story, but that sounded more like a Crysis meme to me, like... this new GPU is cool, but does it contain hardware scheduling? :-D
     
  20. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Ryan (or anyone else from Anand) - how did you set up/implement the GEMM tests? I'm guessing it's cuBLAS multiplying two matrices which are *both* very large? I agree memory bandwidth is a key question for efficiency here - I'm thinking the way the tensor cores are used for cuDNN might have different characteristics in terms of "external bandwidth required per amount of computation" (depending on what they're doing). Might also be interesting to try downclocking core and/or memory separately and see what happens.
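
    For reference, roughly what I'd expect such a test to look like (my guess, not AnandTech's actual code; assumes cuBLAS on CUDA 9 with FP16 inputs and FP32 accumulation, and n is the free matrix-size parameter to sweep):

    #include <cublas_v2.h>
    #include <cuda_fp16.h>

    void gemm_test(cublasHandle_t h, const __half* A, const __half* B,
                   float* C, int n)
    {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSetMathMode(h, CUBLAS_TENSOR_OP_MATH);   // allow tensor cores
        cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                     &alpha, A, CUDA_R_16F, n,
                             B, CUDA_R_16F, n,
                     &beta,  C, CUDA_R_32F, n,
                     CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
    }

    Sweeping n (plus the core/memory downclocking mentioned above) against achieved TFLOPS should show where it goes bandwidth-bound.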

    BTW on the Beyond3D "Estimated FMA latency" test - it doesn't really make sense for GCN to be 4.5 cycles :( There are possible HW explanations for non-integer latencies but they're not very likely. The test inherently has some overhead (which can be amortised by trading off how long it runs in various ways) so maybe it's just higher on GCN for some reason which makes it "look" like 4.5 cycles when it's really 4 cycles; I'm not quite sure.
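
    For anyone curious, the basic shape of that kind of latency test (a minimal sketch of the general technique, not Beyond3D's actual suite; the chain length and constants are arbitrary):

    // CUDA sketch: one lane of one warp runs a long chain of serially
    // dependent FMAs; elapsed cycles / chain length estimates FMA latency.
    // The loop and timing overhead is exactly the kind of thing that can
    // make the result "look" non-integer.
    __global__ void fma_latency(float* out, long long* cycles, int iters)
    {
        float a = out[0], b = 1.000001f, c = 1e-7f;
        long long t0 = clock64();
        for (int i = 0; i < iters; ++i)
            a = fmaf(a, b, c);           // each FMA depends on the previous one
        long long t1 = clock64();
        out[0] = a;                      // keep the chain from being optimised away
        *cycles = t1 - t0;
    }
    // launched as fma_latency<<<1, 1>>>(...) so a single lane stays busy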

    I always thought it'd be interesting to get power consumption numbers when running that test btw (would probably have to be changed to run in a loop) - it's effectively using the GPU as little as theoretically possible, but still keeping it active non-stop (1 lane of 1 warp/wave). So in a way it's the smallest possible step up from "idle" and shows what's the minimum power when you're not allowed to just power gate (or shutdown) everything!

    I'm getting my Titan V in early/mid January - I'll definitely write some microbenchmarks to test a few things I'm curious about, thinking of maybe writing articles describing the deep learning HW landscape too, we'll see...

    (P.S.: Agreed with silent_guy, every instance of "scheduling hardware" should really be replaced by "dependency tracking hardware"!)

    EDIT: And needless to say, thanks for the really nice article with original analysis and tests - happy to see you guys spending the time to do that! :)
     
    #980 Arun, Dec 21, 2017
    Last edited: Dec 21, 2017
    nnunn, DavidGraham and pharma like this.