Nvidia Volta Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 19, 2013.

Tags:
  1. Blazkowicz

    Blazkowicz Legend

    More like 300W per GPU and 300W for the rest of the system?
     
    DavidGraham, CSI PC, pharma and 3 others like this.
  2. xpea

    xpea Regular

    In 12FFN, the N stands for Nvidia. It's a custom node for the green team ;-)
     
  3. iMacmatician

    iMacmatician Regular

    Looks like you're right, since NVIDIA mentions in the devblog that the V100 has a 300 W TDP.
     
  4. Blazkowicz

    Blazkowicz Legend

    So a bit like Samsung's 14nm LPE and 10nm LPE, but nvidia bought everything?
     
  5. gamervivek

    gamervivek Regular

    An 800 mm² chip is clocked at 1.4-1.5 GHz, so I think it's not out of the realm of reason to expect desktop chips at 2.0 GHz stock. If Nvidia puts out a gaming chip with the same CUDA core count as the behemoth, it should do ~20 TFLOPS, about 70% more than the big gaming Pascal. Hopefully the architectural improvements add another 10-20%, and we'd have a pretty decent gaming card by next year's end.
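    That back-of-the-envelope can be sanity-checked quickly. A minimal sketch, assuming a hypothetical gaming part with GV100's 5120 FP32 CUDA cores, and (my assumption) GP102 at 3840 cores and ~1.6 GHz boost for the "big gaming Pascal" comparison:

```python
def fp32_tflops(cuda_cores, clock_ghz, flops_per_core_per_clock=2):
    """Peak FP32 throughput: cores x clock x FLOPs/clock (an FMA counts as 2 FLOPs)."""
    return cuda_cores * clock_ghz * flops_per_core_per_clock / 1000.0

# Hypothetical desktop Volta with GV100's core count at 2.0 GHz stock:
print(fp32_tflops(5120, 2.0))  # 20.48 TFLOPS
# Big gaming Pascal (GP102, 3840 cores) at an assumed ~1.6 GHz boost:
print(fp32_tflops(3840, 1.6))  # 12.288 TFLOPS -> ~67% gap, i.e. "about 70%"
```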
     
  6. DavidGraham

    DavidGraham Veteran

    They'll have to discard the Tensor cores and the DP units (like they usually do). Volta also has new scheduling hardware to handle all of these cores, so we could see a reduction in that section as well.

    https://devblogs.nvidia.com/parallelforall/inside-volta/?ncid=so-twi-vt-13918
     
  7. 3dilettante

    3dilettante Legend Alpha

    The thread scheduling method now tracks thread context per work item and allows instructions from both divergent paths to issue, rather than the hardware running down one path until it reaches its end and then starting on the other. I'll have to take some time to digest the information. Nvidia seems to be describing one benefit of their solution as removing the deadlock threat of synchronization operations being split between diverged paths.
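    A toy model of that deadlock scenario, with plain Python generators standing in for warp lanes (this only illustrates the scheduling idea, not real GPU semantics): under pre-Volta lockstep execution one divergent path runs to completion before the other starts, so a sync point inside a branch that both paths must reach can never be satisfied; interleaved, Volta-style independent issue lets both paths make progress.

```python
def lane(name, arrived, log):
    """One warp lane: do some work, then wait at a sync point inside its branch."""
    log.append(f"{name}: work")
    arrived.add(name)            # arrive at the in-branch sync point
    while len(arrived) < 2:      # wait until the other lane also arrives
        yield                    # give up the issue slot
    log.append(f"{name}: past sync")

def run_lockstep(lanes, budget=100):
    """Pre-Volta model: run each divergent path to completion before the next."""
    for g in lanes:
        for _ in range(budget):
            try:
                next(g)
            except StopIteration:
                break
        else:
            return "deadlock"    # first lane never got past its sync point
    return "ok"

def run_interleaved(lanes, budget=100):
    """Volta-style model: issue instructions from all active paths in turn."""
    active = list(lanes)
    for _ in range(budget):
        for g in list(active):
            try:
                next(g)
            except StopIteration:
                active.remove(g)
        if not active:
            return "ok"
    return "deadlock"

arrived, log = set(), []
print(run_lockstep([lane("A", arrived, log), lane("B", arrived, log)]))    # deadlock
arrived, log = set(), []
print(run_interleaved([lane("A", arrived, log), lane("B", arrived, log)])) # ok
```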
     
    Alexko, pharma and DavidGraham like this.
  8. CSI PC

    CSI PC Veteran

    Yeah.
    More impressive, though, is staying within 300 W with the FP64 units while also expanding NVLink 2 performance; that is where a lot of the power demand/TDP comes from (more specifically FP64, but the NVLink mezzanine is pretty demanding).

    It is a very interesting design and impressive with the specs. I mentioned to someone else a while ago that it is a bit like Kepler->Maxwell repeated, this time Pascal->Volta.
    They increase the die by 33.6% while impressively keeping the same 300 W, and yet they go further:
    FP32 compute increases by 41.5%, or 2x (yeah, depends upon function with Tensor).
    FP64 compute increases by 41.5%.
    FP16 compute increases by 41.5%, or 4x (yeah, depends upon function with Tensor).
    They squeeze into that 33.6% die increase around 41% more CUDA cores, and importantly with additional functions/units.

    And other important aspects, such as heavily revised thread scheduling and cache behaviour:
    The specific sections these appear in are Independent Thread Scheduling, and for the L0/L1 caches both "Volta SM (Streaming Multiprocessor)" and "Enhanced L1 Data Cache and Shared Memory":
    https://devblogs.nvidia.com/parallelforall/inside-volta/

    More of a monster than I expected TBH, but it fits with what was being said quite a while ago about how it is another jump from Pascal with arch changes (and also, critically, efficiency, looking at those specs).
    It will be interesting to see how GV100 pans out as a Quadro in the second half of next year; shame no-one has yet tested the Quadro GP100 with dual NVLink to see how well it works with certain professional applications whose devs Nvidia work closely with for Quadros.
    Cheers

    Edit:
    Sorry Graham, did not read your post before posting, so I see you also reference the additional info on the devblog.
    But I think you will find a version of the Tensor cores on certain other CUDA/Volta GPUs.
    Also forgot to say: NVLink 2, as thought, increases the number of links supported from 4 to 6, and each link is now 50 GB/s rather than 40 GB/s.

    Edit2:
    Was tired; just corrected the Tensor specifics on proof-read.
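    For anyone checking those percentages, the quick arithmetic, assuming the announced figures (GP100: 610 mm², 10.6/5.3/21.2 TFLOPS FP32/FP64/FP16 at boost, 4x40 GB/s NVLink; GV100: 815 mm², 15/7.5/30 TFLOPS, 6x50 GB/s NVLink 2):

```python
def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new / old - 1) * 100

print(round(pct_increase(610, 815), 1))    # die area: 33.6
print(round(pct_increase(10.6, 15.0), 1))  # FP32 TFLOPS: 41.5
print(round(pct_increase(5.3, 7.5), 1))    # FP64 TFLOPS: 41.5
print(round(pct_increase(21.2, 30.0), 1))  # FP16 TFLOPS: 41.5
print(round(pct_increase(3584, 5120), 1))  # CUDA cores: 42.9
print(4 * 40, "->", 6 * 50)                # aggregate NVLink GB/s: 160 -> 300
```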
     
    Last edited: May 10, 2017
  9. pharma

    pharma Veteran

    [image]
     
    xpea likes this.
  10. itsmydamnation

    itsmydamnation Veteran

    Have they confirmed this is one die? To me, at 800 mm² it makes far more sense for this to be two dies. Given that it's using HBM, there is already an interposer, so having two ~400 mm² chips with a high-bandwidth fabric between L2 slices* makes far more sense to me.

    *or similar position in the architecture
     
  11. Razor1

    Razor1 Veteran

    They stated it's at the reticle limit, so one die.
     
    Lightman and pharma like this.
  12. manux

    manux Veteran

    pharma and Razor1 like this.
  13. itsmydamnation

    itsmydamnation Veteran

    Well, it truly is insane, gotta love the ambition. Poor old Knights-*.
     
    Razor1 likes this.
  14. Razor1

    Razor1 Veteran


    The AnandTech article is pretty good. Not sure why they mentioned less flexibility vs. more performance; I know it was talked about in the presentation when comparing to other chip types (FPGAs and CPUs), but I don't think Volta's architecture is going to be less flexible than past GPU architectures from nV. Maybe Ryan can explain.
     
  15. itsmydamnation

    itsmydamnation Veteran

    He was talking only about the Tensor Cores, which are less flexible; they target a specific subset of workloads.
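    For reference, the devblog describes each Tensor Core as performing one fixed operation: a 4x4 matrix multiply-accumulate, D = A*B + C (FP16 multiplies, FP32 accumulation). A plain-Python sketch of just that single op, which is exactly why it's less flexible than a general CUDA core:

```python
def tensor_core_mma(A, B, C):
    """One Tensor Core operation: D = A*B + C on 4x4 matrices.
    (Real hardware uses FP16 inputs with FP32 accumulation; plain floats here.)"""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) + C[i][j]
             for j in range(4)] for i in range(4)]

I = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
Z = [[0.0] * 4 for _ in range(4)]
print(tensor_core_mma(I, I, Z) == I)  # True: I*I + 0 == I
```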
     
    Deleted member 13524 and Razor1 like this.
  16. Razor1

    Razor1 Veteran

    ah ok thx!
     
  17. DavidGraham

    DavidGraham Veteran

    Maybe they can retain some of them in a GV102 core for the Titan crowd? Judging by previous trends, a GV104 core will most likely discard them completely. Speaking of which, I think we can expect a full GV104 to be roughly 20-30% faster than a Titan Xp (full GP102), if NV manages high enough clocks.
     
  18. Ryan Smith

    Ryan Smith Regular

    Yep, one die. It's just insane. And that doesn't even get into the interposer (you can't get a traditional interposer large enough).

    In 5 years when everyone starts throwing these out, I'm going to have to get one to add to the GPU collection...
     
    CSI PC, Lightman, pharma and 2 others like this.
  19. shiznit

    shiznit Regular

    Is there a game rendering application for the Tensor units?
     
  20. Razor1

    Razor1 Veteran

    This kinda shows how much they're expecting in sales from DL and HPC, though, to push the limits like this.
     