Nvidia Volta Speculation Thread

They are skipping 10nm; 7nm will be their next node. As for the release timeframe for Volta, it's certainly been accelerated, and since it will still be on 16nm I see no reason for them not to release it; there won't be a prohibitive cost increase or anything. I imagine the chips at each tier will be larger, as they each have about 25% more area to grow into.

Typo... I meant to say 10nm. Also, I should have clarified: I was referring to GV100 specifically. GP100 is pretty much maxed out in terms of die size, and GV100 couldn't be much larger. They would probably be able to increase density a bit, but would basically have to increase perf/W and/or perf/mm2 to see a substantial gain over GP100. Given how efficient Pascal already is, I wonder how much scope there is.
 
Skipping 10nm correlates well with skipping mobile.

Is Volta pretty much a refinement of Pascal? So far, what I remember being known is:
- RAS features (Reliability, Availability and Serviceability), for Tegra
- NVLink 2.0, as in faster/better/"this time it's a coherent interface, we promise", for HPC

I will compare it to Intel's Haswell-EP vs Haswell-EX:
- RAS features
- Transactional memory features can actually be used

It's a better improvement than GK210 vs GK110.
 
Me neither, and if the rumours that NV is skipping 10nm are true and Volta is still on 16nm, NV will have to do another Maxwell, as they will be area and power limited. I'm not sure GDDR6 is a necessity as such. They'll get a bit more out of delta compression (diminishing returns there, though), and GDDR5X will scale to at least 14 Gbps, which is 40% higher than what they're shipping today. That should be enough to tide them over.

I phrased it "GDDR6-whatever" since, by general consensus, it matters less what the memory they'll use ends up being called than whether it delivers the necessary added bandwidth and fits the power & cost bill.
 
Volta has 128-bit cores?

p.2-(c) 4-Wide SIMT lane detail
http://research.nvidia.com/sites/default/files/publications/Gebhart_MICRO_2012.pdf

GM204 has 5.2B transistors and 2048 cores.
Xavier has 512 cores (128-bit).
The way I understand it: a 4-wide SIMD with 32-bit operands results in RAM banks that are each 128 bits wide, and there are 4 parallel banks, which gives the 4x128 mentioned in the paper.
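To spell out the arithmetic behind that reading (my own back-of-envelope; only the 4-wide lane width comes from the paper, the bank count is an assumption):

```python
# Back-of-envelope check of the "4 x 128-bit" reading above.
# Assumptions (mine): 32-bit operands, 4 parallel banks per lane cluster.
operand_bits = 32          # FP32/INT32 operand width
simd_width = 4             # 4-wide SIMT lane cluster from the paper
banks = 4                  # parallel banks assumed per cluster

bank_width = operand_bits * simd_width   # each bank feeds all 4 lanes at once
total_per_access = bank_width * banks    # all banks accessed in parallel

print(bank_width)          # 128 bits per bank
print(total_per_access)    # 512 bits, i.e. 16 x 32-bit operands per access
```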

I don't see a date in that paper, other than the _2012 in the URL, so this seems pretty old? Maybe part of the Echelon project?
 
Some Volta information today from the DGX Saturn V presentation:
Here’s the fun bit to contemplate. Nvidia has demonstrated it can deliver 42.5 teraflops of peak performance and about 28.7 teraflops of sustained performance in a 4U server node (ie DGX-1). That is about 10.6 teraflops per 1U of rack form factor space. IBM, Nvidia, and Mellanox are working to get more than 40 teraflops of capacity into a 2U “Witherspoon” system next year with the “Summit” supercomputer they are building for the US Department of Energy facility at Oak Ridge National Laboratory. To our way of thinking, that means the future “Volta” GV100 accelerator cards should have about twice the performance of the Pascal GP100 cards currently shipping, if IBM sticks with the 2U form factor and only puts four GPU accelerators into the Witherspoon box alongside a pair of 24-core Power9 chips as we expect. And by this time next year, Nvidia might be able to build a DGX-2 system sporting Intel “Skylake” Xeon E5 v5 processors paired with eight Voltas and delivering as much as 85 teraflops double precision in a 4U chassis – the same performance density as the Summit machines at the peak level. We still think the Witherspoon node used in the Summit machine could have a slight advantage in harnessing that floating point oomph across four Volta cards for HPC workloads.
source: https://www.nextplatform.com/2016/11/14/nvidias-saturn-v-dgx-1-cluster-stacks/
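For what it's worth, the density argument in that quote checks out with simple arithmetic (my own numbers below; I'm treating the figures as double precision, consistent with the 85 teraflops double precision quoted for eight Voltas, and the four-GPU node count is the article's expectation):

```python
# Rack-density arithmetic using only the figures quoted above.
dgx1_peak_tflops = 42.5        # DGX-1: 8x GP100 in 4U
dgx1_height_u = 4
summit_node_tflops = 40.0      # 2U "Witherspoon" target
summit_node_height_u = 2
summit_node_gpus = 4           # assumed: four Voltas per node, as the article expects

print(dgx1_peak_tflops / dgx1_height_u)            # ~10.6 TF per U
print(summit_node_tflops / summit_node_height_u)   # ~20 TF per U

# Per-GPU view: 8 GP100s give 42.5 TF, while 4 GV100s would give ~40 TF,
# which is where "about twice the performance per card" comes from.
gp100 = dgx1_peak_tflops / 8                       # ~5.3 TF
gv100 = summit_node_tflops / summit_node_gpus      # ~10 TF
print(gv100 / gp100)                               # ~1.9x
```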
 
At the end of the article:

If AMD supports NVLink ports on the future “Zen” 16-core Opteron processors, this would be a very interesting development indeed. But that seems unlikely, considering how AMD has joined the CCIX, OpenCAPI, and Gen Z consortiums. That said, it could happen, and that would mean customers wanting to tightly couple CPUs and GPUs in a fat node could have an X86 option that had many of the same benefits that IBM is bringing with Power8 and Power9 chips

But wait, duh, isn't CAPI sort of what IBM calls the NVLink bus on its side of the bridge?
It's a recent announcement; indeed AMD, Xilinx, Mellanox and others are invited to the OpenCAPI fest, and they call the bus on POWER9 OpenCAPI.
http://www.anandtech.com/show/10759/opencapi-unveiled-amd-ibm-google-more

So in the end, yes, that reads as if AMD will be able to use the NVLink 2.0 bus, literally, if not for the fact that it goes by another name. But perhaps only in a second-generation system; perhaps on some big dGPU.
 
The challenge for them, though, is the R&D required to provide a much better fabric-connector solution; Intel bought the team & tech from Cray (QLogic) that is integral to their Omni-Path fabric work, while Nvidia headhunted key engineers from Cray's interconnect work who were core to the NVLink development.
Both of which have been years in development and planning, and can be traced back to experience with Cray's Aries interconnect to some extent.
Cheers
 
So in the end, yes, that reads as if AMD will be able to use the NVLink 2.0 bus, literally, if not for the fact that it goes by another name. But perhaps only in a second-generation system; perhaps on some big dGPU.
No - OpenCAPI and NVLink are separate, even if they use the same physical parts on IBM's end

[Image: ibm-opencapi-power9-io.jpg (IBM POWER9 I/O diagram)]

(to my understanding "New CAPI" is what became OpenCAPI)
 
Well, the diagram in this article, as well as the one you provide above, makes it seem like NVLink 2 and OpenCAPI could be using the same physical and link layers. It's not clear to me that NVLink 2 is separated from OpenCAPI by more than branding - do you have any concrete information on their differences that you can share?
 
It's more about the protocol than the physical interconnection, as serial connections are pretty straightforward. While AMD could potentially use NVLink, I'd imagine Nvidia would want a significant licensing fee for that. Even the protocol, I believe, was designed by an ex-AMD engineer (the one who created GMI) whom Nvidia hired, so it wouldn't be surprising if there were a lot of similarities.
 
So it's a layer cake.

I did read semi-recently about the use of Ethernet for on-board communications (sorry, I can't find it again; it seems hardly googlable). Sounds crazy, but if (somewhat) high latency / high throughput does the job and the integration of hardware and software is much easier, it might have applications.

Alright, that'd be the Gen-Z protocol, which would sit on top of Ethernet as an initial implementation but is certainly meant to sit on top of other transports as well, like OpenCAPI - or the other way around; I can easily make a reading or comprehension mistake and mix up the layers.
https://www.nextplatform.com/2016/10/12/raising-standard-storage-memory-fabrics/

The article talks of memory pools and the ability to often safely pass pointers or metadata (handles) around instead of needlessly copying data from memory A to memory B, which I think is something I understand.
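Purely to illustrate the copy-versus-handle distinction (a toy sketch with made-up names, nothing to do with any actual Gen-Z API):

```python
# Toy contrast: copying a payload between memories vs. passing a handle
# into a shared pool and letting the consumer dereference it in place.
shared_pool = {}                    # stands in for fabric-attached pooled memory

def publish(handle, data):
    shared_pool[handle] = data      # producer places the data once

def consume_by_copy(data):
    local_copy = bytes(data)        # whole payload crosses the interconnect
    return len(local_copy)

def consume_by_handle(handle):
    return len(shared_pool[handle]) # only the small handle had to travel

payload = bytes(64 * 1024 * 1024)   # 64 MB
publish("frame-0", payload)
print(consume_by_copy(payload))     # moves all 64 MB
print(consume_by_handle("frame-0")) # moves roughly a handle's worth of metadata
```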

NVLink I can understand being its own thing; not a big deal if most implementations only use NVLink between GPUs and that's what you needed (and could afford).
 
NVIDIA Volta GV100 GPU Chip For Summit Supercomputer Twice as Fast as Pascal P100 – Speculated To Hit 9.5 TFLOPs FP64 Compute
NVIDIA previously stated through their roadmaps that NVIDIA Volta GV100 GPUs will deliver SGEMM (Single Precision General Matrix Multiply) efficiency of 72 GFLOPS/Watt compared to 42 GFLOPS/Watt on Pascal GP100. Using the mentioned ratio, a Volta GV100 based GPU with a TDP of 300W can theoretically deliver 9.5 TFLOPs of double precision performance, almost twice that of the current generation GP100 GPU. NVIDIA’s Tesla P100 cards also ship at 300W but the nodes are expected to feature around 40 TFLOPs of compute performance so it is possible that NVIDIA may use TDP-configured variants for the Summit supercomputer.

http://wccftech.com/nvidia-volta-gv100-gpu-fast-pascal-gp100/
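Checking that extrapolation against the numbers quoted (my own arithmetic; I'm assuming Tesla P100's 5.3 TFLOPS FP64 at 300W as the baseline, which lands near, though not exactly on, the article's 9.5 figure):

```python
# Scale GP100's FP64 rate by the quoted SGEMM GFLOPS/Watt improvement.
# Assumption (mine): FP64 throughput scales with that SGEMM efficiency ratio.
volta_sgemm_per_watt = 72.0    # GFLOPS/W, from the roadmap as quoted
pascal_sgemm_per_watt = 42.0   # GFLOPS/W
gp100_fp64_tflops = 5.3        # Tesla P100 (SXM2), 300W

ratio = volta_sgemm_per_watt / pascal_sgemm_per_watt   # ~1.71x
print(gp100_fp64_tflops * ratio)   # ~9.1 TFLOPS FP64 at the same 300W, i.e. "almost twice" GP100
```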
 
Or maybe the superbrain of an author should first consider that Volta should come with quite some architectural changes. Even if we were talking about a Pascal follow-up, his conclusion would be complete nonsense, since NV uses dedicated FP64 SPs, in which case it would have been "as many as you can fit in" and not a sterile *2 derived from an ancient projection of single-precision throughput increase per Watt.

On another note, Xavier seems to have recently moved from the former 20 DL TOPS @ 20W to 30 DL TOPS @ 30W, which smells suspiciously like a frequency increase. Who is suddenly making NV nervous in automotive? The blue giant's FPGAs, as someone else said?
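For what it's worth, perf-per-watt is unchanged between the two Xavier claims, which is at least consistent with the frequency-bump reading (my arithmetic on the numbers above):

```python
# Efficiency comparison of the two Xavier claims quoted above.
old_tops, old_watts = 20.0, 20.0   # earlier claim: 20 DL TOPS @ 20W
new_tops, new_watts = 30.0, 30.0   # revised claim: 30 DL TOPS @ 30W

print(old_tops / old_watts)        # 1.0 TOPS/W before
print(new_tops / new_watts)        # 1.0 TOPS/W after
print(new_tops / old_tops - 1)     # +50% throughput
print(new_watts / old_watts - 1)   # +50% power, so perf/W is flat
```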
 
https://translate.google.com/translate?sl=ja&tl=en&js=y&prev=_t&hl=ja&ie=UTF-8&u=http://news.mynavi.jp/articles/2017/01/19/nvidia_volta/&edit-text=

While Xavier's announcement was the first mention of Volta, the first Volta part to actually come out as a product will be the high-end "GV100" chip for scientific and technical computing, and GV100 will be announced at GTC 2017.
Well, if true it's like I thought, but importantly it's good news for Nvidia: they are doing the same launch sequence as they did with Pascal, and that makes sense considering they made the unusual choice of manufacturing the biggest Pascal die possible straight away, even though it carried the highest risks/costs.
I still stand by my view that they did that because they needed a technical stepping stone/risk management for their Volta design and obligations.
With regard to the article, remember how they announced the Pascal Drive PX2 well before the Pascal P100, and the Drive PX2 did not reach sampling status until very late in the year, well after the P100 and even some consumer retail models.

Anyway, if Bill Dally is not being misrepresented, it looks like we will see the 'V100' sometime in the summer with similar accessibility to the P100: core supercomputer clients first, followed by DGX-1 clients, then certified node clients (basically a full-stack P100 setup in an elite provider's own 'box'), and lastly the single-card PCIe version.
And the numbers are interesting, because just one of the supercomputers with a 2017 contract will need over 20,000 'V100' GPUs; by the end of the year they will be producing a fair number (they have several supercomputer contracts already agreed on these), even if they are not seen as individual Tesla GPUs.
Not surprised myself, tbh.
I do still wonder if a replacement for the 1080 will also be launched late this year; it was always my feeling they would keep that within the same product cycle, as we saw with Pascal and the P100 followed by the 1080, albeit this time not until very late Q3 at the earliest.
Cheers
 
I'm still skeptical on that; NVIDIA's roadmaps have clearly put Volta as a 2018 product, while Pascal was a 2016 product.
 
Considering Nvidia had GP100 in their labs around Sep/Oct 2015 and did not really show working samples at GTC'16, I don't think they are ready to really show GV100 at GTC'17. A slide announcement, yeah, maybe.

On another note, Xavier seems to have recently moved from the former 20 DL TOPS @ 20W to 30 DL TOPS @ 30W, which smells suspiciously like a frequency increase.
Impressive, if they can pull off a 50% increase in performance and keep power on the same linear scale as well. Or Xavier1 was massively underspecced from the get-go. ;)
 