Nvidia Volta Speculation Thread

I'm still skeptical on that, NVIDIA's roadmaps have clearly put Volta as a 2018 product, while Pascal was a 2016 product
Volta was always meant to be 2017.
The 2014 'Whitepaper' had it as 2017 (I still have a copy), presentations from IBM have it as 2017, HPC presentations from Nvidia have it as 2017, the announcements for several of the first Volta supercomputers had it as 2017, and one of the project workflow milestones has 2017 (not saying they will complete then, as these are massive projects with much training/code optimisation/multiple diverse technologies to be implemented/etc.). One of those labs (Oak Ridge National Laboratory), even in 2016 just after the launch of Pascal, said 2017 for their Volta implementation in the description section of their YouTube video.
We have also now been given the actual revised performance spec (up from the initial estimate) for both Xavier and two of the supercomputers that Nvidia is contractually obliged to deliver.
Xavier, the 'Tegra' version of Volta, will be in limited manufacturing and sampling status very early in Q4; Nvidia launches the Tegra model after the dGPU but usually announces it first.

The 2018 date came about because of rumours that it would be 10nm, and because others looked at the more general product roadmap slide and took it as explicitly saying it must launch in 2018.

Here is the Oak Ridge National Laboratory release, importantly from after Pascal launched.
June 28th 2016:
Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 3,400 nodes when it arrives in 2017. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink.
You need to go to the YouTube link itself to see the description, as they only talk generally about Summit in the video.
3,400 nodes is over 20,000 'V100' accelerators.
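As a quick sanity check on that arithmetic (a minimal sketch; the 6 accelerators per node is my assumption, not something stated in the ORNL release, but it is what makes the maths land above 20,000):

```python
# Rough sanity check on the node and accelerator counts quoted above.
# ASSUMPTION: 6 'V100' accelerators per node -- not stated in the ORNL release.
titan_nodes = 18_688
summit_nodes = 3_400
gpus_per_node = 6                      # hypothetical, see note above

print(f"Total accelerators: {summit_nodes * gpus_per_node}")        # 20400
# Implied per-node speedup if Summit is >5x Titan on ~3,400 nodes:
print(f"Per-node speedup: ~{5 * titan_nodes / summit_nodes:.0f}x")  # ~27x
```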
Cheers
 
Considering Nvidia had GP100 in their labs around Sep/Oct 2015 and did not really show working samples at GTC'16, I don't think they are ready to really show GV100 at GTC'17. A slide announcement, yeah, maybe.


Impressive, if they can pull off a 50% increase in performance and keep power on the same linear scale as well. Or the original Xavier was massively underspecced from the get-go. ;)
Well, if they cannot, then there will be problems providing sampling-status Volta Xavier to automobile manufacturers in very late Q3/early Q4 this year :)
The Tegra design comes after the dGPU/Tesla.
But that does not mean it will be generally available; it is the same approach we saw with P100. That said, we were offered a full DGX-1 in mid-to-late Q3 2016 with only a one-week wait for guaranteed delivery, and I know of one Nvidia Elite Solutions Provider who had a certified 'node' (their own box with 8x P100) available in under a week by early Q4 2016.
It could slip, I agree, but that would be the surprise, rather than the launch process of P100 being repeated a bit later in 2017 with Volta 'V100' (sometime mid-to-maybe-late summer).
Cheers
 
I'm still trying to figure out how one can get to 20 DL TOPs; going up to 30 is too advanced a lesson for the layman here :p
 
I'm still trying to figure out how one can get to 20 DL TOPs; going up to 30 is too advanced a lesson for the layman here :p
Hehe, I must say his claims for Drive PX2 at CES last year were rather misleading (the 20 DL TOPs back then came from a mix of the 2x 'Parker' SoCs and the dual GPUs, but that was not made clear until the full details came out, so it described the full PX2 solution on offer rather than the smaller modules). So I am not putting too much weight on the latest claims for Xavier's performance, because they seem too radical even with the doubled cores and the new custom ARM. But the two supercomputers with their final spec released by Nvidia/IBM are a different matter, as these are contractual obligations and have serious ramifications in many ways, not just for those two projects but also for others, and for both companies' reputations in the science-analytics-AI markets.
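To make the aggregation point concrete, here is a minimal sketch of how a platform-level headline figure sums over components; every per-component number below is hypothetical, since Nvidia never published the actual split:

```python
# How a platform-level "20 DL TOPs" headline can aggregate several parts.
# ALL per-component figures are HYPOTHETICAL; Nvidia published no breakdown.
components = {
    "Parker SoC #1": 2.0,      # DL TOPs (assumed)
    "Parker SoC #2": 2.0,
    "dGPU #1":       8.0,
    "dGPU #2":       8.0,
}
print(f"Platform total: {sum(components.values()):.0f} DL TOPs")   # 20
```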
Cheers
 
Volta was always meant to be 2017. [...]
Yet on their roadmap they have it in 2018, not 2017.
I'm aware of the supercomputers supposedly being ready towards the end of 2017, but that doesn't necessarily make Volta a "2017 product", really.
 
The 2018 date came about because of rumours that it would be 10nm, and because others looked at the more general product roadmap slide and took it as explicitly saying it must launch in 2018.
the latest rumours are a custom TSMC 12nm process for Volta, with a lot of wafers already allocated

Hehe, I must say his claims for Drive PX2 at CES last year were rather misleading [...]
most of Xavier's TOPS come from the CVA (Computer Vision Accelerator), not Volta
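For the earlier "how does one get to 20 (or 30) DL TOPs" question, these headline numbers are normally just peak MAC throughput, counting each multiply-accumulate as two ops. A minimal sketch, where both figures are assumptions rather than disclosed CVA details:

```python
# Peak TOPS of a fixed-function MAC array: each MAC = 2 ops (multiply + add).
# Both figures below are ASSUMPTIONS; Nvidia has not disclosed the CVA layout.
num_macs = 10_000           # parallel 8-bit MAC units (assumed)
clock_hz = 1.5e9            # accelerator clock (assumed)

peak_tops = 2 * num_macs * clock_hz / 1e12
print(f"Peak: {peak_tops:.0f} TOPS")    # 30
```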
 
Yet on their roadmap they have it in 2018, not 2017.
I'm aware of the supercomputers supposedly being ready towards the end of 2017, but that doesn't necessarily make Volta a "2017 product", really.
It is a general product roadmap-cycle, not an explicit launch timeline - this is what catches people out with AMD launches too. Read the actual HPC presentations that have specific dates regarding Volta, including those from IBM.
As I mentioned, every one of them has 2017.
Cheers
 
All nV generations have launched on roughly one-and-a-half-year cycles, which, if it holds as always, would put this at the end of 2017 or early (Q1) 2018. So to expect anything else would imply a delay or that something unexpected happened.
 
the latest rumours are a custom TSMC 12nm process for Volta, with a lot of wafers already allocated
most of Xavier's TOPS come from the CVA (Computer Vision Accelerator), not Volta
Yeah, that could be possible, as it is not much of a node-shrink risk - wasn't it originally meant to be part of TSMC's 16nm group of options?
As for Xavier: after Drive PX2 I am going to wait and see this time how it all pans out, in terms of what complete architecture-solution is required for the 30 DL TOPS.
Thanks
 
Yet on their roadmap they have it in 2018, not 2017.
I'm aware of the supercomputers supposedly being ready towards the end of 2017, but that doesn't necessarily make Volta a "2017 product", really.
Just to add: the supercomputers will probably not go live until early 2018. The amount of work on these projects is massive; the project workflow update I saw quite a while ago covers 12 months of activity involving all parties.
Just 3 supercomputers (3 project-linked labs are slated for 2017, though as I mention not necessarily going live until early 2018) require, I think, just over 40,000 'V100' dGPUs, along with the diverse range of advanced tech that goes beyond Volta and, critically, the training/code transition, some of which will require large-scale nodes.
But to be clear, IMO I see November-December as the point when GV100 is in ramped-up manufacturing to meet all core client demands (large-scale client projects directly involving Nvidia with IBM/Cray/etc.), though it will be supplied from mid-to-late summer in a similar way to the NVLink P100.
But IMO this will be a 2017 product the same way P100 was a Q2-Q3 2016 product, even though you could not purchase an individual PCIe P100 until this year (still not sure you can even yet).
Some sites were using the fact that you cannot buy an individual PCIe P100 as evidence that Nvidia is having issues manufacturing the P100, or that it should not be classified as being in manufacturing status, but as I mentioned, I saw no evidence of that even back in Q3 if you were looking to buy a full node.
What is less clear is whether Nvidia will stay with the normal linked schedule for 1-2 consumer GPUs or hold them back until early next year from a business/sales product-strategy perspective, but even if they maintain something of the Pascal cycle's trend, it would not be until mid-to-late Q4 at the earliest.
I doubt Nvidia has decided themselves just yet, but if they did release, say, a 1080 replacement, they could hold back the lower GPUs for 4 months like they did with Maxwell.
The 980 launched 19th September 2014 while the 960 launched 22nd January 2015; with Pascal, the gap between the 1080 and 1060 was reduced to around 7 weeks.
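Working those gaps out explicitly (the 1080 and 1060 dates below are their widely reported May and July 2016 launch dates):

```python
from datetime import date

# Gap between flagship and mid-range launches, Maxwell vs Pascal.
gap_maxwell = date(2015, 1, 22) - date(2014, 9, 19)   # GTX 960 vs GTX 980
gap_pascal  = date(2016, 7, 19) - date(2016, 5, 27)   # GTX 1060 vs GTX 1080

print(f"Maxwell: {gap_maxwell.days} days (~{gap_maxwell.days / 7:.1f} weeks)")  # ~17.9
print(f"Pascal:  {gap_pascal.days} days (~{gap_pascal.days / 7:.1f} weeks)")    # ~7.6
```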
Against this decision, though, would be a Volta 'V1080' competing with the 1080 Ti, but then it would be price/margin competitive (both in manufacturing and retail) against Vega.

Cheers
 
Yeah, that could be possible, as it is not much of a node-shrink risk - wasn't it originally meant to be part of TSMC's 16nm group of options?
As for Xavier: after Drive PX2 I am going to wait and see this time how it all pans out, in terms of what complete architecture-solution is required for the 30 DL TOPS.
Thanks

If you want to compare funky TOPs numbers: Mobileye claims 15 TOPs from a 5W SoC for its 2nd-generation PMAs (EyeQ5).
 
Yeah, that could be possible, as it is not much of a node-shrink risk - wasn't it originally meant to be part of TSMC's 16nm group of options?
As for Xavier: after Drive PX2 I am going to wait and see this time how it all pans out, in terms of what complete architecture-solution is required for the 30 DL TOPS.
Thanks
From what I heard, 12nm is an improved 16nm for high-performance GPUs (NVDA didn't want to use 10nm as it's purely a SoC node for Apple/QC, like 20nm was, and 7nm is too far away).
Xavier's CVA may be the commercial version of the MIT Eyeriss project:
http://people.csail.mit.edu/emer/slides/2016.02.isscc.eyeriss.slides.pdf
 
If you want to compare funky TOPs numbers: Mobileye claims 15 TOPs from a 5W SoC for its 2nd-generation PMAs (EyeQ5).

I expected more from Mobileye. The EyeQ5 is on a 7nm process. Put Xavier on that too and you'd get 30 TOPs in ~15W, as it's two node jumps; so in theory roughly 7.5W for an Nvidia chip vs 5W for Mobileye at 15 TOPs, if they were on the same node. Of course the EyeQ5 will be more area-efficient, but Xavier will have a lot of other stuff and will be more general-purpose. A specialised chip should be able to do more, like Google's deep learning chip, which had ~10x the efficiency of GPUs.
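Spelling out that scaling argument as a back-of-envelope sketch; the ~30% power saving per node jump is a rough rule of thumb I am assuming, not a foundry figure:

```python
# Back-of-envelope: iso-performance power across process node jumps.
# ASSUMPTION: each full node jump cuts power ~30% at the same performance.
xavier_tops, xavier_watts = 30, 30       # quoted 16nm-class figures
per_jump = 0.7                           # rough rule of thumb (assumed)
node_jumps = 2                           # e.g. 16nm -> 10nm -> 7nm

w_30tops = xavier_watts * per_jump ** node_jumps
print(f"30 TOPs at ~{w_30tops:.0f} W on 7nm")        # ~15 W
print(f"15 TOPs at ~{w_30tops / 2:.1f} W on 7nm")    # ~7.4 W vs EyeQ5: 15 TOPs @ 5 W
```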

From what I heard, 12nm is an improved 16nm for high-performance GPUs (NVDA didn't want to use 10nm as it's purely a SoC node for Apple/QC, like 20nm was, and 7nm is too far away).
Xavier's CVA may be the commercial version of the MIT Eyeriss project:
http://people.csail.mit.edu/emer/slides/2016.02.isscc.eyeriss.slides.pdf

Where are the 12nm rumours coming from? I actually believe this will be another mobile-optimised node, like 28nm HPM. GPUs have only ever launched on the major high-performance node, so I think it'll be 16FF+ again.
 
I expected more from Mobileye. The EyeQ5 is on a 7nm process. Put Xavier on that too and you'd get 30 TOPs in ~15W, as it's two node jumps; so in theory roughly 7.5W for an Nvidia chip vs 5W for Mobileye at 15 TOPs, if they were on the same node. Of course the EyeQ5 will be more area-efficient, but Xavier will have a lot of other stuff and will be more general-purpose. A specialised chip should be able to do more, like Google's deep learning chip, which had ~10x the efficiency of GPUs.

Not that it really matters, but I wouldn't be so fast to count 10FF as a node in that reasoning, especially since by all signs both GPU IHVs will skip 10FF for a very good reason. Currently the EyeQ4, with 2 PMAs at 2.5 TOPs on 28nm FD-SOI at 3W, will launch in cars by the end of this year, while Xavier will only be sampling in that timeframe. The EyeQ5 is slated for 2020, so it's more plus/minus speculation about when each solution will end up in cars than about what each process can or cannot deliver in theory.

I posted it only because NV obviously increased from 20 TOPs@20W to 30 TOPs@30W, which most likely comes mostly from the CVA additions, since power doesn't scale linearly with frequency increases. IHVs don't pump up specifications without a reason. I didn't know it myself, but BMW has struck a deal with both Mobileye and Intel for fully autonomous driving, for which Intel sounds quite determined lately to break ground in the automotive market as well: http://s2.q4cdn.com/670976801/files/doc_news/2017-BMW-Intel-Mobileye-release_CES.pdf
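For reference, the TOPs-per-watt of the figures quoted in this exchange, plus a sketch of why a pure clock bump cannot take 20 TOPs@20W to 30 TOPs@30W: dynamic power goes roughly as C·f·V², and a higher clock usually needs a higher voltage, so power grows faster than performance (the +15% voltage below is a hypothetical illustration):

```python
# TOPs-per-watt for the figures quoted in this exchange.
for name, tops, watts in [("EyeQ4 (28nm FD-SOI)", 2.5,  3),
                          ("Drive PX2",           20.0, 20),
                          ("Xavier (claimed)",    30.0, 30)]:
    print(f"{name}: {tops / watts:.2f} TOPs/W")

# Dynamic power ~ C * f * V^2; raising f usually requires raising V.
f_scale, v_scale = 1.5, 1.15     # hypothetical: +50% clock needs ~+15% voltage
print(f"+50% clock -> ~{f_scale * v_scale**2:.1f}x power")   # ~2.0x, not 1.5x
```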
 
From what I heard, 12nm is an improved 16nm for high-performance GPUs (NVDA didn't want to use 10nm as it's purely a SoC node for Apple/QC, like 20nm was, and 7nm is too far away).
Xavier's CVA may be the commercial version of the MIT Eyeriss project:
http://people.csail.mit.edu/emer/slides/2016.02.isscc.eyeriss.slides.pdf
Yeah, that is what I heard as well, although it seemed the split decision between 12nm and 16nm came from TSMC.
Thanks for the link, really interesting.
Are you sure that has anything to do with Xavier, though?
It looks to be a very specific CNN and chip design at MIT, with partial Nvidia involvement in some way, as they do a lot with MIT.
Sort of reminds me of when Krashinsky was researching temporal SIMT while at MIT and then went on to work for Nvidia (some of his work was presented with collaboration from Nvidia), or Jan Lucas's temporal SIMD, which he built on Nvidia hardware to prove his model, although this MIT tech is seriously much further along than either of those.
Thanks.
Edit:
Reading more about Eyeriss: it is funded to some extent by DARPA, which makes sense.
 
Volta was always meant to be 2017. [...]

So much blah, blah about the need for another 10x in compute power.
Nobody talks about efficient algorithms; these can often solve problems much faster on slower hardware than straightforward algorithms can on fast hardware.
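A toy illustration of the point, not tied to any real HPC code: an O(n) algorithm beats an O(n²) one by far more than the 10x any single hardware generation brings:

```python
import random, time

def has_pair_sum_naive(xs, target):      # O(n^2): check every pair
    return any(xs[i] + xs[j] == target
               for i in range(len(xs)) for j in range(i + 1, len(xs)))

def has_pair_sum_fast(xs, target):       # O(n): single pass with a hash set
    seen = set()
    for x in xs:
        if target - x in seen:
            return True
        seen.add(x)
    return False

xs = random.sample(range(10**8), 5_000)  # no pair sums to -1, so full scans
t0 = time.perf_counter(); has_pair_sum_naive(xs, -1)
t1 = time.perf_counter(); has_pair_sum_fast(xs, -1)
t2 = time.perf_counter()
print(f"naive: {t1 - t0:.2f}s, fast: {t2 - t1:.4f}s")  # orders of magnitude apart
```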
 
So much blah, blah about the need for another 10x in compute power.
Nobody talks about efficient algorithms; these can often solve problems much faster on slower hardware than straightforward algorithms can on fast hardware.
It's a little bit ridiculous to assume that those who rent (or are given) precious supercomputer time, and often have to wait in line for quite a while to get it, would waste that resource by throwing code at it that only runs at a fraction of its potential.

My experience with people who use this kind of thing is that they spend a lot of time optimizing.

But even if the code wastes 80% of the cycles, the new computer would still run 2x faster than the previous one theoretically could (10x the peak at 20% utilisation is still 2x).
 