Nvidia Volta Speculation Thread

To lighten the atmosphere a bit: even if the concept is different, I find some of the similarities between both designs amusing, even more so when I remember the exascale system AMD presented some time ago (which was more related to MCM). Of course AMD was not directly proposing an MCM chip, but I maintain that MCM could allow this on paper.
There are only so many choices for an MCM. In this case there are 4 chips on a substrate and only so many ways to hook them up.
That said, AMD's EPYC slide is actually misleading: Nvidia's concept has a mesh or ring on-package, whereas Naples is described as being fully connected within a socket.
Nvidia's package signalling appears to be 4x as power-efficient as AMD's and with much higher bandwidth.

The AMD HPC concept differs in that it relies heavily on active interposers and includes CPU chiplets. It's not clear how heavily Nvidia relies on an interposer, whether silicon or organic. It does not appear to be active, at any rate.
The less aggressive integration and more defined power and process numbers may also point at Nvidia's scheme being nearer-term, as in the next generation or the one after.

AMD's timeline and architectural basis is unclear, and might be post-Navi, which itself comes after a Vega refresh.
 
GPU Base, Boost, Typical and Peak clocks, what’s the difference?

For the Pascal architecture, the boost clock is an understated frequency, because chances are you will never see such a clock in games. GPU Boost 3.0 introduced a new term called 'theoretical max clock' (you can find it on the slides). This clock is not part of the official specs; it exists because the actual clock speed will almost always be higher than the official boost clock for Pascal cards. It's also higher compared to cards based on the Maxwell architecture, which makes things slightly more confusing.


[image: GPU base/boost/typical/peak clocks slide]
 
I'm really not sure where to post this but since the direct competition to Intel Knights Mill is the Nvidia Volta I will try here.

I came across this on the HPC Wire site:

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “pre-exascale” award)

https://www.hpcwire.com/2017/06/22/knights-mill-gets-deep-learning-flops

This is the first I have heard about the Aurora contract possibly being canceled. Searching about the Intel Aurora contract via Google does not show any of the above "delayed, rewritten or potentially canceled".

In the linked article it is interesting that Intel has chosen to split their HPC part:

Customers will have a choice to make based on their precision requirements.

“Knights Mill uses different instruction sets to improve lower-precision performance at the expense of the double-precision performance that is important for many traditional HPC workloads,” Davis continues addressing the differentiation. “This means Knights Mill is targeted at deep learning workloads, while Knights Landing is more suitable for HPC workloads and other workloads that require higher precision.”

Here we see Intel differentiating its products for HPC versus AI, and the Nervana-based Lake Crest neural net processor also follows that strategy. Compare this with Nvidia’s Volta: despite being highly deep learning-optimized with new Tensor cores, the Tesla V100 is also a double-precision monster offering 7.5 FP64 teraflops.

Nvidia’s strategy is one GPU to rule them all, something VP of accelerated computing Ian Buck was clear about when we spoke this week.

“Our goal is to build one GPU for HPC, AI and graphics,” he shared. “That’s what we achieved in Volta. In the past we did different products for different segments, FP32-optimized products like the P40, double-precision with the P100. In Volta, we were able to combine all that, so we have one processor that’s leading performance for double-precision, single-precision and AI, all in one. For folks who are in general HPC they not only get leading HPC double-precision performance, but they also get the benefits of AI in the same processor.”
 
This is the first I have heard about the Aurora contract possibly being canceled. Searching about the Intel Aurora contract via Google does not show any of the above "delayed, rewritten or potentially canceled".

Yes, it's probably postponed; the first time I read about it was in May or something like that:

“The Aurora system contract is being reviewed for potential changes that would result in a subsequent system in a different timeframe from the original system. But since these are early negotiations we can’t be more specific.”
https://www.nextplatform.com/2017/06/15/american-hpc-vendors-get-government-boost-exascale-rd/
Aurora should've come with Knights Hill, but the problem is Knights Hill might be too weak in DL, whereas it seems Knights Mill can't do double precision very well. But the DoE probably wants a system that is good at both, so either it will become a Knights Hill / Lake Crest hybrid system or they will wait for the integration of Lake Crest into Knights Crest.
 
GPU Base, Boost, Typical and Peak clocks, what’s the difference?

For the Pascal architecture, the boost clock is an understated frequency, because chances are you will never see such a clock in games. GPU Boost 3.0 introduced a new term called 'theoretical max clock' (you can find it on the slides). This clock is not part of the official specs; it exists because the actual clock speed will almost always be higher than the official boost clock for Pascal cards. It's also higher compared to cards based on the Maxwell architecture, which makes things slightly more confusing.


[image: GPU base/boost/typical/peak clocks slide]
But I got this one card here that has low clockspeeds in one game. :D

I had a GTX670 and now a 970 and they both ran/run well above their rated boost clocks in every case I tested. You know what this means? Absolutely nothing! It only matters if you look at a large enough sample size, and if we do that we find that NV cards almost always match or exceed their official boost frequencies in actual games - not to mention their base clocks. It has been this way since NVIDIA implemented Boost clocks as far as I can tell.

I even think it would be fair for NV to market the boost clock as the cards' typical minimum clockspeed with a note saying "it's possible for the card to go under this frequency in certain rare conditions". Kinda like AMD does, except it wouldn't be bullshit.
 
NVIDIA Volta V100 GPU Based Tesla V100 PCI Express Specifications
It has a total of 84 Volta streaming multiprocessor units, 42 TPCs (each including two SMs).
The 84 SMs come with 64 CUDA cores per SM so we are looking at a total of 5376 CUDA cores on the complete die. All of the 5376 CUDA Cores can be used for FP32 and INT32 programming instructions while there are also a total of 2688 FP64 (Double Precision) cores. Aside from these, we are looking at 672 Tensor processors, 336 Texture Units. The core clocks are maintained at a boost clock of around 1370 MHz which delivers 28 TFLOPs of FP16, 14 TFLOPs of FP32 and 7.0 TFs of FP64 compute performance.

The chip also delivers 112 DLOPs (Deep Learning Teraflops) which is the fastest any chip has delivered to date. This is achieved by the separate tensor cores that are dedicated to deep learning tasks. So while the clocks and compute performance is slightly lower than the SXM2 variant, it does feature a TDP of just 250W. Compared to 300W on the SXM2 card, this is an incredible feat that delivers increased efficiency.
....
The other differences is that the Tesla V100 PCI Express doesn’t get NVLINK support like the SXM2 based variant. It comes with a passive dual slot cooler in the gold and black color scheme that was seen earlier. Compared to the competition, NVIDIA is offering much higher compute performance at lower wattage and much higher efficiency.
http://wccftech.com/nvidia-volta-tesla-v100-ai-research/
 
NVIDIA Volta V100 GPU Based Tesla V100 PCI Express Specifications

http://wccftech.com/nvidia-volta-tesla-v100-ai-research/
Erm, everything was already known in June when NVIDIA announced the PCIe version, and for whatever reason they really screwed up with that text there - they talk about full GV100 specs and Tesla V100 PCIe clocks and FLOPS. Tesla V100, both SXM2- and PCIe-versions, use GV100 with 80 SMs enabled, not all 84. At least their table got it right, though.
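The corrected figures are easy to sanity-check. A minimal sketch, taking 80 enabled SMs, a ~1370 MHz boost clock, and the per-SM core counts from the article above (64 FP32, 32 FP64, 8 tensor cores per SM), and counting each FMA as 2 FLOPs and each tensor core as one 4x4x4 matrix FMA (128 FLOPs) per clock:

```python
# Back-of-the-envelope peak throughput for Tesla V100 PCIe,
# assuming 80 enabled SMs and a ~1370 MHz boost clock.

SMS = 80
BOOST_GHZ = 1.370

fp32_cores = SMS * 64      # 5120 FP32 cores
fp64_cores = SMS * 32      # 2560 FP64 cores
tensor_cores = SMS * 8     # 640 tensor cores

# One FMA = 2 FLOPs; one tensor core does a 4x4x4 matrix FMA
# per clock = 64 FMAs = 128 FLOPs.
fp32_tflops = fp32_cores * 2 * BOOST_GHZ / 1000
fp64_tflops = fp64_cores * 2 * BOOST_GHZ / 1000
tensor_tflops = tensor_cores * 128 * BOOST_GHZ / 1000

print(f"FP32:   {fp32_tflops:.1f} TFLOPS")    # ~14.0
print(f"FP64:   {fp64_tflops:.1f} TFLOPS")    # ~7.0
print(f"Tensor: {tensor_tflops:.1f} TFLOPS")  # ~112.2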
 
From yesterday's conference call transcript:
Next, data center, revenue of $416 million was up more than 2.5 times from a year ago. This growth, shared across AI, deep learning, high-performance computing, and GRID, is particularly notable, given that we announced and shipped production units of our Volta-based V100 accelerator as we transition from Pascal generation GPUs. Looking ahead, we see inferencing and video transcoding as emerging applications that are well suited for our GPUs. V100 was among the most important launches at this quarter's GPU Technology Conference [GTC]. It provides 10 times the deep learning power of its year-old predecessor, widely outpacing Moore's Law.

We also entered into a wide range of important partnerships based on AI. Among them, Baidu has aligned with us on Volta. It's bringing this new architecture to its cloud and optimizing the PaddlePaddle open source deep learning framework for Volta. Volkswagen is collaborating with us to bring the power of AI across their organization. And we announced a new partner program with Taiwan's top ODMs, including Foxconn, Inventec, Quanta, Wistron, to provide them with the early access to the HGX reference architecture, GPU computing technologies, and design guidelines.

Demand remains strong for our DGX AI supercomputer, as organizations take on multiple systems to build out AI-enabled applications. Facebook disclosed a system incorporating 128 DGXs. We have shipped systems to more than 300 unique customers, with 1,000-plus in the pipeline.

finally JHH about Volta for gaming:
Volta for gaming, we haven't announced anything. And all I can say is that our pipeline is filled with some exciting new toys for the gamers, and we have some really exciting new technology to offer them in the pipeline. But for the holiday season for the foreseeable future, I think Pascal is just unbeatable. It's just the best thing out there. And everybody who's looking forward to playing Call of Duty or Destiny 2, if they don't already have one, should run out and get themselves a Pascal.

I know for a fact that GeForce Volta samples have already been working at Nvidia for a few weeks. So it looks like they want to maximize Pascal profit (because they feel no threat from Vega).
 
Sounds like Volta is not on the cheap side to produce yet:
„And so the price of Volta is driven by the fact that, of course, the manufacturing cost is quite extraordinary. These are expensive things to go and design. The manufacturing cost itself, you guys can estimate it, is probably in the several hundred dollars to close to $1,000. “
(earnings call 02/18, Q&A)

And sadly, CEO-math at work again. :(
„The answer to your first question is yes. Volta was a giant leap. It's got 120 teraflops. Another way to think about that is eight of them in one node is essentially one petaflops, which puts it among the top 20 fastest supercomputers on the planet. And the entire world's top 500 supercomputers are only 700 petaflops. “
Not only does he confuse FP16/FP32 and FP64 like AMD did, he also throws tensor OPs, which are anything but general-purpose FLOPS, into the mix.
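To make the mixed-precision sleight of hand explicit, here is the same arithmetic separated by precision, using the 120 TFLOPS tensor figure from the call and the 7.5 FP64 TFLOPS quoted earlier in the thread (the Top 500 ranks systems on FP64 Linpack):

```python
# Split the "8 GPUs = one petaflops" claim by precision.
# 120 TFLOPS is tensor (FP16 multiply / FP32 accumulate) throughput;
# Top500 rankings are based on FP64 Linpack.

tensor_tflops_per_gpu = 120.0   # marketing figure from the call
fp64_tflops_per_gpu = 7.5       # V100 double-precision
gpus_per_node = 8

node_tensor_pflops = gpus_per_node * tensor_tflops_per_gpu / 1000
node_fp64_pflops = gpus_per_node * fp64_tflops_per_gpu / 1000

print(node_tensor_pflops)  # 0.96 -> the "one petaflops" headline
print(node_fp64_pflops)    # 0.06 -> the apples-to-apples Top500 number
```

So in the metric the Top 500 actually uses, the node delivers 0.06 PF, not "essentially one petaflops".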
 
Although I would love to have a Volta card, of course NVIDIA feels no pressure. So I guess we will have to wait till Q2/18. Sadly.

But at the same time I think the new architecture will be mind-blowing. And I like having a matured product.
 
Since Tensor cores won't be in a gaming card, what are the differences between Pascal and Volta except core count and clock speed?
 
From his "unbeatable Pascal" remark (almost an understatement, I have to admit) I also read that Volta doesn't convincingly beat it, even in a GeForce incarnation.

(I do realize, of course, that it is in JHH's interest to suggest that, as it may convince people not to wait and buy now, even more so with current prices)
 
Since Tensor cores won't be in a gaming card, what are the differences between Pascal and Volta except core count and clock speed?
NVIDIA is claiming a 50% efficiency advantage in FP32 performance for Volta over Pascal. Gaming Volta will directly benefit from this.

And why shouldn't there be more changes for gaming cards in Volta?

Since Pascal, we have to differentiate between NVIDIA's "HPC" and "gaming" chips. There is a general architecture ("Pascal" or "Volta") that is used in the HPC chips ("GP100" or "GV100") but also in the gaming chips ("GP102" etc. or "GV102" etc.), and this general architecture is modified for the specific use in each chip.
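One way to read that ~50% figure is FP32 throughput per watt at the same 250 W board power. A rough sketch, where the P100 PCIe figure (~9.3 FP32 TFLOPS at 250 W) is my own addition rather than a number from this thread:

```python
# Rough FP32 perf-per-watt comparison at equal 250 W TDP.
# Assumed specs: Tesla P100 PCIe ~9.3 FP32 TFLOPS,
#                Tesla V100 PCIe ~14.0 FP32 TFLOPS.

p100 = {"tflops_fp32": 9.3, "tdp_w": 250}
v100 = {"tflops_fp32": 14.0, "tdp_w": 250}

def gflops_per_watt(card):
    return card["tflops_fp32"] * 1000 / card["tdp_w"]

gain = gflops_per_watt(v100) / gflops_per_watt(p100) - 1
print(f"{gflops_per_watt(p100):.1f} -> {gflops_per_watt(v100):.1f} GFLOPS/W "
      f"(+{gain:.0%})")  # ~37 -> ~56 GFLOPS/W, roughly +50%
```

At identical TDP the perf/W ratio is just the TFLOPS ratio, and 14/9.3 lands almost exactly on the claimed 50%.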
 
Perhaps they're also waiting for GDDR6 availability, which is supposed to ship in volume early next year. Even a GV104 chip may be bandwidth-starved with GDDR5X if it only has a 256-bit bus like GP104.
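For scale, peak bandwidth on a 256-bit bus at various per-pin data rates. The 256-bit GV104 bus width is the poster's assumption; GDDR5X shipped at 10-11 Gbps per pin, while early GDDR6 announcements targeted 14-16 Gbps:

```python
# Peak memory bandwidth = bus width (bits) * per-pin data rate (Gbps) / 8.

def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_bits * gbps_per_pin / 8

for label, rate in [("GDDR5X @ 10 Gbps", 10), ("GDDR5X @ 11 Gbps", 11),
                    ("GDDR6  @ 14 Gbps", 14), ("GDDR6  @ 16 Gbps", 16)]:
    print(f"256-bit, {label}: {bandwidth_gbs(256, rate):.0f} GB/s")
```

That's 320-352 GB/s with GDDR5X versus 448-512 GB/s with GDDR6 on the same bus, so the memory choice alone would be worth roughly 40-50% more bandwidth.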
 