Nvidia Volta Speculation Thread

Nvidia GPU comparison - Kepler, Maxwell, Pascal, Volta


[Image: Nvidia Tesla comparison table, Kepler through Volta]


https://www.nextplatform.com/2018/02/28/engine-hpc-machine-learning/
 
Nvidia Announces New Volta AI Performance Milestones
Nvidia is promoting their own high-end performance in major AI and machine learning benchmarks, as apparently some kind of floodgate has popped open on companies talking about performance metrics for their own hardware solutions.
  • A single V100 Tensor Core GPU achieves 1,075 images/second when training ResNet-50, a 4x performance increase compared with the previous generation Pascal GPU;
  • A single DGX-1 server powered by eight Tensor Core V100s achieves 7,850 images/second, almost 2x the 4,200 images/second from a year ago on the same system;
  • A single AWS P3 cloud instance powered by eight Tensor Core V100s can train ResNet-50 in less than three hours, 3x faster than a TPU instance.
....
Nvidia is also talking up the use of Volta as a potential replacement for ASICs that would otherwise provide superior functionality in a limited set of use-cases or scenarios. It’s not clear — and I genuinely mean that — how such claims should be interpreted.
....
The recent reports on Google’s cloud TPU being more efficient than Volta, for example, were derived from the ResNet-50 tests. The results Nvidia is referring to use the CIFAR-10 data set. The Dawnbench team records no results for TPUs in this test, and fast.ai’s blog post on the topic may explain why this is:

"Google’s TPU instances (now in beta) may also a good approach, as the results of this competition show, but be aware that the only way to use TPUs is if you accept lock-in to all of:

Google’s hardware (TPU)
Google’s software (Tensorflow)
Google’s cloud platform (GCP).
More problematically, there is no ability to code directly for the TPU, which severely limits algorithmic creativity (which as we have seen, is the most important part of performance). Given the limited neural network and algorithm support on TPU (e.g. no support for recurrent neural nets, which are vital for many applications, including Google’s own language translation systems), this limits both what problems you can solve, and how you can solve them."

As hardware and software continue to evolve, we’ll see how these restrictions and capabilities evolve along with them. It’s absolutely clear that Volta is a heavy-hitter in the AI/ML market as a whole, with excellent performance and the flexibility to handle many different kinds of tasks. How this will change as more custom hardware comes online and next-generation solutions debut is still unclear.
https://www.extremetech.com/extreme/268952-nvidia-announces-new-volta-ai-performance-milestones
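For context, the images/second figures quoted above are typically measured with a simple timed training loop. Here's a minimal sketch in PyTorch (not Nvidia's actual benchmark harness; the batch size, iteration counts and the use of FP16 to engage Volta's Tensor Cores are assumptions for illustration):

```python
# Minimal sketch of how ResNet-50 training throughput (images/sec) is usually
# measured. Not Nvidia's benchmark code; batch size, iteration counts and the
# use of FP16 (to engage Volta's Tensor Cores) are assumptions for illustration.
import time
import torch
import torchvision

device = torch.device("cuda")
model = torchvision.models.resnet50().to(device).half()   # FP16 path uses Tensor Cores on V100
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size = 256                                           # assumed per-GPU batch size
images = torch.randn(batch_size, 3, 224, 224, device=device, dtype=torch.half)
labels = torch.randint(0, 1000, (batch_size,), device=device)

def train_step():
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

for _ in range(10):                                        # warm-up iterations
    train_step()
torch.cuda.synchronize()

iters = 50
start = time.time()
for _ in range(iters):
    train_step()
torch.cuda.synchronize()                                   # finish GPU work before stopping the clock
print(f"{iters * batch_size / (time.time() - start):.0f} images/sec")
```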
 
That article doesn't seem that clear; the 8x number for TPU v3 seems to be for pods, which contain 4x more TPUs than the v2 pods. That, coupled with the increased power consumption of the v3...
 
How does it compare to the new Google TPU3?
As @entity279 mentioned, too much is unknown at this time, and I would welcome benchmarks, cost and efficiency info when they come out. At this point I'd venture to guess it might be less attractive due to Google's locked-in approach, which offers users no flexibility outside the following:
Google’s hardware (TPU)
Google’s software (Tensorflow)
Google’s cloud platform (GCP).
More problematically, there is no ability to code directly for the TPU, which severely limits algorithmic creativity (which as we have seen, is the most important part of performance). Given the limited neural network and algorithm support on TPU (e.g. no support for recurrent neural nets, which are vital for many applications, including Google’s own language translation systems), this limits both what problems you can solve, and how you can solve them.
 
As @entity279 mentioned, too much is unknown at this time, and I would welcome benchmarks, cost and efficiency info when they come out. At this point I'd venture to guess it might be less attractive due to Google's locked-in approach, which offers users no flexibility outside the following:

What then is the difference with Nvidia's lock-in approach? TensorRT, ...
 
What then is the difference with Nvidia's lock-in approach? TensorRT, ...
There is none.
Edit: To clarify, I don't think users are compelled to use TensorRT if they don't want to. There is no additional cost, and it adds performance.
 
What then is the difference with Nvidia's lock-in approach? TensorRT, ...
Not sure I follow; Nvidia offers support for multiple frameworks, albeit via CUDA, so there is lock-in in that sense, but it does give scientists/devs more options than what is on Google's TPU cloud (although TensorFlow must be one of the most widely used frameworks now).
 
Are the frameworks accessible w/o using CUDA?

Edit: The answer is yes. In addition to providing support for other frameworks (MXNet, Caffe2, TensorFlow, PyTorch, Cognitive Toolkit, Theano, etc.), Nvidia offers its own software optimized for Maxwell, Pascal and Volta, such as CUDA and deep learning libraries like cuDNN, NCCL, and TensorRT.
https://news.developer.nvidia.com/nvidias-2017-open-source-deep-learning-frameworks-contributions/
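To make the point concrete: the mainstream frameworks call into CUDA/cuDNN on their own, so the user writes framework-level code rather than GPU code. A quick PyTorch illustration (any of the frameworks listed above would show the same pattern):

```python
# Illustration: frameworks use Nvidia's CUDA/cuDNN libraries under the hood;
# the user only chooses where tensors and layers live.
import torch

print("CUDA available:", torch.cuda.is_available())
print("cuDNN enabled: ", torch.backends.cudnn.enabled)
print("cuDNN version: ", torch.backends.cudnn.version())

x = torch.randn(8, 3, 224, 224)
conv = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
if torch.cuda.is_available():
    x, conv = x.cuda(), conv.cuda()   # this convolution is dispatched to cuDNN
print(conv(x).shape)                  # same code path on CPU, just without cuDNN
```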

Yes, currently it's still open. Let's hope it stays this way, and that no GPU/DLA more specialized for inferencing arrives that works only with TensorRT.
 
Tearing Apart Google’s TPU 3.0 AI Coprocessor
May 10, 2018

We also believe that Google has funded a huge software engineering and optimization effort to get to its current beta Cloud TPU deployment. That gives Google incentive to retain as much of TPUv2’s system interfaces and behavior – hardware abstraction layer and application programming interfaces (APIs) – as possible with the TPUv3. Google offered no information on when TPUv3 will be offered as a service, in Cloud TPUs or in multi-rack pod configurations. It did show photos of a TPUv3-based Cloud TPU board and of pods. The company made the following assertions:


    • The TPUv3 chip runs so hot that for the first time Google has introduced liquid cooling in its datacenters
    • Each TPUv3 pod will be eight times more powerful than a TPUv2 pod
    • Each TPUv3 pod will perform at “well over a hundred petaflops”
However, Google also restated that its TPUv2 pod clocks in at 11.5 petaflops. An 8X improvement should land a TPUv3 pod at a baseline of 92 petaflops, but 100 petaflops is almost 9X. We can’t believe Google’s marketing folks didn’t round up, so something is not quite right with the math. This might be a good place to insert a joke about floating point bugs, but we’ll move on.

[Photos: TPUv2 pod (top) and TPUv3 pod (bottom)]
https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/
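A quick back-of-the-envelope check of the pod figures quoted above, plain arithmetic only:

```python
# Sanity check of the TPU pod numbers quoted above.
tpuv2_pod_pflops = 11.5

print("8x a TPUv2 pod:", 8 * tpuv2_pod_pflops, "petaflops")      # 92.0, short of 100
print("100 PF implies:", round(100 / tpuv2_pod_pflops, 2), "x")  # ~8.7x, i.e. "almost 9X"

# If, as noted earlier in the thread, a TPUv3 pod holds 4x as many chips as a
# TPUv2 pod, an 8x pod-level gain would imply only about a 2x gain per chip.
print("Per-chip gain assuming 4x the chips:", 8 / 4, "x")
```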
 
Nvidia GTX 1180 or GTX 2080 - what to expect from team GeForce's next-gen graphics cards
May 18, 2018

If they stick with tradition, and use the existing Nvidia Volta GPU technology, then the GTX 1180 will arrive sporting a GV104 GPU, but just what sort of configuration that chip might have is still up for debate.

The Streaming Multiprocessor (SM) of the current Volta chip is chock full of silicon designed for machine learning and inference, and how much of that will make the transition over to the gaming GPU we don’t yet know.

With the Pascal generation, Nvidia stripped out the double precision cores for the GP104 silicon, and they may do the same with Volta. Historically they would then push the SMs together - with the GP100, for example, there were 10 SMs in a general processing cluster (GPC) and then just five in a GP104 GPC, despite having the same number of CUDA cores in each cluster. Each SM then has double the cores sharing the same instruction cache and shared memory.

I’m not sure that will work out the same for a gaming Volta SM, as there is still some silicon inside the current Volta design which will come in useful in games that take advantage of the new DirectX Raytracing from Microsoft and the Volta-specific RTX tech from Nvidia themselves. That’s not likely to be stripped out, so the final gaming SM structure might be very similar to the current GV100 design.

That’s not just limited to the new Tensor cores, but that new silicon definitely helps in cleaning up a raytraced image. And that means, despite what we initially expected, gaming Volta cards could still come with Tensor cores in the package. With WinML also looking to bring machine learning into the gaming space, we’re likely to see more pro-level silicon remaining in our gaming GPUs in the future.

But we think it’s probably quite likely Nvidia would stick with the same overall GPC structure and switch to four GPCs for a potential GV104 design. That would give the GTX 1180 a total of 3,584 CUDA cores and 224 texture units, a nice symmetry with the GTX 1080 Ti it would likely replace.
https://www.pcgamesn.com/nvidia-gtx-1180-release-date-specifications
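The core counts in that speculation follow directly from the SM/GPC arithmetic. A quick check (the GV104 configuration here is the article's guess, not a confirmed spec):

```python
# SM/GPC arithmetic behind the GV104 speculation above.
# GP100: 10 SMs per GPC, 64 FP32 cores per SM.
# GP104:  5 SMs per GPC, 128 FP32 cores per SM -> same cores per GPC.
print("GP100 cores per GPC:", 10 * 64)   # 640
print("GP104 cores per GPC:", 5 * 128)   # 640

# Hypothetical GV104 per the article: 4 GPCs with the GV100-style layout of
# 14 SMs per GPC, each SM with 64 FP32 cores and 4 texture units.
gpcs, sms_per_gpc, cores_per_sm, tmus_per_sm = 4, 14, 64, 4
print("GV104 CUDA cores:   ", gpcs * sms_per_gpc * cores_per_sm)   # 3584
print("GV104 texture units:", gpcs * sms_per_gpc * tmus_per_sm)    # 224
```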
 
Isn't this more or less repeating what Tom's said earlier without quoting the original source (aka Tom's)?
They add their own speculation regarding GPU components that may be included (Tensor cores), and mention a later update from Expreview that conflicts with Tom's release speculation.
 
NVIDIA GeForce "Volta" Graphics Cards to Feature GDDR6 Memory According to SK Hynix Deal
May 23rd 2018

NVIDIA's upcoming GeForce GTX graphics cards based on the "Volta" architecture could feature GDDR6 memory, according to a supply deal SK Hynix struck with NVIDIA, resulting in the Korean memory manufacturer's stock price surging by 6 percent. It's not known if GDDR6 will be deployed on all SKUs, or if, like GDDR5X, it will be exclusive to a handful of high-end SKUs. The latest version of the SK Hynix memory catalogue points to an 8 Gb (1 GB) GDDR6 memory chip supporting speeds of up to 14 Gbps at 1.35V, and up to 12 Gbps at 1.25V.
https://www.techpowerup.com/244477/...ature-gddr6-memory-according-to-sk-hynix-deal
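For a rough sense of what those chip speeds translate to, peak memory bandwidth is just the per-pin data rate times the bus width. The 256-bit bus below is an assumption for illustration; the SK Hynix listing only specifies per-pin speeds:

```python
# Rough GDDR6 bandwidth math. The 256-bit bus width is an assumed example;
# the SK Hynix catalogue entry only gives the per-pin data rates.
def peak_bandwidth_gb_s(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate (Gbps) * bus width (bits) / 8."""
    return data_rate_gbps_per_pin * bus_width_bits / 8

for rate in (12, 14):
    print(f"{rate} Gbps on a 256-bit bus: {peak_bandwidth_gb_s(rate, 256):.0f} GB/s")
# 12 Gbps -> 384 GB/s, 14 Gbps -> 448 GB/s
```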
 
Since there's nothing productive to talk about at the moment.

NVidia Volta GTX 1180. Not GeForce, Volta! :D

Also, it's only going to come with 1 or 2 GB of memory? ;) "...we could see one or two of these..."

If anyone takes anything I just said seriously, imma smack you.

Regards,
SB
 
The latest Titan V review comes from ComputerBase, with the latest drivers and an i9-7890X CPU; not the best CPU for gaming, but what the heck.

Avg performance over 25 titles @ 4K is 30% above the 1080 Ti FE, with notable oddities such as Prey and Kingdom Come (CryEngine games) exhibiting a 40% uplift over the 1080 Ti. Wolfenstein 2 exhibited almost no uplift whatsoever, though.

https://www.computerbase.de/2018-05...t/2/#diagramm-performancerating-fps-3840-2160
 