Nvidia Volta Speculation Thread

Yeah, P100 results depend on which model is tested, such as the 16GB DGX form or the 12GB PCIe card; in tests the 12GB card can be a fair amount behind the 16GB DGX-1, even with a single GPU.
Amber is a good example showing those performance deltas.
 
Speaking of benchmarking V100: someone has run it on LuxMark.



Interesting. It's doing very well in almost all combinations, but it seems to have some multi-GPU scaling issues in the heavy scene? There are not many results, and most are unfortunately not really comparable, but my impression is that the heavier the workload gets on multiple GPUs, the smaller the advantage over other devices becomes.
 
Does anyone know at which power consumption that score was achieved?

I mean, I find it hard to believe that a boost of almost 2x comes without increased power consumption, compared to a P100.
 
What do you mean? The top 2 results look fishy - "gfx900" ??? - and the array of 4x V100 was only beaten by a setup with 7 Pascal + 2 Maxwell cards.

That's the question. The same user has LuxBall HDR (simple scene) results at place 9 and 10 with 4x Vega, less than half of Anaconda's 4xV100. On Hotel (complex scene) it's 1 and 2 for 4x Vega, and that's 1.5x more than Anaconda's results. So I guess something is wrong with the V100 results, the question is: what?

Ahh, the two very good results seem to be using ROCm 1.6.4, the others are under Windows.
 
NVIDIA DGX Station Upgraded to Tesla V100
October 28, 2017
At launch, it was powered by four NVIDIA Tesla P100 GPUs but it is now powered by four NVIDIA Tesla V100 GPUs. Here are the key specs of the updated machine.
[Image: NVIDIA DGX Station hardware specs]


Sporting the new NVIDIA Tesla V100 with Tensor Core technology, the system costs $69,000, so you can own a dedicated system for about the same price as one year of cloud instance pricing. For example, an AWS p3.8xlarge instance with all up-front pricing is $68,301 for one year.
https://www.servethehome.com/nvidia-dgx-station-upgraded-tesla-v100/
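To put the two quoted prices side by side, here is a rough break-even sketch. It uses only the $69,000 and $68,301 figures from the excerpt above; it assumes nothing else costs money (no power, hosting, support or resale value), which is obviously a simplification, and the variable names are just for illustration.

```python
# Prices as quoted in the excerpt above (USD).
DGX_STATION_PRICE = 69_000      # one-off purchase price
AWS_P3_8XLARGE_1YR = 68_301     # one year, all up-front pricing

# Assumption: running costs (power, hosting, support) are ignored entirely.
breakeven_years = DGX_STATION_PRICE / AWS_P3_8XLARGE_1YR
breakeven_months = breakeven_years * 12

print(f"break-even after about {breakeven_months:.1f} months of cloud use")
```

So on list prices alone the box pays for itself after roughly a year of equivalent cloud usage; anything beyond that depends on costs the excerpt does not give.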
 
It's an 815 mm² GPU, 33% larger than anything I've heard of being mass-produced.
They kindly asked TSMC to push the reticle limit forward.

That's nothing groundbreaking. The node is very mature, so the yields must be in the >1.7% range.
 
It is ahead of the two-year cadence on which Nvidia updates the architecture.
They need to do something before various ML/DL ASICs flood the market. V100 kinda works. Not the most cost effective solution, but it'll do.
$NVDA is one hell of a bubble fueled by meme learning and they will do anything to not let it pop.
 
If you include other critical factors such as large-scale HPC/AI power requirements and power budgets, then it is a pretty important product for many out there; or look at those who do DP scientific modelling, where no other product touches it.
The closest general AI product to compare would be Google's TPU2, and that does not necessarily match the performance/efficiency of the V100, which is a mixed-precision product with AI capability; this has been covered in the past by a few others here.
There is also the half-height V100, which is more stripped down.
Also, one pretty important consideration will be V102; just look at how the GP102 outperformed the GP100 in more common situations.
The one product that could cause a ripple for Nvidia is from Intel (beyond Xeon Phi), but that depends on how long it takes to really get everything aligned (HW arch more broadly, and software) with their Nervana tech.
 
The V100 results show the expected perfect scaling, with the 4x number being exactly 4x the performance of the 1x.

The V100 single GPU score for Hotel is 12770, which is “only” about 20% or so faster than a GP100 after correcting for cores and BW. A very good result, but not unexpected or terribly surprising if you look at the general architectural improvements.

The best single GPU score for gfx900 is 6526. It uses OpenCL 2.0 AMD-APP (2482.4). That number more or less matches a 1080Ti, which makes sense given that they have roughly the same amount of TFLOPS and BW.

The 4x score for gfx900 is 77530, or a whopping 19382 per unit, using OpenCL 2.0 AMD-APP (2508.0).

So one way or the other, AMD manages to triple the performance of its OpenCL drivers, way beyond what could reasonably be expected from the amount of TFLOPS and BW.

Or the alternative: those results are completely bogus. :)
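For anyone who wants to check the arithmetic, here is a quick sketch using only the Hotel scores quoted in this post (6526, 77530 and 12770). The helper names are made up for illustration, and "scaling" here simply means the multi-GPU score divided by N times the best single-GPU score.

```python
# LuxMark Hotel scores as quoted in the post above.
GFX900_SINGLE = 6526    # best single-GPU gfx900 result
GFX900_QUAD = 77530     # 4x gfx900 result
V100_SINGLE = 12770     # single V100 result


def per_gpu(total_score, gpu_count):
    """Average score contributed by each GPU in a multi-GPU run."""
    return total_score / gpu_count


def scaling_factor(multi_score, single_score, gpu_count):
    """1.0 means perfect linear scaling from the single-GPU score."""
    return multi_score / (single_score * gpu_count)


per_unit = per_gpu(GFX900_QUAD, 4)
eff = scaling_factor(GFX900_QUAD, GFX900_SINGLE, 4)
vs_v100 = per_unit / V100_SINGLE

print(f"gfx900 per GPU in the 4x run: {per_unit:.1f}")
print(f"4x result vs. 4 * best single result: {eff:.2f}x")
print(f"gfx900 per GPU vs. single V100: {vs_v100:.2f}x")
```

A factor well above 1.0 is not something multi-GPU scaling can produce on its own, which is why the 4x number needs either a roughly 3x driver improvement between the two runs or a different explanation.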
 