Nvidia Volta Speculation Thread

Yeah, P100 performance is influenced by there being multiple models, such as the 16GB DGX form and the 12GB PCIe card; in tests the 12GB card can be a fair amount behind a single GPU in the 16GB DGX-1.
Amber is a good example showing those performance deltas.
 
Speaking of benchmarking the V100 - someone has run it on LuxMark.



Interesting. It's doing very well in almost all combinations, but it seems to have some multi-GPU scaling issues in the heavy scene? There aren't many results, and most are unfortunately not really comparable, but it looks like the heavier the workload gets on multiple GPUs, the smaller its advantage over other devices becomes.
 
Does anyone know at which power consumption that score was achieved?

I mean, I find it hard to believe that a near-2x boost over a P100 comes without increased power consumption.
 
What do you mean? The top 2 results look fishy - "gfx900"??? - and the array of 4x V100 was only beaten by a setup with 7 Pascal + 2 Maxwell cards.

That's the question. The same user has LuxBall HDR (simple scene) results at places 9 and 10 with 4x Vega, less than half of Anaconda's 4x V100. On Hotel (complex scene), places 1 and 2 go to 4x Vega, at 1.5x Anaconda's results. So I guess something is wrong with the V100 results; the question is: what?

Ahh, the two very good results seem to be using ROCm 1.6.4; the others are under Windows.
 
NVIDIA DGX Station Upgraded to Tesla V100
October 28, 2017
At launch, it was powered by four NVIDIA Tesla P100 GPUs but it is now powered by four NVIDIA Tesla V100 GPUs. Here are the key specs of the updated machine.


Sporting four of the new NVIDIA Tesla V100s with Tensor Core technology, at $69,000 you can own a dedicated system for about the same price as one year of cloud instance pricing. For example, an AWS p3.8xlarge instance with all-upfront pricing is $68,301 for one year.
https://www.servethehome.com/nvidia-dgx-station-upgraded-tesla-v100/
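The buy-vs-rent claim in that article is easy to sanity-check; a quick sketch using only the two figures quoted above (the $68,301/yr all-upfront reservation price is the article's number, taken here as an assumption):

```python
# Break-even comparison between buying a DGX Station outright and
# renting an AWS p3.8xlarge, using the prices quoted in the article.
dgx_station_price = 69_000        # one-time purchase, USD
aws_p3_8xlarge_yearly = 68_301    # 1-year all-upfront reservation, USD

break_even_years = dgx_station_price / aws_p3_8xlarge_yearly
print(f"Break-even after {break_even_years:.2f} years of cloud rental")
```

So the purchase pays for itself in almost exactly one year of equivalent cloud spend, which is the article's point.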
 
It's an 815mm² GPU, 33% larger than anything I've heard of being mass produced.
They kindly asked TSMC to push the reticle limit forward.

That's nothing groundbreaking or anything. The node is very mature, so the yields must be in the >1.7% range.
 
It's ahead of the two-year cadence at which Nvidia updates the architecture.
They need to do something before various ML/DL ASICs flood the market. V100 kinda works. Not the most cost effective solution, but it'll do.
$NVDA is one hell of a bubble fueled by meme learning and they will do anything to not let it pop.
 
They need to do something before various ML/DL ASICs flood the market. V100 kinda works. Not the most cost effective solution, but it'll do.
$NVDA is one hell of a bubble fueled by meme learning and they will do anything to not let it pop.
If you include other critical factors, such as power requirements/power budget for large-scale HPC/AI, then it is a pretty important product for many out there; or look at those doing DP scientific modelling, where no other product touches this.
The closest general AI product to compare would be Google's TPU2, and that does not necessarily match the performance/efficiency of the V100, which combines a mixed-precision product with AI capability - this has been covered in the past by a few others here.
There is also the half-height V100, which is more stripped down.
Also, one pretty important consideration will be the V102; just look at how the GP102 outperformed the GP100 in more common situations.
The one product that could cause a ripple for Nvidia is from Intel (beyond Xeon Phi), but it depends how long it takes them to really get everything aligned (HW arch more broadly, and software) with their Nervana tech.
 
That's the question. The same user has LuxBall HDR (simple scene) results at places 9 and 10 with 4x Vega, less than half of Anaconda's 4x V100. On Hotel (complex scene), places 1 and 2 go to 4x Vega, at 1.5x Anaconda's results. So I guess something is wrong with the V100 results; the question is: what?

Ahh, the two very good results seem to be using ROCm 1.6.4; the others are under Windows.
The V100 results show the expected perfect scaling, with the 4x number being exactly 4x the performance of the 1x.

The V100 single GPU score for Hotel is 12770, which is "only" about 20% or so faster than a GP100 after correcting for cores and BW. A very good result, but not unexpected or terribly surprising if you look at the general architectural improvements.
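One simple way to do that "correcting for cores and BW" is to average the core-count and bandwidth ratios between the two chips. The 12770 score is from the thread; the core counts and bandwidth figures are taken from the public spec sheets (Tesla P100 SXM2: 3584 FP32 cores, 732 GB/s; Tesla V100: 5120 cores, 900 GB/s) and should be treated as assumptions here, as is the averaging itself:

```python
# Sketch of the cores-and-bandwidth normalization behind the "~20%" claim.
V100_SCORE = 12770                         # LuxMark Hotel, single V100 (from the thread)
V100_CORES, GP100_CORES = 5120, 3584       # FP32 CUDA cores (spec sheets)
V100_BW, GP100_BW = 900, 732               # memory bandwidth, GB/s (spec sheets)

# Average the two resource ratios as a crude "how much bigger is V100" factor.
resource_ratio = (V100_CORES / GP100_CORES + V100_BW / GP100_BW) / 2
print(f"Resource ratio: {resource_ratio:.2f}x")

# If V100 is ~20% faster per unit of resources, a GP100 would land near:
implied_gp100 = V100_SCORE / resource_ratio / 1.2
print(f"Implied GP100 Hotel score: {implied_gp100:.0f}")
```

The exact number depends on how you weight cores vs. bandwidth, but the shape of the argument is the same either way.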

The best single GPU score for gfx900 is 6526. It uses OpenCL 2.0 AMD-APP (2482.4). That number more or less matches a 1080Ti, which makes sense given that they have roughly the same amount of TFLOPS and BW.

The 4x score for gfx900 is 77530, or a whopping 19382 per unit, using OpenCL 2.0 AMD-APP (2508.0).

So one way or the other, AMD manages to triple the performance of its OpenCL drivers, way beyond what could reasonably be expected from the amount of TFLOPS and BW.

Or the alternative: those results are completely bogus. :)
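The arithmetic behind that conclusion is easy to check directly; all scores below are the ones quoted in this post:

```python
# Checking the scaling numbers quoted above (all scores from the thread).
v100_single, gfx900_single = 12770, 6526   # best single-GPU Hotel scores
gfx900_quad = 77530                        # 4x gfx900 Hotel score

v100_quad = v100_single * 4                # the observed perfect scaling
per_unit = gfx900_quad / 4                 # per-GPU score in the 4x run
speedup = per_unit / gfx900_single         # per-GPU jump vs the 1x result

print(f"V100 4x (perfect scaling): {v100_quad}")
print(f"gfx900 per-unit in the 4x run: {per_unit:.0f}")
print(f"Per-GPU jump vs single-GPU result: {speedup:.2f}x")
```

Each gfx900 in the 4-GPU run would have to be ~3x faster than the best single-GPU result, which is exactly the implausibility being pointed out.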
 