It's not as if those tensor cores couldn't be used for inference either. A GV100 using FP16 tensor cores will still be much faster than a GP102 using INT8 math.
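To make that concrete, here is a rough CUDA sketch (illustrative only, simplified tile sizes, kernel names are mine) of the two inference paths being compared: Volta's FP16 tensor cores via the WMMA API (sm_70) versus the packed INT8 __dp4a dot product GP102 relies on (sm_61).

#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Volta path: one warp drives the tensor cores to do a 16x16x16 FP16
// multiply with FP32 accumulate in a single mma_sync call.
__global__ void fp16_tensor_mma(const half* a, const half* b, float* c)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);       // tensor-core FP16 multiply, FP32 accumulate
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

// Pascal path: each thread gets a 4-wide INT8 dot product per __dp4a
// instruction (four signed 8-bit values packed into each int), which is
// where GP102's quoted INT8 inference rate comes from.
__global__ void int8_dp4a_dot(const int* a4, const int* b4, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __dp4a(a4[i], b4[i], 0);
}

Per warp, the mma_sync above covers 4096 multiply-accumulates against 128 for a warp of __dp4a ops, which is why GV100 should stay well ahead for inference despite GP102's INT8 advantage over plain FP32.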
I think we are in agreement; you are focused on the current situation, while my point is about the product line and what Nvidia will do when there is notable overlap between their top two Tesla GPUs for the DL ecosystem.
My post is in the context of how Nvidia differentiates between the Gx100 and Gx102 and the headache it will cause them down the road, and even a bit now, especially as some want a powerful single node doing both training and inference.
What you just said agrees with my post in some ways: you do not need both within their DL ecosystem (though I agree others will want independent nodes for training and inference). And a GV102 without DP cores would be a full, uncut GPU, meaning more SMs (with 8 tensor cores per SM) and slightly higher clock speed, so it would have greater performance than the GV100 if one ignores DP.
Nvidia needs to find a better way to differentiate the Gx100 and Gx102 than by FP16 and INT8 down the road, probably by the next generation.
It does not make sense to limit FP16 to the GPU that has fewer SMs, lower clocks (due to also supporting DP cores), and less yield headroom, especially as the Gx102 is, importantly, a smaller die with greater performance in this DL and FP32 context.
Market demand and competitors will probably force them to change, IMO.
Maybe they can keep NVLink2 and its benefits exclusive to the Gx100 as the differentiator, but then again, at some point they may have to consider it on a Gx102 variant as well.
It would be nice, though, if they gave the general CUDA cores full Vec2 FP16 throughput on the GV102, even if they decide to limit use of the tensor cores in some way.
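For reference, "Vec2 FP16" here means the packed half2 path on the regular CUDA cores, as in this minimal sketch (kernel name is mine): one __hfma2 issues two FP16 fused multiply-adds at once, which is what gives double the FP32 rate on chips where it runs at full speed.

#include <cuda_fp16.h>

// Two half values packed per __half2; one __hfma2 = two FP16 FMAs.
// Full-rate parts (GP100/GV100) get 2x FP32 throughput this way, while
// GP102 executes half2 math at a deliberately reduced rate.
__global__ void saxpy_half2(int n2, __half2 alpha, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)                         // n2 = element count in half2 pairs
        y[i] = __hfma2(alpha, x[i], y[i]);
}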
Cheers