Nvidia Pascal Announcement

huebie · Apr 5, 2016

Just in case you want to watch the livestream:

Grall · Apr 5, 2016

BLAAAAAAHHHH, flash player! Wtf, Nvidia!

Razor1 · Apr 5, 2016

Presentation notes

CUDA developers: up 4x overall

CUDA developers in automotive and hyperscale: up 10x

nV SDK:

Gameworks:

Volumetric lighting

Voxel Accelerated AO

Hybrid frustum traced shadows

Designworks:

Adobe MDL

Iray

Compute Works:

CUDA 8

cuDNN 5

nvGraph

IndeX plugin: quick visualization of data

VRworks:

Oculus Rift and HTC Vive

Unreal, Max Play and Unity

Driveworks:

Still in development, but available to test with. Early access has already started; the release is Q1 of next year.

nV Jetpack

GIE: GPU Inference Engine, coming soon in May (Jetson TX1: 24 images/watt). CUDA is the most energy-efficient approach for deep learning.

VR:

Going to be able to do design visualization
Going to places where we can't normally go

Photo-real is a necessity, we need more performance.

Iray VR can do this. Photorealistic VR takes many GPUs and a lot of time, but it can be done in real time now.

Iray VR Lite can be used on any type of hardware and already has integration into 3ds Max, Maya and Google Cardboard; coming in June.

AI:

Deep learning started 5 years ago.

AlphaGo: 1000 CPUs and 60 GPUs. Computers powered by deep learning can do more than humans can program for.

New Computing model:

Deep learning Object detection, DNN, Data HPC

Different programs no longer have to be written to do different things, and it gets better results.

Industry funding is high: 5 billion

AI has become a platform

P100 is in volume production.

P100 samples are out and are being used by OEMs. Servers will be available in Q1 2017.

Deep learning supercomputer DGX-1
170 TF
3200 watts, 8 GPUs, 7 TB of SSDs, etc.
12x faster performance for deep learning from last year.
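
For anyone who wants to put the DGX-1 figures in perspective, here is a minimal sketch that only divides the numbers quoted in the notes (170 TF, 3200 watts, 8 GPUs). The function names are made up for illustration, and the even split across GPUs is an assumption. The notes do not say which precision the 170 TF refers to, and the 3200 watts covers the whole box rather than just the GPUs.

```python
# Figures taken from the presentation notes above.
DGX1_TFLOPS = 170.0   # quoted peak throughput for the whole system
DGX1_WATTS = 3200.0   # quoted system power (whole box, not only the GPUs)
DGX1_GPUS = 8         # quoted number of GPUs


def per_gpu_tflops(total_tflops: float, gpus: int) -> float:
    """Peak throughput per GPU, assuming the total splits evenly (an assumption)."""
    return total_tflops / gpus


def gflops_per_watt(total_tflops: float, watts: float) -> float:
    """System-level efficiency in GFLOPS per watt."""
    return total_tflops * 1000.0 / watts


if __name__ == "__main__":
    print(f"{per_gpu_tflops(DGX1_TFLOPS, DGX1_GPUS):.2f} TFLOPS per GPU")
    print(f"{gflops_per_watt(DGX1_TFLOPS, DGX1_WATTS):.1f} GFLOPS per watt")
```

That works out to a little over 21 TF per GPU and roughly 53 GFLOPS per watt at the system level.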

Pascal with recurrent neural nets:
Interconnect is very important
Capabilities:
Persistent RNNs, keeping everything in the GPU with less
Register file for Pascal is 14 MB vs 8 MB in Maxwell
NVLink helps with splitting work across GPUs. Creates a wider model with more processors (30x more)

TensorFlow, DGX-1, easy adaptability, performance is key.

DGX-1 $129k

Colleges and research labs are already getting them; medicine is also targeted.

fellix · Apr 5, 2016

trinibwoy · Apr 5, 2016

600 mm^2 on 16nm. Goddam.

fellix · Apr 5, 2016

Yep, Nvidia again simply asked TSMC for the maximum reticle size they can put out and stuffed it with logic as dense as possible.

But 14MB of GPR is curious -- how many MPs is that?
GM200 has 6MB distributed over 24 multiprocessors.
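
A rough way to answer the "how many MPs" question, as a sketch only: take the GM200 figures from this post (6MB over 24 multiprocessors) and assume, without confirmation from the presentation, that Pascal keeps the same register file size per multiprocessor. The function names are hypothetical.

```python
# Figures from the thread; the per-MP size carrying over to Pascal is an ASSUMPTION.
GM200_GPR_MB = 6     # total register file on GM200 (from the post above)
GM200_MPS = 24       # multiprocessors on GM200 (from the post above)
PASCAL_GPR_MB = 14   # total register file quoted for Pascal in the notes


def gpr_kb_per_mp(total_mb: float, mps: int) -> float:
    """Register file per multiprocessor in KB."""
    return total_mb * 1024 / mps


def implied_mps(total_mb: float, kb_per_mp: float) -> float:
    """Multiprocessor count implied by a total register file and a per-MP size."""
    return total_mb * 1024 / kb_per_mp


if __name__ == "__main__":
    per_mp = gpr_kb_per_mp(GM200_GPR_MB, GM200_MPS)
    print(f"GM200: {per_mp:.0f} KB of registers per MP")
    print(f"Pascal at the same per-MP size: {implied_mps(PASCAL_GPR_MB, per_mp):.0f} MPs")
```

Under that assumption the 14MB figure points to 56 multiprocessors at 256 KB each; a different per-MP size would change the answer.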

OlegSH · Apr 5, 2016

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Voxilla · Apr 5, 2016

My last Pascal guess wasn't too far off.
(Curiously they seem to have ditched the DLTOPs)

Voxilla said:
From that it now becomes more clear what the next big Pascal, i.e. GP200, will look like
4096 SP / 8 TFLOPS SP / 4 TFLOPS DP
Regarding the FP16 performance, NV have created this new metric of
DLTOPs (deep learning tera operations per second)
Which would be at 24 DLTOPS. Given the new name that indicates it's not the same as 24 TFLOPS FP16.

Edit: The announced GP100 has 3584 cores and, based on the 1.48 GHz boost clock, does 10.6 TF.
So up from 3072 cores. Going from 8 to 15 B transistors, only 512 more cores? Most of the speedup comes from the higher clock. Additionally, going from 250W to 300W.
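
The 10.6 TF figure in the edit can be reproduced with the usual peak-throughput formula. This is a sketch assuming 2 FLOPs per core per clock (one fused multiply-add), which the post does not state explicitly; the function names are made up for illustration.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Peak TFLOPS = cores * clock (GHz) * FLOPs per core per clock / 1000.

    flops_per_clock=2 assumes one fused multiply-add per core per clock.
    """
    return cores * clock_ghz * flops_per_clock / 1000.0


def implied_clock_ghz(tflops: float, cores: int, flops_per_clock: int = 2) -> float:
    """Clock in GHz that a given peak TFLOPS figure implies for a core count."""
    return tflops * 1000.0 / (cores * flops_per_clock)


if __name__ == "__main__":
    # Announced GP100 figures from the post: 3584 cores at 1.48 GHz boost.
    print(f"announced GP100: {peak_tflops(3584, 1.48):.1f} TFLOPS")
    # Earlier guess from the quoted post: 4096 SP delivering 8 TFLOPS.
    print(f"clock implied by the guess: {implied_clock_ghz(8.0, 4096):.2f} GHz")
    # Ratios mentioned in the post: 3072 -> 3584 cores, 8 -> 15 B transistors.
    print(f"core ratio: {3584 / 3072:.2f}x, transistor ratio: {15 / 8:.2f}x")
```

The earlier guess implied a clock just under 1 GHz, so the announced part gets there with fewer cores and a much higher clock, which matches the point about the speedup coming mostly from frequency.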

fellix · Apr 5, 2016

The multiprocessor design is very similar to Maxwell (sort of scaled down), now with more dedicated DP units and an updated ISA for mixed precision support.

The shared memory size has been reduced, though. :???:

Berek · Apr 5, 2016

It sounds like they announced the higher level introduction of Pascal for server use, but not any information about the consumer cards or mobile yet?

Newguy · Apr 5, 2016

OlegSH said:
https://devblogs.nvidia.com/parallelforall/inside-pascal/

So GP100 has 3584 cores but:

"Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units."

Interestingly, some are left unused there. Also not nearly as big of a jump in shaders as I would've thought.
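
To make the gap between the two core counts concrete, here is a small sketch using only the numbers in this post and the quoted sentence (64 CUDA cores and four texture units per SM, 3840 cores on the full chip, 3584 on the announced part). The function name is hypothetical.

```python
CORES_PER_SM = 64   # from the quoted blog sentence
TEX_PER_SM = 4      # from the quoted blog sentence


def sm_count(cores: int, cores_per_sm: int = CORES_PER_SM) -> int:
    """Number of SMs for a core count; rejects counts that do not divide evenly."""
    sms, remainder = divmod(cores, cores_per_sm)
    if remainder:
        raise ValueError(f"{cores} cores is not a multiple of {cores_per_sm}")
    return sms


if __name__ == "__main__":
    full = sm_count(3840)      # full chip as described in the blog post
    enabled = sm_count(3584)   # announced part
    print(f"full chip: {full} SMs, {full * TEX_PER_SM} texture units")
    print(f"announced part: {enabled} SMs, {full - enabled} SMs not enabled")
```

So the full chip is 60 SMs (which matches the 240 texture units), and the announced part has 56 of them active.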

OlegSH · Apr 5, 2016

fellix said:
The shared memory size has been reduced, though

The number of SMs has been doubled in GP100, and it has 2x the registers and 1.5x the shared memory per lane.

fellix · Apr 5, 2016

Berek said:
It sounds like they announced the higher level introduction of Pascal for server use, but not any information about the consumer cards or mobile yet?

I guess the mid-grade GeForce SKU will use a chip a third smaller than the P100, with GDDR5X. That would result in a slightly beefier GPU than GM204, but with much better perf/Watt and TurboBoost range.

Ailuros · Apr 5, 2016

fellix said:
The multiprocessor design is very similar to Maxwell (sort of scaled down), now with more dedicated DP units and an updated ISA for mixed precision support.

The shared memory size has been reduced, though.

I hope the diagram isn't misleading concerning the dedicated DP units; I didn't expect otherwise, to be honest, but I also didn't expect as many.

SimBy · Apr 5, 2016

So nothing consumer grade shown?

Razor1 · Apr 5, 2016

nope nothing for gaming

mpg1 · Apr 5, 2016

weird...no hint as to even a date?

fellix · Apr 5, 2016

That's a pretty high Turbo clock for the big Pascal -- 1480MHz. I can only imagine how high the smaller consumer SKUs will reach.

Razor1 · Apr 5, 2016

mpg1 said:
weird...no hint as to even a date?

Well since all the software is going to be released in June and that software will need the next gen GPU to run..... something will be out by June.

McHuj · Apr 5, 2016

Hopefully, they drop the DP stuff for the consumer models and add more shaders instead.
