Nvidia Pascal Announcement

Presentation notes


CUDA developers: 4x overall

CUDA developers in automotive and hyperscale: 10x


Nvidia SDK:

GameWorks:

Volumetric lighting

Voxel Accelerated AO

Hybrid frustum traced shadows


DesignWorks:

Adobe MDL

Iray


ComputeWorks:

CUDA 8

cuDNN 5

nvGraph

IndeX plugin: quick visualization of data

VRWorks:

Oculus Rift and HTC Vive

Unreal, MaxPlay and Unity


DriveWorks:

Still being worked on, but available to test with. Early access has already started; the release is in Q1 of next year.


Nvidia JetPack

GIE: GPU Inference Engine, coming soon in May (Jetson TX1: 24 images/watt). CUDA is the most energy-efficient approach for deep learning.

VR:

Going to be able to do design visualization
Going to places where we can't normally go

Photo-real is a necessity, we need more performance.

Iray VR can do this. Photorealistic VR takes many GPUs and a lot of time, but it can be done in real time now.

Iray VR Lite can be used on any type of hardware and already has integration into 3ds Max and Maya, with Google Cardboard coming in June.

AI:

Deep learning started 5 years ago.

AlphaGo: 1000 CPUs and 60 GPUs. Computers powered by deep learning can do more than humans can program for.

New Computing model:

Deep learning: object detection, DNN, data, HPC

Different programs no longer have to be written to do different things, and it gets better results.

Industry funding is high: 5 billion

AI has become a platform

P100 is in volume production.

P100 samples are out and are being used by OEMs. Servers will be available in Q1 2017.

Deep learning supercomputer: DGX-1
170 TF
3200 watts, 8 GPUs, 7 TB of SSDs, etc.
12x faster deep learning performance than last year.
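
As a rough sanity check on the DGX-1 numbers above, here is a small Python sketch that splits the quoted 170 TF across the 8 GPUs and works out efficiency for the whole box. It assumes the 170 TF is an aggregate figure divided evenly between the GPUs (most likely a half-precision number, which the notes do not state). The function and constant names are made up for illustration.

```python
# Figures quoted in the notes for DGX-1.
DGX1_TFLOPS = 170.0   # aggregate throughput as announced (precision assumed, not stated)
DGX1_WATTS = 3200.0   # system power
DGX1_GPUS = 8         # number of P100 GPUs in the box


def per_gpu_tflops(total_tflops: float, gpus: int) -> float:
    """Aggregate throughput split evenly across the GPUs (an assumption)."""
    return total_tflops / gpus


def gflops_per_watt(total_tflops: float, watts: float) -> float:
    """System-level efficiency in GFLOPS per watt (1 TFLOPS = 1000 GFLOPS)."""
    return total_tflops * 1000.0 / watts


if __name__ == "__main__":
    print(f"{per_gpu_tflops(DGX1_TFLOPS, DGX1_GPUS):.2f} TFLOPS per GPU")
    print(f"{gflops_per_watt(DGX1_TFLOPS, DGX1_WATTS):.1f} GFLOPS/W for the whole system")
```

Note that the wattage covers CPUs, SSDs and everything else in the chassis, so the per-watt number understates what the GPUs alone manage.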

Pascal with recurrent neural nets:
Interconnect is very important
Capabilities:
Persistent RNNs, keeping everything in the GPU with less
Register file for Pascal: 14 MB vs 8 MB in Maxwell
NVLink helps with splitting work across GPUs. Creates a wider model with more processors (30x more)

TensorFlow, DGX-1, easy adaptability, performance is key.

DGX-1 $129k

Colleges and research labs are already getting them, and medicine is targeted.
 
fn9WhVG.png

bM9Xt75.png
 
Yep, Nvidia again simply asked TSMC for the maximum reticle size they can put out and stuffed it with logic as dense as possible.

But 14MB of GPR is curious -- how many MPs is that?
GM200 has 6MB distributed over 24 multiprocessors.
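
A quick back-of-the-envelope sketch of that question in Python. It assumes the per-MP register file stays the same size as on GM200, which is a guess and not something the announcement confirms. The function names are made up for illustration.

```python
KIB_PER_MIB = 1024


def regfile_per_mp_kib(total_mib: float, mp_count: int) -> float:
    """Register file per multiprocessor in KiB, given the chip-wide total."""
    return total_mib * KIB_PER_MIB / mp_count


def implied_mp_count(total_mib: float, per_mp_kib: float) -> float:
    """Number of multiprocessors implied by a chip-wide total and a per-MP size."""
    return total_mib * KIB_PER_MIB / per_mp_kib


if __name__ == "__main__":
    # GM200: 6 MB spread over 24 multiprocessors.
    gm200_per_mp = regfile_per_mp_kib(6, 24)
    print(f"GM200: {gm200_per_mp:.0f} KiB of registers per MP")

    # ASSUMPTION: Pascal keeps the same per-MP register file size.
    pascal_mps = implied_mp_count(14, gm200_per_mp)
    print(f"14 MB at that size implies {pascal_mps:.0f} MPs")
```

If the per-MP register file grew or shrank in Pascal, the implied count changes in proportion.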
 
My last Pascal guess wasn't too far off.
(Curiously they seem to have ditched the DLTOPs)


From that it now becomes clearer what the next big Pascal, i.e. GP200, will look like:
4096 SP / 8 TFLOPS SP / 4 TFLOPS DP
Regarding the FP16 performance, NV has created a new metric,
DLTOPs (deep learning tera operations per second),
which would be at 24 DLTOPs. The new name indicates that it's not the same as 24 TFLOPS FP16.

Edit: The announced GP100 has 3584 cores and, based on the 1.48 GHz boost clock, does 10.6 TF.
So up from 3072 cores. Going from 8 to 15 B transistors for only 512 more cores? Most of the speedup comes from the higher clock. Additionally, power goes from 250 W to 300 W.
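
The 10.6 TF figure follows from the usual peak-throughput rule of thumb: cores x clock x 2, counting a fused multiply-add as two floating-point operations per core per clock. A small Python sketch of that arithmetic, plus the core and transistor ratios from the numbers above. The factor of 2 is an assumption not spelled out in the post, and the function names are made up for illustration.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPS; the default of 2 assumes one FMA per core per clock."""
    return cores * clock_ghz * flops_per_core_per_clock / 1000.0


def scaling(new: float, old: float) -> float:
    """Simple ratio of a new figure to an old one."""
    return new / old


if __name__ == "__main__":
    print(f"GP100 peak: {peak_tflops(3584, 1.48):.1f} TFLOPS")
    print(f"Cores: {scaling(3584, 3072):.2f}x, transistors: {scaling(15, 8):.2f}x")
```

The transistor count grows much faster than the core count, which fits the point that the extra budget went somewhere other than more SP cores.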
 
dpEqRci.png


The multiprocessor design is very similar to Maxwell (sort of scaled down), now with more dedicated DP units and an updated ISA for mixed precision support.

The shared memory size has been reduced, though. :???:
 
It sounds like they announced the higher level introduction of Pascal for server use, but not any information about the consumer cards or mobile yet?
 
I guess the mid-grade GeForce SKU will use a third smaller chip than the P100, with GDDR5X. That would result in a slightly beefier GPU than GM204, but with much better perf/Watt and TurboBoost range.
 
I hope the diagram isn't misleading regarding the dedicated DP units; I didn't expect otherwise to be honest, but I also didn't expect as many ;)
 
That's a pretty high Turbo clock for the big Pascal -- 1480MHz. I can only imagine how high the smaller consumer SKUs will reach.
 