Nvidia Pascal Announcement

From WCCFTech: "NVIDIA Rumored To Release Pascal Refresh With GDDR5X and Faster Clocks – Volta To Feature HBM2 and GDDR6 Support, 16 GB Standard Capacity [for a 256-bit bus]."

WCCFTech said:
The refined 16nm process is meant to offer even higher clock speeds and further stability on the Pascal GPUs. We can see the new chips clocking close to 2 GHz and maintaining those speeds under maximum load. Furthermore, GDDR5X has been a main issue for NVIDIA. Micron is under pressure to offer better yields of GDDR5X chips and that has gotten better over the while.
[…]
[At GTC 2017,] Jen-Hsun is also expected to showcase an updated NVIDIA GPU roadmap which will include new codenames and tech details for future chips. NVIDIA is expected to drop 10nm process and go straight for 7nm with their post-Volta GPUs, supporting HBM3 and GDDR6 memory standards.
 
Do we know anything about GDDR6? Superficially, it seems like a refreshed GDDR5X, but there's got to be something deeper than just that.

Not much beyond what Samsung has shown.

In case you haven't seen it:

Hot Chips 28 via computerbase.de: https://www.computerbase.de/2016-08/samsung-gddr6-14-16-gbps-2018/

2-630.3776517885.png


1-630.306409803.png
3-630.3321462575.png
4-630.711483079.png
 
Neat find.

To save others the effort, see the pertinent slide below.

512 "cores" doesn't seem like a lot for a GPU, but there could be some additional nuance in play.

xavier-gtc-europe-keynote.jpg

It looks like there's going to be an 8-way DPA instruction and that this SoC includes only 64b cores.

512 cores x 8 ops per cycle x 2 GHz = 20 TOP/s

It's roughly one fifth the size of GP104, so by my very rough estimate it's doubling perf/W (quadrupling if you count the new instruction).

Btw I saw 20TOP/s claimed for this SoC but now I can't find the claim lol

Edit: ~16 TOP/s, not 20, according to my calculation at 2 GHz.
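
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the peak-throughput estimate. The 512 cores and 2 GHz come from the post above; counting an 8-way dot-product-accumulate as 16 operations (8 multiplies plus 8 adds) is only an assumption about how the ~16 TOP/s figure was reached, and the function name is made up for illustration.

```python
def peak_tops(cores: int, ops_per_core_per_cycle: int, clock_ghz: float) -> float:
    """Peak throughput in tera-operations per second (TOP/s)."""
    # cores * ops per cycle * cycles per second, scaled from ops to tera-ops
    return cores * ops_per_core_per_cycle * clock_ghz * 1e9 / 1e12


# Counting each of the 8 lanes as a single op per cycle.
lanes_only = peak_tops(512, 8, 2.0)

# Assumption: each lane does a multiply and an add, so 16 ops per cycle.
mul_and_add = peak_tops(512, 16, 2.0)

# Clock (in GHz) needed to reach 20 TOP/s under the 16-ops assumption.
clock_for_20 = 20.0 * 1e12 / (512 * 16) / 1e9

print(f"{lanes_only:.1f} TOP/s, {mul_and_add:.1f} TOP/s, {clock_for_20:.2f} GHz")
```

Under these assumptions the estimate comes out to about 8.2 TOP/s with 8 ops per cycle and about 16.4 TOP/s with 16, and 20 TOP/s would need a clock of roughly 2.44 GHz.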
 
The 20 TOP/s claim is on a separate slide (referenced by AnandTech).

Xavier2B.jpg


AnandTech said:
With Xavier, NVIDIA wants to get to 20 Deep Learning Tera-Ops (DL TOPS), which is a metric for measuring 8-bit Integer operations. 20 DL TOPS happens to be what Drive PX 2 can hit, and about 43% of what NVIDIA’s flagship Tesla P40 can offer in a 250W card. And perhaps more surprising still, NVIDIA wants to do this all at 20W, or 1 DL TOPS-per-watt, which is one-quarter of the power consumption of Drive PX 2, a lofty goal given that this is based on the same 16nm process as Pascal and all of the Drive PX 2’s various processors.
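
The efficiency comparison in that quote can be worked through with a short Python sketch. Every input is taken from the quote itself (20 DL TOPS, 20 W, "about 43%" of a 250 W Tesla P40, one-quarter of Drive PX 2's power); the P40 throughput and Drive PX 2 power below are only what those ratios imply, not official specifications, and the names are illustrative.

```python
XAVIER_TOPS = 20.0   # DL TOPS target from the quote
XAVIER_WATTS = 20.0  # power target from the quote


def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency in DL TOPS per watt."""
    return tops / watts


# Xavier target: 20 TOPS at 20 W.
xavier_eff = tops_per_watt(XAVIER_TOPS, XAVIER_WATTS)

# Tesla P40: 20 TOPS is "about 43%" of its throughput, in a 250 W card.
p40_tops_implied = XAVIER_TOPS / 0.43
p40_eff = tops_per_watt(p40_tops_implied, 250.0)

# Drive PX 2: same 20 TOPS, and Xavier's 20 W is one-quarter of its power.
px2_watts_implied = XAVIER_WATTS * 4
px2_eff = tops_per_watt(XAVIER_TOPS, px2_watts_implied)

print(f"Xavier {xavier_eff:.2f}, P40 {p40_eff:.3f}, Drive PX 2 {px2_eff:.2f} TOPS/W")
```

Taken at face value, the quoted ratios imply roughly 46.5 DL TOPS for the P40 (about 0.19 TOPS/W) and about 80 W for Drive PX 2 (0.25 TOPS/W), against the 1 TOPS/W target for Xavier.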
 
Did Nvidia ever get to the 10x Maxwell performance in Pascal?

The 10x was an invented metric. Lately Nvidia makes slides the way it would if it were in the smartphone market: 2x more VRAM bandwidth, 2x performance per watt, 4x performance at FP16, plus 2x more VRAM capacity = 10x more performance (2+2+4+2).
 
It was actually 10x faster at training AlexNet, and yes, they did achieve that (almost).

https://blogs.nvidia.com/blog/2015/03/17/pascal/
That is what they said, but where does it actually show any tests or benchmarks? As far as I can tell, they give NVLink as the reason it somehow gets 10x instead of 5x, which makes me wonder if they are comparing 2 GPUs vs. 1, because their half-precision performance doesn't go up nearly 10x on paper.
 
That 10x number includes a doubling in the number of GPUs due to NVLink. The part of the slide where NVLink is mentioned is cut off in the picture in the blog.

_id1426673783_343178_1.jpg
 
I think they were comparing racks, so it was a DGX-1.

Yeah, and they reported a 12x speedup using DGX-1 vs. a Maxwell-based rack, which I think had half as many GPUs.
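
As a rough sanity check on the numbers in this thread, the Python sketch below divides a headline speedup by the ratio of GPU counts. It assumes throughput scales linearly with the number of GPUs, which is optimistic, and the 2x GPU ratio is the figure discussed above rather than a confirmed configuration. The function name is hypothetical.

```python
def per_gpu_speedup(total_speedup: float, gpu_count_ratio: float) -> float:
    """Speedup attributable to each GPU, assuming perfectly linear multi-GPU scaling."""
    return total_speedup / gpu_count_ratio


# 10x slide claim with twice as many GPUs.
slide_claim = per_gpu_speedup(10.0, 2.0)

# 12x DGX-1 claim, assuming the Maxwell rack had half as many GPUs.
dgx1_claim = per_gpu_speedup(12.0, 2.0)

print(f"{slide_claim:.1f}x and {dgx1_claim:.1f}x per GPU")
```

With that assumption the per-GPU gain works out to about 5x and 6x respectively. If multi-GPU scaling is less than linear, the per-GPU share of the speedup would be somewhat higher.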
 