Nvidia Ampere Discussion [2020-05-14]

Man from Atlantis · May 14, 2020

A100 SXM2
I found the die size around 806mm2

Edit: EETimes went ahead and published the article before its due

Nvidia Reinvents GPU, Blows Previous Generation Out of the Water

The first chip built on Ampere, the A100, has some pretty impressive vital statistics. Powered by 54 billion transistors, it’s the world’s largest 7nm chip, according to Nvidia, delivering more than one Peta-operations per second. Nvidia claims the A100 has 20x the performance of the equivalent Volta device for both AI training (single precision, 32-bit floating point numbers) and AI inference (8-bit integer numbers). The same device used for high-performance scientific computing can beat Volta’s performance by 2.5x (for double precision, 64-bit numbers).
[...]
The Tensor Cores now also natively support double-precision (FP64) numbers, which more than doubles performance for HPC applications.
[...]

http://archive.is/fiMX1

xpea · May 14, 2020

The Ampere GA100 GPU ... to pack 128 SM units, equalling a total of 8192 CUDA cores... For memory, we are looking at six HBM stacks that point out a 6144-bit bus interface. The memory dies are definitely from Samsung who has been NVIDIA's strategic memory partner for HPC-centric GPUs.

Finally, NVIDIA will be announcing its next-generation DGX-A100 system which Jensen Huang teased a few days ago. The DGX-A100 will deliver 5 Petaflops of peak performance with its 8 Ampere based Tesla A100 GPUs. The system itself is 20x faster than the previous DGX based on NVIDIA's Volta GPU architecture. The reference cluster design features 140 DGX-A100 GPUs with a 200 Gbps Mellanox Infiniband interconnect. The whole system is going to start at $199,000 and is shipping as of today.

NVIDIA also confirmed that the DGX A100 systems are already in operation in the US Department of Energy (Aragone National laboratory), where they are used to fight Covid-19.

PS:

A reference design for a cluster of 140 DGX-A100 systems with Mellanox HDR 200Gbps InfiniBand interconnects, the DGX-superPOD, can achieve 700 petaflops for AI workloads. Nvidia has built a DGX-superPOD as part of its own Saturn-V supercomputer, and the system was stood up from scratch within three weeks. Saturn-V now has nearly 5 exaflops of AI compute, making it the fastest AI supercomputer in the world.

For Jensen personal usage :runaway:

Kaotik · May 14, 2020

"Nvidia claims the A100 has 20x the performance of the equivalent Volta device for both AI training (single precision, 32-bit floating point numbers) and AI inference (8-bit integer numbers)." ....if you use new Tensor Float 32 -precision not supported by Volta

techuse · May 14, 2020

Ya i knew there was some type of typical Nvidia shenanigans at play when i read that 20x blurb.

DegustatoR · May 14, 2020

Kaotik said:
"Nvidia claims the A100 has 20x the performance of the equivalent Volta device for both AI training (single precision, 32-bit floating point numbers) and AI inference (8-bit integer numbers)." ....if you use new Tensor Float 32 -precision not supported by Volta

And it'll be "just" 2.5x if you don't.

CarstenS · May 14, 2020

Some DGX A100 Test runs?
https://folding.extremeoverclocking.com/user_summary.php?s=&u=1096093

PSman1700 · May 14, 2020

pjbliverpool said:
Regarding the demo, we've had it heavily implied (possibly stated directly) that it runs on both XSX and PC so it's probably a little premature to start claiming it's only possible because of the PS5's SSD.

That UE5 demo is probably even more impressive compared to what NV can tech demo on a 3080/nvme optane system.

xpea · May 14, 2020

https://twitter.com/i/web/status/1260875972221579264

it's a change from previous Volta / Turing gen

another source:
https://www.marketwatch.com/story/n...is-coronavirus-2020-05-14?link=MW_latest_news

Ampere will eventually replace Nvidia’s Turing and Volta chips with a single platform that streamlines Nvidia's GPU lineup, Huang said in a pre-briefing with media members Wednesday. While consumers largely know Nvidia for its videogame hardware, the first launches with Ampere are aimed at AI needs in the cloud and for research.

“Unquestionably, it’s the first time that we’ve unified the acceleration workload of the entire data center into one single platform,” Huang said.

Nvidia discovered years ago that its gaming hardware was beneficial to machine learning thanks to its parallel-processing design — when researchers attempt to “teach” algorithms with data, GPUs help to push more of that data through at a faster rate. It has steadily developed products based on those needs for high-performance computing, data centers and autonomous driving since.

Deleted member 13524 · May 14, 2020

Man from Atlantis said:
A100 SXM2
I found the die size around 806mm2

Edit: EETimes went ahead and published the article before its due

Nvidia Reinvents GPU, Blows Previous Generation Out of the Water

https://webcache.googleusercontent....n-out-of-the-water/+&cd=9&hl=en&ct=clnk&gl=in

I sure hope this server GPU isn't the only thing they're going to show today, and we'll be able to get a glimpse of the new consumer lineup.

DavidGraham · May 14, 2020

So now that Ampere is the overall arch for consumer and HPC, consumer chips will most likely cut down on Tensor units count, there is also a possibility of a Titan Ampere GPU as well, like Titan V.

The bigAmpere HPC is definitely a 128SM GPU, we need to figure out the frequency and power consumption now so we can infer some info about the rest of the lineup.

szatkus · May 14, 2020

DavidGraham said:
So now that Ampere is the overall arch for consumer and HPC, consumer chips will most likely cut down on Tensor units count, there is also a possibility of a Titan Ampere GPU as well, like Titan V.

The bigAmpere HPC is definitely a 128SM GPU, we need to figure out the frequency and power consumption now so we can infer some info about the rest of the lineup.

To reach 5PTFLOPS it needs a bit more than 128SM. Or really high clocks.

pjbliverpool · May 14, 2020

szatkus said:
To reach 5PTFLOPS it needs a bit more than 128SM. Or really high clocks.

I haven't done the math but is that based on CUDA FLOPS or Tensor FLOPS? As the Tensor cores pump out a lot more.

Kaotik · May 14, 2020

szatkus said:
To reach 5PTFLOPS it needs a bit more than 128SM. Or really high clocks.

pjbliverpool said:
I haven't done the math but is that based on CUDA FLOPS or Tensor FLOPS? As the Tensor cores pump out a lot more.

8 GPUs pushing 5 PFLOPS is around 625 TFLOPS per GPU, clearly they're talking about Tensor-FLOPS and not general FP32 FLOPS

DavidGraham · May 14, 2020

The A100 comes with
6,912 FP32 CUDA Cores @~1.41GHz
3,456 FP64 CUDA Cores,
432 Tensor Cores,
108 streaming multiprocessors
40 GB of GPU memory
within a 400-watt power envelope.

https://www.crn.com/news/components...m-ampere-a100-gpu-to-unify-training-inference
https://www.anandtech.com/show/15801/nvidia-announces-ampere-architecture-and-a100-products

xpea · May 14, 2020

DGX A100 specs

AMD EPIC 7742 !!!

pjbliverpool · May 14, 2020

Kaotik said:
8 GPUs pushing 5 PFLOPS is around 625 TFLOPS per GPU, clearly they're talking about Tensor-FLOPS and not general FP32 FLOPS

Yeah I was being very lazy on the maths. This sounds correctto me then as the 2080Ti is rated at 440 TFLOPS in INT4 and that's with 80SM's. So it seems to be running at a slightly slower clock than the 2080Ti all else being equal.

DavidGraham · May 14, 2020

So the first iteration: Tesla A100 is a cut down one, from 128SM to 108SM.

xpea · May 14, 2020

Big numbers here !!!

manux · May 14, 2020

Omniverse ray tracing stuff looks great. Link should be to right time, if not it starts around 9:42. The actual explanation what omniverse is before the demo part

DavidGraham · May 14, 2020

So NVIDIA effectively traded FP32 CUDA cores with FP32 Tensor Cores, Tesla A100 is really just Ampere optimized for AI.

Regular FP32 is : 19.5 TF
Tensor FP32: 156 TF, accelerated to 312 TF effective through "sparse acceleration"

Consumer Ampere will definitely cut down on the advanced tensor stuff and trade back the lost FP32 CUDA cores.

Nvidia Ampere Discussion [2020-05-14]

Man from Atlantis

xpea

Kaotik

Drunk Member

techuse

DegustatoR

CarstenS

Moderator

PSman1700

xpea

Deleted member 13524

Guest

DavidGraham

szatkus

pjbliverpool

B3D Scallywag

Kaotik

Drunk Member

DavidGraham

xpea

pjbliverpool

B3D Scallywag

DavidGraham

xpea

manux

DavidGraham

Similar threads