NVidia Hopper Speculation, Rumours and Discussion

Have you actually read the article you linked? It has absolutely nothing to do with DL or AI or any such thing; it just says that AMD has decided to use Google for additional capacity on top of what they already had. The hardware they're using in Google Cloud is nothing special either, just Milan Epycs.

Improved design and operations from applied Google Cloud artificial intelligence and machine learning tools and frameworks

What other reason is there to pay a company to use your own CPUs?!
 
What other reason is there to pay a company to use your own CPUs?!
Must have missed that part when reading it through, but it still in no way indicates or suggests this would be the first time they're using such technologies.
As for the economics of it, I can see plenty of scenarios where it's more economical to rent your "own hardware" from an outside source rather than build a similarly sized server farm yourself.
 
Upcoming GTC sessions related to the Hopper architecture.

  • Learn about the latest additions to the CUDA platform, language, and toolkit, and what the new Hopper GPU architecture brings to CUDA. Presented by one of the architects of CUDA, this engineer-focused session covers all the latest developments for NVIDIA's GPU developer ecosystem and looks ahead to where CUDA will be going over the coming year.

  • NVIDIA’s new Hopper GPUs contain advanced features that can unleash tremendous application performance, but they can also require new techniques for coding your applications. Learn how to access Hopper’s advanced capabilities and squeeze all the juice from your hardware while retaining performance portability. We’ll cover how to opt in to performance features, design patterns for structuring code for compatibility and performance, and strategies for effective testing and QA of high-performance applications with an eye to portability.

  • This session will introduce new features in CUDA for programming the Hopper architecture. The new programming model for Hopper is more hierarchical and asynchronous. CUDA programming for Hopper introduces an optional level of hierarchy called Thread Block Clusters, which enables multiple thread blocks within a cluster to communicate using a common pool of shared memory (see the sketch after this list). Asynchronous data movement is now hardware accelerated in all directions between global and shared memory. We will look at how to exploit the new programming model in applications for performance tuning.

  • The upcoming Hopper-based platforms, as well as the Grace Hopper Superchip, are exciting developments for high-performance computing. Taking advantage of this new hardware requires great software, and CUDA developer tools are here to help. We'll give a brief overview of the tools available for free to developers, then detail the newest features and explain how they help users identify performance and correctness issues, where they are occurring, and some options to fix them. We'll pay specific attention to features supporting new architectures. CUDA developer tools are designed in lockstep with the CUDA ecosystem, including hardware and software. With new technologies like the Grace Hopper Superchip, visibility and optimization of the entire platform are key to unleashing the next level of accelerated computing performance. This presentation will prepare you for that move to the leading edge.
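For anyone curious what the Thread Block Cluster model from that third session looks like in practice, here's a rough sketch (untested, names are illustrative, not from the sessions): two blocks in one cluster exchange data through each other's shared memory. It assumes an sm_90 GPU and a CUDA 12 toolkit, compiled with nvcc -arch=sm_90.

// Sketch: two thread blocks in one cluster exchange values via
// distributed shared memory. Requires Hopper (sm_90) and CUDA 12.
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Compile-time cluster size of 2x1x1 blocks; the grid must be divisible by it.
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(int *out)
{
    __shared__ int smem[1];
    cg::cluster_group cluster = cg::this_cluster();

    if (threadIdx.x == 0)
        smem[0] = blockIdx.x;           // each block publishes its own ID
    cluster.sync();                     // cluster-wide barrier: writes visible to the peer block

    // Map the shared memory of the other block in this 2-block cluster.
    unsigned peer = cluster.block_rank() ^ 1;
    int *peer_smem = cluster.map_shared_rank(smem, peer);

    if (threadIdx.x == 0)
        out[blockIdx.x] = peer_smem[0]; // read the peer's value
    cluster.sync();                     // keep peer shared memory alive until all reads finish
}

int main()
{
    int *out = nullptr;
    cudaMallocManaged(&out, 2 * sizeof(int));
    exchange_kernel<<<2, 32>>>(out);    // 2 blocks = exactly 1 cluster
    cudaDeviceSynchronize();
    printf("block 0 read %d, block 1 read %d\n", out[0], out[1]); // expect 1 and 0
    cudaFree(out);
    return 0;
}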
 
The H100 is built on the PG520 PCB, which has over 30 power VRMs and a massive integral interposer that uses TSMC's CoWoS tech to combine the Hopper H100 GPU with a 6-stack HBM3 design. Some of the main technologies of the Hopper H100 GPU include:

  • 132 SMs (2x Performance Per Clock)
  • 4th Gen Tensor Cores
  • Thread Block Clusters
  • 2nd Gen Multi-Instance GPU
  • Confidential Computing
  • PCIe Gen 5.0 Interface
  • World's First HBM3 DRAM
  • Larger 50 MB L2 Cache
  • 4th Gen NVLink (900 GB/s Total Bandwidth)
  • New SHARP support
  • NVLink Network
Out of the six stacks, two stacks are kept to ensure yield integrity. But the new HBM3 standard allows for up to 80 GB of capacity at 3 TB/s, which is crazy. For comparison, the current fastest gaming graphics card, the RTX 3090 Ti, offers just 1 TB/s of bandwidth and a 24 GB VRAM capacity. Other than that, the H100 Hopper GPU also packs in the latest FP8 data format, and its new SXM connection helps accommodate the 700 W power design the chip is built around. It also offers twice the FP32 and FP64 FMA rates and 256 KB of L1 cache (shared memory).
...
Rounding up the performance figures, NVIDIA's GH100 Hopper GPU will offer 4000 TFLOPS of FP8, 2000 TFLOPS of FP16, 1000 TFLOPS of TF32 and 60 TFLOPS of FP64 compute performance. These record-shattering figures decimate every HPC accelerator that came before it. For comparison, this is 3.3x faster than NVIDIA's own A100 GPU and 28% faster than AMD's Instinct MI250X in FP64 compute. In FP16 compute, the H100 GPU is 3x faster than A100 and 5.2x faster than MI250X, which is literally bonkers.
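On the FP8 data format mentioned above, here's a minimal sketch (illustrative, not from the article) of round-tripping a value through the two 8-bit floating-point encodings Hopper's tensor cores consume, assuming the cuda_fp8.h header that ships with CUDA 11.8 and later:

// Sketch: convert a float to Hopper's two FP8 encodings and back on the host.
// Assumes CUDA 11.8+ (cuda_fp8.h). Compile with: nvcc fp8_demo.cu
#include <cstdio>
#include <cuda_fp8.h>

int main()
{
    float x = 0.3333f;
    __nv_fp8_e4m3 a(x);   // 4 exponent bits, 3 mantissa bits: more precision
    __nv_fp8_e5m2 b(x);   // 5 exponent bits, 2 mantissa bits: more range
    printf("input %f -> e4m3 %f, e5m2 %f\n", x, float(a), float(b));
    return 0;
}

The e4m3 result lands closer to the input, while e5m2 trades precision for dynamic range; which one a kernel uses is a per-operand choice.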
 
Didn't people here laugh at AMD going 1:1 FP64? Yet here we are a generation later and NVIDIA is doing the exact same thing (Intel too).
 
According to the specs, H100 is 1:2 non-tensor and roughly 1:17 tensor (FP64 vs. TF32: about 60 TFLOPS vs. 1000 TFLOPS from the figures above).

Not that I’m laughing at either NVIDIA or AMD’s decisions here. I’m sure they’ve worked out the math.
Was the link above wrong, then? It at least lists a 1:1 FP32:FP64 core ratio.
 

 
NVIDIA says H100 is up to 4x faster than A100.

[Image: NVIDIA slide comparing H100 performance to A100]


 
He said that running the HPL benchmark is different from engaging the system while running scientific applications “without a hardware failure, without a hiccup in the network, and getting everything tuned.”

Distributed computing is hard. Just building a Linpack chip and outdated interconnects is not enough. Optical NVLink is so far ahead right now that real scientific workloads will run so much better on their platform.
 

Distributed computing is hard. Just building a Linpack chip and outdated interconnects is not enough. Optical NVLink is so far ahead right now that real scientific workloads will run so much better on their platform.
Seriously, you need to fix your hostility.
The Slingshot interconnect, launched in 2020, was definitely not outdated when they specced Frontier and the nodes it uses. Optical NVLink was nowhere near ready when those systems were specced.
 