Nvidia Hopper Speculation, Rumours and Discussion

It featured two dies, but each die works as a separate GPU: essentially two CrossFire GPUs on a single PCB, with all of the problems associated with such a configuration. It is not a true MCM design where all the chiplets work together as one coherent big GPU.

Yeah, but that would only be true for a gaming GPU.
An HPC system consists of thousands of such interconnected nodes, so software can and must split the workload between them; without that there would be no supercomputer at all. It's a built-in feature. Two distinct GPUs don't look as bad as you try to picture them (see the sketch below).
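To make that concrete, here is a minimal sketch of the kind of explicit splitting HPC software already does: one workload partitioned across every visible GPU through the plain CUDA runtime API, with no single-GPU illusion anywhere. The kernel and all names here are illustrative assumptions, not taken from any real codebase.

```cuda
// Illustrative sketch: explicitly partition one workload across all
// visible GPUs, the way HPC codes already split work between devices.
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 24;
    std::vector<float> host(n, 1.0f);

    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    if (devCount == 0) return 1;

    // Each device owns one contiguous chunk; the split is explicit.
    size_t chunk = (n + devCount - 1) / devCount;
    std::vector<float*> dbuf(devCount, nullptr);

    for (int d = 0; d < devCount; ++d) {
        size_t off = (size_t)d * chunk;
        size_t len = off < n ? std::min(chunk, n - off) : 0;
        if (len == 0) continue;
        cudaSetDevice(d);
        cudaMalloc(&dbuf[d], len * sizeof(float));
        cudaMemcpy(dbuf[d], host.data() + off, len * sizeof(float),
                   cudaMemcpyHostToDevice);
        scale<<<(unsigned)((len + 255) / 256), 256>>>(dbuf[d], 2.0f, len);
    }
    for (int d = 0; d < devCount; ++d) {
        size_t off = (size_t)d * chunk;
        size_t len = off < n ? std::min(chunk, n - off) : 0;
        if (len == 0) continue;
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaMemcpy(host.data() + off, dbuf[d], len * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(dbuf[d]);
    }
    printf("host[0] = %f\n", host[0]); // expect 2.0
    return 0;
}
```

In a real supercomputer, MPI (or similar) does the same splitting one level up, across nodes; the partitioning logic is the same idea at a larger scale.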
 
I've seen a suggestion that there will be a consumer version of Hopper.

I do wonder whether Lovelace is actually a consumer GPU. There was a suggestion at one point that Lovelace is for a new Nintendo.

NVidia's "Ampere next" and "Ampere next next" naming games are certainly fun...
 
I've seen a suggestion that there will be a consumer version of Hopper.
I doubt it would make much sense as a consumer GPU, but then again, who knows what they'll do against some $5,000 competing product. A 10% win at $5,000 over a $1,000 product is considered a win these days, right?
 
Wake me up when software actually treats these things as a single GPU.
The single-GPU abstraction would need to be created at some level in the software stack, because the hardware no longer looks like that. While providing that abstraction universally (e.g., in the driver) may be useful for scaling some software (e.g., legacy code), actually exposing the non-uniformity of the underlying hardware lets more sophisticated software squeeze out efficiency. Given the way silicon scaling is going, we should expect this trend to continue. It doesn't work for all workloads, but it is tractable for some.
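As a toy example of what "exposing the non-uniformity" can look like at the lowest level: the CUDA runtime already lets software ask which device pairs have a direct peer path (e.g., over NVLink) and which would need host staging. The interpretation printed below is an assumption for illustration; real topology discovery is richer (NVML, etc.).

```cuda
// Illustrative sketch: probe the peer-access topology between GPUs so
// a scheduler could place communicating work on well-connected pairs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    for (int a = 0; a < devCount; ++a) {
        for (int b = 0; b < devCount; ++b) {
            if (a == b) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, a, b);
            printf("GPU %d -> GPU %d : %s\n", a, b,
                   canAccess ? "direct peer access" : "via host staging");
        }
    }
    return 0;
}
```

A scheduler armed with that map can co-locate tightly coupled kernels on well-connected dies, which is exactly the kind of efficiency a universal single-GPU abstraction would hide.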

But I'll play devil's advocate for a second. We've seen this scenario (kinda) play out in the VLIW-vs-OOO/superscalar CPU space. Both architectures expose a single-threaded programming model to the high-level programmer. VLIW's approach is that the hardware provides the parallel substrate and relies on an amazing (and sometimes non-existent) compiler to discover the ILP, while an OOO/superscalar processor does that in silicon. OOO/superscalars won that battle handily and dominated the general-purpose compute space, while VLIWs stayed in their niches (e.g., image processors).

So why do I expect things to be different this time? The simple answer is necessity. We desperately need that efficiency, and the foundries are running out of tricks to play against Mother Physics.
 
A single-GPU abstraction doesn't make much sense to pursue for HPC applications, as these are made to scale to hundreds or thousands of GPU dies anyway.
In fact, it can be counterproductive: the thing (s/w or h/w) providing this abstraction can get in the way of code execution and reduce the transparency of what the system is actually doing.
 
A single-GPU abstraction doesn't make much sense to pursue for HPC applications, as these are made to scale to hundreds or thousands of GPU dies anyway.
In fact, it can be counterproductive: the thing (s/w or h/w) providing this abstraction can get in the way of code execution and reduce the transparency of what the system is actually doing.
Certainly. But I was arguing that the asymmetries of the hardware are going to be exposed to more "mainstream" datacenter applications as well, not just HPC.
 
A single-GPU abstraction doesn't make much sense to pursue for HPC applications, as these are made to scale to hundreds or thousands of GPU dies anyway.
In fact, it can be counterproductive: the thing (s/w or h/w) providing this abstraction can get in the way of code execution and reduce the transparency of what the system is actually doing.

I agree. There’s little benefit for HPC. The real win would be for games, where the programming model is not multi-GPU friendly. But maybe we don’t need it that soon, and we can keep maxing out reticle limits on the next process node for a few more years.
 