Nvidia Hopper Speculation, Rumours and Discussion

It featured two dies, but each die works as a separate GPU: essentially two CrossFire GPUs on a single PCB, with all of the problems associated with such a configuration. It is not a true MCM design where all the chiplets work together as one coherent big GPU.

Yeah, but that would only be true for a gaming GPU.
An HPC system consists of thousands of such interconnected nodes, so software can and must split the workload between them; without that there would be no supercomputer at all. It's a built-in feature. Two distinct GPUs don't look as bad as you try to picture them (see the sketch below).
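To make that concrete, here is a minimal sketch of the kind of explicit splitting HPC software already does: one workload partitioned across every visible GPU through the plain CUDA runtime API, with no single-GPU illusion anywhere. The kernel and all names here are illustrative assumptions, not taken from any real codebase.

```cuda
// Illustrative sketch: explicitly partition one workload across all
// visible GPUs, the way HPC codes already split work between devices.
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 24;
    std::vector<float> host(n, 1.0f);

    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    if (devCount == 0) return 1;

    // Each device owns one contiguous chunk; the split is explicit.
    size_t chunk = (n + devCount - 1) / devCount;
    std::vector<float*> dbuf(devCount, nullptr);

    for (int d = 0; d < devCount; ++d) {
        size_t off = (size_t)d * chunk;
        size_t len = off < n ? std::min(chunk, n - off) : 0;
        if (len == 0) continue;
        cudaSetDevice(d);
        cudaMalloc(&dbuf[d], len * sizeof(float));
        cudaMemcpy(dbuf[d], host.data() + off, len * sizeof(float),
                   cudaMemcpyHostToDevice);
        scale<<<(unsigned)((len + 255) / 256), 256>>>(dbuf[d], 2.0f, len);
    }
    for (int d = 0; d < devCount; ++d) {
        size_t off = (size_t)d * chunk;
        size_t len = off < n ? std::min(chunk, n - off) : 0;
        if (len == 0) continue;
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaMemcpy(host.data() + off, dbuf[d], len * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(dbuf[d]);
    }
    printf("host[0] = %f\n", host[0]); // expect 2.0
    return 0;
}
```

In a real supercomputer, MPI (or similar) does the same splitting one level up, across nodes; the partitioning logic is the same idea at a larger scale.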
 
I've seen a suggestion that there will be a consumer version of Hopper.

I do wonder whether Lovelace is actually a consumer GPU. There was a suggestion at one point that Lovelace is for a new Nintendo.

NVidia's "Ampere next" and "Ampere next next" naming games are certainly fun...
 
I've seen a suggestion that there will be a consumer version of Hopper.
I doubt it would make much sense as a consumer GPU, but then again, who knows what they'll do against some $5,000 competing product. A 10% win at $5,000 over a $1,000 product is considered a win these days, right?
 
Wake me up when software actually treats these things as a single GPU.
The single-GPU abstraction would need to be created at some level in the software stack, because the hardware no longer looks like that. While providing that abstraction universally (e.g., in the driver) may be useful for scaling some software (e.g., legacy code), actually exposing the non-uniformity of the underlying hardware lets more sophisticated software squeeze out efficiency. Given the way silicon scaling is going, we should expect this trend to continue. It doesn't work for all workloads, but it is tractable for some.
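As a toy example of what "exposing the non-uniformity" can look like at the lowest level: the CUDA runtime already lets software ask which device pairs have a direct peer path (e.g., over NVLink) and which would need host staging. The interpretation printed below is an assumption for illustration; real topology discovery is richer (NVML, etc.).

```cuda
// Illustrative sketch: probe the peer-access topology between GPUs so
// a scheduler could place communicating work on well-connected pairs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    for (int a = 0; a < devCount; ++a) {
        for (int b = 0; b < devCount; ++b) {
            if (a == b) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, a, b);
            printf("GPU %d -> GPU %d : %s\n", a, b,
                   canAccess ? "direct peer access" : "via host staging");
        }
    }
    return 0;
}
```

A scheduler armed with that map can co-locate tightly coupled kernels on well-connected dies, which is exactly the kind of efficiency a universal single-GPU abstraction would hide.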

But I'll play devil's advocate for a second. We've seen this scenario (kinda) play out in the VLIW-vs-OOO/superscalar CPU space. Both architectures expose a single-threaded programming model to the high-level programmer. VLIW's approach is that the hardware provides the parallel substrate and relies on an amazing (and sometimes non-existent) compiler to discover the ILP, while an OOO/superscalar processor does that in silicon. OOO/superscalars won that battle handily and dominated the general-purpose compute space, while VLIWs stayed in their niches (e.g., image processors).

So why do I expect things to be different this time? The simple answer is necessity. We desperately need that efficiency, and the foundries are running out of tricks to play against Mother Physics.
 
A single-GPU abstraction doesn't make much sense to pursue for HPC applications, as these are made to scale to hundreds or thousands of GPU dies anyway.
In fact, it can be counterproductive: the thing (s/w or h/w) providing this abstraction can get in the way of code execution and reduce the transparency of what the system is actually doing.
 
A single-GPU abstraction doesn't make much sense to pursue for HPC applications, as these are made to scale to hundreds or thousands of GPU dies anyway.
In fact, it can be counterproductive: the thing (s/w or h/w) providing this abstraction can get in the way of code execution and reduce the transparency of what the system is actually doing.
Certainly. But I was arguing that the asymmetries of the hardware are going to be exposed to more "mainstream" datacenter applications as well, not just HPC.
 
A single-GPU abstraction doesn't make much sense to pursue for HPC applications, as these are made to scale to hundreds or thousands of GPU dies anyway.
In fact, it can be counterproductive: the thing (s/w or h/w) providing this abstraction can get in the way of code execution and reduce the transparency of what the system is actually doing.

I agree. There’s little benefit for HPC. The real win would be for games, where the programming model is not multi-GPU friendly. But maybe we don’t need it that soon, and we can keep maxing out reticle limits on the next process node for a few more years.
 