NVIDIA COPA - Composable On-Package Architecture

Jawed

This seems to be purely for data centre compute:

2104.02188.pdf (arxiv.org)

"In this work, we demonstrate that diverging architectural requirements between the HPC and DL application domains put converged GPU designs on a trajectory to become significantly under-provisioned for DL and over-provisioned for HPC. We propose a new composable GPU architecture that leverages emerging circuit and packaging technologies to provide specialization, while maintaining substantial compatibility across product lines. We demonstrate that COPA-GPU architectures can enable selective deployment of on-package cache and off-chip DRAM resources, allowing manufacturers to easily tailor designs to individual domains."

So the focus here is that the cache and memory controllers are interchangeable, collectively called the Memory System Module (MSM). They attach via custom links to the GPU Module (GPM).
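As a toy illustration (my own sketch, not code or numbers from the paper), the composable idea boils down to pairing one shared compute die with whichever memory-system module suits the domain:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MSM:
    """One interchangeable Memory System Module option."""
    name: str
    l3_mb: int      # on-package L3 capacity in MB (0 = no L3 die)
    hbm_gb: int     # off-chip DRAM capacity in GB

@dataclass(frozen=True)
class CopaGPU:
    gpm: str        # the compute die, shared across product lines
    msm: MSM        # only the memory system is swapped per domain

# Hypothetical configurations; the paper's real design points differ.
GPM = "GPU-N compute die"
hpc = CopaGPU(GPM, MSM(name="baseline", l3_mb=0, hbm_gb=100))
dl  = CopaGPU(GPM, MSM(name="HBML+L3", l3_mb=1024, hbm_gb=160))  # 1.6x HBM, big L3
```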

Along the way I learnt that 826mm² appears to be the current reticle limit.
 
Oh nice find!

That's what NV came up with as it struggles to abandon the GPU culture it created in the enterprise market. In an era of specialized accelerators (GPU, AI/ML, FPGA, DPU/IPU, VPU), it's increasingly difficult to make one architecture perform well on every workload. As I said in the other topic, pure AI/ML players are putting more and more pressure on NVDA's dominance, and some compute gurus inside the green team are pushing for disruptive pure-AI silicon.

BTW I like this diagram:

GPM package options - 2021 Nvidia Research.png

Good summary of the package options available for next-gen silicon.
 
It’s strange that Nvidia acknowledges the obvious deficiencies of a jack-of-all-trades design yet insists on reusing the same fundamental compute architecture for both HPC and DL. Their DL competition is highly customized in both compute and memory systems and this proposal only addresses the latter. Seems like a losing bid.
 
I guess it all comes down to finding the right point in time when the paths must diverge. They used to sell their big-iron chips in consumer GeForces for a couple of years, up until GM200, and with P100 went to a diverged approach for consumer and HPC - with the HPC part retaining the ability to be sold as a high-end consumer part, like a gen later in the Titan V.
 
So their plan is to get more flexible in the memory department to address different DL and HPC markets, but not to split the actual GPU development?

I wonder how many people are using the GA100 for graphics, considering it still has a whopping 128 ROPs and 864 TMUs. At least they didn't put RT cores in it.
 
I think in this paper they are exploring the possibilities of a flexible memory interface/configuration. Once you've got that down, you can design all your (large, expensive, MCM) GPUs with those interfaces and put in between whatever is all the rage of the day.
 
"It’s strange that Nvidia acknowledges the obvious deficiencies of a jack-of-all-trades design yet insists on reusing the same fundamental compute architecture for both HPC and DL. [...] Seems like a losing bid."
NVidia is pretty safe with its software infrastructure for, let's say, five years.

This snippet is pretty worrying:

"Figure 12 shows that doubling and quadrupling the number of baseline GPU-N instances (2× GPU-Ns and 4× GPU-Ns) results in mean 29% and 43% performance gains respectively for our training workloads. We find that a DL-optimized HBML+L3 COPA-GPU configuration (with 27% performance gain) provides similar levels of performance to 2× GPU-Ns, yet should cost significantly less than buying and hosting 2× larger installations of traditional GPU-Ns.

[...]

HBML+L3 integrates 1.6× more HBM memory, resulting in total aggregate cost lower than 2× of GPU-N. Thus, DL-optimized COPA-GPUs will provide substantially better cost-performance at scale, saving on not just overall GPU cost but additional system-level collateral such as datacenter floorspace, CPUs, network switches, and other peripheral devices."

It demonstrates that in DL, scaling by simply using more GPUs is almost a dead end - and you can bet NVidia's customers have noticed. Sure, putting 1GB of L3 cache and much more HBM bandwidth on package is a radical, difficult change, but NVidia's competitors are doing radical, difficult things too.
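To make the quote's numbers concrete, here's a back-of-the-envelope perf-per-cost comparison. The relative performance figures come from the quote; the HBML+L3 cost factor is a placeholder I made up, since the paper only bounds it below 2x:

```python
# Back-of-the-envelope perf-per-cost from the quoted numbers. The multi-GPU
# costs are simple multiples; the HBML+L3 cost factor is my own placeholder
# (the paper only says "lower than 2x of GPU-N"), not a published figure.
configs = {
    "1x GPU-N": (1.00, 1.0),   # (relative perf, relative cost)
    "2x GPU-N": (1.29, 2.0),
    "4x GPU-N": (1.43, 4.0),
    "HBML+L3":  (1.27, 1.3),   # cost factor is a guess
}

for name, (perf, cost) in configs.items():
    print(f"{name:8s} perf/cost = {perf / cost:.2f}")
```

Even with a generous cost guess, 2x and 4x GPU-N land around 0.65 and 0.36 perf-per-cost versus roughly 0.98 for the DL-optimized single package, which is the whole point of the quoted passage.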

Much as patent documents usually reveal only a sliver of the future - because the products are far more complex than any single document can hint at - I think it's safe to assume that NVidia is also planning to do far more radical things inside the GPM.

Perhaps processing in memory is where DL is headed.
 
So, erm, Hopper being "chiplet"-based might be where we start to see COPA?

The thread indicates that there's a growing consensus that Hopper is data centre (Lovelace is gaming).

Videocardz's spin-off article refers back to tweets from May about the configuration of GH100:

NVIDIA Hopper GPU rumored to tape out soon - VideoCardz.com

So I suppose we now need to be aware of NVidia codenames for products that use the "composable" concepts. Hopper might be the architecture of the chiplets, but something else might be the name of the AI accelerator and something else again might be the name of the non-AI accelerator.

So a collection of codenames related to Hopper could be the first real clue that it is at the centre of a COPA-based family of products.

Of course, it might be too early for anything COPA-based.
 
Since Hopper, GH100 and the paper linked above are in the tweets all over again, here's a quote from the paper:
"We then forward-project the hardware capabilities of a hypothetical next-GPU configuration (GPU-N) using evolutionary scaling. We calculate the compute and memory bandwidth of GPU-N by linearly extrapolating these parameters from V100 to A100."
 