NVIDIA COPA - Composable On-Package Architecture

Jawed · Jun 27, 2021

This seems to be purely for data centre compute:

2104.02188.pdf (arxiv.org)

"In this work, we demonstrate that diverging architectural requirements between the HPC and DL application domains put converged GPU designs on a trajectory to become significantly under-provisioned for DL and over-provisioned for HPC. We propose a new composable GPU architecture that leverages emerging circuit and packaging technologies to provide specialization, while maintaining substantial compatibility across product lines. We demonstrate that COPA-GPU architectures can enable selective deployment of on-package cache and off-chip DRAM resources, allowing manufacturers to easily tailor designs to individual domains."

So the focus here is that the cache and memory controllers are interchangeable; collectively they're called the Memory System Module. It attaches to the GPU Module (GPM) using custom links.
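To make the composition idea concrete, here's a minimal Python sketch, assuming one shared GPU Module paired with interchangeable Memory System Module variants. The class names, fields and the HPC/DL variants are hypothetical illustrations, not the paper's design. The 1024 MB L3 echoes the "1GB of L3" mentioned later in the thread, and the 1.6× HBM factor is the HBML figure from the paper quote further down.

```python
from dataclasses import dataclass

# Hypothetical model of COPA-style composition: the compute die (GPM) is
# shared, only the memory system module changes per market.


@dataclass(frozen=True)
class GpuModule:
    name: str


@dataclass(frozen=True)
class MemorySystemModule:
    name: str
    l3_cache_mb: int   # 0 means no on-package L3 cache
    hbm_scale: float   # HBM relative to the baseline memory system (assumed)


@dataclass(frozen=True)
class CopaGpu:
    gpm: GpuModule
    msm: MemorySystemModule

    def describe(self) -> str:
        l3 = f"{self.msm.l3_cache_mb} MB L3" if self.msm.l3_cache_mb else "no L3"
        return f"{self.gpm.name} + {self.msm.name}: {l3}, {self.msm.hbm_scale:.1f}x HBM"


GPM = GpuModule(name="GPM")

# Same compute module, different memory systems (illustrative values).
HPC_MSM = MemorySystemModule(name="HPC MSM", l3_cache_mb=0, hbm_scale=1.0)
DL_MSM = MemorySystemModule(name="DL MSM", l3_cache_mb=1024, hbm_scale=1.6)

products = [CopaGpu(GPM, HPC_MSM), CopaGpu(GPM, DL_MSM)]
for p in products:
    print(p.describe())
```

The point of the design is that both products reference the same `GPM` object; only the memory side differs, which is what lets one compute design serve both HPC and DL product lines.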

Along the way I learnt that 826mm² appears to be the current reticle limit.

xpea · Jun 28, 2021

Oh, nice find!

That's what NV came up with as they struggle to abandon the GPU culture they created in the enterprise market. In a time of specialized accelerators (GPU, AI/ML, FPGA, DPU/IPU, VPU), it's more and more difficult to make one arch perform well on every workload. As I said in the other topic, pure AI/ML players are putting more and more pressure on NVDA's dominance, and some compute gurus inside the green team are pushing for disruptive pure AI silicon.

BTW, I like this diagram:

Good summary of the packaging options available for next-gen silicon.

trinibwoy · Jun 28, 2021

It’s strange that Nvidia acknowledges the obvious deficiencies of a jack-of-all-trades design yet insists on reusing the same fundamental compute architecture for both HPC and DL. Their DL competition is highly customized in both compute and memory systems and this proposal only addresses the latter. Seems like a losing bid.

CarstenS · Jun 28, 2021

I guess it all comes down to finding the right point in time when paths must diverge. For example, they sold their big-iron chips in consumer GeForces for a couple of years until GM200, and with P100 they moved to a diverged approach for consumer and HPC, with the HPC part retaining the ability to be sold as a high-end consumer part, as happened a generation later with Titan V.

Deleted member 13524 · Jun 28, 2021

So their plan is to get more flexible in the memory department to address different DL and HPC markets, but not to split the actual GPU development?

I wonder how many people are using the GA100 for graphics, considering it still has a whopping 128 ROPs and 864 TMUs. At least they didn't put RT cores in it.

CarstenS · Jun 28, 2021

I think in this paper they are exploring the possibilities of a flexible memory interface/configuration. Once you've got that down, you can design all your (large, expensive, MCM) GPUs with those interfaces and put in between whatever is all the rage of the day.

Bondrewd · Jun 28, 2021

Jawed said:
Along the way I learnt that 826mm² appears to be the current reticle limit.

858mm² actually.
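For reference, 858 mm² is just the standard 26 mm × 33 mm scanner exposure field, while 826 mm² is GA100's die size, which sits under that limit. A small sketch of the arithmetic, assuming that standard field size; it also shows that a die has to fit both field dimensions, so matching the area alone isn't enough. The function names are illustrative only.

```python
# Maximum exposure field of current lithography scanners, at wafer scale.
FIELD_W_MM = 26
FIELD_H_MM = 33


def reticle_limit_mm2() -> int:
    """Largest die area a single exposure field can pattern."""
    return FIELD_W_MM * FIELD_H_MM


def fits_in_field(die_w_mm: float, die_h_mm: float) -> bool:
    """A die must fit both field dimensions (optionally rotated 90 degrees),
    so having an area under the limit is not sufficient on its own."""
    return ((die_w_mm <= FIELD_W_MM and die_h_mm <= FIELD_H_MM)
            or (die_w_mm <= FIELD_H_MM and die_h_mm <= FIELD_W_MM))


GA100_AREA_MM2 = 826  # GA100 die area, the figure quoted earlier in the thread

print(reticle_limit_mm2())                   # 858
print(reticle_limit_mm2() - GA100_AREA_MM2)  # 32
print(fits_in_field(30.0, 28.6))             # False: 858 mm² in area but too wide
```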

Jawed · Jun 28, 2021

trinibwoy said:
It’s strange that Nvidia acknowledges the obvious deficiencies of a jack-of-all-trades design yet insists on reusing the same fundamental compute architecture for both HPC and DL. Their DL competition is highly customized in both compute and memory systems and this proposal only addresses the latter. Seems like a losing bid.

NVidia is pretty safe with its software infrastructure for, let's say, five years.

This snippet is pretty worrying:

"Figure 12 shows that doubling and quadrupling the number of baseline GPU-N instances (2× GPU-Ns and 4× GPU-Ns) results in mean 29% and 43% performance gains respectively for our training workloads. We find that a DL-optimized HBML+L3 COPA-GPU configuration (with 27% performance gain) provides similar levels of performance to 2× GPU-Ns, yet should cost significantly less than buying and hosting 2× larger installations of traditional GPU-Ns.

[...]

HBML+L3 integrates 1.6× more HBM memory, resulting in total aggregate cost lower than 2× of GPU-N. Thus, DL-optimized COPA-GPUs will provide substantially better cost-performance at scale, saving on not just overall GPU cost but additional system-level collateral such as datacenter floorspace, CPUs, network switches, and other peripheral devices."

It demonstrates that in DL, scaling by using more GPUs is almost a dead end - and you can bet NVidia's customers have noticed. Sure, adding 1GB of L3 cache and much more HBM bandwidth is a radical, difficult change, but NVidia's competitors are doing radical, difficult things too.
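The numbers in that snippet can be turned into a rough back-of-the-envelope check. This is a minimal sketch, assuming (hypothetically) that cost scales linearly with the number of GPU-Ns and ignoring the system-level costs the paper also mentions; only the speedups come from the quote.

```python
# Mean training speedups quoted from the COPA-GPU paper, relative to one GPU-N.
MULTI_GPU_SPEEDUP = {2: 1.29, 4: 1.43}  # number of GPU-Ns -> speedup
COPA_HBML_L3_SPEEDUP = 1.27             # single DL-optimized COPA-GPU


def scaling_efficiency(speedup: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved by n GPUs."""
    return speedup / n_gpus


def break_even_cost(copa_speedup: float, multi_speedup: float, n_gpus: int) -> float:
    """COPA-GPU cost, in units of one GPU-N, at which its performance per unit
    cost equals that of n GPU-Ns.

    Assumes (hypothetically) that cost scales linearly with GPU count and
    ignores system-level costs such as CPUs, switches and floorspace.
    """
    multi_perf_per_cost = multi_speedup / n_gpus
    return copa_speedup / multi_perf_per_cost


for n, speedup in MULTI_GPU_SPEEDUP.items():
    print(f"{n}x GPU-N: {scaling_efficiency(speedup, n):.1%} of linear scaling")

print(f"COPA break-even vs 2x GPU-N: "
      f"{break_even_cost(COPA_HBML_L3_SPEEDUP, MULTI_GPU_SPEEDUP[2], 2):.2f}x GPU-N cost")
```

Under these assumptions, two GPU-Ns deliver roughly 65% of linear scaling and four roughly 36%, and the single COPA-GPU matches 2× GPU-N on performance per unit cost as long as it costs less than about 1.97 GPU-Ns, before counting any of the system-level savings.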

Just as patent documents usually reveal only a sliver of the future, because the products are way more complex than any single document can hint at, I think it's safe to assume that NVidia is also planning to do far more radical things inside the GPM.

Perhaps processing in memory is where DL is headed.

Deleted member 13524 · Jun 28, 2021

Jawed said:
Perhaps processing in memory is where DL is headed.

Context for your post:
https://forum.beyond3d.com/threads/samsung-hbm-pim-processing-in-memory.62280/
https://www.tomshardware.com/uk/news/samsung-hbm2-hbm-pim-memory-tflops

Jawed · Jul 21, 2021

So, erm, Hopper being "chiplet" based might be where we start to see COPA?:

https://twitter.com/x/status/1417732487803924481

The thread indicates there's a growing consensus that Hopper is for the data centre (Lovelace is for gaming).

VideoCardz's spin-off article refers back to tweets from May about the configuration of GH100:

NVIDIA Hopper GPU rumored to tape out soon - VideoCardz.com

So I suppose we now need to be aware of NVidia codenames for products that use the "composable" concepts. Hopper might be the architecture of the chiplets, but something else might be the name of the AI accelerator and something else again might be the name of the non-AI accelerator.

So a collection of codenames related to Hopper could be the first real clue that it is at the centre of a COPA-based family of products.

Of course, it might be too early for anything COPA-based.

CarstenS · Dec 15, 2021

Since Hopper, GH100 and the paper linked above are in the tweets all over again, here's a quote from the paper:
"We then forward-project the hardware capabilities of a hypothetical next-GPU configuration (GPU-N) using evolutionary scaling. We calculate the compute and memory bandwidth of GPU-N by linearly extrapolating these parameters from V100 to A100."
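The quote doesn't spell out exactly how the extrapolation was done, so here's a minimal sketch of two plausible readings, using the published V100 (SXM2) and A100 (SXM4 40GB) figures for HBM bandwidth and dense FP16 Tensor Core throughput. Whether either variant reproduces the paper's actual GPU-N parameters is an assumption I haven't checked; the function names are illustrative.

```python
# Published spec-sheet figures: HBM bandwidth in GB/s and dense FP16
# Tensor Core throughput in TFLOPS.
V100 = {"mem_bw_gbs": 900.0, "fp16_tflops": 125.0}
A100 = {"mem_bw_gbs": 1555.0, "fp16_tflops": 312.0}


def extrapolate_additive(prev: float, cur: float) -> float:
    """Repeat the absolute V100->A100 increase once more (straight-line step)."""
    return cur + (cur - prev)


def extrapolate_ratio(prev: float, cur: float) -> float:
    """Repeat the relative V100->A100 growth factor once more."""
    return cur * (cur / prev)


for key in V100:
    add = extrapolate_additive(V100[key], A100[key])
    ratio = extrapolate_ratio(V100[key], A100[key])
    print(f"{key}: additive {add:.0f}, ratio {ratio:.0f}")
```

The two readings diverge quickly (for example, about 2.2 TB/s versus about 2.7 TB/s of bandwidth), so any comparison between GPU-N and a real next-generation part depends heavily on which one the authors meant.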

troyan · Jan 7, 2022

nVidia published more information about their COPA research: https://dl.acm.org/doi/10.1145/3484505#d1e1405

The Next Platform has a summary: https://www.nextplatform.com/2022/0...s-a-course-to-multiple-multichip-gpu-engines/

CarstenS · Jan 7, 2022

At first glance, it looks like the same publication as in the first post, which in turn means the information is from April 5th, 2021: https://arxiv.org/pdf/2104.02188.pdf

Jawed · Jan 7, 2022

Yes, identical URLs.
