AMD Execution Thread [2023]

Maybe try getting two compute dies to work together before trying a few dozen? Just a thought.
Maybe it was supposed to be for high-end RDNA3 with two GCDs, which never materialized... so I guess RDNA4 is/was more advanced in that regard.
Perhaps somebody knows more.

Anyway, I must admit this Navi 4C experiment was a step too far - that's why they canceled it.
The thing was too hard to get working with too many chiplets... and I would expect RDNA5 chiplets to have a somewhat simplified design
 
Those are two GPUs on one OAM.

Doh. AMD still calls it 'GCD' though.

So it's one MCM package, but the ROCm drivers present these GCDs as two GPUs with two separate device memories, and it's the HIP/OpenCL application's job to enumerate the devices and to schedule the workload across multiple GPUs...

AMD Instinct™ MI200 is built on advanced packaging technologies, enabling two GCDs to be integrated into a single package in the OAM (OCP Accelerator Module) form factor in the MI250 and MI250X products... employing AMD’s unique Infinity Fabric to extend the on-die fabric across the package so that each GCD appears as a GPU in one shared memory system.

https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-mi200-memory-space-overview/
The MI250 and MI250X GPUs are OCP Accelerator Modules (OAMs) comprised of two GCDs with 128 GB of total memory but are presented to software as two unique devices with separate 64GB blocks of VRAM
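For the curious, here's roughly what that enumeration looks like from the HIP side - a minimal sketch using only the standard runtime calls (nothing MI250-specific; the device count and names are just whatever the ROCm driver reports):

Code:
// List every GPU the ROCm driver exposes - on an MI250 OAM, each GCD
// shows up as its own device with its own block of VRAM.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::printf("no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        // totalGlobalMem is per-device: each MI250 GCD reports its own 64 GB.
        std::printf("device %d: %s, %.0f GB\n",
                    i, prop.name, prop.totalGlobalMem / 1.0e9);
    }
    return 0;
}

A node with four MI250 OAMs would print eight devices; pooling them is entirely up to the application or the framework above it.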


BTW, I didn't know they had even developed a new mezzanine card form factor, OAM (OCP Accelerator Module): https://www.opencompute.org/projects/open-accelerator-infrastructure

Each server blade has a Universal Base Board hosting up to eight 102x170 mm OAM cards, each with top/bottom stiffener plates, two 688-pin Molex Mirror Mezz 15x11 connectors (176 shielded differential pairs each, up to 7 PCIe 6.0 x16 links in total), 48-59 V power, and air or liquid cooling (600 W to 1000 W TBP), with reference heatsink and coldplate designs... These could be employed in high-end PCs that make a $7,500 Apple Mac Pro look affordable!


this Navi 4C experiment was a step too far
Isn't that what you would expect from chiplets in a vertically stacked MCM package?
 
Isn't that what you would expect from chiplets in a vertically stacked MCM package?
I think most people would be grateful for a functional two-GCD MCM design (at least for this generation), as opposed to some über-complicated, cool-looking "AMD Ponte Vecchio"-esque thing...
 
most people would be grateful for a functional two-GCD MCM design
Two GCDs in an MCM package would still be two separate GPUs, and DXGI / WDDM 2.0 only has explicit multi-adapter, either as several separate devices (i.e. iGPU and dGPU) or as several 'nodes' in the primary device (for identical discrete GPUs).

How many Direct3D 12 applications support SLI-style alternate-frame rendering, besides a few demos and code samples?
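To make "explicit" concrete, here's a hedged sketch of the enumeration side in plain DXGI/D3D12 (standard API calls only, nothing vendor-specific): the application walks the adapter list itself and can query how many 'nodes' hide behind a linked adapter:

Code:
// Explicit multi-adapter: the D3D12 application enumerates adapters itself
// and checks GetNodeCount() for linked GPUs behind a single adapter.
#include <dxgi1_6.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cwchar>
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory6> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_12_0,
                                        IID_PPV_ARGS(&device)))) {
            // GetNodeCount() > 1 means identical GPUs linked into one adapter.
            std::wprintf(L"adapter %u: %s, %u node(s)\n",
                         i, desc.Description, device->GetNodeCount());
        }
    }
    return 0;
}

Everything beyond this point - duplicating resources, cross-GPU synchronization, alternating frames - is code the application has to write itself, which is exactly why so few titles bother.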


Remember they also have the 'Navi4M' design, which is supposedly a monolithic GCD in Navi 43; its integration into larger MCM modules was cancelled as well, though the single-GCD product is still on track. So the problem wasn't the number of chiplets they divided the graphics processor die into, but more likely the control and coherence protocols...
 
The thing was too hard to get working with too many chiplets
That wasn't the issue.
and I would expect RDNA5 chiplets to have a somewhat simplified design
It's even more complex.
I think most people would be grateful for a functional two-GCD MCM design (at least for this generation), as opposed to some über-complicated, cool-looking "AMD Ponte Vecchio"-esque thing...
Oh no, that will never work.
 
So what was the issue then?
Our favourite legacy graphics APIs are weirdly serial in places and are hard to map onto, and validate against, the distributed fixed-function tiling N4x had.
It's a one-time hell for validation people and they've picked RDNA5 for that.
cross my fingers with RDNA5 development
We'll see.
Fundamentally, the hard parts of getting the basic tiling concept right were already done by MI300, which is why tiled N4x was a thing (and why Venice is what it is).
AMD customers deserve better than another disappointment...
There are no customers; their market share will be in single digits soon.
It's a bet on snapping necks hard enough and getting notoriety before selling anything.
 
Two GCDs in an MCM package would still be two separate GPUs, and DXGI / WDDM 2.0 only has explicit multi-adapter, either as several separate devices (i.e. iGPU and dGPU) or as several 'nodes' in the primary device (for identical discrete GPUs).
Why couldn't they present multiple GCDs to the operating system as one GPU?

CPUs are much more complex, and yet AMD have managed exactly this with CPUs.

Also, I'm pretty sure MI300 is not going to be viewed by the OS as multiple GPUs.
 
But there is no official confirmation that RDNA 4 high-end GPUs are even cancelled. It's still just a rumour, right?
Could it be that the "4C" multi-die package that was "cancelled" is just one of the experimental chips being tested, possibly in preparation for RDNA5? Perhaps people are wrongly interpreting what was leaked?
No one has mentioned that the "4X" chip is cancelled, right? "4X" could be Navi 41 and 42? We could still get an improved version of the current GCD+6xMCD?
 
But there is no official confirmation that RDNA 4 high-end GPUs are even cancelled.
They are.
Could it be that the "4C" multi-die package that was "cancelled" is just one of the experimental chips being tested, possibly in preparation for RDNA5?
No that was the halo.
"4X" could be Navi 41 and 42 ?
No.
We could still get an improved version of the current GCD+6xMCD?
No.
You get two tiny single dies for RDNA4.
That's it.
 
No.
You get two tiny single dies for RDNA4.
That's it.
How tiny is tiny?

That might not be so awful, so long as these mainstream parts finally have massively improved RT capabilities beyond RDNA2. AMD are going to be taking RT seriously for RDNA4 ... right?
 
It bears repeating that Bondrewd, despite posting as if he's an authority with direct inside sources, admits that he doesn't really know what he purports to know when it comes down to it, and on multiple occasions has struggled (or disappeared) when it came time to explain why he was very wrong about what he said would happen.

He's not an authority, but he's very good at pretending he is through very confidently worded posts.
 
How tiny is tiny?
They've been pretty consistent about keeping their small dies sub-250 mm^2, so there's that.
admits that he doesn't really know what he purports to know when it comes down to it
que.
has struggled (or disappeared) when it came time to explain why he was very wrong about what he said would happen.
The product forum was quarantined off, and no, I'm not posting the 2022 FAD RDNA3 slide again (because yes, they've missed - not even close to a 50% perf/W bump; wait for STX1 iGPU results to see if they managed to fix the thing).
He's not an authority
no one's ever said I am.
I just know what I know.
but he's very good at pretending he is through very confidently worded posts.
que.
 
Our favourite legacy graphics APIs are weirdly serial in places and are hard to map onto, and validate against, the distributed fixed-function tiling N4x had. It's a one-time hell for validation people and they've picked RDNA5 for that.

Do they not simulate these chips in software first? They would be crazy to design a multi-GCD architecture without validating the concept in software many years earlier.
 
Also, I'm pretty sure MI300 is not going to be viewed by the OS as multiple GPUs.
If it's indeed a single GPU to the OS, then we still need to see how it performs in real workloads, and whether it suffers from any latency-related problems (like Zen CPUs did at the beginning).
 
Why couldn't they present multiple GCDs to the operating system as one GPU?
It's not about assigning PnP Device IDs - whether the OS sees two separate devices or a single device with several subfunctions, these are still two separate graphics processors connected to the host CPU's PCIe bus (or some internal data bus that mimics the PCIe protocol), each with its own register file and local VRAM.

If you load shader code and geometry/texture data into local memory and registers on the first GPU, this code/data will not auto-magically appear on the second GPU. It's the Direct3D/Vulkan/OpenCL application's responsibility to split the workload, and doing that optimally has never been trivial.
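As a hedged illustration of that responsibility, here's what a manual split looks like in HIP for a trivially parallel case (the scale kernel and the even halving are invented for the example; real rendering workloads are nowhere near this clean to partition):

Code:
// Splitting one array operation across two GPUs by hand: each device gets
// its own allocation, its own copy of the data, and its own kernel launch.
#include <hip/hip_runtime.h>
#include <vector>

__global__ void scale(float* data, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    const int n = 1 << 20, half = n / 2;
    std::vector<float> host(n, 1.0f);

    for (int dev = 0; dev < 2; ++dev) {
        hipSetDevice(dev);                        // select GPU 'dev'
        float* dbuf = nullptr;
        hipMalloc(&dbuf, half * sizeof(float));   // lives only in this GPU's VRAM
        hipMemcpy(dbuf, host.data() + dev * half,
                  half * sizeof(float), hipMemcpyHostToDevice);
        scale<<<(half + 255) / 256, 256>>>(dbuf, half, 2.0f);
        hipMemcpy(host.data() + dev * half, dbuf,
                  half * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(dbuf);
    }
    return 0;
}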

CPUs are much more complex, and yet AMD have managed exactly this with CPUs.
Well, consumer CPUs are actually less complex these days, but ATI/AMD did make multi-chip graphics cards like the Radeon HD 7990 (2013) and R9 295X2 (2014).

AMD still supports CrossFire (CrossFireX / XDMA-generation GPUs only) even in relatively recent Adrenalin drivers - you can install two identical PCIe graphics cards (i.e. the same GPU model and video RAM size), then enable CrossFire in the Control Panel, and the driver will try to split the workload by rendering each alternate frame on a different GPU. Multi-chip graphics cards work the same way, though their GPUs are connected by on-board PCIe bridge chips or on-die PCIe bridges with multiple x16 links.

The downside is that CrossFire only works in Direct3D 9/10/11 and OpenGL applications which have CrossFire profiles (though recent drivers have an 'automatic' mode). It doesn't work with Direct3D 12 (or Vulkan) applications at all - these have to use the explicit multi-adapter APIs described above, so you need to enable mGPU in the Control Panel to 'link' the two identical GPUs into a single adapter with two 'nodes'.

https://www.amd.com/en/support/kb/faq/dh-018
https://www.amd.com/en/support/kb/faq/dh2-018
https://www.amd.com/en/support/kb/faq/dh3-018

Either way, CrossFire shows very little gain in recent Direct3D 11 games.
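For completeness, here's a sketch of how those 'nodes' are addressed once mGPU is enabled - the per-node queue helper is my own invention for illustration, assuming a device already created on the linked adapter:

Code:
// With mGPU enabled, one ID3D12Device spans both GPUs; bit k of NodeMask
// selects physical GPU k for each queue, command list, and resource heap.
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12CommandQueue>> MakePerNodeQueues(ID3D12Device* device) {
    std::vector<ComPtr<ID3D12CommandQueue>> queues(device->GetNodeCount());
    for (UINT node = 0; node < queues.size(); ++node) {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
        desc.NodeMask = 1u << node;   // this queue executes on GPU 'node'
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queues[node]));
    }
    return queues;
}

Alternate-frame rendering then means the application records frame N against node N % 2, submits it to the matching queue, and keeps any shared data in sync across the GPUs itself.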


so long as these mainstream parts finally have massively improved RT capabilities beyond RDNA2
That could only happen if the RX 8600 had an ALU count comparable to the RX 6900/7900 - I'm not really sure that's possible for a 200 mm^2 die even on a 3 nm node.
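Back-of-the-envelope numbers behind that doubt, using public die sizes and CU counts; the N5-to-N3 density factor is my own guess, since SRAM and PHYs barely shrink even when logic does:

Code:
// Rough check: can ~200 mm^2 on N3 hold an RX 6900-class ALU count?
#include <cstdio>

int main() {
    // Navi 31 GCD (N5): ~304 mm^2 for 96 CUs (64 ALUs per CU).
    const double cus_per_mm2_n5 = 96.0 / 304.0;      // ~0.32 CU/mm^2
    const double n3_density_gain = 1.3;              // assumed, not measured
    const double die_mm2 = 200.0;

    const double cus = die_mm2 * cus_per_mm2_n5 * n3_density_gain;
    std::printf("~%.0f CUs (~%.0f ALUs) in %.0f mm^2 on N3\n",
                cus, cus * 64.0, die_mm2);           // prints ~82 CUs
    // RX 6900 XT has 80 CUs / 5120 ALUs, so it's borderline at best - and
    // a monolithic die also needs memory PHYs and cache the GCD excludes.
    return 0;
}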


The IFIS link between the two is very thin (around 400 GB/s bidirectional).
The question was about 'RDNA3 with two GCD', and Navi 31 GCDs would still be separate GPUs even if they had external Infinity Fabric links (which they don't).
 