AMD Execution Thread [2023]

Maybe try getting two compute dies to work together before trying a few dozen? Just a thought.
Maybe it was supposed to be for high-end RDNA3 with two GCDs, which never materialized... so I guess RDNA4 is/was more advanced in that regard.
Perhaps somebody knows more.

Anyway, I must admit this Navi 4C experiment was a step too far - that's why they canceled it.
The thing was too hard to get working with too many chiplets... and I would expect RDNA5 chiplets to have a somewhat simplified design
 
Those are two GPUs on one OAM.

Doh. AMD still calls it 'GCD' though.

So it's one MCM package, but the ROCm drivers present these GCDs as two GPUs with two separate device memories, and it's the HIP/OpenCL application's job to enumerate the devices and to schedule the workload across multiple GPUs...

AMD Instinct™ MI200 is built on advanced packaging technologies, enabling two GCDs to be integrated into a single package in the OAM (OCP Accelerator Module) form factor in the MI250 and MI250X products... employing AMD’s unique Infinity Fabric to extend the on-die fabric across the package so that each GCD appears as a GPU in one shared memory system.

https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-mi200-memory-space-overview/
The MI250 and MI250X GPUs are OCP Accelerator Modules (OAMs) comprised of two GCDs with 128 GB of total memory but are presented to software as two unique devices with separate 64GB blocks of VRAM
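For the curious, here's roughly what that enumeration looks like from the HIP side - a minimal sketch using only the standard runtime calls (nothing MI250-specific; the device count and names are just whatever the ROCm driver reports):

Code:
// List every GPU the ROCm driver exposes - on an MI250 OAM, each GCD
// shows up as its own device with its own block of VRAM.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::printf("no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        // totalGlobalMem is per-device: each MI250 GCD reports its own 64 GB.
        std::printf("device %d: %s, %.0f GB\n",
                    i, prop.name, prop.totalGlobalMem / 1.0e9);
    }
    return 0;
}

A node with four MI250 OAMs would print eight devices; pooling them is entirely up to the application or the framework above it.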


BTW, I didn't know they had even developed a new mezzanine card form factor, OAM (OCP Accelerator Module): https://www.opencompute.org/projects/open-accelerator-infrastructure

Each server blade has a Universal Base Board hosting up to eight 102x170 mm OAM cards, each with top/bottom stiffener plates, two 688-pin Molex Mirror Mezz 15x11 connectors (176 shielded differential pairs each, up to 7 PCIe 6.0 x16 links in total), 48-59 V power, and air or liquid cooling (600 W to 1000 W TBP), with reference heatsink and coldplate designs... These could be employed in high-end PCs that make a $7,500 Apple Mac Pro look affordable!


this Navi 4C experiment was a step too far
Isn't that what you would expect from chiplets in a vertically stacked MCM package?
 
Isn't that what you would expect from chiplets in a vertically stacked MCM package?
I think most people would be grateful for a functional two-GCD MCM design (at least for this generation), as opposed to some über-complicated, cool-looking "AMD Ponte Vecchio"-esque thing...
 
most people would be grateful for a functional two-GCD MCM design
Two GCDs in an MCM package would still be two separate GPUs, and DXGI / WDDM 2.0 only has explicit multi-adapter, either as several separate devices (i.e. iGPU and dGPU) or as several 'nodes' in the primary device (for identical discrete GPUs).

How many Direct3D 12 applications support SLI-style alternate-frame rendering, besides a few demos and code samples?
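To make "explicit" concrete, here's a hedged sketch of the enumeration side in plain DXGI/D3D12 (standard API calls only, nothing vendor-specific): the application walks the adapter list itself and can query how many 'nodes' hide behind a linked adapter:

Code:
// Explicit multi-adapter: the D3D12 application enumerates adapters itself
// and checks GetNodeCount() for linked GPUs behind a single adapter.
#include <dxgi1_6.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cwchar>
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory6> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_12_0,
                                        IID_PPV_ARGS(&device)))) {
            // GetNodeCount() > 1 means identical GPUs linked into one adapter.
            std::wprintf(L"adapter %u: %s, %u node(s)\n",
                         i, desc.Description, device->GetNodeCount());
        }
    }
    return 0;
}

Everything beyond this point - duplicating resources, cross-GPU synchronization, alternating frames - is code the application has to write itself, which is exactly why so few titles bother.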


Remember they also have the 'Navi4M' design, which is supposedly a monolithic GCD in Navi 43; its integration into larger MCM modules was cancelled as well, though the single-GCD product is still on track. So the problem wasn't the number of chiplets they divided the graphics processor die into, but more likely the control and coherence protocols...
 
The thing was too hard to get working with too many chiplets
That wasn't the issue.
and I would expect RDNA5 chiplets to have a somewhat simplified design
It's even more complex.
I think most people would be grateful for a functional two-GCD MCM design (at least for this generation), as opposed to some über-complicated, cool-looking "AMD Ponte Vecchio"-esque thing...
Oh no, that will never work.
 
So what was the issue then?
Our favourite legacy graphics APIs are weirdly serial in places and are hard to map onto, and validate against, the distributed fixed-function tiling N4x had.
It's a one-time hell for validation people and they've picked RDNA5 for that.
cross my fingers with RDNA5 development
We'll see.
Fundamentally, the hard parts of getting the basic tiling concept right were already done by MI300, which is why tiled N4x was a thing (and why Venice is what it is).
AMD customers deserve better than another disappointment...
There are no customers; their market share will be in single digits soon.
It's a bet on snapping necks hard enough and getting notoriety before selling anything.
 
Two GCDs in an MCM package would still be two separate GPUs, and DXGI / WDDM 2.0 only has explicit multi-adapter, either as several separate devices (i.e. iGPU and dGPU) or as several 'nodes' in the primary device (for identical discrete GPUs).
Why couldn't they present multiple GCDs to the operating system as one GPU?

CPUs are much more complex, and yet AMD have managed exactly this with CPUs.

Also, I'm pretty sure MI300 is not going to be viewed by the OS as multiple GPUs.
 
But there is no official confirmation that RDNA 4 high-end GPUs are even cancelled. It's still just a rumour, right?
Could it be that the "4C" multi-die package that was "cancelled" is just one of the experimental chips being tested, possibly in preparation for RDNA5? Perhaps people are wrongly interpreting what was leaked?
No one has mentioned that the "4X" chip is cancelled, right? "4X" could be Navi 41 and 42? We could still get an improved version of the current GCD+6xMCD?
 
But there is no official confirmation that RDNA 4 high-end GPUs are even cancelled.
They are.
Could it be that the "4C" multi-die package that was "cancelled" is just one of the experimental chips being tested, possibly in preparation for RDNA5?
No that was the halo.
"4X" could be Navi 41 and 42 ?
No.
We could still get an improved version of the current GCD+6xMCD?
No.
You get two tiny single dies for RDNA4.
That's it.
 
No.
You get two tiny single dies for RDNA4.
That's it.
How tiny is tiny?

That might not be so awful, so long as these mainstream parts finally have massively improved RT capabilities beyond RDNA2. AMD are going to be taking RT seriously for RDNA4 ... right?
 
It bears repeating that Bondrewd, despite posting as if he's an authority with direct inside sources, admits that he doesn't really know what he purports to know when it comes down to it, and on multiple occasions has struggled (or disappeared) when it came time to explain why he was very wrong about what he said would happen.

He's not an authority, but he's very good at pretending he is through very confidently worded posts.
 
How tiny is tiny?
They've been pretty consistent about keeping their small dies sub-250 mm^2, so there's that.
admits that he doesn't really know what he purports to know when it comes down to it
que.
has struggled (or disappeared) when it came time to explain why he was very wrong about what he said would happen.
The product forum was quarantined off, and no, I'm not posting the 2022 FAD RDNA3 slide again (because yes, they've missed - not even close to a 50% perf/W bump; wait for STX1 iGPU results to see if they managed to fix the thing).
He's not an authority
no one's ever said I am.
I just know what I know.
but he's very good at pretending he is through very confidently worded posts.
que.
 
Our favourite legacy graphics APIs are weirdly serial in places and are hard to map onto, and validate against, the distributed fixed-function tiling N4x had. It's a one-time hell for validation people and they've picked RDNA5 for that.

Do they not simulate these chips in software first? They would be crazy to design a multi-GCD architecture without validating the concept in software many years earlier.
 
Also, I'm pretty sure MI300 is not going to be viewed by the OS as multiple GPUs.
If it's indeed a single GPU to the OS, then we still need to see how it performs in real workloads, and whether it suffers from any latency-related problems (like Zen CPUs did at the beginning).
 
Why couldn't they present multiple GCDs to the operating system as one GPU?
It's not about assigning PnP Device IDs - whether the OS sees two separate devices or a single device with several subfunctions, these are still two separate graphics processors connected to the host CPU's PCIe bus (or some internal data bus that mimics the PCIe protocol), each with its own register file and local VRAM.

If you load shader code and geometry/texture data into local memory and registers on the first GPU, this code/data will not auto-magically appear on the second GPU. It's the Direct3D/Vulkan/OpenCL application's responsibility to split the workload, and doing that optimally has never been trivial.
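As a hedged illustration of that responsibility, here's what a manual split looks like in HIP for a trivially parallel case (the scale kernel and the even halving are invented for the example; real rendering workloads are nowhere near this clean to partition):

Code:
// Splitting one array operation across two GPUs by hand: each device gets
// its own allocation, its own copy of the data, and its own kernel launch.
#include <hip/hip_runtime.h>
#include <vector>

__global__ void scale(float* data, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    const int n = 1 << 20, half = n / 2;
    std::vector<float> host(n, 1.0f);

    for (int dev = 0; dev < 2; ++dev) {
        hipSetDevice(dev);                        // select GPU 'dev'
        float* dbuf = nullptr;
        hipMalloc(&dbuf, half * sizeof(float));   // lives only in this GPU's VRAM
        hipMemcpy(dbuf, host.data() + dev * half,
                  half * sizeof(float), hipMemcpyHostToDevice);
        scale<<<(half + 255) / 256, 256>>>(dbuf, half, 2.0f);
        hipMemcpy(host.data() + dev * half, dbuf,
                  half * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(dbuf);
    }
    return 0;
}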

CPUs are much more complex, and yet AMD have managed exactly this with CPUs.
Well, consumer CPUs are actually less complex these days, but ATI/AMD did make multi-chip graphics cards like the Radeon HD 7990 (2013) and R9 295X2 (2014).

AMD still supports CrossFire (CrossFireX / XDMA-generation GPUs only) even in relatively recent Adrenalin drivers - you can install two identical PCIe graphics cards (i.e. the same GPU model and video RAM size), then enable CrossFire in the Control Panel, and the driver will try to split the workload by rendering each alternate frame on a different GPU. Multi-chip graphics cards work the same way, though their GPUs are connected by on-board PCIe bridge chips or on-die PCIe bridges with multiple x16 links.

The downside is that CrossFire only works in Direct3D 9/10/11 and OpenGL applications which have CrossFire profiles (though recent drivers have an 'automatic' mode). It doesn't work with Direct3D 12 (or Vulkan) applications at all - these have to use the explicit multi-adapter APIs described above, so you need to enable mGPU in the Control Panel to 'link' the two identical GPUs into a single adapter with two 'nodes'.

https://www.amd.com/en/support/kb/faq/dh-018
https://www.amd.com/en/support/kb/faq/dh2-018
https://www.amd.com/en/support/kb/faq/dh3-018

Either way, CrossFire shows very little gain in recent Direct3D 11 games.
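For completeness, here's a sketch of how those 'nodes' are addressed once mGPU is enabled - the per-node queue helper is my own invention for illustration, assuming a device already created on the linked adapter:

Code:
// With mGPU enabled, one ID3D12Device spans both GPUs; bit k of NodeMask
// selects physical GPU k for each queue, command list, and resource heap.
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

std::vector<ComPtr<ID3D12CommandQueue>> MakePerNodeQueues(ID3D12Device* device) {
    std::vector<ComPtr<ID3D12CommandQueue>> queues(device->GetNodeCount());
    for (UINT node = 0; node < queues.size(); ++node) {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
        desc.NodeMask = 1u << node;   // this queue executes on GPU 'node'
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queues[node]));
    }
    return queues;
}

Alternate-frame rendering then means the application records frame N against node N % 2, submits it to the matching queue, and keeps any shared data in sync across the GPUs itself.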


so long as these mainstream parts finally have massively improved RT capabilities beyond RDNA2
That could only happen if the RX 8600 had an ALU count comparable to the RX 6900/7900 - I'm not really sure that's possible for a 200 mm^2 die even on a 3 nm node.
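Back-of-the-envelope numbers behind that doubt, using public die sizes and CU counts; the N5-to-N3 density factor is my own guess, since SRAM and PHYs barely shrink even when logic does:

Code:
// Rough check: can ~200 mm^2 on N3 hold an RX 6900-class ALU count?
#include <cstdio>

int main() {
    // Navi 31 GCD (N5): ~304 mm^2 for 96 CUs (64 ALUs per CU).
    const double cus_per_mm2_n5 = 96.0 / 304.0;      // ~0.32 CU/mm^2
    const double n3_density_gain = 1.3;              // assumed, not measured
    const double die_mm2 = 200.0;

    const double cus = die_mm2 * cus_per_mm2_n5 * n3_density_gain;
    std::printf("~%.0f CUs (~%.0f ALUs) in %.0f mm^2 on N3\n",
                cus, cus * 64.0, die_mm2);           // prints ~82 CUs
    // RX 6900 XT has 80 CUs / 5120 ALUs, so it's borderline at best - and
    // a monolithic die also needs memory PHYs and cache the GCD excludes.
    return 0;
}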


The IFIS link between the two is very thin (around 400 GB/s bidirectional).
The question was about 'RDNA3 with two GCD', and Navi 31 GCDs would still be separate GPUs even if they had external Infinity Fabric links (which they don't).
 