I wonder how they cope with the much higher data flow between those compute chiplets
Well, we never got any way to communicate across compute workgroups, other than through VRAM.
Thus we are already used to minimizing such communication, because we can assume it's prohibitively slow.
That's how GPGPU has worked since day one. Nothing changes. For pixel and vertex shaders there isn't even a general way to communicate with other threads in the same group.
Jawed has mentioned rasterization details.
Besides, if there were global ray reordering in HW (nobody does this afaict), moving rays across chiplets would become an issue. Just to mix in some hypothetical speculation.
So the only data flow across compute chiplets I see is getting work from some global queue, and eventually stealing / redistributing work across chiplets.
But that's very little data compared to the flow that happens while doing the actual work, which they have already solved with RDNA3. Basically just an index and a count per work item, for example.
Plus some context on the workloads, some synchronization primitives, etc. Seems like peanuts.
So, being an amateur about HW, my assumption actually is: it should be easy to make a compute-chiplet GPU by iterating on RDNA3, which already addressed the real bandwidth problems.