AMD RDNA4 Architecture Speculation

Charlietus · Feb 28, 2025

Bondrewd said:
no such thing as UDNA.

What makes you think that? It makes sense that they would want to unify the architecture again.

Bondrewd · Feb 28, 2025

Charlietus said:
What makes you think that?

CDNA is its very own thing.
Lisa doesn't even pretend otherwise.

Charlietus said:
It makes sense that they would want to unify the architecture again.

best I can give you is adding MFMA into client parts.
You can't unify client and DC parts, since they're making client shader cores a lot less orthodox.

DegustatoR · Feb 28, 2025

Samwell said:
You're very naive to think that there isn't a substantial price difference. Why should the suppliers even develop faster RAM if they can't charge a premium?
Especially at the beginning, with Samsung as the only supplier, Nvidia will pay a significantly higher amount. This might change at the end of the year, when Micron and Hynix have their GDDR7 on the market too.

I wouldn't be surprised if there weren't a big margin difference when comparing the 5070 Ti with the 9070 XT at launch price levels. But Nvidia has the 5080 to increase its margin on GB203, while the 9070 decreases AMD's total margin on the chip.

20-21 Gbps G6 isn't exactly slow, and it isn't available from just anyone either.

LordEC911 · Feb 28, 2025

DegustatoR said:
20-21 Gbps G6 isn't exactly slow, and it isn't available from just anyone either.

It has been in mass production at both Samsung and SK Hynix for at least the last ~2.5 years.

Charlietus · Feb 28, 2025

Subtlesnake said:
I mean the extra BoM on the PS5 Digital Edition is probably only $100 - $150. But it makes sense that Sony isn't going to do all that extra R&D for nothing.

By cheaper, I meant cheap motherfuckers lol

Sony just wanted profit, didn't care about anything else

Bondrewd · Feb 28, 2025

DegustatoR said:
20-21 Gbps G6 isn't exactly slow, and it isn't available from just anyone either.

It's been MP'd for a long long long while, RDNA2 refresh shipped with it I think.

QPlayer · Feb 28, 2025

pjbliverpool said:
According to TPU FSR4 requires 779 AI TOPS

Where do these numbers come from? By comparison, how much does DLSS4 require?

trinibwoy · Feb 28, 2025

Bondrewd said:
the most quirk chungus part is OoO memory fills a-la Cortex A510.

Very nice. Surprising that this isn’t already a thing on GPUs given all of the handwringing over memory stalls for the past 10+ years.

raytracingfan · Feb 28, 2025

LordEC911 said:
It has been in mass production at both Samsung and SK Hynix for at least the last ~2.5 years.

If regular GDDR6 can deliver those speeds then what was the point of GDDR6X?

Kaotik · Feb 28, 2025

pjbliverpool said:
According to TPU, FSR4 requires 779 AI TOPS, which pretty much confirms it has very little to do with PSSR (which runs on the PS5 Pro's 300 TOPS) and will hopefully be a much superior solution. Also, the 9070 (non-XT) offers almost 1200 TOPS, or around 4x the PS5 Pro's AI capability, at raster levels which are presumably more like 50% higher, so clearly there is little to no architectural relation there either from an AI perspective.

As a product the 9070XT seems pretty exciting. ~4070Ti Super level performance for 75% of the price with what will hopefully be an upscaler comparable to DLSS 3 along with comparable frame gen capabilities. They even apparently have their own AI based denoiser in response to Ray Reconstruction. Hopefully it's competitive.

QPlayer said:
Where do these numbers come from? By comparison, how much does DLSS4 require?

TPU has somehow misread the FSR 4 slide: "Up to 779 TOPS AI-Acceleration via AMD RDNA 4 Architecture" doesn't mean FSR 4 needs that much. In reality, that's what the RX 9070 XT delivers at the FP8 precision used by FSR 4.
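For anyone wondering how a headline figure like that is typically put together, here is a rough sketch of the usual peak-throughput arithmetic (units x ops per clock per unit x clock). The inputs are assumptions and not from this thread: 64 CUs, a 2.97 GHz boost clock, 2048 dense FP8 ops per clock per CU, and a 2x factor for structured sparsity. With those assumptions the sparse figure lands at roughly 779, which would fit the "up to" wording being a capability number rather than a requirement.

```python
def peak_tops(compute_units: int, ops_per_clock_per_cu: int, clock_ghz: float) -> float:
    """Peak throughput in TOPS: units x ops per clock per unit x clock."""
    ops_per_second = compute_units * ops_per_clock_per_cu * clock_ghz * 1e9
    return ops_per_second / 1e12


# Assumed inputs (not stated in the thread); treat them as illustrative.
CUS = 64                              # assumed compute unit count
BOOST_GHZ = 2.97                      # assumed boost clock
FP8_DENSE_OPS_PER_CLK_PER_CU = 2048   # assumed dense FP8 rate per CU
SPARSITY_FACTOR = 2                   # assumed 2x from structured sparsity

fp8_dense = peak_tops(CUS, FP8_DENSE_OPS_PER_CLK_PER_CU, BOOST_GHZ)
fp8_sparse = fp8_dense * SPARSITY_FACTOR

print(f"FP8 dense:  {fp8_dense:.1f} TOPS")
print(f"FP8 sparse: {fp8_sparse:.1f} TOPS")
```

This is a peak number only; it says nothing about how much of it an upscaler actually uses per frame.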

Bondrewd · Feb 28, 2025

raytracingfan said:
If regular GDDR6 can deliver those speeds then what was the point of GDDR6X?

G6X delivered them earlier.

LordEC911 · Feb 28, 2025

raytracingfan said:
If regular GDDR6 can deliver those speeds then what was the point of GDDR6X?

I would guess it was because Micron wanted a head start moving away from NRZ, and Nvidia wanted a clear roadmap to 20+ Gbps and better efficiency.
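To put the NRZ point in numbers, here is a small sketch of the signaling arithmetic. NRZ carries 1 bit per symbol and PAM4 (what GDDR6X uses) carries 2, so the same per-pin data rate needs half the symbol rate on the wire. The 256-bit bus width in the bandwidth example is only an illustrative assumption, and the function names are made up for this sketch.

```python
NRZ_BITS_PER_SYMBOL = 1   # two voltage levels
PAM4_BITS_PER_SYMBOL = 2  # four voltage levels


def symbol_rate_gbaud(data_rate_gbps: float, bits_per_symbol: int) -> float:
    """Symbol rate per pin needed to reach a given per-pin data rate."""
    return data_rate_gbps / bits_per_symbol


def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Total memory bandwidth in GB/s for a given per-pin rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8


# 20 Gbps per pin, with either signaling scheme.
print(symbol_rate_gbaud(20, NRZ_BITS_PER_SYMBOL))   # NRZ symbol rate
print(symbol_rate_gbaud(20, PAM4_BITS_PER_SYMBOL))  # PAM4 symbol rate

# Illustrative 256-bit bus at 20 Gbps per pin (bus width is an assumption).
print(bandwidth_gb_s(20, 256))
```

The total bandwidth is the same for both schemes at a given data rate; the difference is how fast the link has to toggle to get there.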

QPlayer · Feb 28, 2025

Kaotik said:
TPU has somehow misread the FSR 4 slide: "Up to 779 TOPS AI-Acceleration via AMD RDNA 4 Architecture" doesn't mean FSR 4 needs that much. In reality, that's what the RX 9070 XT delivers at the FP8 precision used by FSR 4.

Interesting, but one can assume that FSR4 would not run well on the previous generation. The TOPS value of the 9000 series is several times that of the 7000 series. That is a significant improvement.

Kaotik · Feb 28, 2025

QPlayer said:
Interesting, but one can assume that FSR4 would not run well on the previous generation. The TOPS value of the 9000 series is several times that of the 7000 series. That is a significant improvement.

IIRC AMD said they'll investigate whether it can be brought to at least part of the RX 7000 gen, but it will also be twice as heavy if they do (since they need to do it in FP16).

Sega_Model_4 · Feb 28, 2025

Bondrewd said:
CDNA is its very own thing.
Lisa doesn't even pretend otherwise.

best I can give you is adding MFMA into client parts.
You can't unify client and DC parts, since they're making client shader cores a lot less orthodox.

Were you in some cave?

AMD announces unified UDNA GPU architecture — bringing RDNA and CDNA together to take on Nvidia's CUDA ecosystem

Two become one.

www.tomshardware.com

Bondrewd · Feb 28, 2025

Sega_Model_4 said:
Were you in some cave?

that's an off-the-record remark that's not relevant.
MI400 is CDNA-next, listen to the last ER call.

Kaotik · Feb 28, 2025

RobertR1 said:
This all hinges on following through on msrp.

Only relative to what the 5070/5070 Ti will actually be available for. For example, in Finland the cheapest 5070 Ti actually available right now is 25%+ over its supposed MSRP.

raytracingfan · Feb 28, 2025

Kaotik said:
IIRC AMD said they'll investigate whether it can be brought to at least part of the RX 7000 gen, but it will also be twice as heavy if they do (since they need to do it in FP16).

If they can't get it running on the RDNA3/3.5 APUs, then there isn't much of a business case for that. Those APUs will continue to be manufactured and sold for years to come, unlike the RX 7000 series, which has now ended and has a minuscule market share.

arandomguy · Feb 28, 2025

no-X said:
All "4" TSMC processes are in fact derived from N5.

This seems like a semantics argument.

TSMC N4 is (at least as claimed by TSMC) an iterative node enhancement with density, efficiency and performance gains.

TSMC 4N, despite the naming, is by all reporting just a customization of TSMC N5.

trinibwoy · Mar 1, 2025

DegustatoR said:
GB203 has the same 64MB of L2 as N48 has for IC. The difference is just the 8MB of L2 on N48, which doesn't sound like a lot.

N48 also has 24MB in vector registers vs 21MB on GB203. And 6MB of WGP cache/LDS vs 10.5MB SM cache on GB203. Basically a wash in terms of on-chip storage.

One of the biggest differences between the architectures is scheduler to register ratio. Maximum thread occupancy is only 33% higher on N48 (16 vs 12) but it has 3x the register capacity per scheduler. In theory N48 should be much better at keeping its SIMDs fed with work on complex shaders.
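A quick sanity check of that ratio, as a sketch. The register totals (24MB and 21MB) and occupancy limits (16 and 12) are the ones from the post; the scheduler counts are assumptions not stated in it: 64 CUs x 2 SIMDs for N48 and 84 SMs x 4 partitions for GB203, plus 32 lanes per wave/warp and 4-byte registers.

```python
def regs_per_scheduler_kib(total_mib: float, schedulers: int) -> float:
    """Vector register capacity per scheduler, in KiB."""
    return total_mib * 1024 / schedulers


def registers_per_lane(per_scheduler_kib: float, max_waves: int,
                       lanes: int = 32, bytes_per_reg: int = 4) -> float:
    """32-bit registers available to each lane at maximum occupancy."""
    return per_scheduler_kib * 1024 / (max_waves * lanes * bytes_per_reg)


# Scheduler counts are assumptions (not from the post).
N48_SCHEDULERS = 64 * 2     # assumed: 64 CUs, 2 SIMDs each
GB203_SCHEDULERS = 84 * 4   # assumed: 84 SMs, 4 partitions each

n48_kib = regs_per_scheduler_kib(24, N48_SCHEDULERS)
gb203_kib = regs_per_scheduler_kib(21, GB203_SCHEDULERS)

print(f"N48:   {n48_kib:.0f} KiB per scheduler")
print(f"GB203: {gb203_kib:.0f} KiB per scheduler")
print(f"capacity ratio:  {n48_kib / gb203_kib:.2f}x")
print(f"occupancy ratio: {16 / 12:.2f}x")
print(f"N48 regs per lane at 16 waves:   {registers_per_lane(n48_kib, 16):.1f}")
print(f"GB203 regs per lane at 12 warps: {registers_per_lane(gb203_kib, 12):.1f}")
```

Under those assumptions the 3x per-scheduler figure and the 33% occupancy gap both check out, and each lane has noticeably more registers to work with at full occupancy on N48.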
