AMD RDNA3 Specifications Discussion Thread

Well, it's difficult to say whether it's fixed or not. In a mobile part you are power- and thermally limited, but if we see 33% more CUs at the same power with a slight frequency bump, that could mean an improvement on higher-powered parts as well. Unfortunately we will never know, because it seems no RDNA3.5 desktop parts are coming.
 
Well if they managed to fit 33% more CUs in the same power envelope, the power savings had to come from somewhere.
Like going from N5 to N4....

33% sounds like a lot here, but we're talking fairly small CU counts still. It's not gonna take a lot more to feed them.

So yea, hard to say for sure.
 
Let's wait for benchmarks before judging perf/watt changes, shall we? It's not like the 7900X is always at its 170W power limit in actual applications, and the same goes for the 7800X. Just setting the 9000-series parts' TDPs lower doesn't mean that perf/watt will improve much.
 
As for the 2.9GHz clock - it tells absolutely nothing. These values are “up-to”, not real world clocks. The 2.8GHz iGPU in Phoenix runs at 1.4GHz in some form factors under certain workloads, so these ~3GHz numbers are more or less marketing values. No technological conclusions can be drawn from them.
 
So, what would you prefer in the place of that meaty NPU -- more CUs or some amount of Infinity Cache?


Supposedly it was planned with 16 MB Infinity Cache at one point in its design phase, but it was dropped later for cost reasons. It would certainly have helped both power and performance, and been more useful than the NPU for most consumers. Phoenix had just 2 MB L2; Strix Point should have at least 4 MB, I would think (Intel has gone to 8 MB with Lunar Lake).

And Strix Halo has 32 MB Infinity cache as per rumoured specs.
 
As for the 2.9GHz clock - it tells absolutely nothing. These values are “up-to”, not real world clocks. The 2.8GHz iGPU in Phoenix runs at 1.4GHz in some form factors under certain workloads, so these ~3GHz numbers are more or less marketing values. No technological conclusions can be drawn from them.
Actually it tells a lot, as the messenger promised a 500-600 MHz clock increase over RDNA3

"baby clocks, RDNA3.5 gets a 500-600Mhz clock increase"
"RDNA4 clock about the same"
 
Actually it tells a lot, as the messenger promised a 500-600 MHz clock increase over RDNA3
500 MHz seems quite optimistic, but I would expect to see a clock speed increase at the TDP ranges these chips are operating at, all things being equal (12 CU vs 12 CU).
 
Actually it tells a lot, as the messenger promised a 500-600 MHz clock increase over RDNA3
Not really, you cannot conclude anything from "up to" specs between different generations. Unless we see an apples-to-apples comparison, say 12 CU vs 12 CU in a GPU-bound workload in the exact same chassis and TDP. Even then there will be some differences due to the CPU, but you would at least see whether the GPU clocks are meaningfully higher.
500 MHz seems quite optimistic, but I would expect to see a clock speed increase at the TDP ranges these chips are operating at, all things being equal (12 CU vs 12 CU).
Actually in typical thin and light laptops which are configured to run sub 30W, there might not be enough power headroom for either to hit their max clocks. A chassis with a 50W+ power limit CPU might actually have enough headroom but those usually come with dGPUs anyway.

Though in the thin and light laptops, the 16 CU part could operate at a lower frequency, as that's more power efficient, and still be faster. If RDNA 3.5 truly has a "fixed" V/F curve, it would be even better, though limited by memory bandwidth. I don't expect significant increases in iGPU performance until we get LPDDR6 (Strix Halo aside).
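To illustrate why the wider part at lower clocks can win at equal power, here's a toy model. Assumptions (mine, not from any AMD data): dynamic power scales as CUs × f × V², and voltage scales roughly linearly with frequency over the operating range, so power ∝ CUs × f³ and throughput ∝ CUs × f.

```python
def iso_power_speedup(cu_base: int, cu_wide: int) -> float:
    # Toy model: power ~ CUs * f * V^2, with V roughly proportional
    # to f over the operating range, so power ~ CUs * f^3.
    # At equal power the wider part clocks at f * (cu_base/cu_wide)^(1/3).
    f_ratio = (cu_base / cu_wide) ** (1.0 / 3.0)
    # Throughput ~ CUs * f, so the iso-power speedup is:
    return (cu_wide / cu_base) * f_ratio

# 12 CU (Phoenix-class) vs 16 CU (Strix Point-class) at the same power:
print(f"{iso_power_speedup(12, 16):.2f}x")  # 1.21x under these assumptions
```

So even clocking ~9% lower, the 16 CU part comes out ~20% ahead in this simplified model, ignoring bandwidth limits and static power.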
 
Actually in typical thin and light laptops which are configured to run sub 30W, there might not be enough power headroom for either to hit their max clocks. A chassis with a 50W+ power limit CPU might actually have enough headroom but those usually come with dGPUs anyway.
I was talking about clocks at a given TDP level, not max clocks. I think we're saying the exact same thing.
 

So that must be what their compiler implementation for sampler feedback looks like. An image resource descriptor modifier in its field that alters the behaviour of the image instructions ...
 
This will only materialize in APU form right?
Yep

Also Strix Point doesn't scale well with (GPU) clockspeed because it gets bandwidth restricted almost immediately. Makes me believe that rumor it was going to have 16 MB of cache for the GPU, but AMD made a quick turnaround to a huge NPU after Microsoft pressured everyone into putting "AI" in instead.

At least Strix Halo has a shared system-level cache, 32 MB apparently, so it'll scale much better if someone wants it in a NUC/Mac Studio kind of form factor.
 
Also Strix Point doesn't scale well for (GPU) clockspeed because it gets bandwidth restricted almost immediately. Makes me believe that rumor it was going to have 16mb of cache for the GPU, but AMD made a quick turn around to a huge NPU after Microsoft pressured everyone into putting "AI" in instead.
Tangential, but this is why I'm interested in how LPDDR6 CAMM2 does: a 192-bit-wide bus per module and roughly 2x the MT/s of DDR5/LPDDR5, potentially 3x the bandwidth of most dual-channel solutions today. 17.6 GT/s on a 192-bit bus = 422.4 GB/s; 14.4 GT/s = 345.6 GB/s. Both would drastically raise the ceiling on iGPU performance, considerably reducing a big laptop and handheld bottleneck, at least for a while.
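The bandwidth figures above are just transfer rate times bus width; a quick sketch to check them (the LPDDR5X dual-channel comparison point is my own assumption for a typical laptop today):

```python
def peak_bandwidth_gbps(gt_per_s: float, bus_bits: int) -> float:
    # Peak bandwidth in GB/s: transfers per second times bytes per transfer.
    return gt_per_s * bus_bits / 8

# Figures from the post: one LPDDR6 CAMM2 module with a 192-bit bus.
print(peak_bandwidth_gbps(17.6, 192))  # 422.4 GB/s
print(peak_bandwidth_gbps(14.4, 192))  # 345.6 GB/s

# Assumed baseline: dual-channel LPDDR5X-7500 on a 128-bit bus.
print(peak_bandwidth_gbps(7.5, 128))   # 120.0 GB/s
```

Against that 120 GB/s baseline, 345.6 GB/s is ~2.9x and 422.4 GB/s is ~3.5x, which lines up with the "potentially 3x" claim.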
 