AMD RDNA3 Specifications Discussion Thread

Well, it's difficult to say whether it's fixed or not. In a mobile part you are power- and thermally limited, but if we see 33% more CUs at the same power with a slight frequency bump, that could mean an improvement on higher-powered parts as well. Unfortunately we will never know, because it seems no RDNA3.5 desktop parts are coming.
 
Well if they managed to fit 33% more CUs in the same power envelope, the power savings had to come from somewhere.
Like going from N5 to N4....

33% sounds like a lot here, but we're talking fairly small CU counts still. It's not gonna take a lot more to feed them.

So yea, hard to say for sure.
 
Let's wait for benchmarks before judging perf/watt changes, shall we? It's not like the 7900X is always at its 170W power limit in actual applications, and the same goes for the 7800X. Just setting the 9000-series parts' TDPs lower doesn't mean that perf/watt will improve much.
 
As for the 2.9GHz clock - it tells absolutely nothing. These values are “up-to”, not real world clocks. The 2.8GHz iGPU in Phoenix runs at 1.4GHz in some form factors under certain workloads, so these ~3GHz numbers are more or less marketing values. No technological conclusions can be drawn from them.
 
So, what would you prefer in the place of that meaty NPU -- more CUs or some amount of Infinity Cache?


Supposedly it was planned with 16 MB Infinity Cache at one point in its design phase, but it was dropped later for cost reasons. It would certainly have helped both power and performance, and been more useful than the NPU for most consumers. Phoenix had just 2 MB L2; Strix Point should have at least 4 MB, I would think (Intel has gone to 8 MB with Lunar Lake).

And Strix Halo has 32 MB Infinity cache as per rumoured specs.
 
As for the 2.9GHz clock - it tells absolutely nothing. These values are “up-to”, not real world clocks. The 2.8GHz iGPU in Phoenix runs at 1.4GHz in some form factors under certain workloads, so these ~3GHz numbers are more or less marketing values. No technological conclusions can be drawn from them.
Actually it tells a lot, as the messenger promised a 500-600 MHz clock increase over RDNA3

"baby clocks, RDNA3.5 gets a 500-600Mhz clock increase"
"RDNA4 clock about the same"
 
Actually it tells a lot, as the messenger promised a 500-600 MHz clock increase over RDNA3
500 MHz seems quite optimistic, but I would expect to see a clock speed increase at the TDP ranges these chips are operating at, all things being equal (12 CU vs 12 CU).
 
Actually it tells a lot, as the messenger promised a 500-600 MHz clock increase over RDNA3
Not really, you cannot conclude anything from "up to" specs between different generations. Unless we see an apples-to-apples comparison, say 12 CU vs 12 CU in a GPU-bound workload in the exact same chassis and TDP. Even then there will be some differences due to the CPU, but you would at least see whether the GPU clocks are meaningfully higher.
500 MHz seems quite optimistic, but I would expect to see a clock speed increase at the TDP ranges these chips are operating at, all things being equal (12 CU vs 12 CU).
Actually in typical thin and light laptops which are configured to run sub 30W, there might not be enough power headroom for either to hit their max clocks. A chassis with a 50W+ power limit CPU might actually have enough headroom but those usually come with dGPUs anyway.

Though in the thin and light laptops, the 16 CU part could operate at a lower frequency, as that's more power efficient, and still be faster. If RDNA 3.5 truly has a "fixed" V/F curve, it would be even better, though limited by memory bandwidth. I don't expect significant increases in iGPU performance until we get LPDDR6 (Strix Halo aside).
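To illustrate why the wider part at lower clocks can win at equal power, here's a toy model. Assumptions (mine, not from any AMD data): dynamic power scales as CUs × f × V², and voltage scales roughly linearly with frequency over the operating range, so power ∝ CUs × f³ and throughput ∝ CUs × f.

```python
def iso_power_speedup(cu_base: int, cu_wide: int) -> float:
    # Toy model: power ~ CUs * f * V^2, with V roughly proportional
    # to f over the operating range, so power ~ CUs * f^3.
    # At equal power the wider part clocks at f * (cu_base/cu_wide)^(1/3).
    f_ratio = (cu_base / cu_wide) ** (1.0 / 3.0)
    # Throughput ~ CUs * f, so the iso-power speedup is:
    return (cu_wide / cu_base) * f_ratio

# 12 CU (Phoenix-class) vs 16 CU (Strix Point-class) at the same power:
print(f"{iso_power_speedup(12, 16):.2f}x")  # 1.21x under these assumptions
```

So even clocking ~9% lower, the 16 CU part comes out ~20% ahead in this simplified model, ignoring bandwidth limits and static power.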
 
Actually in typical thin and light laptops which are configured to run sub 30W, there might not be enough power headroom for either to hit their max clocks. A chassis with a 50W+ power limit CPU might actually have enough headroom but those usually come with dGPUs anyway.
I was talking about clocks at a given TDP level, not max clocks. I think we're saying the exact same thing.
 

So that must be what their compiler implementation for sampler feedback looks like. An image resource descriptor modifier in its field that alters the behaviour of the image instructions ...
 
This will only materialize in APU form right?
Yep

Also Strix Point doesn't scale well with (GPU) clockspeed because it gets bandwidth restricted almost immediately. Makes me believe that rumor it was going to have 16 MB of cache for the GPU, but AMD made a quick turnaround to a huge NPU after Microsoft pressured everyone into putting "AI" in instead.

At least Strix Halo has a shared system-level cache, 32 MB apparently, so it'll scale much better if someone wants it in a NUC/Mac Studio kind of form factor.
 
Also Strix Point doesn't scale well for (GPU) clockspeed because it gets bandwidth restricted almost immediately. Makes me believe that rumor it was going to have 16mb of cache for the GPU, but AMD made a quick turn around to a huge NPU after Microsoft pressured everyone into putting "AI" in instead.
Tangential, but this is why I'm interested in how LPDDR6 CAMM2 does: a 192-bit-wide bus per module and roughly 2x the MT/s of DDR5/LPDDR5, potentially 3x the bandwidth of most dual-channel solutions today. 17.6 GT/s on a 192-bit bus = 422.4 GB/s; 14.4 GT/s = 345.6 GB/s. Both would drastically raise the ceiling on iGPU performance, considerably reducing a big laptop and handheld bottleneck, at least for a while.
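The bandwidth figures above are just transfer rate times bus width; a quick sketch to check them (the LPDDR5X dual-channel comparison point is my own assumption for a typical laptop today):

```python
def peak_bandwidth_gbps(gt_per_s: float, bus_bits: int) -> float:
    # Peak bandwidth in GB/s: transfers per second times bytes per transfer.
    return gt_per_s * bus_bits / 8

# Figures from the post: one LPDDR6 CAMM2 module with a 192-bit bus.
print(peak_bandwidth_gbps(17.6, 192))  # 422.4 GB/s
print(peak_bandwidth_gbps(14.4, 192))  # 345.6 GB/s

# Assumed baseline: dual-channel LPDDR5X-7500 on a 128-bit bus.
print(peak_bandwidth_gbps(7.5, 128))   # 120.0 GB/s
```

Against that 120 GB/s baseline, 345.6 GB/s is ~2.9x and 422.4 GB/s is ~3.5x, which lines up with the "potentially 3x" claim.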
 