AMD Execution Thread [2023]

Status
Not open for further replies.

Christopher Rolland -- Susquehanna International Group -- Analyst

Thanks for the question. There was an article suggesting that you guys could be interested in doing some ARM-based CPUs. I guess I'd love any thoughts that you have there on that architecture for PC. But also Apple has their M3 out now.

It seems pretty robust. Qualcomm has an X Elite new chip. It was rumored NVIDIA might be doing that as well. Would love your expectations for this market.

And what does that mean for the TAM for AMD moving forward?

Lisa Su -- President and Chief Executive Officer

Yeah. Sure, Chris. Thanks for the question. So, look, the way we think about ARM, ARM is a partner in many respects so we use ARM throughout parts of our portfolio.

I think as it relates to PCs, x86 is still the majority of the volume in PCs. And if you think about sort of the ecosystem around x86 and Windows, I think it's been a very robust ecosystem. What I'm most excited about in PCs is actually the AI PC. I think the AI PC opportunity is an opportunity to redefine what PCs are in terms of productivity tool and really sort of operating on sort of user data.

And so, I think we're at the beginning of a wave there. We're investing heavily in Ryzen AI and the opportunity to really broaden sort of the AI capabilities of PCs going forward. And I think that's where the conversation is going to be about. It's going to be less about what instructions that you're using and more about what experience are you delivering to customers.

And from that standpoint, I think that we have a very exciting portfolio that I feel good about over the next couple of years.
AMD doesn't seem to have plans to release ARM processors for PC.
 
https://www.anandtech.com/show/2111...s-with-zen-4c-smaller-cores-bigger-efficiency

On AMD’s server processors where everything has been entirely above the board and fully detailed, AMD clearly publishes that none of the Zen 4c chips clock higher than 3.1GHz, some 1.3GHz (30%) slower than the fastest Genoa chip (9174F). However, for their consumer chips, the only clockspeeds AMD are disclosing are the max turbo clockspeed – which is for the regular Zen 4 core(s) – and then the base clockspeed for the entire chip. Which, in the case of the fastest 7545U, is 3.2GHz.
All of this is to say that, based on AMD’s disclosures thus far, all evidence points to Zen 4c not clocking much above 3GHz – and it’s not supposed to.
Pretty low, huh?
 
The entire trade-off for Zen 4c is far fewer transistors but also much lower clocks.
Less area, but I would guess transistor count needs to be fairly similar.

It seems AMD is trying to push the 'power efficiency' angle here a bit harder than is probably warranted, when I think the main goal was clearly space efficiency - something only AMD themselves really benefit from in the consumer space in this specific case. There's next to no difference in performance at 15W, and if you're doing anything that requires '6 core utilization', you're almost assuredly pushing beyond 15W.

Perhaps for handheld sort of devices or ultrathins or something it can make sense, but I'm guessing AMD is just internally more excited about being able to make smaller chips.
 
"FSR implemented in Samsung Galaxy": it's basic open source software, there's no "secret sauce" to implement. But let's see if there's more to it.
Maybe collaborate to improve it.
 
That's how I would interpret "jointly develop".

Regards,
SB
Wasn't critiquing that, but the latter part of the tweet: "it is anticipated that FSR technology will be implemented in Samsung Galaxy along ray tracing in the future".
There's no technology to implement, they can run it already. Also they already have ray tracing.
 
AMD really needs a hardware-based approach to compete properly with DLSS at this stage. Although Qualcomm and Samsung have no standing in the desktop PC space, the mobile and desktop gaming segments are likely to overlap further going forward. A solution that depends on specific hardware, where that hardware is found in a few different vendors' GPUs, could be enough to overcome the issue AMD would otherwise have doing so with its small market share.

Of course the end result will be like Intel with XeSS, where there is a hardware-agnostic path and a hardware-dependent path. The agnostic approach can only go so far though, and will likely be deprecated once a hardware-dependent approach gains some market share. It's possible that if they make it easy enough for devs to implement, they could simply tack it onto an FSR2 implementation. If it takes more work they might still struggle to get the same support as DLSS. Kind of like how a few games still launch with only FSR1 and DLSS, and FSR2 is added later. That is a real problem for marketing FSR - FSR1 really only serves as a disservice to the name at this point.
 
Less area, but I would guess transistor count needs to be fairly similar.

It seems AMD is trying to push the 'power efficiency' angle here a bit harder than is probably warranted, when I think the main goal was clearly space efficiency - something only AMD themselves really benefit from in the consumer space in this specific case. There's next to no difference in performance at 15W, and if you're doing anything that requires '6 core utilization', you're almost assuredly pushing beyond 15W.

Perhaps for handheld sort of devices or ultrathins or something it can make sense, but I'm guessing AMD is just internally more excited about being able to make smaller chips.

I would think a large part of the smaller size was removing the transistors that enabled paths for higher clock speeds. It's going into effectively monolithic hybrid CPUs with 4/4c cores, so the area savings can't come from a differing process (or even a higher density/low perf library).

Marketing messaging aside, the efficiency byproduct is more that wide workloads become power limited with the conventional "big" cores when they are clocked (specced) for more peak performance (which you want for burst workloads). Yes, you can clock them down into the efficiency range, but then you basically "waste" the transistors that were used to enable those high clock speeds in the first place. Ultimately, as with Intel, the main goal is performance per area/cost (however you want to phrase it) as opposed to actual low end power efficiency as with the mobile hybrids.

The longer term outlook is that there seem to be suggestions that SMT is on its way out. As such, these area efficient cores (from both) are going to be needed to offset the thread count loss.
 
I assume SMT is going to disappear; we are about to see another step in the evolution of CPUs. Early rumours have started about more dynamic cores: groups of execution units that can be grouped dynamically to form cores, and caches that can be dynamically partitioned between cores. Intel has a new core design coming that is a departure from what has come before. I have heard phrases like "rentable units" and "leasable". For a while we have had little more than tweaking of a design that must be nearly 20 years old. There may even be some divergent designs coming. It might be exciting.

SMT is going away to be replaced by something else, not a return to simple cores with one thread. So I would assume thread counts are not going to go down.
 
SMT is relatively cheap and is dynamic with regard to execution units.
Rather, what it adds is complexity in the design process.

Removing SMT for a while and then directly adding a more complex means of partitioning resources makes little sense to me.
To require an improved means of partitioning, something fundamental in the architecture would need to change first (e.g. a significant change in the number of execution units, a much higher/lower target energy usage and whatnot).
 
I would think a large part of the smaller size was removing the transistors that enabled paths for higher clock speeds. It's going into effectively monolithic hybrid CPUs with 4/4c cores, so the area savings can't come from a differing process (or even a higher density/low perf library).

Marketing messaging aside, the efficiency byproduct is more that wide workloads become power limited with the conventional "big" cores when they are clocked (specced) for more peak performance (which you want for burst workloads). Yes, you can clock them down into the efficiency range, but then you basically "waste" the transistors that were used to enable those high clock speeds in the first place. Ultimately, as with Intel, the main goal is performance per area/cost (however you want to phrase it) as opposed to actual low end power efficiency as with the mobile hybrids.

The longer term outlook is that there seem to be suggestions that SMT is on its way out. As such, these area efficient cores (from both) are going to be needed to offset the thread count loss.
I'd expect Zen 4c cores to have significantly higher dynamic Perf/W *at the same voltage*, but to hit the same frequency they need to run at a higher voltage than Zen 4 cores. So the Perf/W advantage is only there if you run more cores at a significantly lower frequency, which is something you might want to do for a highly parallel datacenter/cloud workload or for a handheld device like a Steam Deck.

In fact there's no need for me to guess, this Chinese article has a lot of measured data: https://zhuanlan-zhihu-com.translat...uto&_x_tr_tl=en&_x_tr_hl=en-US&_x_tr_pto=wapp

The V/F curve shows Zen 4c and Zen 4 voltages both around ~0.69V at 1.5GHz, but at 2.3GHz it's something like 0.79V vs 0.70V, which feels like it would be more than enough to offset the higher power efficiency at the same voltage. As far as I can tell, the first 3 points on the "Energy Efficiency Curve" graph of that article are 1.5GHz, 1.8GHz, and 2.3GHz, where Zen 4c's power efficiency is significantly better at 1.5GHz, about the same at 1.8GHz, and noticeably worse at 2.3GHz. The power benefit in that graph at 1.5GHz looks tiny, but that's because at 1.5GHz (single core for that graph, I think) the vast majority of the power is outside the CPU core. (>4W feels high; I wonder if that platform had PCI-E power management issues like some users have had in the past with AMD?) So the actual power consumption difference for the CPU core itself, in percentage terms, seems very significant.
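
As a rough sanity check on the voltage argument above, dynamic switching power is commonly approximated as P ~ C * V^2 * f. The following is a minimal sketch assuming that approximation holds, using the voltages read off the article's V/F curve. The function name is hypothetical, and the switched-capacitance ratio is left as a free parameter because the post gives no figure for it.

```python
def dynamic_power_ratio(v_a, v_b, cap_ratio=1.0, freq_ratio=1.0):
    """Ratio of dynamic power of core A to core B under P ~ C * V^2 * f.

    cap_ratio is C_a / C_b and freq_ratio is f_a / f_b. Both default to 1.0,
    which is an assumption for illustration, not a measurement.
    """
    return cap_ratio * (v_a / v_b) ** 2 * freq_ratio


# Voltages quoted in the post at 2.3GHz: ~0.79 V (Zen 4c) vs ~0.70 V (Zen 4).
penalty = dynamic_power_ratio(0.79, 0.70)
print(f"V^2 penalty at 2.3GHz: {penalty:.2f}x")

# Switched-capacitance ratio Zen 4c would need just to break even on power.
break_even_cap = 1.0 / penalty
print(f"Break-even capacitance ratio: {break_even_cap:.2f}")
```

Under these assumptions the V^2 term alone costs Zen 4c roughly 1.27x at 2.3GHz, so it would need about 21% less switched capacitance than Zen 4 just to match it on dynamic power at that frequency.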

Zen 4c is good but it isn't that amazing in my opinion though... :(

Assuming those measurements are correct: ~35% lower area for ~35% lower frequency at Vmin, or ~29% lower frequency at max boost clock (so perf/mm2, if you didn't care about perf/watt, hasn't really improved much), with *worse* dynamic Perf/W as early as >2GHz?! It's nice if you're very power limited and running close-ish to Vmin anyway (e.g. Steam Deck), but for multithreaded workloads where you want better Perf/mm2 & Perf/$ than Zen 4c at Vmin offers you, it sounds like the benefit isn't as big as AMD's marketing made me think originally.

At the same power efficiency as a 3.5GHz Zen4c (according to that specint energy efficiency graph), you can clock Zen4 ~25% higher, so the perf(/mm2) benefit at fixed power and fixed area is roughly +15% at that point of the curve (1.0 / ((1-0.35) / (1-0.25))). But at 2.3GHz, you can only clock Zen 4 about ~15% higher at the same power (2.7GHz vs 2.3GHz) so the advantage is roughly +23%. If you care about power consumption rather than perf/$, then you can trade-off area vs power by reducing the voltage on both chips! (obviously if you want to go below 2.3GHz then you need 4c to reduce the voltage and that's where it starts to shine, but AMD's Zen 4c datacenter chips look to be pretty close to 2.3GHz base clock and much higher boost clocks).
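
The perf/mm2 arithmetic above can be written out as a small sketch. It uses the post's own convention, (1 - clock deficit) / (1 - area saving), with the ~35% area figure from the cited measurements. The function and variable names are hypothetical.

```python
def perf_per_area_gain(area_saving, clock_deficit):
    """Relative perf/mm2 of the small core vs the big core at equal power.

    Follows the post's convention: the small core delivers (1 - clock_deficit)
    of the big core's performance in (1 - area_saving) of its area.
    """
    return (1.0 - clock_deficit) / (1.0 - area_saving)


AREA_SAVING = 0.35  # ~35% smaller core, per the measurements cited in the post

# 3.5GHz point: Zen 4 can clock ~25% higher at the same power efficiency.
gain_high = perf_per_area_gain(AREA_SAVING, 0.25)
print(f"3.5GHz point: {gain_high - 1:+.0%}")

# 2.3GHz point: Zen 4 can clock ~15% higher at the same power.
gain_low = perf_per_area_gain(AREA_SAVING, 0.15)
print(f"2.3GHz point: {gain_low - 1:+.0%}")
```

The 3.5GHz point reproduces the +15% in the post. Plugging ~15% into the same expression for the 2.3GHz point gives about +31% rather than +23%, so that second figure presumably comes from a different convention or a different reading of the graph.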

There are definitely some markets in both datacenter and Steam Deck-like devices where these trade-offs make a lot of technical sense... and then there are other markets where they don't make as much technical sense and it's kind of a wash, but it still makes a lot of marketing sense to compete with Intel's core counts.
 