AMD Execution Thread [2024]

Ryzen AI has the old-style 2-clock execution for AVX-512, which may in fact greatly improve its perf/watt. I dunno if it's a good idea to just call both of these "Zen 5".
 
We know that. Look at the headlines. This is about AMD execution which includes marketing.

Very Efficient Ryzen 7 9700X Held Back by Power Limits!
AMD 9700X and 9600X Benchmarks... OOF
AMD Ryzen 7 9700X Review - Zen 5 Sucks For Gaming!
Wasted Opportunity: AMD Ryzen 7 9700X CPU Review & Benchmarks vs. 7800X3D, 7700X, & More
AMD Ryzen 7 9700X review: YouTube hates this CPU

For the majority of potential customers, this will be their introduction to Zen 5. I know AMD marketing sucks but this is indefensible.
I didn't mean to defend the Ryzen 7 9700X; there's no reason to get it outside AVX512 use. I was just pointing out that this isn't the Zen 5 introduction, as was claimed.
 
Sure we can expect improvements. That doesn’t mean it’s a terrible product when they don’t materialize. Especially when it comes to gaming CPUs where multiple generations produce the same end user experience.
 
Mod Mode: Just a bit of cleanup, and perhaps someone will relax a bit and realize their opinions aren't shared by everyone. And now we return to our regularly scheduled program, already in progress...

I'm still holding out some hope that AGESA / motherboard firmware updates will bring additional improvements, especially since the clock speeds seem to have a LOT of additional headroom given the thermal and power constraints. We've now seen a few examples from various review sites where the IPC gains do seem to be there when 7000-series and 9000-series parts are (approximately) locked to the same clocks. As the owner of a 5950X, I'm excited to see what the X3D versions bring when they arrive.

I also wager the "higher powered" parts (>100W) will compete very nicely, but we'll have to wait to see.
 

After reading through all of the changes to Zen 5, I kind of wonder if gaming workloads just can't take advantage of the width of the Zen 5 cores, or if it's an issue of cache "friendliness" in games. I'm also wondering if Zen 5 would actually scale with memory latency better than Zen 4, and if we'll see a bigger gap between 9x00X3D and 7x00X3D because cache bandwidth and latency can be maximized and potentially take advantage of the core improvements. Or maybe it's just the type of workload most games are, and they'll never really exploit the changes. It really seems like AMD made pretty big changes to this new gen, but maybe they're just better suited to mathematical or scientific workloads.
 
We know that. Look at the headlines. This is about AMD execution which includes marketing.

Very Efficient Ryzen 7 9700X Held Back by Power Limits!
AMD 9700X and 9600X Benchmarks... OOF
AMD Ryzen 7 9700X Review - Zen 5 Sucks For Gaming!
Wasted Opportunity: AMD Ryzen 7 9700X CPU Review & Benchmarks vs. 7800X3D, 7700X, & More
AMD Ryzen 7 9700X review: YouTube hates this CPU

For the majority of potential customers, this will be their introduction to Zen 5. I know AMD marketing sucks but this is indefensible.
Lead Zen engineer "This launch isn't for gaming"
Reviewers "Bitching about no gaming performance"

I mean, yes it's a failure on AMD to control the press. But how many sales does seeding review samples to drama factories like these actually get them? It's a relevant question all IHVs should be asking.
 
Ryzen AI has the old-style 2-clock execution for AVX-512, which may in fact greatly improve its perf/watt. I dunno if it's a good idea to just call both of these "Zen 5".
Disagree. AVX-512 is very niche (not least because of Intel's abysmal adoption of their own instructions). For non-workstation users, doubly so.

As you said, it positively impacts perf/watt, so it seems aptly intended for mobile, where perf/watt is more important.
 
Lead Zen engineer "This launch isn't for gaming"
Reviewers "Bitching about no gaming performance"

I mean, yes it's a failure on AMD to control the press. But how many sales does seeding review samples to drama factories like these actually get them? It's a relevant question all IHVs should be asking.
I don't blame this on reviewers. We had Ryzen 1700X, Ryzen 2700X, Ryzen 3700X, Ryzen 5700X, and Ryzen 7700X which were most definitely used for gaming. To release a 9700X and say it's not for gaming is silly.

I believe this is from AMD's slide deck:
1723395549452.png

But yea all those drama factories should know, when a company uses phrases like "Gaming Leadership" followed by a chart of gaming performance in specific games, this doesn't mean they should actually test the CPU in games. This launch isn't for gaming :LOL: :LOL:
 
Pretty sure that means Zen 5 rather than a specific SKU, as in, they focused on other things in this architecture. AVX512 should be a telltale sign of that for everyone; it's pretty much nonexistent in the consumer space, and Intel has already killed it off in their consumer CPUs.
Also DIY space is more than gaming.
 
Pretty sure that means Zen 5 rather than a specific SKU, as in, they focused on other things in this architecture. AVX512 should be a telltale sign of that for everyone; it's pretty much nonexistent in the consumer space, and Intel has already killed it off in their consumer CPUs.
Also DIY space is more than gaming.
I'm balking at the suggestion that Zen 5 is not for gaming. When Intel introduced AVX512 on the desktop nobody said "oh well these must not be for gaming".

1723405914915.png

1723405925328.png

1723405934106.png

1723405941784.png

1723405955933.png

Notice that these slides include the entire Zen 5 product stack.
 
Yeah, they pretty clearly and obviously call out the 9600X, 9700X, and the as-yet-unreleased 9900X alongside their "Gaming Leadership". Seems like they're intended for gaming, at least partially...

The TPU article about disabling SMT is a curious result. I wonder if this lends any credence to the memory subsystem being unable to feed such a wide beast, and maybe this suggests more cache (eg the 3D VCache parts) will enjoy the "full" potential of the chip? Will be interesting to see.
 
Yeah, they pretty clearly and obviously call out the 9600X, 9700X, and the as-yet-unreleased 9900X alongside their "Gaming Leadership". Seems like they're intended for gaming, at least partially...

The TPU article about disabling SMT is a curious result. I wonder if this lends any credence to the memory subsystem being unable to feed such a wide beast, and maybe this suggests more cache (eg the 3D VCache parts) will enjoy the "full" potential of the chip? Will be interesting to see.
The 7700X also shows gains from disabling SMT. Is this expected behavior?
 
The short answer is yes, because scheduling workloads across heterogeneous logical threads and heterogeneous physical cores is Hard(TM).

In nearly all cases, any application that heavily uses just a few cores (1 < x < CPU physical core count) will net slightly better performance with SMT disabled. SMT works by helping pack more instructions into a single CPU pipeline, attempting to keep as many bubbles out of the pipeline as possible by mixing dissimilar instruction types (e.g. toss an FPMUL down the pipe and an INTADD unit is sitting there unused...). At some point, more instructions can't fit into the same pipeline, which is why enabling SMT (more threads!) never ends up performing like "doubling" the physical core count, and in fact is quite far from it (SMT typically finds another ~25% or so in heavily threaded workloads). This behavior is not limited to AMD, and it also isn't limited to just current-gen parts.
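
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the ~25% figure above applies to a single core running two sibling threads, and the function names are made up for illustration. Real cores won't behave this cleanly; it only shows why a lightly threaded game can lose out if two of its heavy threads end up sharing one core.

```python
def core_throughput(threads_on_core, smt_gain=0.25):
    """Relative throughput of one physical core (1.0 = one thread, SMT idle).

    smt_gain is the assumed extra throughput from a second sibling thread
    (the ~25% ballpark from the post, not a measured value).
    """
    if threads_on_core == 0:
        return 0.0
    if threads_on_core == 1:
        return 1.0
    if threads_on_core == 2:
        return 1.0 + smt_gain
    raise ValueError("this toy model only covers 2-way SMT")


def per_thread_throughput(threads_on_core, smt_gain=0.25):
    """Share of the core each thread gets, assuming an even split."""
    if threads_on_core < 1:
        raise ValueError("need at least one thread on the core")
    return core_throughput(threads_on_core, smt_gain) / threads_on_core


if __name__ == "__main__":
    # One heavy thread alone on a core vs. two heavy threads sharing it.
    print("alone on a core:   ", per_thread_throughput(1))
    print("sharing with SMT:  ", per_thread_throughput(2))
    # Two threads on one SMT core vs. two threads on two separate cores.
    print("1 core, 2 threads: ", core_throughput(2))
    print("2 cores, 2 threads:", 2 * core_throughput(1))
```

In this model the core as a whole does more work with SMT on, but each individual thread runs slower than it would with a core to itself, which is the case that matters when a game leans on a handful of threads.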

There is also another ~10% performance bump to be found in disabling one CCD in a multi-CCD chip, given workloads which are lightly (1 < x < CPUs in a CCD) threaded. This is due to cache coherence across CCDs, which behave much like NUMA nodes. If one CPU core in CCD0 has to "reach across" to access data held in the caches of a CPU core in CCD1, there's a non-trivial delay to make it happen. Also, by disabling a CCD, you free up a bit more of the power budget and perhaps a tiny bit more thermal headroom for the remaining CCD.
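
If you'd rather not disable a CCD in the BIOS, a softer way to try the same idea on Linux is to pin the process to the cores of one CCD. This is a different technique from disabling the CCD (the other CCD stays powered, so the power/thermal point above doesn't apply), and the sketch below is just that, a sketch. The CPU numbers in the usage comment are an assumption; which logical CPUs belong to which CCD varies by chip and kernel, so check `lscpu -e` or similar first. `pin_to_cpus` is a made-up helper name; `os.sched_getaffinity` and `os.sched_setaffinity` are the real (Linux-only) standard library calls.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict a process (pid 0 = the calling process) to the given logical CPUs.

    Only CPUs the process is currently allowed to use are kept, so a
    wrong guess about the topology fails loudly instead of silently.
    Returns the affinity set actually in effect afterwards.
    """
    available = os.sched_getaffinity(pid)
    target = set(cpus) & available
    if not target:
        raise ValueError(f"none of {sorted(cpus)} are available: {sorted(available)}")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # ASSUMPTION: logical CPUs 0-7 sit on the first CCD. Verify on your
    # own system before trusting this; SMT siblings may be numbered
    # elsewhere (e.g. 16-23 on a 16-core part).
    print("now pinned to:", sorted(pin_to_cpus(range(0, 8))))
```

Child processes inherit the affinity, so a small launcher script that pins itself and then starts the game is one way to use this.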
 
I'm balking at the suggestion that Zen 5 is not for gaming. When Intel introduced AVX512 on the desktop nobody said "oh well these must not be for gaming".

Notice that these slides include the entire Zen 5 product stack.
You must understand the difference between "not developed gaming first" and "you definitely aren't allowed to game on these ever" (exaggerated to make the point). It isn't that you can't game or that games don't run faster (/watt, X3D aside); it's that the architecture's evolutions were designed for other workloads. Whatever gains gaming got weren't the priority and probably mostly come from the optimizations made for other types of loads. Thus gaming workloads get smaller improvements this gen than some other areas, like that AVX512.
 
You must understand the difference between "not developed gaming first" and "you definitely aren't allowed to game on these ever" (exaggerated to make the point). It isn't that you can't game or that games don't run faster (/watt, X3D aside); it's that the architecture's evolutions were designed for other workloads. Whatever gains gaming got weren't the priority and probably mostly come from the optimizations made for other types of loads. Thus gaming workloads get smaller improvements this gen than some other areas, like that AVX512.
That may have been the focus of the architecture, but that's not something that was effectively communicated by AMD's marketing before launch.
 
I wonder if the split decoders have impacted the SMT performance.
I watched the Chips and Cheese interview and saw the block diagram of the new split decoder layout,
but it's not clear to me whether, in SMT mode, both decoders can access instructions from a single thread to decode.
In theory, across both decoders Zen 5 should be able to decode up to 8 instructions (4 each), but if the core is running in SMT mode,
is this only 4 per thread? That might explain some of the less-than-stellar single-thread perf we are seeing.
 
Yeah, afaik (but I might be misremembering) the decoders are statically partitioned between the 2 threads when SMT is enabled.

Info is somewhere in here:


My intuition is also that the frontend improvements make it easier to feed more of the execution units, hence fewer of them would now be idling with SMT off.

On earlier generations, I recall comparisons with SMT off for both AMD and Intel. With earlier Zen cores, the gain from enabling SMT was larger than for Intel cores of that generation, and one explanation was that the cores were wider and couldn't be fed with a single thread. So it might be that Zen 5 improved on that.
 