The Intel Execution in [2024]

Check the CPU package power for games there; ARL is 2-3x as efficient as RPL.
They also have a summary graph there.
Fully threaded (production) workloads seem to fare a bit worse but there is still a visible perf/watt gain.
In the review, they mention they use HWiNFO to gather CPU power statistics...
https://www.computerbase.de/artikel/prozessoren/intel-core-ultra-200s-285k-265k-245k-test.90019/seite-4 said:
...over the duration of the benchmark, the telemetry is read out every second with HWiNFO. The maximum and mean values are then determined.
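(For reference, that per-second max/mean aggregation is trivial to reproduce from a logged trace. A minimal Python sketch, assuming a CSV export of one-sample-per-second readings with a hypothetical "CPU Package Power [W]" column; this is not their actual tooling:)

```python
import csv

def summarize_package_power(csv_path: str, column: str = "CPU Package Power [W]"):
    """Return (mean, max) of per-second package power samples from a CSV log."""
    samples = []
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.DictReader(f):
            try:
                samples.append(float(row[column]))
            except (KeyError, ValueError):
                continue  # skip malformed rows or a missing column
    if not samples:
        raise ValueError("no usable power samples found")
    return sum(samples) / len(samples), max(samples)

# mean_w, max_w = summarize_package_power("hwinfo_log.csv")
```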
As part of their review, GamersNexus pointed out software readings of CPU package power may not be telling the whole story:
https://www.youtube.com/watch?v=XXLY8kEdR1c&t=510s said:
Here's the comparison we've all wanted: in 7-Zip, ignoring EPS12V, we noticed that the ATX12V power was exceptionally high on the 285K and Z890 Hero combination, compared to the 14900K and Z790 Hero. It's at about 50W here, where the 14900K had the ATX12V down around 30W.
Gamers Nexus used a completely hardware-based measuring setup for the CPU and system board, even going so far as to isolate PCIe slot power for the video card during measurements. Their methodology found the 285K's power efficiency is still better than the 14900K's, to the tune of about 57% in the most favorable readings, or losing to the 14900K by ~18% in the least favorable readings (where framerate went down despite power also being down). The pertinent YouTube link is quoted above.
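To put those percentages in context: perf/watt is just frames per second divided by watts, so a chip can draw less power and still lose on efficiency if the framerate falls faster. A quick sketch with made-up numbers (purely illustrative, not GN's data):

```python
def perf_per_watt(fps: float, watts: float) -> float:
    return fps / watts

# Hypothetical numbers purely to illustrate the arithmetic, not GN's measurements.
old_fps, old_w = 140.0, 120.0      # "14900K-like" case
new_fps, new_w = 110.0, 115.0      # lower power, but much lower framerate

old_eff = perf_per_watt(old_fps, old_w)   # ~1.17 fps/W
new_eff = perf_per_watt(new_fps, new_w)   # ~0.96 fps/W
print(f"efficiency change: {new_eff / old_eff - 1:+.0%}")   # about -18%
```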

In the end, it's very much not a doubling of efficiency or a halving of power consumption compared to the 14900K, and software readings from tools like HWiNFO may not be reliable.
 
Check the CPU package power for games there; ARL is 2-3x as efficient as RPL.
They also have a summary graph there.
Fully threaded (production) workloads seem to fare a bit worse but there is still a visible perf/watt gain.
As far as I understood Gamers Nexus' latest video, the "traditional" way to measure perf/watt doesn't really apply to the latest Intel CPUs. He went with a fairly elaborate measurement setup to get more accurate results.

I wouldn't assume that computerbase.de adjusted their methodology, so I'm not sure how accurate their results really are.
 
The 285K is also still being pushed pretty hard. It's hard to make any kind of good efficiency claim when the processors being compared are way outside any efficiency sweet spot and operating toward the extreme ends of practical usage. Even Raptor Lake could show really big efficiency improvements over Alder Lake when performance was equalized in certain tasks, and that was ultimately only on the back of an iteration of the Intel 7 process. This is because, with performance equalized, Raptor Lake could back away from the extreme end of its clock/voltage curve a bit, while Alder Lake couldn't.
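As a rough first-order model, dynamic power scales roughly with C·V²·f, and the last few hundred MHz cost a lot of extra voltage, so backing off the top of the V/f curve buys disproportionate efficiency. A toy sketch with invented voltage/frequency points, just to show the shape of the effect:

```python
# Toy first-order model: dynamic power ~ C * V^2 * f.
# The voltage/frequency pairs below are invented for illustration only.
def dynamic_power(c: float, volts: float, ghz: float) -> float:
    return c * volts**2 * ghz

points = [(5.7, 1.40), (5.0, 1.20), (4.5, 1.10)]   # (GHz, V) along a made-up curve
for ghz, volts in points:
    p = dynamic_power(10.0, volts, ghz)            # arbitrary capacitance constant
    print(f"{ghz:.1f} GHz @ {volts:.2f} V -> power {p:5.1f} (a.u.), "
          f"perf/power {ghz / p:.3f}")
```

In this toy model, dropping roughly 20% of clock yields well over 50% better perf/watt, which is why "equalize performance first" comparisons look so different from stock-vs-stock ones.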

It'd be nice to see testing done showing such curves, kinda like Geekerwan does with smartphone processor testing. This is the best way to see the proper efficiency picture.

I think it's obvious that Arrow Lake does have some quite good efficiency improvements (though not 2x performance/watt on the whole). But at the same time, improvements in efficiency are often leveraged to push performance, which it doesn't seem like Arrow Lake can do very well, so that's still disappointing. And I don't think there's any world where simply matching Raptor Lake, which is more than two years old now, in single-thread performance is an 'ok' result. There's only a small handful of workloads that Arrow Lake seems to excel at, making it a poor buy for the vast, vast majority of the market.
 
Yeah, to Sean's point, I'm waiting to see some laptop tests running the new Arrow Lake procs. That's when you get into the battery life tests, which is where the new process will be able to demonstrate efficiency superiority -- or not.
 
Yeah, to Sean's point, I'm waiting to see some laptop tests running the new Alder Lake procs. That's when you get into the battery life tests, which is where the new process will be able to demonstrate efficiency superiority -- or not.
You mean Arrow Lake, Alder was 12th gen Core ;)
 
How do games perform on 6-8 thread CPUs? I'm wondering how much losing SMT is affecting gaming performance. Seems like games have been expecting 10+ threads for a while now. Are the E cores now picking up more work in games?
 
How do games perform on 6-8 thread CPUs? I'm wondering how much losing SMT is affecting gaming performance. Seems like games have been expecting 10+ threads for a while now. Are the E cores now picking up more work in games?
TBH it feels like 8 real cores matter more today than thread count.
Edit: Arrow Lake kinda supports this too.
 
How do games perform on 6-8 thread CPUs? I'm wondering how much losing SMT is affecting gaming performance. Seems like games have been expecting 10+ threads for a while now. Are the E cores now picking up more work in games?

Some games run better with SMT/hyperthreading disabled and some games run better with it enabled. So the choice of 8 cores with no hyper-threading is going to be a win or a loss depending on the game.
 
Some games run better with SMT/hyperthreading disabled and some games run better with it enabled. So the choice of 8 cores with no hyper-threading is going to be a win or a loss depending on the game.
But it will, with probably no exceptions, beat a 4-core/8-thread part by miles, even if those theoretical CPUs were an even match in a perfectly thread-scaling app.
 
But it will, with probably no exceptions, beat a 4-core/8-thread part by miles, even if those theoretical CPUs were an even match in a perfectly thread-scaling app.
Well, to be fair, that was always going to be true on any x86 processor, AMD or Intel. Simultaneous multithreading works by permitting more of the individual execution units to operate simultaneously when feasible.

For those who may not be aware, native x86 execution effectively stopped existing with the Pentium Pro and later processors. Instead, x86 instructions are broken down into micro-operations, and specific execution units inside the processor consume those micro-ops in tiny bundles. The trick is, there are a LOT of execution units in a modern CPU, and there's no single x86 instruction which breaks down into enough micro-ops to keep every execution unit running. Thus, SMT comes along to stuff more micro-ops into the pipeline, permitting more execution units to stay busy and thus getting more work done while still technically fitting inside a single CPU core.

Reality is, there will never be a proper doubling of performance using this method. Execution units are still unique, which means that more often than not there simply aren't the right types of execution units available to execute two fully decomposed x86 instructions simultaneously. SMT is still an overall win, because it does allow for fewer bubbles in the pipeline, which can help mask (but never eliminate) pipeline stalls by providing more instructions to chew on. It's also literally doing more work with the same resources.

In all x86 cases, one CPU core operating in SMT fashion will never yield the same performance as two individual CPU cores. Nevertheless, in terms of overall power consumption and silicon die size, doubling the total CPU core count vs. providing SMT on half the core count will always favor the latter.
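A toy way to see why one SMT core lands somewhere between one and two real cores: model the core as a fixed set of execution ports and count how many micro-ops retire per cycle with one instruction stream versus two streams sharing the same ports. Everything below (the port mix, the µop distribution) is invented purely for illustration:

```python
import random

PORTS = ["alu", "alu", "load", "store"]   # invented port mix for a toy core

def throughput(n_threads: int, cycles: int = 10_000, seed: int = 0) -> float:
    """Average micro-ops retired per cycle with n_threads sharing one core's ports."""
    rng = random.Random(seed)
    retired = 0
    for _ in range(cycles):
        free = list(PORTS)
        for _ in range(n_threads):
            # each thread offers a couple of µops per cycle, of random types
            for uop in rng.choices(["alu", "load", "store"], weights=[3, 2, 1], k=2):
                if uop in free:
                    free.remove(uop)
                    retired += 1
    return retired / cycles

one_thread = throughput(1)
smt = throughput(2)
print(f"1 thread: {one_thread:.2f} µops/cycle, SMT: {smt:.2f} "
      f"(+{smt / one_thread - 1:.0%}, well short of +100%)")
```

Running it, the two-thread case comes out well above one thread but nowhere near double, which is the whole argument in miniature.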
 
I'm actually kind of curious about side-channel attacks. If you disable HT or SMT, how many security mitigations can you safely turn off, and how much performance does that get you? Besides security, I would guess that HT probably loses in applications that are cache sensitive, so maybe a large L3 can help mitigate the drawbacks. I do think Alder Lake is really interesting; I've always liked the idea of a bunch of small cores to handle a lot of background stuff.

It will be interesting to see how it plays out in terms of Windows optimizing for the platform, because it's a shit OS.
 
I'm actually kind of curious about side-channel attacks. If you disable HT or SMT, how many security mitigations can you safely turn off, and how much performance does that get you? Besides security, I would guess that HT probably loses in applications that are cache sensitive, so maybe a large L3 can help mitigate the drawbacks. I do think Alder Lake is really interesting; I've always liked the idea of a bunch of small cores to handle a lot of background stuff.

It will be interesting to see how it plays out in terms of Windows optimizing for the platform, because it's a shit OS.

Many of the new CPU vulnerabilities such as Spectre and Meltdown are not very sensitive to SMT. They work even without SMT, so disabling SMT alone is not enough. For some of them it's possible to mitigate by flushing state (e.g. the branch predictors) after each context switch, but that's just too inefficient.
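If you want to see what actually applies to your own machine, Linux exposes the kernel's view of both SMT state and the known issues through sysfs; a minimal sketch reading the standard entries (paths are the usual ones on recent kernels, availability varies):

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")
SMT_CONTROL = Path("/sys/devices/system/cpu/smt/control")

if SMT_CONTROL.exists():
    print("SMT:", SMT_CONTROL.read_text().strip())   # e.g. "on", "off", "notsupported"

if VULN_DIR.is_dir():
    for entry in sorted(VULN_DIR.iterdir()):
        # one file per known issue (spectre_v1, spectre_v2, mds, ...)
        print(f"{entry.name}: {entry.read_text().strip()}")
```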

In a sense SMT (or any other multithreading architecture) is good mostly for hiding latency, as long as there are enough threads to run. Unfortunately, that's not what most people's workloads look like. For example, gaming workloads are mostly latency sensitive. In theory it's quite possible to parallelize gaming workloads, but many games today are still bound by a single thread (they do have multiple threads, but generally there's one main thread that's the main bottleneck).

Another problem is scheduling. SMT itself created a new scheduling problem, because not all "cores" are created equal. An SMT-only system is still relatively easy, though, because it's generally fine if the OS dispatches threads to the real cores first and the "logical" cores later. Today's systems are far more complex: each core has different boost behavior, there's the P-core/E-core split, and there's the multi-CCD situation. Worse, some of these arrangements may just be temporary; for example, it's rumored that AMD's next 3D V-Cache CPUs will no longer put the extra cache on only a single CCD like the 7950X3D and 7900X3D do. If true, optimizing for special arrangements like these becomes a temporary exercise, and developers will be less willing to put much work into them.
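The raw material the scheduler has to work with is just topology data like this. A small sketch that groups logical CPUs into SMT sibling sets on Linux (standard sysfs paths; os.sched_getaffinity is Linux-only), which is the kind of map you'd need before even attempting a "real cores first" policy:

```python
import os
from pathlib import Path

def parse_cpu_list(text: str) -> set[int]:
    """Parse sysfs CPU lists like '0,16' or '0-1' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def smt_groups():
    """Group logical CPUs into SMT sibling sets (Linux-only, standard sysfs paths)."""
    seen = set()
    for cpu in sorted(os.sched_getaffinity(0)):
        path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
        if path.exists():
            seen.add(frozenset(parse_cpu_list(path.read_text())))
    return [sorted(g) for g in sorted(seen, key=min)]

for group in smt_groups():
    print(group)   # e.g. [0, 1] for an SMT pair, [16] for a core without SMT
```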
 
In all x86 cases, one CPU core operating in SMT fashion will never yield the same performance as two individual CPU cores. Nevertheless, in terms of overall power consumption and silicon die size, doubling the total CPU core count vs. providing SMT on half the core count will always favor the latter.
But then why did they ditch SMT?

I'm wondering why Arrow Lake gets brutalized by Raptor Lake in some games.
 
But then why did they ditch SMT?

I'm wondering why Arrow Lake gets brutalized by Raptor Lake in some games.

The reasons I've seen are simplifying the CPU core design, mitigating security problems, decreasing power consumption (not sure how), and hyper-threading being of less benefit now that efficiency cores exist. I guess ultimately Intel has to prove they were right by releasing a product that demonstrates those things.
 
Very much to @Scott_Arm's point, I don't think "we" know why Intel ditched SMT.

I also am not sure it's specifically a "Windows is a shite OS" problem regarding how to schedule workloads on a modern processor. Intel made it bad enough with SMT on the higher-performing P-cores but no SMT on the lower-performing E-cores; how do you decide which threads go where? AMD didn't make it any easier with different CCDs having potentially different cache topologies, and of course each CCD has its own "top clock" CPU core. Forget a Windows vs Linux thing, how would any rational human know how to lay processes down across these heterogeneous compute nodes?

The difficult part about this scheduling is not having information about what you're scheduling. How would any given operating system know that it's scheduling a game versus an office app? Or the main thread of a hugely parallel software task versus one of the subordinate worker threads? Here's a hint: it doesn't! Recent OS patches have created affinity-pinning rules to ensure grouped threads of known executable names all get lumped onto the right CCD, or P- vs E-core, or SMT versus native core, or whatever, but it's just a big kludge.

Scheduling was once just as simple as "I have this many things to do, I have this many compute resources, I shall spread fairly and accordingly." Today it's "Well, this set of cores is faster, but this set of cores is more power efficient, but this set of cores has faster cache, but this one individual core actually runs at a higher clock, whereas these individual cores run at lower clocks, and this really even isn't a core but can sap cycles from another core for background tasks..." And try doing that when you have no specific indicator of the workload you're tasking out?

This isn't as simple as a "Windows is shite" problem; it's a whole rethinking of how applications spawn threads, how they would "tag" the threads they create to indicate a finer granularity of compute priority, energy priority, and overall group task priority, and then getting app devs to actually start using it once it's been defined, ratified, and pushed into modern operating systems.
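Even with perfect information, the policy itself is a judgment call. Here's a toy heuristic over an invented core table, just to show how many axes you end up juggling (all attributes and the policy are made up for illustration; real schedulers obviously use far richer data):

```python
from dataclasses import dataclass

@dataclass
class Core:
    cpu_id: int
    is_p_core: bool
    max_clock_ghz: float
    smt_sibling_of: int | None = None   # None = a "real" core, else the sibling's id

# Invented topology: 2 P-cores with SMT siblings plus 2 E-cores.
CORES = [
    Core(0, True, 5.7), Core(1, True, 5.7, smt_sibling_of=0),
    Core(2, True, 5.5), Core(3, True, 5.5, smt_sibling_of=2),
    Core(4, False, 4.6), Core(5, False, 4.6),
]

def pick_cores(n_threads: int, latency_sensitive: bool):
    """Toy policy: latency-sensitive work gets real P-cores first, then E-cores,
    then SMT siblings; throughput work just fills everything fastest-first."""
    def rank(c: Core):
        if latency_sensitive:
            return (c.smt_sibling_of is not None, not c.is_p_core, -c.max_clock_ghz)
        return (-c.max_clock_ghz, c.smt_sibling_of is not None)
    return [c.cpu_id for c in sorted(CORES, key=rank)][:n_threads]

print(pick_cores(4, latency_sensitive=True))    # -> [0, 2, 4, 5]
print(pick_cores(4, latency_sensitive=False))   # -> [0, 1, 2, 3]
```

And note the toy policy already has to guess whether the workload is latency-sensitive, which is exactly the information the OS doesn't have.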
 
Officially, it was due to power reasons: SMT leads to worse perf/watt than non-SMT.
Unofficially, they are working on a different approach to better hardware utilization which wasn't ready for ARL.
I really don't know. @Albuquerque just established the opposite is true:
In all x86 cases, one CPU core operating in SMT fashion will never yield the same performance as two individual CPU cores. Nevertheless, in terms of overall power consumption and silicon die size, doubling the total CPU core count vs. providing SMT on half the core count will always favor the latter.
It does make sense that scheduling becomes ridiculously complicated when you have P-cores with Hyper-Threading and also E-cores. For a given application, is it better to task all physical (P+E) cores first and then start tasking the Hyper-Threads? Or fill up the threads on the P-cores before tasking the E-cores? Not sure how you would even know this in advance. And it might depend on whatever else you have going on at the time.
 