The Intel Execution in 2024

@Albuquerque just established the opposite is true
His comparison is between a 1C/2T and a 2C/2T, which doesn't make much sense production- and cost-wise, as you don't spend anywhere near a whole core's budget on adding SMT - far from it.
Intel's comparison is between a 1C/2T and a 1C/1T. General benchmarks confirm this too - enabling SMT on RPL, for example, produces a huge power spike while gaining pretty much nothing in games and other lightly threaded and/or highly interlocked workloads.
 
His comparison is between a 1C/2T and a 2C/2T, which doesn't make much sense production- and cost-wise, as you don't spend anywhere near a whole core's budget on adding SMT - far from it.
Intel's comparison is between a 1C/2T and a 1C/1T. General benchmarks confirm this too - enabling SMT on RPL, for example, produces a huge power spike while gaining pretty much nothing in games and other lightly threaded and/or highly interlocked workloads.
Which benchmarks are you referring to?
 
Enabling SMT or not is also a tradeoff in a given architectural context; on wider cores (more, or more complex, execution units) it makes more sense than on narrow ones.
Good out-of-order capabilities could also reduce the benefits of SMT. And decoder throughput probably factors in as well.

(a fascinating bit to me was how much more the early Ryzen chips benefited from SMT compared to their Intel competitors at the time)

All in all, it's a deeply technical topic; I'm pretty sure there's no single "reason", so producing one isn't that interesting.

What's interesting to me would be comparing the RPL -> ARL changes and speculating whether the architecture would have benefited more or less from SMT.
 
There were plenty over the recent years.
For example: https://www.green-coding.io/case-studies/hyper-threading-and-energy/
Thanks. Are there any showing HT off not hurting gaming performance? It seems clear there that HT is a win for games, at least on 4/8 or 6/12 core/thread CPUs, except in a couple of cases. The inevitable 4 P-core models might not be very strong for games.

So far you'd have to be crazy to go Arrow Lake for gaming. I can't see any reason to go Arrow Lake over Zen4 or 5. Arrow Lake is worse in every way that matters.
 
Thanks. Are there any showing HT off not hurting gaming performance? It seems clear there that HT is a win for games, at least on 4/8 or 6/12 core/thread CPUs, except in a couple of cases. The inevitable 4 P-core models might not be very strong for games.
Sure, the majority of modern 8+ core CPUs actually tend to benefit from disabling HT, although this is very title-specific. I vaguely remember a benchmark where the average was about the same, but the per-title differences were in the range of +/-20%.

So far you'd have to be crazy to go Arrow Lake for gaming.
Kinda depends on the competition really. In the top end it's pretty clear who the winner is and this hasn't changed since the launch of 5800X3D really. But below that there are segments where various Intel CPUs can and do provide a better perf/price.
 
It seems clear there that HT is a win for games at least on 4/8 or 6/12 core/thread CPUs exept in a couple cases
Yep, indeed - CPUs with limited threads (fewer than 12) benefit tremendously from Hyper-Threading, but beyond that, more threads aren't always beneficial, because not all games can spawn and distribute threads in a manner that increases performance, especially with all the complications of P cores and E cores, higher-clocked and lower-clocked cores, multiple CCDs, CCDs with 3D cache, etc.

Too many threads for a given game task require too much high-level synchronization, and too many threads can cause the CPU to run out of cache and suffer stalls. Different cores/threads can also write or read the same memory location, causing bugs and performance problems, so you need to raise the level of synchronization between cores/threads to avoid these problems - but synchronization is itself a costly task. The main synchronization primitive for communicating data between cores is the atomic operation, and atomic operations are often significantly more expensive than normal operations, so too much synchronization through them can harm performance badly.

The OS is another factor: background tasks can hop between threads and wreak all kinds of havoc on the game's performance, especially if the game doesn't take this into account. Memory bandwidth matters too - RAM that is not fast enough usually leads to a performance regression with HT.

In short, many games find that too many threads are actually a net negative for performance. As such, CPU-heavy games often limit the number of threads available to them. For example, Warhammer Darktide has a setting that controls the number of worker threads available to the game, and the maximum is always less than the number of available hardware threads; on my 7800X3D system (which has 16 threads), the game only allows me to use 14.
 
Yeah, to Sean's point, I'm waiting to see some laptop tests running the new Arrow Lake procs. That's when you get into the battery life tests, which is where the new process will be able to demonstrate efficiency superiority - or not.

You can get an idea from Lunar Lake as it's very similar, just slightly different tiles. The power efficiency was significantly better than Meteor Lake. Given Arrow Lake's more complicated tile design, I expect it to be slightly behind Lunar Lake on efficiency.
 
You can get an idea from Lunar Lake as it's very similar, just slightly different tiles. The power efficiency was significantly better than Meteor Lake. Given Arrow Lake's more complicated tile design, I expect it to be slightly behind Lunar Lake on efficiency.
Lunar has a newer NPU and GPU; also, Arrow Lake will carry the burden of two types of E cores in at least all but the highest-end SKUs (and the highest end would just mean a desktop part in a mobile form factor).
 
Lunar has a newer NPU and GPU; also, Arrow Lake will carry the burden of two types of E cores in at least all but the highest-end SKUs (and the highest end would just mean a desktop part in a mobile form factor).

Yes, but neither the NPU nor the GPU would make a difference for typical battery-life workloads, which are web browsing and video playback - unless Lunar Lake also has significantly more efficient media decoders. With the main cores on the 3nm compute tile for both, efficiency should be pretty similar. As I said, due to the additional tile complexity and the power penalty for moving data around, ARL should be less efficient than LL, but I don't expect the gap to be significant.
 
The net margin is largely due to the additional charges they've taken across the board, which is a one-time thing. But the gross margin number is quite poor, and the outlook isn't great either. With DCAI ramping, margins really should have improved further, as they are finally somewhat competitive with AMD in DC CPUs. DC GPU is one area where they are obviously lacking, and as a result AMD has for the first time overtaken Intel in data center revenue.
 
Intel CEO stated the chip maker is "not going to achieve" its prior guidance of $500 million AI-accelerator revenue this year.


That means Gaudi is as good as dead. Falcon Shores is expected only by late 2025 (let's say early 2026 given Intel's track record), and it will likely be competing against Rubin and MI400. Intel looks to have missed the AI boat completely.
 