CPU Security Flaws MELTDOWN and SPECTRE

Yeah, Intel's messaging is plain weird. There clearly is a real issue that specifically affects Intel more than, at the very least, AMD. So statements shifting blame or saying "nothing to see here" are actually the premier reason to avoid buying Intel products in the future. I can live with vendors screwing up; I cannot live with vendors not admitting/hiding/downplaying mistakes.
 
The thing I do not get the most is the glee I hear. "I will never buy Intel again", "Intel was cheating all the time", etc.
Mistakes happen to anyone. There is no way of proving that there are no hidden issues in AMD CPUs (at the very least, IMHO, no practical way).
And, well, I do actually expect Intel CPUs to be validated more thoroughly (purely because Intel has more resources).

Something as complex as virtual memory and the interactions between the microarchitecture and OS can have any number of flaws, particularly at the margins of the system or legacy support. One side note is that I recall seeing comments about removing certain guard bands at the end of kernel/user space that would have kept a limited set of pages off-limits in case of chip prefetchers or microcoded instructions running over and crashing/looping.
AMD and Intel had varying amounts of memory off-limits in that zone, just in case.
The new paging scheme shifts the actual space into a new mapping, so some poorly handled boundary conditions are avoided.

If people are just flaking out about the KAISER issues because they've been coded, it might be an oddly escalated reaction due to people noticing. Whether validation can help may depend on whether Intel is trying to throw a problem that hasn't been specified in the public research into the same basket.

Side channel issues on their own wouldn't normally be caught by validation, because the results are fully valid. The design and system architecture didn't mandate invariant time, or absolute obfuscation of any incidental behaviors that could hint at the history or state of process execution.
More fundamental changes would have to happen to change what the accepted results are.

More fundamentally, some of these exploits pit the desire for the best performance against the need for secrecy. Sometimes being faster means something to those paying attention, and hiding that information may mean giving up some of that speed.


We may need to wait and see if it's all just about the KAISER proposal, or if AMD is playing loose with the situation in the other direction. Even if this is all a blowup about the original "KASLR is pretty sucky and has been for a while" fixes, there are exploits even in the source papers for that proposal that AMD was vulnerable to and that the proposal didn't cover, and general issues where being cleaner about kernel and user space could help long-term.
 
From that, the scenarios are related to side-channel attacks based on timing cache residency, branch history, and perhaps most flexibly a crafted kernel value-based action that creates an external side effect like a highly revealing address loaded into cache.

Residency and branch history can likely be crafted, with varying degrees of difficulty for most vendors.

The third case is the apparent high-priority one Intel is trying to minimize.
It's not a direct breaking of internal pipelining like a complex store forwarding scenario I threw out there earlier, but a more straightforward method where a read to a location whose permissions will produce a fault is allowed to speculatively forward its result to instructions that take the value and make something that can be used to generate a side effect: like another memory address for another speculative load.
After the speculation is rolled back, timing a set of loads meant to exercise the cache will turn up a faster load time in the section corresponding to the value (or a specific part of it, repeat as needed).
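
To make that sequence concrete, here is a toy model in Python. It is only a sketch under stated assumptions: nothing in it touches real hardware, and `ToyCPU`, `SECRET`, the 4096-byte stride and the 40/300 "cycle" latencies are all made up for illustration. The one thing it tries to capture is that the fault discards the architectural result while the cache footprint of the dependent load survives the rollback.

```python
PAGE = 4096      # assumed stride: one probe slot per possible byte value
SECRET = 0x5A    # hypothetical kernel byte that user code is not allowed to read


class ToyCPU:
    def __init__(self, forwards_faulting_loads=True):
        # Assumption: some designs forward the value of a faulting load, others do not.
        self.forwards_faulting_loads = forwards_faulting_loads
        self.cached = set()   # microarchitectural state that survives a rollback

    def flush(self):
        self.cached.clear()

    def transient_read(self):
        """A load that faults, but whose value may be forwarded before the fault lands."""
        if self.forwards_faulting_loads:
            value = SECRET                  # forwarded speculatively
            self.cached.add(value * PAGE)   # dependent load touches one probe slot
        raise PermissionError("access fault; architectural result discarded")

    def load_time(self, addr):
        """Pretend latency in cycles: hits are fast, misses are slow."""
        return 40 if addr in self.cached else 300


def recover_byte(cpu):
    cpu.flush()
    try:
        cpu.transient_read()
    except PermissionError:
        pass   # the attacker catches or suppresses the fault
    # Time one load per possible byte value and look for the single fast slot.
    fast = [v for v in range(256) if cpu.load_time(v * PAGE) < 100]
    return fast[0] if len(fast) == 1 else None


print(recover_byte(ToyCPU()))                               # 90, i.e. 0x5A
print(recover_byte(ToyCPU(forwards_faulting_loads=False)))  # None
```

The second call models a core that never forwards the faulting value, so no probe slot comes back fast and nothing is recovered.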

AMD's hardware not speculating at the initial load to the kernel from user space is probably the reason they state they're safer, though that's not the total range of issues fixed by the page table isolation changes.

Intel is likely truthful when they say everything is working as designed, although it is likely trying to hide the magnitude of one of its specific vulnerabilities.
AMD is fixating on Intel's issues to the exclusion of equally applicable flaws, many of which no architecture can really avoid as long as the same state-holding hardware is touched by code from both domains or multiple actors. Even separating that may not be sufficient unless timing can be made invariant or uninformative.
 
From that, the scenarios are related to side-channel attacks based on timing cache residency, branch history, and perhaps most flexibly a crafted kernel value-based action that creates an external side effect like a highly revealing address loaded into cache.

(Spectre) Variant 1: Bounds Check Bypass - Use existing code with access to secrets by making it speculatively execute memory operations
(Spectre) Variant 2: Branch Target Injection - Malicious code usurps properties of CPU branch prediction features to speculatively run code
(Meltdown) Variant 3: Rogue Data Load - Access memory controlled by the OS while running a malicious application.
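
For Variant 1, a rough way to picture it is a bounds-checked lookup whose body runs transiently when the branch predictor has been trained to expect "in bounds". The Python below is only a toy model, not exploit code: `ToyVictim`, its one-entry "predictor", and the `mask_index` switch (a stand-in for a branchless index clamp) are all hypothetical names invented for this sketch.

```python
PAGE = 4096   # assumed stride: one probe slot per possible byte value


class ToyVictim:
    def __init__(self, secret: bytes):
        self.array1_size = 16
        # array1 and the secret sit next to each other in one flat "memory".
        self.memory = bytes(range(16)) + secret
        self.cached = set()              # probe slots touched, even transiently
        self.predict_in_bounds = False   # one-entry stand-in for a branch predictor

    def gadget(self, x, mask_index=False):
        """Models: if x < array1_size: touch probe[array1[x] * PAGE]."""
        in_bounds = x < self.array1_size
        # The body runs if the check passes, or transiently if the predictor says it will.
        runs = in_bounds or self.predict_in_bounds
        self.predict_in_bounds = in_bounds   # predictor learns the real outcome
        if runs:
            if mask_index:
                x = x if in_bounds else 0    # stand-in for a branchless clamp to a safe index
            self.cached.add(self.memory[x] * PAGE)


def leak(victim, offset, mask_index=False):
    victim.gadget(0, mask_index)          # train the predictor with a valid index
    victim.cached.clear()                 # flush the probe slots
    victim.gadget(victim.array1_size + offset, mask_index)   # out-of-bounds call
    hits = [v for v in range(256) if v * PAGE in victim.cached]
    return hits[0] if hits else None


victim = ToyVictim(b"key")
print(bytes(leak(victim, i) for i in range(3)))                    # b'key'
print(bytes(leak(victim, i, mask_index=True) for i in range(3)))   # b'\x00\x00\x00'
```

With the clamp in place the transient access lands on `array1[0]` instead of the neighbouring secret, which is the general idea behind the software fixes for this variant.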
 
Forget what I said earlier.
Meltdown is an Intel-exclusive vulnerability
https://meltdownattack.com/
Spectre affects practically every single OoOE CPU out there, including Ryzen
https://spectreattack.com/

edit:
Also, since there are generally 3 variants being talked about, Meltdown is Variant 3, Spectre is Variant 1 & 2

From ARM, it appears variant 3 is not Intel-exclusive:
https://developer.arm.com/support/security-update

The other thing about this is that while OoOE is speculative, in-order cores can speculate as well.
The Cortex A8 is affected by Spectre, for example.

Pipelines and branch prediction can start executing instructions ahead of knowing if they will fault. Caches, predictors, and other subsystems whose functions can accumulate state and affect timings can provide side-channel information.
 
AMD update
https://www.amd.com/en/corporate/speculative-execution

Variant One - Bounds Check Bypass - Resolved by software / OS updates to be made available by system vendors and manufacturers. Negligible performance impact expected.
Variant Two - Branch Target Injection - Differences in AMD architecture mean there is a near zero risk of exploitation of this variant. Vulnerability to Variant 2 has not been demonstrated on AMD processors to date.
Variant Three - Rogue Data Cache Load - Zero AMD vulnerability due to AMD architecture differences.

That's quite a different release from AMD than from Intel lol.
 
Linus rips into Intel
https://lkml.org/lkml/2018/1/3/797

Why is this all done without any configuration options?

A *competent* CPU engineer would fix this by making sure speculation
doesn't happen across protection domains. Maybe even a L1 I$ that is
keyed by CPL.

I think somebody inside of Intel needs to really take a long hard look
at their CPU's, and actually admit that they have issues instead of
writing PR blurbs that say that everything works as designed.

.. and that really means that all these mitigation patches should be
written with "not all CPU's are crap" in mind.

Or is Intel basically saying "we are committed to selling you shit
forever and ever, and never fixing anything"?

Because if that's the case, maybe we should start looking towards the
ARM64 people more.

Please talk to management. Because I really see exactly two possibibilities:

- Intel never intends to fix anything

OR

- these workarounds should have a way to disable them.

Which of the two is it?
 
As it turns out, it's not that rosy on the ARM side of the fence either. Meltdown aka Variant 3, thought to be Intel exclusive, applies to ARM Cortex-A75 too. Also there's a related "Variant 3a" which applies to Cortex-A15, A57 and A72.
Maybe Linus should look more towards AMD? It's the only safe one from Meltdown of the 3 and apparently no-one has been able to get Variant 2 working on AMD either even though it's apparently possible in theory.

https://developer.arm.com/support/security-update
 
That's for something hitting a lot of syscalls.
Games will be barely (if ever) affected.
Is graphics driver outside kernel memory space in modern MS OSes? Because doing 100k+ drawcalls/sec would bring giant performance hit if it isn't...

I'm sure many tech journalists are working on articles, gathering information etc. I know Anandtech are working on something.
Yeah, my post was only true at the time I wrote it, with me looking at only a few tech news websites which I visit more or less regularly... :p (Including for example Ars Technica, which are usually pretty quick at covering big bad breaking news stuff.) Now it's everywhere, pretty much.

I cannot live with vendors not admitting/hiding/downplaying mistakes.
It's one thing not admitting/downplaying/hiding their mistakes; quite another downplaying your own mistakes whilst raising a finger pointing and saying, "hey hey, look over there, they screwed up too!"

This latter thing is what Intel is doing.

Intel is likely truthful when they say everything is working as designed
A Ford Pinto worked as designed when it caught fire after being rear-ended... Hum-de-hum hum...
 
AMD and ARM are not in the total clear of Meltdown yet:
6.4 Limitations on ARM and AMD
We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack described in Section 5, neither on ARM nor on AMD. The reasons for this can be manifold. First of all, our implementation might simply be too slow and a more optimized version might succeed. For instance, a more shallow out-of-order execution pipeline could tip the race condition towards against the data leakage. Similarly, if the processor lacks certain features, e.g., no re-order buffer, our current implementation might not be able to leak data. However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed.
https://web.archive.org/web/20180103223603/https://meltdownattack.com/meltdown.pdf
 
Is graphics driver outside kernel memory space in modern MS OSes? Because doing 100k+ drawcalls/sec would bring giant performance hit if it isn't...
It may also be a question as to how much of the draw call's cost is specific to a TLB flush versus data movement or any synchronization with a device.
Some operations may be waiting on a pokey command processor somewhere to send a signal back.

A Ford Pinto worked as designed when it caught fire after being rear-ended... Hum-de-hum hum...
There's usually a line or two in the regulations as to how often a car's passengers can be exploded.
 
Is graphics driver outside kernel memory space in modern MS OSes? Because doing 100k+ drawcalls/sec would bring giant performance hit if it isn't...
If I'm not mistaken, each drawcall hits the user-mode driver, which then batches up drawcalls and sends them to the kernel-mode driver in one 'go'. This take on things is consistent with the guidance on how to efficiently submit drawcalls and command lists in DX12. My opinion is also based on the description of the software stack here: https://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/
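
A back-of-the-envelope way to see why batching matters here, as a Python sketch. The 0.5 µs extra cost per kernel transition is a placeholder I picked purely for illustration, not a measurement, and `added_overhead_us` is a made-up helper; only the 100k drawcalls/sec figure comes from the question above.

```python
import math


def added_overhead_us(draw_calls, calls_per_submission, penalty_us):
    """Extra time from the mitigation, assuming it is paid once per kernel transition."""
    transitions = math.ceil(draw_calls / calls_per_submission)
    return transitions * penalty_us


PENALTY_US = 0.5   # hypothetical extra cost per user/kernel round trip

# One kernel transition per drawcall versus batches of 1000 drawcalls.
print(added_overhead_us(100_000, 1, PENALTY_US))      # 50000.0 us per second of rendering
print(added_overhead_us(100_000, 1000, PENALTY_US))   # 50.0 us per second of rendering
```

Whatever the real per-transition penalty is, the point of the model is that it scales with the number of kernel submissions, not with the number of drawcalls.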

edit - oh and MS put out a patch yesterday (Jan 3) for the exploits. KB4056892
edit2 - https://www.windowscentral.com/microsoft-pushing-out-emergency-fix-newly-disclosed-processor-exploit
 
Wow this was just mentioned on hourly radio news where I live :oops:
They pretty much never mention anything techy.
 
https://github.com/torvalds/linux/commit/00a5ae218d57741088068799b810416ac249a9ce

Also there was a tweet somewhere stating that for Windows, different code may be executed depending on the CPU vendor (so the fix should behave differently for AMD)
  • "Exclude AMD from the PTI enforcement. Not necessarily a fix, but if AMD is so confident that they are not affected, then we should not burden users with the overhead"
So now it's a matter of time to see if they can indeed trigger it on AMD, and if so, then the config will need to change to include them for PTI.
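
For anyone wanting to check what their own box ended up with: assuming a Linux kernel new enough to expose `/sys/devices/system/cpu/vulnerabilities/meltdown` (that file is not part of the commit linked above, so treat its presence as an assumption), a small Python sketch like this reads the kernel's own verdict. `classify` and `meltdown_status` are just illustrative helper names.

```python
from pathlib import Path

# Assumed location; older kernels will not have this file at all.
SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def classify(status: str) -> str:
    """Map the kernel's one-line status string to a coarse label."""
    s = status.strip()
    if s.startswith("Not affected"):
        return "not affected"
    if s.startswith("Mitigation:"):
        return "mitigated"       # e.g. "Mitigation: PTI"
    if s.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def meltdown_status(path: Path = SYSFS) -> str:
    try:
        return classify(path.read_text())
    except OSError:
        return "unknown"         # file missing or unreadable


print(meltdown_status())
```

On a system where the vendor exclusion applies, I'd expect this to report "not affected" rather than "mitigated", but that depends on the kernel version in use.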
 