CPU Security Flaws MELTDOWN and SPECTRE

I should also have explained that System Z has a different memory/paging management, security, and architecture model from AS/400-Power x, but I thought the fact that it also has memory and storage encryption would suffice to show that its focus and design differ from Power x. Like I said, 390/System Z is more niche.
Part of the differences in memory/VM memory and paging, along with the hypervisor (which depends on the client's implementation) and privileges, may be why there has been no announcement to date of System Z being vulnerable to Meltdown, unlike Power x. They did announce that System Z is vulnerable to Spectre.
But one needs the IBM operational engineer books to drill down into this.
 

From my limited sampling of such discussions and articles that quote or mention him, I'm not sure that's all that special.

I think the discussion that follows is interesting. While this is a public email chain, this is still a back and forth discussion that's generally treated by its participants in a more informal style--as Torvalds does.
A single email is not a final judgement or the end of the matter.
Outsiders fixating on the fireworks is an unfortunate side effect of open discussion, although in this case it did seem to prompt a later explanation of some of the details: https://lkml.org/lkml/2018/1/22/598
*Note: the peanut gallery mentioned would likely include us.

One part that Torvalds objected to in particular was the implication that the IBRS status bits involved in Spectre mitigation are not implemented in the same manner as the Meltdown status bit. The former seem to leave an open-ended commitment to Spectre persisting, with seemingly arbitrary compromises to security possible forevermore. The bit related to Meltdown is a flag that effectively distinguishes "not fixed" from "fixed" going forward.
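
As a rough illustration of the contrast, here is a sketch in Python that decodes two bits of the IA32_ARCH_CAPABILITIES MSR. Going by Intel's published spec, bit 0 (RDCL_NO) reports the Meltdown fix and bit 1 (IBRS_ALL) reports enhanced IBRS; I'm assuming these are the bits the thread is arguing over. The helper name and the sample values are made up, and actually reading the MSR (which needs kernel privileges) is left out.

```python
# Bit positions in IA32_ARCH_CAPABILITIES (MSR 0x10A) per Intel's public spec.
RDCL_NO = 1 << 0   # set: not susceptible to rogue data cache load (Meltdown)
IBRS_ALL = 1 << 1  # set: enhanced IBRS is supported


def describe_arch_capabilities(msr_value: int) -> dict:
    """Hypothetical helper: turn a raw MSR value into readable flags."""
    return {
        "meltdown_fixed_in_hardware": bool(msr_value & RDCL_NO),
        "enhanced_ibrs_available": bool(msr_value & IBRS_ALL),
    }


# Sample values are invented for illustration, not read from real hardware.
print(describe_arch_capabilities(0b01))
print(describe_arch_capabilities(0b10))
```

As I understand it, the first bit on its own lets software skip the Meltdown workaround, while the second only says a feature exists that software still has to turn on, which seems to be the asymmetry being objected to.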

Following the chain, the major motivators for some of the most complicated IBRS changes were related to mitigation measures underway prior to the retpoline trick being introduced. Retpoline has generally replaced the need for the heaviest IBRS involvement, although it was kept on for areas where coverage was uncertain and for further discussion about trade-offs.
The ugly IBRS persists despite retpoline particularly for Skylake (or one of its variants) because Skylake's architecture falls back to its more standard prediction pipeline in a manner that leaves retpoline as an incomplete solution. IBRS functions as a form of barrier and so some of its behaviors are more complicated than a dedicated instruction or simple mode switch.
Further, IBRS is less of a penalty for Skylake, and might actually allow Skylake to fall back to a simplified retpoline (perhaps akin to what AMD gets).

Even further down the chain, it seems like someone brainstormed up a potentially more elegant fix, at least for Linux. Using the stack trace functionality already in place, the kernel can determine if the call stack has gotten 16 deep, the threshold that might make Skylake escape retpoline coverage. The predictor gets wiped if it happens, which is significantly lower-overhead than some of the hundreds or thousands of cycles associated with some of the microcoded functions.
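
To make the idea concrete, here is a toy model in Python of depth tracking with a flush at the threshold. It is only a sketch of the scheme as described above, not the kernel's code: the class, its method names, and the choice to restart the count after a flush are my own assumptions, and the "flush" is just a counter standing in for wiping the predictor.

```python
RSB_DEPTH = 16  # threshold quoted above for Skylake


class CallDepthTracker:
    """Toy model: count nested calls and 'flush' when the threshold is hit."""

    def __init__(self, threshold: int = RSB_DEPTH):
        self.threshold = threshold
        self.depth = 0
        self.flushes = 0

    def on_call(self) -> None:
        self.depth += 1
        if self.depth >= self.threshold:
            # Stand-in for wiping the predictor; tracking restarts afterwards.
            self.flushes += 1
            self.depth = 0

    def on_return(self) -> None:
        if self.depth > 0:
            self.depth -= 1


tracker = CallDepthTracker()
for _ in range(40):  # simulate 40 nested calls with no returns
    tracker.on_call()
print("flushes after 40 nested calls:", tracker.flushes)
```

The appeal is that the common case costs an increment and a compare, and the expensive operation only runs on unusually deep call chains.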

Skylake may still have specific corner cases (perhaps near-zero probability ones, in AMD's parlance) if it doesn't get the full measure, so the next step would be to have further debate and explicit decision-making on what's good enough.


That doesn't resolve the question about why IBRS might be open-ended, and whether it's iffy because it's a rough but functional hack that will be forgotten in the future or if it's more portentous.
I've stated that I hope the stakeholders do get together to forge a comprehensive framework for defining and measuring such concerns, rather than fixing the immediate exploit at hand and winding up caught out by a new one years down the road and having a similar learning curve under pressure.
 
Quick and stupid question. I have yet to read the Spectre paper, so this is about Meltdown (I have only read the blog post linked from the meltdownattack website so far). When the offending thread tries to access kernel memory, a fault of some sort happens (page or seg, not sure which). Isn't the program terminated right then and there? Isn't that the default behavior? Or is that behavior overridable? If it is overridable, why not make the OS fault handler detect that the fault was user ring code trying to access kernel address space and forcibly terminate the thread/process in that case? Wouldn't this be a simpler solution than moving the kernel into a separate address space?
 
One possibility described in the paper is to fork before running the attack code, so that the child process is the one that gets halted and the parent can still time the cache. Another is to install a signal handler for that exception, which would allow the thread to survive.
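
The fork route can be sketched in a few lines of Python on a Unix-like system. This is not attack code: the child just reads from address 0 (an ordinary unmapped address standing in for a kernel address), there is no cache timing, and the function name is my own. It only shows that the fault kills the child while the parent carries on.

```python
import ctypes
import os
import signal


def run_faulting_child() -> int:
    """Fork, let the child fault, and return the signal that killed it (0 if none)."""
    pid = os.fork()
    if pid == 0:
        # Child: reading a C string from address 0 touches unmapped memory,
        # so the kernel delivers a fatal signal to this process only.
        ctypes.string_at(0)
        os._exit(0)  # only reached if the read somehow did not fault
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0


sig = run_faulting_child()
print("parent still running; child was killed by signal", sig)
```

So terminating the faulting process doesn't help by itself: the cache state it left behind is still there for the surviving parent to measure.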

The speculative execution vulnerabilities can also rely on the automatic quashing of operations that are in-flight that would not reach the commit stage.
That can include hiding behind a mispredicted branch, if the signalling for discarding the forwarded load is delayed sufficiently.

Intel's chips have a wider vulnerability with TSX, since the response for a fault is that the transaction should abort while suppressing the exception. This doesn't change the data read into the cache.
A non-faulting prefetch was a vector for overcoming KASLR in the original KAISER paper. The Meltdown paper indicates using a prefetch on a kernel address can sometimes raise the success rate of the main exploit.
 
AMD's paper could use some copy-editing. It mangled the V1-1 mitigation segment, and it looks like it failed to include the fixed code sequence. There are also some grammatical errors and some sloppily handled splits across page breaks.
I'd assume those that need to know this most would get the point, but it's unfortunate.

Mitigation V2-4 isn't particularly informative since it varies so much between implementations--which would be the place that would need more detail.
AMD seems to be saying that it intends to harden its return address predictor, since it doesn't expect future processors will need the option to flood it to prevent hostile code from setting a bad entry.

Interestingly, Intel's stating they intend to have CPUs that do not need software mitigation for Meltdown and Spectre later this year.
https://www.slashgear.com/intel-spectre-and-meltdown-proof-cpus-coming-this-year-25517122/

Changes to processor architecture are in the pipeline to permanently bypass the Meltdown and Spectre loopholes. However, it’ll take a little time to get them ready, and Intel says that the updated chips won’t be available on the market until later in 2018. It’s unclear what ranges Intel is prioritizing, since the security flaws affect so many models

Given the lag time for significant design changes, I'm curious what could be done. Google became aware of the exploits last June. Tapeout to launch of a chip this year could take 2-3 quarters. How long Intel had to make changes or how hacky they may be is uncertain.

Meltdown might be amenable to something of a quicker fix. If the hardware is positioned to know that it's in the scenario where user code is hitting a kernel address, it might suppress the operation like AMD does with its specific check.
If that part isn't changed, perhaps the load pipeline can be made to disable forwarding or zero out the value in the faulting scenario. That wouldn't necessarily require new behaviors in the rest of the chip.

Spectre is the more pervasive one, and if Intel means no software workarounds like serializing before branch checks, retpolines, and the indirect branch control instructions and barriers, I'm curious what would be changed in what is publicly a short time across multiple hardware units and scenarios.

Not knowing the details of the hardware, perhaps there's enough information nearby for the pipeline to trap out if it detects a subset of instructions that can generate side-effects not rolled back by standard misprediction handling.
Adding a new cache partition seems expensive, and not speculating at all seems impractical.
Maybe expand the role of the line fill buffers to delay booting things out of the L1, although that's a complex set of areas to change.

Perhaps a variation on what is being done to counteract Meltdown in Power?
https://git.kernel.org/pub/scm/linu...t&id=aa8a5e0062ac940f7659394f4817c948dc8c0667
It purges the L1 upon exiting the kernel or hypervisor, since it apparently does catch permissions violations if a load misses the L1. That seems like a specific corner, perhaps a window opened up by the way-predicted data cache and parallel TLB and directory checks.

Perhaps a hack fix by Intel if it's rushing could be to make it so a Meltdown or Spectre dependent access prompts an L1 wipe, or perhaps less expensively invalidate one line from other cache sets so that there's no identifiable pattern to the invalidations.
That requires tracking which operations loaded a value speculatively, and somehow through a dependence chain launched a load with an address derived from it.
Some of that might be partially present, based on how the register allocations are tracked and freed for standard mispredicts.
The pipeline could then detect that this happened, and wipe all or part of the L1.
TSX's low-latency line invalidation capabilities might make it possible to do this with less overhead.
 
Google Chrome adds Protection against Meltdown and Spectre
As mentioned, this latest version of Chrome also protects Mac and Windows users against the Meltdown and Spectre vulnerabilities. Google is disabling the SharedArrayBuffer feature to mitigate against web-based attacks. The update will apply itself, or alternatively, you can manually update by going to help and then about.
http://www.guru3d.com/news-story/google-chrome-adds-protection-against-meltdown-and-spectre.html
 
I revisited Nvidia's security advisory and saw that it has updated the mitigation information for its products.
It clarified that the Jetson TX2 that was apparently vulnerable to Meltdown in the initial release was considered to be affected by the ARM-specific Variant 3a, which has been listed as not needing any mitigation.
http://nvidia.custhelp.com/app/answers/detail/a_id/4617

This clarifies my earlier reading of the generic Meltdown advisory for the product with a custom ARM core, which might not be the one affected by 3a, since there are A57 cores in the product as well.

This reduces the ARM contingent to ARM's A75, Cavium's Thunder X2, and some unspecified number of Apple's custom cores.
The status of Samsung's custom ARM cores, and the recently announced M3 (wide, OOE, out of order memory pipeline, etc.) is unclear.

Spectre is more verifiably applicable to all the vendors mentioned, and also Qualcomm's Falkor processor. Meltdown vulnerability is unclear for the latter.
 
Just to add regarding 3a, Arm says in their whitepaper:
Note: It is believed that there are no implementations of Arm processors which are susceptible to this mechanism that also implement the Pointer Authentication Mechanism introduced as part of Armv8.3-A, where there are keys held in system registers.
Their site shows the A15, A57, and A72 with this potential vulnerability, but with the caveat 'In general, it is not believed that software mitigations for this issue are necessary'.
The A75 is shown with Variant 3 rather than 3a.
https://developer.arm.com/support/security-update
The whitepaper is available to download near the top right, while the processors are listed lower down in a table with their vulnerabilities.

For Variant 3:
For some implementations where a speculative load to a permission faulting (or in AArch32 domain faulting) memory location returns data that can be used for further speculation, this side-channel has been demonstrated to allow the leakage of EL1-only accessible memory to EL0 software. This then means that malicious EL0 applications could be written to exploit this side-channel.
A definitive list of which Arm-designed processors are potentially susceptible to this issue can be found at www.arm.com/security-update. It is believed that at least some Arm processors designed by Arm and its architecture partners are susceptible to this side-channel, and so Arm recommends that the software mitigations described in this whitepaper are deployed where protection against malicious applications is required.

Edit:
The ARM info is the most recent out there dated 26/01.
 
There's been some testing as far as Spectre is concerned for PowerPC and Power6.
A number of PowerPC chips are affected, and Power6 was found to be susceptible to Spectre.

The author appears to have been under the impression that Meltdown is an Intel-only issue, so it doesn't look like Meltdown was tested.
Further updates indicate an assumption that the Power7 and higher patches were just for Spectre, when there are Linux changes explicitly for Meltdown. The window of opportunity for Meltdown on IBM's cores appears to be more limited than Intel's.
http://tenfourfox.blogspot.com/2018/01/actual-field-testing-of-spectre-on.html
http://tenfourfox.blogspot.com/2018/01/more-about-spectre-and-powerpc-or-why.html

From an advisory concerning SUSE Linux, Meltdown is noted as having fixes ready or inbound for Intel, Power, and ARM.
Spectre is noted for the x86 vendors, ARM, IBM Power and IBM Z.
https://www.suse.com/de-de/support/kb/doc/?id=7022512

The following indicates that Spectre has some effect on Z systems, for the Z14 and earlier.
https://www.mail-archive.com/linux-390@vm.marist.edu/msg70885.html
Z in other areas has been described as being systemically or architecturally designed with already separate kernel and user mappings, which would prevent Meltdown.
I don't have a citation for that, currently.

Some changes related to better managing branch predictors for s390 showed up.
http://lkml.iu.edu/hypermail/linux/kernel/1801.2/01347.html
QEMU indicated s390 was affected by Spectre, so the above is likely part of the response.
https://www.qemu.org/2018/01/04/spectre/
 
Strange, because IBM announced a while back that all were vulnerable to Meltdown apart from System Z, for now.
Looking at their patch updates, they cover both Spectre and, specifically, Meltdown as well, and apply to a heck of a lot of IBM processors.
Power 7+: In response to recently reported security vulnerabilities, this Power System firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754. Note that a subsequent FW release is required and will replace this FW update for CVE-2017-5715 for IBMi when available. In addition, Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754.

Power 8/Power 9: In response to recently reported security vulnerabilities, this Power System firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754. Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754.
I have other sources as well.
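
For anyone matching those CVE numbers to the variant names, here is a small Python sketch that pulls the IDs out of advisory text and maps them to the public names. The mapping is the standard one from the disclosures; the function name and the sample string (lifted from the advisory wording above) are just for illustration.

```python
import re

# Public names for the three CVEs cited in the advisories.
CVE_NAMES = {
    "CVE-2017-5753": "Variant 1, bounds check bypass (Spectre)",
    "CVE-2017-5715": "Variant 2, branch target injection (Spectre)",
    "CVE-2017-5754": "Variant 3, rogue data cache load (Meltdown)",
}


def variants_in(advisory_text: str) -> list:
    """Return names of known CVEs found in the text, in order of first mention."""
    seen = []
    for cve in re.findall(r"CVE-\d{4}-\d+", advisory_text):
        if cve in CVE_NAMES and cve not in seen:
            seen.append(cve)
    return [CVE_NAMES[cve] for cve in seen]


text = "issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754"
for name in variants_in(text):
    print(name)
```

The presence of CVE-2017-5754 in both firmware notes is what marks them as covering Meltdown and not just Spectre.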
 
Aside from the blog whose focus was on Power Macs, the rest of the disclosures appear to be consistent with Power7 and later being affected by Meltdown.
I think the blog author was mistakenly working from the initial public perception that Meltdown was Intel-only.
 
Just to make sure, what ARM calls "Variant 3" is in fact Meltdown, and "Variant 3a" is some sort of Meltdown-related thing, right?
Spectre only has 2 variants.
 
ARM's 3a variant is speculative execution allowing the leakage of information from system registers not accessible to user code. ARM's justification for why it probably doesn't need mitigation is that while this is undesirable, they consider most of that data to be of little use for further exploits. One class of system register they would be concerned about is key storage for their pointer authentication functionality, but there is apparently no overlap between vulnerable CPUs and CPUs that provide that function.

The mitigation section does indicate that it may be possible to leak information that could break KASLR, so mitigating 3a would be paired with the changes used to mitigate 3. This seems like it applies in the other direction, that if you want to safeguard KASLR against 3a you would want to add the mitigations for 3 even if the CPU isn't vulnerable--given the coincidental nature of KPTI's protection against Meltdown and original intent to protect KASLR.
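
On the Linux side, whether the Variant 3 mitigation is active can be checked from sysfs on kernels that report it (4.15 and later, as far as I know). A minimal Python sketch, assuming the usual `/sys/devices/system/cpu/vulnerabilities` directory; the function name and its parameters are my own, and there is no separate entry for 3a that I'm aware of, so this only covers 3 and the two Spectre variants.

```python
from pathlib import Path

# Directory used by recent Linux kernels; older kernels will not have it.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerability_status(names=("meltdown", "spectre_v1", "spectre_v2"),
                              base=SYSFS_DIR) -> dict:
    """Return the kernel's one-line status per entry, or None if unavailable."""
    status = {}
    for name in names:
        try:
            status[name] = (Path(base) / name).read_text().strip()
        except OSError:
            status[name] = None  # file missing or unreadable on this system
    return status


for name, value in read_vulnerability_status().items():
    print(name, "->", value if value is not None else "not reported")
```

A "meltdown" line reading "Mitigation: PTI" would mean the kernel and user mappings are split, which is the change that would also cover the KASLR leak discussed above.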
 
Linus needs to shut his mouth. Every time I have to touch that OS of his I get suicidal tendencies. Fucking permission management is the most dysfunctional shit I've ever seen. You can give everybody access to everything and still end up with permission denied. Never ever just works that junk.

You are dissing the person who has started and is head developer of the most used OS kernel? And who also designed the most used SCM system? It seems you should read something like "POSIX for beginners".
 