CPU Security Flaws MELTDOWN and SPECTRE

Ars Technica has a further evaluation of the responses to the Spectre and Meltdown situation.
https://arstechnica.com/gadgets/201...hers-are-doing-about-it/?comments=1&start=160
The article is critical of Intel's early response, although more approving of subsequent documentation and offered details.
The ARM response was given the most kudos.

It's not particularly satisfied with AMD's subsequent PR, and goes further: it lays the blame for the early NDA break and the lack of a unified response on the AMD developer who publicly gave the reason for the desired exemption from the page table isolation changes.
The narrative is that once it was clear there was something AMD didn't do regarding speculative accesses to kernel memory from user space, outside researchers were able to uncover the problem.
If accurate, I would add a possible additional wrinkle that since this disclosure occurred in relation to changes in the x86 branch, it may have significantly fed into the continued perception that Meltdown in particular is an Intel-only problem.

Intel's muddling of the exploit types and its indication that it's not alone concerning Meltdown doesn't look quite as bad knowing now that it isn't alone in terms of Meltdown.
Potentially, the pressure to break the news early may have also come to give Intel some cover from the disproportionate focus it was receiving on this--and maybe in part because Intel may have been more involved in the overall coordination and research than most.

The disclosure papers for the exploits do give props to Intel for what it did to help everyone regarding both exploits.

Also Qualcomm, which may mean something about future disclosures for them. Interestingly, ARM's recommendation for following the repository changes for ARM64 includes a possibly unrelated and yet still enlightening change regarding a TLB erratum for Falkor, where the assumed atomicity of TLB operations does not hold in this instance--in case there's not enough evidence about how hard this area is.
That being said, I'm not sure if that is indicative of Meltdown since the apparent direction is an assumption that the isolation changes are happening regardless given their general value in enhancing security. (Which again, makes me think AMD can't milk its exemption forever.)

For now, I'm willing to give AMD some benefit of the doubt despite their sparse follow-up and possible indiscretion. Rather than attribute things to malice, one possible scenario is that AMD's organization did not anticipate one employee's loose lips, and a more substantive response wasn't ready for the embargo lift.
 
My apologies for the double-post, but given the time elapsed I figured a somewhat independent bit of speculation would be best kept separate.

There have been some interesting facets to the mitigation strategies proposed for ARM and Intel. One is a change that converts indirect branches into a call/return pair, so that they go through the usually unexploited return prediction (some Skylake variants may still have issues here). Another is using the x86 lfence instruction to serialize execution and thereby prevent further speculation in a Spectre attack. Further, ARM documented a reserved argument for one of its instructions that effectively formalized a testing or "chicken" bit that created a similar function for ARM seemingly out of thin air.
Intel has also disclosed control registers that with a microcode update will allow the OS to start toggling behaviors for parts of the branch prediction logic, and elements like whether a co-resident SMT thread can influence the prediction of the other. These too are potentially debug or "chicken" bits that can turn functionality off for testing or a "just-in-case" way to take out a buggy element.

Intel's white paper also plugged their various explicit methods for enforcing control flow or restricting accesses to or from certain spaces. I interpret some of this as an indication that Intel in general was thinking about such exploits from buggy code, but was not expecting the weak link to be on the other side of its decoders.

Barring that, there are some patches with some proposed brute-force methods for overwhelming the predictors so that a hostile actor's prediction history and everyone else's is wiped.


AMD has hinted at non-specific mitigations for the array bounds variant of Spectre, and indicated that the branch prediction exploit was very unlikely without further elaboration.
One element of AMD's terse response to all this is that it does take pains to indicate it is only discussing the specific tests in the Spectre and Meltdown disclosures, which may point to a bit of hiding behind pedantry. At least some mitigations for Spectre are showing up, with one apparently badly-titled SUSE patch stating it is going to disable branch prediction with no further clarification.

Giving AMD the benefit of the doubt that it's not being that super-literal genie you don't want to make wishes with, some possible reasons why it hasn't said as much could include a lack of readiness and some nuanced elements to its architecture that differ from the Haswell-based tests for the second variant of Spectre.

Haswell's predictors were apparently most easily reverse-engineered, and have a dependence on partial virtual addresses, which for certain types are also reduced down in a way that makes them more readily aliased.

Ryzen's front end changed a number of things from the Bulldozer era, including the introduction of a TLB hierarchy into the prediction pipeline, and the existence of a physical request queue from the predictors, indirect target array, and return stack might mean the Icache isn't being given virtual addresses.

The perceptrons and the indirect branch hardware take the PC, but have some additional connections to the TLBs. That may point to it being more difficult for a program in user space to get its branch history to hash to the same entries as a gadget in kernel space. Some elements may be more physically based, so a virtual address alias may get translated out before it can insinuate itself into the pipeline.
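
To make the aliasing argument concrete, here is a toy model of a predictor table that is indexed from a truncated virtual address. Everything in it is an assumption for illustration: the 31-bit truncation, the 9-bit index, the XOR folding, and the two addresses are made up, and none of this is the actual hashing used by Haswell, Ryzen, or any other CPU.

```python
# Toy model only: the index function below is hypothetical and is not the
# hashing used by Haswell, Ryzen, or any other real CPU.

INDEX_BITS = 9      # assumed table size: 2**9 = 512 entries
PARTIAL_BITS = 31   # assumed: only the low 31 address bits reach the hash


def predictor_index(branch_address: int) -> int:
    """Fold a truncated virtual address down to a small table index."""
    partial = branch_address & ((1 << PARTIAL_BITS) - 1)
    index = 0
    while partial:
        # XOR successive INDEX_BITS-wide chunks together.
        index ^= partial & ((1 << INDEX_BITS) - 1)
        partial >>= INDEX_BITS
    return index


def aliases(addr_a: int, addr_b: int) -> bool:
    """True when two branch addresses land on the same table entry."""
    return predictor_index(addr_a) == predictor_index(addr_b)


# Made-up addresses: one in kernel space, one in user space, chosen so that
# their low 31 bits are identical.
kernel_branch = 0xFFFF_FFFF_8123_4560
user_branch = 0x0000_7F00_0123_4560

if __name__ == "__main__":
    print(aliases(kernel_branch, user_branch))      # same table entry
    print(aliases(kernel_branch, user_branch + 4))  # different entry
```

In this model a user-space branch only needs to match the low address bits of a kernel branch to share its entry. If some stage of the lookup used translated (physical) addresses or extra TLB-derived bits instead, as speculated above for Ryzen, matching the low virtual bits alone would no longer be enough.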

The indirect, branch target, and call/return hardware has a back and forth with the TLBs, which may have implications as to the availability of other page table entry information and what is loaded into the L1.

Perhaps part of AMD's caginess is that it wasn't ready or is unwilling to provide a clear description of its hashing functions.
The TLBs are also SMT-tagged, which may present some additional influence to the predictors and branch target calculations, which may change whether AMD would match some of Intel's control register and microcode changes.
Potentially there are other variations on the theme that do apply, since AMD does hint at OS and software mitigation as part of its Spectre response.
 


--

Guess which one of these 3 Fortnite servers got patched?
https://www.epicgames.com/fortnite/forums/news/announcements/132642-epic-services-stability-update
 
It seems that benchmarking the fix now is premature: there is a microcode/firmware update that should be installed in addition to the OS patch. Reports are coming in of further performance penalties.
 
And it looks like AMD also offers µcode updates for their CPUs to support the IBPB and IBRS features in a similar way to Intel. I guess they will still maintain their CPUs are at "near zero risk" from Spectre v2 and this is just for the paranoid ones out there. It will be interesting to see where it will be used (at least openSUSE and Red Hat Enterprise Linux will use it, apparently). And as a side note, the old K10 and Jaguar(?) CPUs don't need a µcode update to disable the indirect branch predictor; evidently there already existed an MSR to control that (globally at boot).
 
@DavidGraham meaning that performance can and will tank even more once the entirety of the patches are fully deployed?
According to different reports, yes. Fixing the vulnerabilities needs the combination of firmware + OS patches. Applications that saw a hit after the OS patch (e.g., intensive I/O apps) will see an even greater hit after the firmware update, and some applications that saw no hit from the OS patch are expected to suffer a hit, maybe even a significant one, as the firmware adjusts the speculative behavior/branch prediction of the CPU itself.
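
Since the full picture depends on both the OS patch and the firmware, it can help to ask the OS what it thinks is currently mitigated before benchmarking. A minimal sketch for Linux, assuming a kernel new enough to expose the `/sys/devices/system/cpu/vulnerabilities` directory (older or unpatched kernels will not have it); the function name is my own:

```python
from pathlib import Path

# Present on Linux kernels that carry the mitigation patches; absent elsewhere.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir: Path = VULN_DIR) -> dict:
    """Return {vulnerability name: kernel status string}; empty if unavailable."""
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            # Unreadable entry: report it rather than abort the whole listing.
            status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("No vulnerabilities directory found; kernel too old or not Linux.")
    for name, state in report.items():
        print(f"{name}: {state}")
```

Running it before and after a firmware update would show whether the kernel reports a change in mitigation state, which is worth recording alongside any benchmark numbers.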
 
Software fixes are part of the mitigation strategy for Spectre, where potentially vulnerable conditional checks need to be analyzed for whether a serializing instruction is needed or the code refactored. Other branch types like indirect branches may need to employ a more convoluted way to jump to their target.
The drivers do a lot of data management and calls to the kernel and subroutines, so it seems reasonable that mitigation measures are inbound.
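
As an illustration of what "refactoring" a vulnerable conditional check can mean, one option is to clamp the index with branchless arithmetic after the bounds check, so that even a mispredicted check cannot produce an out-of-range access. The sketch below only models that arithmetic on 64-bit words; Python gives no control over speculation, so real code would do this in C or assembly, and the function names here are hypothetical.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def index_mask(index: int, size: int) -> int:
    """All-ones when 0 <= index < size, zero otherwise, without branching.

    Models unsigned 64-bit arithmetic; valid for index and size below 2**63.
    """
    # (size - 1 - index) goes negative exactly when index >= size, which sets
    # the top bit once the value is wrapped to 64 bits.
    combined = (index | (size - 1 - index)) & WORD_MASK
    top_bit = combined >> (WORD_BITS - 1)  # 1 when index is out of range
    return (top_bit - 1) & WORD_MASK


def clamp_index(index: int, size: int) -> int:
    """Return index unchanged when in range, otherwise force it to 0."""
    return index & index_mask(index, size)


if __name__ == "__main__":
    table = list(range(8))
    for i in (3, 7, 8, 1000):
        if i < len(table):  # the ordinary bounds check stays in place
            print(i, table[clamp_index(i, len(table))])
        else:
            print(i, "rejected; clamped index would be", clamp_index(i, len(table)))
```

The design point is that the clamp is a data dependency rather than a branch, so there is nothing for the predictor to guess; the alternative mentioned above is to keep the branch and put a serializing instruction after it.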
 
Why is a BIOS update required when MS can easily load the updated microcode? They've done it in the past: microcode updated via Windows Update and loaded on boot, kind of like Linux, I guess.
For hardware that is 3+ years old, this looks like the only realistic way to do it.
 
https://cloudblogs.microsoft.com/mi...-and-meltdown-mitigations-on-windows-systems/
  • With Windows 10 on newer silicon (2016-era PCs with Skylake, Kabylake or newer CPU), benchmarks show single-digit slowdowns, but we don’t expect most users to notice a change because these percentages are reflected in milliseconds.
  • With Windows 10 on older silicon (2015-era PCs with Haswell or older CPU), some benchmarks show more significant slowdowns, and we expect that some users will notice a decrease in system performance.
  • With Windows 8 and Windows 7 on older silicon (2015-era PCs with Haswell or older CPU), we expect most users to notice a decrease in system performance.
  • Windows Server on any silicon, especially in any IO-intensive application, shows a more significant performance impact when you enable the mitigations to isolate untrusted code within a Windows Server instance. This is why you want to be careful to evaluate the risk of untrusted code for each Windows Server instance, and balance the security versus performance tradeoff for your environment.
For context, on newer CPUs such as on Skylake and beyond, Intel has refined the instructions used to disable branch speculation to be more specific to indirect branches, reducing the overall performance penalty of the Spectre mitigation. Older versions of Windows have a larger performance impact because Windows 7 and Windows 8 have more user-kernel transitions because of legacy design decisions, such as all font rendering taking place in the kernel. We will publish data on benchmark performance in the weeks ahead.
 
  • Windows Server on any silicon, especially in any IO-intensive application, shows a more significant performance impact when you enable the mitigations to isolate untrusted code within a Windows Server instance. This is why you want to be careful to evaluate the risk of untrusted code for each Windows Server instance, and balance the security versus performance tradeoff for your environment.

How can "a balance between security and performance" be even discussed for servers? (maybe being Windows servers it's not terribly important though, prolly just 1% of the market)

Also, for Linux: https://www.phoronix.com/scan.php?page=news_item&px=KPTI-Retpoline-Combined-Ubuntu

LE: Thought experiment: consider you have some cryptocurrency stored on an exchange. How would you feel if they didn't apply the patches? Or about them not being transparent about whether they applied them or not?
 
Apple has updated iOS 11 and macOS High Sierra, https://support.apple.com/en-mn/HT201222
These are listed as the Spectre-focused sequels to the Meltdown-specific changes in 11.2--which Apple retroactively disclosed for the December patch in January.

How can "a balance between security and performance" be even discussed for servers? (maybe being Windows servers it's not terribly important though, prolly just 1% of the market)
Possibly for servers dedicated for use within an organization where there would be few cases of unvetted code running?

LE: Thought experiment: consider you have some cryptocurrency stored on an exchange. How would you feel if they didn't apply the patches? Or about them not being transparent about whether they applied them or not?
Going by the headlines, I'd think the low-hanging fruit is that users have their currency on exchanges at all, where the threat vectors that come ahead of timing issues are organizational and systemic failures to adhere to best practices for security, software quality, financial regulations, or operational integrity.
And probably more often than any of these exploits have been proven to have been used, exchanges that have accumulated enough have closed up shop after saying "we've totally been hacked by somebody not us, honest," and that's that.


On a different note, there are at least some statements that some of these apply to Nvidia's newest chips as well:
https://www.morningstar.com/news/ma...ystem-is-also-affected-by-security-flaws.html
 
How can "a balance between security and performance" be even discussed for servers? (maybe being Windows servers it's not terribly important though, prolly just 1% of the market)
Servers are not just public web servers, which I guess your "1%" refers to ;)
 
This vulnerability is probably going to play havoc with a diverse range of software utilities and tools once the Microsoft patch is applied; some have reported it affecting tools such as Asus AI Suite since they updated.

For those interested from a security software perspective, the following spreadsheet (not created by me) lists the products that are compatible with the Microsoft patch, whether they have added the required registry change, etc.
The focus is on security products, but it is worth noting that the patch will also affect other solutions, including VPNs.
Definitely worth keeping the link as a reference; nice work by the person who did it (Kevin Beaumont):
https://docs.google.com/spreadsheet...iuirADzf3cL42FQ/htmlview?usp=sharing&sle=true
 