CPU Security Flaws MELTDOWN and SPECTRE

Kaotik · Jan 16, 2018

CSI PC said:
It is not that clear cut.
You'll notice AMD changed their view on "Variant 2" from not applicable, with Ryzen safe, to applicable but "may be"/"possibly" since 11 January. AMD were also a bit dismissive of the three research teams, and some of the researchers are on record saying they are not impressed with AMD's response to Spectre.
There is some vagueness around Intel as well, as the researchers identified that some indirect branch prediction attacks worked on Skylake but not Haswell.
The only case for Ryzen is that its neural-network branch predictor has more complex speculative behaviour, which does not necessarily make it better. In any case, in the context of the Spectre research paper the OS update will not fix Ryzen, which is probably why AMD updated its point about "Variant 2".
Over time, attacks based on the Spectre concept will evolve to be even more successful.

I am not being critical of AMD; I was just including them to show how broad this problem is for modern CPU design.

No it hasn't.
AMD's original statement, word for word, about Variant 2:

Differences in AMD architecture mean there is a near zero risk of exploitation of this variant. Vulnerability to Variant 2 has not been demonstrated on AMD processors to date.

They never said it's not applicable to their CPUs; they said it's possible in theory but hasn't been demonstrated, and it still hasn't.
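The "neural network" in the quoted post refers to Ryzen's branch predictor, which AMD markets as Neural Net Prediction and which is generally described as perceptron-based. AMD has not published its internals, so the following is only a minimal sketch of the textbook perceptron predictor (Jiménez and Lin), with hypothetical table sizes and names. It shows what "more complex speculative behaviour" means in practice: each branch gets a weight vector, the prediction is the sign of that vector's dot product with recent global outcomes, and the weights are nudged whenever a prediction is wrong or not confident enough. Note that this is a direction predictor; Variant 2 abuses the indirect branch *target* predictor, so this illustrates history-driven speculation in general rather than the exact structure that attack poisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, illustrative sizes; real predictors differ and are not public. */
#define HIST_LEN 16
#define NUM_PERCEPTRONS 64
/* Training threshold from the perceptron-predictor literature: about 1.93*h + 14. */
#define THETA ((int)(1.93 * HIST_LEN + 14))

typedef struct {
    int weights[NUM_PERCEPTRONS][HIST_LEN + 1]; /* weights[i][0] is the bias */
    int history[HIST_LEN];                      /* +1 = taken, -1 = not taken */
} perceptron_predictor;

void pp_init(perceptron_predictor *p) {
    for (int i = 0; i < NUM_PERCEPTRONS; i++)
        for (int j = 0; j <= HIST_LEN; j++) p->weights[i][j] = 0;
    for (int j = 0; j < HIST_LEN; j++) p->history[j] = -1;
}

/* Dot product of this branch's weights with the global outcome history. */
static int pp_output(const perceptron_predictor *p, uint64_t pc) {
    const int *w = p->weights[pc % NUM_PERCEPTRONS];
    int y = w[0];
    for (int j = 0; j < HIST_LEN; j++) y += w[j + 1] * p->history[j];
    return y;
}

/* Predict "taken" when the dot product is non-negative. */
bool pp_predict(const perceptron_predictor *p, uint64_t pc) {
    return pp_output(p, pc) >= 0;
}

/* Train on the resolved outcome, then shift it into the global history.
   Weights are plain ints here; hardware would use small saturating counters. */
void pp_update(perceptron_predictor *p, uint64_t pc, bool taken) {
    int y = pp_output(p, pc);
    int t = taken ? 1 : -1;
    int *w = p->weights[pc % NUM_PERCEPTRONS];
    if ((y >= 0) != taken || abs(y) <= THETA) {
        w[0] += t;
        for (int j = 0; j < HIST_LEN; j++) w[j + 1] += t * p->history[j];
    }
    for (int j = HIST_LEN - 1; j > 0; j--) p->history[j] = p->history[j - 1];
    p->history[0] = t;
}
```

Trained on a strictly alternating branch, the sketch learns the pattern and then predicts it perfectly; that same learned history is what an attacker tries to steer so the CPU speculates down a path of the attacker's choosing.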

3dilettante · Jan 16, 2018

CSI PC said:
I was more interested in Spectre than Meltdown; all reports are saying Meltdown is containable, whereas Spectre requires a redesign of CPU hardware to fully mitigate its weakness.

The issue with Spectre extends beyond just correcting hardware. The metrics for what is "correct" from an architectural standpoint likely need serious review. Up to this point, no definition of wrong extended to side effects not relevant to the committed state of a thread's execution.
Broken down to the sub-domains of the hardware, the caches are doing nothing wrong, the TLBs are properly flagging accesses, the pipeline is properly quashing invalid results, and the branch predictor is accumulating history and providing targets based on it.
Defining what is incorrect for these exploits involves concepts that many of these elements do not understand and have never before been asked to track.
Some parts of this problem space demand that something understand the context of an access and its side effects, which until now have not been measured and for which no party in hardware or software has taken responsibility. There's no consistent agreement that I'm aware of on how this should be characterized or measured.
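To make the point that each sub-domain is "doing nothing wrong" concrete, here is a minimal C sketch of the bounds-check-bypass (Variant 1) pattern, modeled on the public Spectre proof of concept; array1, array2 and the 512-byte stride follow that example, while committed_probes is a hypothetical counter added only to expose architectural state. Architecturally, an out-of-range x is rejected and nothing is committed. The leak exists only as a microarchitectural side effect (which array2 line ends up cached) that this code cannot observe directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16
#define STRIDE 512 /* one probe slot per possible byte value, spaced apart */

uint8_t array1[ARRAY1_SIZE] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
uint8_t array2[256 * STRIDE];
unsigned committed_probes = 0; /* hypothetical: counts probes that actually retire */

/* Variant 1 gadget shape: a bounds check guarding a dependent, data-indexed load. */
uint8_t victim_function(size_t x) {
    if (x < ARRAY1_SIZE) {
        /* If the predictor still expects "in bounds", the CPU may run this body
           for an out-of-range x before the compare resolves. The results are
           squashed, but the array2 line selected by the out-of-bounds byte
           array1[x] can remain in the cache afterwards. */
        committed_probes++;
        return array2[array1[x] * STRIDE];
    }
    return 0; /* architecturally, an out-of-range x touches nothing */
}
```

Every architectural check passes: the in-bounds call commits one probe, and the out-of-bounds call returns 0 without changing any visible state.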

Sure, System Z has speculative execution, but that system is very different from many others out there, and its scope is heavily focused on security,

Most of the indications that security is a focus come from the various other measures IBM and other vendors have in place.
Making it a high priority doesn't mean they foresaw this vector.
Strong layers of system-level isolation or software protection can increase complexity in the layers involved in hardware rollback, which can widen the time window or introduce other side-effects and bugs.

it has a unique set of R&D engineers (from my time working with IBM engineers, albeit 10+ years ago);

I do not believe omniscience is one of their characteristics. Many of the security measures being added were in a different direction, generally isolation against software compromise like hostile VMs, compromised IO, DRAM snooping, bus glitching, or in some cases a hostile OS or hypervisor.

one complaint from some of those researchers is that security has become less of a focus in CPU design these days, traded off against performance and ease of use/flexibility.

Was this a complaint in the Spectre or Meltdown papers, or in one of the disclosures?
There are actually a number of side channels that have had mitigation measures or coincidental resistance added in CPUs outside of Meltdown and Spectre, so I am curious which people are complaining and where they are coming from.

CSI PC · Jan 16, 2018

3dilettante said:
The issue with Spectre extends beyond just correcting hardware. The metrics for what is "correct" from an architectural standpoint likely need serious review. Up to this point, no definition of wrong extended to side effects not relevant to the committed state of a thread's execution.
Broken down to the sub-domains of the hardware, the caches are doing nothing wrong, the TLBs are properly flagging accesses, the pipeline is properly quashing invalid results, and the branch predictor is accumulating history and providing targets based on it.
Defining what is incorrect for these exploits involves concepts that many of these elements do not understand and have never before been asked to track.
Some parts of this problem space demand that something understand the context of an access and its side effects, which until now have not been measured and for which no party in hardware or software has taken responsibility. There's no consistent agreement that I'm aware of on how this should be characterized or measured.

Most of the indications that security is a focus come from the various other measures IBM and other vendors have in place.
Making it a high priority doesn't mean they foresaw this vector.
Strong layers of system-level isolation or software protection can increase complexity in the layers involved in hardware rollback, which can widen the time window or introduce other side-effects and bugs.

I do not believe omniscience is one of their characteristics. Many of the security measures being added were in a different direction, generally isolation against software compromise like hostile VMs, compromised IO, DRAM snooping, bus glitching, or in some cases a hostile OS or hypervisor.

Was this a complaint in the Spectre or Meltdown papers, or in one of the disclosures?
There are actually a number of side channels that have had mitigation measures or coincidental resistance added in CPUs outside of Meltdown and Spectre, so I am curious which people are complaining and where they are coming from.

In the Spectre paper and also publicly. The implication is that you can build security around the requirements of speculative prediction, although the sheer scale of opportunities for attacks/exploits makes it challenging; the paper only outlined some of the ways the Spectre concept can be used as an attack vector.
From the mainframe Z perspective, it was always designed to be virtualised and fully secure, while also being more of a pain to work with than AS/400 or the Power series; I used to work with system engineers who had to maintain both in large-scale deployments.
The issue here is performance/flexibility over security rather than omniscience, and just how far one takes the scope of security, specifically in the context of Spectre.

In summary they said:

Software security fundamentally depends on having a clear common understanding between hardware and software developers as to what information CPU implementations are (and are not) permitted to expose from computations. As a result, long-term solutions will require that instruction set architectures be updated to include clear guidance about the security properties of the processor, and CPU implementations will need to be updated to conform.

More broadly, there are trade-offs between security and performance. The vulnerabilities in this paper, as well as many others, arise from a longstanding focus in the technology industry on maximizing performance.

CSI PC · Jan 16, 2018

Kaotik said:
No it hasn't.
AMD's original statement, word for word, about Variant 2:

They never said it's not applicable to their CPUs; they said it's possible in theory but hasn't been demonstrated, and it still hasn't.

And its purpose was to direct all the press towards Intel, saying Intel has problems while AMD with Ryzen is fine; there are plenty of articles out there like that.
IMO the purpose and intent of AMD's quick initial response is pretty clear, and it then had to be clarified 10 days later.
Like I said, some of the original researchers who identified these vulnerabilities were unimpressed with AMD's initial response; I quoted one that was easy to find in The Register.

3dilettante · Jan 16, 2018

CSI PC said:
In the Spectre paper and also publicly. The implication is that you can build security around the requirements of speculative prediction, although the sheer scale of opportunities for attacks/exploits makes it challenging; the paper only outlined some of the ways the Spectre concept can be used as an attack vector.

My interpretation of the Spectre paper was that it was more understanding of the situation, where hardware behaviors were evaluated traditionally around data integrity and performance. It was assumed that there were no security implications for this kind of speculation, and that is an assumption that was not challenged until recently.
The performance focus was described as long-standing, and since we have examples of architectures going back decades affected, I am curious which statements were made about a recent downturn in security focus.

From the mainframe Z perspective, it was always designed to be virtualised and fully secure, while also being more of a pain to work with than AS/400 or the Power series; I used to work with system engineers who had to maintain both in large-scale deployments.

It was also designed to be backwards compatible with generations of CISC mainframes, significant millicode involvement, large-scale multiprocessing, an increasing set of enterprise features, IO, and RAS.
Not that I've seen in-depth documentation for the latest chips, but from the description of the Z196, a full 5 stages of pipeline were devoted to checkpointing and error correction--which widens the time window for exploits further.
IBM's complex coherence protocols were credited as one use case for checkpointing in Power, though I forget if that was in Power 6 or later. Those protocols underpin some of the security and scale-out capabilities, but open up time for exploits and more side effects. Perhaps also interesting is that error correction, lockstep execution, DRAM integrity measures, coherence checkpoints, and other features might allow for timing attacks or measurements even from committed software state.

The issue here is performance/flexibility over security rather than omniscience, and just how far one takes the scope of security, specifically in the context of Spectre.

I would prefer a citation as to this being done purposefully despite knowledge of exploitability versus assumptions being proven wrong. Some of the assertions to that effect appear to overstate this angle, given pretty much every CPU with a long enough pipeline or any notable performance aspirations is affected by at least some of this, despite reams of other security features being introduced in recent years.

Malo · Jan 16, 2018

CSI PC said:
And its purpose was to direct all the press towards Intel, saying Intel has problems while AMD with Ryzen is fine; there are plenty of articles out there like that.
IMO the purpose and intent of AMD's quick initial response is pretty clear, and it then had to be clarified 10 days later.

You can interpret it however you like, but keep in mind that the only company focusing on the fact that other companies are affected, and specifically naming them in its PR releases, was in fact Intel.

DavidGraham · Jan 16, 2018

Malo said:
You can interpret it however you like, but keep in mind that the only company focusing on the fact that other companies are affected, and specifically naming them in its PR releases, was in fact Intel.

That's because some AMD affiliates leaked to the press that the vulnerabilities were exclusive to Intel before the officially agreed-upon announcement.

3dilettante · Jan 16, 2018

DavidGraham said:
That's because some AMD affiliates leaked to the press that the vulnerabilities were exclusive to Intel before the officially agreed-upon announcement.

A recent article on Ars Technica indicated the embargo was short-circuited by an AMD programmer posting, on a public mailing list, the architectural justification for AMD being exempt from the KPTI changes. That was allegedly enough to take the mounting speculation about the fast patch schedule from theory to proof of concept rather quickly.

The following appears to corroborate Ars Technica's claim that AMD's disclosure was a major factor in accelerating the timeline.
https://www.theverge.com/2018/1/11/...tre-disclosure-embargo-google-microsoft-linux

Perhaps key to Intel being the lightning rod were several factors: parts of the tech press missing the significance of parallel work on other architectures, x86's higher level of exposure given the datacenter impact, and people getting tunnel vision over some of the more sensational elements of the AMD/Intel Linux changes, given the rather undiplomatic wording of the vulnerability flags and AMD's lack of discretion. I'm not sure the choice to leak the probable mechanism was made to purposefully raise the stakes, given that AMD's response seemed less prepared for the early lifting of the embargo.

The rest of the industry didn't seem to mind the distraction.

Kaotik · Jan 16, 2018

CSI PC said:
And its purpose was to direct all the press towards Intel, saying Intel has problems while AMD with Ryzen is fine; there are plenty of articles out there like that.
IMO the purpose and intent of AMD's quick initial response is pretty clear, and it then had to be clarified 10 days later.
Like I said, some of the original researchers who identified these vulnerabilities were unimpressed with AMD's initial response; I quoted one that was easy to find in The Register.

DavidGraham said:
That's because some AMD affiliates leaked to the press that the vulnerabilities were exclusive to Intel before the officially agreed-upon announcement.

The initial outbreak was just about Meltdown (to which AMD is completely immune); Spectre came out only when the industry officially disclosed the details of these vulnerabilities.

DavidGraham · Jan 17, 2018

Kaotik said:
The initial outbreak was just about Meltdown (to which AMD is completely immune); Spectre came out only when the industry officially disclosed the details of these vulnerabilities.

Yes, conveniently hiding the rest of the vulnerabilities to advance the narrative of Intel being the only one affected, and then obfuscating the fact that AMD is exposed to some of them as well. This is what @CSI PC meant by the whole thing. AMD sabotaged the joint plan to announce all vulnerabilities together after the fixes had been rolled out, for the sake of gaining some headlines over Intel, then backpedaled on its stance once the storm had calmed down a bit. However, the press caught wind of that change in AMD's stance and reported it; this is what he meant by AMD receiving criticism for its actions.
https://finance.yahoo.com/video/amd-changes-stance-admits-chip-005425549.html

3dilettante · Jan 17, 2018

DavidGraham said:
Yes, conveniently hiding the rest of the vulnerabilities to advance the narrative of Intel being the only one affected, and then obfuscating the fact that AMD is exposed to some of them as well.

In the case of Spectre, the suspicion of purposeful sabotage would be double or triple if they had disclosed that, too. I don't think the code changes or developer correspondence for the Linux kernel or compiler changes were linked to the submitted Meltdown changes. How Spectre could be brought up in that context with plausible deniability is unclear. In addition, since AMD did need a variation of the changes for Spectre, "leaking" it in the same way would have meant picking a fight or creating a disagreement where there wasn't one.

I am uncertain about how AMD's engineer came to disclose the speculative memory vector the way he did; it does seem questionable in timing and choice of venue, but on the other hand AMD didn't look nearly as prepared to capitalize on it when Google lifted the veil soon after.

AMD sabotaged the joint plan to announce all vulnerabilities together after the fixes had been rolled out, for the sake of gaining some headlines over Intel, then backpedaled on its stance once the storm had calmed down a bit.

It appears AMD caused problems per a number of reports. The other side of the coin is that baldly damaging the welfare of major stakeholders like Microsoft, Amazon, and multiple data center clients wouldn't be compensated by convincing a smattering of forums. Those partners aren't going to have problems figuring out the extent of the vulnerability industry-wide or whom to blame for making a very troubling situation worse.

However, the press caught wind of that change in AMD's stance and reported it; this is what he meant by AMD receiving criticism for its actions.
https://finance.yahoo.com/video/amd-changes-stance-admits-chip-005425549.html

As shifty as AMD was trying to be in its initial statement, I do not agree that the Yahoo article's interpretation is consistent with the meaning of the text.
Given the lead time for microcode and code changes, which AMD rolled out rather soon after Intel, they likely didn't decide last week to start working on Spectre mitigation. There are changes referencing AMD's retpoline variant at least as far back as 4-5 January, and the references to upcoming microcode changes needed for certain AMD chips in particular mean AMD would have known that any claim that its chips didn't need additional fixes would be contradicted very quickly.

Esrever · Jan 17, 2018

They should have just announced 1 vulnerability at a time instead of 3. That way Intel couldn't just hide behind the "it affects everyone" narrative.

It would also clear up confusion about what AMD said about what. Even though AMD's initial press release outlines each situation with a table addressing each variant, people are still confused. They keep pointing to AMD's comments about the Meltdown patches in Linux as AMD's stance on all 3 variants. AMD was clearly trying as hard as it could to stop patches that would lower its CPU performance when they weren't needed. Oddly, the confusion plays well for Intel's point of view, as a lot of people still don't know what the update that cut performance is supposed to fix (Meltdown) and what part AMD is fixing as well (Spectre). The more people stay confused and think everything is just as broken, the more Intel can just say, not our problem.

And then you have people, probably with money in Intel stock, publishing articles after AMD releases Spectre fixes that say "AMD Changes Stance, Admits to Chip Vulnerabilities", claiming it's something new. AMD said from the beginning that Spectre vulnerabilities would be fixed with microcode updates and never claimed it was completely immune to them. It's almost as if updates are sometimes applied just to be sure. By the same dumb logic these articles use, we should say Nvidia GPUs are vulnerable because Nvidia put out a driver update as well.

And then you have sales people from service providers saying complete bullshit.

It almost seems like there are some powers behind the scenes trying to make the whole situation as good as possible for Intel by creating as much confusion as possible and then putting bullshit out about AMD as well.

Arnold Beckenbauer · Jan 17, 2018

https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectre-mitigations-in-msvc/
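The linked MSVC post covers the /Qspectre switch, which has the compiler insert an LFENCE after bounds checks it judges to be at risk. Below is a minimal C sketch of that idea next to a branch-free alternative, index masking, modeled on the generic array_index_mask_nospec() approach used in the Linux kernel. The table, function names and SPEC_BARRIER macro are hypothetical; the fence is only emitted on x86-64, and the masking version assumes an arithmetic right shift for negative signed values, which GCC, Clang and MSVC provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h> /* _mm_lfence (SSE2, baseline on x86-64) */
#define SPEC_BARRIER() _mm_lfence()
#else
/* Hypothetical placeholder: other ISAs need their own speculation barrier. */
#define SPEC_BARRIER() ((void)0)
#endif

#define TABLE_SIZE 16
uint8_t table[TABLE_SIZE];

/* Fence approach: LFENCE does not complete until all earlier instructions
   (the compare and branch) have completed, and later instructions do not
   begin until it does, so the load cannot run on a mispredicted path. */
uint8_t load_fenced(size_t idx) {
    if (idx < TABLE_SIZE) {
        SPEC_BARRIER();
        return table[idx];
    }
    return 0;
}

/* Masking approach: all ones when idx < size, zero otherwise, computed with
   no branch to mispredict. Assumes 0 < size <= INT64_MAX. */
uint64_t index_mask(uint64_t idx, uint64_t size) {
    return (uint64_t)(~(int64_t)(idx | (size - 1 - idx)) >> 63);
}

uint8_t load_masked(size_t idx) {
    if (idx < TABLE_SIZE) {
        /* Even if this runs speculatively with a bad idx, the mask forces it to 0.
           Production code must also stop the compiler proving the mask redundant. */
        idx &= (size_t)index_mask(idx, TABLE_SIZE);
        return table[idx];
    }
    return 0;
}
```

The fence stalls the pipeline at every guarded check, which is where the performance cost comes from; masking keeps the pipeline flowing but has to be applied by hand to each vulnerable index.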

3dilettante · Jan 17, 2018

Esrever said:
They should have just announced 1 vulnerability at a time instead of 3. That way Intel couldn't just hide behind the "it affects everyone" narrative.

Well, it's more like it affects almost everyone, but still.
It's multiple vulnerabilities with a common source mechanism and overlapping research and development periods. Once the use of speculated loads using a speculatively loaded index came out for any of the variants, it would have been effectively a disclosure of the remainder for hostile actors.
I'd rather that the ecosystem at large get to protecting itself as soon as possible versus denying Intel a couple of PR days before the confusion would have occurred in the form of widespread third-party reverse-engineering or actual exploits.

It would also clear up confusion about what AMD said about what. Even though AMD's initial press release outlines each situation with a table addressing each variant, people are still confused. They keep pointing to AMD's comments about the Meltdown patches in Linux as AMD's stance on all 3 variants. AMD was clearly trying as hard as it could to stop patches that would lower its CPU performance when they weren't needed.

That situation seems to be at least partly of AMD's own doing due to its engineer disclosing a compromising piece of information in an incomplete form and uncontrolled venue.
AMD's security advisory, while technically accurate as far as we can tell, was not comprehensive in its description of the situation or what AMD was doing. That's why I lean more towards a mistake than it being part of some plan, because it would have been a stupid plan that was poorly executed anyway.

Oddly, the confusion plays well for Intel's point of view, as a lot of people still don't know what the update that cut performance is supposed to fix (Meltdown) and what part AMD is fixing as well (Spectre). The more people stay confused and think everything is just as broken, the more Intel can just say, not our problem.

For consumers on Windows, the performance picture seems to be less clear than it is for Linux. KPTI is a bigger hit for servers, and, possibly due to differences in how Linux and Windows manage pages, the hit seems more significant on Linux.
For client loads and Windows, the retpoline and bounds check mitigations seem more notable, with SSD throughput being a standout regression that might be more noticeable for many.
According to some reports, the spike in CPU usage caused by the AWS and VM patches being rushed out in their first functional form has started to subside.
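KPTI unmaps most of the kernel from the page tables used while running user code, so every system call and interrupt pays for a page-table switch on entry and exit (cheaper on CPUs with PCID). That is why syscall-heavy server loads feel it most. A rough, Linux-only micro-benchmark sketch, with a hypothetical function name: time a trivial system call in a loop, then compare a boot with the kernel's pti=off parameter against the default on an affected Intel machine. Absolute numbers vary a lot between machines, so none are given here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock nanoseconds per trivial system call (Linux only). */
double ns_per_syscall(long iterations) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid); /* raw syscall: bypasses any libc-side caching */
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

A desktop workload making few system calls barely registers the difference, while a database or network server making millions per second multiplies it.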

Deleted member 2197 · Jan 17, 2018

Download: inSpectre Meltdown and Spectre Check tool for Windows from Gibson Research

This InSpectre utility was designed to clarify every system's current situation so that appropriate measures can be taken to update the system's hardware and software for maximum security and performance.

Gibson warns that his tool is new and that conclusions drawn from its output should be carefully considered, writing: “it has been carefully tested under as many different scenarios as possible. But new is new, and it is new. We may well have missed something. So please use and enjoy InSpectre now.”

http://www.guru3d.com/news-story/download-inspectre-meltdown-and-spectre-check-tool.html

msxyz · Jan 18, 2018

During the weekend I made some tweaks to the Spectre PoC to run it on a 16-year-old laptop with a Pentium 4 and Windows XP, and it works there too, albeit it's only able to retrieve about half of the data it's supposed to read (maybe due to sloppy coding on my part; of that I have few doubts!). It's even harder for me to believe NOBODY came up with this idea before: the actual exploit is only a handful of instructions in C.

Lightman · Jan 18, 2018

msxyz said:
During the weekend I made some tweaks to the Spectre PoC to run it on a 16-year-old laptop with a Pentium 4 and Windows XP, and it works there too, albeit it's only able to retrieve about half of the data it's supposed to read (maybe due to sloppy coding on my part; of that I have few doubts!). It's even harder for me to believe NOBODY came up with this idea before: the actual exploit is only a handful of instructions in C.

I've got an old K6-2+ system and an Athlon XP; it would be interesting to see if they are also affected.

Kaotik · Jan 18, 2018

https://newsroom.intel.com/news/fir...ial-performance-data-for-data-center-systems/

The reboot problems affect all Core chips since Sandy Bridge, not just Haswell and Broadwell (okay, they don't explicitly mention Coffee Lake, but everything else is included).

CSI PC · Jan 18, 2018

3dilettante said:
My interpretation of the Spectre paper was that it was more understanding of the situation, where hardware behaviors were evaluated traditionally around data integrity and performance. It was assumed that there were no security implications for this kind of speculation, and that is an assumption that was not challenged until recently.
The performance focus was described as long-standing, and since we have examples of architectures going back decades affected, I am curious which statements were made about a recent downturn in security focus.

It was also designed to be backwards compatible with generations of CISC mainframes, significant millicode involvement, large-scale multiprocessing, an increasing set of enterprise features, IO, and RAS.
Not that I've seen in-depth documentation for the latest chips, but from the description of the Z196, a full 5 stages of pipeline were devoted to checkpointing and error correction--which widens the time window for exploits further.
IBM's complex coherence protocols were credited as one use case for checkpointing in Power, though I forget if that was in Power 6 or later. Those protocols underpin some of the security and scale-out capabilities, but open up time for exploits and more side effects. Perhaps also interesting is that error correction, lockstep execution, DRAM integrity measures, coherence checkpoints, and other features might allow for timing attacks or measurements even from committed software state.

I would prefer a citation as to this being done purposefully despite knowledge of exploitability versus assumptions being proven wrong. Some of the assertions to that effect appear to overstate this angle, given pretty much every CPU with a long enough pipeline or any notable performance aspirations is affected by at least some of this, despite reams of other security features being introduced in recent years.

Regarding your comments on the mainframe: are you coming at this more from theory, or from working at a system-engineer level on both it and, say, AS/400 or POWER? I'm asking to see whether this is a theoretical debate, although it does not really matter since the vulnerability exists anyway.
In my experience working with IBM, the focus was specifically virtualisation and the security around it, without the same level of flexibility and ease of use as, say, AS/400. Virtualisation is integral to the focus on security leakage between CPU, software and memory; by the same theory, you can build security around speculative execution, but at a cost to performance.
Power series is more similar to AS/400 than it is to the System Z architectures.

The paper and some of the security experts come at this with the understanding that attack via this route has been a known concept for years; this is not something new, which is why the paper frames the sentence as it does, along with what some of those researchers say in public. It's just that they have finally been successful, or, more worryingly, it was successful years ago and has been exploited quietly by certain state organisations around the world.
So yes, one could say it was done purposefully, with a focus on performance over security.

msxyz · Jan 18, 2018

Lightman said:
I've got an old K6-2+ system and an Athlon XP; it would be interesting to see if they are also affected.

The PoC I've compiled requires at least a CPU supporting SSE2, because it uses clflush and mfence, which are not part of the original x86 instruction set and were introduced with the Pentium 4 alongside SSE2. The timing instruction, rdtsc, has been supported since the original Pentium, so on older chips it's the cache flush that would need replacing, and I don't know how effective a workaround would be in practice.

I'll let you know if I can come up with something that can work on really old PCs.
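The "handful of instructions" in the PoC are mostly a cache-timing probe. Here is a minimal x86-64 sketch for GCC/Clang of the flush-and-time step at the core of Flush+Reload, using the compiler intrinsics __rdtsc, _mm_clflush, _mm_mfence and _mm_lfence from <x86intrin.h>; the function names are hypothetical. On the hardware question: rdtsc dates back to the original Pentium, whereas clflush and mfence (and lfence) arrived with the Pentium 4 alongside SSE2, so a CPU such as the K6-2 would need some other way to evict the probe line.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h> /* GCC/Clang on x86: __rdtsc, _mm_clflush, _mm_mfence, _mm_lfence */

/* Time-stamp-counter ticks for a single load of *p. rdtsc is not serializing,
   so lfence keeps the load inside the timed window. */
uint64_t time_load(const volatile uint8_t *p) {
    _mm_mfence();
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*p;     /* volatile read: the compiler must emit the load */
    _mm_lfence(); /* wait for the load to complete before reading the TSC */
    uint64_t end = __rdtsc();
    return end - start;
}

/* Evict the line holding *p from the cache hierarchy (needs SSE2-era clflush). */
void flush_line(const volatile uint8_t *p) {
    _mm_clflush((const void *)p);
    _mm_mfence();
}

/* Fastest of `trials` timings, either with the line warm or flushed first. */
uint64_t min_load_time(const volatile uint8_t *p, int flush_first, int trials) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        if (flush_first)
            flush_line(p);
        else
            (void)*p; /* warm the line */
        uint64_t t = time_load(p);
        if (t < best) best = t;
    }
    return best;
}
```

On typical hardware the flushed minimum comes out noticeably higher than the warm one, and that gap is the whole side channel: after the victim runs speculatively, the attacker times each array2 slot and the fast one reveals the secret byte. Exact cycle counts vary by CPU, so none are given here.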
