- "Exclude AMD from the PTI enforcement. Not necessarily a fix, but if AMD is so confident that they are not affected, then we should not burden users with the overhead"
So now it's a matter of time to see if anyone can indeed trigger the attack on AMD; if so, the config will need to change to include AMD in PTI.
It's quite possible this specific vector won't hit AMD, for now.
Perversely, this exemption may leave AMD systems less secure over time: if additional attacks on KASLR appear, or some other exploit comes along that could be preempted by keeping the kernel unmapped while in user space, AMD systems will lack the protection that was the original intent of the isolation changes. Additionally, the isolation changes may be the basis for further mitigations against the side-channel attacks mentioned in some of the KAISER proposal's source documents, attacks to which AMD was vulnerable but which were ignored or listed as "future work".
I think it would behoove AMD to operate under the assumption that it won't forever enjoy the status of beneficiary of a less expensive but provably less safe memory-mapping decision.
Frankly, it may behoove AMD and Intel, and perhaps many stakeholders in hardware and software, to make some decisions about how to define and manage not just instruction or execution semantics, but also how knowable the behavior or history of a given section of the system, or stretch of code, can be.
Since these attacks exploit a 'design quirk' in the implementation of speculative execution inside the CPU pipeline, it would be interesting to know on how many different processors and architectures the attack is reproducible. While ARM processors take the lion's share today, there are still countless devices using some kind of PowerPC variant (and a simplified Linux as OS). Would those devices be equally at risk?
On a side note, in the x86 space, speculative execution has been employed since the mid-'90s (P6 and 6x86), and the concept is even older and certainly not limited to x86 architectures.
Why did nobody come up with this kind of attack before? Is this vulnerability a consequence of a CPU design fault, or has something changed at the OS level in the last few years that made this attack possible?
Multiple trends, such as:
- high-precision timers
- desire for more performance in more complex system corners (need to optimize where things haven't been highly optimized, like system calls, synchronization, load/store hazards, correlating behavior across domains)
- desire for better profiling, tuning, self-management (perf counters, trace data, event records)
- desire for easier development and support
- power efficiency/performance (e.g. aliasing to simplify hardware and save power, power/clock management)
- the increasing difficulty in exploiting browsers and operating systems
- virtualization (reduce overhead, latency in a manner that interacts with virtual memory and OS)
- hardened software leaning more heavily on isolation measures leveraging the kernel
- different cycles in hardware design versus software
- different cadences or siloed communication for the individual speculation measures
- increasingly sophisticated nation-state and criminal organizations engaged in IT crime
- maturation of data technologies and large revenues in businesses using data or making data a business
- time it took to take research from academic curiosity and realize it had broader potential
- the rise of server/cloud infrastructure that ripped out the original assumptions about local control of the hardware
And so on.
I have some thoughts on what some mitigations could be in the future, but also on what that might mean for designing and understanding these systems, or developing for them. Being able to reason about system behavior and performance, or to understand why things may fall apart, cuts both ways: it's a question of what someone is able to know, and the power that knowledge imparts is directed by the motivation of the person who has it.
Also, I'm curious about in-order architectures with run-ahead, or in-order architectures with explicit speculation measures that might have side effects, and about what other ARM vendors (Nvidia's Denver, Samsung's M cores, Qualcomm's server cores, etc.) are doing.
Similarly, if I have time I need to get some references on discussions about hardware speculation, speculation in general, and hardware/software management of system functions.
Linus Torvalds' strong words about speculation remind me of a raft of discussions about dynamic vs static performance measures, prediction vs predication or conditional moves, software vs hardware TLB management, weak memory ordering versus strong ordering with speculation, the decision by Linux and others to have kernel and user memory mappings in the same space, etc.
In isolation, I've found a number of such arguments interesting and compelling, but there's an interesting emergent property to the way all those items came together; at some point, everything was different.