CPU Security Flaws MELTDOWN and SPECTRE

The following has some updated information about some of the latest instructions added with the microcode updates.
https://arstechnica.com/gadgets/201...e-and-meltdown-patches-will-hurt-performance/

Zen's indirect branch predictor apparently does not alias addresses when selecting targets, which is one area where it differs from the more readily reverse-engineered Haswell predictor.
Abusing a branch to shift history requires using the target's full branch address, rather than one that merely aliases to a subset of it. Using the whole address seems more expensive; I'm curious whether the TLB has an influence here.
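To illustrate the aliasing point, here's a toy model in C. The set count, index bits, and partial-tag width are invented numbers, not anything from a real core; the point is only that a partial tag lets an attacker construct a different address that lands on the victim's predictor entry, while a full-address match removes that shortcut at the cost of storing and comparing more bits.

```c
/* Toy model of branch-target aliasing. Sizes and bit splits are invented for
 * illustration only. With an index plus a *partial* tag, an attacker can pick
 * a different address that produces the same index and partial tag and thereby
 * "train" the victim's entry; a full-address match closes that door. */
#include <stdbool.h>
#include <stdint.h>

#define BTB_SETS          4096   /* hypothetical number of predictor sets */
#define PARTIAL_TAG_BITS  12     /* hypothetical tag width for the "cheap" case */

static uint32_t btb_index(uint64_t pc)
{
    return (uint32_t)((pc >> 4) & (BTB_SETS - 1));
}

static uint32_t partial_tag(uint64_t pc)
{
    return (uint32_t)((pc >> 16) & ((1u << PARTIAL_TAG_BITS) - 1));
}

/* Cheap scheme: two different branch addresses collide whenever index and
 * partial tag agree -- easy for an attacker to arrange by construction. */
static bool aliases_partial(uint64_t victim_pc, uint64_t attacker_pc)
{
    return btb_index(victim_pc) == btb_index(attacker_pc) &&
           partial_tag(victim_pc) == partial_tag(attacker_pc);
}

/* Full-address scheme (what the AMD description suggests): only the exact
 * branch address selects the entry, so training from another context at a
 * merely aliasing address doesn't land where the victim looks. */
static bool aliases_full(uint64_t victim_pc, uint64_t attacker_pc)
{
    return victim_pc == attacker_pc;
}
```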

This is apparently why Zen is not getting the IBRS mode setting (prior cores will). This mode initiates a barrier at various transitions based on settings.
Zen does get the IBPB command and STIBP. The former is a barrier that blocks branch history from before it from affecting what follows; the latter is a setting that keeps SMT threads from influencing each other's indirect prediction entries.
For Intel, IBPB seems to wipe the indirect predictor, among other things. AMD's version was described as a subset somewhere in the patch discussions, but what the superset is and which parts of it are included isn't clear.
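For the curious, this is roughly how those controls are exposed to system software, going by the documented MSR interface (IA32_SPEC_CTRL at 0x48 with the IBRS and STIBP mode bits, IA32_PRED_CMD at 0x49 with the IBPB bit). This is an illustrative sketch only, not any particular kernel's code, and WRMSR is privileged, so it only runs in ring 0.

```c
/* Illustrative sketch of the speculation-control MSR interface introduced
 * with the microcode updates. Not production code; ring 0 only. */
#include <stdint.h>

#define MSR_IA32_SPEC_CTRL  0x48u
#define SPEC_CTRL_IBRS      (1ULL << 0)  /* restrict indirect branch speculation */
#define SPEC_CTRL_STIBP     (1ULL << 1)  /* stop the SMT sibling from steering predictions */

#define MSR_IA32_PRED_CMD   0x49u
#define PRED_CMD_IBPB       (1ULL << 0)  /* barrier: earlier branches can't steer later ones */

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr"
                     :
                     : "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* e.g. issued on a context switch to an unrelated address space */
static void issue_ibpb(void)
{
    wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
}

/* e.g. enabled on entry to more privileged code, where the CPU supports the mode bits */
static void enable_ibrs_and_stibp(void)
{
    wrmsr(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
}
```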

Elsewhere, I did find that IBM's Power 7 and Power 8 are being updated to handle Meltdown and Spectre. Power 9 has something inbound. Earlier versions are under review. I'm curious about the in-order but very aggressive Power 6.
Fujitsu seems to be admitting at least some of these vulnerabilities apply to some SPARC variants, not sure which of them.
News articles indicated that Cavium admitted that the ARM ThunderX2 (formerly Broadcom's Vulcan core that had its sights set on a Haswell-like target) is affected by Meltdown and Spectre.
Not sure about MIPS and the most recent OoO cores.
 
But it's at 1080p with a high-end card, fps > 100, and they made attempts to make it CPU-limited.
Nevertheless, the issue is real. The Witcher 3 run involves a tour through the city, which stresses the game's streaming system; that is likely the culprit behind the performance hit (we know the patches affect storage performance more). Which means open-world games like Fallout 4, GTA 5, and the like could see an even bigger hit.
 
Fujitsu's list: https://sp.ts.fujitsu.com/dmsp/Publ...E2017-5715-vulnerability-Fujitsu-products.pdf
Fujitsu SPARC servers are "under investigation".

Does Oracle use SPARC CPUs?

IBM: https://www.ibm.com/blogs/psirt/potential-impact-processors-power-family/
  • Firmware patches for POWER7+ and POWER8 platforms are now available via FixCentral. POWER9 patches will be available on January 15. We will provide further communication on supported generations prior to POWER7+ including firmware patches and availability.
  • Linux operating systems patches are now available through our Linux distribution partners Redhat, SUSE and Canonical.
  • AIX and IBM i operating system patches will be available February 12. Information will be available via PSIRT.
------------------------
https://www.hardocp.com/news/2018/01/11/amd_doubles_down_on_previous_spectre_meltdown_statments
https://www.reuters.com/article/bri...-gpz-variant-1-or-gpz-variant-2-idUSFWN1P60X7
AMD SAYS THERE IS NO CHANGE TO AMD’S POSITION ON OUR SUSCEPTIBILITY TO GPZ VARIANT 1 OR GPZ VARIANT 2 (COLLECTIVELY CALLED SPECTRE IN NEWS REPORTS)
 
https://newsroom.intel.com/news/intel-security-issue-update-addressing-reboot-issues/

Haswell- and Broadwell-based machines are seeing unexpected reboots after the patches.

One plausible cause, from https://www.realworldtech.com/forum/?threadid=174100&curpostid=174102, is triple faults resulting from missed changes to fault handlers that still assume the pre-KPTI mapping of pages between the kernel and user domains. Fault handling can itself cause a fault, and if specific configurations or software have old assumptions buried in their deepest layers, things can nest further. However, x86 cuts things off at three faults deep, and a triple fault resets the machine.

This is something that came up with the PS4 jailbreak, which involved a browser engine exploit and modifications to an unsecured interrupt descriptor table that allowed faults to be rerouted for arbitrary code execution at elevated privilege.
That exploit was made more difficult by the hack already operating at two faults deep, since a triple fault would crash the PS4.
https://cturt.github.io/ps4-3.html

On a side note, HPE has reported that Intel indicated Itanium is immune to all three vulnerabilities. I did check, and there are several reasons why Meltdown wouldn't apply. There is branch prediction, but it's possible the specific methods or indirect predictor structures do not apply.
Bounds check escape would seemingly not happen in hardware, since the core cannot speculate past a long miss.
I'm curious whether some of the more aggressive code transformations could cause something like this to happen in software, but since that's software, Intel could just shrug it off.
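For reference, the "bounds check escape" case is the classic Spectre variant-1 gadget, which looks roughly like this (a simplified sketch along the lines of the published examples; the array names and stride are just placeholders):

```c
#include <stddef.h>
#include <stdint.h>

uint8_t array1[16];
size_t  array1_size = 16;
uint8_t array2[256 * 4096];   /* probe array: one page per possible byte value */

/* The bounds check is architecturally respected, but a predictor trained on
 * in-bounds calls lets an out-of-bounds x be dereferenced speculatively; the
 * secret byte at array1[x] then selects which line of array2 gets cached,
 * where a timing pass can recover it later. A full attack needs the training
 * and timing harness around this function. */
uint8_t victim_function(size_t x)
{
    if (x < array1_size)
        return array2[array1[x] * 4096];
    return 0;
}
```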
 
So Itanic really was the future after all? :runaway:


Pretty amazed by how widespread this has wound up being; I figured it'd be something limited to a specific architecture rather than a generic, widespread issue across many types of chips :oops:
 

This demo shows the reconstruction of a previously viewed photo which is still in memory using the Meltdown vulnerability. The video also demonstrates that the read data does not have to be cached in L1, as the image is larger than the L1 cache (40kb photo vs. 32kb L1 cache).
 
*Quote in post: "So Itanic really was the future after all?"
I tried to review some of the remaining articles on Itanium and its variants, and from what I can see, part of the reason for its immunity to the branch-injection variant of Spectre is its focus on tighter software control of the speculative pipeline, along with a two-level scheme whose L1 cache lines and branch data structures are tightly linked, keeping predictions closer to 1:1 between a predicted branch and its originating thread. Indirect branches are handled with an explicit set of registers, which would not be touched by a different thread trying to spoof the predictor.
However, that also means the techniques in question predate some of the best predictors we've had for years; the global-type or shared-history predictors can be eerily effective at finding correlations in behavior and making good use of the prediction logic's limited storage capacity--at the cost of a lack of isolation against a security threat, years down the road, that the designers hadn't seen coming.
Further, certain elements would have made it harder, such as a front-end facility for detecting incorrectly predicted jump addresses that could intercept them long before they reached an execution stage.
Additionally, there's an element that is in the ballpark of why I think Zen's prediction path has some extra difficulty in being exploited: Itanium's specific handling of its L1 caches and their TLBs means the TLB is inserted into the prediction path, in some ways like Zen's front end. With Itanium's pre-validated L1 caches, the hit logic uses a set of values that tie a TLB entry to the L1 lines that belong to it. Since a good chunk of the predictor's resources are tied to lines on a per-line basis, and you can't get to a line without getting through the TLB's full address check, that's not something that can be spoofed from another thread using a different address.

I'm not sure if Zen does anything like this, but the specific shortcut of linking an entry to the TLB translation that spawns it creates a tighter link between a branch and its exact address, and the TLB can provide a smaller set of values that address a specific subset of the branch buffers. It's not without downsides, like needing the TLB in very tight linkage and potentially limiting the capacity of other parts of the pipeline based on the TLB's storage limits. Involving it might explain what AMD said about using the whole address, which can get expensive when trying to track many branches.

Bounds check escape is negated in hardware because the pipeline cannot get past the check without stalling. The prediction logic is limited to steering the front end and the stages between fetch and execution, rather than guiding execution past that point. However, saying that makes Itanium immune requires shifting the blame to any software using predication, non-faulting and advance loads, and a pipeline designed to let the compiler explicitly build code that races past bounds checks, with explicit cleanup code. Deciding not to use that is giving up a notable reason for using EPIC at all.

The Meltdown immunity stems from this explicit software stance. Intel was forced to make a choice with its speculative load functionality about how to handle speculative loads that might fault, and there was no luxury of hardware intervention and cleanup. If they didn't block the value from use, there was no telling what would happen, since EPIC would be handing the bad data over to the program. So Itanium sets a flag (the NaT bit) indicating a piece of data is invalid, and the actual value is dumped. If the speculative code isn't smart enough to discard it, using that flagged data throws a likely fatal error. One of the common sources of trouble with software on Itanium is that it takes that flag very seriously, and its wide-ranging speculation could leave booby-trapped registers hanging around from rolled-back paths or subroutines in a nest of very complex code transformations, any of which could kill the program long after the point that created the value, with little data to track down the bug.
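Since this is hard to picture without code, here is a toy C model of that deferred-exception behaviour. The flag and recovery path here are simulated, not real IA-64 semantics; on actual hardware the ld.s/chk.s instruction pair and per-register NaT bits do this in the architecture itself.

```c
/* Toy model of deferred exceptions: a NaT-like flag rides along with the
 * value instead of a fault firing immediately. The "speculative load" below
 * just fakes a would-be fault for illustration. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    long value;
    bool nat;   /* "Not a Thing": value is poisoned, the fault was deferred */
} spec_reg;

static spec_reg speculative_load(const long *addr)
{
    spec_reg r = { 0, false };
    if (addr == NULL) {   /* stand-in for "this load would have faulted" */
        r.nat = true;     /* defer the exception instead of taking it now */
        return r;
    }
    r.value = *addr;
    return r;
}

static long check_and_use(spec_reg r)
{
    if (r.nat) {
        /* Real code branches to compiler-generated recovery (chk.s); consuming
         * a poisoned register without that check is what kills the program. */
        fprintf(stderr, "deferred fault consumed: recovery code should rerun the load\n");
        exit(EXIT_FAILURE);
    }
    return r.value;
}

int main(void)
{
    long x = 42;
    printf("good path: %ld\n", check_and_use(speculative_load(&x)));
    /* The "booby-trapped register" failure mode: a poisoned value left over
     * from a rolled-back path gets used much later. */
    check_and_use(speculative_load(NULL));
    return 0;
}
```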


*Quote in post: "This demo shows the reconstruction of a previously viewed photo which is still in memory using the Meltdown vulnerability. The video also demonstrates that the read data does not have to be cached in L1, as the image is larger than the L1 cache (40kb photo vs. 32kb L1 cache)."
Just to clarify what that quote is claiming: it is trivially true. The way the Meltdown vulnerability works requires carefully preparing the cache so that a controlled eviction can be timed, and there's a limited amount of data per attempt that can be communicated. The cache needs to be cleared first, then a load loop run afterwards to see which specific lines were evicted. That means almost all of the cache must go unused at any one time; it doesn't mean the data can't be recovered eventually. The limited combination of lines that can be evicted limits the number of bits of the internal value that can be leaked at one time, and then there's a full reset cycle to get the next set of bits, plus extra attempts to overwhelm any noise from other events causing evictions or context switches. The extraction rate is very low because of all this.
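To make the flush-and-probe cycle concrete, here is a skeleton of just the measurement side in C, with the transient access itself left out. The stride, threshold, and the usual intrinsics (_mm_clflush, __rdtscp) are standard; treat it as a sketch of the mechanism, not a working exploit.

```c
/* Skeleton of the probe side only, transient access omitted. It shows why so
 * little comes out per attempt: every probe line has to be flushed, the leaked
 * byte can pull in at most one of them, and recovering it means timing all 256
 * candidates before the next flush/leak/probe cycle. */
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

#define LINES  256
#define STRIDE 4096      /* one page per value so adjacent-line prefetch doesn't blur the signal */

static uint8_t probe[LINES * STRIDE];

/* Step 1: flush every probe line so none of them is cached. */
static void flush_probe(void)
{
    for (int i = 0; i < LINES; i++)
        _mm_clflush(&probe[i * STRIDE]);
    _mm_mfence();
}

/* Step 2 (not shown): the transient, soon-to-be-squashed access uses the
 * secret byte as an index into probe[], caching exactly one line. */

/* Step 3: time a load from each line; the single fast one reveals the byte. */
static int recover_byte(uint64_t hit_threshold)
{
    for (int i = 0; i < LINES; i++) {
        volatile uint8_t *addr = &probe[i * STRIDE];
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*addr;
        uint64_t t1 = __rdtscp(&aux);
        if (t1 - t0 < hit_threshold)
            return i;    /* cache hit: this value was leaked */
    }
    return -1;           /* nothing below threshold: noise, retry the whole cycle */
}
```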
 
Well, AWS was an example of how the Meltdown patch hit cloud services, but it seems fine-tuning of the patch may be helping with performance since the 12th; at least, no other explanation has been given by the reports.
Stability issues still seem to be reported though.

Update – Jan 12, 2018

As of 10:00 UTC this morning, we are noticing a steep reduction in CPU usage across our instances. It is unclear if there are additional patches being rolled out, but CPU levels appear to be returning to pre-HVM patch levels.

New EC2 hot patches for Meltdown/Spectre rolling out? Previous CPU bumps appear to be dropping off starting after 10:00 UTC this morning
https://blog.appoptics.com/visualizing-meltdown-aws/
 
I think Spectre is going to be a massive headache for nearly every modern CPU generation with regard to the branch-prediction vulnerability.
Werner Haas, who was part of one of the three teams involved in identifying these vulnerabilities, said:
"The [Spectre] attack scenario is not as simple as user code reading kernel data, as it is conceivable to have cross-application attacks without OS involvement," Haas said. "On the other hand, branch prediction or speculation is such an integral part of high-performance CPUs that I lack the fantasy for a straightforward micro-architectural fix.

"So a generic solution as with Meltdown (either fix protection information processing in the pipeline, or change virtual memory handling in the OS) seems unlikely. As a consequence, I expect combined hardware/software mitigations with the caveat that plugging Spectre holes might become an ongoing process."

Defending against Spectre will involve trade-offs beyond the already widely reported processor performance hits, Haas added.

"I suspect we will see a compromise between legacy software support, energy efficiency goals, and security requirements. The three new capabilities (= MSRs) announced by Intel smell like testability features that help address some of the issues immediately. As such they are probably not ideally suited to counter Spectre attacks. Longer term, I wish there was a broader discussion on what kind of Branch Prediction Unit control would be useful."

And even though AMD had stated Spectre does not affect Ryzen, they then released an update on 11/01 saying it "could"/"may" apply and that they will be releasing a microcode fix.
GPZ Variant 2 (Branch Target Injection or Spectre) is applicable to AMD processors.
  • While we believe that AMD’s processor architectures make it difficult to exploit Variant 2, we continue to work closely with the industry on this threat. We have defined additional steps through a combination of processor microcode updates and OS patches that we will make available to AMD customers and partners to further mitigate the threat.
  • AMD will make optional microcode updates available to our customers and partners for Ryzen and EPYC processors starting this week. We expect to make updates available for our previous generation products over the coming weeks. These software updates will be provided by system providers and OS vendors; please check with your supplier for the latest information on the available option for your configuration and requirements.
  • Linux vendors have begun to roll out OS patches for AMD systems, and we are working closely with Microsoft on the timing for distributing their patches. We are also engaging closely with the Linux community on development of “return trampoline” (Retpoline) software mitigations.
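(As an aside, the "return trampoline" mentioned in that last bullet replaces an indirect branch with a call/return pair whose return prediction is deliberately captured in a harmless loop. Here is a sketch of the x86-64 thunk as it has been described publicly, written as GCC file-scope asm; the symbol name mirrors the compiler-generated thunks.)

```c
/* Sketch of an x86-64 retpoline thunk. An indirect "jmp *%r11" becomes
 * "jmp __x86_indirect_thunk_r11": speculation follows the return predictor
 * into a harmless capture loop, while the architectural RET goes to the real
 * target that was written over the return address. */
__asm__(
    ".globl __x86_indirect_thunk_r11\n"
    "__x86_indirect_thunk_r11:\n"
    "    call  1f\n"              /* push return address; return predictor now points at the loop below */
    "2:  pause\n"
    "    lfence\n"
    "    jmp   2b\n"              /* mis-speculation is parked here, doing nothing useful */
    "1:  mov   %r11, (%rsp)\n"    /* overwrite the return address with the real target */
    "    ret\n"                   /* architecturally jumps to *%r11 */
);
```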
Against that background is the fact that the researchers actually tested the vulnerability against Ryzen and mention it in their paper:
1.3 Targeted Hardware and Current Status
Hardware.
We have empirically verified the vulnerability of several Intel processors to Spectre attacks, including Ivy Bridge, Haswell and Skylake based processors.
We have also verified the attack’s applicability to AMD Ryzen CPUs.
Finally, we have also successfully mounted Spectre attacks on several Samsung and Qualcomm processors (which use an ARM architecture) found in popular mobile phones.
Current Status.
Using the practice of responsible disclosure, we have disclosed a preliminary version of our results to Intel, AMD, ARM, Qualcomm as well as to other CPU vendors. We have also contacted other companies including Amazon, Apple, Microsoft, Google and others. The Spectre family of attacks is documented under CVE-2017-5753 and CVE-2017-5715.

So I wondered about IBM, with their background in RISC processors such as the good old RS/6000 that in theory should be fine, and looked beyond the modern POWER series that we know has the issue; and yep, I was surprised the Spectre vulnerability exists even for IBM System Z mainframes. Not saying System Z is based on the RS/6000, but given the design teams' background, knowledge, R&D, and purpose, it is the IBM line most likely to be resistant if that were at all possible, which sadly it is not.

I really do think this is going to be a big headache for about nearly every modern CPU.
Edit:
Just to say, after re-reading this post I appreciate that the POWER series of CPUs could in essence be called descendants of the RS/6000; I am more surprised by the vulnerability in System Z given its background.

Edit 2:
Just realised some may feel the Spectre paper section I quoted is a bit vague on confirming it also affects Ryzen; the context suggests this vulnerability is much broader than some may expect in terms of its reach across modern CPU designs.
Further into the paper they also stated:
Experiments were performed on multiple x86 processor architectures, including Intel Ivy Bridge (i7-3630QM), Intel Haswell (i7-4650U), Intel Skylake (unspecified Xeon on Google Cloud), and AMD Ryzen.
The Spectre vulnerability was observed on all of these CPUs.
 
You've mixed things up, a lot.

AMD has always been vulnerable to Spectre variant 1 and they never claimed they're not vulnerable. Variant 1 is fixed with OS updates, like on every other platform.
AMD said originally that they are in theory vulnerable to Spectre variant 2, but no one has been able to produce a PoC to prove that vulnerability. This is still the case, and they're releasing an optional microcode update for those who want to patch it just in case.
The only thing AMD said they're not vulnerable to is Meltdown, and that still holds true.
 
It is not that clear cut.
You'll notice AMD changed their view on "Variant 2" from not applicable (Ryzen is safe) to applicable, albeit as "may be"/"possibly", since 11/01. AMD were also a bit dismissive of the three research teams, and some of the researchers are on record saying they are not impressed with AMD's response to Spectre.
You could also point to some vagueness around Intel, as the researchers identified that some indirect branch prediction attacks worked on Skylake but not Haswell.
The only case for Ryzen is that its neural-network predictor is a more complex speculative mechanism, which does not necessarily make it safer; and in the context of the Spectre research paper, the OS update alone will not fix Ryzen, which is probably why AMD updated their point about "Variant 2".
These Spectre proof-of-concept attacks will only become more successful with time.

I am not being critical of AMD; I was just including them to show how broad this problem is for modern CPU design.
 
AMD originally said Spectre v2 has not been demonstrated on their systems and that they believe it to be a near-zero threat. I haven't seen them change their stance on anything. The only thing they have done since the original description is give more information on the specifics. Do you have specific sources on any of this?
 
*Quote in post: "And even though AMD had stated Spectre does not affect Ryzen, they then released an update on 11/01 saying it "could"/"may" apply and that they will be releasing a microcode fix."
To my knowledge, AMD's position was that its architecture made it immune to Meltdown, and that architectural differences made the branch-injection variant of Spectre difficult enough to make the risk near zero. AMD indicated the bounds check bypass version of Spectre did affect it.

*Quote in post: "So I wondered about IBM ... and yep, I was surprised the Spectre vulnerability exists even for IBM System Z mainframes ..."
There may be specific elements to a system architecture that could have negated Meltdown, but it comes down to a choice made long ago about how supervisor and user memory accesses are differentiated or handled in hardware.
It's not an unreasonable decision for performance and aiding software targeting the platform, particularly in architectures laid down long before the prospect of multiple remote and untrusted users across the globe could have the means, knowledge, or motivation to do anything to exploit this. Even without those hostile actors, the technologies being leveraged now didn't fully exist, and it still took decades for this kind of threat to occur to public actors.
MIPS might be able to avoid the Meltdown method as we know it, as it has a different way of handling supervisor memory that might avoid the late-permissions-check scenario. At least some of the descriptions of the architecture I've seen indicate privileged memory exists outside of the usual paging functionality, though the implementation isn't clear for the most recent OoO variants. Its system event handling may also work in ways that make some Spectre scenarios, like spoofing the kernel's predictions, more difficult, because the model seems to restrict which logical processors can operate in system mode. That might come down to implementation details.

Timing attacks have for a long time focused on revealing cryptographic keys, and many of the traditional exploits involved crafting messages that could help reveal the hidden values or compromise the key generation process. There's a subtle disconnect in that methodology's mindset versus the leveraging of speculative hardware, where the more widely researched methods have more knowledge of the data (example: crafted plaintext) and algorithm (AES, entropy source, or other established algorithms) processed rather than tripping the hardware pipeline to reveal something assumed to be "over the wall" in kernel space. For Spectre, a significant portion of the problem is defining an architecture that can conclusively state that this kind of information leakage is wrong. Up to now, it wasn't a case that had been articulated as incorrect. Branch injection--particularly when targeting the kernel--may be a more clearly definable case than bounds check escape within user code.

In some ways, if the decision isn't made to avoid such a problem (and it's hard to see how that case could be rationally explained to leadership or other designers if you were in that position years prior to the existence of the problem) a philosophy that encourages simplifying hardware and leaning on software management can leave wider gaps.
For Meltdown's vulnerability, there's a case for having a single well-validated point of speculative rollback nearer the end of the pipeline, rather than having hardware behavior shifting less predictably in a very complicated and critical area of the microarchitecture and possibly opening a window for problems at multiple points in the critical loop. It can constrain the hardware or future changes to the architecture that are hard to divine from the outside. It seems reasonable now, at least for Meltdown, to add something that can render the speculative data useless earlier than late-stage rollback.

There are possible avenues for changing the architectures, and mitigations that could be added, but the methods for hardening cryptographic implementations should be noted for how much performance, efficiency, and transparency they willingly give up to avoid vulnerabilities like this. Algorithms will purposefully stall so that no path completes faster based on key or content, and hardware can salt, transform, or duplicate work to defeat probing of power consumption based on an imbalance of 1s and 0s. Event counters and other trace info are a vector for exploitation, and may be disabled or not implemented in these cases.
The enclaves are often physically and electrically isolated, and usually closed to review or modification by most of the coders who would be using them.
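To show the flavour of that trade-off, here's the classic constant-time comparison in C; assume it's checking a secret value such as a MAC against attacker-supplied input. It always walks every byte, unlike a plain memcmp() that exits at the first mismatch and leaks exactly where that mismatch is.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare without ever branching on secret data or exiting early, so the
 * running time says nothing about where the first difference occurs. */
static int constant_time_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate differences, no branches */
    return diff == 0;                     /* one data-independent decision at the end */
}
```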


*Quote in post: "Just to say, after re-reading this post I appreciate that the POWER series of CPUs could in essence be called descendants of the RS/6000; I am more surprised by the vulnerability in System Z given its background."
System Z is also a very aggressively speculative and highly clocked architecture, and a lot of its single-threaded focus is the result of targeting hardware/software emulation of a legacy architecture. IBM's various architectures often have a large set of checkpointing and recovery mechanisms that can pull in firmware or software methods, and the key element is any kind of rollback that misses some number of side effects. This may also be where Power 6 sees some problems despite being in-order.
This rollback element could also explain why Nvidia's Denver cores are likely affected by Meltdown and Spectre, in part because performance considerations brought more traditional speculative elements into its code-morphing scheme and exposed its execution in ways Transmeta's long-obsolete cores did not.
 
*Quote in post: "AMD originally said Spectre v2 has not been demonstrated on their systems and that they believe it to be a near-zero threat. ... Do you have specific sources on any of this?"
I guess it comes down to interpretation when AMD states "Differences in AMD architecture mean there is a near zero risk of exploitation of this variant. Vulnerability to Variant 2 has not been demonstrated on AMD processors to date."
And then 10 days later they update to say:
"GPZ Variant 2 (Branch Target Injection or Spectre) is applicable to AMD processors
While we believe that AMD’s processor architectures make it difficult to exploit Variant 2, we continue to work closely with the industry on this threat.
We have defined additional steps through a combination of processor microcode updates and OS patches that we will make available to AMD customers and partners to further mitigate the threat
".

I am busy so cannot spend much time digging up the researchers' thoughts about AMD, apart from a quick example here responding to the original AMD statement:
Haas is far more critical of AMD's handling of the problem.
"AMD's reaction has been a complete disappointment. I still have not figured out whether I should feel insulted by their claim about 'a highly knowledgeable team with detailed, non-public information about the processors targeted'.
"Well, of course we feel flattered by the first part, but I strongly reject the notion that we at Cyberus used any kind of internal details from our previous jobs at Intel!
And calling 'Information Security is a Priority' while discounting the research findings three sentences later does not quite match in my eyes."
 
*Quote in post: "System Z is also a very aggressively speculative and highly clocked architecture ..."
I was more interested in Spectre than Meltdown; all reports are saying Meltdown is containable, whereas Spectre requires a redesign of CPU hardware to fully mitigate its weakness.
Sure, System Z has speculative execution, but that system is very different from many out there, with a scope heavily focused on security, and it has a unique set of R&D engineers (from my time working with IBM engineers, albeit 10+ years ago). One complaint from some of those researchers was that security has become less of a focus in CPU design these days, traded off against performance and ease of use/flexibility.
 