CPU Security Flaws MELTDOWN and SPECTRE

Discussion in 'PC Industry' started by Bondrewd, Jan 2, 2018.

  1. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    7,798
    Likes Received:
    2,063
    Location:
    Well within 3d
    The following has some updated information about the latest controls (exposed through model-specific registers rather than as new instructions) added with the microcode updates.
    https://arstechnica.com/gadgets/201...e-and-meltdown-patches-will-hurt-performance/

    Zen's indirect branch predictor apparently does not alias addresses when selecting targets, which is one area where it differs from the more readily reverse-engineered Haswell predictor.
    Abusing a branch to shift history requires using the target branch's full address, rather than one that merely aliases on a subset of its bits. Tracking the whole address seems more expensive, so I'm curious whether the TLB has an influence here.
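A toy Python model can make the aliasing point concrete. It is a sketch under stated assumptions, not Haswell's or Zen's real design: `ToyBTB`, the 12-bit index, and all addresses are hypothetical. A predictor that matches only low address bits lets an attacker's branch at a different address train the entry the victim's branch will use, while one that checks the whole address does not.

```python
# Toy model, hypothetical: bit widths and structure are illustrative only and
# are not Haswell's or Zen's real predictor design.

class ToyBTB:
    def __init__(self, index_bits: int, full_tag: bool):
        self.mask = (1 << index_bits) - 1
        self.full_tag = full_tag
        self.entries = {}  # index -> (tag, predicted target)

    def _key(self, branch_addr: int):
        index = branch_addr & self.mask
        # Full tagging: the whole address must match on lookup.
        # Partial indexing: any branch sharing the low bits aliases.
        tag = branch_addr if self.full_tag else None
        return index, tag

    def train(self, branch_addr: int, target: int) -> None:
        index, tag = self._key(branch_addr)
        self.entries[index] = (tag, target)

    def predict(self, branch_addr: int):
        index, tag = self._key(branch_addr)
        entry = self.entries.get(index)
        if entry is None or entry[0] != tag:
            return None  # no usable prediction
        return entry[1]


VICTIM_BRANCH = 0x7F00_1234_5678    # hypothetical victim indirect branch
ATTACKER_BRANCH = 0x0040_0000_5678  # same low 12 bits, different address
GADGET = 0xDEAD_BEEF                # hypothetical target the attacker wants

for full_tag in (False, True):
    btb = ToyBTB(index_bits=12, full_tag=full_tag)
    btb.train(ATTACKER_BRANCH, GADGET)
    print("full tag" if full_tag else "aliasing", btb.predict(VICTIM_BRANCH))
```

With aliasing, the victim's lookup returns the attacker-chosen `GADGET`; with full tagging it returns `None`. The price is storing and comparing the whole address for every entry.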

    This is apparently why Zen is not getting the IBRS mode setting (prior cores will). This mode acts as a barrier at various transitions, depending on settings.
    Zen does get IBPB and STIBP. The former is a barrier that keeps branch history from before it from affecting predictions after it; the latter is a setting that keeps SMT threads from influencing each other's indirect prediction entries.
    For Intel, the IBPB command (a write to the IA32_PRED_CMD MSR) seems to wipe the indirect predictor, among other things. AMD's version was described as a subset somewhere in patch discussions, but what the superset is and which parts of it are included isn't clear.
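The described behaviour of IBPB and STIBP can be modelled in a few lines. This is a hypothetical sketch of that description only: `IndirectPredictor`, `ibpb()`, and the `stibp` flag are illustrative names, not the real MSR interface or any core's implementation.

```python
class IndirectPredictor:
    def __init__(self):
        self.table = {}      # (owner, branch address) -> predicted target
        self.stibp = False   # when set, SMT threads get separate entries

    def _owner(self, thread_id: int):
        # Without STIBP, both SMT threads share one pool of entries.
        return thread_id if self.stibp else "shared"

    def train(self, thread_id: int, branch_addr: int, target: int) -> None:
        self.table[(self._owner(thread_id), branch_addr)] = target

    def predict(self, thread_id: int, branch_addr: int):
        return self.table.get((self._owner(thread_id), branch_addr))

    def ibpb(self) -> None:
        # Barrier: nothing trained before this point steers later predictions.
        self.table.clear()


p = IndirectPredictor()
p.train(thread_id=1, branch_addr=0x1000, target=0xBAD)
print(p.predict(0, 0x1000))   # 2989: thread 1's training steers thread 0
p.stibp = True
print(p.predict(0, 0x1000))   # None: threads no longer share entries
p.train(0, 0x1000, 0xBAD)
p.ibpb()
print(p.predict(0, 0x1000))   # None: the barrier discarded earlier training
```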

    Elsewhere, I did find that IBM's Power 7 and Power 8 are being updated to handle Meltdown and Spectre. Power 9 has something inbound. Earlier versions are under review. I'm curious about the in-order but very aggressive Power 6.
    Fujitsu seems to be admitting that at least some of these vulnerabilities apply to some SPARC variants, though I'm not sure which.
    News articles indicated that Cavium admitted the ARM ThunderX2 (formerly Broadcom's Vulcan core, which had its sights set on a Haswell-like target) is affected by Meltdown and Spectre.
    I'm not sure about MIPS and its most recent OoO cores.
     
  2. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    1,925
    Likes Received:
    803
    Nevertheless, the issue is real. The Witcher 3 run involves a tour through the city, which stresses the game's streaming system; this is the likely culprit for the performance hit (we know the patches affect storage performance more). That means open-world games such as Fallout 4, GTA 5, and the like could see an even bigger hit.
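One way to test the storage/streaming explanation is to compare the per-call cost of a syscall-heavy operation with pure user-space work, before and after patching. This is a rough sketch that assumes the patch overhead scales with kernel entries and exits; `ns_per_call`, `user_only`, and `syscall` are hypothetical helper names, and absolute numbers depend entirely on the machine, OS, and patch level.

```python
import os
import time

def ns_per_call(fn, n: int = 100_000) -> float:
    """Average wall-clock nanoseconds per call of fn over n calls."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n

def user_only():
    return sum((1, 2, 3))   # stays in user space

def syscall():
    return os.stat(".")     # enters the kernel on every call

if __name__ == "__main__":
    print(f"user-space call: {ns_per_call(user_only):10.1f} ns")
    print(f"os.stat syscall: {ns_per_call(syscall):10.1f} ns")
```

Only the change in the syscall figure between an unpatched and a patched kernel is meaningful; the gap to the user-space figure just shows what a kernel round trip costs on that machine.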
     
  3. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,335
    Likes Received:
    276
    Location:
    Germany
    Fujitsu's list: https://sp.ts.fujitsu.com/dmsp/Publ...E2017-5715-vulnerability-Fujitsu-products.pdf
    Fujitsu SPARC servers are "under investigation".

    Does Oracle use SPARC cpus?

    IBM: https://www.ibm.com/blogs/psirt/potential-impact-processors-power-family/
    • Firmware patches for POWER7+ and POWER8 platforms are now available via FixCentral. POWER9 patches will be available on January 15. We will provide further communication on supported generations prior to POWER7+ including firmware patches and availability.
    • Linux operating systems patches are now available through our Linux distribution partners Redhat, SUSE and Canonical.
    • AIX and IBM i operating system patches will be available February 12. Information will be available via PSIRT.
    ------------------------
    https://www.hardocp.com/news/2018/01/11/amd_doubles_down_on_previous_spectre_meltdown_statments
    https://www.reuters.com/article/bri...-gpz-variant-1-or-gpz-variant-2-idUSFWN1P60X7
     
    #143 Arnold Beckenbauer, Jan 12, 2018
    Last edited: Jan 12, 2018
    Lightman likes this.
  4. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,335
    Likes Received:
    276
    Location:
    Germany
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,275
    Likes Received:
    963
    Location:
    Finland
    Lightman likes this.
  6. Malo

    Malo YakTribe.games
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    5,887
    Likes Received:
    1,885
    Location:
    Pennsylvania
    Lightman likes this.
  7. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    7,798
    Likes Received:
    2,063
    Location:
    Well within 3d
    One plausible cause, from https://www.realworldtech.com/forum/?threadid=174100&curpostid=174102, is triple faults resulting from missed changes to fault handlers that have assumptions about how pages are mapped in the kernel and user domains after KPTI. Fault handling can itself cause a fault, and if specific configurations or software have old assumptions buried in their deepest layers, things can nest further. However, x86 stops at three faults deep: a fault while delivering a double fault is a triple fault, which shuts the processor down and usually resets the system.
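A toy model of the page-table split KPTI introduces, assuming a page table can be treated as a dictionary from page base to frame; `AddressSpace` and `KERNEL_BASE` are hypothetical. In user mode the kernel page simply has no translation, which is also why a handler that assumes kernel pages are always mapped can fault again.

```python
KERNEL_BASE = 0xFFFF_8000_0000_0000   # hypothetical kernel half of the address space
PAGE_MASK = ~0xFFF                    # 4 KiB pages

class AddressSpace:
    def __init__(self, user_pages: dict, kernel_pages: dict):
        # Kernel-mode view: user and kernel mappings together.
        self.full = {**user_pages, **kernel_pages}
        # User-mode view under KPTI: kernel mappings are simply absent.
        # (Real KPTI keeps a small set of entry/exit mappings; omitted here.)
        self.user_view = dict(user_pages)

    def translate(self, vaddr: int, user_mode: bool):
        table = self.user_view if user_mode else self.full
        return table.get(vaddr & PAGE_MASK)   # page frame, or None if unmapped


asp = AddressSpace({0x40_0000: 0x1000}, {KERNEL_BASE: 0x2000})
print(asp.translate(KERNEL_BASE + 0x10, user_mode=False))  # 8192 (0x2000)
print(asp.translate(KERNEL_BASE + 0x10, user_mode=True))   # None
```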

    This is something that came up with the PS4 jailbreak, which involved a browser engine exploit and modifications to an unsecured interrupt descriptor table, allowing faults to be rerouted for arbitrary code execution at elevated privilege.
    The exploit was made more difficult because the hack was already operating two faults deep, so one more fault would triple-fault and crash the PS4.
    https://cturt.github.io/ps4-3.html
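The escalation chain can be sketched as a simplified model. It assumes every nested fault escalates, which ignores x86's real benign/contributory fault classification; `deliver` and `TripleFault` are hypothetical names.

```python
class TripleFault(Exception):
    """Shutdown: a fault occurred while delivering a double fault."""

def deliver(first_delivery_faults: bool, df_delivery_faults: bool) -> str:
    # Level 1: deliver the original fault (e.g. a page fault).
    if not first_delivery_faults:
        return "handled by the original fault handler"
    # Level 2: that delivery faulted, so the CPU raises #DF instead.
    if not df_delivery_faults:
        return "handled by the #DF handler"
    # Level 3: there is no fourth level; the CPU shuts down (usually a reset).
    raise TripleFault("fault while delivering #DF")


# An exploit already running two faults deep has no margin left:
try:
    deliver(first_delivery_faults=True, df_delivery_faults=True)
except TripleFault as exc:
    print("triple fault:", exc)
```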

    On a side note, HPE has reported that Intel indicated Itanium is immune to all three vulnerabilities. I did check, and there are several reasons why Meltdown doesn't apply. There is branch prediction, but it's possible the specific methods or indirect predictor structures do not apply.
    Bounds check escape would seemingly not happen because the hardware cannot speculate past a long miss.
    I'm curious whether some of the more aggressive compiler code transformations could cause something like this to happen, but since that's software, Intel could just shrug it off.
     
    pharma, BRiT and ImSpartacus like this.
  8. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    2,598
    Likes Received:
    203
    So Itanic really was the future after all? :runaway:

    I'm pretty amazed by how widespread this has turned out to be. I figured it'd be limited to a specific architecture rather than a generic issue across many types of chip :shock:
     
    Lightman and milk like this.
  9. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    388
    Likes Received:
    74
    Location:
    Brazil
    Also, the Atom D2700 is the fastest safe x86 CPU, I guess.
     
    Lightman and DavidGraham like this.
  10. swaaye

    swaaye Entirely Suboptimal
    Legend Subscriber

    Joined:
    Mar 15, 2003
    Messages:
    8,287
    Likes Received:
    451
    Location:
    WI, USA
    Thankfully my Haswell web server at work isn't rebooting on the new Centos 6 kernel. Not yet, anyway.
     
  11. Clukos

    Clukos Bloodborne Ps4 Pro when?
    Veteran Newcomer

    Joined:
    Jun 25, 2014
    Messages:
    4,224
    Likes Received:
    3,454


     
    Malo and matthias like this.
  12. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    7,798
    Likes Received:
    2,063
    Location:
    Well within 3d
    I tried to review some of the remaining articles on Itanium and its variants, and from what I can see, part of the reason for its immunity to the branch injection variant of Spectre is its focus on tighter software control of the speculative pipeline, plus a two-level prediction scheme and a branch data structure tightly linked to the L1 cache line, which keeps predictions closer to 1:1 between a predicted branch and its originating thread. Indirect branches are handled with an explicit set of branch registers, which a different thread trying to spoof a predictor would not touch.
    However, that also means the techniques in question predate some of the best predictors of recent years: global or shared-history predictors can be eerily effective at finding correlations in behavior and making good use of the prediction logic's limited storage, at the cost of a lack of isolation against a security threat, years down the road, that the designers hadn't seen coming.
    Further, certain elements would have made it harder, such as a front end with a facility to detect mispredicted jump addresses and intercept them long before they could reach an execution stage.
    Additionally, there's an element in the same ballpark as why I think Zen's prediction path is harder to exploit: Itanium's specific handling of its L1 caches and their TLBs means the TLB is inserted into the prediction path, in some ways like Zen's front end. For Itanium's pre-validated L1 caches, the hit logic uses a set of values that tie a TLB entry to the L1 lines that belong to it. Since a good chunk of the predictor's resources are tied to individual lines, and you can't get to a line without passing the TLB's full address check, that can't be spoofed from another thread using a different address.

    I'm not sure if Zen does anything like this, but the specific shortcut of linking an entry to the TLB translation that spawns it creates a tighter link between a branch and its actual address, and the TLB can provide a smaller set of values that address a specific subset of the branch buffers. It's not without downsides, such as needing the TLB to be very tightly coupled, and potentially limiting the capacity of other parts of the pipeline to the TLB's storage limits. Involving the TLB might explain what AMD said about using the whole address, which can get expensive when tracking many branches.
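A hypothetical sketch of the TLB-linked scheme speculated about here: each predictor entry hangs off a TLB slot, a lookup needs a full (address-space ID, virtual page) match, and evicting the TLB entry drops its predictions. The class name, slot count, and round-robin replacement are assumptions for illustration, not Itanium's or Zen's actual design.

```python
PAGE = 4096

class TlbLinkedPredictor:
    def __init__(self, tlb_slots: int = 4):
        self.tlb = [None] * tlb_slots                # slot -> (asid, virtual page)
        self.preds = [{} for _ in range(tlb_slots)]  # slot -> {page offset: target}
        self.next_victim = 0                         # round-robin replacement

    def _slot(self, asid: int, vaddr: int, allocate: bool):
        key = (asid, vaddr // PAGE)                  # full VPN plus address-space ID
        if key in self.tlb:
            return self.tlb.index(key)
        if not allocate:
            return None                              # TLB miss: no prediction
        slot = self.next_victim
        self.next_victim = (slot + 1) % len(self.tlb)
        self.tlb[slot] = key
        self.preds[slot] = {}                        # eviction drops old predictions
        return slot

    def train(self, asid: int, branch: int, target: int) -> None:
        slot = self._slot(asid, branch, allocate=True)
        self.preds[slot][branch % PAGE] = target

    def predict(self, asid: int, branch: int):
        slot = self._slot(asid, branch, allocate=False)
        return None if slot is None else self.preds[slot].get(branch % PAGE)


p = TlbLinkedPredictor()
p.train(asid=2, branch=0x0040_1678, target=0xBAD)
print(p.predict(asid=1, branch=0x7F00_0678))  # None: same page offset, no TLB match
print(p.predict(asid=2, branch=0x0040_1678))  # 2989 (0xBAD)
```

The trade-off the post mentions shows up directly: predictor capacity is bounded by the number of TLB slots, since a branch on a page without a live TLB entry has no prediction at all.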

    Bounds check escape is negated in hardware because the hardware cannot get past the check without stalling; only software can schedule work past it. The prediction logic is mostly limited to steering the front end and the stages between fetch and execution, rather than guiding execution past that point. However, calling Itanium immune requires shifting the blame to any software using predication, non-faulting and advanced loads, and a pipeline designed to let the compiler explicitly build code that races past bounds checks, with explicit cleanup code. Deciding not to use that means giving up a notable reason for using EPIC at all.
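For reference, the software pattern at issue is the classic bounds-check-bypass gadget, and one software mitigation Linux adopted is a branchless index clamp (`array_index_nospec()`). This sketch models that style of mask with 64-bit arithmetic in Python; `index_mask_nospec` and `clamp_index` are hypothetical names, and it assumes `index` and `size` are below 2**63.

```python
MASK64 = (1 << 64) - 1

# The gadget, in C-like pseudocode (Python cannot speculate):
#   if (x < array1_size)                  /* predicted in-bounds...       */
#       y = array2[array1[x] * 4096];     /* ...may execute with a bad x  */

def index_mask_nospec(index: int, size: int) -> int:
    """All ones if index < size, else 0, with no data-dependent branch.
    Assumes 0 <= index < 2**63 and 0 < size < 2**63."""
    # size - 1 - index wraps around (top bit set) exactly when index >= size.
    v = (index | ((size - 1 - index) & MASK64)) & MASK64
    out_of_range = v >> 63               # 1 if index >= size, else 0
    return (out_of_range - 1) & MASK64   # 0 -> all ones, 1 -> 0

def clamp_index(index: int, size: int) -> int:
    # Mitigated form: array2[array1[clamp_index(x, array1_size)] * 4096]
    return index & index_mask_nospec(index, size)
```

Because the clamp is plain arithmetic rather than a branch, a mispredicted bounds check cannot hand an out-of-range index to the dependent load; at worst it reads element 0.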

    The Meltdown immunity stems from this explicit software stance. With its speculative load functionality, Intel was forced to choose how to handle speculative loads that might fault, without the luxury of hardware intervention and cleanup. If it didn't block the value from use, there was no telling what would happen, since EPIC would be handing the bad data to the program. So Itanium adds a flag or special value (the NaT, "Not a Thing", bit on integer registers and NaTVal on floating-point registers) indicating a piece of data is invalid, and the actual value is discarded. If the speculative code isn't smart enough to discard it, using the flagged data raises a likely fatal fault. One of the common sources of trouble with Itanium software is that the hardware takes that flag very seriously: wide-ranging speculation can leave booby-trapped registers hanging around from rolled-back paths or subroutines inside a nest of very complex code transformations, and any of them can kill the program long after the point that created the value, with little data to track down the bug.
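A toy Python model of the NaT behaviour, assuming a dictionary stands in for memory and a missing key stands in for a load that would fault. `Reg`, `speculative_load`, `chk_s`, and `store` are hypothetical names loosely modelled on Itanium's `ld.s`/`chk.s`, not a faithful ISA simulation. The point for Meltdown is that the faulting load never produces the real value.

```python
class NaTConsumption(Exception):
    """Raised when a NaT-tagged value is used non-speculatively."""

class Reg:
    def __init__(self, value: int = 0, nat: bool = False):
        self.nat = nat
        self.value = 0 if nat else value   # the faulting data is never kept

    def __add__(self, other: "Reg") -> "Reg":
        if self.nat or other.nat:
            return Reg(nat=True)           # NaT propagates through arithmetic
        return Reg(self.value + other.value)

def speculative_load(memory: dict, addr: int) -> Reg:
    """Like ld.s: a load that would fault returns NaT instead of faulting."""
    if addr not in memory:                 # stand-in for "would fault"
        return Reg(nat=True)
    return Reg(memory[addr])

def chk_s(reg: Reg, recovery):
    """Like chk.s: branch to recovery code if the value is NaT."""
    return recovery() if reg.nat else reg

def store(memory: dict, addr: int, reg: Reg) -> None:
    """An ordinary store consuming a NaT raises a fault."""
    if reg.nat:
        raise NaTConsumption(f"NaT stored to {addr:#x}")
    memory[addr] = reg.value
```

A speculative chain built on a bad load carries only the NaT flag forward; the program either recovers via `chk_s` or dies at the first real consumption, which matches the debugging trouble described above.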


    Just to clarify what that quote is claiming: it is trivially true. The Meltdown exploit requires carefully preparing the cache so that a controlled eviction can be timed, and only a limited amount of data can be communicated per attempt. The cache needs to be cleared first, and then a load loop is run afterwards to see which specific lines were evicted. That means almost all of the cache cannot be used, not that the data can't be obtained eventually. The limited combination of cache lines that can be evicted limits how many bits of the internal value can leak at one time; then there's a full reset cycle to get the next set of bits, plus extra attempts to overcome noise from other events causing evictions or context switches. The extraction rate is very low because of all this.
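The encoding step can be simulated to show why each attempt yields so little and why repetition is needed. This sketch uses the Flush+Reload-style probe array described in the Meltdown paper (one 4096-byte page per byte value) rather than an eviction-based probe; the noise model, round count, and function names are assumptions.

```python
import random
from collections import Counter

LINE_STRIDE = 4096   # one page per possible byte value

def one_round(secret: int, rng: random.Random, noise_p: float) -> set:
    """Flush all probe lines, run the transient access, then reload them."""
    cached = {secret * LINE_STRIDE}                  # line touched by the transient load
    if rng.random() < noise_p:                       # unrelated activity pollutes the cache
        cached.add(rng.randrange(256) * LINE_STRIDE)
    # Reload phase: a probe line counts as "fast" if and only if it is cached.
    return {addr // LINE_STRIDE for addr in cached}

def leak_byte(secret: int, rounds: int = 20, noise_p: float = 0.5, seed: int = 1) -> int:
    """Vote over several rounds; each round costs a full flush/access/reload cycle."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(rounds):
        votes.update(one_round(secret, rng, noise_p))
    return votes.most_common(1)[0][0]
```

Each round can reveal at most one byte, and noise forces several rounds per byte, so the effective rate is a small fraction of what a single access might suggest.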
     
    entity279, Clukos and Kaarlisk like this.
  13. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    1,620
    Likes Received:
    662
    Well, AWS was an example of how the Meltdown patch hit cloud services, but it seems fine-tuning of the patch may have been helping performance since the 12th; the reports give no other explanation.
    Stability issues still seem to be reported though.

    https://blog.appoptics.com/visualizing-meltdown-aws/
     
    #153 CSI PC, Jan 16, 2018
    Last edited: Jan 16, 2018
  14. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    1,620
    Likes Received:
    662
    I think Spectre is going to be a massive headache for nearly every modern generation CPU with regards to the branch prediction vulnerability.
    Werner Haas, part of one of the three teams involved in identifying these vulnerabilities, said:
    And even though AMD had stated Spectre does not affect Ryzen, they then released an update on 11/01 saying it "could"/"may", and will be releasing a microcode fix.
    Against that background, the researchers actually tested the vulnerability against Ryzen and mention it in their paper:
    So I wondered about IBM, with its background in RISC processors such as the good old RS/6000 that in theory should be fine, and looked beyond the modern Power series with the issue; I was surprised that the Spectre vulnerability exists even for IBM System Z mainframes. I'm not saying System Z is based on RS/6000, but given its design team's background knowledge, R&D, and purpose, it is the one I would most expect from IBM to be resistant if at all possible, which sadly it is not.

    I really do think this is going to be a big headache for nearly every modern CPU.
    Edit:
    Just to say, after re-reading this post, I appreciate that the Power series of CPUs could in essence be called descendants of RS/6000; I'm more surprised by the vulnerability in System Z given its background.

    Edit 2:
    I just realised some may feel the Spectre paper section I quoted is a bit vague on confirming it also affects Ryzen; the context suggests this vulnerability is much broader than some may expect in its influence on modern CPU designs.
    Further into the paper they also stated:
     
    #154 CSI PC, Jan 16, 2018
    Last edited: Jan 16, 2018
    DavidGraham likes this.
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    7,275
    Likes Received:
    963
    Location:
    Finland
    You've mixed things up, a lot.

    AMD has always been vulnerable to Spectre variant 1, and they never claimed they're not vulnerable. Variant 1 is fixed with OS updates, like on every other platform.
    AMD said originally that they are in theory vulnerable to Spectre variant 2, but no one has been able to produce a PoC proving that vulnerability. This is still the case, and they're releasing an optional microcode update for those who want to patch it just in case.
    The only thing AMD said they're not vulnerable to is Meltdown, and that still holds true.
     
    Squilliam, Silent_Buddha and Malo like this.
  16. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    1,620
    Likes Received:
    662
    It is not that clear cut.
    Notice that AMD changed their view on "Variant 2" from not applicable, with Ryzen safe, to applicable but "may be"/"possibly" since 11/01. AMD were also a bit dismissive of the three research teams, and some of the researchers are on record saying they are not impressed with AMD's response to Spectre.
    Some comments around Intel are vague too, as the researchers found that some indirect branch prediction attacks worked on Skylake but not on Haswell.
    The only case for Ryzen is that its neural-network predictor has more complex speculative behaviour, which does not necessarily make it better. In any case, in the context of the Spectre research paper the OS update will not fix Ryzen, which is probably why AMD updated their point about "Variant 2".
    Over time, these Spectre-style attacks will evolve to be even more successful.

    I am not being critical of AMD; I was just including them to show how broad this problem is for modern CPU design.
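For readers unfamiliar with the "neural network" predictor, the textbook perceptron predictor (Jiménez and Lin) gives the flavour: a table of weight vectors indexed by branch address, dotted with the global branch history. This is a generic sketch, not AMD's implementation; the class name, table size, and history length are arbitrary choices.

```python
class PerceptronPredictor:
    def __init__(self, history_len: int = 16, table_size: int = 256):
        # One weight vector (bias + one weight per history bit) per table entry.
        self.table = [[0] * (history_len + 1) for _ in range(table_size)]
        self.history = [1] * history_len            # +1 taken, -1 not taken
        self.theta = int(1.93 * history_len + 14)   # training threshold from the paper

    def _output(self, pc: int):
        w = self.table[pc % len(self.table)]
        y = w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))
        return w, y

    def predict(self, pc: int) -> bool:
        return self._output(pc)[1] >= 0            # non-negative output means "taken"

    def update(self, pc: int, taken: bool) -> None:
        w, y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        self.history = [t] + self.history[:-1]     # shift the outcome into history
```

The same property that makes it accurate, correlating a branch with many other recent branches, is what makes shared history state interesting from a side-channel point of view.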
     
  17. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    543
    Likes Received:
    232
    AMD originally said Spectre v2 has not been demonstrated on their systems and that they believe it to be a near-zero threat. I haven't seen them change their stance on anything; the only thing they have done since the original description is give more information on the specifics. Do you have specific sources for any of this?
     
  18. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    7,798
    Likes Received:
    2,063
    Location:
    Well within 3d
    To my knowledge, AMD's position was that its architecture made it immune to Meltdown, and that architectural differences made the branch-injection variant of Spectre difficult so as to make the risk near-zero. AMD indicated the bounds check bypass version of Spectre affected it.

    There may be specific elements of a system architecture that could have negated Meltdown, but it comes down to a choice made long ago about how supervisor and user memory accesses are differentiated and handled in hardware.
    It's not an unreasonable decision for performance and aiding software targeting the platform, particularly in architectures laid down long before the prospect of multiple remote and untrusted users across the globe could have the means, knowledge, or motivation to do anything to exploit this. Even without those hostile actors, the technologies being leveraged now didn't fully exist, and it still took decades for this kind of threat to occur to public actors.
    MIPS might be able to avoid the Meltdown method as we know it, since it has a different method of handling supervisor memory that might avoid the late permission check scenario. At least some of the descriptions of its architecture I've seen indicate that privileged memory exists outside the usual paging functionality, but the implementation in the most recent OoO variants isn't clear. Its system event handling may also make some Spectre scenarios, like spoofing the kernel's predictions, more difficult, because the model seems to restrict which logical processors can work in system mode. That might come down to implementation details.
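The classic 32-bit MIPS address map shows what "privileged memory outside the usual paging" means: kernel segments are selected by the top address bits, kseg0/kseg1 bypass the TLB, and a user-mode access to any address with the top bit set raises an address error. This is a sketch with supervisor mode and 64-bit segments omitted; `mips32_translate` and the `tlb_lookup` callback are hypothetical.

```python
class AddressError(Exception):
    pass

def mips32_translate(vaddr: int, kernel_mode: bool, tlb_lookup):
    vaddr &= 0xFFFF_FFFF
    # The privilege check needs only the address bits, no TLB lookup.
    if vaddr >= 0x8000_0000 and not kernel_mode:
        raise AddressError(hex(vaddr))
    if vaddr < 0x8000_0000:
        return "kuseg", tlb_lookup(vaddr)        # user segment, mapped via TLB
    if vaddr < 0xA000_0000:
        return "kseg0", vaddr - 0x8000_0000      # unmapped, cached
    if vaddr < 0xC000_0000:
        return "kseg1", vaddr - 0xA000_0000      # unmapped, uncached
    return "kseg2", tlb_lookup(vaddr)            # kernel-only, mapped via TLB
```

Whether a given out-of-order core performs that top-bit check before forwarding load data is exactly the implementation detail the post leaves open.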

    Timing attacks have for a long time focused on revealing cryptographic keys, and many of the traditional exploits involved crafting messages that could help reveal the hidden values or compromise the key generation process. There's a subtle disconnect in that methodology's mindset versus the leveraging of speculative hardware, where the more widely researched methods have more knowledge of the data (example: crafted plaintext) and algorithm (AES, entropy source, or other established algorithms) processed rather than tripping the hardware pipeline to reveal something assumed to be "over the wall" in kernel space. For Spectre, a significant portion of the problem is defining an architecture that can conclusively state that this kind of information leakage is wrong. Up to now, it wasn't a case that had been articulated as incorrect. Branch injection--particularly when targeting the kernel--may be a more clearly definable case than bounds check escape within user code.
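The traditional kind of timing attack can be shown with an early-exit comparison that the attacker probes with crafted guesses. To keep it deterministic, "time" is modelled as the number of bytes compared; a real attack measures wall-clock time over many samples. `SECRET`, `leaky_compare`, and `recover` are hypothetical.

```python
SECRET = b"k3y!"   # hypothetical secret held by the victim

def leaky_compare(guess: bytes, secret: bytes = SECRET):
    """Early-exit comparison; returns (match, bytes_compared)."""
    steps = 0
    for a, b in zip(guess, secret):
        steps += 1
        if a != b:
            return False, steps    # exits at the first mismatch: leaks the prefix length
    return len(guess) == len(secret), steps

def recover(length: int) -> bytes:
    """Recover the secret one byte at a time from the 'timing' alone."""
    known = b""
    for pos in range(length):
        pad = b"\x00" * (length - pos - 1)
        def score(c: int):
            match, steps = leaky_compare(known + bytes([c]) + pad)
            return (steps, match)  # longer run wins; match breaks ties on the last byte
        known += bytes([max(range(256), key=score)])
    return known
```

Here the attacker controls the input and knows the algorithm, which is the mindset contrast the post draws with Spectre-style leaks across the kernel boundary.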

    In some ways, if the decision isn't made to avoid such a problem (and it's hard to see how that case could be rationally explained to leadership or other designers if you were in that position years prior to the existence of the problem) a philosophy that encourages simplifying hardware and leaning on software management can leave wider gaps.
    For Meltdown's vulnerability, there's a case for having a single well-validated point of speculative rollback nearer the end of the pipeline, rather than having hardware behavior shifting less predictably in a very complicated and critical area of the microarchitecture and possibly opening a window for problems at multiple points in the critical loop. It can constrain the hardware or future changes to the architecture that are hard to divine from the outside. It seems reasonable now, at least for Meltdown, to add something that can render the speculative data useless earlier than late-stage rollback.
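A toy model of late rollback versus squashing the value earlier. It assumes a single faulting load followed by one dependent probe access; `run_transient` and its arguments are hypothetical, and real pipelines are far more complicated.

```python
def run_transient(memory: dict, kernel_addrs: set, addr: int, late_check: bool):
    """Simulate one user-mode load plus a dependent probe access."""
    cache = set()                          # microarchitectural state survives rollback
    is_fault = addr in kernel_addrs        # user mode touching a kernel address
    value = memory[addr]
    if is_fault and not late_check:
        value = 0                          # early check: real data is never forwarded
    cache.add(("probe", value))            # dependent load indexed by the value
    if is_fault:
        # Retirement: the fault is raised and register results are discarded,
        # but the cache line touched above stays resident.
        return "fault", cache
    return value, cache
```

With `late_check=True` the cache ends up holding a line indexed by the secret; with the early squash it only ever holds the line for 0, which says nothing about the secret.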

    There are possible avenues for changing the architectures, and mitigations that could be added, but the methods for hardening cryptographic implementations should be noted for how much performance, efficiency, and transparency they willingly give up to avoid vulnerabilities like this. Algorithms will purposefully stall so that no path runs faster depending on the key or content, and hardware can salt, transform, or duplicate work to avoid probing of power consumption based on an imbalance of 1s and 0s. Event counters and other trace info are a vector for exploits, and may be broken or not implemented in these cases.
    The enclaves are often physically and electrically isolated, and usually closed to review or modification by most coders who would be using them.
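The constant-time discipline can be sketched as a comparison that always touches every byte and never exits early. In real Python code, the standard library's `hmac.compare_digest()` is the tool to use; the hand-rolled `ct_equal` is only an illustration, and CPython adds its own timing noise.

```python
import hmac

def ct_equal(a: bytes, b: bytes) -> bool:
    """Compare without an early exit; length is treated as public."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y        # accumulate differences instead of returning early
    return diff == 0

# Preferred in practice: the standard library's constant-time comparison.
print(ct_equal(b"k3y!", b"k3y?"), hmac.compare_digest(b"k3y!", b"k3y!"))
```

The cost is that every comparison pays for the full length even when the first byte already differs, which is the kind of performance the post says hardened code willingly gives up.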


    System Z is also a very aggressively speculative and highly clocked architecture, and a lot of its single-threaded focus is the result of targeting the hardware/software emulation of a legacy architecture. IBM's various architectures often have a large set of checkpointing and recovery that can pull in firmware or software methods, and the key element is any kind of rollback that misses any number of side-effects. This may also be where Power 6 sees some problems despite being in-order.
    This rollback element also could explain why Nvidia's Denver cores are likely affected by Meltdown and Spectre, in part because performance considerations brought in more traditional speculative elements to its code-morphing scheme and exposed its execution in ways Transmeta's long-obsolete cores did not.
     
  19. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    1,620
    Likes Received:
    662
    I guess it comes down to interpretation when AMD states "Differences in AMD architecture mean there is a near zero risk of exploitation of this variant. Vulnerability to Variant 2 has not been demonstrated on AMD processors to date."
    And then 10 days later they update to say:
    "GPZ Variant 2 (Branch Target Injection or Spectre) is applicable to AMD processors
    While we believe that AMD’s processor architectures make it difficult to exploit Variant 2, we continue to work closely with the industry on this threat.
    We have defined additional steps through a combination of processor microcode updates and OS patches that we will make available to AMD customers and partners to further mitigate the threat
    ".

    I am busy, so I cannot spend much time finding the researchers' thoughts on AMD, apart from this quick example in response to the original AMD statement:
     
    #159 CSI PC, Jan 16, 2018
    Last edited: Jan 16, 2018
    pharma and DavidGraham like this.
  20. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    1,620
    Likes Received:
    662
    I was more interested in Spectre than Meltdown; all reports say Meltdown is containable, whereas Spectre requires a redesign of CPU hardware to fully mitigate its weakness.
    Sure, System Z has speculative execution, but this system is very different from many out there, with a scope heavily focused on security, and it has a unique set of R&D engineers (from my time working with IBM engineers, albeit 10+ years ago). One complaint from some of those researchers is that security has become less of a focus in CPU design these days, traded off against performance and ease of use/flexibility.
     


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.