AMD RyZen CPU Architecture for 2017

Discussion in 'PC Industry' started by fellix, Oct 20, 2014.

  1. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,445
    Likes Received:
    326
    Location:
    Varna, Bulgaria
    AMD’s Lisa Su: high-end ‘Zen’ x86 cores set to be available in 2016
     
  2. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,170
    Location:
    La-la land
    About time AMD embraces SMT. I for one welcome this development.
     
  3. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    Yeah but I don't remember hearing anything from AMD itself about using SMT, although they were more or less clear about dropping CMT.
     
  4. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,357
    Likes Received:
    320
    Location:
    Somewhere over the ocean
    I'll wait for the next conference, in which they will present the all-new, powerful Excavator while explaining that it was only a wrong turn and that it will be put in the trash can as soon as possible.
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,975
    Likes Received:
    2,418
    Location:
    Well within 3d
    It is one thing to say that Piledriver-based Opterons are outdated and another to trash Excavator.
    AMD's non-APU Opteron line is quite outdated, and the focus on the low-power line indicates they will probably do the updating with Seattle and repurposed Excavator chips. Absent more data, it still sounds like AMD's ducking the Opteron line proper.

    Is there a quote from AMD that corresponds with what the article asserts as far as CMT goes?
     
  6. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,357
    Likes Received:
    320
    Location:
    Somewhere over the ocean
    CMT is an interesting idea. Is there any technical limitation to implementing it with per-core SMT?
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,975
    Likes Received:
    2,418
    Location:
    Well within 3d
    It would be possible, but scaling in threads introduces different scaling costs between the per-core and shared resources.

    Assuming a BD-type module, the front end sees the demands placed on maintaining adequate branch prediction and residency in the instruction cache rise at a 2xthread rate instead of per-core, and the FPU sees the necessary amount of scheduling and arbitration rise at 2xthread. The issue mechanism from each integer core to the FPU would likely see additional complication above just doubling, because currently there is just one thread at a core level that can issue FP instructions without worrying about another thread contending for the same link.

    The tiny data caches and the write-combining cache are more likely to have problems. There could be issues with maintaining latencies for inter-core communication. The BD line is already prone to weird throughput problems when it comes to sharing or potentially sharing between modules or between cores in a module, and in my admittedly jaundiced opinion I would bet on it making things weirder and worse.
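
    To make the thread counting above concrete, here is a toy Python sketch. The class and property names are invented for illustration, and it only counts thread contexts per shared resource; it is not a model of the real hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Toy BD-style module: its cores share one front end and one FPU (hypothetical model)."""
    cores: int = 2
    threads_per_core: int = 1  # 1 = CMT only, 2 = CMT plus 2-way SMT per core

    @property
    def front_end_threads(self) -> int:
        # Every hardware thread in the module goes through the shared front end.
        return self.cores * self.threads_per_core

    @property
    def threads_per_fpu_link(self) -> int:
        # Threads contending for one core's issue path to the shared FPU.
        return self.threads_per_core


cmt_only = Module()
cmt_plus_smt = Module(threads_per_core=2)
print(cmt_only.front_end_threads, cmt_only.threads_per_fpu_link)
print(cmt_plus_smt.front_end_threads, cmt_plus_smt.threads_per_fpu_link)
```

    With one thread per core, each core-to-FPU link has a single user and needs no arbitration; with two threads per core, that link gains a second contender and the shared front end serves four threads instead of two.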
     
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    I believe both threads within a module can issue an instruction to either FP pipe in BD/PD/SR.
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,975
    Likes Received:
    2,418
    Location:
    Well within 3d
    Since there are two cores in a module, there is no need for arbitration for instructions within a single core when sending to the FPU.
    Making the cores SMT would change that, requiring logic for a situation that didn't exist prior to that.
     
  10. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    Yes, sorry, I read your previous post too quickly.
     
  11. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,193
    Likes Received:
    397
    Location:
    Romania
    Simplifying it (maybe not for eloquence but rather because I'm not proficient enough to grasp the details ;) ), you would want SMT in order to achieve efficiency with a wide-ish execution stage. AMD's CMT cores are far from being wide enough for SMT to ever make a difference.
     
  12. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,445
    Likes Received:
    326
    Location:
    Varna, Bulgaria
    Well, NetBurst also wasn't wide enough, but the SMT implementation there was for different reasons (long pipeline prone to stalls?). Granted, HT worked better when it was re-introduced in the much wider Nehalem later.
     
  13. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Well, to be honest, I still think that if the OS understood what to do with SMT, it would maybe be a bit better than what we have seen of it. Who knows, maybe at some stage in the future we could see it back.
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,975
    Likes Received:
    2,418
    Location:
    Well within 3d
    Did you mean CMT? SMT is commonplace and not going anywhere.
    At least the first Windows scheduler hotfix was to treat a BD chip as if it were an SMT processor with each module treated as a core.
    It's less than ideal, but at the same time AMD's preferred option was that the OS sift through the thread memory access history, guess the future, and track whether a thread should allocate a timeslice on a core it left earlier, and do other things at runtime to schedule threads so that they matched the module layout.
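
    A minimal Python sketch of the "treat each module as an SMT core" placement policy described above. The function name and the (module, core) numbering are made up for illustration; this is not the actual Windows hotfix logic, only the spread-across-modules-first idea it implies.

```python
def place_threads(num_threads: int, num_modules: int, cores_per_module: int = 2) -> list[tuple[int, int]]:
    """Return (module, core) slots, using one core in every module before any second core."""
    # Order the slots so that core 0 of each module comes first, then core 1, and so on.
    slots = [(module, core)
             for core in range(cores_per_module)
             for module in range(num_modules)]
    if num_threads > len(slots):
        raise ValueError("more threads than hardware cores in this toy model")
    return slots[:num_threads]


# Five threads on a four-module chip: only the fifth thread has to share a module.
print(place_threads(5, 4))
```

    The point of the ordering is that a thread only pays the module-sharing penalty once every module already has one busy core.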

    The CMT model has a general propensity to halve a lot of the upsides that would otherwise be available to a thread at times where they were most needed, whilst simultaneously doubling the downsides by requiring the other half of the module to be active and forcing strange stalls and throughput losses.
    Per the designer that conceived it, the architecture does not make sense unless you intend to do something interesting with it. AMD in the end did nothing interesting with it, and until it proves it is capable of doing something interesting in the future (leveraging aging IP and not progressing on the very difficult interesting future directions isn't it) I don't see why CMT should come back.
     
  15. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,295
    Likes Received:
    3,949
    Then again, by AMD's own statements, they're also saying that the yet-to-be-released Excavator is, at best, just 32% (1.15*1.15) better per watt than a "completely outdated" and non-competitive Piledriver.
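
    The 32% figure is just two yearly steps compounded. A quick Python check, assuming two generations between Piledriver and Excavator and the 10-15% per-year range from the slide (the function name is invented for illustration):

```python
def compound_gain(yearly_gain: float, years: int) -> float:
    """Total fractional improvement after compounding a yearly gain."""
    return (1.0 + yearly_gain) ** years - 1.0


# Best case (15% per year) and worst case (10% per year) over two steps.
best = compound_gain(0.15, 2)
worst = compound_gain(0.10, 2)
print(f"best case: {best:.1%}, worst case: {worst:.1%}")
```

    So the range works out to roughly 21% to 32% over Piledriver.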

    None of this is new, though. We all knew it would come to this as soon as AMD released that "10-15% performance each year" slide.


    The great news is that AMD hasn't given up on the high-end x86 cores yet. We consumers need competition on those, badly.
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,975
    Likes Received:
    2,418
    Location:
    Well within 3d
    General platform features surrounding the Opteron CPUs are also rather dated, so at a non-performance level there is stagnation in IO and system features as well.
    The official AMD position, as worded, is a recognition that chips set down years ago are outdated.
    It omits what can be inferred about promised levels of performance improvement, but omissions and the assumption that the audience cannot connect two dots are not out of character.

    High end for whom is the question. If they refuse to commit to taking on Intel in the markets where real high-end x86 cores exist, then they are not high end to a standard outside observers are using.
     
  17. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    415
    Likes Received:
    99
    Location:
    Brazil
    Looking at performance, the module with 2 CMT cores seems to deliver the promised performance of around 80% of 2 independent cores, I think.

    What is killing "Bulldozer" is the low single-thread performance (and poor power efficiency), but maybe CMT is a barrier for them to improve on single-thread performance and power efficiency. Considering they are not using CMT (or SMT) for their most power-efficient CPUs, if "Jaguar" is the basis for the new architecture, wouldn't it be natural to drop CMT?
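
    As a rough illustration of why the 80% figure does not rescue the design, here is a small Python sketch. The 0.8 scaling comes from the estimate above; the function name and the per-core speeds are made-up, normalized numbers, not benchmark results.

```python
def module_throughput(single_core_perf: float, cmt_scaling: float = 0.8) -> float:
    """Throughput of a 2-core CMT module, relative to the unit used for single_core_perf."""
    # Two cores, each delivering cmt_scaling of a fully independent core.
    return 2 * single_core_perf * cmt_scaling


# Hypothetical normalized speeds: a slow CMT core versus a faster standalone core.
slow_core = 1.0
fast_core = 1.5
print(module_throughput(slow_core))          # one CMT module built from slow cores
print(module_throughput(fast_core, 1.0))     # two fully independent fast cores
```

    The module scales as promised, but everything is multiplied by the single-thread figure, so a weak core drags the whole module down with it.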
     
  18. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    Well, it was replay that messed up Hyperthreading on Netburst chips. I think an engineer (on RWT? Not sure) said that getting Hyperthreading to be as effective as it was on Netburst is much harder on Nehalem because the perf/clock is so much higher.
     
  19. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,462
    Likes Received:
    723
    In an SMT processor, many resources (ROB, store buffers, etc.) are split between contexts; AMD's rationale for CMT was that duplicating integer execution units for each context was a small incremental cost. The premise for this rationale was based on the K7/8 microarchitectures, where the integer units were a tiny fraction of a core.

    This of course doubles the cost when you want to improve a single execution unit (like a fully out-of-order load/store unit or a fast divider) or make the core wider internally with wider instruction issue.

    The consequence is you end up with a core that is narrower with slower and less sophisticated execution units. AMD tried to make up for this by boosting operating frequency, which is ... bizarre considering their competitor was shunning speed racers and embracing power efficiency in CPUs.

    On top of that you have the store-through datacaches to a slow L2 cache.

    Cheers
     
  20. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,445
    Likes Received:
    326
    Location:
    Varna, Bulgaria
    Yep. Probably an alternative VMT implementation was more suitable for P4's pipeline than SMT?
    It wasn't only the INT logic being duplicated, but also the L/S pipelines and data caches, so it wasn't that small, though. AMD banked on the future of IGP, as an integral part of a common architecture, where both intensive and more casual FP code would be "naturally" offloaded. The FPU was left shared, inefficient and mostly underpowered, as a consequence.
     