AMD RyZen CPU Architecture for 2017

Discussion in 'PC Industry' started by fellix, Oct 20, 2014.

  1. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    That's basic stuff for graphics programmers. We have all burned our hands with WB several times on many platforms.

    I am mostly interested in the specific remarks, such as "When a store hits on a write buffer that has been written to earlier with a different memory type than that store, the buffer is closed and flushed.". What does "different memory type" mean? Does it mean that each standard store instruction (to standard cached memory) also causes all open WB buffers to flush? And does this also happen if one SMT thread does standard stores and the other does NT/WB stores?
     
    Gubbi and pharma like this.
  2. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,591
    Likes Received:
    993
    Write combining buffers are not coherent with the rest of the memory system (heck, they can get flushed in any order the core decides), so I would be extremely surprised if cached stores have any influence at all. I think "memory type" has to do with the size of the store. I'm guessing it flushes a partially filled buffer in chunks matching the size of the stores used to fill it; when the store size changes, the buffer is flushed.
     
  3. Toasty

    Newcomer

    Joined:
    Jul 24, 2002
    Messages:
    51
    Likes Received:
    2
    I interpret it to refer to the different memory types supported by the MTRRs. From 2.13: "AMD Family 17h processor supports the memory type range register (MTRR) and the page attribute table (PAT) extensions, which allow software to define ranges of memory as either writeback (WB), write-protected (WP), writethrough (WT), uncacheable (UC), or write-combining (WC). Defining the memory type for a range of memory as WC allows the processor to conditionally combine data from multiple write cycles that are addressed within this range into a merge buffer."

    If true, then this warning about flushes is rather a corner case. Just don't go interleaving writes to a buffer with VirtualProtect calls that change that buffer's memory type and you'll be okay.
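
    For reference, the usual write-combining advice is to fill each 64-byte line completely and in ascending order with non-temporal stores, then fence. Below is a minimal C sketch, assuming SSE2 and a 64-byte line. The names `stream_copy_lines` and `LINE_BYTES` are made up for illustration and do not come from AMD's guide, and the sketch does not try to reproduce the flush conditions the guide lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si128, _mm_loadu_si128, _mm_sfence */
#endif

#define LINE_BYTES 64   /* assumed cache-line / WC-buffer size */

/* Copy `lines` whole 64-byte lines from src to dst.
 * dst must be 64-byte aligned; src may be unaligned. */
static void stream_copy_lines(void *dst, const void *src, size_t lines)
{
    assert(((uintptr_t)dst % LINE_BYTES) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < lines; ++i) {
        /* Four 16-byte NT stores in ascending order cover one full line
         * before the next line is touched. */
        for (int j = 0; j < 4; ++j) {
            __m128i v = _mm_loadu_si128(s + i * 4 + j);
            _mm_stream_si128(d + i * 4 + j, v);
        }
    }
    _mm_sfence();  /* order the NT stores before any later stores */
#else
    memcpy(dst, src, lines * LINE_BYTES);  /* fallback without NT stores */
#endif
}
```

    The destination has to be 64-byte aligned, for example memory from `aligned_alloc(64, n)`. On targets without SSE2 the sketch falls back to a plain `memcpy`.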
     
  4. Clukos

    Clukos Bloodborne 2 when?
    Veteran Newcomer

    Joined:
    Jun 25, 2014
    Messages:
    4,515
    Likes Received:
    3,872
  5. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    558
    Likes Received:
    95
    "All EPYC 7000 Processors have 8 Channels DDR4 and 128 PCIe Lanes"

    That makes the EPYC 7251 an odd product. That's a lot of silicon to be selling for $400-$600 with 3/4 of the cores disabled, and it doesn't turbo any higher than the other models either.
     
  6. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,515
    Likes Received:
    934
    Perhaps, but based on the TDP, we're talking about pretty crappy silicon. Recycling 4 very poor dies into a >$400 SKU doesn't sound like such a bad idea to me.
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,367
    Likes Received:
    3,959
    Location:
    Well within 3d
    If the data being written is expected to be re-read soon after, is it desirable to skip the cache hierarchy that allows it to be re-read quickly?

    If one thread is writing non-combining data to the same 64-byte region as a write-combining buffer, it would appear that this counts as an event that will close an ongoing write-combining buffer. That event does seem to apply to all write-combining buffers in the core.

    Per AMD's documentation, there are a number of events that can prompt a flush. It would seem that AMD's implementation tries to be conservative in the face of any ambiguities or long-latency events that could interact with a WC buffer of uncertain state.

    Intel's line-fill buffer method for write combining seems like it could be more aggressive, or it has not chosen to state flush conditions as exhaustively.
    There are fewer events that would flush all WC buffers, but it seems like sufficient cache traffic could evict individual lines more frequently.
    Since not every Intel core does well with write-combining or actually handles NT stores non-temporally, it may not be consistently worse/better.

    I think it comes down to whether there is an event that would prompt other pipeline flushes or if traffic subject to more stringent ordering/visibility rules might clash with WC data or potentially any data co-resident on its cache line. POP or PUSH don't appear on the table.


    The document seems like it's missing a lot of sections, relative to the 15h guide.
    Also, I think it might be incorrect on the load buffer size and FP scheduler size, compared to earlier Zen presentations.

    The data cache seems to be interesting:
    Rather than a static 16-bank scheme for determining conflicts, there's a dynamic component in the DC way determination built into the way prediction/microtag array.
    There's an undisclosed hash function based on the virtual address access history and dynamic behaviors related to aliasing or hash conflicts.
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Cyan and Lightman like this.
  9. shiznit

    Regular Newcomer

    Joined:
    Nov 27, 2007
    Messages:
    338
    Likes Received:
    88
    Location:
    Oblast of Columbia
    I have a feeling that hyperscalers are going to be all over EPYC. Massive savings at scale from 1P systems and potential performance benefits from ditching NUMA.

    I suspect EC2/GCE VMs will still be on Intel for marketing reasons but for managed services like object storage EPYC is a winner.
     
  10. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    It might be comical, 8 CCX with only one core per CCX enabled. But if you're in a situation where you only care about fitting as much memory as you can, it will make for an affordable machine with at least 256GB upgradable to well more than that. E.g., a statistician who has to fit everything into memory because that's easier.
     
    Alexko and Lightman like this.
  11. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    NVDIMM-based storage servers or some heterogeneous setups might make use of that. NVDIMMs would have a far larger capacity and benefit more from a large L3 than from a bunch of cores. Similarly, an array of GPUs probably wouldn't need all the cores if the machine is serving largely as a PCIe backplane.
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    EPYC is "NUMA on chip" just like Ryzen and Threadripper. Clusters of 4 cores (8 threads) have their own dedicated L3 cache. There's no huge shared LLC like in Intel designs.
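
    On Linux, one way to see these clusters from software is to read which logical CPUs share an L3 from sysfs. Below is a rough C sketch. It assumes cache `index3` is the L3, which is typical but not guaranteed, and `cpulist_count` and `l3_sharers` are names made up for this example. On a part laid out as described above, one would expect a count of 8 with SMT enabled, but that has not been measured here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the CPUs named in a Linux cpulist string such as "0-3,8-11".
 * Returns -1 on malformed input. */
static int cpulist_count(const char *s)
{
    int count = 0;
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0) return -1;
        long hi = lo;
        s = end;
        if (*s == '-') {                 /* a range "lo-hi" */
            ++s;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo) return -1;
            s = end;
        }
        count += (int)(hi - lo + 1);
        if (*s == ',') ++s;              /* next entry */
        else if (*s && *s != '\n') return -1;
    }
    return count;
}

/* Number of logical CPUs sharing `cpu`'s L3, or -1 if it cannot be read.
 * Assumption: cache index3 is the L3 on this machine. */
static int l3_sharers(int cpu)
{
    char path[128], buf[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list", cpu);
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? cpulist_count(line) : -1;
}
```

    Comparing `l3_sharers(0)` with the total number of logical CPUs tells you whether the machine has one shared LLC or several separate L3 domains.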
     
  13. Clukos

    Clukos Bloodborne 2 when?
    Veteran Newcomer

    Joined:
    Jun 25, 2014
    Messages:
    4,515
    Likes Received:
    3,872
    #2153 Clukos, Jun 16, 2017
    Last edited: Jun 16, 2017
    Cyan likes this.
  14. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,088
    Likes Received:
    5,634
    You know things are going well for your side when you can make bland statements like these and somehow they're very funny.


    [​IMG]
     
    Lightman, Cyan and Clukos like this.
  15. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    768
    Likes Received:
    532
    Kyyla and Lightman like this.
  16. shiznit

    Regular Newcomer

    Joined:
    Nov 27, 2007
    Messages:
    338
    Likes Received:
    88
    Location:
    Oblast of Columbia
    I was aware of the CCX/L3 layout and the Infinity Fabric, but I didn't think memory latency would be as high as going through another socket over QPI.
     
  17. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,822
    Likes Received:
    494
    Location:
    Torquay, UK
    What's even better, it will plug into my AM4 board, so I will only have to eBay my R7 1700 and buy something with Zen 2 under the heatspreader :)

    PS. This is now officially fully stable - not a single crash since Monday and I did a lot of things with my computer:

    [​IMG]
     
    #2157 Lightman, Jun 17, 2017
    Last edited: Jun 17, 2017
    pharma, Clukos and Cyan like this.
  18. Clukos

    Clukos Bloodborne 2 when?
    Veteran Newcomer

    Joined:
    Jun 25, 2014
    Messages:
    4,515
    Likes Received:
    3,872
    That's a very nice overclock! I probably need better cooling if I want to run mine at 4.0GHz :)
     
    Lightman likes this.
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Most likely not as slow as going from socket to socket, but the software still needs to be NUMA-aware to achieve the best performance. We already know that Ryzen (8 cores divided into 2 clusters) has some performance problems in consumer software (including games), because this software isn't NUMA-aware. Intel consumer chips (since Nehalem) have had a big shared LLC for all cores. Consumer software programmers didn't need to care about this stuff, but now they do. Enterprise software has obviously been NUMA-aware for a long time and will continue to be, meaning that EPYC and Threadripper have no problems there.
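
    One simple form of this awareness is keeping threads that share data on the same cluster. Below is a minimal Linux-only C sketch using `sched_setaffinity`. It assumes the logical CPUs in one cluster are numbered contiguously, which depends on the OS and firmware and should be checked on the actual machine. `make_cpu_range` and `pin_self_to_range` are illustrative names; a Windows game would use a different API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Build an affinity mask covering logical CPUs [first, first + n).
 * Returns 0 on success, -1 if the range is invalid. */
static int make_cpu_range(cpu_set_t *set, int first, int n)
{
    if (first < 0 || n <= 0 || first + n > CPU_SETSIZE) return -1;
    CPU_ZERO(set);
    for (int c = first; c < first + n; ++c)
        CPU_SET(c, set);
    return 0;
}

/* Restrict the calling thread to that range (pid 0 = calling thread).
 * Returns 0 on success, -1 on error. */
static int pin_self_to_range(int first, int n)
{
    cpu_set_t set;
    if (make_cpu_range(&set, first, n) != 0) return -1;
    return sched_setaffinity(0, sizeof set, &set);
}
```

    For example, worker threads that share a working set could each call `pin_self_to_range(0, 4)`. This assumes CPUs 0 to 3 sit behind one L3, which is only a guess about the numbering.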
     
    shiznit, Lightman and BRiT like this.
  20. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,822
    Likes Received:
    494
    Location:
    Torquay, UK
    Cooling = clocks with Ryzen.
    On the AMD Spire RGB I couldn't get any reasonable stability above 3.9GHz even with 1.4V. A water AIO made 4GHz easy, and today I will find out whether it can go higher than that.
     
    Clukos likes this.