AMD RyZen CPU Architecture for 2017

I think AMD again showed a lack of resources in the launch. I would have preferred a launch a month later with everything (most of it at least) ready, especially BIOS and DDR support.
 
I really want to see some reviewers playing with SMT and affinity. I'd love to see gaming results with process affinity limited to cores 0-3 and SMT on. Might be a great way to eke some performance out of current games, which may be hopping between CCXs and hitting that latency.
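For anyone who wants to try the cores 0-3 experiment, here is a minimal sketch of how the affinity mask could be built. It assumes (not confirmed anywhere in this thread) that SMT siblings are numbered adjacently, so physical core k shows up as logical processors 2k and 2k+1, and that cores 0-3 all sit on the same CCX. The helper name `affinity_mask` is made up for illustration.

```python
def affinity_mask(cores, threads_per_core=2):
    """Return a bitmask selecting every logical processor of the given physical cores.

    Assumption: logical processors are numbered core-major, i.e. core k owns
    logical processors k*threads_per_core .. k*threads_per_core + threads_per_core - 1.
    """
    mask = 0
    for core in cores:
        for t in range(threads_per_core):
            mask |= 1 << (core * threads_per_core + t)
    return mask


mask = affinity_mask(range(4))   # physical cores 0-3, SMT on
print(hex(mask))                 # 0xff
```

If the numbering assumption holds, the resulting hex value is what you would feed to something like `start /affinity FF game.exe` at a Windows command prompt (`game.exe` being a placeholder), or tick off by hand in Task Manager.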

Early days, still a lot of discovery to go. If MS ends up coming out with OS-level patches like with Bulldozer.... why wasn't this found and done months ago by AMD ready for launch?
 
I think the biggest thing about Ryzen is that it is AMD's base for the future. Yes, it may not be perfect, but damn, AMD couldn't wish for a better base to have. It will be very interesting to see a Ryzen+ with all the problems solved, better support from games and some improvements.
Imagine if at the start of this thread someone seriously suggested that the 1st iteration of Zen would be right up there with the latest Intel Expensive Edition on a lot of benchmarks, beating it in more than a few & about even in power while doing it: o_O :rolleyes: :LOL:

I hoped that AMD could at least be in the ball-park so that buying an AMD CPU wouldn't make people laugh at you, maybe beat a couple of gens old Intel stuff clock for clock. Hoped but didn't particularly believe it would happen :nope:

Note that Aida64 apparently haven't had a chance to update -> their cache numbers are wrong.

Also ran into this via Reddit
Bunch of marketing guff but some interesting behind the scenes tid-bits in there like apparently they were legit aiming for over 50% IPC increase but didn't dare say it publicly :oops:

Might have thought they'd mention Jim Keller there, but maybe understandable since 'It was all because of this guy who doesn't work for us anymore' wouldn't be the greatest look...
Any sign of comment from him now that it's launched?
Also anyone seen any detail around why he left other than 'Set stuff in order & on the right path so skipped the boring bit'?
 
Interesting, can be fixed with BIOS update?
https://www.reddit.com/r/Amd/comments/5x54ww/my_theory_on_why_ryzen_does_not_perform_in_games/
This is what Ryzen currently looks like to Windows:

Logical Processor to Cache Map:
Code:
*---------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
*---------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
*---------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
*---------------  Unified Cache       1, Level 3,   16 MB, Assoc  16, LineSize  64
-*--------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
-*--------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
-*--------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
-*--------------  Unified Cache       3, Level 3,   16 MB, Assoc  16, LineSize  64
--*-------------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
--*-------------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
--*-------------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
--*-------------  Unified Cache       5, Level 3,   16 MB, Assoc  16, LineSize  64
---*------------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
---*------------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
---*------------  Unified Cache       6, Level 2,  512 KB, Assoc   8, LineSize  64
---*------------  Unified Cache       7, Level 3,   16 MB, Assoc  16, LineSize  64
----*-----------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
----*-----------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
----*-----------  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
----*-----------  Unified Cache       9, Level 3,   16 MB, Assoc  16, LineSize  64
-----*----------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
-----*----------  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
-----*----------  Unified Cache      10, Level 2,  512 KB, Assoc   8, LineSize  64
-----*----------  Unified Cache      11, Level 3,   16 MB, Assoc  16, LineSize  64
------*---------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------*---------  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
------*---------  Unified Cache      12, Level 2,  512 KB, Assoc   8, LineSize  64
------*---------  Unified Cache      13, Level 3,   16 MB, Assoc  16, LineSize  64
-------*--------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
-------*--------  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
-------*--------  Unified Cache      14, Level 2,  512 KB, Assoc   8, LineSize  64
-------*--------  Unified Cache      15, Level 3,   16 MB, Assoc  16, LineSize  64
--------*-------  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
--------*-------  Instruction Cache   8, Level 1,   64 KB, Assoc   4, LineSize  64
--------*-------  Unified Cache      16, Level 2,  512 KB, Assoc   8, LineSize  64
--------*-------  Unified Cache      17, Level 3,   16 MB, Assoc  16, LineSize  64
---------*------  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
---------*------  Instruction Cache   9, Level 1,   64 KB, Assoc   4, LineSize  64
---------*------  Unified Cache      18, Level 2,  512 KB, Assoc   8, LineSize  64
---------*------  Unified Cache      19, Level 3,   16 MB, Assoc  16, LineSize  64
----------*-----  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
----------*-----  Instruction Cache  10, Level 1,   64 KB, Assoc   4, LineSize  64
----------*-----  Unified Cache      20, Level 2,  512 KB, Assoc   8, LineSize  64
----------*-----  Unified Cache      21, Level 3,   16 MB, Assoc  16, LineSize  64
-----------*----  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
-----------*----  Instruction Cache  11, Level 1,   64 KB, Assoc   4, LineSize  64
-----------*----  Unified Cache      22, Level 2,  512 KB, Assoc   8, LineSize  64
-----------*----  Unified Cache      23, Level 3,   16 MB, Assoc  16, LineSize  64
------------*---  Data Cache         12, Level 1,   32 KB, Assoc   8, LineSize  64
------------*---  Instruction Cache  12, Level 1,   64 KB, Assoc   4, LineSize  64
------------*---  Unified Cache      24, Level 2,  512 KB, Assoc   8, LineSize  64
------------*---  Unified Cache      25, Level 3,   16 MB, Assoc  16, LineSize  64
-------------*--  Data Cache         13, Level 1,   32 KB, Assoc   8, LineSize  64
-------------*--  Instruction Cache  13, Level 1,   64 KB, Assoc   4, LineSize  64
-------------*--  Unified Cache      26, Level 2,  512 KB, Assoc   8, LineSize  64
-------------*--  Unified Cache      27, Level 3,   16 MB, Assoc  16, LineSize  64
--------------*-  Data Cache         14, Level 1,   32 KB, Assoc   8, LineSize  64
--------------*-  Instruction Cache  14, Level 1,   64 KB, Assoc   4, LineSize  64
--------------*-  Unified Cache      28, Level 2,  512 KB, Assoc   8, LineSize  64
--------------*-  Unified Cache      29, Level 3,   16 MB, Assoc  16, LineSize  64
---------------*  Data Cache         15, Level 1,   32 KB, Assoc   8, LineSize  64
---------------*  Instruction Cache  15, Level 1,   64 KB, Assoc   4, LineSize  64
---------------*  Unified Cache      30, Level 2,  512 KB, Assoc   8, LineSize  64
---------------*  Unified Cache      31, Level 3,   16 MB, Assoc  16, LineSize  64
But it should look more like this:
Code:
Logical Processor to Cache Map:
**--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**--------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
**--------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
--**------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
----**----------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**----------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
----**----------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
------**--------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**--------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
------**--------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
--------**------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
--------**------  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
--------**------  Unified Cache       5, Level 2,  512 KB, Assoc   8, LineSize  64
--------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
----------**----  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
----------**----  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
----------**----  Unified Cache       7, Level 2,  512 KB, Assoc   8, LineSize  64
------------**--  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
------------**--  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
------------**--  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
--------------**  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
--------------**  Instruction Cache   8, Level 1,   64 KB, Assoc   4, LineSize  64
--------------**  Unified Cache       9, Level 2,  512 KB, Assoc   8, LineSize  64
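To compare the two maps above without eyeballing them, a rough sketch like the following could pull out which logical processors share each Level 3 cache. It assumes the dump keeps exactly this line layout (a mask of `*` and `-`, then the cache type, index and level); the function name `l3_groups` and the regex are my own, not part of any tool.

```python
import re

# Mask of '*'/'-' characters, then "<Type> Cache <index>, Level <n>,"
LINE = re.compile(r"^([*-]+)\s+\w+ Cache\s+\d+, Level (\d),")


def l3_groups(text):
    """Return, per Level 3 cache line, the set of logical processors marked '*'."""
    groups = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(2) == "3":
            mask = m.group(1)
            groups.append({i for i, ch in enumerate(mask) if ch == "*"})
    return groups


sample = """\
**--------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
"""
for group in l3_groups(sample):
    print(sorted(group))
```

Run against the first dump, every L3 line would come back as a single-processor group, which is the complaint in a nutshell: each logical processor is reported with its own 16 MB L3 instead of eight of them sharing an 8 MB one.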
 

AMD said they are 0 to 1% ahead of BW-E and 6% to less than 7% behind KL on IPC, and that's impressive.

I also don't like this crusade of Steve's against AMD.
 
I am surprised that AMD didn't release a 6-core Ryzen 7 with higher clocks and same TDP. They could have released a 6-core with 3.8 GHz base clock / 4.2 GHz turbo with 95W.
I think the question might be can AMD do that?
Per the voltage/clock analysis in the Anandtech link, single-core XFR (unsustainable, meager clock increase) to 4.1 is operating in a voltage range that is potentially damaging if sustained.

I've seen signs and discussion of thermal density issues increasing for the more compact form factors and its influence on why the desktop space has lagged mobile in getting the latest Intel cores. One trend, which I may not have noticed analysis for in the client space, is the increasing impact of the fragility of modern processes on how much you can go "throw some TDP at this and get X MHz".

For a long time, it was generalized that the transistor budget was growing faster than TDP was scaling, leading to thermal limits that could be pushed if you wanted. Power density, to a certain extent, could be pushed if you went with more expensive cooling, although hot-spot issues may leave only so much margin with any mere mortal thermal solution.
However, enthusiasts have been complaining that they aren't getting to play with voltages anymore, with the response that the voltages they want are damaging.

Sufficient or excess cooling is equivalent to having TDP to spare. However, the ability for any single subsection of the die to receive or sustain the voltages and temperatures locally appears to have become noticeably different than saying that the cooler and board VRM are good enough. It was the case in the past that a core or few cores could reasonably be driven to hog all the TDP, but now I'm wondering exactly when it became noticeably separate for consumer parts.


Keep in mind that these are whole system power consumption figures (including memory, motherboard, etc). But it seems that AMD measures TDP differently than Intel. AMD's TDP seems to be closer to Intel's SDP (scenario design power), which roughly means common power usage.
For both, the definition has been pretty consistent for some time. I think it might not have been that different since perhaps the 90nm or 65nm generations, basically when either one hit the point where power consumption became crippling to future scaling and reliability.
It's how much power a thermal solution can be expected to dissipate in a thermally significant period of time, while the chip is running some set of representative software. They really cannot fudge this since it is concerning whether the chip burns itself out, although with modern DVFS and active sensors they can get much closer to the edge without engaging in the old debates over "representative loads". Now, the chips to a significant degree just know if they are on the edge, and can even weight the thermal capacity of the cooler to determine when exactly they can push the bounds of "thermally significant" in order to spike above it.

TDP for Intel means absolute peak power consumption.
That hasn't been true for some time, and Intel likely broke from that first. AMD followed perhaps a little later, once it was able to get more than rudimentary power/thermal monitoring.

Intel CPUs need to run heavy AVX2 (FMA) code on all cores to reach TDP (AVX2 code is known to reduce clocks to maintain TDP).
Not just TDP, since this seems to hit AVX clocks in some products if one core so much as sees one AVX2 instruction, which cannot reasonably overwhelm a cooler. This might go to power delivery/dissipation for specific parts of the chip rather than a global measure of the cooling solution.

AMD's lead engineer already said that they have a list of low hanging fruit to improve Zen IPC in the future. It is the first iteration of a brand new architecture. Also many sources say that GlobalFoundries' 14nm process is significantly inferior to Intel's new 14nm+. An equivalent process would increase performance and lower power consumption a lot.
It seems reasonable to count on GF's process being inferior, but absent that Zen's designers specifically noted that they've done things like use low-leakage and high-density cells for most of Zen. That means they've tuned the implementation with slower cells in areas that they reason won't need the significant area and leakage penalties of the cells that have half the linear delay--which holds true as long as the CPU is in some specified clock range. Pushing those efficient cells beyond that requires boosting the current enough to wreck power efficiency, or it takes them to the point of risking rapid physical degradation.
A Zen that can go faster would need to physically uprate its cells, accept larger area consumption and power consumption, and probably require a revamping of the power delivery of the chip and CCX, which appear to have limited headroom. "Balance" means not having a lot of slack outside of the target parameters.

Perhaps Zen's designers were given the philosophy of striving for 100% of the 90% of what can be done, and pare back anything that goes further.

256-bit AVX2 not really applying to more than 10% of the market Zen can hit? Don't do it.
Clocking to 4.2+ GHz when fewer than 10% of the products Zen can get to will ever get near that? Don't make a core that can do it.
 
If the two CCXs are only tied together by a 22GB/s link, that's just appalling.

What would the test be to isolate this bandwidth? The auto-translation of the discussion is a little rough for me, was this measured or disclosed by AMD? In order to profile this, the lines in the other CCX would need to be dirty, otherwise they wouldn't respond (shared does not respond in MOESI). Perhaps this is subject to some kind of mandatory write-back of dirty data if leaving the visibility of a CCX, putting a ceiling on bandwidth+overhead?
 

Apparently, they got the word directly from AMD.
 
So the software fix is to treat each CCX like a quasi NUMA node?

It seems like that. After the CCX arrangement was revealed, I mused that it looked like the organization was tailored for an easy match for a server hosting a bunch of VM allocations. That kind of workload can scale readily with this hardware, and also doesn't have the hard RAS requirements or hardware investment on the part of AMD.
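To make the "quasi NUMA node" idea concrete, here is a toy sketch of the placement logic: keep a process's threads inside one CCX and flag migrations that would cross the boundary. This is only an illustration, not how the Windows scheduler actually works, and it assumes 8 logical processors per CCX numbered contiguously (0-7 and 8-15). All function names are hypothetical.

```python
def ccx_of(logical_cpu, cpus_per_ccx=8):
    """Map a logical processor number to its CCX index (assumed contiguous numbering)."""
    return logical_cpu // cpus_per_ccx


def crosses_ccx(src, dst, cpus_per_ccx=8):
    """True if moving a thread from src to dst leaves its CCX (and its L3)."""
    return ccx_of(src, cpus_per_ccx) != ccx_of(dst, cpus_per_ccx)


def place(thread_count, load, cpus_per_ccx=8):
    """Pick the least-loaded CCX for a new process and return (ccx, cpus).

    load: list with the number of runnable threads currently on each CCX.
    Threads beyond cpus_per_ccx are not spilled to the other CCX in this toy model.
    """
    ccx = min(range(len(load)), key=lambda i: load[i])
    first = ccx * cpus_per_ccx
    n = min(thread_count, cpus_per_ccx)
    return ccx, list(range(first, first + n))


print(place(4, [6, 2]))      # (1, [8, 9, 10, 11])
print(crosses_ccx(7, 8))     # True
```

A real scheduler would also have to weigh spilling onto the second CCX once a game has more busy threads than one CCX can hold, which is exactly where the cross-CCX latency would come back into play.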

Apparently, they got the word directly from AMD.
That number is also half the measured bandwidth of Ryzen's dual-channel memory controller.
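As a sanity check on the "half" observation, the theoretical peak of a dual-channel DDR setup is easy to work out: channels times 8 bytes per transfer (64-bit channel) times the transfer rate. The thread doesn't state what memory speed was used, so DDR4-2666 below is purely my assumption, and measured figures will differ from the theoretical peak.

```python
def ddr_peak_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for 64-bit DDR channels."""
    return channels * bus_bytes * mega_transfers_per_s / 1000


peak = ddr_peak_gb_s(2, 2666)   # assumed DDR4-2666, dual channel
print(f"{peak:.1f} GB/s peak, {peak / 2:.1f} GB/s half")   # 42.7 GB/s peak, 21.3 GB/s half
```

Under that assumption, half the theoretical peak lands in the same neighbourhood as the 22GB/s figure, which fits the observation above without proving where the number comes from.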
 
So how is the 6-core R5 going to work? What method is used to disable 2 cores of a CCX?
 