AMD RyZen CPU Architecture for 2017

I don't know if this article has been posted here before (it's from August 2016), but I recently saw it linked on the Real World Tech forum.

"AMD Finds Zen in Microarchitecture."

There's a lot of information, including SPECint_rate2006 Base estimates for various CPU cores.

U126_Zen_F2.png


David Kanter said:
[This figure] shows our estimates for Excavator and Zen. First, we recalculated the A10-7850K’s benchmark score without libquantum, which ICC has cracked, resulting in an adjusted score of 81.4. Increasing that number by 15% for an Excavator-based design should yield 93.6, and a 40% boost from moving to Zen yields 131. We further expect that using a compiler optimized for Zen (instead of Intel’s compiler) would boost performance by another 10% to 144.

[The figure] also shows recent Intel desktop processors (e.g., Core i7-2600K, 3770K, 4770K, 6700K) and their adjusted SPEC scores (i.e., without libquantum). Overall, our estimate for Zen is fairly close to the score for Intel’s Ivy Bridge but short of Haswell’s and Skylake’s. AMD argues that the Intel products receive an unfair 5–10% boost because the company compiles to 32-bit x86 code, which is unrealistic for many applications. Adjusting by 10% would put Zen about halfway between Ivy Bridge and Haswell for SPECint_rate2006.
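
The chain of multipliers in the quoted estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic as the quote states it; the function and parameter names are made up for illustration, and the percentages are the article's assumptions, not measurements. Rounded, it gives the same 93.6, 131, and 144 figures.

```python
def zen_spec_estimate(base=81.4, excavator_gain=0.15, zen_gain=0.40, compiler_gain=0.10):
    """Chain the gains quoted above onto the adjusted A10-7850K score.

    All defaults come straight from the quoted passage; the names are
    hypothetical and only serve this illustration.
    """
    excavator = base * (1 + excavator_gain)   # +15% for an Excavator-based design
    zen = excavator * (1 + zen_gain)          # +40% from moving to Zen
    zen_tuned = zen * (1 + compiler_gain)     # +10% for a Zen-optimized compiler
    return excavator, zen, zen_tuned


if __name__ == "__main__":
    excavator, zen, zen_tuned = zen_spec_estimate()
    print(f"Excavator: {excavator:.1f}")
    print(f"Zen: {zen:.0f}")
    print(f"Zen with tuned compiler: {zen_tuned:.0f}")
```

Note that the gains compound, so the total uplift over the adjusted Kaveri score is about 1.15 x 1.40 x 1.10, or roughly 1.77x, rather than the 65% you would get by adding the percentages.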
 
There are some tidbits in there that would warrant commenting on after I have time to let the information percolate (no stack engine in BD, EX seems to disagree with other sources like Agner's optimization doc, non-inclusive uop cache, etc.), but was libquantum any less broken between AMD and Intel? It might be worthwhile to have an apples to apples comparison if they're equally cheating. Libquantum was actually an area where BD did better, for what that's worth.
 
The other things to remember are that:
AMD has stated that they beat the 40% IPC increase target.
Zen appears to be able to clock higher, at least in the 8-core consumer part.
Those 256-bit ops blow out the TDP and force downclocking, so Zen should be able to clock higher and Broadwell/Skylake a little lower, which reduces the gap a bit.
 
I'm unsure how to interpret the Linley article's point about the L1 Icache being physically addressed, and the uop cache potentially also being physically addressed. If both are due to the TLBs being put into the prediction pipeline, and addresses being translated prior to making a cache access, does this negate the L1 Icache aliasing issue that the VIPT caches of earlier generations had? Even with the higher associativity of the Zen L1I, it would still have insufficient associativity for its size to avoid the issue if it were VIPT.
However, is it really a TLB if the address is calculated before hitting the caches (what is the Look-aside function if it was looked at earlier)?
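
To make the associativity point concrete, here is a rough sketch of the usual rule of thumb: a virtually indexed, physically tagged cache avoids aliasing only when one way (size divided by associativity) fits within a page, so that all index bits come from the page offset. The 64 KiB, 4-way figure for the Zen L1I and the 4 KiB page size are assumptions brought in for illustration, not numbers from the post, and the function names are hypothetical.

```python
PAGE_SIZE = 4 * 1024  # assumed 4 KiB base page


def way_size(cache_bytes, ways):
    """Bytes covered by a single way of the cache."""
    return cache_bytes // ways


def vipt_alias_free(cache_bytes, ways, page_bytes=PAGE_SIZE):
    """True if every index bit lies within the page offset."""
    return way_size(cache_bytes, ways) <= page_bytes


def min_ways_for_alias_free(cache_bytes, page_bytes=PAGE_SIZE):
    """Smallest associativity that keeps one way within a page (ceiling division)."""
    return -(-cache_bytes // page_bytes)


if __name__ == "__main__":
    l1i_bytes, l1i_ways = 64 * 1024, 4  # assumed Zen L1I geometry
    print("way size (KiB):", way_size(l1i_bytes, l1i_ways) // 1024)
    print("alias-free if VIPT:", vipt_alias_free(l1i_bytes, l1i_ways))
    print("ways needed:", min_ways_for_alias_free(l1i_bytes))
```

Under those assumptions each way spans 16 KiB, four times the page size, so a VIPT design would need 16 ways to be alias-free by construction. That is consistent with the point that the associativity would still be insufficient for the cache's size.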

The article indicates the memory file in the front end is related to the stack engine, although AMD's slides seemed to be a little broader in crediting it for load to store forwarding as a separate bullet point.

The article does indicate Zen uses MOESI, which may have some implications since it seems this and other elements like the L3 share some high-level features with the less-impressive L3/LLC cache subsystems of prior generations. Perhaps if there is a directory or filter system in place, the scalability issues for broadcasts and invalidates are less prominent at higher core/chip counts.

The article describes AMD's implementation of FMA as being two FMA units and two FADD units, rather than bridging MUL and ADD pipes. An FMA would steal a read port from an ADD pipe.
This makes sense, although I wonder if that leaves room for an optimization in the future if the CPU (and its supposed "neural net") could detect a reused or overwritten register and have the FMA pipe satisfy its third operand slot over multiple cycles. Speculatively, might not something like that be done for operands in the integer pipeline? If there were feedback to the decode/uop cache, a loop might even decode into uops that say "reuse this operand", with care taken for any sort of branch/exception recovery to fall back to a more conservative checkpoint.
 
The other things to remember are that:
AMD has stated that they beat the 40% IPC increase target.
Zen appears to be able to clock higher, at least in the 8-core consumer part.
Those 256-bit ops blow out the TDP and force downclocking, so Zen should be able to clock higher and Broadwell/Skylake a little lower, which reduces the gap a bit.

That's the problem with this type of calculation.

They have taken quad-cores as the comparison point, which for Intel are of course clocked rather high (6700K at 4-4.2 GHz); their IPC result depends on the clock speed of each model there.
 
The L2 number is at the upper limit of the rough guess I got from the leaked die shot earlier in the thread.
https://forum.beyond3d.com/posts/1916209/

The hand-waving and error margin were generous in that post, however.
That would seem to put the Zen core in the 5mm2 range give or take, if the error in core estimate is consistent with that of the L2 calculation.

The size difference for some of the Intel elements does lend credence to there being a cost to the vector units and the plumbing in the cache to handle them. The L3's modularity might have cost some density as well.
 
If they're being consistent, then I would say yes. The area per core for Skylake is over 12mm2 if it weren't including L3 in the overall figure. It seems closer to what is expected if this is the area of a core+L3 complex.
 
I read it as larger for Zen: 0.08 mm2 vs 0.058 mm2 (Intel).
I think he means: given that, why is the overall area smaller for Zen?

I'm guessing throughput: Intel's cache can move twice the data per cycle. Intel needs that because it has twice the L/S bandwidth per core, due to full-rate 256-bit ops.
 
https://world.taobao.com/item/543819565628.htm?fromSite=main

Edit: Yea, they pulled the link ...

If you visit taobao.com through this link you will notice an entry for Ryzen. Taobao is a Chinese online shopping website similar to eBay and Amazon, owned by the Alibaba Group. The AMD Ryzen processor is listed at 4.2 GHz; other than that, few details are given in relation to specs.

What is interesting, however, is an availability date of the 28th of February alongside a ¥ 1999.00 price tag, which translates to 275 euro and 290 USD. The chip is listed at 14nm and yes, that 4.2 GHz turbo clock frequency is mighty interesting, as it seems to be 200 MHz higher than expected.

http://www.guru3d.com/news-story/am...ces-at-taobao-with-28th-feb-availibility.html
 
Well, it could just have been for traffic to the shop. I don't remember if prices in Yen are slightly lower than in other countries, but this seems way too low for the 8-core, and we know that the quad-core and 6-core parts should come much later.

That said, some samples were already at 4 GHz, so why not a 4.2 GHz TB (at least for a 6-core)?
 
Do you have any privileged info to back that up? Because AMD said "full family chip available from day one."
 