AMD RyZen CPU Architecture for 2017

It seems Zen's L3 is "mostly exclusive" like Bulldozer's L3. It tracks sharers of L3 lines within the core complex if I understand it correctly. But how coherence is handled between complexes, across chips and across packages still remains unknown.

Hotchips Zen Presentation, 1:21:00
 
@3dilettante Thanks, greatly appreciated.

Probably this one for the FP PRF. I didn't remember exactly which (IIRC both are ISSCC conference papers), but both were definitely available in IEEE Xplore.
http://ieeexplore.ieee.org/abstract/document/5746227/

Edit: This figure sums up those data points in the paper.
http://pc.watch.impress.co.jp/img/pcw/docs/430/801/9.jpg
Thanks for the links. I don't have access to IEEE Xplore, but the figure is nice. The figure mentions shared ports; does that mean there are fewer actual ports?
 
I am not sure what shared port means though. But 10 read ports and 6 write ports are pretty solid from a high-level view (2*FMAC + 2*VALU).
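
One way those port counts could add up, as a rough sketch only. It assumes each FMAC pipe reads three source operands, each VALU pipe reads two, every pipe writes one result, and two further write ports take load data into the register file. None of these per-pipe figures come from the paper; they are assumptions picked to illustrate the 2*FMAC + 2*VALU arrangement.

```python
# Hypothetical per-pipe operand counts; not taken from the ISSCC paper.
PIPES = {
    "FMAC": {"count": 2, "reads": 3, "writes": 1},  # a*b+c needs three sources
    "VALU": {"count": 2, "reads": 2, "writes": 1},  # two-source vector ALU op
}
LOAD_WRITE_PORTS = 2  # assumed ports for load data written into the PRF


def port_totals(pipes, load_write_ports):
    """Sum read and write ports over all pipes, plus the assumed load ports."""
    reads = sum(p["count"] * p["reads"] for p in pipes.values())
    writes = sum(p["count"] * p["writes"] for p in pipes.values())
    return reads, writes + load_write_ports


reads, writes = port_totals(PIPES, LOAD_WRITE_PORTS)
print(reads, writes)
```

Under these assumptions the totals match the 10 read and 6 write ports quoted above, but other splits would give the same sums.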
 
The shared ports number appears to reconcile the fact that there are more reads/writes in total when counting what goes to the side data path handles and the middle data path handle. That might align with what AMD said it did when it cut the number of FP pipelines in later cores and said it didn't have a major effect since the streamlining process combined or removed hardware that was being used by certain instruction mixes that wound up leaving some pipes excluded.

One item I do see is that the FP register file is the same number of entries as Zen's, so this might be an area where Zen inherited some features from BD.
The register file is split and banked, so there would probably be some internal allocation/scheduling to keep full throughput.
 
The FP register file is split into two separate sections. For regular SIMD, the lower half of a register is in the left half and goes left to the lower data path handle, and the upper half goes right.
That doesn't necessarily affect allocation, but it looks to be a way to physically spread out all those data paths, possibly gate off some of them, and to handle compatibility with legacy ops (one half has longer entries, possibly for x87) and miscellaneous operations.

The arrays themselves are split into 10 banks. That probably means that full access throughput requires hitting registers evenly distributed among the banks, and cannot support a truly arbitrary access pattern. If a run of operations somehow requested registers in the same bank, it would probably cause a conflict and stall all but one of them. The renamer and scheduler control a lot of this, so there may be ways to reduce the chance of this happening or to hide some of the delays.
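
A toy model of the bank-conflict idea, as a sketch under assumptions: the 10 banks come from the post, but the register-to-bank mapping (register number modulo 10) and the rule of one access per bank per cycle are guesses for illustration, not documented behavior.

```python
from collections import Counter

NUM_BANKS = 10  # from the post: the arrays are split into 10 banks


def bank_of(reg, num_banks=NUM_BANKS):
    # Assumed mapping: physical register number modulo the bank count.
    return reg % num_banks


def cycles_to_read(regs, num_banks=NUM_BANKS):
    """Cycles needed if each bank serves one access per cycle (assumption)."""
    if not regs:
        return 0
    per_bank = Counter(bank_of(r, num_banks) for r in regs)
    # The most heavily hit bank sets the pace; other banks finish earlier.
    return max(per_bank.values())


spread = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # one register per bank
clustered = [0, 10, 20, 30, 1, 2]        # four registers land in bank 0
print(cycles_to_read(spread))
print(cycles_to_read(clustered))
```

In this model the evenly spread set completes in one cycle, while the clustered set needs four because bank 0 is hit four times. A renamer that picks free registers with bank spread in mind would push real access patterns toward the first case.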
 
My understanding is that the in-order front end works with the ROB and allocates a ROB entry, RS entry, and an LS entry if needed. The pipeline will stall if it can't.
The ROB does serve as a possible data source, but it doesn't do the full monitoring of forwarding and readiness of operands. It's already handling exceptions, writes to the register file, and in-order retirement.
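
The allocate-or-stall behavior described above can be sketched roughly as follows. The `Resources` class, the entry counts, and the `dispatch` function are all hypothetical names and numbers for illustration; real hardware allocates groups of ops per cycle and frees entries as ops retire, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    rob_free: int  # free reorder buffer entries
    rs_free: int   # free reservation station (scheduler) entries
    ls_free: int   # free load/store queue entries


def dispatch(ops, res):
    """Allocate entries in program order; stop at the first op that cannot fit.

    `ops` is a list of booleans: True if the op needs a load/store entry.
    Returns the number of ops dispatched before the front end stalls.
    """
    dispatched = 0
    for needs_ls in ops:
        if res.rob_free == 0 or res.rs_free == 0:
            break  # stall: no ROB or RS entry available
        if needs_ls and res.ls_free == 0:
            break  # stall: memory op but the LS queue is full
        res.rob_free -= 1
        res.rs_free -= 1
        if needs_ls:
            res.ls_free -= 1
        dispatched += 1
    return dispatched


res = Resources(rob_free=4, rs_free=4, ls_free=1)
print(dispatch([False, True, False, True, False], res))
```

Because allocation is in order, the second memory op blocks everything behind it once the single LS entry is taken, even though ROB and RS entries are still free.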

Thank you, that (and the bit you quoted about Nehalem) settles it.

Zen will only have a slight scheduling handicap vs. the newest Intel cores then. With the stack renaming in the frontend (and stack cache), IPC might very well exceed Intel in many scenarios.

Cheers
 
Sounds excellent if it holds up. Really looking forward to a 6-core or 8-core for the next build over my aging 3570k and really hoping to see some competition and good pricing on >4 cores. I'd also like to have Vega available as well to match.
 
While the number is interesting, this bench has the same problems as the previous ones: no system specs, no pics, just a plain number that we have to take on faith, with no mention of the lower clock speed (AMD said 3.4 GHz as the minimum clock speed). I will withhold judgment on this number until further leaks with more proof of authenticity come out.
 
Does Ryzen have dual or quad channel memory? I see some people saying it's dual, but I have no idea where they got that info from.

AM4 Desktop Ryzen has 2ch DDR4 memory controller. This is confirmed by slides from AMD and AM4 motherboard specifications.

Numbers look very good considering how close to Haswell/Broadwell the IPC seems to be. The biggest questions for me are final clocks, turbo, OC capability, and pricing. I will have no problem recommending AMD CPUs for business machines, and it seems to be perfect for workstations. The 4c8t variant might be quite competitive for gaming PCs too, depending on final turbo speeds.

Looking forward to 2017 and hopefully my jump from 4c8t Haswell to 8c16t Zen :)
 

Considering the source, I'd call this result about as reliable as a leak can get.
 
Yeah, Sam has very good sources. I don't know how, but he often manages to get his hands on ESs very early, and his reviews are always excellent. And as far as I can remember, he's never leaked anything that turned out to be false.

I've never met him but we've been (somewhat infrequently) exchanging messages for perhaps more than a decade. He's very competent and reliable. I might even buy some AMD stock, considering this.
 
I see, thanks for the answers. But why would this sample have lower clocks than what AMD already showed? Maybe for testing purposes?

From the OC thread:
"AMD has already stated that all CPUs will have SMT enabled. With the Zen CPUs all coming from the same die, IPC will be the same and you just need to choose a chip based in your price range and core/clock needs."

When did AMD say that? I don't remember hearing that statement.
 
The chip reviewed by Canard PC is an A0 engineering sample. The wording of the article is a little bit ambiguous, but it seems that while "second generation" (A1? B0?) samples are in the wild, they require a newer version of the BIOS/AGESA, which wasn't available to CPC.

My guess would be that AMD demoed a second generation sample, or perhaps something more recent still.
 

As Alexko said, the CPU tested is an engineering sample (the tests were conducted a good while ago; the magazine is now in print, and some pages have been leaked). It seems it was not the author who put the pages in the wild.

From what I understand, the magazine is running a special article comparing the different architectures of current and old CPUs, including the evolution of seven Intel Core generations, IBM, and AMD. So they asked AMD to send them Zen samples, and AMD did.

There are many things to consider: the boards are obviously engineering samples too, with the BIOS as it is, let alone the CPU, which was clocked really low (as stated in the article, the performance is relative to this clock speed).

And when I see the result on a system like that, at those clock speeds, I'm pleasantly surprised.
 
No, no, they didn't go through official channels, it was a leak.

Anyway, Sam also says that the L1 has a latency of 4~5 cycles (7~8 when loading into FP pipes), the L2's latency is 12 cycles, and the L3's, 35 cycles. I might have missed something, but I think this is new information.
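
To put those cycle counts in wall-clock terms, here is a small conversion sketch. The cycle figures are the lower bounds quoted above; the 3.4 GHz clock is an assumption (AMD's stated minimum, mentioned earlier in the thread), not the clock of the tested sample, which reportedly ran lower.

```python
CLOCK_GHZ = 3.4  # assumption: AMD's stated minimum clock, not the sample's clock

# Latencies in cycles as reported in the post (lower bound of each range).
LATENCY_CYCLES = {"L1": 4, "L1 (FP)": 7, "L2": 12, "L3": 35}


def cycles_to_ns(cycles, clock_ghz=CLOCK_GHZ):
    # One cycle lasts 1 / f nanoseconds when f is given in GHz.
    return cycles / clock_ghz


for level, cycles in LATENCY_CYCLES.items():
    print(f"{level}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")
```

At a lower sample clock the same cycle counts translate to proportionally longer times, so the nanosecond values only hold for the assumed frequency.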
 