AMD RyZen CPU Architecture for 2017

I've had good experiences with my 100MB Micropolis SCSI drive. It has (indeed I still have it, working, at 22 years old!) capacitors the size of peas (5.25" form factor, half height, a server disk with a 1 kg steel frame). One day I wanted to water the plant that stood next to my SCSI "array" on the window sill, and I accidentally watered the drive. I didn't realize it until I heard the water drops boiling between the contacts. I immediately turned off the power and was scared to death. I have a positive attitude in general, so I went to the disco and let it dry out. And so it did: it worked perfectly fine when I came home in the morning.
I loved the drive because literally everything on the board was gigantic. I always thought I could go to the electronics shop and buy replacements if I needed to, because the resistors, for example, had standard color coding. :)


Reminds me of the first MFM hard drive I ever saw. It was a 20MB 5.25" monster from IBM and was double or even triple height! I also had an old Quantum Bigfoot hard drive, but that was already a 'modern' ATA drive with a massive 1200MB capacity :)
 
Some information on the Neural Net Prediction:


This does seem to point to some kind of contextual information being stored in addition to branch history. Some of it would appear to be data derived from information the branch instruction wouldn't touch. It could be something basic like whether an instruction cache line or page hits the FPU, but perhaps it can record information from execution trends in the back end?

The life span indicates it is temporary and will eventually be overwritten. Saying that it usually loses history after switching applications, if that is defined as a user switching tabs or something like that, may mean that this history is tied to something that can survive a thread context switch, which would happen more frequently than changing applications. I'm not sure if that means there's a dedicated store for it, or if it's extra context in something that might live for a while, like the instruction cache or uop cache. The predictor/BTB/TLBs might be a place for that, although their space and timing are usually at a premium.
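
For anyone who hasn't seen one, here is a minimal sketch of the textbook perceptron branch predictor from the academic literature (the Jimenez and Lin style design). AMD has not published how Zen's "neural net prediction" works, so this is not their implementation. The class name, table size, history length and the test pattern are all assumptions picked for illustration. It only shows how weights trained on branch history can pick up a repeating pattern.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (textbook design, NOT AMD's)."""

    def __init__(self, history_len=8, table_size=64):
        # history_len and table_size are illustrative assumptions.
        self.table_size = table_size
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global branch history: +1 = taken, -1 = not taken.
        self.history = [-1] * history_len
        # Training threshold heuristic from the original perceptron paper.
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        w = self.weights[pc % self.table_size]
        # Train on a misprediction or when confidence is still low.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w[0] += t
            for i, h in enumerate(self.history):
                w[i + 1] += t * h
        self.history = self.history[1:] + [t]


def accuracy_on_pattern(pattern, repeats=200, pc=0x400):
    """Replay a repeating taken/not-taken pattern and measure accuracy."""
    predictor = PerceptronPredictor()
    correct = total = 0
    for _ in range(repeats):
        for taken in pattern:
            correct += predictor.predict(pc) == taken
            total += 1
            predictor.update(pc, taken)
    return correct / total


if __name__ == "__main__":
    # A loop branch: taken three times, then falls through.
    print(accuracy_on_pattern([True, True, True, False]))
```

With a taken/taken/taken/not-taken loop pattern the predictor should settle to near-perfect accuracy after a short warm-up, since each outcome matches the one four branches earlier. Whatever extra context Zen stores beyond plain history would be additional inputs on top of something like this.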

I wonder just how fine-grained clock/power gating is in the core? I would assume the big benefit here comes from how aggressively you power/clock gate and how fine-grained you go.

For example, it would be pretty cool if they could clock and/or power gate half of the ALU/AGU/PRF, but use the neural net prediction to wake them up early so they are ready when the burst of ILP comes.
 
Does anyone know anything about this?

Speaking of which, development of Zen+ is already under way. Depending on how the market shifts in the upcoming months, there will be one of two versions of the CPU. One has much beefier cores, apart from obvious things like AVX-512; the second is that a single core can have... 4 threads.

Zen++ combines both concepts
 
It will be interesting to see if it does go that way. 4 threads/core is like POWER7, I think? If the rumour does pan out, power stays under control, and obviously Intel doesn't storm ahead, some really beefy 32 core/128 thread (maybe 64/256? dun dun dunnnn) server CPU sounds very nice indeed.

Whole bunch of ifs though.
 
I have seen this rumor before; I think it's full of crap.

Seeing as Papermaster said 3 years of tocks after Zen, the next tock must be near or already have had first silicon back (just like Piledriver).
Going to 4-wide SMT or AVX-512 means big changes again to the core's L/S and cache system; with SMT4 they will need more load/store ports. Both of these will make for crap consumer cores, especially in laptops, and that's a very big market to ignore.

If Zen+ means the next from-scratch uarch (4-5 years from now) then maybe, but I still doubt it. Build the strongest core you can on the best cache/interconnect you can and just scale cores. With SMT4 you will have to trade ILP performance for throughput, and we know how that turned out with Bulldozer...
 
SMT4 has been used without a higher L/S count, as in the POWER7 mentioned above.
The higher thread counts apply to narrower niches as the amount of performance per thread drops. As with Xeon Phi or the SMT4/SMT8 modes in later POWER chips, the target is workloads with more limited ILP and/or an increasing number of memory/IO stalls.
If it's only targeting a set of mostly-stalled server loads, SMT4 might not need that much of an investment.
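
A back-of-the-envelope way to see why mostly-stalled loads favor more threads per core: a toy model, assuming each hardware thread is stalled for a fixed fraction of the time and that stalls are independent across threads. Both assumptions are mine and real workloads will not be this clean; the function names are hypothetical.

```python
def core_utilization(stall_fraction, threads):
    """Fraction of time at least one thread is ready to issue.

    Toy model: every thread is stalled with probability stall_fraction,
    independently of the others, so the core idles only when all stall.
    """
    return 1.0 - stall_fraction ** threads


def per_thread_share(stall_fraction, threads):
    """Average share of the core's busy time that each thread gets."""
    return core_utilization(stall_fraction, threads) / threads


if __name__ == "__main__":
    for stall in (0.2, 0.5, 0.8):
        for n in (1, 2, 4):
            print(f"stall={stall:.1f} SMT{n}: "
                  f"core busy {core_utilization(stall, n):.2f}, "
                  f"per thread {per_thread_share(stall, n):.2f}")
```

In this model a thread that stalls 80% of the time leaves the core busy only 20% of the time on its own, versus about 59% with four such threads, while a thread that stalls only 20% of the time gains very little from SMT4 and each thread's share drops sharply. That is the "narrower niche" trade-off in numbers.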

AVX-512 might need more work, unless it's a question of the bare minimum of support, where cracking could get ISA compatibility.
Even that might raise questions, since the agreement between Intel and AMD hasn't been refreshed recently, and I am not sure if AVX-512 and friends readily fall under its umbrella. Also, there's the other rumor that Intel is doing some housecleaning of its SIMD instructions, so there might be a risk of skating to where the puck is rather than where it will be.

Cracking or looping with 128-bit chunks does give a path for the high-level architecture to align with ARM SVE, if AMD's ARM ambitions return, given SVE's granularity for scaling.
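
To illustrate what "cracking" means here, a small sketch that models a 512-bit vector add as a sequence of narrower micro-ops on a 128-bit datapath. It is a conceptual model only: the function names are hypothetical, 32-bit lanes are an arbitrary choice, and nothing here reflects how any real decoder splits instructions.

```python
LANE_BITS = 32  # assumed element width for this illustration


def crack(vector, native_bits=128):
    """Split a wide vector (list of lanes) into datapath-sized chunks."""
    lanes_per_uop = native_bits // LANE_BITS
    return [vector[i:i + lanes_per_uop]
            for i in range(0, len(vector), lanes_per_uop)]


def add_chunk(a, b):
    """One micro-op: lane-wise add with 32-bit wraparound."""
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]


def vadd_cracked(a, b, native_bits=128):
    """Execute a wide add as several narrow micro-ops.

    Returns the full-width result and the number of micro-ops issued.
    """
    result = []
    uops = 0
    for chunk_a, chunk_b in zip(crack(a, native_bits), crack(b, native_bits)):
        result.extend(add_chunk(chunk_a, chunk_b))
        uops += 1
    return result, uops


if __name__ == "__main__":
    a = list(range(16))   # 16 lanes x 32 bits = 512 bits
    b = [1] * 16
    print(vadd_cracked(a, b, native_bits=128))
    print(vadd_cracked(a, b, native_bits=256))
```

The architectural result is identical whichever datapath width is used; only the micro-op count changes (four on a 128-bit datapath, two on 256-bit). That is why cracking buys ISA compatibility without buying throughput.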
 
If their SMT4 implementation is to take the SMT2 core, add an extra bit to every structure that needs to be thread-aware, and then only sell SMT4 models to markets that can benefit (DB, front-end web, maybe etc.), then I can see it. But a core targeted at optimal performance across the majority of workloads in SMT4 mode I just don't see; just add more cores...
 
Knowing dev circles, I think we can be sure that all iterations of Zen are already decided and that another team is working on the new one (4 to 5 years from now), and I'm sure that rumor was just someone's attempt to be famous. I'm not saying we couldn't see SMT4 or something like that, but I'm sure the person who posted it has no idea about any future plans AMD may have.

Haven't seen this yet, but you can take a look.

http://hexus.net/tv/show/2017/01/AMD_s_chief_technical_officer_interviewed?sf52081299=1
 
That was one of the better interviews, mainly because the interviewer doesn't babble on.
 
That was the general idea, a modest investment in tweaks to the core as a value-add in specific workloads that have limited ILP, stalls, or in some cases core-based software licensing.
Just adding more cores implies adding whole CCX blocks, and with Zen's current balance it adds IO, memory channels, and more chips on an MCM as well. The physical product loses flexibility in appealing to any cost-sensitive parts of the market if the solution is Naples-level integration for a front end.
This thread brought up the possibility that Zen doesn't scale as well at the highest core counts, and having more cores to host mostly-stalled threads does mean more cores that cannot be gated to allow the upper turbo range.
 
If a CCX grows to, say, 6 cores (not just adding more CCXs), then you're at 48 cores for the same number of CCXs as Naples. With Intel bringing 6 cores to the mainstream socket with Cannonlake, it would help in those markets as well. This was also one of the rumors about AMD releasing 48-core parts in late 2018/2019. I guess the other option is 6 Zeppelin-like dies, but I wonder how likely we are to see 12-channel memory...
 
Not knowing the particulars of how AMD handles coherence within a CCX or between them, the costs for increasing core counts within a CCX are unclear. In this scenario, the product would support 50% more threads with perhaps 50% more CCX area.
AMD seems to be focusing on the modularity of the current architecture, with the CCX being generally unchanged up and down the range. If AMD someday decides to move its whole range to a 6-core granularity, this scheme is maintained. Otherwise, there's a modular architecture with a new module that is no longer applicable across the range.
 
They are going to have to increase core counts going forward one way or another. Hopefully, whichever way they choose, they also improve the associated latency/throughput as it scales, so that each core's latency/throughput stays the same.

Assuming the 48-core rumor is true, I think the 3 obvious choices are:

1. 6 cores per CCX, 2 CCXs per die, 4 dies per MCM
2. 4 cores per CCX, 2 CCXs per die, 6 dies per MCM
3. 4 cores per CCX, 3 CCXs per die, 4 dies per MCM

If they want to increase cores on the long-lived AM4 platform, then I think option 1 looks the most attractive, assuming they keep 1 memory controller per CCX.
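
The three layouts above fall out of a simple enumeration. A quick sketch, where the candidate values for each level and the "one memory channel per CCX" rule are assumptions taken from this discussion, not anything AMD has announced; the function name is hypothetical.

```python
from itertools import product


def configurations(target_cores,
                   cores_per_ccx=(4, 6),
                   ccx_per_die=(2, 3),
                   dies_per_mcm=(4, 6)):
    """List (cores/CCX, CCX/die, dies/MCM) layouts hitting target_cores.

    The candidate values are guesses from the thread. Memory channels
    are computed under the assumed rule of one channel per CCX.
    """
    found = []
    for cores, ccx, dies in product(cores_per_ccx, ccx_per_die, dies_per_mcm):
        if cores * ccx * dies == target_cores:
            found.append({
                "cores_per_ccx": cores,
                "ccx_per_die": ccx,
                "dies_per_mcm": dies,
                "cores_per_die": cores * ccx,
                "memory_channels": ccx * dies,  # assumption: 1 per CCX
            })
    return found


if __name__ == "__main__":
    for cfg in configurations(48):
        print(cfg)
```

Under those assumptions only the 6-core CCX layout reaches 48 cores while staying at 8 memory channels and keeping 2 CCXs per die, which is also the only one that gives a bigger single die (12 cores) without changing the channel count a 2-CCX AM4 part would need.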
 
A pity Zen doesn't yet do AVX-512.
No Intel i7/Xeon chips do AVX-512 either. Only Xeon Phi does (and that's a different version of AVX-512). The AVX-512 instruction set seems to be highly fragmented. Knights Landing, Purley, Cannonlake and Knights Mill all have different AVX-512 variations (with only a common subset). If AMD supports AVX-512 at some point, I wonder which version. I don't think we'll see AVX-512 in games and consumer software at all. AVX-512 seems to be limited to Xeons for now. Even AVX (1 & 2) only sees minor usage in consumer software. Low-end Skylake chips have AVX disabled, further limiting its adoption.
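
The fragmentation can be made concrete with a few set operations. The feature lists below are my reading of Intel's public ISA extension documentation and should be treated as assumptions to double-check; Skylake-SP stands in for the Purley platform, and the helper names are hypothetical.

```python
# AVX-512 subsets per chip (assumed from public documentation; the
# "AVX512" prefix is dropped from each subset name for brevity).
AVX512_FEATURES = {
    "Knights Landing": {"F", "CD", "ER", "PF"},
    "Skylake-SP (Purley)": {"F", "CD", "BW", "DQ", "VL"},
    "Cannon Lake": {"F", "CD", "BW", "DQ", "VL", "IFMA", "VBMI"},
    "Knights Mill": {"F", "CD", "ER", "PF",
                     "4FMAPS", "4VNNIW", "VPOPCNTDQ"},
}


def common_subset(features):
    """Subsets that every listed chip implements."""
    return set.intersection(*features.values())


def unique_to(features, chip):
    """Subsets implemented by `chip` and by no other listed chip."""
    others = set().union(
        *(subsets for name, subsets in features.items() if name != chip))
    return features[chip] - others


if __name__ == "__main__":
    print("common:", sorted(common_subset(AVX512_FEATURES)))
    for chip in AVX512_FEATURES:
        print(chip, "only:", sorted(unique_to(AVX512_FEATURES, chip)))
```

If those lists are right, code that wants to run on all four has only the foundation and conflict-detection subsets to work with, which is the "only a common subset" problem in practice.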
 
AVX2 is pretty useful for video encoding and such... Not a game changer, but it is still nice. If more and more CPUs support it, I guess more apps will support it too. SSE wasn't used much when it launched either, and now a lot of apps need it to run, I believe.
 
Would your assessment of potential usage in games be different if AVX-512 were universally supported and unified into a single version? Or do you think it's just not useful in games?
 