AMD RyZen CPU Architecture for 2017

fehu · May 8, 2015

Are interposers too expensive to use in a CPU to add HBM, the way IBM does with eDRAM as last-level cache?

hoom · May 8, 2015

eastmen said:
true but it will certainly help clocks and power usage and allow them to fit more transistors in

The term IPC exists explicitly to separate performance improvement due to better logic from improvement due to just faster clocks.

I said with a bunch of salt because I believe AMD has a recent history of using IPC to include just adding cores/threads, which is a thing that a smaller process can let you do.
But I think everyone hopes they mean 40% better IPC per Thread/Core so that there is a real, significant single-threaded performance improvement

Better clocks & perf/W is to be hoped for too, because just having better IPC is pointless if the clocks are shit or you can heat the room with it.
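A rough way to picture the point above is the usual first-order model where single-thread performance is IPC times clock. This is only a sketch: the function name and the numbers are made up for illustration and are not figures from AMD.

```python
def relative_performance(ipc_gain, clock_ratio):
    """Single-thread speedup under the simple model perf = IPC * clock.

    ipc_gain    -- fractional IPC improvement, e.g. 0.40 for +40%
    clock_ratio -- new clock divided by old clock
    """
    return (1.0 + ipc_gain) * clock_ratio


# Hypothetical numbers: a +40% IPC gain is cancelled out entirely
# if the new core only reaches 1/1.4 of the old clock.
same_clock = relative_performance(0.40, 1.0)
lower_clock = relative_performance(0.40, 1.0 / 1.4)
print(f"same clock:  {same_clock:.2f}x")
print(f"lower clock: {lower_clock:.2f}x")
```

The model ignores memory latency and turbo behaviour, so treat it as a back-of-the-envelope check only.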

pTmdfx said:
"Opportunities to expand across AMD's product portfolio"
"Transformational Memory Architecture" - may or may not be introducing HBM, but likely.

Yeah & I really hope so but it is quite weird that they haven't explicitly mentioned HBM for the CPUs.

nutball · May 8, 2015

40% IPC across the board improvement in single-threaded performance in one generation. Even starting from a low ebb such as Bulldozer that would rate as one of the greatest achievements in x86 CPU design in the past couple of decades. Or it could be marketing bullshit - a 40% improvement in something, sometimes, under some circumstances ... that may even be useful, occasionally. I know where my money would go.

I used to like AMD, I've owned a lot of their products over the years (more than the other guys). But the desperate flailing and PR spin over the past few years is just depressing.

Blazkowicz · May 8, 2015

I read(ed) it as +40% IPC for the IPC of a core, which includes the benefit from SMT.
Let's say SMT is responsible for +30% on its own, then the IPC boost would be +7.7%.
At least it's +7.7% over Excavator, which has some undisclosed IPC boost over Kaveri, which is a wee bit better than Piledriver.
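The +7.7% figure follows if the two gains are assumed to multiply. A minimal sketch of that arithmetic, where the 30% SMT share is the hypothetical from this post and the function name is made up:

```python
def residual_ipc_gain(total_gain, smt_gain):
    """Gain left over once the SMT share is factored out.

    Assumes the gains compose multiplicatively:
    (1 + total) = (1 + smt) * (1 + residual)
    """
    return (1.0 + total_gain) / (1.0 + smt_gain) - 1.0


# +40% advertised, +30% assumed to come from SMT alone.
gain = residual_ipc_gain(0.40, 0.30)
print(f"residual single-thread gain: {gain:.1%}")
```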

Gut feeling : they'll match an 8-core Sandy Bridge, but eating less power and with AVX2.
I wonder about the TDP: max 95 watt or 100 watt, save for an "exxtreme" version? This is the limit that Intel chose for its consumer socket, and also that of AMD's own FM2/FM2+ sockets.

nutball · May 8, 2015

Blazkowicz said:
I read(ed) it as +40% IPC for the IPC of a core, which includes the benefit from SMT.

Right. That falls under the category of "marketing bullshit" as far as I'm concerned. SMT != single-threaded IPC.

liolio · May 8, 2015

I'm definitely not the expert on those topics, but there are things I already dislike about those freshly announced Zen cores. Foremost, they still aim for the moon, or put differently, they aim for Intel's market. Imo it is a misguided choice; they can't succeed. It calls for a pretty complex, costly design, which in turn will negatively affect their time to market and their ability to iterate on their new architecture.

There are things I like; for instance, it seems that they are going to focus on a single architecture. Sadly my take is that they set a significant hurdle for themselves (by pursuing what is most likely an extremely wide, complex design with SMT, an extensive cache hierarchy, etc.).

Imo, and for what it is worth, I think AMD should have gone for a middle-of-the-road type of approach. Intel (for now at least) has Atom on one side of the spectrum and its Core iX on the other. AMD should have aimed in the middle.

If I were in AMD's socks (pun intended) I would focus on the mobile segment foremost. I would leverage the fact that in the x86 segment, Intel aside, they are alone: they don't need to reach for the moon, yet they have arguments to push.

I think instead of looking at Power 8, Broadwell/Skylake or Apple's take on a brainiac type of arch, they should have looked at what they have, scaled up the Jaguar cores and iterated/improved. I also think they should pass on following Intel in its efforts to increase the SIMD capabilities of its CPUs; it comes at a great cost, and making the most of it is complex. AMD should align itself with ARM when it comes to the SIMD capabilities of its CPUs: 4-wide SIMD is enough. AMD should have enlarged the Jaguar L2 fabric, increasing the number of supported clients, say from 4 to 8 (that's already quite a lot), and the maximum amount of L2 to 8MB or more.
In short, something "simpler" that they can iterate and improve upon faster. In the meantime, (more) resources would be devoted to GPU integration (bandwidth-saving measures for severely power-constrained environments and drivers come to mind too, etc.). For most people (including pros) CPU power is no longer a bottleneck; with DirectX 12 around the corner, AMD should save itself from a direct confrontation with Intel's heavy hitters. They don't need one.

3dilettante · May 8, 2015

nutball said:
40% IPC across the board improvement in single-threaded performance in one generation. Even starting from a low ebb such as Bulldozer that would rate as one of the greatest achievements in x86 CPU design in the past couple of decades.

It is a good improvement, but it is magnified by coming from a starting point that is quite poor.
AMD is claiming Zen is a new architectural line, so where it stands relative to the Bulldozer line can also be compared to other CPU lines--from Intel, for example. There have been cores that have improved IPC by 40% over Excavator since before there was Excavator.

Blazkowicz said:
I read(ed) it as +40% IPC for the IPC of a core, which includes the benefit from SMT.

I would hope not. IPC is a term that arose in a single-threaded single-core context. It serves as a factor in determining straightline performance within a context.
Saying in a multi-threaded context that there are instructions being processed per cycle, while correct if using the words' definitions in isolation, is not how the term is used or what it has meant historically. IPC significantly loses its descriptive power if it is bloated to cover another axis.
It wouldn't illustrate anything unless AMD finds a way to use two physical threads to run the same software thread--but that was something Bulldozer's CMT was supposed to try.

40% more IPC than a rather weak core seems fine. There are cores right now that are around that. A hypothetical design that needs to muddy IPC to get to that number is not as noteworthy as the critical reaction AMD would get for trying it.
Integer throughput can be described in other ways if it falls outside of IPC.

liolio said:
Imo, and for what it is worth, I think AMD should have gone for a middle-of-the-road type of approach. Intel (for now at least) has Atom on one side of the spectrum and its Core iX on the other. AMD should have aimed in the middle.

That would be either the top range of Jaguar or the whole range of AMD's Bulldozer line.
There isn't enough room. The smaller Core variants can be adjusted such that they stomp on the top range of Jaguar, or they can be made to handily beat down Bulldozer. The Atom cores can range up to at least the bottom of Jaguar's range and can scale lower than Jaguar can reach. ARM can fit in and around that range as well.

It is difficult for a core to cover all of that range in terms of cost, power, and performance. However, AMD does not have the means to create two acceptable lines, and the middle is more heavily threatened by the improvement in commodity cores and Intel's ability to subsidize its smaller cores and implement variants using its large ones.

There is limited upside for AMD below Zen. Targeting that range also abandons everything in the upper tier of Zen's range, which is one of the areas where AMD is projecting growth.

liolio · May 8, 2015

3dilettante said:
That would be either the top range of Jaguar or the whole range of AMD's Bulldozer line.
There isn't enough room. The smaller Core variants can be adjusted such that they stomp on the top range of Jaguar, or they can be made to handily beat down Bulldozer. The Atom cores can range up to at least the bottom of Jaguar's range and can scale lower than Jaguar can reach. ARM can fit in and around that range as well.

I think from a business POV AMD has to ignore Intel or quit the CPU business altogether. Intel can price Atom, and actually even Core iX, as it wants if it has to. The volumes it can produce are also out of AMD's reach. ARM does not run Windows apps (and games), and at this point that is AMD's only possible salvation: offering OEMs an alternative for some products (not all). If gaming turns into more of a serious business in the Android realm, then their GPU is going to be a more relevant selling point. You don't choose who you are competing with; in their case it is Intel, and there is no winning that one, so imo they are better off doing their own thing and actually setting reasonable goals for themselves.

It is difficult for a core to cover all of that range in terms of cost, power, and performance. However, AMD does not have the means to create two acceptable lines, and the middle is more heavily threatened by the improvement in commodity cores and Intel's ability to subsidize its smaller cores and implement variants using its large ones.

If they can't deliver a middle-of-the-road core that performs well enough, they won't deliver a high-end one either. As for Intel... sadly they can't do a thing about it.

There is limited upside for AMD below Zen. Targeting that range also abandons everything in the upper tier of Zen's range, which is one of the areas where AMD is projecting growth.

Sadly the whole thing smells (like some other recent announcements) like making the "bride" look beautiful, something like "we are doing that too": crazy wide cores with SMT? Check. ARM CPU? Sometime within the next 2 or 5 years, etc. There could be growth in the high end, but Intel is there and kicking, and if AMD were to pull an ace (close to impossible against Intel's offering, looking further than the CPU by itself), that is definitely a high-margin business where Intel will go to ANY length to shut them out.

It does not sound realistic wrt the apparent state of AMD engineering (be it CPU or GPU, they iterate on their designs slower and slower). Jaguar, which had a lot of potential, saw hardly one mild iteration, and now they are moving on to "shining/singing tomorrows.."

3dilettante · May 8, 2015

liolio said:
I think from a business POV AMD has to ignore Intel or quit the CPU business altogether.

I don't see why this is an "or" proposition. Intel makes CPUs, and it offers products across a vast range of CPUs, short of simple cores in the microcontroller range. How does AMD ignore that and stay in the CPU business?

You don't choose who you are competing with, in their case it is Intel, there is no winning that one so imo they are better off doing their own things, and actually setting reasonable goals for themselves.

This is contradictory. If they are doing their "own thing" then it is something nobody else is doing. If you don't do what others are doing, they cease to be competitors unless they choose to follow along.
If AMD is making CPUs of any notable complexity, they are going to run into Intel. It's not like customers evaluating architectures will forget Intel exists.

If they can't deliver a middle-of-the-road core that performs well enough, they won't deliver a high-end one either.

High-end cores have demonstrated their ability to scale down. The comparatively modest performance gains in the single-threaded realm mean that the upper tier has experienced more multi-core scaling than single-core improvement.
Cutting down from higher core counts to just a few, and from the highest clocks to more modest targets, tends to cover the good-enough realm.
It starts to falter in the most power or cost constrained markets, but AMD's position is either inferior or seriously threatened at the low end with these measures.

Alexko · May 8, 2015

Liolio, you call Zen a crazy wide core, but frankly I don't see any basis for that. We don't really know how wide it is, and the advertised +40% IPC doesn't exactly suggest something huge, especially when you consider that Bulldozer's main weaknesses were arguably the cache hierarchy, and the somewhat half-assed CMT scheme. Fixing those issues does not imply a particularly large core, nor does it entail any kind of work that AMD wouldn't have to do for a narrower, more modest micro-architecture.

ninelven · May 8, 2015

Indeed, I doubt it is any wider than Intel, Apple, or Nvidia's latest...

I, for one, am really excited for Zen. If I can get 8 real cores at a reasonable price with decent single thread performance in 2016 that will be wonderful. Of course, I've been let down before, but the tone of everything I am reading and hearing suggests to me that AMD has genuinely got it right this time (or at least is headed in the right direction).

jacozz · May 8, 2015

One thing I wonder about is... the 2016 roadmap clearly states that there will be ZEN cores in the top-end FX on the new AM4 platform, but the APUs are also AM4 and are not based on Zen. So what are they? Godavari+ with a pin-compatible socket? Or what?

Alexko · May 8, 2015

ninelven said:
Indeed, I doubt it is any wider than Intel, Apple, or Nvidia's latest...

I, for one, am really excited for Zen. If I can get 8 real cores at a reasonable price with decent single thread performance in 2016 that will be wonderful. Of course, I've been let down before, but the tone of everything I am reading and hearing suggests to me that AMD has genuinely got it right this time (or at least is headed in the right direction).

Given the very tight schedule, it will probably be rough around the edges, but I have a good overall feeling about it. I think they'll get the most important parts right. After that, they might be able to get a decent +10% per generation for a generation or two at least, assuming they follow their usual pattern.

jacozz said:
One thing I wonder about is... the 2016 roadmap clearly states that there will be ZEN cores in the top-end FX on the new AM4 platform, but the APUs are also AM4 and are not based on Zen. So what are they? Godavari+ with a pin-compatible socket? Or what?

I hope it's at least Carrizo shrunk to 14nm and updated with newer GCN blocks, but knowing AMD, well, I'm not terribly optimistic about that.

3dilettante · May 8, 2015

Alexko said:
Given the very tight schedule, it will probably be rough around the edges, but I have a good overall feeling about it. I think they'll get the most important parts right. After that, they might be able to get a decent +10% per generation for a generation or two at least, assuming they follow their usual pattern.

The time frame for Zen is a bit compressed for a wholly new design, assuming it didn't get going until after Keller was hired.
It might be playing it safe in some regards, or could be basing some of its units off of what came from existing cores.
A Zen+ that comes later could have the benefit of having the chance of replacing elements that strongly resemble what came before.

I am curious about the integer pipeline. Bulldozer's was narrow, but other than that I have not seen significant criticism of the scheduler. It does lack the ability to eliminate moves in the renamer, although later variants were able to use the AGU ports to consume moves.
Could two or more of those schedulers be clustered together? That might leave some performance on the table until a more comprehensive scheduler or one with better renaming capability follows in a descendant.
The load/store pipeline is another unknown. That can interact with the new cache hierarchy, and there may be refinements available once those caches are run in the real world.

I hope it's at least Carrizo shrunk to 14nm and updated with newer GCN blocks, but knowing AMD, well, I'm not terribly optimistic about that.

It could also be knowing the likely demand for that node by everybody, and knowing Globalfoundries. A limited-volume and higher-margin segment like enthusiast desktop, workstations, and servers does insulate from the likely demand and uncertainties about the supply.

Alexko · May 9, 2015

3dilettante said:
The time frame for Zen is a bit compressed for a wholly new design, assuming it didn't get going until after Keller was hired.
It might be playing it safe in some regards, or could be basing some of its units off of what came from existing cores.
A Zen+ that comes later could have the benefit of having the chance of replacing elements that strongly resemble what came before.

I am curious about the integer pipeline. Bulldozer's was narrow, but other than that I have not seen significant criticism of the scheduler. It does lack the ability to eliminate moves in the renamer, although later variants were able to use the AGU ports to consume moves.
Could two or more of those schedulers be clustered together? That might leave some performance on the table until a more comprehensive scheduler or one with better renaming capability follows in a descendant.
The load/store pipeline is another unknown. That can interact with the new cache hierarchy, and there may be refinements available once those caches are run in the real world.

That was pretty much my thinking as well. It's also possible that AMD started working on a brand new design as soon as they realized Bulldozer sucked (maybe in 2010 or 2011) but there's a good chance that corporate politics created some inertia and that nothing serious was put in motion before Keller's arrival. The latter's apparent fondness for wide cores (if Swift and Cyclone are any indication) suggests something wider than Bulldozer, but beyond that, I don't know.

As wide as Cyclone is, its clock speed is very low, and I wonder whether that might be true of Zen too, especially since AMD didn't say anything about clock speeds, and their latest process choices do not favor high frequencies. Obviously, by "slow" I mean something on the order of 3GHz, not the 1.5GHz at which Cyclone tops out.

3dilettante said:
It could also be knowing the likely demand for that node by everybody, and knowing Globalfoundries. A limited-volume and higher-margin segment like enthusiast desktop, workstations, and servers does insulate from the likely demand and uncertainties about the supply.

Yes, but the 14nm supply/demand situation is hard to read. The only real data points we have, as far as I know, are the following:

— Samsung is already selling devices with 14nm chips, but of course they are real me… I mean, they have fabs;
— AMD plans to have a 14nm discrete GPU lineup in 2016;
— NVIDIA has similar plans, or at least they plan to have at least one Volta GPU with a FinFET process and HBM;
— Apple will possibly be buying wafers from Samsung, but some reports say TSMC will make the rest, not GloFo after all: http://appleinsider.com/articles/15...tsmc-for-30-of-a9-chip-orders-for-next-iphone

So if we're talking about a mid-2016 release (basically, Carrizo + 12 months) then 14nm should be doable as far as GloFo is concerned. Whether AMD should bother shrinking an obsolete design, however, is unclear. It might make more sense to just make do with a mildly reheated 28nm Carrizo and make a Zen-based APU on 14nm as soon as possible. A 6-month delay between the CPU and the APU sounds reasonable.

Kaotik · May 9, 2015

Pascal, not Volta. Volta comes after Pascal in 2017 or so.
GlobalFoundries is already producing chips on the 14nm process licensed from Samsung

Blazkowicz · May 9, 2015

"Middle of the road CPU" is about what Carrizo is, and after that, well, this is what the quad-core Zen is for. I expect there to be a triple-core cut-down variant.

Carrizo-L is some "Jaguar+" with newer GCN, so here is the thing for the lower-end market, with later a choice of dual-core Zen or 64-bit ARM. So, there are enough chips already?

sebbbi · May 9, 2015

Alexko said:
As wide as Cyclone is, its clock speed is very low, and I wonder whether that might be true of Zen too, especially since AMD didn't say anything about clock speeds, and their latest process choices do not favor high frequencies. Obviously, by "slow" I mean something on the order of 3GHz, not the 1.5GHz at which Cyclone tops out.

I would hope that AMD aims for a lower clock ceiling than Intel (with Haswell and Broadwell). Haswell's maximum clock ceiling (turbo) is 4.4 GHz (flagship 4 core i7 Extreme with 4.0 GHz base clock). With 6 cores (i7 Extreme flagship) the clock ceiling already drops to 3.7 GHz. If you browse through the E5/E7 server CPUs, no Haswell model has a clock ceiling over 3.5 GHz (and the average base clock is around 2.5 GHz). On the mobile side there are a few i7 models (starting at around $400) that have a clock ceiling higher than 3.5 GHz. Most of the mobile and server chips have been configured to run at a significantly lower maximum clock.

Intel can afford to use a CPU design targeting a 4.4 GHz clock ceiling across the whole range (down to Core M clocks). But this is in no way an optimal solution for performance/watt and for the chip complexity (and manufacturing cost). AMD should aim for a lower clock ceiling, because desktop computers are no longer a huge market (and the market is declining all the time). Servers and laptops are all about performance/watt. A CPU with a slightly lower clock ceiling can sport a shorter pipeline, among other advantages. With a 3.5 GHz clock ceiling AMD would be able to cater optimally to the server and laptop markets. This kind of CPU design would also scale down to tablets and ultraportables better than Core M. If AMD intends to drop the "cat" CPU series completely, it would make perfect sense to aim for a lower clock ceiling. This kind of design would be perfect for all areas except top-of-the-line desktops. Even the high-end workstation CPUs tend to have 8 cores nowadays (and Intel doesn't have anything above 3.0 GHz base / 3.5 GHz turbo there, at $999).

Aiming for an ultra-low clock ceiling doesn't make any sense either. Mobile CPU clocks have increased rapidly in recent years. Apple is at 1.5 GHz and many Android phones are at 2 GHz already. A good CPU design needs some clock headroom for turbo (race to sleep is important for consumer devices). I'd say the optimal clock ceiling for AMD would be somewhere around the middle between Jaguar and Haswell. This should (at least theoretically) give them a little edge over Haswell/Broadwell in the clock range that matters for today's (and tomorrow's) market.

Let's assume that AMD's marketing material is correct: Zen has a 40% IPC advantage over Excavator. Now a 3.5 GHz turbo clock multiplied by 1.4 would "match" a 4.9 GHz Excavator. Currently the top-end AMD desktop CPU is based on the Piledriver architecture and runs at 5.0 GHz turbo (at a whopping 220W TDP). Steamroller has 9% higher single-thread IPC and 18% higher multithread IPC than Piledriver. Excavator is going to give some additional gains over this. This means that a 3.0 GHz (3.5 GHz turbo) Zen would handily beat all the older (200W+) AMD CPUs in desktop performance (at a much lower TDP). This allows AMD to increase the core counts of their high-end desktop and workstation CPUs. Obviously AMD could not compete with Intel's highest-clocked i7 Extreme desktop/workstation 2-4 core CPUs in single-threaded performance, but at 6-8 cores and above, they could be very competitive indeed (in both performance and performance/watt). High core counts obviously need sophisticated cache hierarchies and data transport inside the chip. I hope that AMD can deliver this time. At least they mention much improved cache bandwidth and latency in their Zen marketing slides. Hopefully they have improvements planned similar to those Intel made with Sandy Bridge and Haswell.
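The clock-equivalence arithmetic in this post can be written out as a quick sketch. It assumes performance is simply IPC times clock, takes the 3.5 GHz ceiling as the hypothetical from this post, and treats Excavator as no better per clock than Steamroller, which gives a lower bound. The function and constant names are made up for illustration.

```python
def equivalent_clock_ghz(clock_ghz, ipc_ratio):
    """Clock an older core would need to match a newer one,
    assuming performance = IPC * clock."""
    return clock_ghz * ipc_ratio


ZEN_TURBO_GHZ = 3.5                  # hypothetical clock ceiling from the post
ZEN_OVER_EXCAVATOR = 1.40            # AMD's claimed IPC advantage
STEAMROLLER_OVER_PILEDRIVER = 1.09   # single-thread figure quoted in the post
PILEDRIVER_TOP_TURBO_GHZ = 5.0       # current 220W flagship, per the post

excavator_equiv = equivalent_clock_ghz(ZEN_TURBO_GHZ, ZEN_OVER_EXCAVATOR)
# Lower bound: counts Excavator as no faster per clock than Steamroller.
piledriver_equiv = equivalent_clock_ghz(excavator_equiv, STEAMROLLER_OVER_PILEDRIVER)

print(f"Excavator-equivalent clock:  {excavator_equiv:.2f} GHz")
print(f"Piledriver-equivalent clock: {piledriver_equiv:.2f} GHz")
print("beats the 5.0 GHz flagship:", piledriver_equiv > PILEDRIVER_TOP_TURBO_GHZ)
```

Under these assumptions the Piledriver-equivalent clock lands above 5.0 GHz, which is the reasoning behind "handily beat" in single-threaded work.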

Blazkowicz · May 9, 2015

I will shut down my computer till the clock ceiling is increased.

jacozz · May 11, 2015

Alexko said:
So if we're talking about a mid-2016 release (basically, Carrizo + 12 months) then 14nm should be doable as far as GloFo is concerned. Whether AMD should bother shrinking an obsolete design, however, is unclear. It might make more sense to just make do with a mildly reheated 28nm Carrizo and make a Zen-based APU on 14nm as soon as possible. A 6-month delay between the CPU and the APU sounds reasonable.

Yes I agree.
It makes no sense to go through all the work and resources a die shrink of Carrizo would take; instead I think AMD will stretch the FM2+ socket through most of 2016 with 28 nm APUs. On the financial webcast Mark Papermaster clearly states that Zen will replace Bulldozer and the cats.

Since the cost of the first silicon wafers will probably be high, it makes sense to first release ZEN FX on the desktop, then the server version (due to the validation time), and finally the lower-cost consumer APUs, hopefully at the end of 2016 or at least at the beginning of 2017.

Anyway, that's how I interpret the roadmap.
