AMD RyZen CPU Architecture for 2017

I would hope that AMD aims for a lower clock ceiling than Intel does with Haswell and Broadwell. Haswell's maximum clock ceiling (turbo) is 4.4 GHz (the flagship 4-core i7 Extreme with a 4.0 GHz base clock).
Unless Zen hews much closer to Bulldozer's clock-speed emphasis, or goes even further in that direction, matching Intel's ceiling would be unlikely, given Intel's ability to pair process and design much more closely. AMD is explicitly foregoing specialized processes, which is going to hamstring it as long as the foundries provide processes that have to cater to a market that mostly wants to build ASICs.
The steady downhill march in clock speed for the speed-optimized Bulldozer line, as AMD has shifted to more conventional processes, shows the general trend even before adding a design complex enough to achieve 40% more IPC.

Servers and laptops are all about performance/watt.
There are server and workstation markets that still care about top-end performance. Whether due to software licensing costs that scale per core, or to latency sensitivity (high-frequency trading?), high per-clock performance at the highest clocks possible is something worth paying a premium for.
AMD may very well not want to take the fight to that niche, but the money is real enough.
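To put a rough number on the licensing point, here's a back-of-the-envelope sketch (Python, with entirely made-up prices and per-core performance figures, just to illustrate the shape of the trade-off):

```python
# Back-of-the-envelope: per-core licensing vs. hardware cost.
# All figures are hypothetical placeholders, not real quotes.
license_per_core = 7000          # hypothetical per-core software license ($)

fast = {"cores": 8,  "perf_per_core": 1.5,  "hw_cost": 4000}   # high-clock SKU
slow = {"cores": 16, "perf_per_core": 0.75, "hw_cost": 2000}   # cheaper, wider SKU

for name, cpu in (("fast", fast), ("slow", slow)):
    throughput = cpu["cores"] * cpu["perf_per_core"]
    total = cpu["hw_cost"] + cpu["cores"] * license_per_core
    print(f"{name}: throughput={throughput}, total cost=${total:,}")

# fast: throughput=12.0, total cost=$60,000
# slow: throughput=12.0, total cost=$114,000
# Same throughput, but per-core licensing nearly doubles the bill for the
# wider chip -- hence the premium commanded by high per-core performance.
```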

I think it's probable that AMD's solution will in the end be
1) uncomfortably late, given that it arrives in only a few product bands in 2016 and, barring process delays, could be facing 10nm Cannonlake chips by the time it is out in force
2) inferior in IPC to Skylake, let alone its successors
3) inferior in clock speed

In the face of a clock and IPC disadvantage, I'd like to imagine that AMD could offer some other value-add. However, APUs are a 2017 proposition, and elements like IO and platform features are unclear; they may even be out of date if the rumored HPC APU slide is accurate about the standards in use.
Possibly the new interconnect, or the SeaMicro IP that there is almost no information about, might help in some niche.
Fancier instruction support and speculation techniques would be nice, but a noteworthy gap in Bulldozer's lineage was the enhanced speculation CMT could have provided. I still wonder if that line was initially trying for something exotic and got burned, so Zen might be going the other direction when it comes to taking risks.

There's still the value play in tiers where it has no presence now (so more cash by default), and hopefully the architecture can compete with Intel's mid-range and maybe the lower high end rather than trading blows with an i3.
 
In the face of a clock and IPC disadvantage, I'd like to imagine that AMD could offer some other value-add.

More cores would be a fairly simple option. This is what AMD did with the Phenom X6 with some success, and with Bulldozer, albeit with a less fortunate outcome. After all, most Intel products (now and presumably for a few years) remain stuck at 4 cores, so the opportunity is there.
 
Am I right to assume that a server-oriented APU with a sizeable proportion of GCN CUs would probably downsize on fixed-function parts, like TMUs and ROPs?
If it's GPGPU oriented, they're probably better off spending those transistors on more CUs and ACEs, right?
Unless, of course, the same chip doubles as a consumer-oriented chip, which hasn't been reported so far.

There's still the value play in tiers where it has no presence now (so more cash by default), and hopefully the architecture can compete with Intel's mid-range and maybe the lower high end rather than trading blows with an i3.

Isn't that what AMD's current range already does?
The FX8000 and FX6000 lines compete in price with the higher-end Pentiums up to the i3, whereas the FX4000 competes with the Celerons.
Let's hope they can do a little better than that when the time comes...
 
More cores would be a fairly simple option. This is what AMD did with the Phenom X6 with some success, and with Bulldozer, albeit with a less fortunate outcome. After all, most Intel products (now and presumably for a few years) remain stuck at 4 cores, so the opportunity is there.
The basic desktop has been stuck at four cores for a host of reasons. Intel would sell higher core counts at these tiers if it could sell more chips by doing so.
At least for Intel, those who can perceive a gain over a quad core, or a quad core with Hyper-Threading, go for the higher-end SKUs based on workstation/server chips.

Am I right to assume that a server-oriented APU with a sizeable proportion of GCN CUs would probably downsize on fixed-function parts, like TMUs and ROPs?
TMUs are rather tightly packed into the read path of the CUs, and there are some imaging workloads that do like to use them.

If it's GPGPU oriented, they're probably better off spending those transistors on more CUs and ACEs, right?
Unless, of course, the same chip doubles as a consumer-oriented chip, which hasn't been reported so far.
Providing a consumer-level variant might give them better volumes. It's basically a high-class console.
I'm still curious why this can't be put on an expansion board, if only for additional volume to amortize costs over.

In some wacky future with HSA, or CPU cores that can steal some of the compute/post process load, or even a return of a high-bandwidth cable (borrowed from an HPC context where they will need them), it might actually be something an AMD platform could promise as a workstation or high-end value-add.

Isn't that what AMD's current range already does?
It is what they do now, which is why I said I hope they can pursue alternate and more lucrative SKUs.
 
That would be the above-i3 range I was considering. I'm not sure where it would fit versus the highest-clocked quad-core i7s, but if Zen could also get near the lower range of the LGA 2011 parts, that would be a big improvement.
 
I don't see why this is an "or" proposition. Intel makes CPUs, and it offers products across a vast range, stopping short only of simple cores in the microcontroller class. How does AMD ignore that and stay in the CPU business?
I don't mean to play on words. What I mean is that they can't "win" this, so (to some extent) they have to ignore what Intel is doing and pray for the best. Intel could kill them any time they like.
This is contradictory. If they are doing their "own thing" then it is something nobody else is doing. If you don't do what others are doing, they cease to be competitors unless they choose to follow along.
If AMD is making CPUs of any notable complexity, they are going to run into Intel. It's not like customers evaluating architectures will forget Intel exists.
It is contradictory, but so is the world. AMD has a good GPU; IMO they are not leveraging it properly in their APU/CPU lines to create a boutique offering that could find its own (even tiny) niches.
High-end cores have demonstrated their ability to scale down. The comparatively modest performance gains in the single-threaded realm mean that the upper tier has experienced more multi-core scaling than single-core improvement.
Cutting down from higher core counts to just a few, and from the highest clocks to more modest targets, tends to cover the good-enough realm.
It starts to falter in the most power or cost constrained markets, but AMD's position is either inferior or seriously threatened at the low end with these measures.
Intel has demonstrated that their big cores can scale down; they are pretty much the only ones. Apple's cores are less complex, and it is unclear how they would operate at higher frequencies, or whether they would be as reliable for serious computational work as Intel's. Intel CPUs are server grade made available to the average customer, and they are putting everybody out of business in the high-end segment. AMD forecasts growth... for Intel in the high-end segment... Intel might rejoice.

I'm sorry, but it is incredibly difficult to have faith in AMD at this point. They had a good basis in Jaguar, which shipped with its issues (power management) but was still a decent CPU for multithreaded applications alongside a decent GPU, and they failed to iterate on it when it was the best foundation they'd had in years. Now we are to believe that they will deliver (in its first iteration) a competitive high-end core? It sounds like the perfect pitch to mislead investors. The point is that AMD is no longer hunting for niches where they could have some success; on paper they are hunting a unicorn, and in the real world they fire people. It may work with some investors and some people who want them to succeed, but I see no way this turns out well:
What I hear is: we no longer have the manpower (or will) to iterate on Jaguar (say, at the pace ARM does); we are giving up on our current high-end (wannabe) offering; and in the GPU realm we have significantly slowed, and are still slowing, the pace at which we iterate on our products. We are to have ARM CPUs replace our low-power offering within 2 (if not 3, looking at when they stopped working on Jaguar) years. All of that is obviously the best-case scenario: no delays or problems, no in-house financial turmoil, and leaving the global economy out of the picture.
That is only my POV, but they are clearly past the point where I would give their PowerPoint slides credit a priori.
 
Liolio, you call Zen a crazy wide core, but frankly I don't see any basis for that. We don't really know how wide it is, and the advertised +40% IPC doesn't exactly suggest something huge, especially when you consider that Bulldozer's main weaknesses were arguably the cache hierarchy, and the somewhat half-assed CMT scheme. Fixing those issues does not imply a particularly large core, nor does it entail any kind of work that AMD wouldn't have to do for a narrower, more modest micro-architecture.
By crazy wide I mean Intel's kind of wide, the Intel that is putting IBM and its POWER server line out of business, not the Mill or some VLIW type of design: 4-issue with many internal execution ports, crazy complicated. Intel has won this for now (I would think they might remain unchallenged, and out of reach, for a good while).
 
It is contradictory, but so is the world. AMD has a good GPU; IMO they are not leveraging it properly in their APU/CPU lines to create a boutique offering that could find its own (even tiny) niches.
Can you expand on what additional leveraging they can do? I would say AMD has leveraged the GPU quite a bit, to a fault since it has done more leveraging than improving.

GCN has been modified at times, but not so readily or cheaply that it can race into a lot of tiny niches.
It's not clear at all that AMD's expertise is in a range of solutions that don't require significant volumes. It doesn't do cheap stuff profitably, and its semicustom division has a high lower bound in terms of revenue, which points to either prices or volumes that rule out small targets.
I'm not convinced that GCN is all that great, besides being a nice-ish start years ago.

I'm sorry, but it is incredibly difficult to have faith in AMD at this point. They had a good basis in Jaguar, which shipped with its issues (power management) but was still a decent CPU for multithreaded applications alongside a decent GPU, and they failed to iterate on it when it was the best foundation they'd had in years.

I noted this in the Carrizo thread, but David Kanter commented that the Jaguar team is pretty much gone. I knew of at least one team lead who wasn't there any longer, and that others were gone, but the extent seems to have been near-total.

If the team's gone, there's neither the familiarity with the design and its needs, nor the spare resources from other projects to reconstitute the effort.

Now we are to believe that they will deliver (in its first iteration) a competitive high-end core?
At this point it is not clear what else they would be good for, and the ballpark for "competitive" is more forgiving than it sounds, given that every other option is uneconomical or impossible for AMD as it exists now.

The point is that AMD is no longer hunting for niches where they could have some success; on paper they are hunting a unicorn, and in the real world they fire people.
AMD is too big and has to design in a space too expensive for niches.
I don't necessarily disagree with your projection of their endgame.
 
I wonder what's left of the GPU teams. Some notable people left right about when GCN launched and Rory Read came in.
 
Carrizo is starting to look like a great lesson in how to make lemonade out of a CPU architecture that is a clear lemon, although independent reviews are of course warranted to confirm this.

But there's one thing I find curious:

[Slide 14: Video Playback Processing Pipe]


Why would you go through memory at all?
 
Why would you go through memory at all?
Possible reasons: to merge the image with other data on the desktop? To save power, as in race to sleep? Video is a very low BW operation, so you'd burst the scaled video to memory and then sit on it?
I admit that it's not necessarily a convincing case. ;)
 
The display has to wait for refresh to display the decoded frame, so there's going to be some reason to buffer there.

In the old way of doing things, the post process step ('GFX') may also work on fully decoded or at least partially decoded frames, so again you'll need to store that somewhere.

Back on the wider topic of Carrizo, it really does seem that AMD have come a long way over the last four years with only a half node shrink and despite losing their "speed racer" friendly node.

If Bulldozer hadn't been such an ill-conceived idea, AMD could have been in a much better place. [Edit: stating the obvious?]
 
Possible reasons: to merge the image with other data on the desktop? To save power, as in race to sleep? Video is a very low BW operation, so you'd burst the scaled video to memory and then sit on it?
I admit that it's not necessarily a convincing case. ;)

Right, but wouldn't it make much more sense to have a small, dedicated buffer in the display unit? Then you could decode a bunch of frames (say, 2 seconds of video), send them to the display buffer, and power gate just about everything else until the next 2 seconds.

Edit: never mind, 2 seconds of uncompressed 1080p video with 32 bits per pixel is almost 400MB of data. I guess that's why going through memory is necessary.
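The arithmetic checks out; a quick sketch, assuming film-rate 24 fps content as implied by the ~400MB figure:

```python
# 2 seconds of uncompressed 1080p video, 32 bits per pixel, 24 fps.
width, height = 1920, 1080
bytes_per_pixel = 4                      # 32bpp
fps = 24                                 # film rate; 60 fps would be ~2.5x this

frame_bytes = width * height * bytes_per_pixel
buffer_bytes = frame_bytes * fps * 2     # two seconds' worth of frames
print(f"per frame: {frame_bytes / 1e6:.1f} MB")    # ~8.3 MB
print(f"2s buffer: {buffer_bytes / 1e6:.0f} MB")   # ~398 MB
```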
 
Like I said, buffering!

Anyway, looking at the AMD-provided figure of '819 Gflops': with 512 shaders, that's 800 MHz, which seems high. So I'm guessing that's either max turbo or the speed for the 35W mode (or both?).
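That back-calculation assumes the usual FMA accounting of 2 FLOPs per shader per clock, which the quoted figure fits exactly:

```python
# GFLOPS = shaders * 2 FLOPs per clock (fused multiply-add) * clock in GHz
gflops = 819.2
shaders = 512
clock_ghz = gflops / (shaders * 2)
print(f"implied clock: {clock_ghz * 1000:.0f} MHz")   # 800 MHz
```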

Pity that AMD can't put 1GB of HBM on there. Performance would be great, and it might even save power if you could drop to a 64-bit main memory bus.
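For a rough sense of the bandwidth gap involved (peak theoretical numbers, with DDR3-2133 assumed for the DDR side):

```python
# Peak theoretical bandwidth: DDR3 bus widths vs. one first-gen HBM stack.
def ddr_bw_gbs(bus_bits, megatransfers):
    return bus_bits / 8 * megatransfers / 1000    # GB/s

print(f"128-bit DDR3-2133: {ddr_bw_gbs(128, 2133):.1f} GB/s")   # ~34 GB/s
print(f" 64-bit DDR3-2133: {ddr_bw_gbs(64, 2133):.1f} GB/s")    # ~17 GB/s

# One HBM1 stack: 1024 pins at 1 Gb/s per pin = 128 GB/s
print(f"  one HBM1 stack: {1024 * 1 / 8:.0f} GB/s")
```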
 
If APUs get HBM support, I would rather see it as a full replacement for system memory. HBM can't be used as a traditional cache (there's no custom logic for that), while a sideband buffer would needlessly complicate the whole design. It probably won't happen until HBM2, due to economies of scale and memory density.
 
Some interesting cache & frontend improvements in Carrizo that may bode well for Zen

http://www.anandtech.com/show/9319/...leap-of-efficiency-and-architecture-updates/4
Doubling the L1 data cache and bumping it up to 8-way associative (previously AMD has only been 2-way as far as I know), while reducing latency and improving power efficiency, is a pretty big change.
I don't think that's low-hanging fruit either.
The L1 data cache for Bulldozer has been 4-way and 16KB. AMD seems to have doubled the number of arrays so that there is a straight doubling of capacity and associativity.
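One plausible reason for doubling capacity and associativity in lockstep (my inference, not anything AMD has stated): it keeps the size of each way at the 4KB page size, which lets a virtually-indexed, physically-tagged L1 keep indexing with untranslated address bits. A quick check:

```python
# Way size = capacity / associativity; a VIPT L1 wants this <= page size
# so the index bits fall entirely within the untranslated page offset.
page_size = 4096
configs = [
    ("Bulldozer L1D", 16 * 1024, 4),   # 16KB, 4-way
    ("Carrizo   L1D", 32 * 1024, 8),   # 32KB, 8-way
]
for name, capacity, ways in configs:
    way_size = capacity // ways
    print(f"{name}: {way_size} B per way, fits a page: {way_size <= page_size}")
# Both come out to 4096 B per way, so the cache index never needs
# translated address bits -- consistent with doubling arrays rather
# than deepening each one.
```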

The L1 instruction cache has been 2-way up until Steamroller replicated one of the arrays and made it 3-way.
I am very unclear as to that article's description of better branch prediction making data cache accesses more efficient.
Which latencies were improved is unclear. The lower targeted clock ceiling and smaller L2 might have led to cycles being shaved off in the L2 or its interface.
The L1 would be operating in a speed range where the pre-Bulldozer cores could manage a 3-cycle latency, but it would be somewhat unusual to change something so tightly integrated into the memory pipeline for what is meant to be an optimization of an existing line.

Whether this has bearing on Zen is unclear.
 
If APUs get HBM support, I would rather see it as a full replacement for system memory. HBM can't be used as a traditional cache (there's no custom logic for that), while a sideband buffer would needlessly complicate the whole design. It probably won't happen until HBM2, due to economies of scale and memory density.
I know AMD has hinted at an HBM APU in the future, but I don't see HBM happening any time soon in consumer APUs, for the same reasons as for lower-performance discrete GPU SKUs. I believe we've seen hints in the past that AMD APUs supported GDDR5, and relatively wide DDR3 memory interfaces? But in the field those never materialized, and we only ever see low-performance models. I think the market is just not there: mediocre CPUs with mediocre GPU performance that would be slightly less mediocre with a heavily underused HBM.

Now if AMD hits it really out of the ballpark with Zen (or whatever future CPU) and then links it to a truly outstanding high performance GPU that really needs all the bandwidth, maybe there's a case. But I think we're a technology node or two away from that.

My theory is that AMD is hinting at some HPC CPU/GPU combo where the cost of HBM is no objection...
 