AMD: Zen 2 (Ryzen/Threadripper 3000?, Epyc 8000?) Speculation, Rumours and Discussion

Deleted member 13524 (Guest)
Since we're already in the quarter where AMD is supposedly on schedule to announce the EPYC CPUs with the new 7nm Zen 2, I thought we'd have a thread to discuss what will/could be the next-gen CCX, next-gen "Zeppelin", next-gen APU, etc.


AMD has been relatively tight-lipped about the next chips, probably because Zen is selling really well and they don't want to Osborne it too soon, but recently a couple of rumours/leaks appeared at ChipHell with impressive Cinebench R15 scores:

https://www.chiphell.com/forum.php?mod=viewthread&tid=1916028&page=1


It shows a 128-core / 256-thread system. AMD is bringing Zen 2 to EPYC first, and AFAIK EPYC has a 2P limit.
Assuming this is a 2P system, we're looking at two 64-core CPUs. Assuming again that these are fully enabled 4-chip (4x Zeppelin 2) EPYC CPUs, that would put each chip at 16 cores.
If true, this gives us two options (sanity-checked in the sketch below):

- Each Zeppelin 2 chip has 4x CCX
- Each "CCX 2" now has 8 cores (IMO more likely)

At the same time AMD could be expanding their IC portfolio with different chips for server and consumer markets, though I find that unlikely after all the investment in Infinity Fabric.

8 cores in a single CCX would also bring a massive upgrade in CPU performance to APUs. AMD would probably expand the APUs upwards in their consumer line-up, with 12-16 core Zeppelin 2 parts pushed up the price range as high-end offerings that compete with Core i9 Skylake-X.
 
Adding cores to a CCX would definitely help some tasks, but so could replacing that area with added cache. Assuming future apps are more NUMA aware or designed around clusters, I'm not sure more cores in a CCX would be worthwhile.

Some of the work with heterogeneous memory systems might be interesting if some chips were designed with NVRAM or HBM in mind to facilitate different tasks. Along the lines of a 32-core Threadripper, but with added NVRAM or HBM in an EPYC socket. That would provide quite a few interesting combinations that could be tailored towards certain loads, along with the possibility of an APU merged in there.

Personally I think adding HBM, if enough can be produced, would be a better option than more cores. Past the rumored 128 cores, a GPU may start making more sense. HBM brings lots more bandwidth as an LLC, with capacity options from NVRAM on top of it. It should save a good deal of power as well, which may be significant going forward. More cores may just get choked by lack of memory bandwidth.
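
To put rough numbers on the bandwidth worry (a sketch; the channel count and DRAM speed are my assumptions, not leaked specs):

```python
# Per-core DRAM bandwidth at the rumored core counts (figures assumed)
channels = 8                  # EPYC 1's channel count, assumed to carry over
dram_rate_mts = 2666          # DDR4-2666, a plausible server speed
bytes_per_transfer = 8        # one 64-bit channel

bw_gbs = channels * dram_rate_mts * bytes_per_transfer / 1000  # ~171 GB/s
for cores in (32, 64):
    print(f"{cores} cores: ~{bw_gbs / cores:.1f} GB/s per core")
# A single HBM2 stack at ~256 GB/s used as an LLC would exceed the whole
# DDR4 subsystem's bandwidth without adding package pins.
```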
 
Adding cores to a CCX would definitely help some tasks, but so could replacing that area with added cache. Assuming future apps are more NUMA aware or designed around clusters, I'm not sure more cores in a CCX would be worthwhile.

Adding more cores would mean adding a proportional amount of L3 cache, of course. An 8-core CCX would mean 16MB L3 per-CCX, 32MB total for a 2*CCX Zeppelin 2.

I initially thought AMD would either just reduce die size (while being more aggressive on price) or increase the CCX core count to 6 cores per CCX. These recent rumors of 64-core EPYC CPUs clearly point to each Zeppelin 2 getting 16 cores, though.

Going to the APU side, 8 cores is what Intel is already offering with the just-announced Coffee Lake Refresh. The Core i7-9700K and i9-9900K are 8-core CPUs with a GT2 iGPU. In this case, AMD would be fighting on equal-ish terms on the CPU side while fielding a much more powerful iGPU.
They don't even need to upgrade from the Vega 11 in Raven Ridge, since the 128-bit DDR4 bus will still be the bottleneck for higher iGPU performance.
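
A rough sketch of why that 128-bit bus is the ceiling (the DRAM speeds below are common AM4 options, picked by me):

```python
# Shared CPU+iGPU bandwidth on a 128-bit DDR4 interface (speeds assumed)
bus_bits = 128
for mts in (2400, 2933, 3200):
    gbs = (bus_bits / 8) * mts / 1000   # bytes per transfer * MT/s
    print(f"DDR4-{mts}: {gbs:.1f} GB/s shared between CPU and iGPU")
# ~38-51 GB/s in total, while even a budget discrete card like the RX 560
# gets ~112 GB/s of GDDR5 all to itself.
```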
 
Adding more cores would mean adding a proportional amount of L3 cache, of course. An 8-core CCX would mean 16MB L3 per-CCX, 32MB total for a 2*CCX Zeppelin 2.

The L3 is a victim cache, private to each CCX. In a 32-core EPYC you have 64 MB of aggregate L3, but only 8 MB visible to any individual CCX.
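
Putting both posts' numbers side by side (a sketch assuming Zen 1's 2 MB-per-core L3 ratio carries over):

```python
# Private victim-L3 arithmetic, assuming Zen 1's 2 MB per core is kept
L3_PER_CORE_MB = 2

def l3_split(cores_per_ccx, total_ccx):
    """Return (MB visible to one CCX, aggregate MB on the package)."""
    per_ccx = cores_per_ccx * L3_PER_CORE_MB
    return per_ccx, per_ccx * total_ccx

print(l3_split(4, 8))   # EPYC 1: 4 dies x 2 CCX x 4 cores -> (8, 64)
print(l3_split(8, 2))   # speculated die: 2 CCX x 8 cores  -> (16, 32)
```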

They could opt for a solution with less (or even no) L3 cache and then add a cache layer at the memory channels.

As for integration, I expect them to leverage TSMC's fan-out packaging expertise (InFO). It offers very high wire densities on organic substrates, which would allow for a lot more inter-SoC Infinity Fabric bandwidth. I wouldn't be surprised if they had 8 SoCs, each with 8 cores, on a single EPYC 2 package.

Cheers
 
Do we know if the Ryzen 2 will be compatible with the current motherboards?

I feel burned by Intel: I jumped on a 6700K with a Z170 chipset and then was no longer able to upgrade. I need to build another home PC this year, but I'm hesitant to do so if it means getting stuck again before the next AMD Ryzen ships.
 
Do we know if the Ryzen 2 will be compatible with the current motherboards?

Just to clarify a problem in AMD's branding: Ryzen 2 has already been released, and uses the Zen 1 core (as did Ryzen 1). The chip we're waiting for is the Ryzen 3, which will use the Zen 2 core. Yes, none of this makes a lick of sense.

In any case, both should be compatible with existing AM4 boards. Of course, things like better memory support might require new boards and/or chipsets.
 
They could go for an 8-core CCX and the die would still be smaller than a Zen+ 4-core CCX, IIRC. Will be interesting to see what they opt for.
 
Charlie's saying Rome is an 8-die MCM (well, 8+1). If true, we're still dealing with 8-core dies:
https://www.semiaccurate.com/2018/10/29/more-details-about-amds-rome-cpu-leak/

Those would be pretty tiny dies, though, unless there's a ton of cache.

8 dies using 8-core CCXs, plus an extra IO die.
8 dies seems ridiculous to me, unless the IO die sits in the center and works as a very high-speed hub for L3 + RAM coherency. Something like this:

[mock-up image: IO die in the center of the package, surrounded by the eight CPU dies]


But this way, all cache interconnects would need an extra hop. That IO die had better work really fast.

OTOH, a single CCX chip should be tiny at 7nm. If Zeppelin is 213mm^2, these dies should be around 100mm^2.
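
That ~100mm^2 figure is essentially naive halving; a sketch of the estimate (the density factor is an assumption, and IO/analog scales far worse than logic):

```python
# Naive 14nm -> 7nm shrink behind the ~100 mm^2 guess (factor assumed)
zeppelin_mm2 = 213        # 14nm Zeppelin: 2x 4-core CCX plus the uncore
density_gain = 2.0        # assume ~2x overall; logic scales better,
                          # DDR/PCIe PHYs barely scale at all

print(f"~{zeppelin_mm2 / density_gain:.0f} mm^2 if everything shrank evenly")
# If the PHYs and most of the uncore move to a separate IO die, the CPU
# chiplet could land well under that even with more cores per CCX.
```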

Plus, if Rome carries an 8-channel memory controller, it's possible that each "Zeppelin 2" has a single DDR4 PHY. That said, a high-end consumer AM4 solution (10-16 cores) could forego the IO chip (direct connection between 2 dies in a formation similar to Epyc embedded) and always use 2 dies in the MCM.
The APU would have to be a whole different beast and a significantly larger chip, though. However, if it has the same 8-core CCX then a single APU chip would be used for a very broad range of performance targets.


I wonder if AMD could make the IO chip at GlobalFoundries using 12nm, at least to guarantee some compliance with the fab agreement.
 
Plus, if Rome carries an 8-channel memory controller, it's possible that each "Zeppelin 2" has a single DDR4 PHY. That said, a high-end consumer AM4 solution (10-16 cores) could forego the IO chip (direct connection between 2 dies in a formation similar to Epyc embedded) and always use 2 dies in the MCM.

It is far more likely that each die has a memory channel. I would expect the cache structure to change too: instead of a massive victim L3 per CCX, we will see less of it, plus a big chunk of cache in front of every memory controller. By intercepting memory transactions at the memory controller, all CCXs benefit from the cache. I expect the Infinity Fabric PHYs to double in width (to 2x64 lanes of 4x SerDes) and be decoupled from the DRAM speeds.
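
A minimal sketch of that memory-side cache idea: because the cache sits in front of the controller rather than inside a CCX, a line installed by one CCX's miss is a hit for every other CCX. The class and sizes below are illustrative, not anything AMD has confirmed.

```python
# Toy model of a cache in front of a memory controller (illustrative only)
class MemorySideCache:
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = {}                      # addr -> data ("fully associative")

    def read(self, addr, dram):
        if addr in self.lines:               # hit: served without touching DRAM,
            return self.lines[addr]          # no matter which CCX asks
        data = dram[addr]                    # miss: fetch from DRAM
        if len(self.lines) >= self.capacity:
            self.lines.pop(next(iter(self.lines)))  # crude FIFO-ish eviction
        self.lines[addr] = data
        return data

dram = {0x40: "line A", 0x80: "line B"}
msc = MemorySideCache(capacity_lines=2)
msc.read(0x40, dram)   # CCX 0 misses; the line is installed
msc.read(0x40, dram)   # CCX 3 hits the same line -- the shared benefit
```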

Cheers
 
8 dies using 8-core CCXs, plus an extra IO die.
8 dies seems ridiculous to me, unless the IO die sits in the center and works as a very high-speed hub for L3 + RAM coherency. Something like this:
32-core Threadripper already approximates this to some degree. Removing the memory controllers from the CCX die may be an interesting design choice: CCX + cache + IF to the memory controllers. That's more or less what half the CCXs are doing in the above-mentioned Threadripper. That would leave the IO die to determine memory channels, chiplets, and possibly the type of memory. AMD has been doing a lot of work on heterogeneous memory models lately. This may be a solution to use the same cores for DDR4 and NVRAM, or even HBM or GDDR, along with a possible APU if they desired.
 
32-core Threadripper already approximates this to some degree. Removing the memory controllers from the CCX die may be an interesting design choice: CCX + cache + IF to the memory controllers. That's more or less what half the CCXs are doing in the above-mentioned Threadripper. That would leave the IO die to determine memory channels, chiplets, and possibly the type of memory. AMD has been doing a lot of work on heterogeneous memory models lately. This may be a solution to use the same cores for DDR4 and NVRAM, or even HBM or GDDR, along with a possible APU if they desired.

But not having a memory controller in the CCX die would make any IOX-less implementation impossible. A 16-core implementation that uses 2 chips would now require 3 chips, because they wouldn't work without the IOX.
This would make the new architecture less flexible to a degree.
 
But not having a memory controller in the CCX die would make any IOX-less implementation impossible. A 16-core implementation that uses 2 chips would now require 3 chips, because they wouldn't work without the IOX.
This would make the new architecture less flexible to a degree.
True, but that added expense may be offset by the smaller, more specialized CCX die and the ability to use different memory systems or cache arrangements. De-unify the northbridge, but within an MCM. Non-IOX models may be APUs, which already exist separately and avoid the IOX. A dual-APU solution could be interesting for video encode/decode. For a 2+ chip solution this shouldn't be an issue.

The two-chip Ryzens would be interesting, though, and the added ability to bin across memory models and chips might be worthwhile. Less likely, but they could perhaps even use a GPU as the IOX. Getting the memory PHY away from the CCX should shrink the die considerably, or allow more cache/cores, which may prove worthwhile. Higher latency, but possibly cost effective at scale. 64 cores with two memory channels and no wasted controller space would then be plausible. Or one CCX with 8/16 memory channels and fewer idle cores. Either design may be useful for certain processing tasks, albeit a bit odd.
 
I'm not sure which leaks to believe with regard to the packaging or socket details for Rome, or what details may have changed for the interconnect topology. If there is an IO chip, it may among other things be a value-add for a high-end product, perhaps hosting a high-bandwidth or enterprise set of interconnects that would be out of place on a die that needs to scale downmarket. If Zen 2 or Zen 3 awkwardly straddles a transition like the one to PCIe 4.0 or 5.0, an updated IO chip could mitigate the impact.

I'm unsure about the idea of the IO chip being surrounded by Zen2 chips.

EPYC's package layout reserved the left and right sides of the MCM for the DDR4 routing, and the top and bottom for the PCIe/xGMI IO. Part of the optimization of the design was saving layers in the substrate, and while layers can allow for overlap in the package, escaping the package and routing on the board seem to constrain what directions IO can go. Additionally, the EPYC presentations seem to show a need for a path around the footprint of the chips. Flanking an IO chip with CPUs and the path for memory seems constraining. Also, if the IO chip were connected to all 8 other chips, that's a fair amount of perimeter lost to non-IO purposes.
Placement in the middle seems impractical if the IO chip were to host a full set of channels, or even one DDR4 bus, with no clear path to the edge of the package.

A large IO chip might be able to provide connections that are used up by the 8 die interconnect, or could be hemmed in depending on how the dies are distributed. If the chip is in the middle, it might get a swath of the top and bottom edges devoted to its IO, with the other sides taken up by CPUs and DDR channels.

If there is a possibility that Rome or a variant of EPYC2 can go into a socket with more memory channels, perhaps the IO die can sit on a tab further out to the side like the Omnipath tab on the side of a Knights Mill package, giving more space for the memory channels while allowing some room for maintaining EPYC's more generous IO complement.
 
From what I can get from that, if the general arrangement is correct, the IO die is flanked on two sides by CPUs. It gets at least the top and bottom as paths to escape, and may get a significant fraction of the left and right sides. A large die with two sides free was one of the more conventional scenarios, and if the DDR is also hosted there, the situation improves further, since aside from their power/ground footprint the CPUs then block relatively little. If a change in the packaging allows for a denser interconnect between the chips, this can free up more paths to the outside for the IO chip.
 