> This agrees with something I saw on anandtech and twitter: The newest normal process from GloFo is the 12nm (which is pretty much a marketing name...), and AMD already sells products made with it. Why would they call the IO die 14nm? Because GloFo also has the 14nm HP, which can use eDRAM. According to the published density of the 14nm eDRAM cell, a 512MB cache would fit on a 440mm² die, if only barely.

With how AMD isn't discussing anything about the IO die, I think some secret sauce is likely to be in it, and eDRAM could be a possibility.
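A rough sanity check of that density claim, taking IBM's published 14nm eDRAM bitcell of ~0.0174µm² (the figure usually cited for the GF/IBM 14HP process) and a guessed array efficiency; both are assumptions, not anything AMD has confirmed:

```python
# Sanity check: 512MB of eDRAM at the published 14nm cell size.
CELL_UM2 = 0.0174        # IBM's published 14nm eDRAM bitcell (assumption)
ARRAY_EFFICIENCY = 0.5   # guess: cells vs. sense amps, redundancy, routing

bits = 512 * 2**20 * 8
raw_mm2 = bits * CELL_UM2 / 1e6   # 1 mm^2 = 1e6 um^2
print(f"raw cell area: {raw_mm2:.0f} mm^2")                     # ~75 mm^2
print(f"with overhead: {raw_mm2 / ARRAY_EFFICIENCY:.0f} mm^2")  # ~150 mm^2
```

Even at ~150mm² for the macro, fitting that next to all the DDR and PCIe PHYs on a ~440mm² die would indeed be tight.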
> Really well. They actually do much better at IO than finfets, their deficits are on the logic side. I commented elsewhere that since GloFo is dabbling in SRAM-like MRAM, which would have a density ~4-5 times better than SRAM on the same process, an IO die made using that for cache would be a really neat fit for the technology. It would, of course, also depend on multiple pieces of unproven tech, so not likely as something you risk your main new product introduction on.

Is this GF's ST-RAM product? The last I saw, its marketed write endurance was on the order of 10^8 cycles, which is ten or so orders of magnitude below what some may argue is "unlimited".
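To put that endurance figure in cache terms, a hypothetical back-of-envelope under assumed rewrite rates for a hot cache line:

```python
# Time to exhaust a 1e8-write-endurance cell at assumed rewrite rates.
ENDURANCE_WRITES = 1e8   # marketed ST-MRAM endurance, order of magnitude

for writes_per_sec in (1e3, 1e6, 1e9):  # hypothetical hot-line rewrite rates
    seconds = ENDURANCE_WRITES / writes_per_sec
    print(f"{writes_per_sec:.0e} writes/s -> worn out in {seconds:.3g} s")
```

Even at a modest 1e6 rewrites per second, a hot line would wear out in about 100 seconds; SRAM and DRAM are commonly treated as effectively unlimited by comparison.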
> This agrees with something I saw on anandtech and twitter: The newest normal process from GloFo is the 12nm (which is pretty much a marketing name...), and AMD already sells products made with it. Why would they call the IO die 14nm? Because GloFo also has the 14nm HP, which can use eDRAM. According to the published density of the 14nm eDRAM cell, a 512MB cache would fit on a 440mm² die, if only barely.

This sounds like IBM's 14nm process, which GF acquired. I'm not sure that's been offered to anyone else, and IBM's process is a complex many-layer SOI node.
> Then if there's no SRAM, why is that thing so huge? It's not like it's using 45nm or even 28nm. It's using 14nm.

A ~213mm² Zeppelin die has ~88mm² bound up in CCX area, with the rest being "other" including DDR, PCIe, fabric links, and other IO.
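Taking those die-shot estimates at face value, the implied uncore budget is easy to tally (the per-die figures above are estimates, not AMD's numbers):

```python
# Back-of-envelope: how much 14nm non-CCX area four Zeppelins represent.
zeppelin_mm2 = 213   # estimated Zeppelin die area
ccx_mm2      = 88    # estimated area of both CCXs

non_ccx = zeppelin_mm2 - ccx_mm2                 # ~125 mm^2 per die
print(f"non-CCX per die: {non_ccx} mm^2")
print(f"4 dies' worth:   {4 * non_ccx} mm^2")    # ~500 mm^2
```

So four Zeppelins' worth of uncore comes to ~500mm², somewhat more than the rumored IO die size; consistent with shared logic not being repeated four times over.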
> There was a reason why all that is part of the IO die got integrated into the CPU over the past 30 years; to go backward to that is just going to make things cost more and perform less.

I doubt this very much. I/O doesn't scale, and new processes are significantly more expensive per mm².
> EPYC Naples and EPYC Rome side-by-side:

Nice.
> On the other hand, yes, you get a smaller 7nm chiplet, and since it's made up mostly of copy/paste blocks I guess it's probably relatively cheap to develop several sizes of IO chip. Just occurs to me: could the big one be designed with room to be chopped into halves/quarters to create the smaller versions? That way one tapeout could produce 3 sizes of IO chip, even off each wafer; that'd satisfy my desire for efficiency/simplicity.

A scaled down I/O hub for consumer SKUs is possible, but the question remains how well that MCM approach will fit into the razor-thin margins of the Ryzen series, compared to a monolithic design.
> With how AMD isn't discussing anything about the IO die, I think some secret sauce is likely to be in it, and eDRAM could be a possibility.

Would make sense, as ideally there is an L4/LLC used with NVDRAM to lower power draw and increase performance. I still think a single stack of HBM may have been a superior solution for capacity with MCM, but AMD did seem fond of their 14nm custom SRAM cell design for Zen. This may explain why.
> A scaled down I/O hub for consumer SKUs is possible, but the question remains how well that MCM approach will fit into the razor-thin margins of the Ryzen series, compared to a monolithic design.

They still have APUs that could bypass the IO chiplet design. However, they really need more bandwidth in that segment anyway. With the heterogeneous memory support AMD has been working on, there is likely another memory controller in the mix somewhere. Even a single stack of HBM connected like a CCX chiplet over IF could be huge.
If the chiplet yields are good and EPYC/TR sales are consistent enough to capture a significant market share, the production volume could make it profitable to reuse the same architecture for the consumer segment.
I don't know why people expect that consumer Ryzen will use chiplets.
There is no reason to believe that the consumer Ryzen 3000 series will look anything like this. If I had to bet, I would expect it to be a traditional monolithic CPU, because when you aren't dealing with this many cores, you lose all the advantage of splitting the die into chiplets.
> That worked for zen 1, doesn't mean it makes sense for zen 2.

And why do you assume it doesn't make sense?
> What zen 2 really does well is the scale from 32 to 64 cores.

That sounds a bit unfair to everything they presented to improve IPC. AVX2 performance is supposedly doubled.
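The doubling claim lines up with the FP datapath widening from 128 to 256 bits; a quick sketch of peak FLOPs per cycle under that assumption:

```python
# Peak double-precision FLOPs/cycle from FMA datapath width (2 FMA pipes assumed).
def fma_flops_per_cycle(simd_bits: int, fma_pipes: int = 2) -> int:
    lanes = simd_bits // 64       # 64-bit doubles per vector
    return fma_pipes * lanes * 2  # an FMA counts as 2 FLOPs

print("128-bit datapath (Zen 1):", fma_flops_per_cycle(128))  # 8
print("256-bit datapath (Zen 2):", fma_flops_per_cycle(256))  # 16
```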
> Naively plunking down the non-CCX area of 4 Zeppelin die would cover most of the IO die.

It's a rather safe bet to assume they don't have to repeat that same logic 4 times over, though.
> To now make multiple dies on different processes, with a bunch of them incompatible with other market segments, is throwing away that incredible efficiency.

So far there's one small die being made on an expensive high-end process and one large die on a cheap low-end process.
The 256MB, if true, is likely to be an aggregate cache number. I could imagine each core having a 1MB L2 cache and a 1MB L3 victim cache slice, for a total of 16MB of cache on each chiplet and 128MB of cache in the IO chip, likely sliced as 16MB per DDR4 PHY.
Cheers
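Running that speculated split through the numbers (all the per-core and per-PHY figures are the speculation above, not announced specs):

```python
# Check that the speculated cache split sums to the rumored 256MB.
CORES_PER_CHIPLET = 8
CHIPLETS = 8
L2_PER_CORE_MB = 1        # speculated
L3_SLICE_PER_CORE_MB = 1  # speculated victim-cache slice
DDR4_PHYS = 8
IO_SLICE_PER_PHY_MB = 16  # speculated

chiplet_mb = CORES_PER_CHIPLET * (L2_PER_CORE_MB + L3_SLICE_PER_CORE_MB)  # 16
io_mb = DDR4_PHYS * IO_SLICE_PER_PHY_MB                                   # 128
print(f"total: {CHIPLETS * chiplet_mb + io_mb} MB")                       # 256
```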
> Are there any specifics on infinity fabric link bandwidth?

There's 4 IF links on the die, which AMD cites as allowing them to fully connect all dies in a package with a minimal number of package layers, despite the chips' orientations changing to keep the DDR PHY facing the outside of the package.
Ryzen dies had 3 IF links, each 2x32 bits wide (running at 4x DRAM command speed). Given the topology of EPYC 2, each chiplet only needs one IF link, so I'd expect it to be at least twice as wide as a single Zeppelin link. I'd also expect the operating frequency to be decoupled from the DRAM command rate (because that was never really a good idea). I.e., a 2x64-lane link running at 2GHz (8GT/s) would have 64GB/s of bandwidth in each direction.
Cheers
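The arithmetic behind that estimate (the width and clock are the guesses from the post above):

```python
# IF link bandwidth under the guessed parameters: 2x64 lanes at 8GT/s.
lanes_per_direction = 64  # guessed link width
gt_per_sec = 8            # 2GHz clock at 4 transfers/cycle, per the post

gbit = lanes_per_direction * gt_per_sec       # 512 Gbit/s per direction
print(f"{gbit // 8} GB/s in each direction")  # 64 GB/s
```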
> Keep in mind that each 8-core chiplet carries 16 PCIe 4.0 lanes, which an I/O chip could easily convert into different combinations of up to 32 PCIe 3.0 lanes. I'd say if there were absolutely no plans of using the chiplets in anything other than EPYC, why would the chiplets have PCIe in there at all? It would always be cheaper to save die area on the chips being made on the more expensive process.

Was this stated by AMD? I saw some people drawing an inference that this was so, but the center of the IO die's right and left sides is taken up by IO blocks whose length is somewhat shorter than the DDR interfaces on the top and bottom. Perhaps that diagram is not representative, but if it is close to reality I'm not sure what other IO needs that much perimeter.
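The 16-to-32 lane conversion in the quoted post works because PCIe 4.0 doubles the per-lane transfer rate of 3.0:

```python
# PCIe per-lane rates: 3.0 runs at 8GT/s, 4.0 at 16GT/s.
PCIE3_GT, PCIE4_GT = 8, 16

chiplet_lanes_v4 = 16  # the quoted post's per-chiplet hypothesis
equivalent_v3 = chiplet_lanes_v4 * PCIE4_GT // PCIE3_GT
print(f"{chiplet_lanes_v4} PCIe 4.0 lanes carry the bandwidth of "
      f"{equivalent_v3} PCIe 3.0 lanes")
```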
> It's a rather safe bet to assume they don't have to repeat that same logic 4 times over, though.

That depends on what could be considered optional while maintaining or improving upon the capabilities offered by Naples. There appear to be server/workstation products that have more SATA or USB links coming out of the socket than one Zen chip can provide, but those could potentially be handled by the hardware equivalent of two southbridges.
> The 256MB, if true, is likely to be an aggregate cache number. I could imagine each core having a 1MB L2 cache and a 1MB L3 victim cache slice, for a total of 16MB of cache on each chiplet and 128MB of cache in the IO chip, likely sliced as 16MB per DDR4 PHY.

I did a quick sketch-up of all the phy interfaces presumed for the I/O hub, sized from the 14nm Zeppelin layout; factoring in all the interfacing logic and pad stacking, not much space is left for a large eDRAM array.
> 8 Zen1 cores already have combined 16MB of L3. The shrink from GF 14nm -> TSMC 7nm shrinks SRAM arrays much more than it shrinks logic. I would be extremely surprised if the chiplet dies do not have 32MB of L3 per chiplet. Those put together are 256MB. Of course, a single core can only make use of 32MB max.

I'm not quite convinced of this. In particular, assuming there's an L4, the benefits of having such a large L3 may not be all that much, and I'd expect it not to increase (per core). 16MB of extra L3 might not need THAT much area on 7nm, but it's still probably ~10mm².
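Where a ~10mm² figure could come from, assuming TSMC's published 7nm high-density SRAM bitcell of ~0.027µm² and a guessed array efficiency (both assumptions, not AMD numbers):

```python
# Area estimate for 16MB of extra L3 on 7nm (assumed figures).
CELL_UM2 = 0.027         # TSMC 7nm HD SRAM bitcell, as published (assumption)
ARRAY_EFFICIENCY = 0.35  # guess: data cells vs. tags, sense amps, routing

bits = 16 * 2**20 * 8
raw_mm2 = bits * CELL_UM2 / 1e6
print(f"raw cells:  {raw_mm2:.1f} mm^2")                     # ~3.6 mm^2
print(f"as a cache: {raw_mm2 / ARRAY_EFFICIENCY:.1f} mm^2")  # ~10 mm^2
```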
> The rest of the PCI-E which provides the off-socket IF and connection to devices doesn't come from the IO die, but from the chiplets. That is, each chiplet has a total of 32 lanes of PCI-E. This nicely means that adding or removing chiplets doesn't change the amount of PCI-E lanes available in the system -- if you only fit 4 chiplets, you get a total of 64 lanes from the chiplets and 64 free from the IO die.

Seems rather unlikely to me. You'd need different routing on the package depending on how many chiplets are connected. It seems much simpler to me to handle all the PCIe from the IO die as well, saving precious area on the 7nm chiplets.
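For what it's worth, the lane accounting in the quoted scheme does balance; a sketch under that post's hypothesis (16 of each chiplet's 32 lanes used as the IF link, the IO die sized for 8 such links):

```python
# Lane accounting under the quoted hypothesis, not anything AMD has stated.
LANES_PER_CHIPLET = 32
IF_LANES_PER_CHIPLET = 16
IO_DIE_IF_LANES = 8 * IF_LANES_PER_CHIPLET  # sized for a full 8-chiplet package

for chiplets in (4, 8):
    from_chiplets = chiplets * (LANES_PER_CHIPLET - IF_LANES_PER_CHIPLET)
    freed_on_io_die = IO_DIE_IF_LANES - chiplets * IF_LANES_PER_CHIPLET
    print(f"{chiplets} chiplets: {from_chiplets} + {freed_on_io_die} = "
          f"{from_chiplets + freed_on_io_die} lanes")  # 128 either way
```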