AMD: Zen 2 (Ryzen/Threadripper 3000?, Epyc 8000?) Speculation, Rumours and Discussion

Epyc Rome, Zen 2 with 64 cores, in half a year; Intel and AMD both seem to be going for core count.

Zen 3 in 2020, and Zen 4 could see the light of day in 2021.
 
It's their answer to a deep-learning-focused yet flexible performance GPU. Not to be confused with ray tracing.

Fair point, but I'm not. I'm saying that, of the two features that seem necessary for an RTX approach to ray tracing - ML capability and BVH accelerators - AMD are shipping a product, by the end of this year, that contains one of those.

Honestly, I don't quite get why my posts were moved here, since I was talking about the way AMD's MI60 announcement has a bearing on their future GPU designs, and therefore the likelihood of RTRT in the next generation.

@iroboto care to take this to the next generation speculation thread?
 
So what does this imply for consumer level chips?
1 chiplet & a smaller IO chip? 2 chiplets & same IO chip? APUs only?

I think AMD could, and arguably should, do both:

Mainstream platform:
- 1 chiplet + IO die: 8 cores
- 2 chiplets + IO die: 16 cores

ThreadRipper platform:
- 4 chiplets + IO die: 32 cores
- 8 chiplets + IO die: 64 cores (optional version, depending on the competitive situation)

The question is how many different IO die designs they would need.
 
Called it. Ha!
(OK, so it's not 12nm, it's 14nm, bleh...)

Things I want to know but they didn't release:

1 - Where's the L3 cache? One would think it would belong in the chiplets to reduce latency, and the new IF links don't seem wide enough to keep up with the bandwidth either (see the rough numbers after this list). OTOH the chiplets are tiny (way less than half of a Zeppelin) and that IO die is huge. There's no way the IO die has only DDR4 PHYs and IF glue; it has to have some huge cache for coherency (L4?).

2 - Chiplet diagram. Is it still two 4-core CCXs? One 8-core CCX? Argh..
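On the bandwidth point in question 1: for a rough sense of why a single IF link looks narrow next to a local L3, here's a back-of-envelope sketch. All of the figures are ballpark Zen-1-class assumptions, not disclosed Rome numbers.

# Ballpark comparison: per-CCX L3 bandwidth vs. a Zen-1-style on-package
# IF (IFOP) link. Every number is a rough assumption, not a disclosed spec.
core_clock_ghz   = 3.0   # assumed core/L3 clock
l3_bytes_per_clk = 32    # Zen-1-class L3 interface, ~32 B/cycle per CCX
ifop_gb_s        = 42    # ~42 GB/s per direction, Zen 1 IFOP at DDR4-2666

l3_gb_s = core_clock_ghz * l3_bytes_per_clk   # ~96 GB/s per CCX
print(f"local L3, per CCX : ~{l3_gb_s:.0f} GB/s")
print(f"one IF link       : ~{ifop_gb_s:.0f} GB/s")

With those guesses a single link delivers less than half of what a local L3 slice can, which is why an L3 sitting behind the IO die is hard to swallow.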

The cache situation is the part I'm worried/puzzled about. While 14nm probably makes a lot of sense for IO because it's cheap and IO doesn't scale well anyway, SRAM is a different story; in fact, it's probably the part that scales best. So why use 14nm instead of 12nm, for example? Would a 7nm chiplet full of SRAM tightly coupled to a 14nm IO die make any sense? Would it just be too expensive at this point?

How would FD-SOI processes fare for IO?

So many questions.
 
The question is how many different IO die designs they would need.

For server/desktop, three:
1. Eight IF ports, 8 DRAM channels, 128 PCIe 4 lanes (EPYC)
2. Four IF ports, 4 DRAM channels, 64 PCIe 4 lanes (Threadripper)
3. Two IF ports, 2 DRAM channels, 32 PCIe 4 lanes. Optional GPU (or extra CPU) chiplet hanging off the second IF port. (Ryzen)

For mobile, one:
1. Two IF ports, 2 DRAM channels, 8-16 PCIe 3/4 lanes, one or two GDDR6 x32 channels for on-package GDDR6 (120GB/s bandwidth using 15Gbit/s GDDR6).
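As a quick check of that 120GB/s figure, using only the numbers quoted above (two x32 channels at 15Gbit/s per pin; nothing official):

# Sanity check of the on-package GDDR6 bandwidth quoted above.
channels     = 2
pins_per_ch  = 32
gbit_per_pin = 15

total_gbit_s = channels * pins_per_ch * gbit_per_pin   # 960 Gbit/s
print(total_gbit_s / 8, "GB/s")                        # -> 120.0 GB/s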

Cheers
 
How large would a directory cache in the I/O hub have to be to index all L3 + L2 cache lines (or tags) -- a couple of megabytes?
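A rough back-of-envelope, where every capacity and entry size is a guess for a hypothetical 64-core part rather than a known Rome figure:

# Hypothetical full-directory sizing for a 64-core Rome-like part.
# All numbers are assumptions, not disclosed specs.
cores       = 64
l2_per_core = 512 * 1024          # 512 KiB private L2 per core
ccx_count   = cores // 4
l3_per_ccx  = 16 * 1024 * 1024    # 16 MiB L3 per 4-core CCX (guess)
line_size   = 64                  # bytes per cache line
entry_bits  = 40                  # tag + state + sharer info per entry (guess)

lines = (cores * l2_per_core + ccx_count * l3_per_ccx) // line_size
print(f"lines tracked  : {lines:,}")
print(f"directory size : {lines * entry_bits / 8 / 2**20:.1f} MiB")

With these guesses it lands in the tens of megabytes (~4.7M lines, ~22 MiB), so "a couple of megabytes" would only work with much smaller entries or coarser tracking granularity.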
 
How would FD-SOI processes fare for IO?

Really well. They actually do much better at IO than FinFETs; their deficits are on the logic side. I commented elsewhere that, since GloFo is dabbling in SRAM-like MRAM, which would have a density ~4-5 times better than SRAM on the same process, an IO die using that for cache would be a really neat fit for the technology. It would, of course, also depend on multiple pieces of unproven tech, so it's not likely something you'd risk your main new product introduction on.
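To put rough numbers on that density argument (the ~4-5x factor is the claim above; the SRAM density is a ballpark 14nm-class guess, not a GloFo figure):

# Illustration of the MRAM-vs-SRAM density argument with placeholder numbers.
sram_mm2_per_mb = 1.0     # ballpark 14nm-class SRAM density, array + overhead
mram_density_x  = 4.5     # the "~4-5 times better" claim above
area_mm2        = 100     # hypothetical cache budget on the IO die

sram_mb = area_mm2 / sram_mm2_per_mb
mram_mb = sram_mb * mram_density_x
print(f"~{sram_mb:.0f} MB SRAM vs ~{mram_mb:.0f} MB MRAM in {area_mm2} mm^2")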

However, one of the gains from disaggregating the IO die from the cores is that you can iterate on them separately. You first make the IO die on the most proven process you have the most experience with, just to make sure you don't flub a major product launch because of failures there. Then you can take risks on introducing new unproven tech as a point upgrade to the server chip line. If it fails, the hit isn't nearly as bad. And if it succeeds, you can in turn introduce new chiplets with the now-tested IO chip.

Another important thing is that DRAM standard transitions just became much easier for AMD. They can keep selling upgraded CPUs for SP3 as long as there is any market for it, as after the DDR5 transition they can still mate a new Zen chiplet with the old IO die. Most of the cost of a server is in RAM nowadays, and if they can sell their customers better utilization of that sunk cost, there might be a lot of interest.
 
Since AMD will need a new I/O die for Ryzen 3000 with 2 memory channels anyway, wouldn't it be economically wiser to include a GPU in it (making all Ryzen 3000 parts APUs) and use the same 7nm chiplets, instead of designing both a new I/O die and a new APU chiplet (either on 7nm or 12/14nm)?
>65W desktop APUs using 1 or 2 Zen2 chiplets plus a separate iGPU+IO die could make a lot of sense if AMD is planning on using the same chiplets for consumer products.
However for mobile solutions, integration = higher power efficiency, and AMD should want the iGPU to take advantage of 7nm too. Especially now that Intel will be betting more on their GPU line and has the will + resources to pump out different monolithic chips with different CPU cores / iGPU variations.


Wouldn’t this give atrocious latency?
Between the CPU and System RAM? Probably yes, for certain CPU-intensive tasks.
But does that make a lot of difference in office applications, games, GPGPU or professional rendering that leverages the GPU for raytracing? Maybe not..

It wasn't all that long ago that CPUs had the memory controller on the northbridge. On consoles, the 1st-gen X360 had the memory controller embedded in the GPU, and so did the original Xbox with the Pentium III accessing the RAM through NV2A.


While 14nm probably makes a lot of sense for IO because it's cheap and IO doesn't scale well anyway, SRAM is a different story; in fact, it's probably the part that scales best.
Then if there's no SRAM, why is that thing so huge?
It's not like it's using 45nm or even 28nm. It's using 14nm.

Maybe there's a whole bunch of eDRAM working as L4 in there, akin to Crystalwell but for inter-chiplet coherency?
 
Then if there's no SRAM, why is that thing so huge?
It's not like it's using 45nm or even 28nm. It's using 14nm.

Maybe there's a whole bunch of eDRAM working as L4 in there, akin to Crystalwell but for inter-chiplet coherency?

I do think there's a lot of SRAM in there, it's just that 14nm doesn't seem like a great choice for that. Perhaps 7nm was just deemed too expensive, but I find it rather odd. I guess that leaves one obvious way for Milan to improve upon Rome.
 
I don't know why people expect that consumer Ryzen will use chiplets.

There is no reason to believe that the consumer Ryzen 3000 series will look anything like this. If I had to bet, I would expect it to be a traditional CPU, because when you aren't dealing with this many cores, you lose all the advantage of splitting the die into these chiplets.
 
I don't know why people expect that consumer Ryzen will use chiplets.

There is no reason to believe that the consumer Ryzen 3000 series will look anything like this. If I had to bet, I would expect it to be a traditional CPU, because when you aren't dealing with this many cores, you lose all the advantage of splitting the die into these chiplets.

My expectations exactly
 
>65W desktop APUs using 1 or 2 Zen2 chiplets plus a separate iGPU+IO die could make a lot of sense if AMD is planning on using the same chiplets for consumer products.
However for mobile solutions, integration = higher power efficiency, and AMD should want the iGPU to take advantage of 7nm too. Especially now that Intel will be betting more on their GPU line and has the will + resources to pump out different monolithic chips with different CPU cores / iGPU variations.



Between the CPU and System RAM? Probably yes, for certain CPU-intensive tasks.
But does that make a lot of difference in office applications, games, GPGPU or professional rendering that leverages the GPU for raytracing? Maybe not..

It wasn't all that long ago that CPUs had the memory controller on the northbridge. On consoles, the 1st-gen X360 had the memory controller embedded in the GPU, and so did the original Xbox with the Pentium III accessing the RAM through NV2A.



Then if there's no SRAM, why is that thing so huge?
It's not like it's using 45nm or even 28nm. It's using 14nm.

Maybe there's a whole bunch of eDRAM working as L4 in there, akin to Crystalwell but for inter-chiplet coherency?
I previously found an AMD patent for prioritizing CPU RAM requests over GPU requests in an HSA environment, but the whole prospect makes me nervous.
 
For server/desktop, three:
1. Eight IF ports, 8 DRAM channels, 128 PCIe 4 lanes (EPYC)
2. Four IF ports, 4 DRAM channels, 64 PCIe 4 lanes (Threadripper)
3. Two IF ports, 2 DRAM channels, 32 PCIe 4 lanes. Optional GPU (or extra CPU) chiplet hanging off the second IF port. (Ryzen)
Threadripper is an extremely niche product; the only reason it exists is probably that it can reuse all the ordinary chips. So I highly doubt there'd be a special IO die for it - sure, using such a large IO die may seem wasteful, but not prohibitively so (plus the defective ones can be sold there). There's IMHO no way the expected number of parts sold would warrant an extra die (just look at how AMD avoided extra dies for Zen 1 for things shipping in a thousand times higher volume...).

As for Ryzen, it's not obvious whether AMD is going to take the same approach (if so, it definitely needs another IO die) or go with a monolithic die. For the APU, I'd strongly suspect it's going to be monolithic.
 
Next year, about the same time the next consoles release?

Hopefully RTRT capable. Even if it's worse than Nvidia's implementation, at least it'll be in the hands of console developers for a generation and therefore see widespread adoption.

In the interest of not cluttering up this thread, I've made quite a lengthy post in the next generation speculation thread, as has iroboto before me. You may wish to join us there.
 
I don't know why people expect that consumer Ryzen will use chiplets.

There is no reason to believe that the consumer Ryzen 3000 series will look anything like this.

You mean apart from the fact that with Zen1 AMD used the exact same Zeppelin die for Ryzen, Threadripper and Epyc, and they never stopped bragging about how that level of scalability allowed them to compete on several fronts using a single chip?

Yup, apart from that, no reason at all...
 
EPYC Naples and EPYC Rome side-by-side:

[image]
 
You mean apart from the fact that with Zen1 AMD used the exact same Zeppelin die for Ryzen, Threadripper and Epyc, and they never stopped bragging about how that level of scalability allowed them to compete on several fronts using a single chip?

Yup, apart from that, no reason at all...

Well, they could keep Ryzen as it is, with everything in one die. Or they could do what you say and use chiplets for the new Ryzen as well.
Both choices imply relying on something proven: one on a high-level architecture used successfully in the past, the other on a design approach that will have been cemented with Rome by then.

It's not immediately obvious that AMD will choose one over the other, pending further disclosures. Except to you. You seem heavily biased towards that choice for some reason.
 
You mean apart from the fact that with Zen1 AMD used the exact same Zeppelin die for Ryzen, Threadripper and Epyc, and they never stopped bragging about how that level of scalability allowed them to compete on several fronts using a single chip?

Yup, apart from that, no reason at all...
That worked for Zen 1; it doesn't mean it makes sense for Zen 2. What Zen 2 really does well is scale from 32 to 64 cores. Even on Zen 1, the APU used a separate die, so there's no reason to expect them to use only one die for every platform. There was a reason everything that is now part of the IO die got integrated into the CPU over the past 30 years; going back on that is just going to make things cost more and perform less.

Threadripper 3 will probably be the same configuration as Epyc 2. I have my doubts that Ryzen 3000 desktop will do this.
 