AMD: Zen 2 (Ryzen/Threadripper 3000?, Epyc 8000?) Speculation, Rumours and Discussion

Gipsel · Nov 6, 2018

That's what the arrangement looks like:

DieH@rd · Nov 6, 2018

Zen 2 info from today's AMD presentation was interesting. Should be a great upgrade.

On a GPU front, I did not find 7nm Vega to be interesting. They just optimized it a lot for professional use. The wait for Navi continues.

fellix · Nov 6, 2018

Another angle:

cheapchips · Nov 6, 2018

Does the MI50 provide the first real world suggestion of next gen transistor count? 13.2bn on a 331mm² chip.
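As a rough check, the density implied by those two figures can be worked out directly. This is only a sketch using the numbers quoted in the post; the function name is made up for illustration.

```python
def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm²."""
    return transistors / area_mm2 / 1e6

# Figures quoted in the post: 13.2bn transistors on a 331 mm² die.
mi50_density = density_mtr_per_mm2(13.2e9, 331.0)
print(f"{mi50_density:.1f} MTr/mm²")
```

That works out to a little under 40 million transistors per mm², which is an average over the whole die and says nothing about how dense logic, cache or analog blocks are individually.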

CarstenS · Nov 6, 2018

And another one. Someone should train an AI that can reduce the margin of error on the die's edges in not-so-optimal photos....

mrcorbo · Nov 6, 2018

DieH@rd said:
Zen 2 info from today's AMD presentation was interesting. Should be a great upgrade.

On a GPU front, I did not find 7nm Vega to be interesting. They just optimized it a lot for professional use. The wait for Navi continues.

Expanding on this a bit, the "chiplet" architecture for Epyc also caught my attention. Is a next-gen console with standalone CPU and GPU chiplets connected to a shared IO hub chip on a MCM plausible?

Esrever · Nov 6, 2018

No matter how you measure, that IO die is HUGE. Looks like the IO die is about the same size as all 64 cores put together.

I wonder how much cache it contains.

fellix · Nov 6, 2018

Using dodgy Photoshop measuring:

Chiplet -- 71 mm²
I/O Hub -- 442 mm²

Assuming the package is the same as FCLGA-4094 for the current EPYC series (58.5 mm × 75.4 mm).
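The measuring approach described above amounts to using the known package size as a ruler. A minimal sketch of that scaling follows; the pixel values are hypothetical placeholders, not measurements from the actual photo, and the helper name is invented for illustration.

```python
# Package dimensions for FCLGA-4094 as quoted in the post (mm).
PACKAGE_W_MM = 58.5
PACKAGE_H_MM = 75.4

def die_area_mm2(die_px, package_px):
    """Estimate die area from (width, height) pixel measurements.

    Each axis gets its own mm-per-pixel scale, which compensates for a
    photo stretched along one axis. Perspective skew is not corrected.
    """
    mm_per_px_x = PACKAGE_W_MM / package_px[0]
    mm_per_px_y = PACKAGE_H_MM / package_px[1]
    return (die_px[0] * mm_per_px_x) * (die_px[1] * mm_per_px_y)

# Hypothetical pixel measurements (assumption, NOT from the real image).
package_px = (1170, 1508)  # gives 0.05 mm per pixel on both axes
chiplet_px = (200, 150)    # would correspond to 10.0 mm x 7.5 mm
print(f"{die_area_mm2(chiplet_px, package_px):.1f} mm²")
```

With a scale like this, being off by a few pixels on each edge of a small chiplet shifts the result by several mm², which is consistent with independent measurements landing a couple of mm² apart.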

no-X · Nov 6, 2018

My result is very close, 73 mm² / 434 mm². Despite the huge 14nm central core, core to mm² ratio is still 67 % better compared to original Epyc.
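The 67 % figure can be reproduced from the measurements above if the original Epyc is taken as four Zeppelin dies of about 213 mm² each. That die size is an assumption here (a commonly cited figure, not stated in this thread), as are the variable names.

```python
ZEPPELIN_MM2 = 213.0  # assumption: commonly cited Zeppelin die size, not from this thread

def cores_per_mm2(cores: int, total_silicon_mm2: float) -> float:
    """Cores per mm² of total silicon on the package."""
    return cores / total_silicon_mm2

# Original Epyc: 32 cores across four Zeppelin dies.
naples = cores_per_mm2(32, 4 * ZEPPELIN_MM2)
# Rome, using the measurements from the post: 8 chiplets of 73 mm² plus a 434 mm² IO die.
rome = cores_per_mm2(64, 8 * 73.0 + 434.0)

improvement = rome / naples - 1.0
print(f"{improvement:.0%} more cores per mm²")
```

Under those assumptions the ratio comes out at roughly two thirds better, matching the post, even though the IO die alone accounts for over 40 % of the total silicon.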

anexanhume · Nov 6, 2018

mrcorbo said:
Expanding on this a bit, the "chiplet" architecture for Epyc also caught my attention. Is a next-gen console with standalone CPU and GPU chiplets connected to a shared IO hub chip on a MCM plausible?

Even the Rome IO die doesn’t have the necessary bandwidth for GDDR6. They have to solve that and prove latency won’t kill performance.

3dilettante · Nov 6, 2018

Comparing the diagram that was shown first to the MCM picture indicates the IO die has 4 DDR channels on the top and bottom edge. The left and right sides have a pair of infinity fabric blocks next to each corner, with the middle third of the IO die's side given over to a non-differentiated IO block.
Elsewhere, it's been stated that Rome has adopted PCIe 4.0.

The CPU chips are grouped in an interesting fashion compared to the prior EPYC layout. In each quadrant of the MCM, there is a pair of CPU chiplets with much less space between them than there is between them and the IO die, and between them and the nearest chiplet pair below them.
The horizontal midline of the package may be wider partly from the path the IO must take from the central die to the left and right sides.

The lack of space between CPU pairs could mean there's a different topology than previously, and perhaps not as uniform as proposed earlier, if there are short-range links between them and the die nearest the IO chip is connected to both fabric links. While that may introduce an incremental amount of additional latency to the other die, this could give each die more overall bandwidth than giving each CPU chip one link to the IO block. If scaling the PCIe 4.0 bandwidth means xGMI is similarly double-bandwidth, raising the bandwidth of the on-package links to match would give each CPU pair enough link bandwidth to utilize all 8 channels in some peak demand situation.

Another possibility is the midline gap is broader in part to let each die's links wend their way to the IO die or to other clients as well.

tunafish · Nov 6, 2018

Esrever said:
I wonder how much cache it contains.

Assuming the die size measurements above, and that it is as dense as the L3 of Zeppelin (including the tag arrays), a die consisting only of cache would hold ~160MB. 128MB seems reasonable, given inefficiencies and the space needed by IO and presumed directory entries (for dual-socket coherency).

That's actually quite a bit less than I expected.
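The estimate above can be turned around to see what it implies for the area split. This sketch only back-calculates from the ~160MB and 128MB figures in the post; which of the two die measurements was used is not stated, so both are tried, and the function name is hypothetical.

```python
def implied_split(die_mm2, full_die_cache_mb=160.0, assumed_cache_mb=128.0):
    """Back out the cache density implied by the '~160MB if all cache' estimate.

    Returns (mm² per MB, area taken by the assumed cache, area left over).
    """
    mm2_per_mb = die_mm2 / full_die_cache_mb
    cache_mm2 = assumed_cache_mb * mm2_per_mb
    return mm2_per_mb, cache_mm2, die_mm2 - cache_mm2

# The two IO die measurements posted earlier in the thread.
for die in (434.0, 442.0):
    density, cache_area, rest = implied_split(die)
    print(f"{die:.0f} mm² die: {density:.2f} mm²/MB, "
          f"{cache_area:.0f} mm² cache, {rest:.0f} mm² for everything else")
```

Either way, 128MB would leave only about a fifth of the die for DDR4 PHYs, fabric links and other IO, so the real cache capacity could well be lower if those blocks need more room.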

3dilettante said:
Another possibility is the midline gap is broader in part to let each die's links wend their way to the IO die or to other clients as well.

Another hypothesis is power. You want the main power and ground lines to go straight through to the chip, and the socket is designed for Zeppelins to be where those chiplets were placed.

anexanhume · Nov 6, 2018

29% IPC gain over Zen 1 in DKERN RSA+.

https://www.amd.com/en/press-releas...ance-datacenter-computing-to-the-next-horizon

And based on this, Zen 2 is around 70-80mm^2. That’s no memory controller, but it does include PCIe. That’s pretty good news for next gen.

Tkumpathenurple · Nov 6, 2018

I found it quite interesting that they're touting performance of 7.4TF FP64, 14.8TF FP32, 29.5 FP16, 59.5 INT8, and 118 INT4.
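Those figures follow the usual peak-throughput arithmetic: two operations (one fused multiply-add) per lane per clock, halving for FP64 and doubling at each step down in precision. A sketch, assuming 4096 stream processors at a 1.8 GHz peak clock; both values are taken from AMD's published MI60 specs rather than from this thread.

```python
def peak_tflops_fp32(stream_processors: int, clock_ghz: float) -> float:
    """Peak FP32 rate: one FMA (2 FLOPs) per lane per clock, in TFLOPS."""
    return 2 * stream_processors * clock_ghz / 1000.0

# Assumed MI60 configuration (not stated in the thread).
fp32 = peak_tflops_fp32(4096, 1.8)

# Rate relative to FP32 for each format, as implied by the quoted numbers.
rates = {
    "FP64": fp32 / 2,
    "FP32": fp32,
    "FP16": fp32 * 2,
    "INT8": fp32 * 4,
    "INT4": fp32 * 8,
}
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

The results land within rounding distance of the quoted numbers. These are theoretical peaks, so they say little about sustained performance in any particular workload.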

It puts AMD very close to RTRT hardware by the end of this year.

Am I right in thinking that some BVH accelerating hardware would take up little of the MI60's 331mm² die, but would put it in the same realm as the RTX 2070 and 2080?

If so, I'm very much coming round to the idea of RTRT in the next gen consoles. I'd still like a two tier launch, and at this rate, it looks like even a low tier option could be looking at some 8-10TF of RTRT capable hardware.

The 6TF X1X renders Red Dead Redemption 2 at native 4K. Imagine what it would look like with an additional 4TF of RTRT and ML hardware taking care of all of the shadows, upscaling, and AA. Maybe reflections too, if that's not too costly.

:yep2:

Deleted member 13524 · Nov 7, 2018

anexanhume said:
29% IPC gain over Zen 1 in DKERN RSA+.

https://www.amd.com/en/press-releas...ance-datacenter-computing-to-the-next-horizon

And based on this, Zen 2 is around 70-80mm^2. That’s no memory controller, but it does include PCIe. That’s pretty good news for next gen.

Not having a memory controller on the CPU chiplet but keeping the PCIe actually leaves it with some interesting options.
We know Vega 20 uses an off-chip Infinity Fabric with 50GB/s full-duplex per link. A future APU could use the same chiplet connected through one or two of these IF links and use the GPU's memory controller (which has HBCC anyway). Most I/O could still be implemented through the chiplet using the PCIe lanes.

Now, I don't think Sony or Microsoft will go back to multi-chip solutions when AMD seems to be able to mix and match CCXs with modular GPUs with relative ease (like the Subor SoC). But some PC OEMs could now order MCM APUs to put into SFF PC/console hybrids, without having to pay for the development of a custom chip.

Tkumpathenurpahl said:
Am I right in thinking that some BVH accelerating hardware would take up little of the MI60's 331mm² die, but would put it in the same realm as the RTX 2070 and 2080?

Yes, though RTRT performance would still be terrible.

beyondtest · Nov 7, 2018

Could someone give me more insight?

I've seen many negative comments in Anand saying the new product is 50% more dense but only 20% better in performance?

DavidGraham · Nov 7, 2018

Tkumpathenurpahl said:
I found it quite interesting that they're touting performance of 7.4TF FP64, 14.8TF FP32, 29.5 FP16, 59.5 INT8, and 118 INT4.

It puts AMD very close to RTRT hardware by the end of this year.

TF numbers alone are not enough to extrapolate performance between NVIDIA and AMD. NVIDIA gets higher performance than its TF numbers would suggest, because it compensates with better polygon throughput, pixel fillrate, texturing and higher effective memory bandwidth. It also has more advanced tiled rendering than AMD.

Deleted member 13524 · Nov 7, 2018

ToTTenTranz said:
I wonder if AMD could make the IO chip at GlobalFoundries using 12nm, at least to guarantee some compliance with the fab agreement.

Called it. Ha!
(Ok so it's not 12nm it's 14nm, bleh..)

Things I want to know but they didn't release:

1 - Where's the L3 cache? One would think it would belong in the chiplets to reduce latency and the new IF links don't seem to be wide enough to keep up with the bandwidth. OTOH the chiplets are tiny (way less than half of a Zeppelin) and that I/O die is huge. There's no way the I/O die has only DDR4 PHYs and IF glue, it has to have some huge cache for coherency (L4?).

2 - Chiplet diagram. Is it still two 4-core CCXs? One 8-core CCX? Argh..

anexanhume · Nov 7, 2018

beyondtest said:
Could someone give me more insight?

I've seen many negative comments in Anand saying the new product is 50% more dense but only 20% better in performance?

1.25x performance for the same power. This is on TSMC, not AMD. Interconnect resistivity likely rearing its ugly head. Rumors were that Apple weren’t pleased either.

ToTTenTranz said:
Not having a memory controller on the CPU chiplet but keeping the PCIe actually leaves it with some interesting options.
We know Vega 20 uses an off-chip Infinity Fabric with 50GB/s full-duplex per link. A future APU could use the same chiplet connected through one or two of these IF links and use the GPU's memory controller (which has HBCC anyway). Most I/O could still be implemented through the chiplet using the PCIe lanes.

Now, I don't think Sony or Microsoft will go back to multi-chip solutions when AMD seems to be able to mix and match CCXs with modular GPUs with relative ease (like the Subor SoC). But some PC OEMs could now order MCM APUs to put into SFF PC/console hybrids, without having to pay for the development of a custom chip.

Yes, though RTRT performance would still be terrible.

Wouldn’t this give atrocious latency?

Tkumpathenurple · Nov 7, 2018

DavidGraham said:
TF numbers alone are not enough to extrapolate performance between NVIDIA and AMD. NVIDIA gets higher performance than its TF numbers would suggest, because it compensates with better polygon throughput, pixel fillrate, texturing and higher effective memory bandwidth. It also has more advanced tiled rendering than AMD.

I get that Nvidia's generally considered to be better, but AMD's going to provide the tech behind the PS5, for certain, and probably the XBoxTwo.

And if a late 2018, 7nm iteration of Vega can be used for ML, their hardware isn't that far away from being comparable in RTRT aptitude.

So my point isn't that AMD are going to come along with the best RTRT solution, just that it's promising that they're already so close to having one. It bodes well for industry support and inclusion in at least one of the PS5 and XBoxTwo.
