AMD: RDNA 3 Speculation, Rumours and Discussion

Deleted member 13524 · May 24, 2021

Bondrewd said:
Oh you better hope you have enough money to pay for N31

Well if there's a 160CU ~50 TFLOPs part coming then I can't imagine it being cheap.
Especially considering it's coming right after a crypto craze.

Here's hoping the slightly cut-down version with 2×60 or 2×72CUs activated doesn't break the bank.
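
For reference, the ~50 TFLOPs figure can be sanity-checked with some back-of-the-envelope arithmetic. This is only a rough sketch: it assumes each CU keeps RDNA2's 64 FP32 lanes with one FMA (2 FLOPs) per lane per clock, and the 2.5 GHz clock is a made-up illustrative value, not a leaked spec. The function name `fp32_tflops` is likewise just for illustration.

```python
def fp32_tflops(cus, clock_ghz, lanes_per_cu=64, flops_per_lane=2):
    """Peak FP32 throughput in TFLOPs.

    Assumes RDNA2-style CUs: 64 FP32 lanes per CU, one FMA
    (counted as 2 FLOPs) per lane per clock. Both defaults are
    assumptions carried over from RDNA2, not known RDNA3 figures.
    """
    return cus * lanes_per_cu * flops_per_lane * clock_ghz / 1000.0


# Hypothetical 2.5 GHz clock for the full part and the two
# cut-down configurations mentioned above (2x72 and 2x60 CUs).
for cus in (160, 144, 120):
    print(f"{cus} CUs @ 2.5 GHz: {fp32_tflops(cus, 2.5):.1f} TFLOPs")
```

Under those assumptions 160 CUs land at roughly 51 TFLOPs, so the rumoured number implies a clock somewhere around 2.4 to 2.5 GHz.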

dskneo · May 24, 2021

A 160CU RDNA GPU does sound surreal. But we now have two precedents: the Ryzen model in the CPU space, and the new pricing paradigm of 2021, which demonstrates that you could make a halo product for 3000€ and still be able to sell it.

If anything, the most viable argument for why such a GPU will not be made is the memory bandwidth requirement to allow for linear scaling across the board.

Deleted member 13524 · May 24, 2021

dskneo said:
If anything, the most viable argument for why such a GPU will not be made is the memory bandwidth requirement to allow for linear scaling across the board.

RDNA2's memory bandwidth requirements are relatively low because of Infinity Cache. Assuming there's a third I/O chip (large but cheap 12nm GlobalFoundries ASIC?) that takes in most of the non-compute related parts like PCIe, display output, video codecs, GDDR6 PHYs, etc. then at 5nm the main GPU chips could even increase the relative amount of L3.
I wouldn't be surprised if 160CU Navi 31 got away with a 384 or 320bit wide GDDR6 bus, especially if it uses 18Gbps chips.

They could also adopt HBM3, where the lower chip in the stack could be that same I/O chip along with die-to-die communication, which would forgo the necessity of an interposer. 2x HBM3 stacks at the predicted 512GB/s each would provide 1TB/s total external bandwidth.
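
To put numbers on the options above, peak GDDR6 bandwidth is just bus width times per-pin data rate divided by 8. A quick sketch follows; the helper names are made up for illustration, and the 512GB/s per HBM3 stack is the predicted figure from the post, not a confirmed spec.

```python
def gddr6_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s: bits per transfer * Gbps per pin / 8."""
    return bus_width_bits * pin_rate_gbps / 8


def hbm_bandwidth_gbs(stacks, per_stack_gbs):
    """Aggregate bandwidth of several HBM stacks in GB/s."""
    return stacks * per_stack_gbs


print(gddr6_bandwidth_gbs(384, 18))  # 384-bit bus, 18Gbps chips
print(gddr6_bandwidth_gbs(320, 18))  # 320-bit bus, 18Gbps chips
print(gddr6_bandwidth_gbs(256, 16))  # Navi 21 today: 256-bit, 16Gbps
print(hbm_bandwidth_gbs(2, 512))     # 2 stacks at the predicted 512GB/s
```

That works out to 864GB/s and 720GB/s for the two GDDR6 options, against 512GB/s on Navi 21 and 1024GB/s for the two-stack HBM3 case.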

dskneo · May 24, 2021

ToTTenTranz said:
RDNA2's memory bandwidth requirements are relatively low because of Infinity Cache.

It's precisely because of Infinity Cache that you know the memory requirements are very high. I don't see how just doubling the Infinity Cache size would provide linear scaling in the current RDNA2 GPU, let alone in a 160CU monster.

I believe the 160CU is possible. But I'm more intrigued by the memory solution to feed it.

Deleted member 90741 · May 24, 2021

ToTTenTranz said:
RDNA2's memory bandwidth requirements are relatively low because of Infinity Cache. Assuming there's a third I/O chip (large but cheap 12nm GlobalFoundries ASIC?) that takes in most of the non-compute related parts like PCIe, display output, video codecs, GDDR6 PHYs, etc. then at 5nm the main GPU chips could even increase the relative amount of L3.
I wouldn't be surprised if 160CU Navi 31 got away with a 384 or 320bit wide GDDR6 bus, especially if it uses 18Gbps chips.

They could also adopt HBM3, where the lower chip in the stack could be that same I/O chip along with die-to-die communication, which would forgo the necessity of an interposer. 2x HBM3 stacks at the predicted 512GB/s each would provide 1TB/s total external bandwidth.

The latest patent seems to indicate that the PHYs are located on the chiplet, while the bridge chiplet hosts the L3 and acts as the interconnect.

20210097013 :
ACTIVE BRIDGE CHIPLET WITH INTEGRATED CACHE

Abstract

A chiplet system includes a central processing unit (CPU) communicably coupled to a first GPU chiplet of a GPU chiplet array. The GPU chiplet array includes the first GPU chiplet communicably coupled to the CPU via a bus and a second GPU chiplet communicably coupled to the first GPU chiplet via an active bridge chiplet. The active bridge chiplet is an active silicon die that bridges GPU chiplets and allows partitioning of systems-on-a-chip (SoC) functionality into smaller functional chiplet groupings.

Also it looks like SoIC so they would need to be codesigned and therefore cannot use GF 12nm for example.
I think they are doing it mainly for yield

Bondrewd · May 24, 2021

ToTTenTranz said:
Well if there's a 160CU ~50 TFLOPs part coming then I can't imagine it being cheap.

Arguably not the hard part here.
Packaging is.
Granted, technically N31 has two parents (MI200 and the shortcake) for trial duties, but still.

ToTTenTranz said:
large but cheap 12nm GlobalFoundries ASIC?)

MCD is N6.
2022 AMD is nearly exclusively a TSMC shop.

ToTTenTranz said:
They could also adopt HBM3

That one is pretty far away.

Bondrewd · May 24, 2021

dskneo said:
But I'm more intrigued by the memory solution to feed it.

The ultimate fancy in all things fancy, yes.
Lower end parts like N33 are also real shit tbh, and they're still single die and N6.

madhatter · May 24, 2021

Bondrewd said:
Granted, technically N31 has two parents (MI200 and the shortcake) for trial duties, but still.

What's that? Is it a CPU?

Bondrewd said:
Lower end parts like N33 are also real shit tbh, and they're still single die and N6.

I'm curious about N33, not sure about some of the current rumors. It being N6 is to be expected, but is it really an 80 CU part?

Globalisateur · May 24, 2021

ethernity said:
The latest patent seems to indicate that the PHYs are located on the chiplet, while the bridge chiplet hosts the L3 and acts as the interconnect.

Also it looks like SoIC so they would need to be codesigned and therefore cannot use GF 12nm for example.
I think they are doing it mainly for yield

Looks like RDNA2 GPUs with their Infinity Cache (L3) were actually a prototype for the RDNA3 chiplet + L3 design. Logically, those chiplet GPUs could function the same way as RDNA2 + L3 GPUs.

PSman1700 · May 24, 2021

Kaotik said:
It's relevant to thread when you try to use it as argument (and pretty much as the only argument)

Tensor cores do absolutely nothing for DLSS except for speed. You can run the same calculations without matrix crunchers too. And we don't know if matrix crunchers are even optimal for DLSS, let alone any possible competitors.

AMD hasn't shown anything because it's not ready to be shown.

Acceleration is only about speed, you constantly try to argue it would matter for quality. It doesn't. There isn't any "reconstruction hardware".

And again, we don't know whether AMD will even use ML or not, algorithmic solution is always superior if you can match the quality.

Well, lol. Ray tracing hardware does absolutely nothing for RT, except for speed. You can run RT on Pascal hardware (or any hardware) as well. It's just that speed is the differentiator.
RDNA3 GPUs will probably perform much better in all tasks than RDNA2 and earlier ones.

dskneo said:
It's precisely because of Infinity Cache that you know the memory requirements are very high. I don't see how just doubling the Infinity Cache size would provide linear scaling in the current RDNA2 GPU, let alone in a 160CU monster.

I believe the 160CU is possible. But I'm more intrigued by the memory solution to feed it.

Infinity Cache is a really good addition to the RDNA2 GPUs; without it they would be much less performant.

dskneo · May 24, 2021

PSman1700 said:
Infinity Cache is a really good addition to the RDNA2 GPUs; without it they would be much less performant.

That is what I said.

Bondrewd · May 24, 2021

madhatter said:
Is it a CPU?

Yes.

madhatter said:
but is it really an 80 CU part?

Seems so.
Granted, AMD's 2022 mobile GPU stuff is currently TBD, so I can't answer which segment it targets or how much ass it kicks.

Deleted member 90741 · May 24, 2021

Bondrewd said:
Granted, technically N31 has two parents (MI200 and the shortcake) for trial duties, but still.

Aldebaran should launch alongside Trento within a quarter, so we will find out.
Both of them have exotic packaging, which should provide a good base for RDNA3.

Bondrewd · May 24, 2021

ethernity said:
Both of them have exotic packaging

Trento is actually a very simple purpose-built stick (Milan + IF3).

Deleted member 13524 · May 24, 2021

madhatter said:
I'm curious about N33, not sure about some of the current rumors. It being N6 is to be expected, but is it really an 80 CU part?

MCD for Navi 31 is N6, that's the Memory Complex Die, the chiplet with L3 that guarantees coherency between the Graphics Complex Dies (the ones with the CUs).
Unless @Bondrewd confirms the GCDs are also N6, I'm assuming those are N5.

I also don't think @Bondrewd suggested N33 is an 80CU part. Considering the power and performance requirements of the previous Navi x3 parts, as well as it being an N6 monolithic chip, I think it's more likely that Navi 33 is closer to 40CUs.

Bondrewd said:
Arguably not the hard part here.
Packaging is.

Are we looking at the MCD being stacked on top / below the GCDs then?
Or is it just using an incredibly dense substrate? Or even an interposer?

Bondrewd · May 24, 2021

ToTTenTranz said:
I'm assuming those are N5.

Compute tiles are that, yes.
N5p, really.

ToTTenTranz said:
Are we looking at the MCD being stacked on top / below the GCDs then?

The latter; now imagine the poor thing going thru thermal cycles with many many watts radiating from thingies above it.

iroboto · May 24, 2021

Bondrewd said:
That one is pretty far away.

Hard for me to believe we could feed such a wide GPU without it. You sure HBM3 isn’t in the cards for consumers sometime within the next 5-7 years? We have to be hitting an inflection point soon.

Bondrewd · May 24, 2021

iroboto said:
Hard for me to believe we could feed such a wide GPU without it.

AMD went from Vega10 to Navi21 perf at nearly the same offchip bandwidth.

iroboto said:
You sure HBM3 isn’t in the cards for consumers sometime within the next 5-7 years?

Maybe but 2022/2023 are still HBM2e milking.

iroboto · May 24, 2021

Bondrewd said:
AMD went from Vega10 to Navi21 perf at nearly the same offchip bandwidth.

Maybe but 2022/2023 are still HBM2e milking.

IMO, that just signals that another bottleneck was present that kept compute from saturating the available bandwidth; thus bandwidth had a negligible effect on its performance.

I’m not confident that 160CUs would be able to operate the same way. Increasing L3 cache has its limitations. You will need to hit slow RAM eventually. And if all 160CUs want to hit slow RAM often, bandwidth needs to be part of that equation.
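
The trade-off being argued here can be put into a toy model: whatever misses in the L3 has to come from DRAM, so DRAM traffic is roughly demand times miss rate. The sketch below is only that, a toy. The function names, the 1000GB/s baseline demand and the 50% hit rate are all made-up illustrative values, and it assumes bandwidth demand scales linearly with CU count, which is exactly the point under debate.

```python
def dram_traffic_gbs(demand_gbs, hit_rate):
    """DRAM traffic in GB/s: the share of demand that misses the L3."""
    return demand_gbs * (1.0 - hit_rate)


def required_hit_rate(demand_gbs, dram_bw_gbs):
    """Minimum L3 hit rate so that misses fit within DRAM bandwidth."""
    return max(0.0, 1.0 - dram_bw_gbs / demand_gbs)


# Hypothetical numbers: an 80CU part demanding 1000 GB/s at a 50% hit rate.
baseline = dram_traffic_gbs(1000, 0.5)

# Assume demand doubles with 160 CUs while the hit rate stays put.
doubled = dram_traffic_gbs(2000, 0.5)

# Hit rate needed to squeeze that into a 384-bit, 18Gbps bus (864 GB/s).
needed = required_hit_rate(2000, 864)

print(baseline, doubled, round(needed, 3))
```

With these made-up inputs, doubling the CUs at a fixed hit rate doubles DRAM traffic from 500GB/s to 1000GB/s, and fitting it into 864GB/s needs the hit rate to climb to about 57%. How much extra cache that takes depends on the workload's working set, which this model does not capture.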

Bondrewd · May 24, 2021

iroboto said:
I’m not confident that 160CUs would be able to operate the same way

You're really not supposed to.
Those things are brand spanking new and no one really did GPUs like that before.

iroboto said:
Increasing L3 cache has its limitations

Honestly, yes, we need another huge LLC GPU in the wild (Ponte Vecchio) to see and compare.
