AMD: RDNA 3 Speculation, Rumours and Discussion

Status
Not open for further replies.
Oh you better hope you have enough money to pay for N31

Well if there's a 160CU ~50 TFLOPs part coming then I can't imagine it being cheap.
Especially considering it's coming right after a crypto craze.

Here's hoping the slightly cut-down version with 2×60 or 2×72 CUs activated doesn't break the bank.
 
A 160CU RDNA GPU does sound surreal. But we now have two precedents: the Ryzen model in the CPU space, and the new pricing paradigm of 2021, which demonstrates that you can make a halo product for 3000€ and still be able to sell it.

If anything, the most viable argument against such a GPU being made is the memory bandwidth required to allow for linear scaling across the board.
 
If anything, the most viable argument against such a GPU being made is the memory bandwidth required to allow for linear scaling across the board.

RDNA2's memory bandwidth requirements are relatively low because of Infinity Cache. Assuming there's a third I/O chip (large but cheap 12nm GlobalFoundries ASIC?) that takes in most of the non-compute related parts like PCIe, display output, video codecs, GDDR6 PHYs, etc. then at 5nm the main GPU chips could even increase the relative amount of L3.
I wouldn't be surprised if a 160CU Navi 31 got away with a 384- or 320-bit GDDR6 bus, especially if it uses 18Gbps chips.


They could also adopt HBM3, where the lower chip in the stack could be that same I/O chip along with die-to-die communication, which would do away with the need for an interposer. 2x HBM3 stacks at the predicted 512GB/s each would provide 1TB/s of total external bandwidth.
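For reference, the raw external bandwidth those two speculated configurations imply is easy to sanity-check. The bus widths, pin speeds, and per-stack HBM3 figure below are the ones floated above, not confirmed specs:

```python
# Raw GDDR6 bandwidth: bus width (bits) / 8 bits-per-byte * per-pin speed (Gbps)
def gddr6_bandwidth_gbs(bus_width_bits: int, pin_speed_gbps: float) -> float:
    return bus_width_bits / 8 * pin_speed_gbps

print(gddr6_bandwidth_gbs(384, 18))  # 384-bit @ 18 Gbps -> 864.0 GB/s
print(gddr6_bandwidth_gbs(320, 18))  # 320-bit @ 18 Gbps -> 720.0 GB/s

# Speculated HBM3 option: two stacks at 512 GB/s each
print(2 * 512)  # -> 1024 GB/s, i.e. ~1 TB/s
```

So even the narrower 320-bit option at 18Gbps would land within striking distance of the speculated two-stack HBM3 setup, which is what makes the "wide L3 plus modest bus" argument plausible.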
 
RDNA2's memory bandwidth requirements are relatively low because of Infinity Cache.

It's precisely because of Infinity Cache that you know the memory requirements are very high. I don't see how just doubling the Infinity Cache size would provide linear scaling in the current RDNA2 GPUs, let alone in a 160CU monster.

I believe the 160CU part is possible. But I'm more intrigued by the memory solution to feed it.
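The disagreement here is really about effective bandwidth: a last-level cache multiplies the bandwidth the CUs see by serving hits at on-die speed. A minimal model makes both positions concrete. The hit rate and bandwidth figures below are purely illustrative assumptions, not AMD numbers:

```python
# Effective bandwidth seen by the CUs with a big L3 in front of DRAM:
# eff_bw = hit_rate * cache_bw + (1 - hit_rate) * dram_bw
def effective_bandwidth(hit_rate: float, cache_bw_gbs: float, dram_bw_gbs: float) -> float:
    return hit_rate * cache_bw_gbs + (1 - hit_rate) * dram_bw_gbs

# Illustrative: ~58% hit rate, ~2000 GB/s cache bandwidth, 512 GB/s DRAM
print(effective_bandwidth(0.58, 2000, 512))

# The catch: hit rate falls as the working set outgrows the cache,
# so doubling CUs without growing L3 pushes traffic back to slow DRAM.
print(effective_bandwidth(0.40, 2000, 512))
```

The second call shows why "just double the cache" is contested: if the hit rate drops, effective bandwidth collapses toward the DRAM figure no matter how fast the cache itself is.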
 
RDNA2's memory bandwidth requirements are relatively low because of Infinity Cache. Assuming there's a third I/O chip (large but cheap 12nm GlobalFoundries ASIC?) that takes in most of the non-compute related parts like PCIe, display output, video codecs, GDDR6 PHYs, etc. then at 5nm the main GPU chips could even increase the relative amount of L3.
I wouldn't be surprised if a 160CU Navi 31 got away with a 384- or 320-bit GDDR6 bus, especially if it uses 18Gbps chips.


They could also adopt HBM3, where the lower chip in the stack could be that same I/O chip along with die-to-die communication, which would do away with the need for an interposer. 2x HBM3 stacks at the predicted 512GB/s each would provide 1TB/s of total external bandwidth.

The latest patent seems to indicate the PHYs are located on the chiplet, while the bridge chiplet hosts the L3 and acts as the interconnect.
20210097013:
ACTIVE BRIDGE CHIPLET WITH INTEGRATED CACHE

Abstract

A chiplet system includes a central processing unit (CPU) communicably coupled to a first GPU chiplet of a GPU chiplet array. The GPU chiplet array includes the first GPU chiplet communicably coupled to the CPU via a bus and a second GPU chiplet communicably coupled to the first GPU chiplet via an active bridge chiplet. The active bridge chiplet is an active silicon die that bridges GPU chiplets and allows partitioning of systems-on-a-chip (SoC) functionality into smaller functional chiplet groupings.




Also it looks like SoIC, so they would need to be co-designed and therefore cannot use GF 12nm, for example.
I think they are doing it mainly for yield.
 
The latest patent seems to indicate the PHYs are located on the chiplet, while the bridge chiplet hosts the L3 and acts as the interconnect.





Also it looks like SoIC, so they would need to be co-designed and therefore cannot use GF 12nm, for example.
I think they are doing it mainly for yield.
Looks like the RDNA2 GPUs with their Infinity Cache (L3) were actually a prototype for the RDNA3 chiplet + L3 design. Logically those chiplet GPUs could function the same way as the RDNA2 + L3 GPUs.
 
It's relevant to the thread when you try to use it as an argument (and pretty much as the only argument).

Tensor cores do absolutely nothing for DLSS except for speed. You can run the same calculations without matrix crunchers too. And we don't know if matrix crunchers are even optimal for DLSS, let alone any possible competitors.

AMD hasn't shown anything because it's not ready to be shown.

Acceleration is only about speed, you constantly try to argue it would matter for quality. It doesn't. There isn't any "reconstruction hardware".

And again, we don't know whether AMD will even use ML or not; an algorithmic solution is always superior if you can match the quality.

Well, lol. Ray tracing hardware does absolutely nothing for RT except speed. You can run RT on Pascal hardware (or any hardware) as well. It's just that speed is the differentiator.
RDNA3 GPUs will probably perform much better in all tasks compared to RDNA2 and earlier ones.

It's precisely because of Infinity Cache that you know the memory requirements are very high. I don't see how just doubling the Infinity Cache size would provide linear scaling in the current RDNA2 GPUs, let alone in a 160CU monster.

I believe the 160CU part is possible. But I'm more intrigued by the memory solution to feed it.

Infinity Cache is a really good addition to the RDNA2 GPUs; without it they would be much less performant.
 
Granted, technically N31 has two parents (MI200 and the shortcake) for trial duties, but still.
Aldebaran should launch alongside Trento within a quarter, so we will find out.
Both of them have exotic packaging, which should provide a good base for RDNA3.
 
I'm curious about N33, not sure about some of the current rumors. It being N6 is to be expected, but is it really an 80 CU part?
MCD for Navi 31 is N6, that's the Memory Complex Die, the chiplet with L3 that guarantees coherency between the Graphics Complex Dies (the ones with the CUs).
Unless @Bondrewd confirms the GCDs are also N6, I'm assuming those are N5.

I also don't think @Bondrewd suggested N33 is an 80CU part. Considering the power and performance requirements of the previous Navi x3 parts, as well as it being a N6 monolithic chip, I think it's more likely that Navi 33 is closer to 40CUs.

Arguably not the hard part here.
Packaging is.
Are we looking at the MCD being stacked above or below the GCDs then?
Or is it just using an incredibly dense substrate? Or even an interposer?
 
That one is pretty far away.
Hard for me to believe we could feed such a wide GPU without it. You sure HBM3 isn't in the cards for consumers sometime within the next 5-7 years? We have to be hitting an inflection point soon.
 
AMD went from Vega10 to Navi21 perf at nearly the same off-chip bandwidth.

Maybe, but 2022/2023 are still HBM2e milking.
Imo, that just signals that another bottleneck was present that prevented the compute from saturating the available bandwidth; thus bandwidth was not the limiting factor for its performance.

I'm not confident that 160CUs would be able to operate the same way. Increasing L3 cache has its limitations. You will need to hit slow RAM eventually. And if all 160CUs want to hit slow RAM often, bandwidth needs to be part of that equation.
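A back-of-the-envelope way to frame this concern is off-chip bandwidth per CU. The Navi 21 figures below (80 CUs, 512 GB/s from its 256-bit GDDR6 bus) are public specs; everything about the 160CU part is pure speculation:

```python
# Off-chip bandwidth per CU as a rough scaling sanity check.
navi21_cus = 80
navi21_bw_gbs = 512  # 256-bit GDDR6 @ 16 Gbps

per_cu = navi21_bw_gbs / navi21_cus
print(per_cu)  # 6.4 GB/s of off-chip bandwidth per CU

# Holding that ratio at 160 CUs would require:
print(160 * per_cu)  # 1024 GB/s -> ~1 TB/s off-chip
```

That 1TB/s figure is exactly what the two-stack HBM3 speculation earlier in the thread provides, and it is what a wider L3 would otherwise have to absorb via a higher hit rate if the bus stays narrower.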
 