AMD RDNA3 Specifications Discussion Thread

Btw, 6nm for the Infinity Cache is still a good process. On paper, wouldn't something like 12nm be possible, for a cheaper product?
Those dice are so smol they have to be really cheap anyway.
Especially compared to increasing the main die by that much area.
 
It actually seems to be working pretty well. Without co-issue you're looking at 30 TFLOPS, which is just 30% more than the 6900 XT. In order to hit AMD's 50-70% numbers there must be quite a bit of co-issue firing. Or is there some other trickery where multiple wavefronts can execute in the same clock?
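For reference, here's the back-of-the-envelope arithmetic behind those numbers (a rough sketch; the 96 CUs, 64 FP32 lanes per CU, and ~2.5GHz clock are my assumptions):

```python
# Rough FP32 throughput for Navi31 (all inputs are assumptions).
cus = 96          # compute units
lanes = 64        # FP32 lanes per CU, before dual-issue
fma = 2           # an FMA counts as 2 ops
clock_ghz = 2.5   # assumed game clock

base_tflops = cus * lanes * fma * clock_ghz / 1000  # no co-issue
dual_tflops = base_tflops * 2                       # perfect dual-issue

print(base_tflops)  # 30.72 -> the "30 TFLOPS" figure
print(dual_tflops)  # 61.44 -> AMD's headline ~61 TFLOPS
```

So the gap between 30% faster and AMD's claimed 50-70% has to come from however much of that second issue slot actually gets filled.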
It's that old problem of VS, PS and CS bottlenecks - and with AMD talking about "front end" stuff - it's looking like it'll be really hard to rationalise this.

I'm assuming that VS and CS both work in native hardware thread mode, i.e. 32 work items.

We have this unanswered question about whether the hardware can dual-issue from two hardware threads. If not, then it's down to instruction-level parallelism within a single hardware thread. There's no doubt there is some...

There have been many changes in the CU: with things like bigger caches and more vector registers you've gotta wonder whether just those changes back-ported to RDNA 2 would have made a big difference. The occupancy shown here:

[attached occupancy screenshot]

seems pretty bad. 1% full occupancy?

I haven't been through the slide deck yet, though...
 
Doesn't leave much room going by the lowerish numbers, assuming ~1.5x is about average for the 7900XTX.
7900XTX ~1.5x Navi21
7900XT ~1.35x Navi21
7800XTX ~1.2x Navi21
7800XT ~1.05x Navi21
7700XTX <Navi21
Best case would seem to put the highest Navi33 somewhere between 6800 and 6800XT which is a pretty big gap, so likely closer to 6800.

Edit- Navi33 is rumored to be ~300mm2, wasn't it?
From what I saw from N31, I wouldn't be surprised if N33 is just a bit faster than N22.
The biggest miscalculation was the low clock speed; even Nvidia is clocked higher.
I seriously thought at least 3GHz wouldn't be a problem.
 
Those dice are so smol they have to be really cheap anyway.
Especially compared to increasing the main die by that much area.
Looking at die costs and excluding the extra cost of packaging and any extra wafer prep for the interconnect...
MCDs are ~$4 each, <$5, assuming 95%+ yield.
GCDs are ~$90-$95 each, <$100, assuming ~90% yield.
If it is on an interposer, add another ~$10.
2nd Edit- Forgot the slide deck said they are using Elevated Fanout Bridge for the interconnect, so supposedly "standard flipchip processing."

A monolithic ~525mm2 die on 5nm would be ~$170-$190 each, <$200, assuming ~85% yield.
So say ~$130 for Navi31, which would be ~25-30% cheaper than if they had gone monolithic, based on my rough estimations.
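As a sanity check on those estimates, here's a rough die-cost sketch using the standard dies-per-wafer approximation (the ~$8k N6 and ~$16k N5 wafer prices, and the 37/306/525mm2 die sizes, are my assumptions, not confirmed figures):

```python
import math

def dies_per_wafer(die_mm2, wafer_d_mm=300.0):
    """Common approximation: gross dies on the wafer minus edge losses."""
    r = wafer_d_mm / 2
    return int(math.pi * r**2 / die_mm2 - math.pi * wafer_d_mm / math.sqrt(2 * die_mm2))

def cost_per_good_die(wafer_usd, die_mm2, yield_rate):
    return wafer_usd / (dies_per_wafer(die_mm2) * yield_rate)

# Wafer prices are my guesses (~$8k for N6, ~$16k for N5).
print(round(cost_per_good_die(8000, 37, 0.95), 2))    # MCD:  ~$4.68
print(round(cost_per_good_die(16000, 306, 0.90), 2))  # GCD:  ~$92.59
print(round(cost_per_good_die(16000, 525, 0.85), 2))  # mono: ~$179.27
```

With those assumed wafer prices the numbers land in the same ballpark as the estimates above; change the wafer prices and everything scales accordingly.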

Edit- Also found the apparent ratio of the MCD interesting, looking at the attached slide and assuming it is somewhat accurate.

Using my very, very rough estimates...
Interconnect on the side closest to the GCD appears to be ~10% the area, ~3.7mm2.
64bit MC appears to be ~25% the area, ~9mm2.
The 16MB of 2nd Gen IC appears to be ~66% the area, ~24.4mm2.

The CPU V-cache is 64MB @ 41mm2, so V-cache is ~10mm2 per 16MB, while 16MB of GPU IC is more than twice that size?
Is the size difference due to the speed necessary for a GPU cache, or is there extra logic involved with GPU IC?
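Putting rough numbers on that comparison (using the 41mm2 V-cache die and the ~24.4mm2 IC estimate above; note the V-cache die is nearly all SRAM arrays, while the MCD cache region presumably includes tags and control logic):

```python
# CPU 3D V-Cache vs estimated GPU Infinity Cache density (rough figures).
vcache_mb, vcache_mm2 = 64, 41.0    # Zen3 V-Cache die
ic_mb, ic_mm2 = 16, 24.4            # rough MCD cache-region estimate above

vcache_density = vcache_mb / vcache_mm2  # ~1.56 MB/mm2
ic_density = ic_mb / ic_mm2              # ~0.66 MB/mm2
print(round(vcache_density / ic_density, 1))  # ~2.4x denser
```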
 

Attachments

  • AMD RDNA 3 Tech Day_Press Deck 20_575px.png
  • AMD ACP Press Deck_23.jpg
"Maxxed out settings" can mean high resolutions at high framerates too. The person purchasing a $900 GPU may not want to play these games at 1080p to get access to the 120fps mode.

Consoles still provide significantly different experiences in many aspects of gaming; the PC as a gaming platform did not only gain worth because Nvidia had better-performing RT. If, like you're saying, there's no point to PC gaming unless you go full RT, good luck watching the quality of ports shit the bed even harder when your install base is that minuscule.

Yeah I respect that if you want ultra high frame rates at high resolutions above all else then RT might be a non-starter on any card. But it seems to me a strange compromise to strive for crazy high framerates and particularly for very high native res (especially where upscaling is available) on a 50-60TF $900+ GPU while at the same time having the core graphics look worse than what is available on a $300 4TF Xbox Series S - which would certainly be the case in some titles without RT.

E-sports players, for whom high frame rates are essential, are a clear exception to this of course. But putting aside frame rate: if I care enough about graphics to insist they must be running at 4K native, but then have a Series S pushing better core graphics than me outside of resolution, I'm not sure that would sit right. If I'm going to spend that much on a GPU, I want better frame rate, better image quality and better graphics than a $300 or even a $500 console. To me, that's why RT performance is important. Without it, I can potentially only have 2 of those things, and not all 3, which isn't great if I'm spending $900+ on a GPU alone.

And look around at the ASP of cards these days! $900-$1k isn't some exotic price tier anymore, I sure as fuck wish it was the endpoint for the high end, but far from it.

I hope you're wrong, but after this launch I am starting to suspect you're right. Unfortunately I'm not sure PC gaming can survive this pricing structure if we're going to have to spend near double the price of a console, 2 years after its launch, on the GPU alone just to get something that's significantly (as in 2x or more) faster.
 
Unfortunately I'm not sure PC gaming can survive this pricing structure if we're going to have to spend near double the price of a console 2 years after its launch on the GPU alone just to get something that's significantly (as in 2x or more) faster
Moore's law is dead.
RDNA3 went to tiles to keep costs sane.
RDNA4 will be even more tiled to keep costs sane.
After that, all we can do is spam more silicon at the problem, aka costs go up again, and by a lot.
 
Moore's law is dead.
RDNA3 went to tiles to keep costs sane.
RDNA4 will be even more tiled to keep costs sane.
After that, all we can do is spam more silicon at the problem, aka costs go up again, and by a lot.
I'd like to see SLI/Crossfire-type solutions come back. It'd be great if I could buy two cards and just combine them to reach higher resolutions, instead of just one card that is drawing a crazy amount of power.
 
No, please, not that kind of solution again...

No way.
You're getting very hueg MCM configs instead for that.
Unfortunately big money yes.

Why not.

For instance, I have a 3080 right now. I would love to buy a used 3080 and get a lot more performance for my games at a fraction of the cost. I can buy another 3080 for under $400 and I've seen some go for $300. Of course, scaling would have to be good.
 
I'd like to see SLI/Crossfire-type solutions come back. It'd be great if I could buy two cards and just combine them to reach higher resolutions, instead of just one card that is drawing a crazy amount of power.

Yes... I always had a dual-GPU setup back in the day; SLI/Crossfire being killed off is part of the reason we have these massive and expensive GPUs now.

The people who always bought two or three of the top-end GPUs for the absolute best performance are now the people who buy GPUs like the RTX 4090.

The old ultra-high-end, uber GPUs used to have two smaller cores on a single PCB, and that worked extremely well.

Using the old way of doing it, the RTX 3080 would have been the largest single GPU and the RTX 3090 would have been a pair of RTX 3070/3080 cores on a single PCB. That would give more performance than the RTX 3090 we ultimately got (scaling dependent) and maybe even be cheaper to buy.
 
Edit- Also found the apparent ratio of the MCD interesting, looking at the attached slide and assuming it is somewhat accurate.

Using my very, very rough estimates...
Interconnect on the side closest to the GCD appears to be ~10% the area, ~3.7mm2.
64bit MC appears to be ~25% the area, ~9mm2.
The 16MB of 2nd Gen IC appears to be ~66% the area, ~24.4mm2.
The image in the press deck is pretty blurry, but the cache portion is significantly smaller than that. I brightened it a bit:

amdrdna3techday_press4mece.png


I would estimate around a 44% ratio of the total cache part (so not just the SRAM arrays) of the 37mm² for the MCD making it maybe ~16mm².
2nd Edit- Forgot the slide deck said they are using Elevated Fanout Bridge for the interconnect, so supposedly "standard flipchip processing."
Where did you see that? Is that a conclusion a few tech sites drew, or did AMD specifically say this? I can't find it in the slide deck and I can't remember hearing it in the presentation.
EFBs are not the cheapest option here. A very fine pitch organic redistribution layer (TSMC calls this "InFo-R" or "InFo-oS", the difference is not entirely clear to me) could also be used, I guess.
 
The image in the press deck is pretty blurry, but the cache portion is significantly smaller than that. I brightened it a bit:

amdrdna3techday_press4mece.png


I would estimate around a 44% ratio of the total cache part (so not just the SRAM arrays) of the 37mm² for the MCD making it maybe ~16mm².

Where did you see that? Is that a conclusion a few tech sites drew, or did AMD specifically say this? I can't find it in the slide deck and I can't remember hearing it in the presentation.
EFBs are not the cheapest option here. A very fine pitch organic redistribution layer (TSMC calls this "InFo-R" or "InFo-oS", the difference is not entirely clear to me) could also be used, I guess.
Thanks. ~16mm2 would make more sense.

I thought I had attached it to my other post but underneath the EFB example image they listed the "advantages".
AMD ACP Press Deck_23.jpg
Edit- And if you meant where the slide came from, it was from Anandtech's article.
For the purposes of today’s announcement, AMD has not gone into great depth on how they managed to make a chiplet-based GPU work, but they have confirmed a few important details. First and foremost, in order to offer the die-to-die bandwidth needed have the memory subsystem located off-chip, AMD is using their Elevated Fanout Bridge (EFB) packaging technology, which AMD first used for their Instinct MI200 series accelerators (CDNA2). On those accelerator parts it was used to hook up the monolithic GPUs to each other, as well as HBM2e memory. On RDNA 3, it’s being used to hook up the MCDs to the GCD.

Notably, Elevated Fanout Bridge is a non-organic packaging technology, which is to say it’s complex. That AMD is able to get 5.3TB/second of die-to-die bandwidth via it underscores its utility, but it also means that AMD is undoubtedly paying a good deal more for packaging on Big Navi 3x than they were on Navi 21 (or Ryzen 7000).

Internally, AMD is calling this memory-to-graphics link Infinity Link. Which, as the name implies, is responsible for (transparently) routing AMD’s Infinity Fabric between dies.
https://www.anandtech.com/show/1763...first-rdna-3-parts-to-hit-shelves-in-december
 
From what I saw from N31, I wouldn't be surprised if N33 is just a bit faster than N22.
It's basically just Navi 23 with an upgraded architecture. Same CUs, similar die size, same memory bus, same memory type, same process family. Which should make it pretty interesting to look at.

Not expecting anything more than 6700XT performance, but they'll be able to offer it at $300-350.
 
How much energy does the chiplet interconnection cost?
No idea.
Much lower-bandwidth connections from Intel, TSMC and AMD range from 0.5pJ/bit to 2pJ/bit.
The only thing I'm aware of that is even close is Apple's interconnect on the M1 Ultra at 2.5TB/s, but no idea on power.
Haven't really looked into the HPC stuff to compare.
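For a sense of scale, those pJ/bit figures translate directly into watts at Navi31's quoted 5.3TB/s die-to-die bandwidth (illustrative arithmetic only; the real interconnect's energy per bit is unknown):

```python
# Interconnect power = bandwidth x energy per bit (illustrative only).
def link_watts(bandwidth_tb_s, pj_per_bit):
    bits_per_s = bandwidth_tb_s * 1e12 * 8   # TB/s -> bits/s
    return bits_per_s * pj_per_bit * 1e-12   # pJ -> J per second = W

print(link_watts(5.3, 0.5))  # 21.2 W at the low end of that range
print(link_watts(5.3, 2.0))  # 84.8 W at the high end
```

So even at the optimistic end of that pJ/bit range, the link is a meaningful chunk of the board power budget.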
 
Did I miss something? Is there somewhere that put out more performance information than yesterday's presentation? I'm seeing a ton of analysis about how the 7900 compares to the 4090 that I just didn't see in that presentation, and I'm wondering if some people aren't getting a bit carried away in their interpolations about performance. I don't think we really know yet, and the reason they're not pushing much RT info could be that they're still working on it in drivers, as they're always working on stuff in drivers. :p

Just a quick reality check, please continue speculating.

EDITED BITS: I haven't spoken to anyone at AMD about this launch at all, so all my opinions are my own. Don't think I was talking to Terry Makedon and am trying to spread information; I'm not, this is just me being me.
 
I thought I had attached it to my other post but underneath the EFB example image they listed the "advantages".
View attachment 7442
Edit- And if you meant where the slide came from, it was from Anandtech's article.
For the purposes of today’s announcement, AMD has not gone into great depth on how they managed to make a chiplet-based GPU work, but they have confirmed a few important details. First and foremost, in order to offer the die-to-die bandwidth needed have the memory subsystem located off-chip, AMD is using their Elevated Fanout Bridge (EFB) packaging technology, which AMD first used for their Instinct MI200 series accelerators (CDNA2). On those accelerator parts it was used to hook up the monolithic GPUs to each other, as well as HBM2e memory. On RDNA 3, it’s being used to hook up the MCDs to the GCD.

Notably, Elevated Fanout Bridge is a non-organic packaging technology, which is to say it’s complex. That AMD is able to get 5.3TB/second of die-to-die bandwidth via it underscores its utility, but it also means that AMD is undoubtedly paying a good deal more for packaging on Big Navi 3x than they were on Navi 21 (or Ryzen 7000).

Internally, AMD is calling this memory-to-graphics link Infinity Link. Which, as the name implies, is responsible for (transparently) routing AMD’s Infinity Fabric between dies.
https://www.anandtech.com/show/1763...first-rdna-3-parts-to-hit-shelves-in-december
But this slide is not from the RDNA3 presentation, it's from the CDNA2 one. I know that some tech sites claim that EFBs are used, but I have not seen AMD saying it (neither in the RDNA3 slides nor the presentation). That said, the Anandtech article sounds like AMD told them about the EFBs, basically as background information. Okay, let's believe them.
 
Ok, from what I'm understanding, the design is facing these problems/limits:
- the dual-issue instructions mean that if the compiler can't extract a second instruction, it has just a little more shader throughput than N21; this can be alleviated by driver and engine tweaks
- the same architecture prevents it from scaling in frequency; apparently it was targeting about 3GHz and 73TF
- the EFB interconnect is adding to power consumption, limiting the GCD
- the EFB interconnect is adding to the cost, dunno how much compared to a bigger monolithic die
 