AMD: RDNA 3 Speculation, Rumours and Discussion

Bondrewd · Nov 2, 2022

Frenetic Pony said:
Supposedly there's 24 "compute units" in the high end APU.

No?
Phoenix is 12CU aka 6WGP RDNA3 config.

eastmen said:
Couldn't we just go with more channels?

Board costs and baseline mem capacity go way up.
Possible but not mainstream.

Flappy Pannus · Nov 2, 2022

eastmen said:
Couldn't we just go with more channels? Intel used to have a tri-channel RAM setup. What is preventing them from doing so again, or from going with a quad-channel solution?

Cost, especially as DDR5 has increased density - DDR5 8GB DIMMs are actually pretty rare compared to 16GB. So you would need ~4 slots of memory (64GB?) to maybe get less bandwidth than an RTX 3050.
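
A rough way to check the bandwidth comparison above is to compute theoretical peak numbers. This is only a sketch: the function name is made up for illustration, a "channel" is assumed to be a conventional 64-bit DIMM channel, DDR5-5600 is picked because it is the kit speed mentioned in this thread, and the RTX 3050 figure assumes a 128-bit GDDR6 bus at 14 Gbit/s per pin.

```python
def ddr_peak_bandwidth_gbs(transfer_rate_mts: float, channels: int,
                           channel_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes each channel is 64 bits wide; real-world throughput is lower.
    """
    bytes_per_transfer = channel_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: RTX 3050 with 128-bit GDDR6 at 14 Gbit/s per pin.
RTX_3050_GBS = 128 * 14 / 8

dual_channel = ddr_peak_bandwidth_gbs(5600, channels=2)
quad_channel = ddr_peak_bandwidth_gbs(5600, channels=4)

print(f"dual-channel DDR5-5600: {dual_channel:.1f} GB/s")
print(f"quad-channel DDR5-5600: {quad_channel:.1f} GB/s")
print(f"RTX 3050 (assumed):     {RTX_3050_GBS:.1f} GB/s")
```

Under those assumptions dual channel comes out at about 89.6 GB/s and quad channel at about 179.2 GB/s, still short of the roughly 224 GB/s assumed for the discrete card.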

eastmen · Nov 2, 2022

Bondrewd said:
No?
Phoenix is 12CU aka 6WGP RDNA3 config.
Board costs and baseline mem capacity go way up.
Possible but not mainstream.

Flappy Pannus said:
Cost, especially as DDR5 has increased density - DDR5 8GB DIMMs are actually pretty rare compared to 16GB. So you would need ~4 slots of memory (64GB?) to maybe get less bandwidth than an RTX 3050.

Layers for the board and complexity might go up, but RAM costs seem to always go down.
4x 16 = 64 gigs of RAM. Right now it's as low as $150 for 2x16GB kits of DDR5-5600. So you are looking at $300 to fill out the channels, which isn't really expensive. If you want to go faster, it's $210 for 2x16GB DDR5-6000 kits and $330 for 2x16GB DDR5-6800.

Seems like a reasonable price on that end of the market.

JoeJ · Nov 2, 2022

Flappy Pannus said:
However, as I wrote here, there are a lot of differences and challenges with the open PC market as compared to the closed market of consoles/Apple wrt marketing a UMA. The big problem still remains - memory bandwidth. Infinity cache may provide a glimpse into how this problem can be mitigated for APUs, but there's a big difference between it providing that extra boost for a discrete card that already has 500GB/s+ to main memory and for a part with 80GB/s - and that's with fast DDR5. Apple 'solves' it by just creating ridiculously large chips that only they can afford to make, and consoles solve it by using GDDR6, which isn't affordable in the small amounts PC OEMs could order vs say, Sony locking down supply agreements for years due to knowing they will order 10+M chips a year.

So it's still that memory problem. I knew DDR5 is somewhat meh with bw, but thought DDR6 should be almost good then. Obviously I've interpreted the numbers quite wrong, and all it gives will be PS4 bandwidth.
Thanks for disappointing me. Now I know why I feel alone with my naive APU optimism.

Ok then. So I'll have no other choice than standing proud on the Titanic too, hoping the iceberg in front won't be that bad... : /

Newguy · Nov 2, 2022

So about a day to go, where are you all putting your money for claimed perf/efficiency? I'll go for them sandbagging a bit with that 1.5x perf/W and they claim 1.7-1.8x efficiency for N31/5nm parts in raster.

JoeJ said:
So it's still that memory problem. I knew DDR5 is somewhat meh with bw, but thought DDR6 should be almost good then. Obviously I've interpreted the numbers quite wrong, and all it gives will be PS4 bandwidth.
Thanks for disappointing me. Now I know why I feel alone with my naive APU optimism.

Ok then. So I'll have no other choice than standing proud on the Titanic too, hoping the iceberg in front won't be that bad... : /

Big APUs have all those compounding problems that make die size, power consumption and costs spiral to make them work, and the performance target keeps moving as time advances: the 6600XT is at 1080Ti perf, with N33 probably being >=1.4x faster than that, etc. As has been mentioned, if you have a big 32 CU APU you still need to feed it. The 6600XT has 32MB of Infinity Cache plus 256GB/s of memory bandwidth, which is quad-channel 8000MT/s or dual-channel 16000MT/s.

The 6900HX is 10.2b transistors and 208mm² with 12 CUs, has 16MB of L3 exclusive to the CPU, 2MB of L2 for the GPU, etc. You'd need to at least double the caches and make the L3 shared to help with bandwidth, add more memory PHYs for bandwidth, and pay the massive GPU transistor cost of going from 12 CUs to 32 CUs, for example. Don't forget future CPU transistor increases either: Zen 3 at 4.15b transistors -> Zen 4 at 6.57b per CCD, or 1.58x (assume similar increases here), etc.

Or you forgo DIMMs and put a very wide memory pool on the board to feed it, like consoles/Apple. With GDDR7 doubling bandwidth you could have 512-576GB/s on a 128-bit bus (32Gbit/s, 36Gbit/s), which is amazing, but we aren't there yet and that'd be expensive anyway.

For all of that to happen you'd need a ~350mm² APU (using 7nm/6nm and Zen 3) that would consume 150-200W, and we do actually have them: they're consoles. Unfortunately we can't buy those APUs, and it's not worth the cost of making a massive consumer APU when you can have two separate parts that are smaller, higher yield, and easier to feed, power and cool.
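
The bandwidth and transistor figures in this post can be sanity-checked with a few lines of arithmetic. This is a sketch only: the helper names are invented for illustration, DDR channels are assumed to be 64 bits wide, and all results are theoretical peaks rather than measured numbers.

```python
def gddr_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a GDDR bus: width (bits) * per-pin rate / 8."""
    return bus_width_bits * pin_rate_gbps / 8


def ddr_rate_needed_mts(target_gbs: float, channels: int,
                        channel_width_bits: int = 64) -> float:
    """Transfer rate in MT/s that DDR would need to reach target_gbs.

    Assumes 64-bit channels and 1 GB = 1e9 bytes.
    """
    bytes_per_transfer = channels * channel_width_bits / 8
    return target_gbs * 1e9 / bytes_per_transfer / 1e6


# 128-bit GDDR7 at the two pin rates quoted in the post.
gddr7_low = gddr_bandwidth_gbs(128, 32)
gddr7_high = gddr_bandwidth_gbs(128, 36)

# DDR speed needed to match the 6600XT's 256 GB/s.
quad_channel_rate = ddr_rate_needed_mts(256, channels=4)
dual_channel_rate = ddr_rate_needed_mts(256, channels=2)

# Zen 3 -> Zen 4 CCD transistor growth, using the counts quoted in the post.
ccd_growth = 6.57 / 4.15

print(gddr7_low, gddr7_high)
print(quad_channel_rate, dual_channel_rate)
print(round(ccd_growth, 2))
```

The results line up with the post: 512 and 576 GB/s for the 128-bit GDDR7 cases, 8000 and 16000 MT/s for quad- and dual-channel DDR, and a CCD growth factor of about 1.58x.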

Bondrewd · Nov 3, 2022

Newguy said:
and they claim 1.7-1.8x efficiency for N31/5nm parts in raster

Good bet.

Newguy said:
Big APUs have all those compounding problems that make die size, power consumption and costs spiral to make them work

Which is why you tile them.
MI300, Falcon Shores, Arrow-P683, %REDACTED% and %REDACTED% are all examples of different takes on DC/client big APU that are tiled for cost/power/scale-up bling reasons.

troyan said:
A100 has a chiplet design

Split on-die L2 does not make it a chiplet design, it's still one die.
A precursor to one, most definitely.

techuse · Nov 3, 2022

Newguy said:
So about a day to go, where are you all putting your money for claimed perf/efficiency? I'll go for them sandbagging a bit with that 1.5x perf/W and they claim 1.7-1.8x efficiency for N31/5nm parts in raster.

Non-RT game performance between 4080 and 4090. RT-heavy game performance at 3090Ti level. 350 watts. So closer to 50% for raster games, closer to 100% for RT.

Leoneazzurro5 · Nov 3, 2022

Navi 31 Die Shot

First Chiplet GPU

www.angstronomics.com

This part is interesting:

"Performance, power and price will be revealed at the launch event. Unfortunately, our A0 performance profiling results were not indicative of production samples, so we have refrained from detailing our findings publicly. Silicon and board cost is definitely lower than NVIDIA’s RTX 4090, so let’s see where the two Navi 31 SKUs will be priced."

fehu · Nov 3, 2022

The MCDs are bigger than what I was imagining

Bondrewd · Nov 3, 2022

Dampf said:
There are leaked slides of the RDNA3 presentation?

Leaked? nooooo people would be lobotomized if they do leak.

Dampf said:
I can't find them.

And you won't, AMD opsec is great.

Jawed · Nov 3, 2022

Quick and dirty die areas derived from the low-resolution Angstronomics picture.

TopSpoiler · Nov 3, 2022

Does the lower-bandwidth variant use the same 6 MCD package?

Henry swagger · Nov 3, 2022

https://twitter.com/x/status/1587978907139145730

AMD's secret weapon... Lisa hinted at it with this quote... virtual tensor cores

Henry swagger · Nov 3, 2022

https://twitter.com/x/status/1588005796842971136

Maybe this is the next gen infinity cache

Bondrewd · Nov 3, 2022

Read less dumb shit and just wait a few hours already.

pTmdfx · Nov 3, 2022

Henry swagger said:
Maybe this is the next gen infinity cache

Mate, it is a patent on cache replacement policy…

Picao84 · Nov 3, 2022

Henry swagger said:
https://twitter.com/x/status/1587978907139145730
AMD's secret weapon... Lisa hinted at it with this quote... virtual tensor cores

That makes no sense. Plus emulation of fixed function units is nothing new. How do you think games from DX7 era run on current GPUs? T&L fixed functions must have been emulated for quite a while, as GPUs are mostly compute now. They might be using normal cores for AI inferencing though.

Leoneazzurro5 · Nov 3, 2022

TopSpoiler said:
Does the lower-bandwidth variant use the same 6 MCD package?

Almost surely yes (less logistics involved). Reference models will probably use the very same PCB with a different number of RAM chips, maybe a different number of VRMs, and maybe less beefed-up cooling.

Bondrewd · Nov 3, 2022

TopSpoiler said:
Does the lower-bandwidth variant use the same 6 MCD package?

I'm not sure.
The 6th one on the 5 MCD part might be a spacer à la A100.

Kaotik · Nov 3, 2022

Picao84 said:
That makes no sense. Plus emulation of fixed function units is nothing new. How do you think games from DX7 era run on current GPUs? T&L fixed functions must have been emulated for quite a while, as GPUs are mostly compute now. They might be using normal cores for AI inferencing though.

There's nothing to 'emulate' in tensor/xmx/matrix cores, they're just accelerating normal matrix math. As for precision, RDNA2 goes all the way down to 8:1 INT4 so yeah, you can pick lower precisions for inferencing if you want.
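
To illustrate the point that low-precision matrix math is just ordinary multiply-accumulate on narrower values, here is a plain Python sketch of a packed INT4 dot product. It reads "8:1" as eight 4-bit values fitting in one 32-bit word, which is my interpretation rather than anything stated above. The function names are hypothetical and this does not model any actual GPU instruction.

```python
def pack_int4(values):
    """Pack eight signed 4-bit integers (-8..7) into one 32-bit word."""
    if len(values) != 8:
        raise ValueError("expected exactly eight values")
    word = 0
    for i, v in enumerate(values):
        if not -8 <= v <= 7:
            raise ValueError(f"{v} does not fit in a signed 4-bit integer")
        word |= (v & 0xF) << (4 * i)  # two's-complement nibble
    return word


def unpack_int4(word):
    """Unpack a 32-bit word into eight signed 4-bit integers."""
    out = []
    for i in range(8):
        nibble = (word >> (4 * i)) & 0xF
        out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dot8_int4(a_word, b_word, acc=0):
    """Multiply eight INT4 pairs and add the products to an accumulator."""
    a = unpack_int4(a_word)
    b = unpack_int4(b_word)
    return acc + sum(x * y for x, y in zip(a, b))


a = [1, -2, 3, -4, 5, -6, 7, -8]
b = [1, 1, 1, 1, 1, 1, 1, 1]
result = dot8_int4(pack_int4(a), pack_int4(b))
print(result)
```

The same word that would hold one 32-bit value carries eight operands here, which is where the throughput multiplier at low precision comes from. The math itself is an ordinary dot product.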
