AMD: RDNA 3 Speculation, Rumours and Discussion

Couldn't we just go with more channels? Intel used to have a tri-channel RAM setup. What's preventing them from doing that again, or going with a quad-channel solution?

Cost, especially as DDR5 has increased density - DDR5 8GB DIMMs are actually pretty rare compared to 16GB. So you would need ~4 slots of memory (64GB?) to maybe get less bandwidth than an RTX 3050.
 
No?
Phoenix is 12CU aka 6WGP RDNA3 config.
Board costs and baseline memory capacity go way up.
Possible but not mainstream.

Cost, especially as DDR5 has increased density - DDR5 8GB DIMMs are actually pretty rare compared to 16GB. So you would need ~4 slots of memory (64GB?) to maybe get less bandwidth than an RTX 3050.

Layers for the board and complexity might go up, but RAM costs seem to always go down.
4x 16GB = 64GB of RAM. Right now it's as low as $150 for a 2x16GB DDR5-5600 kit, so you are looking at $300 to fill out the channels, which isn't really expensive. If you want to go faster, it's $210 for a 2x16GB DDR5-6000 kit and $330 for 2x16GB DDR5-6800.

Seems like a reasonable price on that end of the market
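For a rough sense of how many channels it takes to catch even a low-end discrete card, here's a back-of-the-envelope Python sketch (peak theoretical numbers only; the ~224GB/s RTX 3050 figure is the commonly quoted spec, not something from this thread):

```python
# Peak DDR5 bandwidth: data rate (MT/s) x 8 bytes per 64-bit channel x channel count
def ddr5_bandwidth_gbs(mt_per_s: int, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given DDR5 speed and channel count."""
    return mt_per_s * 8 * channels / 1000

for speed in (5600, 6000, 6800):
    dual = ddr5_bandwidth_gbs(speed, 2)
    quad = ddr5_bandwidth_gbs(speed, 4)
    print(f"DDR5-{speed}: dual channel {dual:.0f} GB/s, quad channel {quad:.0f} GB/s")

# For comparison, an RTX 3050 (128-bit GDDR6 at 14Gbps) has ~224 GB/s,
# so even quad-channel DDR5-6800 (~218 GB/s) falls just short of it.
```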
 
However, as I wrote here, there are a lot of differences and challenges with the open PC market as compared to the closed market of consoles/Apple wrt marketing a UMA. The big problem still remains - memory bandwidth. Infinity Cache may provide a glimpse into how this problem can be mitigated for APUs, but there's a big difference between it providing that extra boost for a discrete card that already has 500GB/s+ of memory bandwidth and an APU stuck at ~80GB/s to main memory - and that's with fast DDR5. Apple 'solves' it by just creating ridiculously large chips that only they can afford to make, and consoles solve it by using GDDR6, which isn't affordable in the small quantities PC OEMs could order vs, say, Sony locking down supply agreements for years because it knows it will order 10M+ chips a year.
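As a toy illustration of the Infinity Cache point - a crude hit-rate blend, with entirely made-up cache bandwidth and hit-rate numbers, just to show why a large last-level cache can stretch a narrow DDR5 bus:

```python
def effective_bandwidth(dram_gbs: float, cache_gbs: float, hit_rate: float) -> float:
    """Crude model: blend on-die cache and DRAM bandwidth by hit rate."""
    return hit_rate * cache_gbs + (1.0 - hit_rate) * dram_gbs

dram = 80.0      # roughly dual-channel fast DDR5, as mentioned above
cache = 1000.0   # assumed on-die cache bandwidth, purely illustrative
for hit in (0.0, 0.4, 0.6):
    print(f"hit rate {hit:.0%}: ~{effective_bandwidth(dram, cache, hit):.0f} GB/s effective")
```

Of course, it only helps to the extent the working set actually fits in the cache.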
So it's still that memory problem. I knew DDR5 is somewhat meh with bandwidth, but thought DDR6 should be almost good enough then. Obviously I've interpreted the numbers quite wrong, and all it will give is PS4 bandwidth.
Thanks for disappointing me. Now I know why I feel alone with my naive APU optimism. :cry:

Ok then. So I'll have no other choice than standing proud on the Titanic too, hoping the iceberg in front won't be that bad... : /
 
So about a day to go, where are you all putting your money for claimed perf/efficiency? I'll go for them sandbagging a bit with that 1.5x perf/w and they claim 1.7-1.8x efficiency for N31/5nm parts in raster

So it's still that memory problem. I knew DDR5 is somewhat meh with bandwidth, but thought DDR6 should be almost good enough then. Obviously I've interpreted the numbers quite wrong, and all it will give is PS4 bandwidth.
Thanks for disappointing me. Now I know why I feel alone with my naive APU optimism. :cry:

Ok then. So I'll have no other choice than standing proud on the Titanic too, hoping the iceberg in front won't be that bad... : /
Big APUs have all those compounding problems that make die size, power consumption and cost spiral just to make them work, plus performance targets that keep moving as time advances - e.g. the 6600 XT being ~1080 Ti performance, with N33 probably being >=1.4x faster than that, etc. Like it's been mentioned, if you have a big 32 CU APU you still need to feed it: the 6600 XT has 32MB of Infinity Cache plus 256GB/s of memory bandwidth, which would take quad-channel 8000MT/s or dual-channel 16000MT/s DDR5 to match. The 6900HX is 10.2b transistors and 208mm² with 12 CUs, 16MB of L3 exclusive to the CPU, 2MB of L2 for the GPU, etc.

So you'd need at least doubled caches with the L3 made shared to help with bandwidth, more memory PHYs for bandwidth, a massive GPU transistor increase going from 12 CUs to 32 CUs, and future CPU transistor increases on top - like Zen 3's 4.15b transistors going to 6.57b per Zen 4 CCD, or 1.58x (assume similar increases here), etc etc. Or you forgo DIMMs and put a very wide memory pool on the board to feed it like consoles/Apple do; with GDDR7 doubling bandwidth you could have 512-576GB/s on a 128-bit bus (32Gbit/s, 36Gbit/s), which is amazing, but we aren't there yet and that'd be expensive anyway.

For all of that to happen you'd need a ~350mm² APU (using 7nm/6nm and Zen 3) that would consume 150-200W, and we do actually have them: they're consoles. Unfortunately we can't buy those APUs, and it's not worth the cost of making a massive consumer APU when you can have two separate parts that are smaller, higher yield, and easier to feed, power and cool.
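The bandwidth figures in that post are just bus width times per-pin data rate; a quick check of the arithmetic (the GDDR7 speeds are the hypothetical 32/36Gbit/s mentioned above):

```python
def bus_bandwidth_gbs(bus_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_bits * gbit_per_pin / 8

print(bus_bandwidth_gbs(256, 8.0))    # quad-channel DDR5-8000   -> 256 GB/s (6600 XT level)
print(bus_bandwidth_gbs(128, 16.0))   # dual-channel 16000MT/s   -> 256 GB/s
print(bus_bandwidth_gbs(128, 32.0))   # 128-bit GDDR7 @ 32Gbit/s -> 512 GB/s
print(bus_bandwidth_gbs(128, 36.0))   # 128-bit GDDR7 @ 36Gbit/s -> 576 GB/s
```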
 
and they claim 1.7-1.8x efficiency for N31/5nm parts in raster
Good bet.
Big APUs have all those compounding problems that make die size, power consumption and costs spiral to make them work
Which is why you tile them.
MI300, Falcon Shores, Arrow-P683, %REDACTED% and %REDACTED% are all examples of different takes on DC/client big APU that are tiled for cost/power/scale-up bling reasons.
A100 has a chiplet design
Split on-die L2 does not make it a chiplet design; it's still one die.
A precursor to one, most definitely.
 
So about a day to go, where are you all putting your money for claimed perf/efficiency? I'll go for them sandbagging a bit with that 1.5x perf/w and they claim 1.7-1.8x efficiency for N31/5nm parts in raster
Non-RT game performance between the 4080 and 4090. RT-heavy game performance at 3090 Ti level. 350 watts. So closer to 50% for raster games, closer to 100% for RT.
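One way to read those guesses as perf/W, with hypothetical baseline numbers (a ~300W 6900 XT and assumed performance ratios - none of this is from the post above, it's just the arithmetic behind "closer to 50%/100%"):

```python
def perf_per_watt_gain(perf_ratio: float, old_watts: float, new_watts: float) -> float:
    """Relative perf/W improvement from a performance ratio and board powers."""
    return perf_ratio * old_watts / new_watts

# Assumed: ~1.7x raster and ~2.3x RT over a 300W 6900 XT, at 350W.
print(perf_per_watt_gain(1.7, 300, 350))   # ~1.46x -> "closer to 50%"
print(perf_per_watt_gain(2.3, 300, 350))   # ~1.97x -> "closer to 100%"
```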
 

This part is interesting:

"Performance, power and price will be revealed at the launch event. Unfortunately, our A0 performance profiling results were not indicative of production samples, so we have refrained from detailing our findings publicly. Silicon and board cost is definitely lower than NVIDIA’s RTX 4090, so let’s see where the two Navi 31 SKUs will be priced."
 
[Attached image: b3da052.png]


Quick and dirty die areas derived from the low-resolution Angstronomics picture.
 
That makes no sense. Plus, emulation of fixed-function units is nothing new. How do you think games from the DX7 era run on current GPUs? T&L fixed functions must have been emulated for quite a while, as GPUs are mostly compute now. They might be using normal cores for AI inferencing though.
There's nothing to 'emulate' in tensor/xmx/matrix cores, they're just accelerating normal matrix math. As for precision, RDNA2 goes all the way down to 8:1 INT4 so yeah, you can pick lower precisions for inferencing if you want.
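As a toy example of "it's just normal matrix math at lower precision" - an INT8 matrix multiply with 32-bit accumulation, the same operation matrix/tensor units (and packed dot-product instructions) accelerate in hardware; nothing here is RDNA- or XMX-specific:

```python
import numpy as np

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply int8 matrices, accumulating in int32 - done serially here,
    where a matrix core would do it in wide parallel blocks."""
    return a.astype(np.int32) @ b.astype(np.int32)

rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)
b = rng.integers(-128, 128, size=(8, 4), dtype=np.int8)
print(int8_matmul(a, b))
```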
 