AMD: RDNA 3 Speculation, Rumours and Discussion

Defect rates on 6/7nm are vanishingly low and MC/PHY/Cache defect sensitivity is also extremely low. So small dies for these functions provide no advantage.
I'm not sure they're "vanishingly" low, and considering the cost of advanced nodes, even a low defect rate can translate into a sizeable price difference.
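For intuition, here's a toy Poisson yield calculation; the model is the standard textbook approximation, and the defect densities and die areas below are purely illustrative assumptions, not TSMC figures:

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Classic Poisson yield approximation: Y = exp(-A * D0)."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

# Illustrative defect densities and die sizes (assumptions, not real data).
for d0 in (0.05, 0.10):
    big = poisson_yield(520.0, d0)   # hypothetical monolithic GPU die
    gcd = poisson_yield(300.0, d0)   # hypothetical graphics chiplet
    mcd = poisson_yield(37.0, d0)    # hypothetical memory/cache chiplet
    print(f"D0={d0}/cm^2: 520mm^2 {big:.1%}, 300mm^2 {gcd:.1%}, 37mm^2 {mcd:.1%}")
```

Even at a low D0 the big hypothetical die loses a noticeable fraction of candidates, which is where the price-difference argument comes from; at truly vanishing D0 the gap collapses, which is the counter-argument.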
 
I mean, that does seem to be what AMD are planning on doing with CDNA3. It's actually what I originally expected RDNA3 to be, but I guess AMD are just not quite there yet.
To repeat myself, 3D V-Cache sits wherever the cache it's supposed to expand already lives. If we assume Infinity Cache sits with the memory controllers, that's where 3D V-Cache will go too (if there's 3D V-Cache to begin with).
 
Well, I think the best use of 'V-Cache' won't be stacking cache on top of existing cache; it'll be putting all the cache on a separate chip and layer, presumably on the bottom. That gives you the best overall area savings (rough numbers below).

And yeah, I don't expect V-Cache versions of RDNA3 GPUs. If the point of having these chiplets is scalability of the L3, then that solution is kind of already there.
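As a back-of-the-envelope sketch of that footprint argument: every figure below is an illustrative assumption, not a measured die.

```python
# All figures are illustrative assumptions, not measured dies.
logic_mm2 = 350.0                 # hypothetical compute/logic area
sram_mm2  = 96 * 0.7              # hypothetical 96 MB of LLC at ~0.7 mm^2/MB

side_by_side = logic_mm2 + sram_mm2          # cache on the same die/plane
stacked      = max(logic_mm2, sram_mm2)      # cache on its own layer underneath

print(f"same-plane footprint: {side_by_side:.0f} mm^2")
print(f"stacked footprint:    {stacked:.0f} mm^2 "
      f"(~{(1 - stacked / side_by_side) * 100:.0f}% smaller)")
```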
 
It's too early to rely solely on 3D V-Cache; give it a couple more years to mature for that.
(And even then you'd still want it at the memory controllers, which is the very same place Infinity Cache would sit anyway.)
 
Probably because it's really bloody hard to get multiple GCDs to work nicely with one another. CPUs already assume that cores are independent, and NUMA and NUCA are already solved problems, so scaling out multiple compute chiplets isn't really a problem. The same is not true of graphics workloads where a single kernel might be running across the entire chip.
I'm not sure I understand that last sentence. A single kernel running across the entire chip is common for compute workloads, yet from the perspective of the cores it doesn't matter as they usually don't have to communicate with each other. The problem with graphics workloads is the graphics pipeline with its implicit communication and ordering requirements.
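A toy illustration of that distinction in plain Python (hypothetical data, nothing GPU-specific): an independent per-element kernel returns the same answer under any execution order, while ordered alpha blending, the fixed-function tail of the graphics pipeline, does not.

```python
import random

data = list(range(8))

# Compute-style kernel: each "workgroup" writes only its own element,
# so any execution order across chiplets yields the same result.
def run_compute(order):
    out = [0] * len(data)
    for i in order:
        out[i] = data[i] * data[i]
    return out

ids = list(range(len(data)))
shuffled = ids[:]
random.shuffle(shuffled)
assert run_compute(ids) == run_compute(shuffled)  # scheduling order is irrelevant

# Graphics-style export: 'over' blending is not commutative, so the
# pipeline's implicit primitive order must be preserved across chiplets.
def blend(dst, color, alpha):
    return color * alpha + dst * (1.0 - alpha)

prims = [(1.0, 0.5), (0.25, 0.5)]                 # (color, alpha) in API order
in_order  = blend(blend(0.0, *prims[0]), *prims[1])
reordered = blend(blend(0.0, *prims[1]), *prims[0])
assert in_order != reordered                      # reordering changes the pixel
```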
 
Which is what the patents we've already seen try to solve: specifically, a pre-pass on a single compute die that hardware-sorts geometry for distribution to the multiple chiplets.

So, yeah, raytracing and compute/vertex/pixel shaders should be quite flexible. It'd be interesting to know the driver work involved, but ideally nothing outside a wavefront gets accessed for immediate work dependencies in these. And as those are the major workloads of modern triple-A games, i.e. the only thing customers looking at benchmarks care about, I can see multiple GCDs working. The bandwidth requirements seem to be met by these modern interconnects.
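For flavor, here's a minimal sketch of the kind of sort-first pre-pass those patents describe; the tile size, chiplet count, and checkerboard ownership rule are my own assumptions, not anything from the patents.

```python
from collections import defaultdict

NUM_CHIPLETS = 2
TILE = 64                                  # assumed screen-tile size in pixels

def tiles_touched(tri):
    """Conservative bounding-box binning of a 2D triangle into screen tiles."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    for ty in range(min(ys) // TILE, max(ys) // TILE + 1):
        for tx in range(min(xs) // TILE, max(xs) // TILE + 1):
            yield tx, ty

def geometry_prepass(triangles):
    """Pre-pass on one die: bucket primitives per owning chiplet while
    preserving API submission order within each bucket."""
    buckets = defaultdict(list)
    for prim_id, tri in enumerate(triangles):
        for tx, ty in tiles_touched(tri):
            owner = (tx + ty) % NUM_CHIPLETS   # assumed checkerboard tile ownership
            buckets[owner].append((prim_id, (tx, ty)))
    return buckets

tris = [((10, 10), (120, 20), (40, 200)), ((130, 130), (250, 140), (200, 250))]
for chiplet, work in sorted(geometry_prepass(tris).items()):
    print(f"chiplet {chiplet}: {len(work)} (primitive, tile) pairs")
```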
 

Kabooley.
Thank you to Mr. Juicey for dropping the bomb.

Impressive achievement in perf/mm^2 according to the article. It makes me wonder, however, about the absolute performance of such a solution once Nvidia fields a monolithic die with greater area than the multi-chip device. Yes, we're hearing rumors of clocks above 3 GHz, but will that be enough?
 
Some very interesting parts, like RDNA3's WGP being physically smaller than RDNA2's on the same node, and much smaller cache amounts than many expected too (N31 and N32 having less than N21 and N22 respectively). N33 in particular is going to be exciting: a similar transistor budget to N23 and N10 given N6's density improvements, so we get to compare RDNA3's efficiency improvements (perf per area, per transistor and per watt) fairly directly against the last two generations. Even if it's "only" a normal generational improvement in raster performance (about 1.4-1.5x vs the 6600 XT), that's still a bit faster than the 6700 XT at 1440p, except you're using about 0.7x the transistors (fewer WGPs/CUs but more ALUs), significantly less cache and cache bandwidth, less VRAM bandwidth, etc., and drawing about 160-170W vs 220W in games. That would be an astonishing arch improvement in a single generation.
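Plugging that paragraph's figures into a quick sanity check (all inputs are the rumors above, not confirmed specs):

```python
# Inputs are the rumored/estimated figures from the paragraph above.
perf_vs_6700xt   = 1.05          # "a bit faster than the 6700 XT at 1440p"
transistor_ratio = 0.7           # rumored N33 budget relative to N22
power_ratio      = 165 / 220     # assumed ~160-170 W midpoint vs 220 W

print(f"perf/transistor: ~{perf_vs_6700xt / transistor_ratio:.2f}x")
print(f"perf/W:          ~{perf_vs_6700xt / power_ratio:.2f}x")
```

That works out to roughly 1.5x perf per transistor and 1.4x perf per watt in one generation, which is why the claim reads as astonishing.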
 
Calling this out as almost certainly fake. There are a ton of errors that come down to either the same major typos again and again, or a fake by someone who can't do math.

The midrange card suddenly flips from a 16-CU block to... I dunno, 12 CUs and an odd number of blocks. The high-end card has either a 20% or a 140% resource increase depending on how many SIMD32s you count per CU/WGP (see the worked numbers below), which contradicts the earlier part of the post as well. No mention of multiple graphics dies, despite both rescinded patch notes and patents mentioning them. Finally, the numbers just look like a copy/paste of my own predictions. I'm not saying I'm wrong, but I'm not saying I'm right at all either.
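For reference, the 20%-vs-140% point is just arithmetic on the SIMD32 count. Here's a worked version using Navi 21 as the baseline; the 96-CU figure is simply the value implied by those two percentages, not a confirmed spec.

```python
# Known RDNA2 baseline: Navi 21 has 80 CUs with 2 SIMD32 each.
baseline_simds = 80 * 2                      # = 160 SIMD32s

leaked_cus = 96                              # CU count implied by the leak's figures
for simds_per_cu in (2, 4):                  # RDNA2-style vs doubled-ALU counting
    total = leaked_cus * simds_per_cu
    print(f"{simds_per_cu} SIMD32/CU: {total} SIMDs, "
          f"+{(total / baseline_simds - 1) * 100:.0f}% over Navi 21")
# -> +20% with 2 SIMD32/CU, +140% with 4: the contradiction called out above.
```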
 
So all the graphics/compute work is done similarly to a monolithic die. Work distribution over multiple chips hasn't been solved yet.
 
OREO (Opaque Random Export Order) sounds interesting: essentially replacing the re-order buffer (ROB) with a smaller skid buffer, allowing exports to be received and executed in any order before being handed to the next stage in order.
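Here's a minimal software sketch of that idea; it's my own toy modeling of a skid buffer, not AMD's hardware design. Completions arrive in any order, the small buffer holds the out-of-order stragglers, and results drain to the next stage strictly in program order.

```python
class SkidBufferExport:
    """Toy model: shader exports complete out of order, but results are
    released to the next pipeline stage strictly in program order."""

    def __init__(self):
        self.next_slot = 0     # next program-order slot allowed to export
        self.skid = {}         # completed results waiting for their turn

    def complete(self, slot: int, value: str):
        self.skid[slot] = value

    def drain(self):
        """Release the longest in-order run that is now available."""
        out = []
        while self.next_slot in self.skid:
            out.append(self.skid.pop(self.next_slot))
            self.next_slot += 1
        return out

buf = SkidBufferExport()
for slot, val in [(2, "c"), (0, "a"), (3, "d"), (1, "b")]:  # random completion order
    buf.complete(slot, val)
    print(f"slot {slot} completed -> exported {buf.drain()}")
# "a" exports only once slot 0 lands; then "b", "c", "d" drain in order.
```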
 
Less.
RDNA3 is utterly bananas as far as raw IP prowess goes.


It's real, about as real as it gets.

Ah, but it was; MI300 is literally next, and N4x parts aren't THAT far away either.
Will the gap in RT performance grow or lessen with RDNA 3 vs Ada? What about rasterization performance relative to resolution compared to RDNA 2 vs Ampere?

To what degree has work distribution been solved?
 