MI250X is treated as 2 completely separate GPUs, isn't it?
Yep, we have yet to see a chiplet GPU in action. The MI250X is just two GPUs in the same socket.
I wonder how much variability there is in MCD dies vs GCD dies? Especially in terms of what speed they will run at.
If there is a lot of variability in MCD dies, then they can bin a lot of those MCDs very well.
E.g. bundle the slower ones for slower cards with slower GDDR needs.
I'm still hoping we end up getting a monster config with multiple GCDs as well.
It seems kind of pointless to go for such a small GCD die if they are moving to a chiplet-style arch.
Not taking advantage of being able to make much bigger compute dies seems silly...
I'm not sure if the Navi 3x gen is the GPU version of Zen 1 or Zen 2!

It does seem a bit odd to make them so small, but I guess they're limited by their placement options and which sides of the die the interconnects are on, as well as by minimizing the area of the interposer or the number of EMIBs (or similar bridge technology) needed to connect them.
I was gonna respond saying you're all wrong, but after giving it some more thought, I think you might be correct that the underlying substrate, e.g. the interposer, might actually be a limiting factor here. Whilst they can dictate the size and shape of the "socket", for lack of a better term, it simply may not be feasible to use a much larger interposer underneath all the various chips.
Assuming you weren't packaging-constrained, 3 hypothetical 80 mm² 128-bit MCDs, one on each of the 3 edges of the compute die, would seem to make a lot more sense to me. It would still be easy to scale up and down as needed: lower-end products could just have 2 for a 256-bit bus, and if you really needed a chiplet SKU with a 320-bit or 192-bit bus, then you've got a place for your half-broken harvested ones to be used. You'd likely be saving some money on the chiplet end but spending more on the packaging end in that case; I'm sure AMD's engineers already made the cost-benefit analysis of each approach and came to the proper conclusion.
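A quick back-of-the-envelope sketch of that scaling, in Python. The 128-bit MCD width is the hypothetical from the comment above; the 20 Gbps GDDR6 data rate is my own assumption, purely for illustration:

```python
# Sketch of bus width / bandwidth scaling for hypothetical 128-bit MCDs.
# All figures are assumptions from the discussion, not confirmed specs.

MCD_BUS_BITS = 128          # hypothetical per-MCD memory bus width
GDDR6_GBPS_PER_PIN = 20     # assumed data rate per pin, in Gbit/s

def memory_config(mcd_count, disabled_half_mcds=0):
    """Total bus width and bandwidth for a given MCD count.

    disabled_half_mcds models harvested MCDs with one of their two
    64-bit halves fused off (for 320-bit or 192-bit SKUs)."""
    bus_bits = mcd_count * MCD_BUS_BITS - disabled_half_mcds * 64
    bandwidth_gbs = bus_bits * GDDR6_GBPS_PER_PIN / 8  # bits -> bytes
    return bus_bits, bandwidth_gbs

for mcds, halves in [(3, 0), (2, 0), (3, 1), (2, 1)]:
    bits, bw = memory_config(mcds, halves)
    print(f"{mcds} MCDs, {halves} half disabled: {bits}-bit bus, {bw:.0f} GB/s")
```

With those assumptions, 3 full MCDs give a 384-bit bus at 960 GB/s, 2 give 256-bit at 640 GB/s, and the harvested variants land at 320-bit and 192-bit, exactly the in-between SKUs described above.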
AMD already has several 'interposer' technologies in hand that are more advanced than one huge interposer covering all the chips.
You raise some interesting points about the 6 x 64-bit controllers vs 3 x 128-bit.
The 6 x 64 setup does give them more flexibility going forward, though, and is probably a more natural match for the 64-bit-wide nature of GDDRx
(yeah, I know it's actually DDR5 that's split into 2 x 32-bit subchannels internally; GDDR6 is 2 x 16-bit channels per chip).
I'm not sure how well harvested MCD dies would work,
so after all that, maybe 6 x 64 MCDs is the smarter move.
After reading the initial article, they do mention the possibility of a larger GCD in the future, so perhaps I was too quick to dismiss that.
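For what it's worth, the flexibility difference is easy to see by just enumerating the bus widths each granularity can produce (the 64-bit and 128-bit MCD sizes are the hypotheticals from this discussion, nothing official):

```python
# Enumerate the bus widths reachable with each hypothetical MCD size,
# without any harvesting of half-broken dies.

def possible_bus_widths(mcd_bits, max_mcds):
    return sorted({n * mcd_bits for n in range(1, max_mcds + 1)})

print("6 x 64-bit MCDs: ", possible_bus_widths(64, 6))   # 64..384 in 64-bit steps
print("3 x 128-bit MCDs:", possible_bus_widths(128, 3))  # 128, 256, 384 only
```

Six 64-bit MCDs hit every width from 64 to 384 bits in 64-bit steps; three 128-bit MCDs only hit 128/256/384 unless you harvest.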
I also don't understand why AMD didn't go for the kill (if the rumors are right). You know the practical limit on N5/4 is around 600 mm² for a gaming GPU before cost explodes into datacenter territory. So why not make a 600 mm² GCD and 6 x 40 mm² MCDs (840 mm² total) and be sure to land well above AD102? In other words, decisively take the performance crown and charge whatever you want for it. In marketing, the rule is simple: people pay whatever for the best, even when it has horrible value for money. It has been Nvidia's motto forever and it has worked very well... so why not AMD?
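Rough silicon-cost math for that hypothetical 600 mm² + 6 x 40 mm² config, using a textbook dies-per-wafer approximation and a Poisson yield model. The wafer prices and defect density below are loose assumptions for illustration only:

```python
import math

WAFER_DIAM_MM = 300

def dies_per_wafer(die_area_mm2):
    """Classic approximation: gross dies on a 300 mm wafer (ignores scribe lines)."""
    r = WAFER_DIAM_MM / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * WAFER_DIAM_MM / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, d0_per_cm2):
    """Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-d0_per_cm2 * die_area_mm2 / 100)

# Loose assumptions: ~$17k per N5 wafer, ~$10k per N6 wafer, D0 ~0.1/cm^2.
for name, area, wafer_cost in [("600mm2 N5 GCD", 600, 17_000),
                               ("40mm2 N6 MCD", 40, 10_000)]:
    good = dies_per_wafer(area) * poisson_yield(area, 0.10)
    print(f"{name}: ~{good:.0f} good dies per wafer, ~${wafer_cost / good:.0f} each")
```

Even with those generous numbers, the 600 mm² GCD alone comes out around $350 of silicon before packaging, while each little MCD is only a few dollars, which is presumably part of the answer.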
The extra I/O area for Zen 3 chiplets to communicate with 3D V-Cache is so small that everyone missed it until AMD mentioned it.
Meanwhile, I have no idea why AMD would produce a chiplet-based GPU where the cheap silicon, 6 nm, appears as multiple small dies for I/O and cache, and the expensive silicon, 5 nm, appears as the single biggest die, by far, for compute.
Imagine AMD making a Zen processor with multiple I/O dies and a single core complex die...
That's how ridiculous the rumours currently seem to me.
Further, AMD has two distinct designs of Zen I/O chiplet, one for consumer and the other for server. The server design has more connectivity options (more PCI Express lanes, more DDR channels).
Why wouldn't AMD follow the Zen pattern when making a chiplet-based GPU?
BTW, AMD also designs CDNA-based GPUs. They're still in the monolithic phase there too (sure, there are now two of them in one package, but they're still treated as separate GPUs).
There is the M1 Ultra from Apple; it failed to impress GPU-wise. Even when working with the Metal API, scaling issues are rampant, even in compute workloads, and bad frame pacing in gaming is an issue too.
I don't understand your comparison to Zen. Zen 3 doesn't need more I/O.

Apparently you missed the bit where I talked about consumer and server variants: the compute chiplets are all the same, and it's the count of compute chiplets plus the size of the I/O die that varies according to SKU.
That's the ideal, but not necessarily feasible at this time. For the Radeon & Instinct lines, the compute chiplets can't remain the same, though, unlike with Ryzen & Epyc.
The compute chiplets are expensive, small and on the most advanced node.
The I/O chiplet is cheap, large and on a legacy node.
The most obvious reason is that you can actually scale MCDs rather easily (it remains to be seen what such a physical implementation would mean for L2 coherency, though), while scaling GCDs would be pretty hard, above 2 especially I'd wager.
With CPUs it is rather easy to scale CCDs, while scaling IODs is more complex.
All in all it looks like an easy way of making die costs smaller, not of improving performance per se. In this the approach is very similar to Zen's.
3D V-Cache sits on top of the compute die. With a memory controller, RDNA's cache/IMC dies would need to sit underneath the compute die, which would also necessitate passing power and other connections through them. There is also a massive difference in the bandwidth requirements.
Probably because it's really bloody hard to get multiple GCDs to work nicely with one another. CPUs already assume that cores are independent, and NUMA and NUCA are already solved problems, so scaling out multiple compute chiplets isn't really an issue there. The same is not true of graphics workloads, where a single kernel might be running across the entire chip.
What is the point of scaling MCDs? How does that save money?
It's not about I/O dies or compute dies or whatnot for 3D V-Cache; it's about where your cache is. If we assume Infinity Cache is in the memory dies, that's where the 3D V-Cache will be too (assuming, of course, they use 3D V-Cache on GPUs in the first place).
You get 200-300 mm² main dies and X number of small MCDs instead of two big 400-600 mm² dies. The defect ratio is lower and you get more dies from a wafer.
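Illustrating that claim with a simple Poisson yield model (die sizes taken from this comment; the 0.1 defects/cm² density is an assumption, so take the exact percentages with a grain of salt):

```python
import math

def poisson_yield(die_area_mm2, d0_per_cm2=0.10):
    # Fraction of defect-free dies at an assumed defect density of 0.1/cm^2.
    return math.exp(-d0_per_cm2 * die_area_mm2 / 100)

# One big monolithic die vs a chiplet split, sizes from the comment above.
for name, area in [("500mm2 monolithic", 500),
                   ("300mm2 GCD", 300),
                   ("40mm2 MCD", 40)]:
    print(f"{name}: ~{poisson_yield(area):.0%} yield")
```

Each piece of the split yields noticeably better than one big die (~74% and ~96% vs ~61% here); whether that gap matters at real-world N6/N5 defect densities is a separate question.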
Defect rates on 6/7 nm are vanishingly low, and MC/PHY/cache defect sensitivity is also extremely low, so small dies for these functions provide no advantage.
I mean, putting the compute die on top of the cache/IMC dies does seem to be what AMD is planning to do with CDNA 3. It's actually what I originally expected RDNA 3 to be, but I guess AMD just isn't quite there yet.