From the die shot pics, I've seen estimates of 709-713 mm² posted on Twitter. That's pretty large but still smaller than the A100. AMD enabling 120 CUs out of 128 is not bad at all and points to reasonably good yields. We'll probably see a slightly further cut-down part later once they're in full production.
Capacity is only 32 GB, but just like Nvidia, I'm guessing they'll have a part with HBM2E and double the capacity in a quarter or two.
Of note, of course, is that the leaks here were way off, at least those pinning it as a straight Vega derivative with a ton of standard FP32 flops. Instead this appears to be a straight-up matrix/machine-learning competitor, at least for model training, built to challenge Nvidia and others. And it has, at least theoretically, the performance to do so if it's priced right.
Not sure what the leaks were, but anyone who's been following AMD's path in HPC and its supercomputer wins would have known it's not just a rehashed Vega. This is going into the Frontier supercomputer next year, plus a number of other smaller installations. This is AMD's push back into the lucrative HPC segment, and it certainly looks like they've invested significantly in it (looking at the readiness of ROCm 4.0 as well).
Pricing is apparently around $6400, per a link from
@CarstenS in another thread, but these products aren't usually sold at retail. AMD themselves claim between 1.8x and 2.1x higher performance per dollar than the Nvidia A100.
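For a rough sense of where a claim like that could come from, here's a back-of-the-envelope perf-per-dollar comparison using each card's peak FP64 throughput from the vendor spec sheets. The A100 price below is purely an assumed figure for illustration (A100 pricing varies a lot by channel and isn't public list price), so treat the result as a sanity check, not a reproduction of AMD's methodology:

```python
# Back-of-envelope perf/$ comparison, MI100 vs A100.
# Peak FP64 (vector, non-tensor) figures are from vendor spec sheets;
# the A100 price is an ASSUMPTION for illustration only.
mi100_tflops_fp64 = 11.5   # MI100 peak FP64, TFLOPS
a100_tflops_fp64 = 9.7     # A100 peak FP64 (non-tensor), TFLOPS
mi100_price = 6400         # from the linked thread
a100_price = 12500         # assumed street price, not official

ratio = (mi100_tflops_fp64 / mi100_price) / (a100_tflops_fp64 / a100_price)
print(f"perf/$ advantage: {ratio:.2f}x")
```

With those assumed numbers the ratio lands in the same general neighborhood as AMD's claim; a different assumed A100 price or a different metric (FP32, matrix throughput) would shift it.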
I seem to recall that there was some speculation on whether CDNA would be a short-lived product, precisely because it was assumed to be mostly Vega. Would this more radical departure speak for CDNA continuing in parallel with RDNA? And would there be some degree of symbiosis in having two architectures developed in parallel like this?
AMD has already committed to CDNA 2, which is going into the El Capitan supercomputer in 2022. The key feature they've announced is that it uses a next-gen Infinity Fabric that allows cache coherency with EPYC CPUs. Not sure about symbiosis, but the whole point of splitting CDNA and RDNA is to maximize the utility of each with features specific to their intended segments. Certain technologies such as Infinity Cache, Infinity Fabric, etc. may of course be used across both, along with some amount of physical design work, but otherwise it seems like they will be developed in parallel.
From the die shot and the placement of the Infinity links, I'd guess that the MI100 uses 3x Infinity Fabric links + WAFL(?), with 3 more links unused. Just guessing; I could be totally wrong.
View attachment 4930
Well, the official specs from AMD say 3 links:
https://www.amd.com/en/products/server-accelerators/instinct-mi100
And given that they're promoting 4-GPU configurations using the 3 links, I don't expect there to be more unused ones, or they'd be using them.
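The 3-link count lines up with a fully connected 4-GPU group, assuming each GPU gets a direct link to every other GPU in the group (which is how AMD's 4-GPU "hive" diagrams read). A quick sketch of that arithmetic:

```python
# If a 4-GPU group is fully connected point-to-point, each GPU
# needs one link per peer, which is exactly the 3 links AMD specs.
n_gpus = 4
links_per_gpu = n_gpus - 1                 # one direct link to each peer
total_links = n_gpus * links_per_gpu // 2  # each link is shared by two GPUs

print(links_per_gpu, total_links)  # prints: 3 6
```

So 3 links per GPU is the minimum for an all-to-all 4-GPU topology, which would explain why no spare links are exposed.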