Their testing doesn't make sense to me. Why are they using ZeRO-Inference (optimised for running huge models with weights stored outside GPU DRAM) on an 8xMI300X system with 1536GB of GPU memory? How are they managing to get OOM on an 8xH100 system at a batch size of 16 for a 170B model? That doesn't mean MI300X is good or bad, just that those specific tests are useless.
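For reference, here is the back-of-envelope arithmetic behind that. It is a rough sketch, assuming 192GB per MI300X, 80GB per H100, and FP16 weights (2 bytes per parameter); the precision is my assumption, and it ignores KV cache and activations entirely:

```python
# Rough memory arithmetic; all figures in decimal GB.
# Assumptions: 192 GB per MI300X, 80 GB per H100, FP16 weights (2 bytes/param).
# KV cache and activations are NOT counted here.

def weights_gb(params_billion: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

mi300x_total = 8 * 192           # aggregate memory on an 8xMI300X node
h100_total = 8 * 80              # aggregate memory on an 8xH100 node
fp16_weights = weights_gb(170, 2)

print(f"8xMI300X total: {mi300x_total} GB")
print(f"8xH100 total:   {h100_total} GB")
print(f"170B FP16 weights: {fp16_weights:.0f} GB")
print(f"8xH100 left after weights: {h100_total - fp16_weights:.0f} GB")
```

Under those assumptions the weights fit in GPU memory on either system with room to spare, which is why offloading on the MI300X box and an OOM at batch size 16 on the H100 box both look odd.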
We should start a new MI300X(/MI400?) thread. It's worth discussing, but NOT in the same thread as Blackwell; that will just inevitably lead to drama and a low signal-to-noise ratio. A lot of posts in this thread discuss both Blackwell and AMD, though, so unfortunately I'm not sure it works to just move all of them into a new thread...