And labour costs are one of the biggest factors. No company lets its highly paid employees sit around doing nothing because they have to wait twice as long for a result.
Quote: "MI300 is much more expensive"

I couldn't find a price for them. How much more expensive are the AMD chips compared to Nvidia's H100?
Nvidia runs quite well with PyTorch, and since the October release PyTorch also ships CUDA 12 builds, so compute optimizations are available.
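For a concrete sense of why framework support is the crux here, a minimal sketch of device selection in PyTorch: the same few lines target whatever accelerator the installed build supports (CUDA wheels drive Nvidia GPUs; ROCm wheels expose the same torch.cuda API and set torch.version.hip). The layer sizes below are arbitrary placeholders, not anything from the thread.

```python
# Minimal sketch: identical PyTorch code runs on whichever backend the
# installed build supports. On CUDA wheels this targets NVIDIA GPUs; on
# ROCm wheels the same torch.cuda API drives AMD GPUs (torch.version.hip is set).
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"device: {device}, ROCm/HIP build: {torch.version.hip is not None}")

# Arbitrary toy model and input just to exercise the accelerator.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
x = torch.randn(32, 1024, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([32, 1024])
```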
Quote: "Twice as long is nothing in the sector this is relevant for. There it's all about whether something is an order of magnitude (10x) slower or faster. In that world, programming ease and implementation are far more important than a measly 2x improvement in execution speed. It's one reason why Python, and now PyTorch, is king. They aren't the fastest in terms of execution, but they are the fastest and easiest with respect to programming and implementation across a broad spectrum of R&D efforts (corporate and research institutions). Regards, SB"

That's not how GPGPU works. You need to optimize software for accelerators; otherwise companies are just wasting money. This is why Nvidia has invested so much money into software. Even processors with less (hardware) performance can be competitive when your software is more efficient. AMD has to spend twice as many resources to be barely faster. Nvidia can invest those resources into a CPU and sell the package for a similar price. AMD customers have to decide whether to go with the MI300X and still invest in a host system, or go with the MI300A and get worse performance with only 1/4 of the memory of the GH200.
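To make the "optimize the software, not just the hardware" point concrete, here is a small hedged benchmark sketch, not anything anyone in the thread actually ran: the same toy model timed eagerly and after torch.compile (PyTorch 2.x). Model and tensor sizes are made up; the only point is that identical hardware can report very different numbers depending on the software path.

```python
# Hedged sketch of the "optimized software" argument: same model, same GPU,
# eager vs. torch.compile. Sizes are arbitrary; numbers will vary by hardware.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(2048, 8192), nn.GELU(), nn.Linear(8192, 2048)).to(device)
x = torch.randn(64, 2048, device=device)

def bench(fn, iters=50):
    # Warm up first (also triggers compilation for the compiled path),
    # then time; synchronize so queued GPU work is actually finished.
    for _ in range(5):
        fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

with torch.no_grad():
    eager_t = bench(model)
    compiled = torch.compile(model)  # graph capture + kernel fusion
    compiled_t = bench(compiled)

print(f"eager: {eager_t * 1e3:.2f} ms/iter, compiled: {compiled_t * 1e3:.2f} ms/iter")
```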
So the GH200 has 4 times the memory, but with what bandwidth?
Wut?
I was talking to troyan. He complained about memory capacity.
The GH200 HBM3E Superchip has a total of 5.5 TB/s of memory bandwidth, split into 5 TB/s from HBM3E and 512 GB/s from LPDDR5X. That's a far cry from "4 times the memory bandwidth" compared to the 5.3 TB/s of the MI300X or MI300A.
The GH200 has 3.2x (MI300X) or 4.8x (MI300A) the memory, but 480 GB of it is behind just 512 GB/s of bandwidth, and only 141 GB (less than an MI300X, more than an MI300A) is behind the 5 TB/s HBM3E.
(late edit: I blame alcohol and other intoxicants if I missed obvious sarcasm)
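A quick back-of-the-envelope check of those ratios, using the capacities quoted above plus the commonly cited 192 GB (MI300X) and 128 GB (MI300A); these are vendor peak numbers as discussed in the thread, not measurements:

```python
# Back-of-the-envelope check of the capacity/bandwidth figures cited above.
gh200_hbm3e_gb = 141      # fast tier, roughly 5 TB/s HBM3E
gh200_lpddr5x_gb = 480    # slow tier, roughly 512 GB/s LPDDR5X
mi300x_gb = 192
mi300a_gb = 128

gh200_total_gb = gh200_hbm3e_gb + gh200_lpddr5x_gb       # 621 GB total
print(f"{gh200_total_gb / mi300x_gb:.2f}")               # 3.23 -> the "3.2x" figure above
print(f"{gh200_total_gb / mi300a_gb:.2f}")               # 4.85 -> the "4.8x" figure above
print(f"{5.0 + 0.512:.3f} TB/s")                         # 5.512 TB/s aggregate, vs 5.3 TB/s on MI300X/A,
                                                          # but only 141 GB sits on the fast tier
```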
Quote: "I was talking to troyan. He complained about memory capacity."

I somehow missed the "what" portion there; see my late edit, which was still in time for your quote.
Quote: "480 GB of it is behind just 512 GB/s of bandwidth"

Well, no. You're limited by NVLink-C2C quirks quite heavily, and 512 GB/s is only for the 120/240 GB SKUs.
At a recent launch event, AMD talked about the inference performance of the H100 GPU compared to that of its MI300X chip. The results shared did not use optimized software, and the H100, if benchmarked properly, is 2x faster.
Quote: "NVIDIA published their numbers for the inference benchmarks AMD used in their presentation. NVIDIA claims AMD's results didn't use properly optimized code.
Achieving Top Inference Performance with the NVIDIA H100 Tensor Core GPU and NVIDIA TensorRT-LLM | NVIDIA Technical Blog (developer.nvidia.com)"

Little did Nvidia know you can also amp up batch sizes on MI300X; the thing has 304 CUs, after all.
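Since the disagreement here is largely about batch size, a hedged sketch of what "amping up the batch size" actually measures: a toy encoder (not TensorRT-LLM and not either vendor's harness) timed at several batch sizes, to show why batch-1 latency and large-batch throughput tell different stories.

```python
# Hedged sketch: throughput usually climbs with batch size until the GPU
# saturates, so batch-1 and large-batch results are not interchangeable.
# Dummy model and sizes; not the benchmark either vendor ran.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True),
    num_layers=4,
).to(device).eval()

def throughput(batch, seq=128, iters=20):
    x = torch.randn(batch, seq, 1024, device=device)
    with torch.no_grad():
        for _ in range(3):                 # warm-up
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return batch * iters / (time.perf_counter() - t0)  # sequences per second

for b in (1, 8, 32, 128):
    print(f"batch {b:>3}: {throughput(b):8.1f} seq/s")
```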
Quote: "NVIDIA published their numbers for the inference benchmarks AMD used in their presentation. NVIDIA claims AMD's results didn't use properly optimized code."

That was clear from AMD's own numbers. Even Nvidia claims that one H200 is 1.9x faster than the H100 in Llama. Like I said: MI300 is the bottom for AMD. I only hope it can get better from here.
Quote: "MLPerf was created to prevent these problems."

AFAIK, AMD has never participated in MLPerf, which is a shame, since they won't be able to compare performance uplifts against previous generations.
Quote: "Little did Nvidia know you can also amp up batch sizes on MI300X; the thing has 304 CUs, after all."

With a significantly bigger performance deficit. With optimized software at batch size 1, the H100 is 47% faster at inference; I imagine with larger batches it's probably even faster.