There were no startling surprises in the latest MLPerf Inference benchmark (4.0) results released yesterday. Two new workloads — Llama 2 and Stable Diffusion XL — were added to the benchmark suite as MLPerf continues to keep pace with fast-moving ML technology. Nvidia showcased H100 and H200 results, while Qualcomm’s Cloud AI 100 Ultra (preview category) and Intel/Habana’s Gaudi 2 showed gains. Intel had the only CPU-as-accelerator in the mix.
...
Overall, the number of submitters has been fairly stable in recent years. There were 23 this time: ASUSTeK, Azure, Broadcom, Cisco, CTuning, Dell, Fujitsu, Giga Computing, Google, Hewlett Packard Enterprise, Intel, Intel Habana Labs, Juniper Networks, Krai, Lenovo, NVIDIA, Oracle, Qualcomm Technologies, Inc., Quanta Cloud Technology, Red Hat, Supermicro, SiMa, and Wiwynn. Together they contributed more than 8,500 performance results and 900 power results to MLPerf Inference v4.0.
Missing were the heavy chest-thumping bar charts pitting Nvidia against competitors; the rough pecking order of inference accelerators, at least for now, seems to be settled. One of the more interesting notes came from David Salvator, Nvidia director of accelerated computing products, who said inference revenue was now 40% of Nvidia’s datacenter revenue.
...
MLCommons provided a deeper look into its decision-making process in adding the two new benchmarks, which is posted on the MLCommons site.
The composition of the team doing that work — Thomas Atta-fosu, Intel (Task Force Chair); Akhil Arunkumar, D-Matrix; Anton Lokhmotov, Krai; Ashwin Nanjappa, Nvidia; Itay Hubara, Intel Habana Labs; Michal Szutenberg, Intel Habana Labs; Miro Hodak, AMD; Mitchelle Rasquinha, Google; and Zhihan Jiang, Nvidia — reinforces the idea of cooperation among rival companies.
...
Practically speaking, digging out value from the results requires some work. With this round, MLPerf results are being presented on a different platform — Tableau — and, at least for me, there’s a learning curve to using the powerful platform effectively. That said, the data is there.
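For those who would rather work with raw numbers than the dashboard, Tableau views can typically be exported to CSV and sliced with ordinary tools. Below is a minimal sketch using pandas; the file name and column headers ("Submitter", "Model", "Scenario", "Result") are illustrative assumptions, so check them against the actual export before running.

```python
# A minimal sketch of slicing the v4.0 results offline, assuming the Tableau
# results view has been exported to CSV. The file name and column headers
# used here are guesses; adjust them to match the real export.
import pandas as pd

df = pd.read_csv("mlperf_inference_v4.0_results.csv")

# Narrow to one workload (here, the new Llama 2 benchmark) and rank the
# best result each submitter posted per scenario.
llama2 = df[df["Model"].str.contains("llama", case=False, na=False)]
leaders = (
    llama2.groupby(["Submitter", "Scenario"])["Result"]
    .max()
    .sort_values(ascending=False)
)
print(leaders.head(10))
```

The same filtering can of course be done inside the Tableau interface; exporting is just the quicker route for anyone already comfortable with a spreadsheet or a few lines of Python.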
...
Asked about the forthcoming Blackwell GPUs, B100 and B200, and their drop-in compatibility with existing H100 and H200 systems, Salvator said, “We have not designed B200 to be drop-in compatible with an H200 CTS system. The drop-in compatible side is focused more on the B100, because we have a pretty significant installed base of H100-based servers and a lot of our partners know how to build those servers. So that ability to easily swap in a B100 baseboard gets them to market that much faster. B200 will require a different chassis design. It’s not going to be drop-in compatible with H200 systems.”
...
Shah noted Intel had five partners submitting this time around. “The fact that we have five partners that submitted is indicative of the fact that they also are recognizing that this is where Xeon’s key strengths are; it’s in the area of when you have mixed general purpose workloads or a general purpose application and you’re infusing AI into it.” The five partners were Cisco, Dell, Quanta, Supermicro, and Wiwynn.
Next up is MLPerf Training, expected in the June time frame.