NVIDIA discussion [2024]

According to TechInsights, NVIDIA managed to ship around 3.76 million data center GPUs in 2023, almost 1 million more than the company shipped in 2022. The firm managed to gain a whopping 98% market share.

 
Not sure how much to trust that; it feels like someone just dividing public revenue numbers by ASP estimates. NVIDIA had full-year datacenter revenue of $47.5B, but that covers end of January 2023 to end of January 2024 rather than calendar 2023, and it includes Mellanox/networking, which is a large amount of money as well. And the H100 80GiB probably had >2x the ASP of the A100 40GiB. So it kinda-sorta tracks, but it doesn't say anything very useful even if it is real data, imo.

Also, AMD/Intel are meant to be 50K and 40K respectively, not 500K and 400K, for it to add up. That's a pretty big typo...
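For what it's worth, here's the kind of back-of-envelope check I mean, as a minimal Python sketch. Only the $47.5B figure comes from NVIDIA's reporting (FY2024 datacenter segment); the networking share is a placeholder assumption, not real data.

```python
# Back-of-envelope sanity check on the TechInsights unit estimate.
# Only the $47.5B revenue figure is from NVIDIA's FY2024 datacenter segment
# (which runs Feb 2023 - Jan 2024); everything else is an assumption.

datacenter_revenue = 47.5e9   # FY2024 datacenter revenue, includes networking
networking_share = 0.25       # assumed Mellanox/networking share (placeholder)
claimed_units = 3.76e6        # TechInsights' 2023 datacenter GPU estimate

gpu_revenue = datacenter_revenue * (1 - networking_share)
implied_blended_asp = gpu_revenue / claimed_units

print(f"Implied blended ASP: ${implied_blended_asp:,.0f} per GPU")
# Whether that number is plausible depends entirely on the mix of
# H100 / A100 / L40S / China SKUs, which is exactly why dividing
# revenue by an ASP guess doesn't tell you much.
```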
 
In the latest MLPerf Training 4.0 benchmarks, NVIDIA is far ahead of the competition (Gaudi 2 and TPUv5p).


In GPT-3, for example:
512 H100: 50 minutes
512 Gaudi 2: 132 minutes (normalized)
512 TPUv5p: 144 minutes



Llama 2 70B:
8 H100: 29 minutes
8 Gaudi 2: 78 minutes
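Just to put those numbers on a common scale, a trivial sketch computing how much slower each result is relative to the H100 time at the same system size (times taken straight from the lists above):

```python
# Relative time-to-train vs H100 at equal accelerator count,
# using the MLPerf Training 4.0 numbers quoted above (minutes).
results = {
    "GPT-3 (512 accelerators)": {"H100": 50, "Gaudi 2 (normalized)": 132, "TPUv5p": 144},
    "Llama 2 70B (8 accelerators)": {"H100": 29, "Gaudi 2": 78},
}

for bench, times in results.items():
    baseline = times["H100"]
    for chip, minutes in times.items():
        print(f"{bench}: {chip} = {minutes} min ({minutes / baseline:.2f}x H100 time)")
```

So both Gaudi 2 and TPUv5p land in the rough ballpark of 2.6-2.9x the H100 time on these two workloads.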


AMD is missing for the third time. No idea when they will post results for the MI300X.


NVIDIA is also talking about a 100K Hopper GPU factory in 2024 and a 300K Blackwell GPU factory in 2025.

 
To be fair, MI300X is mostly focused on inference, so the fact they didn't participate in MLPerf Training isn't surprising, but if they miss the next MLPerf Inference refresh *again*, that wouldn't be a good sign...
 
H100 slower than 4090? Why?
RTX 4090 has more SMs and higher clock rates, so if you're not DRAM limited or Tensor Core limited (and don't need FP64 or FP16), it is actually much faster than H100! Also, H100 PCIe only has a 350W TDP (H100 SXM5 has a massive 700W TDP), so it's extremely power limited for many workloads, while RTX 4090 has a 450W TDP, so it also wins on that front.
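A rough way to see it, assuming the commonly listed SM counts and boost clocks (treat the exact clocks as approximate), is just peak FP32 = SMs × 128 FP32 lanes × 2 ops per FMA × clock:

```python
# Rough peak non-tensor FP32 throughput from SM count and boost clock.
# SM counts and boost clocks are the commonly listed spec-sheet values;
# treat them as approximate.
def peak_fp32_tflops(sms: int, boost_ghz: float) -> float:
    # 128 FP32 lanes per SM, 2 ops per FMA
    return sms * 128 * 2 * boost_ghz / 1e3

gpus = {
    "RTX 4090 (450W)":  (128, 2.52),
    "H100 PCIe (350W)": (114, 1.755),
    "H100 SXM5 (700W)": (132, 1.98),
}

for name, (sms, clk) in gpus.items():
    print(f"{name}: ~{peak_fp32_tflops(sms, clk):.0f} TFLOPS FP32")
# The picture flips completely once Tensor Cores, FP64, or HBM bandwidth matter.
```

That comes out to roughly 83 TFLOPS for the 4090 vs. ~51 TFLOPS for H100 PCIe on plain FP32, which is the gap the question is seeing.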

Same reason I wouldn't put too much weight on MI300X not being that much better in Geekbench OpenCL; it's really not aimed at that kind of workload at all.
 
Speaking of which, I wanna discuss how the H100 and MI300X compare on die size and cost.

We know MI300X uses about 920mm² of TSMC 5nm chiplets (8 compute chiplets) stacked on top of 1460mm² of TSMC 6nm chiplets (4 IO chiplets). Combined, that makes for a total silicon area of 2380mm², compared to the single 814mm² die of the H100.

https://www.semianalysis.com/p/amd-mi300-taming-the-hype-ai-performance

NVIDIA's total silicon footprint is much smaller, but the yield of a big monolithic die is worse. However, the H100 is not a full-die product; it has about 20% of the die disabled to improve yields, so the difference in effective yields between the two may not be that large.
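To make the yield argument concrete, here's a minimal sketch using a simple Poisson yield model, Y = exp(-area × D0). The defect density D0 below is a made-up placeholder, not a TSMC figure, and real comparisons are muddied by known-good-die testing and by NVIDIA fusing off ~20% of the die as noted above:

```python
# Toy Poisson yield model: Y = exp(-area_cm2 * defect_density).
# D0 is a placeholder assumption, not a real TSMC N5 number.
from math import exp

D0 = 0.10                      # defects per cm^2 (assumed)
h100_die_cm2 = 8.14            # 814 mm^2 monolithic
mi300x_chiplet_cm2 = 9.20 / 8  # 920 mm^2 of N5 split across 8 compute chiplets

y_h100_full = exp(-h100_die_cm2 * D0)
y_chiplet = exp(-mi300x_chiplet_cm2 * D0)

print(f"Full-die H100 yield:        {y_h100_full:.2f}")
print(f"Per compute chiplet yield:  {y_chiplet:.2f}")
print(f"8 good chiplets (no KGD):   {y_chiplet ** 8:.2f}")
# In practice the chiplets are tested before packaging (known good die) and
# the H100 ships with a chunk of the die fused off, so both effective yields
# end up much better than the raw full-die numbers suggest.
```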

MI300X costs significantly more due to its larger overall size, more complex packaging, and the need for additional 6nm chiplets.

Any thoughts?
 
To understand where NVIDIA is going with Blackwell: the next step for an AI breakthrough is to massively increase the amount of compute dedicated to a single very large model, i.e. train a multi-trillion-parameter multimodal transformer on massive amounts of video, image, audio, and text. Then run it on a cluster of ~100K H100/B100/B200 GPUs, which is what these large clusters will be built around.

Multiple large AI labs, including but not limited to OpenAI/Microsoft, xAI, and Meta, are in a race to build GPU clusters with over 100,000 GPUs. These individual training clusters cost in excess of $4 billion in server capital expenditures alone, but they are also heavily limited by the lack of datacenter capacity and power, since the GPUs generally need to be co-located for high-speed chip-to-chip networking. A 100,000 GPU cluster will require >150MW of datacenter capacity and guzzle down 1.59 terawatt-hours in a single year, costing $123.9 million at a standard rate of $0.078/kWh.
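Those power and cost figures hang together arithmetically, for what it's worth; a quick check, taking the quoted 1.59 TWh and $0.078/kWh as given:

```python
# Quick check on the 100K-GPU cluster power/cost numbers quoted above.
annual_energy_kwh = 1.59e9     # 1.59 TWh, as quoted
price_per_kwh = 0.078          # $/kWh, as quoted
hours_per_year = 8760

avg_power_mw = annual_energy_kwh / hours_per_year / 1000
annual_cost = annual_energy_kwh * price_per_kwh

print(f"Average draw: ~{avg_power_mw:.0f} MW (consistent with '>150MW' of capacity)")
print(f"Annual energy cost: ~${annual_cost / 1e6:.1f} million")
```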

 
"AMD is the only firm that has a track record of successfully delivering silicon for high performance computing."

Stopped reading after that.
 