Speculation and Rumors: Nvidia Blackwell ...

  • Thread starter Deleted member 2197
Oh ok. So Jan 2025
Yeah, they could probably launch some of them now but they seem to want as much 40-series stock cleared out as possible before the launch. Hopefully we’ll get at least one of the cards by end of January. Early word was they’d release the 5080 before the 5090, but unless it’s priced very competitively I’d think they’d want the halo card out first. In any case they’ll probably release within a few weeks of each other as usual - wouldn’t be surprised to see the rumored 5070 launch much later, in the spring. Of course political developments could affect all of this. I hope not but the possibility can’t be entirely discounted.
 

The listing at Broadberry puts a price tag of $515,410.43 on the Blackwell DGX B200 AI system, with configuration options that mainly cover after-sale services. This is the first time we have seen one of NVIDIA's Blackwell AI products surface online as a retail listing. The supply situation is currently unknown, but initial Blackwell supply is said to be constrained, with a larger portion of shipments slated for the first quarter of next year.

 
When you look at FP8 training performance on the spec chart below, B200 is 2.5x faster than Hopper. But B200 packages two dies, so that is only 1.25x faster on a per-die basis. How did they get to that “5x faster training” number? Well, B200 has another new feature that doubles throughput relative to FP8: 4-bit arithmetic.
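
The arithmetic behind those figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check, assuming B200 is a two-die package and that FP4 runs at twice the FP8 rate; the variable names are illustrative and not taken from any Nvidia material.

```python
# Back-of-the-envelope check of the speedup figures quoted above.
# Assumptions (not from a datasheet): B200 packages two GPU dies, Hopper one,
# and FP4 arithmetic runs at twice the FP8 rate.
DIES_PER_B200 = 2
fp8_speedup_per_package = 2.5  # B200 vs Hopper at FP8, per the quoted post
fp4_vs_fp8_throughput = 2.0    # assumed FP4-over-FP8 throughput ratio

# Per-die FP8 gain: divide the package-level gain by the die count.
fp8_speedup_per_die = fp8_speedup_per_package / DIES_PER_B200

# Package-level gain if the whole workload could run in FP4.
fp4_speedup_per_package = fp8_speedup_per_package * fp4_vs_fp8_throughput

print(f"FP8, per die:     {fp8_speedup_per_die:.2f}x")
print(f"FP4, per package: {fp4_speedup_per_package:.2f}x")
```

Under those assumptions the 5x figure is an upper bound that applies only when every operation runs in FP4.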

4 bits doesn’t seem like a lot. If you’re using those bits to represent integers, you only get 16 distinct values (0 through 15). But Nvidia’s GPUs feature some really clever technology to squeeze the most utility out of those 4-bit numbers. They’re called “mixed precision tensor cores,” and if you want to understand Nvidia’s dominance at AI, you need to understand how they work.
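
To make the 4-bit point concrete, here is a minimal Python sketch comparing what 4 bits can hold as an unsigned integer versus as a tiny floating-point number. The float layout used here (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no infinities or NaNs, often written E2M1) is an assumption for illustration; the quoted text does not say which FP4 encoding Blackwell uses, and `decode_e2m1` is a hypothetical helper, not a library function.

```python
# What can 4 bits represent? Integers vs. a tiny floating-point format.
# Assumption: the FP4 layout is E2M1 (1 sign, 2 exponent, 1 mantissa bit,
# exponent bias 1, no inf/NaN). decode_e2m1 is a hypothetical helper.

def decode_e2m1(bits: int) -> float:
    """Decode a 4-bit pattern (0..15) as an E2M1 floating-point value."""
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exponent = (bits >> 1) & 0b11
    mantissa = bits & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so only 0 and 0.5 are possible.
        return sign * mantissa * 0.5
    # Normal: implicit leading 1, one fractional bit worth 0.5.
    return sign * (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)

# As unsigned integers, the 16 bit patterns are simply 0..15.
uint4_values = list(range(16))

# As E2M1 floats, the 8 non-negative patterns spread out unevenly.
fp4_positive = sorted(decode_e2m1(b) for b in range(8))

print(uint4_values)
print(fp4_positive)
```

The float encoding spends its bit patterns unevenly: fine steps near zero and coarse steps toward the top of the range. In practice a per-tensor or per-block scale factor is typically applied on top of values like these, which this sketch leaves out.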
...
The closer a network can get to being represented entirely with FP4 operations, the closer Blackwell’s training performance can get to that eye-popping 5x number Nvidia cited. And luckily, there’s already some research showing that networks can train with FP4 operations without significant loss of accuracy. If those results can scale to GPT-4-scale networks, then Nvidia has a huge advantage over other datacenter AI chips, which, as far as I can tell, don’t yet support these FP4 operations.
 
How well does it run Crysis though?

On a genuine note, why have they gone with Intel rather than AMD for the CPU? I thought AMD performed better these days?

Could simply be a matter of timing, i.e. what was available earlier for qualification/validation. And anyway the CPU performance of these clusters is not relevant, the CPU is almost an accessory. They will likely qualify newer generation processors for third party servers in the coming quarters.
 
Blackwell laptop designs are seemingly also launching at CES 25.

Also apparently B300 is a thing now.
 
More Blackwell server announcements. No new architecture details. I wonder why Nvidia hasn’t shared any details of the SM configuration, cache or clocks.

Based on the limited information they’ve shared so far it’s hard to tell whether Blackwell is a further refinement of Volta/Ampere/Hopper or something more.

“The GB200 Grace Blackwell NVL4 Superchip integrates four NVIDIA NVLink-connected Blackwell GPUs unified with two Grace CPUs over NVLink-C2C, Buck said. It provides up to 2x performance for scientific computing, training and inference applications over the prior generation.

The GB200 NVL4 superchip will be available in the second half of 2025.”
 
5070 Ti rumored to be based on GB203 with only 6% more SMs than the 4070 Ti Super and 16% more than the 4070 Ti. The optimist in me thinks Blackwell SMs must be a lot more efficient or clock much higher.
 