Nvidia Blackwell Architecture Speculation

Standouts:

In direct comparisons with previous-generation models, the RTX 5090 shows a 30-40% performance increase without DLSS 4
With DLSS on, gamers enjoy not only improved frame rates but also reduced latency in absolute milliseconds
Usage statistics indicate that over 80% of RTX players enable DLSS during gameplay, with a cumulative total of 3 billion hours of DLSS-enabled gaming
Double RT throughput on ray tracing units
Shader Execution Reordering (SER), introduced with Ada Lovelace, is said to run twice as fast on Blackwell as on its predecessor (see the OptiX sketch after this list)
The RT cores include a triangle cluster intersection engine designed specifically for handling Mega Geometry
Ray Tracing on Blackwell is expected to require 25 percent less graphics card memory than Ada Lovelace
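
For anyone who hasn't looked at how SER is actually exposed: on the developer side it shows up in OptiX 8's hit-object API (and via vendor extensions in D3D12/Vulkan) as a split traverse/reorder/invoke sequence. A minimal raygen sketch is below; the ray setup and payload layout are made up for illustration and the host-side pipeline/SBT setup is omitted, so treat it as shape-of-the-API only. The Blackwell claim is that the hardware performs the reorder faster, not that the API changes.

```cpp
// Minimal OptiX 8 raygen sketch of Shader Execution Reordering (SER):
// traverse first, reorder threads by hit coherence, then invoke shading.
// Ray setup and payload layout are invented for illustration; the host-side
// pipeline/SBT setup a real program needs is omitted.
#include <optix.h>

struct Params
{
    OptixTraversableHandle handle;
};
extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__ser_sketch()
{
    const uint3 idx = optixGetLaunchIndex();

    // Placeholder primary ray; a real renderer derives this from its camera.
    float3 origin    = { 0.0f, 0.0f, -1.0f };
    float3 direction = { idx.x * 1e-3f, idx.y * 1e-3f, 1.0f };

    unsigned int p0 = 0, p1 = 0;  // two payload registers

    // Find the hit, but do not run the closest-hit/miss program yet.
    optixTraverse(params.handle, origin, direction,
                  0.0f, 1e16f, 0.0f,                        // tmin, tmax, ray time
                  OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
                  0, 1, 0,                                   // SBT offset, stride, miss index
                  p0, p1);

    // Let the GPU regroup threads so rays with similar hits shade together.
    optixReorder();

    // Run the closest-hit or miss program recorded in the hit object.
    optixInvoke(p0, p1);
}
```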
 
Can't say that this "tech deep dive" is very "tech" or "deep". A proper whitepaper would be nice.
Also +30-40% seem to be confirmed now. 5080 will likely end up being slower than 4090 in direct comparisons - unless the new DLSS model would run a lot better on Blackwell leading to some unexpected gains there.
 
Do we know if Blackwell has Level 5 RT now?

Level 5 – Coherent BVH Processing with Scene Hierarchy Generator in Hardware.

 
Can't say that this "tech deep dive" is very "tech" or "deep". A proper whitepaper would be nice.
Also +30-40% seem to be confirmed now. 5080 will likely end up being slower than 4090 in direct comparisons - unless the new DLSS model would run a lot better on Blackwell leading to some unexpected gains there.
I don't think we'll get one. Just look at datacenter Blackwell and the terrible presentation at Hot Chips. It's just a basic architecture overview. Nvidia stopped showing real architecture details with Blackwell. It seems like arrogance is starting to win at Nvidia everywhere.
 
@Man from Atlantis That's a pretty big change for the SM. Really looking forward to reading this. Curious if the INT32 increase is going to mean anything interesting for performance. My expectation is it won't.

Well Nvidia made a big deal out of saying that INT32 isn’t as important as FP32 to graphics performance. No idea why they’re reversing course now. Maybe it helps more with AI workloads?
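
Either way, the INT32 that shows up even in FP-heavy kernels is mostly addressing, loop counters, and bit packing rather than standalone integer math. A toy CUDA kernel, entirely hypothetical and not anything Nvidia has published, just to show the kind of instruction mix being discussed:

```cpp
// Hypothetical CUDA kernel illustrating how FP32 math and INT32 work
// (index math, loop counters, bit packing) interleave in practice.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void mixedIntFp(const float* __restrict__ in,
                           unsigned int* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // INT32: index math
    if (i >= n) return;

    float x = in[i];
    float acc = 0.0f;
    for (int k = 0; k < 16; ++k) {                   // INT32: loop counter
        acc = fmaf(x, acc, 1.0f);                    // FP32: fused multiply-add
        x  *= 0.5f;                                  // FP32
    }

    // INT32: pack the result and a per-thread tag into one word (bit ops).
    unsigned int bits = __float_as_uint(acc);
    out[i] = (bits & 0xFFFFFF00u) | (static_cast<unsigned int>(i) & 0xFFu);
}

int main()
{
    const int n = 1 << 20;
    float* dIn;
    unsigned int* dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(unsigned int));

    mixedIntFp<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

If all the Blackwell CUDA cores really can issue either FP32 or INT32, mixes like this are exactly where it would help; whether that's measurable in games is another question.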
 
Can't say that this "tech deep dive" is very "tech" or "deep". A proper whitepaper would be nice.
Also +30-40% seem to be confirmed now. 5080 will likely end up being slower than 4090 in direct comparisons - unless the new DLSS model would run a lot better on Blackwell leading to some unexpected gains there.

Nvidia’s white papers don’t go into much more detail than we’re seeing on these slides. Their CUDA programming guides are slightly better but they haven’t released one for Blackwell.

Anyone else find it strange that there’s zero mention of L2 cache size and capability after they hyped up Ada L2? This really feels like a coasting generation hardware wise. All the fun stuff is in software.
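
Not that it answers the slide-ware question, but once cards are out the L2 size is at least directly visible from the CUDA runtime (it's the same value the deviceQuery sample prints). Minimal host-side sketch:

```cpp
// Query the L2 cache size the CUDA runtime reports for each visible GPU.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        int l2Bytes = 0;
        cudaDeviceGetAttribute(&l2Bytes, cudaDevAttrL2CacheSize, dev);

        printf("%s: L2 cache = %.1f MiB\n", prop.name, l2Bytes / (1024.0 * 1024.0));
    }
    return 0;
}
```

The "capability" part of the question is the bit only a whitepaper or an updated CUDA programming guide would answer.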
 
Well Nvidia made a big deal out of saying that INT32 isn’t as important as FP32 to graphics performance. No idea why they’re reversing course now. Maybe it helps more with AI workloads?

Kind of wondering if it has something to do with simplifying the SM to improve scheduling of work. You have shader execution reordering and the AMP processor to help schedule AI workloads, and maybe it's just easier to do all of this if the compute resources are all the same.
 
Nvidia’s white papers don’t go into much more detail than we’re seeing on these slides. Their CUDA programming guides are slightly better but they haven’t released one for Blackwell.

Anyone else find it strange that there’s zero mention of L2 cache size and capability after they hyped up Ada L2? This really feels like a coasting generation hardware wise. All the fun stuff is in software.
Judging by the die sizes of GB203/205 in comparison to AD103/104, it seems Nvidia has kept the same L2 cache size.
 
Judging by the die sizes of GB203/205 in comparison to AD103/104, it seems Nvidia has kept the same L2 cache size.

If those die sizes are accurate, then Nvidia somehow managed to cram fatter SMs and four more of them into the same area. I would be surprised if they didn't sacrifice L2 to make room and lean more heavily on GDDR7.
 