Nvidia Blackwell Architecture Speculation

When we asked how DLSS 4 multi frame generation works and whether it was still interpolating, Jensen boldly proclaimed that DLSS 4 "predicts the future" rather than "interpolating the past." That drastically changes how it works, what it requires in terms of hardware capabilities, and what we can expect in terms of latency.
Extrapolation?
 
That’s an interesting article. It also suggests only 50 series can run neural shaders, which would be a death knell for feature adoption
It's why I'm strongly considering not getting a 5090 and instead waiting for the 60 series. It will be years before these features are broadly adopted in games (if they ever are), and the 4090 and 5090 will remain overkill on VRAM for the rest of this generation, so there's no real necessity.
 
That’s an interesting article. It also suggests only 50 series can run neural shaders, which would be a death knell for feature adoption
That makes sense though. Neural shaders with forward shading mean you need a proper interface to the tensor cores, one suitable for calls from groups of just 4 threads rather than full warps.

Older Nvidia generations would not be able to use the tensor cores properly from a fragment shader.

Funny though, RDNA should not have any issues with neural shaders at all; if anything, Blackwell should behave much closer to it now.
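For context on the warp-granularity point: the standard pre-Blackwell tensor core path in CUDA (the WMMA API) is a warp-collective operation, so all 32 threads must reach the MMA together, which is a poor fit for fragment shaders that naturally run in 2x2 quads. A minimal sketch of that full-warp contract (the 16x16x16 tile shape is just the standard example, nothing Blackwell-specific):

```cuda
#include <mma.h>
using namespace nvcuda;

// All 32 lanes of a warp must execute these calls together: WMMA fragments
// are collectively owned by the whole warp, so there is no way to issue an
// MMA from just a 2x2 quad of fragment-shader threads on this hardware.
__global__ void warp_mma(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);           // zero the output tile
    wmma::load_matrix_sync(fa, a, 16);       // warp-wide collective load
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);          // warp-wide collective MMA
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

The `_sync` suffix is the point: behaviour is only defined when the whole warp is converged, which is exactly the mismatch with quad-granularity pixel work.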
 
Hopefully there's a decent whitepaper at some point, but I'm looking forward to seeing the launch details that media outlets are able to get their hands on.
 
Yeah, well, they could've done shading and tensor operations "at the same time" since Ampere. It's usually impossible due to register file and memory bandwidth limits though, not because of how the SM is designed. So it's unclear what has changed in Blackwell. Maybe they've moved the tensor ALUs inside the main shading SIMDs and they're all controlled by the same logic now? That would be wild, as it seems to be how AMD has implemented its AI h/w in RDNA4, and it would actually be a step backwards from how tensor h/w has been built into Nvidia GPUs since Volta.
H100 introduced asynchronous warpgroup MMA. A single thread within a warpgroup (= 4 warps, 128 threads) submits a batch of tensor operations. The tensor cores then pull data out of shared memory (and, optionally, registers) and write the result to registers across the whole warpgroup. This happens asynchronously: threads can go on doing other work until they require the MMA results, while the tensor cores use any "spare" shared memory and register bandwidth to do their work. I would expect Blackwell to adopt (in fact, to expand) this model.
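A rough skeleton of that flow in sm_90a PTX terms, as I understand it (the actual wgmma.mma_async issue and its shared-memory descriptor setup are omitted, since they depend on tile shape and swizzle; only the fence/commit/wait protocol is shown):

```cuda
// Sketch of the Hopper warpgroup-MMA submission protocol (sm_90a assumed).
__device__ void warpgroup_mma_sketch() {
    // 1. Make prior register/shared-memory writes visible to the tensor cores.
    asm volatile("wgmma.fence.sync.aligned;" ::: "memory");

    // 2. The async MMAs themselves would be issued here, e.g.
    //    wgmma.mma_async.sync.aligned.m64n8k16.f32.f16.f16 ...
    //    Operands come from shared-memory descriptors; results land in
    //    registers spread across all 128 threads of the warpgroup.

    // 3. Close the batch of issued MMAs into one group.
    asm volatile("wgmma.commit_group.sync.aligned;" ::: "memory");

    // ...threads are free to do unrelated ALU work here while the
    // tensor cores chew through the group in the background...

    // 4. Block only when the results are needed (0 = drain all groups).
    asm volatile("wgmma.wait_group.sync.aligned 0;" ::: "memory");
}
```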
 
That makes sense though. Neural shaders with forward shading mean you need a proper interface to the tensor cores, one suitable for calls from groups of just 4 threads rather than full warps.

Older Nvidia generations would not be able to use the tensor cores properly from a fragment shader.

Funny though, RDNA should not have any issues with neural shaders at all; if anything, Blackwell should behave much closer to it now.
Well, strictly speaking, nothing stops Nvidia from running such workloads the same way RDNA does right now; it would probably just be too slow to be usable. The convergence between Blackwell and RDNA is happening with RDNA4 specifically, which isn't exactly generic "RDNA" and means the exact same thing for "neural shaders" compatibility on AMD as it does for Nvidia.
 
The more you look into the specs, the more obvious it is that Nvidia is trying to pull a fast one. Comparing the 4070 Super to the 5070: the 4070 Super has less bandwidth but 17% more CUDA cores and 17% more RT cores. Other than MFG, which isn't looking too hot at the moment with all the artifacts it has, will the 5070 even beat the 4070 Super? I'm not so sure. The 5080 also looks to be barely faster than a 4080 Super if we just look at the specs, and the same appears to be true when comparing the 5070 Ti to the 4070 Ti Super. This might just be the worst generational increase I've ever seen, and it's no surprise that it comes as a result of no competition. We'll see what the real-world non-MFG performance looks like shortly. I personally can't wait.
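For what it's worth, here's the arithmetic behind those percentages, using the announced spec-sheet numbers (core counts and bandwidth figures are my own look-up, so double-check them):

```cuda
#include <cstdio>

// Announced specs (assumed from public listings; verify):
//   RTX 4070 Super: 7168 CUDA cores / 56 RT cores, 504 GB/s (192-bit GDDR6X)
//   RTX 5070:       6144 CUDA cores / 48 RT cores, 672 GB/s (192-bit GDDR7)
int main() {
    const double cores_4070s = 7168.0, cores_5070 = 6144.0;
    const double bw_4070s    = 504.0,  bw_5070    = 672.0;  // GB/s

    printf("4070S: +%.0f%% CUDA/RT cores vs 5070\n",
           (cores_4070s / cores_5070 - 1.0) * 100.0);       // ~17%
    printf("5070:  +%.0f%% bandwidth vs 4070S\n",
           (bw_5070 / bw_4070s - 1.0) * 100.0);             // ~33%
    return 0;
}
```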
 
In these Far Cry 6 benchmarks from nVidia, Blackwell is ~33% faster than the non-Super Lovelace cards. A 5070 Ti would be ~10% slower than a 4090.
 
That 27% and 40-something percent uplift for Far Cry and A Plague Tale tells me that scaling outside of node shrinks is dead and buried.

The 5090 consumes like 25% more energy and has 70% more memory bandwidth, and that's the percentage improvement?
 
The 5070 is beating the 4070 Ti in Nvidia-provided benchmarks.


Which is why nobody should look at the specs to figure out the performance.
Yea, let's wait for independent benchmarks. The Nvidia-provided benchmarks are filled with caveats and often compare unlike things; even Intel provides better benchmarks. For those of us not interested in MFG, there's certainly reason for concern, especially as it relates to real performance gains in RT and raster compared to the Super line.

That 27% and 40-something percent uplift for Far Cry and A Plague Tale tells me that scaling outside of node shrinks is dead and buried.

The 5090 consumes like 25% more energy and has 70% more memory bandwidth, and that's the percentage improvement?
That's for the 5090, and it has significantly more CUDA cores, more bandwidth, more RT cores, etc. For the other GPUs, when compared to the Super line, there is barely any improvement in base specs, clock speeds, etc. Very suspect benchmarks released by Nvidia... Very suspect.
 
nVidia claims that their notebook versions are 40% more efficient, and they use basically the same configuration outside of GDDR7.
"For GeForce RTX 50 Series laptops, new Max-Q technologies such as Advanced Power Gating, Low Latency Sleep, and Accelerated Frequency Switching increases battery life by up to 40%, compared to the previous generation."

Up to 40%. I don't have much hope for energy efficiency; I wouldn't be surprised if, at the same core count and the same frequency, they end up very close.
 
The 5090 consumes like 25% more energy and has 70% percent more memory bandwidth and that's the percentage improvement?
FC6 is most definitely CPU-limited on the 5090, since it shows a higher gain for the 5080 vs the 4080, which makes zero sense otherwise.
APTR is a more GPU-limited game, so +40% is the more likely average result for the 5090 vs the 4090. Considering that we're looking at a +30% or so FP32 change between the 4090 and 5090, this seems like a solid enough gain really.
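Putting rough numbers on that (boost clocks, bandwidth, and TDPs below are the announced figures as I understand them, so treat them as assumptions):

```cuda
#include <cstdio>

// Announced specs (assumed; verify against the shipping products):
//   RTX 4090: 16384 FP32 lanes @ ~2.52 GHz boost, 1008 GB/s, 450 W
//   RTX 5090: 21760 FP32 lanes @ ~2.41 GHz boost, 1792 GB/s, 575 W
int main() {
    const double fp32_4090 = 16384 * 2.52e9 * 2.0;  // 2 FLOPs per FMA
    const double fp32_5090 = 21760 * 2.41e9 * 2.0;

    printf("FP32:      +%.0f%% (%.1f -> %.1f TFLOPS)\n",
           (fp32_5090 / fp32_4090 - 1.0) * 100.0,
           fp32_4090 / 1e12, fp32_5090 / 1e12);                       // ~ +27%
    printf("Bandwidth: +%.0f%%\n", (1792.0 / 1008.0 - 1.0) * 100.0);  // ~ +78%
    printf("Power:     +%.0f%%\n", (575.0 / 450.0 - 1.0) * 100.0);    // ~ +28%
    return 0;
}
```

So the quoted "70%" bandwidth and "25%" power deltas are in the right ballpark, and raw FP32 comes out around +27%.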
 
GB202 in the RTX 5090 uses 16x 2 GB Samsung GDDR7 modules. The GPU entered production in the week of September 17, 2024.
The Nvidia Drive AGX Thor uses a Blackwell GPU with an Arm Neoverse V3AE CPU and Micron memory.


Also, we might learn more about architectural details after tonight's deep dive.

 