Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197
How many games/applications do you think will ship two sets of textures/shaders just to use more AI HW exclusive to a single vendor?
I suppose if a new console entrant shows up, or an existing one wants to switch, this would be a very quick way of bringing about adoption.

Consoles need a way to continue to drive graphical fidelity while keeping the power consumption and costs down.

Barring some revolution in silicon that would allow us to increase brute force computation, this looks like the path forward.
 
Edit: need to verify some of my statements 😅

Edit 2: maybe I'm remembering wrong, but I can't find the article that analyzed Ampere vs Lovelace. So maybe I'm wrong. Of course there isn't much testing done on those new cards so this is all speculation. Here is the original message:

Overall, not a bad lineup and prices, even if the 5090 got a 400€ increase \:

What's a bit disappointing is that it seems that the SMs aren't that much more efficient, if at all.

In the official Nvidia benchmarks without 3x frame gen (Far Cry and A Plague Tale) I see a 30% to 45% improvement from 4090 to 5090, with the 5090 consuming much more power and having more SMs.

Honestly, I wouldn't be surprised if most of the improvement comes from the GDDR7 bandwidth increase. That would be 2 gens without major improvements.
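For a rough sanity check of the per-SM point, here is a back-of-the-envelope sketch. The SM counts (128 vs 170) and memory bandwidth figures (1008 vs 1792 GB/s) are taken from the public spec sheets, not from the post; the 30% and 45% uplifts are the numbers read off Nvidia's charts above. It assumes performance would scale linearly with SM count, which is a simplification.

```python
# Back-of-the-envelope: how much of the 4090 -> 5090 uplift is left once
# the extra SMs are accounted for?
# Assumptions: SM counts and bandwidth are from public spec sheets; linear
# scaling with SM count is a simplification, not a measured result.

SM_4090, SM_5090 = 128, 170
BW_4090, BW_5090 = 1008, 1792  # GB/s


def per_sm_gain(uplift, sm_old=SM_4090, sm_new=SM_5090):
    """Per-SM speedup implied by an overall uplift (0.30 means +30%)."""
    return (1 + uplift) / (sm_new / sm_old)


sm_ratio = SM_5090 / SM_4090  # how many more SMs the 5090 has
bw_ratio = BW_5090 / BW_4090  # how much more memory bandwidth it has

for uplift in (0.30, 0.45):
    print(f"+{uplift:.0%} overall -> {per_sm_gain(uplift):.3f}x per SM")
print(f"SM ratio {sm_ratio:.3f}x, bandwidth ratio {bw_ratio:.3f}x")
```

Under those assumptions the per-SM change lands somewhere between about -2% and +9%, while bandwidth grows by roughly 78%, so the numbers are at least compatible with the bandwidth guess. They don't prove it.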
 
I don’t think it’ll be exclusive but it’ll probably run better.

Really want to see if the new DirectX spec will finally open up some of this technology and make it run across multiple vendors.

I just read the whole thing, and apparently, Neural Shaders are a Blackwell exclusive. Hmm.
 

Since Nvidia replaced the hardware optical flow accelerator with an AI model that runs on Tensor Cores in DLSS 4, wouldn’t this open the door for RTX 2000 and 3000 series GPUs to support the Frame Generation feature? Why is it still exclusive to the RTX 4000 and 5000 series?
Well, we don't know what the new model requires. FP8 support is exclusive to 40/50 series for instance, so if it's using that, then it's reasonable that it would be too slow on 20/30 series to be useful.

That said, we might just never know. Even if it was possible to run on 20/30 series parts, they might be getting old enough that Nvidia just wouldn't bother including support for them, in order to push those people to upgrade. I know they supported Ray Reconstruction for these older parts, but I don't think that's nearly as big a selling point as frame generation is.

But I doubt it's a case of simply needing 'so much tensor power' or whatever. I think tensor core utilization was shown to be something like 15% when using frame generation on Lovelace.
 
Framegen was limited to 40-series because it required Ada’s upgraded optical flow analyzers to get good results (it ran on 30 series cards just not particularly well IIRC). If I had to guess there’s something similar going on here.

Honestly I would be very happy with a framegen that actually doubled fps in GPU limited scenarios. If a 5090 can consistently hit that I might consider getting one (a 25% raw upgrade isn’t especially compelling).

I am assuming the 4x framegen will only hit those numbers in CPU limited games.
I'm curious if the supposedly much faster frame generation this time around (the basic single frame generation) will help make 30fps->60fps viable now. I know this doesn't get much attention from PC gamers, but I really think that could be a huge deal, enabling gamers and developers to really push max graphics while still getting 'good' framerates with playable enough input response (especially with controller games).

Particularly for future consoles, even knowing they won't use Nvidia specifically, just knowing that such a capability is achievable would be pretty great. And not just consoles, but also future Nintendo hardware (post-Switch 2) and portable devices like Steam Deck and whatnot.
 
Seems pretty good overall, but I'm seeing a lot of flickery garbling of lettering, especially it seems when moving horizontally.

There's limited comparison footage to tell whether this is unique to the new DLSS or to the new frame generation or whatever, but it's very noticeable.

EDIT: Alex has commented on this elsewhere saying it's actually an issue of the capture?
 
The issue with 30 to 60 fps has been less about speed and more about latency and artifacting. Reflex 2 could help with the latency if it works with framegen.
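To put a rough number on the latency side, here is a minimal sketch of a simplified interpolation model. It assumes output frames are evenly paced and that the newest real frame is held back until all the in-between frames have been shown; `gen_cost_ms` is a hypothetical parameter for the time spent generating frames. It ignores Reflex, the render queue, and display latency, so treat the output as a lower-bound illustration rather than a measurement.

```python
def added_latency_ms(base_fps, generated_per_real=1, gen_cost_ms=0.0):
    """Extra delay before the newest real frame reaches the screen.

    Simplified model (an assumption, not Nvidia's description): with k
    generated frames per real frame, the real frame waits for k output
    intervals of frame_time / (k + 1) each, plus the generation cost.
    """
    frame_time = 1000.0 / base_fps
    k = generated_per_real
    return frame_time * k / (k + 1) + gen_cost_ms


for fps in (30, 60):
    for k in (1, 3):
        print(f"{fps} fps base, {k + 1}x framegen: "
              f"+{added_latency_ms(fps, k):.1f} ms")
```

In this model a 30 fps base pays about twice the hold-back of a 60 fps base for the same mode, which is one way to see why 30->60 has been the hard case.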
 
Well yeah, I'm just kind of including Reflex 2 as part of 'frame gen' since Nvidia requires it. The overall speed-ups seem like they could maybe make it practical to use?
 
Exactly. FSR FG doesn't work well even in its current form. I'd imagine that going above +1 frame would just add a lot of "runt frames" which would improve only the fps readouts.
I was actually pleasantly surprised with how well FSRFG works, most of their problems are with upscaling imo.
 
RTX Mega Geometry intelligently updates clusters of triangles in batches on the GPU, reducing CPU overhead and increasing performance and image quality in ray traced scenes. RTX Mega Geometry is coming soon to the NVIDIA RTX Branch of Unreal Engine (NvRTX), so developers can use Nanite and fully ray trace every triangle in their projects. For developers using custom engines, RTX Mega Geometry will be available at the end of the month as an SDK to RTX Kit. Sign up to be notified of availability.
Wait, what? The BVH construction / sorting triangles into buckets was still heavily relying on the CPU?... You'd have expected that they had figured that out 5 years ago?

I mean, once you have the BVH updated, then actually rendering any arbitrary size of BVH is a well understood problem, especially once we understood in detail how the traversal "engine" is essentially just a glorified LRU cache that permits spatial queries over its contents to find a much better entry point estimate than the root node (and often making simple occlusion queries "free").
Framegen was limited to 40-series because it required Ada’s upgraded optical flow analyzers to get good results (it ran on 30 series cards just not particularly well IIRC). If I had to guess there’s something similar going on here.
Bingo. Framegen always goes hand in hand with an upgrade of the video decoding ASICs, and does not really depend on the tensor core performance. So this will most likely also scale almost effortlessly to the lower end parts of the 5xxx lineup.

3 frames worth of interpolation without noticeable stuttering will require bicubic rather than bilinear interpolation though, so this most likely means that the optical flow analysis has gotten a 3rd frame (or a previous frame's flow map) worth of input and now yields a derivative? Or alternatively it re-projects twice, which still requires a really accurate flow estimate in order not to get worse than the linear interpolation...
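The idea that a third frame buys you a derivative can be illustrated in one dimension. The sketch below is not the bicubic scheme speculated about above, and not anything Nvidia has described; it just compares a two-frame linear estimate with a three-frame quadratic fit for a point under constant acceleration. The function names and the sample positions are made up for illustration.

```python
def linear(p0, p1, t):
    """Two-frame estimate: constant velocity between p0 (t=0) and p1 (t=1)."""
    return p0 + t * (p1 - p0)


def quadratic(p_prev, p0, p1, t):
    """Three-frame estimate: parabola through t=-1, 0, 1.

    The extra (previous) frame supplies an acceleration term.
    """
    velocity = (p1 - p_prev) / 2.0
    accel = p1 - 2.0 * p0 + p_prev
    return p0 + t * velocity + 0.5 * accel * t * t


# A point accelerating along x(t) = (t + 1)^2, sampled at t = -1, 0, 1.
p_prev, p0, p1 = 0.0, 1.0, 4.0

for t in (0.25, 0.5, 0.75):  # three in-between frames, as in 4x framegen
    truth = (t + 1) ** 2
    print(f"t={t}: true {truth:.4f}, "
          f"linear {linear(p0, p1, t):.4f}, "
          f"quadratic {quadratic(p_prev, p0, p1, t):.4f}")
```

For this constant-acceleration case the quadratic fit reproduces the true path while the linear one drifts, and the drift is what would show up as uneven motion across multiple generated frames. Real motion vectors are noisy, so a higher-order fit can also amplify errors, which is the accuracy concern raised above.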
 
Considering most game engines are not fully doing gpu-driven rendering yet, and maybe that wasn't even possible with DXR, it could be a case where the cpu is finally out of the loop.

In terms of frame gen, they're no longer using that optical flow accelerator. They have an ai transformer that does the optical flow analysis, so it's done on the tensor cores. There is hardware to help with frame pacing, which is maybe more necessary for multi-frame gen? Maybe the transformer just runs better on the new SMs. No idea. Or they've just decided to lock it to 50 series so people will buy them lol.
 
The BVH construction / sorting triangles into buckets was still heavily relying on the CPU?...
Construction and sorting are GPU driven. They are talking about updates though which is a CPU heavy task. Moving that to the GPU should free up the CPU and make RT's CPU impact smaller.
At least this is how I understand what is being claimed.
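To make the build vs update distinction concrete, here is a minimal sketch of a BVH "refit": the tree topology stays as it is and only the bounding boxes are recomputed bottom-up after geometry moves. The `Node` layout and function names are hypothetical, and this says nothing about how DXR or RTX Mega Geometry actually implement updates.

```python
from dataclasses import dataclass


@dataclass
class Node:
    left: int = -1   # child node index, -1 for a leaf
    right: int = -1
    tri: int = -1    # triangle index for a leaf, -1 for an inner node
    lo: tuple = (0.0, 0.0, 0.0)
    hi: tuple = (0.0, 0.0, 0.0)


def tri_bounds(tri):
    """Axis-aligned box around one triangle (three xyz vertices)."""
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def refit(nodes, triangles):
    """Recompute boxes bottom-up without rebuilding the tree.

    Assumes children are stored after their parents, so walking the
    array in reverse visits every child before its parent.
    """
    for node in reversed(nodes):
        if node.tri >= 0:
            node.lo, node.hi = tri_bounds(triangles[node.tri])
        else:
            a, b = nodes[node.left], nodes[node.right]
            node.lo = tuple(min(x, y) for x, y in zip(a.lo, b.lo))
            node.hi = tuple(max(x, y) for x, y in zip(a.hi, b.hi))


# Root with two leaves, one triangle each.
nodes = [Node(left=1, right=2), Node(tri=0), Node(tri=1)]
triangles = [
    [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    [(2, 0, 0), (3, 0, 0), (2, 1, 0)],
]
refit(nodes, triangles)

# Animate the second triangle, then refit instead of rebuilding.
triangles[1] = [(2, 0, 5), (3, 0, 5), (2, 1, 5)]
refit(nodes, triangles)
print(nodes[0].lo, nodes[0].hi)
```

A refit like this is cheap per node, but it has to happen every time geometry animates, which is why where it runs (CPU or GPU) matters for scenes with lots of moving or streaming triangles.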

They specifically state that they've moved from the h/w OFA unit to AI-based optical flow processing, which presumably means that it's done on tensor h/w now.
As for why it's Blackwell exclusive, it is anyone's guess. Maybe the h/w flip control is actually a requirement for good framepacing here. Maybe the rebuilt SM with tighter integration of tensor h/w with shading units (whatever the hell that even means) is needed to be able to produce more than 1 generated frame quickly enough. Maybe they are just limiting the feature to the new series artificially.
 