Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197

Blackwell is going to be the biggest jump in Nvidia history and has a denoising accelerator, according to this rumor from RedGamingTech. Also new SM architecture.

It seems like RGT does not know that tensor cores are not used for denoising at all, contrary to what Nvidia had planned, which makes the rumor more credible in my eyes.

Nvidia probably had no success in moving denoising to an AI neural network running on the tensor cores, which is why they are now working on a fixed function denoiser accelerator. Makes sense and is indeed very interesting!
 

Blackwell is going to be the biggest jump in Nvidia history and has a denoising accelerator, according to this rumor from RedGamingTech. Also new SM architecture.

It seems like RGT does not know that tensor cores are not used for denoising at all, contrary to what Nvidia had planned, which makes the rumor more credible in my eyes.

Nvidia probably had no success in moving denoising to an AI neural network running on the tensor cores, which is why they are now working on a fixed function denoiser accelerator. Makes sense and is indeed very interesting!
The RGT guy they are quoting is a fraud; he doesn't know anything.
 
Nvidia probably had no success in moving denoising to an AI neural network running on the tensor cores
Nvidia is using an AI denoiser in OptiX. It's slow, but suitable for offline renderers that cannot use the spatio-temporal denoiser.
I also don't know what 'Denoising accelerator part of the RT pipeline' means, since the denoiser works in screen space, after the RT passes.
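For anyone unsure what "spatio-temporal" and "screen space" mean here, a minimal sketch may help. It only illustrates the general idea (blend the new noisy frame into a history buffer, then filter each pixel with its on-screen neighbours) and is not Nvidia's denoiser. The function names, the `alpha` blend factor and the plain box filter are assumptions made for illustration; real denoisers also reproject the history with motion vectors and use depth/normal-aware edge-stopping filters, which are left out.

```python
# Toy spatio-temporal denoiser on a grayscale image stored as a list of rows.
# All names and parameters are hypothetical; this is an illustration only.

def temporal_accumulate(history, current, alpha=0.2):
    """Blend the new noisy frame into the history (exponential moving average)."""
    return [
        [(1.0 - alpha) * h + alpha * c for h, c in zip(h_row, c_row)]
        for h_row, c_row in zip(history, current)
    ]


def spatial_box_filter(image, radius=1):
    """Average each pixel with its screen-space neighbours, clamped at the borders."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            out[y][x] = total / count
    return out


def denoise(history, current, alpha=0.2, radius=1):
    """Return (new history buffer, filtered image to display)."""
    accumulated = temporal_accumulate(history, current, alpha)
    return accumulated, spatial_box_filter(accumulated, radius)
```

Both steps only touch per-pixel buffers, which is why this kind of denoiser is usually described as a screen-space pass rather than part of ray traversal.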
 
I just hope Blackwell brings more RT offloading. Right now we see a big CPU load with RT, not only GPU load, so it would help if they could offload more and more to the GPU... But can DXR even support more offloading? With BVH structure setup done on the GPU, etc...
 
Just as Ada and RDNA3 got released and refuted everything these Streamers have peddled for years, these hacks, charlatans, and anti-journalists are already onto their next fake news spree. Unbelievable.
 
Layman question time:

If rasterization as we know it today will be of less importance in the future, isn't a logical way forward to let performance growth there stagnate? The 4090 is already following that logic somewhat by improving RT performance more than raster.

In such a scenario - would it make sense for Nvidia (well, all GPU manufacturers) to simply grow RT on a chiplet (if possible) and continuously move whatever parts of the rasterization hardware we still require over to the tensor cores? Shrinking raster hardware and growing more advanced RT cores generation by generation until we have the reverse of today - an RT-focused chip with ancillary raster functions.

I just don't know if RT is independent enough of the rest of the pipeline yet, or can be made so, in order to make a chiplet approach feasible there.

But I like to speculate.
 
This is the guy who spent months claiming PS5's GPU had Infinity Cache.

But he's also the same guy who knew about Infinity Cache before it was a thing. He has some good sources, but I guess, like a lot of YouTubers, maybe he has 1 legit source for every 9 bad ones...
 
Layman question time:

If rasterization as we know it today will be of less importance in the future, isn't a logical way forward to let performance growth there stagnate? The 4090 is already following that logic somewhat by improving RT performance more than raster.

In such a scenario - would it make sense for Nvidia (well, all GPU manufacturers) to simply grow RT on a chiplet (if possible) and continuously move whatever parts of the rasterization hardware we still require over to the tensor cores? Shrinking raster hardware and growing more advanced RT cores generation by generation until we have the reverse of today - an RT-focused chip with ancillary raster functions.

I just don't know if RT is independent enough of the rest of the pipeline yet, or can be made so, in order to make a chiplet approach feasible there.

But I like to speculate.

Should probably clarify what you mean by raster. In most modern games the heaviest users of raster hardware are the GBuffer and depth passes, shadow mapping, compositing and maybe post-processing pixel shaders. Shadow maps are on the way out. I would be very surprised if any game is still using shadow maps or SSAO in the next console generation. Post-processing has moved on to compute shaders but GBuffers are still very useful for computing primary visibility even in heavy RT games.

While the relative importance of raster will slowly diminish I think it’s too early to slow down raster development. It would be cool to see hardware acceleration for pixel sized geometry and continuous LOD ala Nanite for example. I have low expectations though. There has been almost zero buzz around mesh shaders and no signs of rapid adoption. Maybe everyone is either going to use UE5 or doesn’t want to deal with building their own LOD system.

Btw an RT only chiplet doesn’t make sense because while RT doesn’t use rasterization hardware it heavily uses the same shaders and TMUs that rasterization does. Both RT and rasterization need to stay close to the general compute cores.
 
Shadow maps are on the way out.
Virtual shadow maps in Unreal Engine 5 seem to be pretty crucial as I understand it.
While the relative importance of raster will slowly diminish I think it’s too early to slow down raster development. It would be cool to see hardware acceleration for pixel sized geometry and continuous LOD ala Nanite for example. I have low expectations though.
It would be fun to program an FPGA with the Nanite algorithm to see what happens.

There has been almost zero buzz around mesh shaders and no signs of rapid adoption.
I wonder if it's 5 to 10 years too late. We had a horrible diversion through GS then HS/TS/DS. On the other hand it could be argued that GPUs didn't have the compute density to make it work.

I've not seen a synthetic result with Mesh Shader that comes even vaguely close to what can be done with compute shaders, so it seems to be dead. I'm kinda bemused that AMD put extra effort into making it "work well" in RDNA 4 (in theory, anyway). But the design for that was completed two to three or so years ago, pre-dating Nanite to a degree.
 
Virtual shadow maps in Unreal Engine 5 seem to be pretty crucial as I understand it.

They are, but VSMs suffer from the same limitations as all shadow mapping techniques and are restricted in the number of supported lights. Given what we’re seeing from Portal RTX today, those limits will hopefully look very dated 7-8 years from now when next-gen exclusives start dropping.

I wonder if it's 5 to 10 years too late. We had a horrible diversion through GS then HS/TS/DS. On the other hand it could be argued that GPUs didn't have the compute density to make it work.

I've not seen a synthetic result with Mesh Shader that comes even vaguely close to what can be done with compute shaders, so it seems to be dead. I'm kinda bemused that AMD put extra effort into making it "work well" in RDNA 4 (in theory, anyway). But the design for that was completed two to three or so years ago, pre-dating Nanite to a degree.

Yeah mesh shaders do seem quite dead right out of the gate.

It’s hard to guess where Nvidia will go next with Blackwell. Frame generation took people by surprise. In hindsight SER was an “obvious” thing to do. Raw ray box/triangle intersection rates will continue to scale up. The next big problem seems to be BVH construction speed and flexibility so maybe they’ll do something about that.
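For context on what a "ray box intersection" actually computes during BVH traversal, here is a minimal sketch of the classic slab test against an axis-aligned bounding box. This is the textbook software version, not how any vendor's RT core is implemented; the function name and argument layout are assumptions for illustration.

```python
import math

# Slab test: does a ray (origin + t * direction, t >= 0) hit an axis-aligned box?
# Textbook version for illustration; names are hypothetical.

def ray_hits_aabb(origin, direction, box_min, box_max):
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t in which the ray is inside every slab so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

A traversal runs a test like this at every BVH node a ray visits, so raw box-test rates only help if the tree itself can be built or refit quickly enough each frame.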
 
Should probably clarify what you mean by raster. In most modern games the heaviest users of raster hardware are the GBuffer and depth passes, shadow mapping, compositing and maybe post-processing pixel shaders. Shadow maps are on the way out. I would be very surprised if any game is still using shadow maps or SSAO in the next console generation. Post-processing has moved on to compute shaders but GBuffers are still very useful for computing primary visibility even in heavy RT games.

[...]

Btw an RT only chiplet doesn’t make sense because while RT doesn’t use rasterization hardware it heavily uses the same shaders and TMUs that rasterization does. Both RT and rasterization need to stay close to the general compute cores.
At some point I just assume a sort of "minimum viable raster" level of hardware has to be achieved for compatibility or specific functionality in order to free up transistors and development for all things RT. I don't see building enormous monolithic cores, growing to expand both raster and RT functionality/performance, as viable forever. Unless cards in the future are going to end up costing as much as a second hand car (which some already do, come to think of it...)

It's what that transition looks like that I'm interested in. What if the chiplets with RT hardware were expanded with what they will need in the future? Such as general compute hardware? Letting the two increasingly live separate lives? Though in the short term that might mean lots of wasted idle silicon depending on workload...
 
At some point I just assume a sort of "minimum viable raster" level of hardware has to be achieved for compatibility or specific functionality in order to free up transistors and development for all things RT. I don't see building enormous monolithic cores, growing to expand both raster and RT functionality/performance, as viable forever. Unless cards in the future are going to end up costing as much as a second hand car (which some already do, come to think of it...)

It's what that transition looks like that I'm interested in. What if the chiplets with RT hardware were expanded with what they will need in the future? Such as general compute hardware? Letting the two increasingly live separate lives? Though in the short term that might mean lots of wasted idle silicon depending on workload...

It doesn’t need to be that complicated. The IHVs can simply increase the number of SMs/CUs faster than the number of GPCs/Shader Engines. It doesn’t make sense to have compute units that are only accessible when doing RT.
 
I would not be so categorical. The advantage of mesh shaders is not faster setup rates in synthetic benchmarks, but rather the lack of a round trip through memory, which can result in faster rates in memory-bound workloads such as the G-buffer fill in real games.

I’m going to quote Timur’s blog here. Mesh shaders have potential to do really good things but it requires work and talent. After seeing what transpired with DX12 I’m not hopeful. There are no signs of adoption at all from any of the major engines.

Yes, mesh and task shaders do give you a lot of opportunities to implement things just the way you like them without the stupid hardware getting in your way, but as with any low-level tools, this also means that you get a lot of possibilities for shooting yourself in the foot.

The traditional vertex processing pipeline has been around for so long that on most hardware it’s extremely well optimized because the drivers do a lot of optimization work for you. Therefore, just because an app uses mesh shaders, doesn’t automatically mean that it’s going to be faster or better in any way. It’s only worth it if you are willing to do it well.
 