Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197
Not that wild. Nvidia generally doesn't release any performance numbers or even drivers prior to the announcement.
Yeah, but the last two releases had benches leaked months in advance. Maybe they made adjustments to how they handle the rollout to partners and OEMs to keep a lid on things.
 
Perhaps new hardware blocks for "Neural rendering"?
No, that's just the tensor cores doing their usual business. If NVidia decides to make that Blackwell-only, then it's a pure software restriction. I expect only a minor reduction in the overhead of switching between tensors, which is effectively equivalent to being able to switch between multiple classic textures. So possibly some memory-management detail to permit efficient arrays of tensors, for a use case similar to array textures. The hidden (firmware-internal) details usually deal with the prefetch logic necessary to have the right amount of data in L1 at the right point in time.
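Roughly what I have in mind, as a toy sketch only: the blob size, struct and kernel names below are all made up, and nothing here reflects how the driver or firmware actually lays this out.

#include <cstdint>

// Assumed layout: one fixed-size weight blob per material, stored contiguously,
// so "binding" a different neural texture is just a different base offset,
// analogous to selecting a layer of an array texture.
constexpr int WEIGHTS_PER_MATERIAL = 256;   // made-up blob size, in floats

struct NeuralTextureArray {
    const float* blobs;          // num_materials * WEIGHTS_PER_MATERIAL floats
    int          num_materials;

    __device__ const float* layer(int material_id) const {
        // Switching tensors == changing a base offset, the same way an array
        // texture only changes a layer index. The hard (firmware) part is
        // prefetching the right blob into L1 before it is needed.
        return blobs + material_id * WEIGHTS_PER_MATERIAL;
    }
};

__global__ void shade(NeuralTextureArray tex,
                      const uint16_t* material_id_per_pixel,
                      float* out, int num_pixels)
{
    int pix = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix >= num_pixels) return;

    const float* w = tex.layer(material_id_per_pixel[pix]);
    // ... run the per-pixel decoder with w here; placeholder output ...
    out[pix] = w[0];
}

The indexing itself is trivial; the whole question is whether the right blob is resident on-chip when the shader gets there.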
All videocards are capable of "neural rendering" and of course there will be "advanced DLSS" and "enhanced RT" on future products.
It's slightly more complicated than that. "Neural rendering" with a single texture from a deferred-texturing-style compute shader is really simple and brutally efficient, since you don't even need to reload the expensive tensor that has the texture encoded, only the short input vector from the G-buffer. But doing that for a setup with many textures needs some reasonably smart deferred batching by texture. You are effectively trading mip-chains and their very small L1 footprint (much waste in VRAM, very little in L1) for something that isn't even remotely as L1-cache-friendly (but much friendlier on VRAM).

I suspect there's going to be a little trick for getting it fast enough even without aggressive batching.
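To illustrate why the single-texture case is so cheap, a minimal sketch, again with made-up sizes and names (a tiny two-layer decoder whose weights fit in shared memory; the real decoders and their layouts aren't public):

// Toy sizes, all assumed: 8 G-buffer features in, 16-wide hidden layer, 3 channels out.
#define IN_DIM  8
#define HID_DIM 16
#define OUT_DIM 3

struct DecoderWeights {                 // the "expensive tensor" for one texture
    float w0[HID_DIM][IN_DIM], b0[HID_DIM];
    float w1[OUT_DIM][HID_DIM], b1[OUT_DIM];
};

__global__ void decode_single_texture(const DecoderWeights* g_w,
                                      const float* g_buffer,    // IN_DIM floats per pixel
                                      float* out_color,         // OUT_DIM floats per pixel
                                      int num_pixels)
{
    // Stage the full weight set in shared memory once per block; after this
    // the weights never leave on-chip memory for the lifetime of the block.
    __shared__ DecoderWeights w;
    const int n_floats = sizeof(DecoderWeights) / sizeof(float);
    for (int i = threadIdx.x; i < n_floats; i += blockDim.x)
        reinterpret_cast<float*>(&w)[i] = reinterpret_cast<const float*>(g_w)[i];
    __syncthreads();

    int pix = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix >= num_pixels) return;

    // Each pixel only fetches its short G-buffer vector from memory.
    float in[IN_DIM], hid[HID_DIM];
    for (int i = 0; i < IN_DIM; ++i) in[i] = g_buffer[pix * IN_DIM + i];

    for (int h = 0; h < HID_DIM; ++h) {           // layer 0 + ReLU
        float acc = w.b0[h];
        for (int i = 0; i < IN_DIM; ++i) acc += w.w0[h][i] * in[i];
        hid[h] = fmaxf(acc, 0.0f);
    }
    for (int o = 0; o < OUT_DIM; ++o) {           // layer 1
        float acc = w.b1[o];
        for (int h = 0; h < HID_DIM; ++h) acc += w.w1[o][h] * hid[h];
        out_color[pix * OUT_DIM + o] = acc;
    }
}

The many-texture setup breaks exactly this: once neighbouring pixels reference different materials, the weight block has to be re-staged whenever the material changes, which is where the deferred batching by texture would come in.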
No no, something is coming ... NVIDIA just released a new SDK for "In-Game Inference".
Red herring. Unless I'm mistaken, the motivation for that API doesn't appear to be rendering at all, but rather to enable efficient scheduling of generic inferencing tasks related to the game logic. Not so sure if that will even start to have any relevance for the next 2-3 years, especially considering that simple AI-related inference doesn't need to be offloaded to the GPU in the first place. Plus hiding the details behind an opaque, proprietary interface sounds like a stillbirth which will at most hinder integration of costlier AI features such as dynamic voice and dialogue synthesis, which is what we are probably going to see more tech demos for. Then again, those options are so VRAM-hungry they don't fit the lower half of the product range either. (And if they do, they are mostly still cheap enough to run somewhere on the CPU anyway.)
 
Do you think we will get anything new or just more generated frames/higher quality upscaling?
 
Yeah funny how it’s impossible to find a simple queue system at US retailers. It’s every man or woman for themselves. So uncivilized.

I had my heart set on a 5090 but $4000 would definitely make me think twice.
During the shortages AMD.com had a semi-functional queue system when they had their drops once a week on Thursday. That's the only one I saw; EVGA also had a system where you joined a waitlist.
 
Not that wild. Nvidia generally doesn't release any performance numbers or even drivers prior to the announcement.
I remember having very accurate information on Ampere and Ada prior to the announcement. Even more information from the AMD side. It's like crickets this time around; feels like enthusiasm is rather low.
 

Keep in mind that with Ada there was the Nvidia hack. As such there was quite a bit of leaked data, which is why you had articles like this months out: https://semianalysis.com/2022/04/16/nvidia-ada-lovelace-leaked-specifications/

It was also well known that Ada would not be a significant departure architecturally from Ampere. What was more in flux rumour-wise was the pricing, clock speeds, power and of course the specific SKU configurations.

In terms of Ampere, the notable SM change to 2x FP32 was not known and caused some confusion in the rumour mill until very near release, from what I remember.

With Blackwell it's somewhat similar. Just based on the CUDA versioning changes, it's largely assumed that it will have significant architectural changes. This means guesstimating numbers is likely going to be very tricky.
 