Speculation and Rumors: Nvidia Blackwell ...

For those interested, here are the graphics settings TPU used for those benches: Click Me. Cliff's notes: no DLSS super-resolution, no dynamic resolution scaling, all advanced settings to high but the "basic" settings were off (motion blur, chromatic aberration, film grain, DoF, and lens flare.) They enabled DLAA but kept native resolution, which is an interesting decision?

The bump in memory consumption from just FrameGen is a curious finding, to be sure. We already know FG reprojects based on a lot more data than just the framebuffer and also keeps a few prior frames of this data in buffer. It looks like this is more data than we might have anticipated, but a lot of that data is also scene dependent. I wish TPU had used FrameGen at other resolutions "just to see"...

Edit: Hey, tell ya what -- I'll give this a shot tonight on my home rig (AMD 5950X PBO'd out, 64G 3800 1:1 CL14 ram, 4090OC.) I can emulate most of those findings, although I don't have a 4K monitor. I'll use the NV driver settings to allow supersampling (via the simplified non-DLAA model) and see if I can drum up some VRAM figures like the above. Dunno if it will be precisely accurate, but it wouldn't stop me from trying.

Edit #2: That TPU article is from the press review copy of CP2077, dated September 2023. No way to know for certain, but I wonder if newer drivers and/or updated game patches may have reduced VRAM consumption?
 
What's the best way to tell how much VRAM is in use?

Just remembered I do have a 4K monitor but I doubt a 12GB 4070 will be sufficient to test at 4K.
 
I think if you use a GPU tool like GPUTweak III or Afterburner, it will record the amount used over time. Both tools work with any card.
 
Yeah, anything that uses the RivaTuner Statistics Server can tell you, like MSI Afterburner.
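
If you'd rather log it than watch an overlay, something like this works too. Just a rough sketch that polls nvidia-smi once a second (caveat: it reports total VRAM in use on the GPU, not just the game's, so close other 3D apps first):

```python
# Rough VRAM logger -- assumes nvidia-smi is on your PATH (it ships with the driver).
import subprocess
import time

def vram_used_mb() -> int:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.strip().splitlines()[0])   # first GPU only

peak = 0
while True:
    used = vram_used_mb()
    peak = max(peak, used)
    print(f"now: {used} MB   peak so far: {peak} MB")
    time.sleep(1)
```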
 
I'm testing and I've discovered a bug in Hellblade 2. Every time I pause and unpause the game, VRAM use goes up by ~50MB. I did this until the entire 12GB filled up and the game became a slideshow. Lol.

I'm editing this in realtime. The bug only occurs with framegen on. Disabling framegen makes it go back down, but if you turn framegen back on, VRAM goes back up to the bugged level until you restart the game. If this happens in other games, it explains why I had VRAM problems in Hogwarts Legacy and Witcher 3 with framegen turned on. I'm gonna go test those games.
 
Despite the bug, Hellblade 2 is the only game I can get reproducible numbers from. Witcher 3 and Hogwarts are all over the place no matter what I do. But they don't seem to suffer from the pause/unpause bug.

In Hellblade 2 at 1080p DLAA everything max, framegen uses 250-300MB. Only way to get good numbers for framegen on is to turn it on and then restart the game. Otherwise like I said it just balloons by 50MB every time you pause and unpause.

Some numbers I got were:

6975MB FG on
6739MB FG off

7100MB FG on
6800MB FG off

6660MB FG on
6358MB FG off

All the tests showed a difference of ~250-300MB. So at 8K, which has 16x the pixels of 1080p, that would be between 4 and 4.8GB. Maybe it varies based on the game. I will say this kind of testing is not very repeatable in terms of the absolute amount of memory being used. It would change every time I restarted the game, but the difference between FG on vs FG off was fairly consistent.
 
The bump in memory consumption from just FrameGen is a curious finding, to be sure. We already know FG reprojects based on a lot more data than just the framebuffer and also keeps a few prior frames of this data in buffer. It looks like this is more data than we might have anticipated, but a lot of that data is also scene dependent. I wish TPU had used FrameGen at other resolutions "just to see"...

They don't really keep several frames; it's all just one averaged frame from the previous history plus a per-pixel accumulation factor, which is like "this pixel is 8 frames accumulated, this one is 2, this one is 9." That's a pretty low storage cost, plus the current frame's inputs.
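
To put a rough number on that storage-cost argument, here's a toy sketch -- emphatically not how DLSS actually does it, just the general "one history buffer plus a per-pixel count" idea:

```python
# Toy illustration (NOT DLSS's actual implementation) of why temporal
# accumulation is cheap to store: one running-average buffer plus one
# per-pixel count, no matter how many frames have been folded in.
import numpy as np

H, W = 1080, 1920
history = np.zeros((H, W, 3), dtype=np.float16)   # running average color
count   = np.zeros((H, W),    dtype=np.uint8)     # "this pixel is N frames accumulated"

def accumulate(new_frame, max_count=16):
    """Fold one new frame into the running average."""
    global history, count
    count = np.minimum(count + 1, max_count)
    alpha = (1.0 / count)[..., None]               # per-pixel blend weight
    history = ((1.0 - alpha) * history + alpha * new_frame).astype(np.float16)

# Feeding in 100 frames still only costs the two buffers above (~15 MB at 1080p),
# versus ~1.2 GB if you literally kept 100 fp16 RGB frames around.
for _ in range(100):
    accumulate(np.random.rand(H, W, 3).astype(np.float16))
```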

Regardless, I could see an argument for 20GB, or even 24 maybe? Does Frontiers of Pandora use more than 16GB? If the BVH for Unobtanium is more detailed, it could easily take up a few more gigs. Native 8K could also get pretty expensive without upscaling, but even at the highest end, 4K still seems to be what the vast majority of users run. On one hand that amount of RAM probably isn't very useful; on the other, this would target the highest-end consumers, so what do they care about paying a bit more?
 
All the tests showed a difference of ~250-300MB. So at 8K, which has 16x the pixels of 1080p, that would be between 4 and 4.8GB. Maybe it varies based on the game. I will say this kind of testing is not very repeatable in terms of the absolute amount of memory being used. It would change every time I restarted the game, but the difference between FG on vs FG off was fairly consistent.
Eh, I'd challenge this on the face of it. There's no way to actually know if there's a perfectly linear growth in VRAM consumption based only on output resolution. You would need to test with multiple resolutions to understand any potential correlation...
 
Eh, I'd challenge this on the face of it. There's no way to actually know if there's a perfectly linear growth in VRAM consumption based only on output resolution. You would need to test with multiple resolutions to understand any potential correlation...
More results. Max settings with DLAA. My methodology is to make sure framegen is enabled and set the desired resolution, then restart the game. Press Continue and let it load my scene, then wait ~30secs for VRAM to stabilize (it goes down for the first 30secs then levels off). Record the VRAM, then go to options, disable framegen, wait a few seconds (it doesn't really matter, once the VRAM settles initially it stays settled) and record the VRAM. For new test runs I have to turn framegen back on and restart the game to avoid the bug.

4K (3840x2160 DSR scaled to 1080p, desktop resolution set to 4K with DSR):
Run 1:
10784MB FG On
10124MB FG Off

Run 2:
10902MB FG On
10154MB FG Off

Edit: I did more tests. 1080p tested with desktop resolution set to 1080p, 720p tested with desktop resolution set to 720p. "Fullscreen" in this game doesn't mean what it used to mean, so the desktop resolutions will affect the results.

1080p (with 1080p desktop resolution):
Run 1:
7344MB FG On
7121MB FG Off

Run 2:
7290MB FG On
7067MB FG Off

720p (with 720p desktop resolution):
Run 1:
6699MB FG On
6583MB FG Off

Run 2:
6655MB FG On
6524MB FG Off

On both 1080p runs I got exactly 223MB higher when using framegen. At 720p it was 116MB and 131MB. I can't test higher resolutions, but at least from 720p to 1080p it scales roughly linearly with pixel count. If my 4K results are valid, 4K scales a bit better than linearly, with 660MB and 748MB used where a straight 4x of the 1080p figure would predict ~890MB.
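
For anyone who wants to poke at the scaling themselves, this is the back-of-envelope I'm using -- purely my own numbers from above plus the assumption that the FG overhead tracks output pixel count:

```python
# Back-of-envelope from my Hellblade 2 numbers above (DLAA, max settings).
# Assumption: the FG overhead scales with output pixel count.
fg_overhead_mb = {                 # (width, height): FG-on minus FG-off delta, averaged
    (1280, 720):  (116 + 131) / 2,
    (1920, 1080): 223,
    (3840, 2160): (660 + 748) / 2,
}

for (w, h), mb in fg_overhead_mb.items():
    mpix = w * h / 1e6
    print(f"{w}x{h}: {mb:.0f} MB -> {mb / mpix:.0f} MB per Mpixel")

# Naive 16x extrapolation from 1080p to 8K (7680x4320 has 16x the pixels):
print(f"8K estimate: ~{223 * 16} MB")   # ~3.6GB, though the 4K runs suggest the
                                        # real number would come in a bit lower
```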
 
Maybe it varies based on the game.
There may be some variation of course but generally it's as I've said - around +1GB in 4K.
No idea how results from TPU are measured but I think I have CP2077 installed so can check...

Edit: So a scene inside NC at the start of Nocturne mission, native 4K, everything maxed (and as such unplayable on a 4090 mind you):
RT Ultra: 12.4GB -> 13.8GB (42 -> 72 fps)
PT: 15GB -> 16.2GB (24 -> 46 fps)
So a tad more than 1GB in CP2077's case, but nowhere close to the results from TPU.
It is also worth remembering that this is a completely unrestricted scenario, with the game just loading as much as it can into VRAM. It doesn't mean that it actually requires that much to work without issues.
 
They don't really keep several frames; it's all just one averaged frame from the previous history plus a per-pixel accumulation factor, which is like "this pixel is 8 frames accumulated, this one is 2, this one is 9." That's a pretty low storage cost, plus the current frame's inputs.

Good point.

Regardless, I could see an argument for 20GB, or even 24 maybe? Does Frontiers of Pandora use more than 16GB? If the BVH for Unobtanium is more detailed, it could easily take up a few more gigs. Native 8K could also get pretty expensive without upscaling, but even at the highest end, 4K still seems to be what the vast majority of users run. On one hand that amount of RAM probably isn't very useful; on the other, this would target the highest-end consumers, so what do they care about paying a bit more?

Nvidia is likely more concerned with protecting its high margin AI sales than with peddling 8K to consumers. They aren’t going to put more memory on GeForce than is absolutely necessary.
 
There may be some variation of course but generally it's as I've said - around +1GB in 4K.
No idea how results from TPU are measured but I think I have CP2077 installed so can check...

Edit: So a scene inside NC at the start of Nocturne mission, native 4K, everything maxed (and as such unplayable on a 4090 mind you):
RT Ultra: 12.4GB -> 13.8GB (42 -> 72 fps)
PT: 15GB -> 16.2GB (24 -> 46 fps)
So a tad more than 1GB in CP2077's case, but nowhere close to the results from TPU.
It is also worth remembering that this is a completely unrestricted scenario, with the game just loading as much as it can into VRAM. It doesn't mean that it actually requires that much to work without issues.
Yep, I was actually getting <1GB for framegen at 4K in Hellblade 2. I've edited my post to be less rambling.
 
Sorry, I got caught up last night in some family business. Even at 4x DSR I still can't quite hit 8K resolutions, so I ran eight total scenarios in the CP2077 in-game benchmark for our data-seeking pleasures:

Raster Resolution  | DLSS Framegen Enabled? | DLSS Quality Upscaling Enabled? | Benchmark Avg FPS | Max VRAM (MB) | Notes
6880x2880 (DSR x 4) | No  | No  | 10.32 | 24,576+ | VRAM exceeded
6880x2880           | Yes | No  | 3.98  | 24,576+ | VRAM exceeded, worse framerate
6880x2880           | No  | Yes | 22.86 | 16,907  |
6880x2880           | Yes | Yes | 29.34 | 19,685  | +2,778MB for 28% more FPS??
4865x2036 (DSR x 2) | No  | No  | 19.78 | 16,753  |
4865x2036           | Yes | No  | 35.31 | 18,414  | +1,661MB for 79% more FPS
4865x2036           | No  | Yes | 38.82 | 14,038  |
4865x2036           | Yes | Yes | 64.38 | 15,585  | +1,547MB for 66% more FPS

I made my graphics settings match those of TPU above, and as you can see, 8K resolution isn't reachable at native rez with "only" 24GB of RAM. If you want to run an 8K screen at extreme presets, you'll need to enable DLSS to save memory. And at that point, DLSS + FG is still within the limits of a 24GB card -- albeit at around 30FPS on a 4090 which steadily holds 2850MHz during the benchmark (remarkably above what a "stock" card would do.)

The pixel count for the 6880 rez is 19,814,400
The pixel count for the 4865 rez is 9,905,140

With DLSS Quality enabled, and with 100% more pixels (19.8M vs 9.9M), the FG memory requirement only went up by 79.6%. It's not quite linear in consumption, and I'm going to go out on a limb and say anyone running 8K with framegen probably isn't going to do it at pure native screen rez. I think 8K is an easily-achieved target for Blackwell, so long as it's with DLSS upscaling enabled.
 
That's a bit of a moot achievement though as Nvidia was marketing such "8K" back at Ampere launch IIRC. DLSS UP mode was made specifically for that.
Of course they did, and with the asterisks pointing to the fine print about what kinds of lame settings you'd need to achieve 8K at whatever they determined was "playable FPS." Truly and honestly, the ~30FPS result at the 6880 resolution with my 4090 running at absolute full tilt was probably playable enough, so long as you aren't in a twitch shooter game. Clearly, 8K at 60FPS or higher is still far from guaranteed with any fully pathtraced title using current hardware, so it gives Blackwell something to reach for.

24GB of VRAM seems the lowest quantity they could reasonably allocate for this performance target.
 
Hey, so an addendum to my prior benchmarks post, but something I didn't want to edit in as I felt it was worth pointing out...

FHD (1920 x 1080) is 2,073,600 pixels
4K (3840 x 2160) is 8,294,400 pixels
My WQHD screen at DSR x4 (6880x2880) is 19,814,400 pixels
8K (7680 x 4320) is 33,177,600 pixels

Looking at CP2077 VRAM scaling above in pure raster (no DLSS/FG) we would be somewhere north of 32GB of VRAM for a native 8K resolution; 24GB would not cut it at all. If we think Blackwell is really going for 8K, then a 32GB VRAM pool is probably on the docket as well. So I'm changing my vote: the 5090-equivalent tier is gonna be at least 28GB and perhaps even 32GB.
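
For transparency, here's the rough math behind that guess. It assumes total VRAM use grows about linearly with pixel count, and the 6880x2880 pure-raster run only gives a lower bound since it blew past 24GB:

```python
# Rough math behind the "north of 32GB" guess. Assumption: total VRAM use grows
# roughly linearly with pixel count. The 6880x2880 pure-raster run exceeded 24GB,
# so it only pins down a lower bound on the slope.
mpix_4865 = 4865 * 2036 / 1e6    # ~9.9 Mpixels, measured 16,753 MB
mpix_6880 = 6880 * 2880 / 1e6    # ~19.8 Mpixels, needed more than 24,576 MB
mpix_8k   = 7680 * 4320 / 1e6    # ~33.2 Mpixels

slope = (24576 - 16753) / (mpix_6880 - mpix_4865)        # MB per Mpixel, at minimum
estimate_8k_mb = 16753 + slope * (mpix_8k - mpix_4865)

print(f"slope >= {slope:.0f} MB per Mpixel")
print(f"native 8K >= {estimate_8k_mb / 1024:.1f} GB")    # comes out around 34 GB
```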
 
Foxconn has won exclusivity on manufacturing NVIDIA's new NVLink switches for its next-gen GB200 AI servers, a key component of GB200, which is known as the "magic weapon for improving computing power".
...
Foxconn has never commented on its orders or customer dynamics, but the industry has highlighted that the exclusive technology behind NVIDIA's new NVLink consists of two parts. First is the bridge technology, which connects the CPU and the AI chips together, while the other is the switch technology, which is the key to the interconnection between the GPUs.
...
Foxconn has been a leading OEM for network communication equipment, so NVIDIA "naturally handed" over the relevant orders to the company. Supply chain sources told UDN that a single GB200 AI server cabinet requires 7 NVLink switches, meaning that the order volume of NVLink switches is 7x the number of GB200 AI server cabinets that Foxconn receives.

The gross margin of the NVLink switches is reportedly "much higher" than that of the server assembly business and will "significantly affect" Foxconn's operations. It's also reported that the world's top 7 switch manufacturers -- Dell, HP, Cisco, Nokia, Ericsson, and more -- are all customers of Foxconn, meaning that Foxconn's global switch market share is over 75%, making them the leader in network switch manufacturing.
 
I'd love to see a document which describes how an NVLink switch really works, through the entire OSI model (or at least the relevant layers anyway.)
 
Hey, so an addendum to my prior benchmarks post, but something I didn't want to edit in as I felt it was worth pointing out...

FHD (1920 x 1080) is 2,073,600 pixels
4K (3840 x 2160) is 8,294,400 pixels
My WQHD screen at DSR x4 (6880x2880) is 19,814,400 pixels
8K (7680 x 4320) is 33,177,600 pixels

Looking at CP2077 VRAM scaling above in pure raster (no DLSS/FG) we would be somewhere north of 32GB of VRAM for a native 8K resolution; 24GB would not cut it at all. If we think Blackwell is really going for 8K, then a 32GB VRAM pool is probably on the docket as well. So I'm changing my vote: the 5090-equivalent tier is gonna be at least 28GB and perhaps even 32GB.
Could you notice any difference when running at 6880x2880? I know it's DSR but maybe there's less aliasing?
 
Could you notice any difference when running at 6880x2880? I know it's DSR but maybe there's less aliasing?
I didn't pay enough attention, honestly. One of my biggest graphical gripes is specular highlight aliasing, which DSR and DSR-AI (or whatever it's called) never solved to my satisfaction. I think RT reflections kinda "solve" it as an innate function of getting light bounces more accurately computed. As such, I don't recall having any specific complaints about aliasing in CP2077 with pathtracing enabled, DLSS-Quality, and DLSS-FG.
 