Digital Foundry Article Technical Discussion [2025]

A setting like this could be the solution to game optimisation, or at least let ANYONE play a modern game in toaster mode (or "everybody's mode"). This is a cool mod for Kingdom Come Deliverance 2 if you don't mind it looking like a half-finished game.


[Screenshots from the mod page]

This mod sets the graphics extremely low, allowing you to maximize the game's performance at the cost of graphical fidelity. This lets you run the game at acceptable frame rates on systems with much lower specifications than the game's minimum requirements.

Resolution is brute-forced to HD (1280x720) and VSync is off. If you want to change the resolution, you can do so at the top of the mod file, using Notepad. By changing various values in the mod file (or simply deleting individual lines of the file) you can easily bring the look of the game back up, closer to the normal "Low" settings. If you want the game to retain its original look, try the Very Low Graphics Mode instead.

The mod doesn't interfere with or change DLSS/FSR settings, although I'd suggest leaving those off - the base resolution is already set very low and these technologies typically do not help much at such low fidelity/resolution anyway.
 
I love that it looks like King's Quest 8, but with self-shadows and the wagons as high-detail objects
 
I wonder how much work this Mega Geometry update for Alan Wake required on the developer side. Really makes you wonder how much perf we are leaving on the table due to poor API design in general.
 
Microsoft primarily, I would imagine. I'm guessing this is a dig at AMD/Mantle, but I don't think that holds much merit.
It could be a dig at Team Green.
"DirectX Raytracing is the latest example of Microsoft's commitment toward enabling developers to create incredible experiences using cutting-edge graphics innovations," said Max McMullen, development manager of Windows Graphics and AI at Microsoft. "Our close partnership with NVIDIA on DirectX Raytracing and NVIDIA's RTX technology brings real-time ray tracing closer than ever for millions of gamers on Windows."

You can't really blame DXR 1.0/1.1 for not taking into account Nanite and other next-gen meshlet renderers that wouldn't release until years later though.
 
@raytracingfan

It reads like simple marketing. Given how common it has been over the decades for DX to map poorly onto GPUs, I have my doubts as to how much of the API is designed by either IHV and how much their input really factors in.
 
Digital Foundry's deep dive on everything related to Blackwell.

0:01:08 Blackwell architecture: Blackwell SM design
0:03:39 Updated Tensor Core and RT Core
0:09:37 AI Management Processor and Max Q power management
0:15:13 Display Engine and video encode/decode
0:18:32 Hardware specs: RTX 5080 and 5090
0:29:16 RTX 5070 and 5070 Ti
0:34:23 RTX 50 Series laptops
0:40:14 RTX Software: Neural Shaders
0:44:24 SER 2.0
0:47:06 RTX Mega Geometry
0:50:15 RTX Hair
0:53:13 Neural Radiance Cache
0:55:58 RTX Skin
0:58:01 DLSS 4: Super Resolution, Ray Reconstruction, Frame Generation
1:06:18 Generative AI demos
1:10:03 Wrap up discussion: RTX 50 series price and performance
1:17:36 Nvidia’s software package and features
1:22:38 How should we review graphics hardware?

Jensen Huang said during the RTX 5000 presentation that: "This GPU can predict the future". 🙂

I also remember watching a DF video and saying it surprised me that FGx4 on nVidia GPUs showed less input lag than native for the 2nd and 3rd generated frames.

Thanks to an interview from Plasma TV For Gaming (Ariel) with a member of the team who created Lossless Scaling, now I understand why both things are true. The interview is really interesting. 🙂

52:50 Here they talk about how nVidia and Intel are doing normal FG now, but in the future they are going to produce ONLY generated frames: no frames rendered by the GPU anymore. The GPU would just provide the data for the frame, not the final render (that render can cause framepacing issues).

Intel has a lot of documents and research papers showing that future frames will all be extrapolated. For now they have only released normal frame generation, but beyond the papers and developer talks, Intel and nVidia are working on FG using only extrapolation.

50:57 Having 100% generated frames, which means full control over the pacing, is what nVidia wants in the future, with the GPU only providing the data for the frame but NOT rendering it (rendering the frame creates framepacing issues). That would totally eliminate judder.

nVidia is kind of experimenting with it right now; their current technology is a hybrid between interpolation and extrapolation, because the first frame is interpolated and the next two frames are extrapolated.

Eventually all of them are going to be extrapolated. (This explains the decreased latency numbers for the 2nd and 3rd frames in the DF video.)

53:45 here he explains the difference between interpolation and extrapolation.

He details the important difference with interpolated frames: the GPU renders the previous frame, then the next frame, which is held back (this creates lag, because a frame is being held while the generated frames are created), and a new, generated frame is placed in the middle. This leads to fewer artifacts, and since they have all the motion vector data they can insert an almost perfect frame in between frames. This is currently the most widely used technique.

For extrapolated frames, as soon as the current frame is done rendering, the generated frame is produced instantly, which gives you less input lag; it's more like a real frame.

With current FG, the image looks smoother, as if you were at a higher framerate, but it doesn't feel like a higher framerate. Extrapolation fixes that.
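To make the structural difference concrete, here's a toy C++ sketch (purely illustrative; the Frame struct and the two functions are made up for this example, not anything from nVidia's or Lossless Scaling's actual pipelines). The point is only the ordering: interpolation has to wait for the next rendered frame before it can show anything in between, while extrapolation predicts forward from the last frame immediately.

```cpp
#include <cstdio>

// Stand-in for a presented image; 't' is the moment in time the frame represents (ms).
struct Frame { double t; };

// Interpolation: frame N+1 must finish before anything between N and N+1 can be shown,
// so N+1 is held back (added latency), but both endpoints are known (fewer artifacts).
Frame interpolate(const Frame& n, const Frame& n1, double alpha) {
    return { n.t + alpha * (n1.t - n.t) };
}

// Extrapolation: as soon as frame N is done, a future frame is predicted from N plus
// motion data; nothing is held back (no hold latency), but the prediction can be wrong.
Frame extrapolate(const Frame& n, double motionPerMs, double dtMs) {
    return { n.t + motionPerMs * dtMs };
}

int main() {
    Frame n{0.0}, n1{16.7};                 // two rendered frames at ~60 fps cadence
    Frame mid  = interpolate(n, n1, 0.5);   // can only be shown once n1 exists
    Frame pred = extrapolate(n, 1.0, 8.3);  // can be shown immediately after n
    std::printf("interpolated: %.1f ms  extrapolated: %.1f ms\n", mid.t, pred.t);
    return 0;
}
```

The tradeoff described in the interview falls straight out of this: holding the endpoint buys accuracy at the cost of latency, while predicting forward buys latency at the cost of possible artifacts.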

55:00 Lossless Scaling uses extrapolation, because they obviously can't hold the next frame: they can't access the internal framebuffer via drivers. So there's no added lag from holding frames; whatever input lag there is comes from the communication with the Windows API they use to present the frames.

56:55 on Reflex 2 and asynchronous reprojection. The future, that's where they are heading.

58:05 On Smooth Motion and the power of tensor cores, which do their work without using the part of the GPU that is actually running the game, so it seems like free performance. But they are used for so many things nowadays (FG, Ray Reconstruction, Super Resolution, etc.) that you must be very careful with them to avoid performance issues; they are busy enough on their own.

That's why nVidia has moved to not using Tensor Cores for certain tasks (optical flow; more on that below).

59:49 He explains the challenges of FSR FG. Sometimes using FSR FG can lead to not gaining frames at all, and even losing frames versus native. That's because FSR FG relies on async calls, but many games are already very async-heavy themselves, so certain GPUs can stall if you enable it.

1:04:00 Optical flow and async compute and tensor cores.

This is the full video:

Chat with a developer Hybred about CRT simulation, LosslessScaling, MFG, and more


btw, I learnt today that Jensen Huang was born on the same day as one of my heroes, imho the best sportsman my eyes have seen: Michael Jordan.
 
1:23:50 He also mentions that even if you are rendering at native 4K nowadays, you aren't really rendering at native 4K, and he explains why.

The developer talking to him is working on this videogame.


 
@Cyan I’m not sure this developer knows how deferred rendering works, or they’re just trying way too hard to simplify it which is making the answer nonsensical.
 
@Cyan I’m not sure this developer knows how deferred rendering works, or they’re just trying way too hard to simplify it which is making the answer nonsensical.
Do you mean the most recent post, where I share what he mentions about 4K?

I guess he is trying to simplify it, but I'm still watching the rest of the video.

Afaik, forward rendering works by rendering the whole image, but it can't calculate things like occlusion, i.e. whether an object is closer to the camera than other objects, or when an object is behind another. Deferred rendering kind of fixes that, but that's also the reason why MSAA doesn't work on it. This is how I understand it, not how he explains it. Don't want to digress.

I'm still at the exact point where the developer talks about DF and how it would be the ideal channel to talk about certain things.

2:12:20

Transcribing details of a video takes time; it's kinda overwhelming.

He is developing his game in UE4 to avoid stutters btw.
 
@Cyan Deferred rendering calculates lighting in a separate pass. The reason this is desirable is because of the way pixel shaders are executed by a gpu. They do not run against individual pixels, but pixel quads (groups of 4 pixels, 2x2). I'm not an expert, but deferring lighting guarantees that the lighting calculations will only run once per pixel, whereas in a forward renderer the lighting can be calculated up to 4 times per pixel, depending on how a triangle covers a quad. In a forward renderer, if you have four different triangles covering each pixel in a quad, you'll end up running the pixel shader 4 times for each pixel in the quad, so 16 shader executions for that quad of four pixels. Basically if a texture is sampled in a pixel shader, and the triangle only covers one pixel in the quad, the other three pixels have to be evaluated as well as "helpers" to determine which mipmap level of the texture to use.

Complex lighting ends up having inconsistent performance because it can worsen depending on quad utilization. Small triangles are the primary culprit, but you are guaranteed to have quads that do not have perfect quad utilization regardless. A deferred renderer is guaranteed to only execute the lighting once per pixel in a quad regardless of quad coverage/utilization, because it does not have to sample textures. The texture data that's relevant is written out to a gbuffer in a first pass, so mipmap selection is no longer relevant (and there's something about full-screen quads or compute shaders here that I don't fully know). That makes lighting perform better regardless of small triangles etc.

If MSAA is enabled, I believe it will operate on the first pass in a deferred renderer where the textures are sampled and the gbuffer is written. The problem is the lighting pass then runs, which can create aliasing, as well as any other post-processing or shader effects that run after the gbuffer is created.

I'm sure a pro in here can correct any mistakes I've made, but that's the general idea.
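To make the two-pass idea concrete, here's a rough CPU-side sketch in C++ (hypothetical types, rasterization left as a stub, nothing from any real engine): the geometry pass writes per-pixel data once, and the lighting pass then runs exactly once per screen pixel no matter how badly the triangles covered the 2x2 quads in the first pass.

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// Hypothetical per-pixel data written by the geometry pass (the "g-buffer").
struct GBufferTexel { float albedo[3]; float normal[3]; float depth; };
struct Light       { float dir[3];    float color[3]; };

// Pass 1: rasterize the scene and store what lighting will need later.
// Texture sampling (and therefore mip selection via pixel quads) only happens here.
void geometryPass(std::vector<GBufferTexel>& gbuffer /*, const scene... */) {
    // ... rasterize triangles, write albedo/normal/depth for every covered pixel ...
}

// Pass 2: exactly one lighting evaluation per screen pixel, regardless of how small
// the triangles were or how poorly they filled 2x2 quads during pass 1.
// outRgb must be sized to 3 * gbuffer.size() by the caller.
void lightingPass(const std::vector<GBufferTexel>& g,
                  const std::vector<Light>& lights,
                  std::vector<float>& outRgb) {
    for (std::size_t px = 0; px < g.size(); ++px) {
        float r = 0.f, gr = 0.f, b = 0.f;
        for (const Light& l : lights) {
            // Simple Lambert term from the stored normal; a real pass reads more data.
            float ndotl = g[px].normal[0] * l.dir[0] +
                          g[px].normal[1] * l.dir[1] +
                          g[px].normal[2] * l.dir[2];
            ndotl = std::fmax(ndotl, 0.f);
            r  += g[px].albedo[0] * l.color[0] * ndotl;
            gr += g[px].albedo[1] * l.color[1] * ndotl;
            b  += g[px].albedo[2] * l.color[2] * ndotl;
        }
        outRgb[3 * px + 0] = r;
        outRgb[3 * px + 1] = gr;
        outRgb[3 * px + 2] = b;
    }
}
```

The expensive light loop lives entirely in the second pass, which is why quad-utilization losses from tiny triangles only hit the cheap g-buffer write.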
 
@Cyan Deferred rendering calculates lighting in a separate pass. The reason this is desirable is because of the way pixel shaders are executed by a gpu. They do not run against individual pixels, but pixel quads (groups of 4 pixels, 2x2). I'm not an expert, but deferring lighting guarantees that the lighting calculations will only run once per pixel, whereas in a forward renderer the lighting can be calculated up to 4 times per pixel, depending on how a triangle covers a quad. In a forward renderer, if you have four different triangles covering each pixel in a quad, you'll end up running the pixel shader 4 times for each pixel in the quad, so 16 shader executions for that quad of four pixels. Basically if a texture is sampled in a pixel shader, and the triangle only covers one pixel in the quad, the other three pixels have to be evaluated as well as "helpers" to determine which mipmap level of the texture to use.

Complex lighting ends up having inconsistent performance because it can worsen depending on quad utilization. Small triangles are the primary culprit, but you are guaranteed to have quads that do not have perfect quad utilization regardless. A deferred renderer is guaranteed to only execute the lighting once per pixel in a quad regardless of quad coverage/utilization, because it does not have to sample textures. The texture data that's relevant is written out to a gbuffer in a first pass, so mipmap selection is no longer relevant. That makes lighting perform better regardless of small triangles etc.

If MSAA is enabled, I believe it will operate on the first pass in a deferred renderer where the textures are sampled and the gbuffer is written. The problem is the lighting pass then runs, which can create aliasing, as well as any other post-processing or shader effects that run after the gbuffer is created.

I'm sure a pro in here can correct any mistakes I've made, but that's the general idea.
Thanks for the detailed explanation, Scott. You don't need to be an expert (although we all dream of being one here); you clearly have quite advanced knowledge and seem to know your stuff. That's a good thing.

Still, you can become an expert in something out of passion. Sure you can study, let's say, how to grow potatoes, and be the person with the best theoretical knowledge about that in the entire world.

But if you don't plant them yourself, you are going to be missing what isn't theoretical.

Same with a biologist who knows a lot about snakes: until they go and capture them to study, or even have some "incidents" with them, their knowledge is going to be limited.

It's late and I feel too drowsy and dense now to explain things. But I felt particularly proud when I noticed, just from using it, that a certain app wasn't doing any upscaling on the generated frames, because the image looked crisp all the time no matter what. Or when I experienced for myself what having a projector feels like, and now how 360fps feels, and it starts to get to another level from there on. I can't imagine how great it's going to be when 750Hz or higher screens become the norm; such smoothness and softness to the image...

Well, no more metaphors. Kinda done with the shaolin master teachings. 😁 Sleep time for me. Cheers
 
@Cyan I'm not an expert either, but I'll add that another key performance advantage for deferred rendering, in addition to quad utilization, is that you don't calculate lighting for occluded geometry - triangles that aren't visible on the screen because other geometry is in between it and the camera. In a perfect world, the GPU would only draw each pixel in a frame once, but quad utilization issues and drawing occluded geometry both cause overdraw - which is where the GPU wastes work by drawing to a pixel multiple times.

With forward rendering, overdraw makes the GPU do the full amount of shading work including lighting calculations extra times, which is all wasted. With deferred rendering, overdraw makes the GPU draw to the g-buffer multiple times, which is still a waste, but much less of a waste because drawing to the g-buffer is much cheaper than fully shading the pixel. Z-testing (not drawing pixels when you've already drawn to that pixel with a depth value closer to the camera) and occlusion culling (not drawing meshes or meshlets that are completely obscured) can mitigate overdraw from occluded geometry in forward rendering (and deferred renderers use these techniques too because drawing to the g-buffer isn't free), but they aren't perfect, so the less work that is wasted by overdraw, the better.

The modern forward renderers used today are mostly Forward+ renderers that have their own techniques for mitigating wasted work.
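For anyone wondering what those Forward+ mitigation techniques typically look like: the usual approach is tiled light culling, where lights are binned into screen-space tiles so each pixel only loops over the lights that can actually reach its tile. A rough CPU-side sketch of just the binning step (illustrative types and tile size, not any particular engine's implementation):

```cpp
#include <vector>
#include <cstddef>

// Rough sketch of the Forward+ idea: bin lights into screen tiles so the shading pass
// only evaluates nearby lights per pixel. Types and the 16x16 tile size are illustrative.
struct LightInfo { float screenX, screenY, screenRadius; };  // projected light bounds
constexpr int kTile = 16;

// Returns, for each tile, the indices of the lights overlapping it.
std::vector<std::vector<std::size_t>> binLights(const std::vector<LightInfo>& lights,
                                                int width, int height) {
    const int tilesX = (width  + kTile - 1) / kTile;
    const int tilesY = (height + kTile - 1) / kTile;
    std::vector<std::vector<std::size_t>> bins(tilesX * tilesY);

    for (std::size_t i = 0; i < lights.size(); ++i) {
        const LightInfo& l = lights[i];
        // Conservative overlap test against the light's projected bounding square.
        int x0 = (int)((l.screenX - l.screenRadius) / kTile);
        int x1 = (int)((l.screenX + l.screenRadius) / kTile);
        int y0 = (int)((l.screenY - l.screenRadius) / kTile);
        int y1 = (int)((l.screenY + l.screenRadius) / kTile);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                if (tx >= 0 && tx < tilesX && ty >= 0 && ty < tilesY)
                    bins[ty * tilesX + tx].push_back(i);
    }
    return bins;
    // The forward shading pass then loops only over bins[tileOfPixel] instead of all lights.
}
```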
 
is that you don't calculate lighting for occluded geometry - triangles that aren't visible on the screen because other geometry is in between it and the camera
Thanks! That's what I was trying to explain, but I couldn't find the words, nor was I sure. As I understand it, forward rendering draws the whole scene like a 2D painting, so to say. Everything is calculated and included, even objects that aren't visible but are in the scene - i.e. a car behind another car, or something behind a column.

Does Forward+ fix that?

This image can explain what I mean: the cars behind the column, or the cars occluding other cars, are all rendered entirely in a forward rendering pass. It paints the whole scene, what you see and what you don't, like a 2D painting, and thus MSAA can work with it without that much of a cost. Excuse me if I'm wrong. Am I right?

[Image: Resident Evil 2 parking garage]
 
Thanks! That's what I was trying to explain, but I couldn't find the words, nor was I sure. As I understand it, forward rendering draws the whole scene like a 2D painting, so to say. Everything is calculated and included, even objects that aren't visible but are in the scene - i.e. a car behind another car, or something behind a column.

Does Forward+ fix that?

This image can explain what I mean: the cars behind the column, or the cars occluding other cars, are all rendered entirely in a forward rendering pass. It paints the whole scene, what you see and what you don't, like a 2D painting, and thus MSAA can work with it without that much of a cost. Excuse me if I'm wrong. Am I right?

[Image: Resident Evil 2 parking garage]
My understanding is that trying to use MSAA in a deferred renderer will result in full scene supersampling by default. You have to do some Complicated Stuff* to identify which samples are redundant. Compare this to normal MSAA where you can easily run the shader once per covered primitive and there is dedicated hardware to facilitate this.

*Complicated Stuff:

During the PS3/X360 era when deferred rendering became popular this was a huge problem. Most UE3 PC ports didn't have any AA support although I think PC GPUs by then (G80+) could technically do the Complicated Stuff. Very few devs seemed concerned enough to address this. This is when I made my signature.
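For the curious, the "Complicated Stuff" is usually along these lines (a sketch of the general idea only, not what UE3 or any specific engine actually shipped): compare a pixel's MSAA g-buffer samples, run lighting per sample only where they genuinely diverge (geometry edges), and run it once everywhere else to avoid paying full supersampling cost.

```cpp
#include <array>
#include <cmath>
#include <vector>

// One g-buffer pixel with N MSAA samples (illustrative layout, not a real engine's).
constexpr int kSamples = 4;
struct Sample { float depth; float normal[3]; };
using MsaaPixel = std::array<Sample, kSamples>;

// Decide whether a pixel's samples diverge enough that lighting must run per sample
// (a geometry edge), or whether one evaluation can be reused for all samples.
bool isEdgePixel(const MsaaPixel& p, float depthEps = 0.001f, float normalEps = 0.05f) {
    for (int s = 1; s < kSamples; ++s) {
        if (std::fabs(p[s].depth - p[0].depth) > depthEps) return true;
        float dot = p[s].normal[0] * p[0].normal[0] +
                    p[s].normal[1] * p[0].normal[1] +
                    p[s].normal[2] * p[0].normal[2];
        if (dot < 1.f - normalEps) return true;   // normals disagree -> edge
    }
    return false;
}

// The lighting pass then does per-sample work only on edge pixels, instead of the
// "full scene supersampling" cost of naively lighting every sample of every pixel.
float lightingEvaluations(const std::vector<MsaaPixel>& gbuffer) {
    float evaluations = 0.f;
    for (const MsaaPixel& p : gbuffer)
        evaluations += isEdgePixel(p) ? (float)kSamples : 1.f;
    return evaluations;  // compare against gbuffer.size() * kSamples for brute force
}
```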
 
Excuse me if I'm wrong. Am I right?
Not exactly, no. Any of those pixels for any of those objects could be rendered (this is commonly called "overdraw"), but many (perhaps most!) of the occluded pixels will not be. Assuming you're writing to the depth buffer, you only shade a pixel (for a given object) if you haven't already shaded a pixel which is closer to the camera. (This is called "depth testing".) Generally, objects are sorted (with some errors, or with some objects which are very large and have occluded portions sorted 'ahead' of other objects), so you render something like that pillar first, and then for the car you can reject all of the pixels which are behind the pillar.

Sometimes you'll also see "early-Z" (or "early-depth" in non-jargon-speak), where everything is rendered to depth only first, which is extremely cheap for the pixel shader and makes overdraw less of a concern, and then you get perfect depth rejection for the "real" work. This has different tradeoffs though (potentially more scheduling complications, a whole duplicate set of draw calls, different implications with other techniques, etc.)
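A toy sketch of the two approaches just described, written as if for a software rasterizer (the Fragment type and shade() are made-up stand-ins, purely illustrative): plain depth testing still wastes shading work when things happen to draw back-to-front, while a depth-only pre-pass lets the main pass reject every occluded fragment exactly.

```cpp
#include <vector>
#include <cstddef>

// A candidate pixel produced by rasterizing some triangle (illustrative type).
struct Fragment { std::size_t px; float depth; };

// Stand-in for the expensive per-pixel work (lighting, texturing, ...).
float shade(const Fragment&) { return 1.0f; }

// Plain depth testing: shade only if nothing closer has been drawn to this pixel yet.
// Draw order still matters: back-to-front order shades pixels that later get covered.
// depthBuf is assumed to be initialized to a very large value by the caller.
void drawWithDepthTest(const std::vector<Fragment>& frags,
                       std::vector<float>& depthBuf, std::vector<float>& colorBuf) {
    for (const Fragment& f : frags) {
        if (f.depth < depthBuf[f.px]) {      // depth test
            depthBuf[f.px] = f.depth;
            colorBuf[f.px] = shade(f);       // may still be wasted if covered later
        }
    }
}

// Depth pre-pass ("early-Z" style): first lay down depth only (cheap), then in the
// main pass shade only the fragment that actually survives, so no shading is wasted.
void drawWithDepthPrepass(const std::vector<Fragment>& frags,
                          std::vector<float>& depthBuf, std::vector<float>& colorBuf) {
    for (const Fragment& f : frags)          // pass 1: depth only
        if (f.depth < depthBuf[f.px]) depthBuf[f.px] = f.depth;
    for (const Fragment& f : frags)          // pass 2: perfect rejection
        if (f.depth == depthBuf[f.px]) colorBuf[f.px] = shade(f);
}
```

This also mirrors the tradeoff mentioned above: the pre-pass version walks the geometry twice, which is the "duplicate set of draw calls" cost.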
 

DF Direct Weekly #200! Switch 2 Pricing, The NEW Battlefield, Monster Hunter Wilds Perf Concerns!


0:00:00 Introduction
0:02:07 News 1: DF Direct celebrates episode 200!
0:32:01 News 2: Nintendo sets Switch 2 price expectations
0:46:17 News 3: Next Battlefield shown in brief teaser
0:55:45 News 4: Monster Hunter Wilds beta, benchmark tested
1:09:05 News 5: RTX 5080 overclocking tested
1:26:52 News 6: Epic admits Unreal Engine has a #StutterStruggle problem
1:34:17 News 7: Star Wars Outlaws updated with new PSSR version
1:41:07 Supporter Q1: Could you test the 5090 with no AI features vs. the 5080 with all AI features?
1:47:23 Supporter Q2: Could DF establish a latency rating system?
1:54:41 Supporter Q3: Would increasing resolution and AA quality be enough for Nintendo Switch 2 games?
1:57:53 Supporter Q4: Is a Bloodborne remaster imminent?
2:03:32 Supporter Q5: Could 1080p actually be preferable to 4K for Switch 2 games?
 