CES 2025 Thread (AMD, Intel, Nvidia, and others!)

Awesome Excel chart :) Also, just wanted to put a very fine point on your statement here:

What @iroboto is explaining here, in the bolded comment, is a blur to your human eye. This isn't to suggest or even insinuate the frames themselves are blurry; in fact, they'll be just as well detailed as the prior two rasterized frames.

That detail also doesn't suggest or insinuate they'll be exactly correct in terms of object permanence, viewport-edge object visibility, or similar artifacts stemming from motion estimation and projection which bleed either into or out of the viewable frustum. Still, the generated frames will absolutely have detail.
Yea, with such a high refresh rate, motion clarity will be exceptionally good, provided your monitor can display it properly.
 
FG works by interpolating between two rendered frames. Their frametimes are known in advance so it is easy to calculate the point(s) between them where a generated frame(s) should be inserted.
This doesn't match my understanding of DLSS:FG, although it does match FSR:FG and the Lossless Scaling software in version 2 (but not version 3).

If what you described were true, maximum latency with FG enabled would be two full rasterized frames, because we're always holding that second rastered frame long enough to then create and display all the intermediate frames. Meaning that if we were at 60FPS base rasterization, we would hold the 16.7msec next frame in order to generate all the other intermediate frames (another 16.7msec worth) before we could then finally display the "real" rastered frame we had to sit on. That doesn't jibe with the input latency data we have.

Also, motion vector data and depth data would be pointless if you already had the source and destination "real" frames and only had to generate the middle parts. That's just pure interpolation, and it's been done for decades on TVs.

No, my understanding is FG is entirely a projection based on prior frames and the combination of motion vector and depth data from the last "real" frame.
 
If what you described were true, maximum latency with FG enabled would be two full rasterized frames
It is. With FG you're getting a latency increase of exactly what one additional rendered frame would cost. It's not double the latency, because latency is more than just frame rendering, and the rendering can also overlap a bit. The interpolation itself is very fast, which is why it works.

Why would latency even increase if generated frames were predicted? This is how I would want FG to work, but it seems it's a bit too complex to implement right now.
 
It is. With FG you're getting latency increase exactly as one additional rendered frame would cost.
No, I don't believe that's true. Can you show some examples of this?

Here is Cyberpunk 2077 with FG disabled running around 50 FPS (so about 20msec frame time) at 1440p with 45ms of total system latency -- which then moves to 85 FPS (so about 11.8msec frame time) with FG enabled and a total system latency of 55ms. The difference is 10msec, which is essentially the cost of one generated frame, not the cost of one entire rasterized frame. I don't think this jibes with your claim of doubling frame latency...
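For anyone checking the arithmetic, here it is spelled out (a quick sketch; the numbers are read off the video overlay, so treat them as approximate):

```python
# Rough arithmetic for the Cyberpunk 2077 numbers above.
fps_off, latency_off = 50, 45.0   # FG disabled: FPS, total system latency (ms)
fps_on, latency_on = 85, 55.0     # FG enabled

print(f"frame time without FG: {1000 / fps_off:.1f} ms")  # ~20.0 ms per rastered frame
print(f"frame time with FG:    {1000 / fps_on:.1f} ms")   # ~11.8 ms per displayed frame
print(f"added system latency:  {latency_on - latency_off:.1f} ms")  # 10.0 ms
```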


The YT link hops straight to the non-FG vs FG comparison portion of the video.

Why would latency even increase if generated frames were predicted? This is how I would want FG to work, but it seems it's a bit too complex to implement right now.
Because the display is still unlinked from the input loop of the game, and the video driver still has to manipulate the flip chain outside the game's central rasterization loop. In fact, I've mentioned more than once in this thread that FG inserts one generated frame's worth of latency into the chain, and the latency figures in the video above bear this out. Sitting on a whole rastered frame to then figure out the intermediate steps would double the latency as a function of rastered frames, just as you describe... But that's not what's happening.
 
This doesn't match my understanding of DLSS:FG, although it does match FSR:FG and the Lossless Scaling software in version 2 (but not version 3).

If what you described were true, maximum latency with FG enabled would be two full rasterized frames, because we're always holding that second rastered frame long enough to then create and display all the intermediate frames. Meaning that if we were at 60FPS base rasterization, we would hold the 16.7msec next frame in order to generate all the other intermediate frames (another 16.7msec worth) before we could then finally display the "real" rastered frame we had to sit on. That doesn't jibe with the input latency data we have.

Also, motion vector data and depth data would be pointless if you already had the source and destination "real" frames and only had to generate the middle parts. That's just pure interpolation, and it's been done for decades on TVs.

No, my understanding is FG is entirely a projection based on prior frames and the combination of motion vector and depth data from the last "real" frame.
I don't believe this is correct. I love @iroboto's diagram but I think the timing is off.

I think @DegustatoR is right. I believe DLSSFG does have to wait until the second frame arrives in order to produce the intermediate frames.

Let's assume a hypothetical scenario in which the FG (x2) algorithm is infinitely fast. To achieve uniform frame-pacing, you would delay the first frame (F0) by 8.3ms (even though it arrives at t=0). Then at t=16.7ms the second frame (F1) arrives, and you instantly calculate the interpolated F0.5 and display it. Then at t=16.7+8.3ms you display F1. And so on. So overall you've added 8.3ms latency, which is not too bad since you're probably dealing with 30-40ms anyway on a good day. Of course, the FG algorithm isn't instantaneous, so you'll have to delay F0 by 8.3ms + the cost of running the algorithm.
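Here's that hypothetical spelled out as a quick sketch (Python, purely illustrative; assumes a perfectly instant interpolator and a fixed 16.7ms render cadence):

```python
# Toy model of 2x FG pacing with an infinitely fast interpolator.
# Rendered frame Fn arrives at t = n * 16.7 ms; each is held for half a
# render interval so the displayed frames come out evenly spaced.
render_interval = 16.7      # ms between rendered frames (60 FPS base)
half = render_interval / 2  # 8.3 ms display spacing with 2x FG

events = []
for n in range(3):
    arrival = n * render_interval
    # Rendered frame n is delayed by half an interval before display...
    events.append((arrival + half, f"F{n} (rendered, arrived t={arrival:.1f})"))
    # ...and interpolated frame n+0.5 goes out the moment frame n+1 arrives.
    events.append((arrival + render_interval, f"F{n}.5 (interpolated)"))

for t, frame in sorted(events):
    print(f"t={t:5.1f} ms: display {frame}")
```

The added latency is the 8.3ms hold on each rendered frame, plus the real-world cost of the interpolation pass.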
 
Let's assume a hypothetical scenario in which the FG (x2) algorithm is infinitely fast. To achieve uniform frame-pacing, you would delay the first frame (F0) by 8.3ms (even though it arrives at t=0). Then at t=16.7ms the second frame (F1) arrives, and you instantly calculate the interpolated F0.5 and display it. Then at t=16.7+8.3ms you display F1. And so on. So overall you've added 8.3ms latency, which is not too bad since you're probably dealing with 30-40ms anyway on a good day. Of course, the FG algorithm isn't instantaneous, so you'll have to delay F0 by 8.3ms + the cost of running the algorithm.
Ok, let's walk through this hypothetical. To help keep it all straight, let's adjust the naming standard of our frames to Rn for "rastered" frames, and Fn for frame-generated frames.

The first rastered frame (R0) wouldn't have to be delayed; it would just be rastered and sent. It's now queued up as the first reference frame for DLSS:FG. It wouldn't do us any good to hold it, because at this exact moment we have no idea when the next rastered frame will be ready; it's a total dataset of n=1.

The second rastered frame (R1) would come whenever it's done. Let's keep your example of 16.7ms frame time, but this time we can't send it yet -- we need it as the second sample to begin creating our F-frames.

In your hypothetical, frame generation takes no time (I like this, it makes the math easy), so F0 is displayed at the 16.7msec mark, but represents only the halfway point between R0 and R1. We then wait 8.3msec to show R1 to keep the pacing. This means we have:

R0 is displayed ----> 16.7msec ----> R1 is created but F0 is displayed ----> 8.3msec ----> R1 is displayed ---->...

Alright, so now we need to wait for R2 to be created. It's gonna come through 16.7msec after R1 did, and when it does arrive, we again hold it, display the new F-frame instead, wait 8.3msec, and show R2. And then we repeat the cycle:

R0 is displayed ----> 16.7msec ----> R1 is created but F0 is displayed ----> 8.3msec ----> R1 is displayed ----> 8.3msec ----> R2 is created but F1 is displayed ----> 8.3msec ----> R2 is displayed

And then the pattern repeats, this time I'll shorten the arrows to indicate 16.7msec vs 8.3msec:

R0 ----> F0 -> R1 -> F1 -> R2 -> F2 -> R3 -> F3...
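And just to convince myself the pattern survives contact with uneven frame times, a quick sketch (toy model, still assuming instant interpolation; the arrival times are made up):

```python
# Toy 2x FG display schedule for uneven render times.
# Rn = rastered frames, Fn = generated frames, all times in ms.
arrivals = [0.0, 16.7, 35.0, 50.0]  # when each rastered frame finishes

schedule = []
for n in range(len(arrivals) - 1):
    gap = arrivals[n + 1] - arrivals[n]
    # The F-frame for the interval [Rn, Rn+1] goes out when Rn+1 arrives...
    schedule.append((arrivals[n + 1], f"F{n}"))
    # ...and Rn+1 itself is held half the measured gap to keep pacing even.
    schedule.append((arrivals[n + 1] + gap / 2, f"R{n + 1}"))

print("t=  0.0: display R0")  # first frame goes straight out; nothing to pace yet
for t, frame in sorted(schedule):
    print(f"t={t:5.1f}: display {frame}")
```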

Ok, I get it. That does make sense. Good to get myself straight :) I've gone back to give both of your posts the thumbs up for the education, hehe...

Alright then, so why would FG care about motion vectors and depth? That data seems utterly pointless.
 
I don't believe this is correct. I love @iroboto's diagram but I think the timing is off.

I think @DegustatoR is right. I believe DLSSFG does have to wait until the second frame arrives in order to produce the intermediate frames.

Let's assume a hypothetical scenario in which the FG (x2) algorithm is infinitely fast. To achieve uniform frame-pacing, you would delay the first frame (F0) by 8.3ms (even though it arrives at t=0). Then at t=16.7ms the second frame (F1) arrives, and you instantly calculate the interpolated F0.5 and display it. Then at t=16.7+8.3ms you display F1. And so on. So overall you've added 8.3ms latency, which is not too bad since you're probably dealing with 30-40ms anyway on a good day. Of course, the FG algorithm isn't instantaneous, so you'll have to delay F0 by 8.3ms + the cost of running the algorithm.
Yup,
I would definitely need to redraw the graph if we wanted to showcase latency etc.

Then we'd have to draw what the GPU is doing for rendered frames, when it's doing its MFG, and then compositing those two to the monitor output, and we should be able to get an idea of latency. But the idea of the graph was to showcase that with more FG, there is less screen time per frame before the next one arrives.

The harder part here is that each game will do its update and render separately. Typically when we talk about latency, we're referring to how quickly the CPU responds to our input and the total time it takes for that input to show up on screen. It would be ugly to draw this in Excel! We'd certainly need to make some assumptions!
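The "less screen time per frame" point is at least easy to put numbers on, though. Quick sketch, assuming a fixed 60 FPS render rate (my assumption, just for illustration):

```python
# Screen time per displayed frame as the FG factor goes up.
render_fps = 60
for fg_factor in (1, 2, 3, 4):          # native, 2x FG, 3x MFG, 4x MFG
    displayed_fps = render_fps * fg_factor
    screen_time = 1000 / displayed_fps  # ms each frame spends on screen
    print(f"{fg_factor}x: {displayed_fps} FPS -> {screen_time:.2f} ms per frame")
```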
 
Ok, let's walk through this hypothetical. To help keep it all straight, let's adjust the naming standard of our frames to Rn for "rastered" frames, and Fn for frame-generated frames.

The first rastered frame (R0) wouldn't have to be delayed; it would just be rastered and sent. It's now queued up as the first reference frame for DLSS:FG. It wouldn't do us any good to hold it, because at this exact moment we have no idea when the next rastered frame will be ready; it's a total dataset of n=1.

The second rastered frame (R1) would come whenever it's done. Let's keep your example of 16.7ms frame time, but this time we can't send it yet -- we need it as the second sample to begin creating our F-frames.

In your hypothetical, frame generation takes no time (I like this, it makes the math easy), so F0 is displayed at the 16.7msec mark, but represents only the halfway point between R0 and R1. We then wait 8.3msec to show R1 to keep the pacing. This means we have:

R0 is displayed ----> 16.7msec ----> R1 is created but F0 is displayed ----> 8.3msec ----> R1 is displayed ---->...

Alright, so now we need to wait for R2 to be created. It's gonna come through 16.7msec after R1 did, and when it does arrive, we again hold it, display the new F-frame instead, wait 8.3msec, and show R2. And then we repeat the cycle:

R0 is displayed ----> 16.7msec ----> R1 is created but F0 is displayed ----> 8.3msec ----> R1 is displayed ----> 8.3msec ----> R2 is created but F1 is displayed ----> 8.3msec ----> R2 is displayed

And then the pattern repeats, this time I'll shorten the arrows to indicate 16.7msec vs 8.3msec:

R0 ----> F0 -> R1 -> F1 -> R2 -> F2 -> R3 -> F3...

Ok, I get it. That does make sense. Good to get myself straight :) I've gone back to give both of your posts the thumbs up for the education, hehe...

Alright then, so why would FG care about motion vectors and depth? That data seems utterly pointless.
You need motion vectors and depth to account for occluded objects and other cases where a straight interpolation between R0 and R1 would produce an incorrect image.
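A toy 1D illustration of the difference (my own construction, not a claim about what DLSS does internally): a bright pixel slides right by four pixels between R0 and R1. Naive blending leaves a half-intensity ghost at both positions, while warping by half the motion vector puts the object where it belongs at the midpoint. Depth would additionally tell you which surface wins where a foreground object covers or uncovers the background.

```python
# Toy example: one bright "object" pixel moving right across a dark background.
W = 12
r0 = [0.0] * W; r0[2] = 1.0   # object at x=2 in frame R0
r1 = [0.0] * W; r1[6] = 1.0   # object at x=6 in frame R1
motion = 4                    # motion vector for the object (pixels per frame)

# Straight interpolation: average the two frames -> ghost at BOTH positions.
blend = [(a + b) / 2 for a, b in zip(r0, r1)]

# Motion-vector warp: move the object half its motion vector -> single copy.
warp = [0.0] * W
warp[2 + motion // 2] = r0[2]

print("naive blend:", blend)  # 0.5 at x=2 and x=6 (double image)
print("mv warp:    ", warp)   # 1.0 at x=4 (correct midpoint position)
```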

Linking one of @Dictator's very nice videos that shows such an issue in Spider-Man, where he runs behind a railing while the camera is also moving.


Here's another DF video clip comparing a sequence of only rendered frames on one half versus only generated frames on the other half. Not double-blind, but informative.

 
Edit: You know, I have an idea. When one of us has a 50-series card at our disposal, we need to do a video capture with MFG fully enabled and grab a series of about two or three dozen frames. We can splat them all out individually, and the people with remarkably strong opinions about how FAKE frames will be lower quality and "wrong" can then point out the ones which are so obviously fake. We can do it double-blind: the person who does the capture gives the answer key to one unrelated person beforehand, while a second unrelated person posts the frames without any knowledge of which ones are "real" vs "fake".

I think it would be REALLY interesting to see how many of the frames can be accurately determined to be AI generated vs not.
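The blinding step itself would be trivial to script, by the way. Something like this (all paths and filenames hypothetical):

```python
# Hypothetical blinding script for the experiment above: shuffle the captured
# frames, hand the renamed set to the judges, keep the answer key separate.
import csv
import random
import shutil
from pathlib import Path

frames = sorted(Path("capture").glob("frame_*.png"))  # hypothetical capture folder
random.shuffle(frames)

out = Path("blinded")
out.mkdir(exist_ok=True)
with open("answer_key.csv", "w", newline="") as f:  # goes to person #1 only
    key = csv.writer(f)
    key.writerow(["blinded_name", "original_name"])
    for i, src in enumerate(frames):
        dst = out / f"sample_{i:03d}.png"  # person #2 posts only these
        shutil.copy(src, dst)
        key.writerow([dst.name, src.name])
```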
This isn't the point. I can't tell which frames are generated when they're displayed side by side, but it's very clear in motion when they aren't real. DF did a nice video on 2x FG; displayed side by side, it's actually fairly convincing!

The latency is also pretty easily felt and you obviously won’t get that from stills.

But yea, I get your point: frame rate is a separate metric. I think we just differ on our philosophy here.
 
It quite literally doubles frame rate. "Normal" frames are generated through a render pipeline. Frame generation uses interpolation to generate frames. They're all frames. It's doubled. In terms of what's going to your display, they're all unique frames. They're just qualitatively different because of the way they were generated.
All frames are equal, however some are qualitatively more equal than others.

Only half joking lol, but I get your point. My philosophy with benchmarking is that it should be as close to 1:1 as you can get.
 