Why isn't framerate upscaling being pushed forward when TVs already have it, even though it's a better fit inside a game engine?

But the question is how acceptable aliasing and flicker are in the peripheral regions. We might need high quality there, and temporal smoothing works against our quick perception of movement in the periphery.
Just had an idea about this. Idk what the current state of eye tracking is, but if accuracy is a problem, we could use expected human behavior to work around the inaccuracy.
The behavior is: we stare forward, but then we notice a bear jumping at us, or a car on a collision course. The threat comes from the side, so we notice some movement in peripheral view. The quick reflex makes us move our eyes to focus on the threat, to see what it is.
So basically, if the tracker detects sudden eye movement but the exact angle is noisy, we could look for local maxima of acceleration to get a good hint about where the eye is heading.
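
Just to make that concrete, here's a rough sketch (plain C++, all names and thresholds invented by me) of picking a likely saccade direction from noisy gaze samples by looking for the acceleration peak:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One gaze sample from the tracker: angles in degrees, timestamp in seconds.
struct GazeSample { float yaw, pitch, t; };

// Result: whether a saccade-like peak was found, and the (unit) direction of
// gaze motion at that peak. Even if absolute angles are noisy, the direction
// at the acceleration maximum is a decent hint for where the eye is heading.
struct SaccadeHint { bool detected; float dirYaw, dirPitch; };

SaccadeHint detectSaccade(const std::vector<GazeSample>& s, float accelThreshold)
{
    SaccadeHint hint{false, 0.0f, 0.0f};
    float bestAccel = accelThreshold; // ignore peaks below the threshold (noise)

    for (std::size_t i = 2; i < s.size(); ++i)
    {
        float dt1 = s[i - 1].t - s[i - 2].t;
        float dt2 = s[i].t     - s[i - 1].t;
        if (dt1 <= 0.0f || dt2 <= 0.0f) continue;

        // Finite-difference velocities (deg/s) over the last two intervals.
        float vx1 = (s[i - 1].yaw   - s[i - 2].yaw)   / dt1;
        float vy1 = (s[i - 1].pitch - s[i - 2].pitch) / dt1;
        float vx2 = (s[i].yaw       - s[i - 1].yaw)   / dt2;
        float vy2 = (s[i].pitch     - s[i - 1].pitch) / dt2;

        // Acceleration magnitude (deg/s^2).
        float ax = (vx2 - vx1) / dt2;
        float ay = (vy2 - vy1) / dt2;
        float accel = std::sqrt(ax * ax + ay * ay);

        if (accel > bestAccel)
        {
            bestAccel = accel;
            float len = std::sqrt(vx2 * vx2 + vy2 * vy2);
            if (len > 0.0f) { hint.detected = true; hint.dirYaw = vx2 / len; hint.dirPitch = vy2 / len; }
        }
    }
    return hint;
}
```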

Imagine: buy some crappy eye-tracking sensor for 50 bucks, and if games support the tech, it makes your iGPU as powerful as a 4090, because rendering costs drop to 5%. : )
Maybe too optimistic, but I think that's a really promising direction.
 
That's actually apparent watching VR footage that's head-tracked and all over the place!
Yeah. Some of the earliest headset prototypes (but still in the modern Rift/Vive era) did not track this high-frequency head movement. So the picture stayed static while the eyes darted around as a reflex, trying to "compensate". The nausea was almost instantaneous.
 
I think the eyes move a lot more than that - you just don't realise.
That’s right. Our heads are extremely unstable and so the eyes are constantly counteracting the movement so that we can stay fixated on a single point.
Rephrasing that in the context of potential optimization: we don't have to worry about the eyes constantly moving, because this constant movement only aims to generate a stable image for the observer by compensating for head motion.
So it's actually a benefit, not a problem.

There are two types of eye movement, I think: this compensation (which I mostly notice as missing from earlier attempts at character animation in games, where the eyes did not focus on anything but just followed the head bone, ugh :) ),
and a sudden change of focus to a region that was in the periphery before, mostly caused by unexpected movement or a change of our interest.

So it's only this second type that gives us problems.
However, if I jump from the word I'm typing to another word at the other side of the screen, it takes something like 0.2 seconds until I can see it sharply.
That's probably the time the brain needs to generate stable 'images' from our low count of 'sensors' in the eyes.

The question then is: can we utilize this too, giving us some headroom for reconstruction? E.g. to gradually upscale things like the previous-frame information we use for temporal accumulation, or even gradually increase the resolution of the image as a whole?
Pretty sure we can, but idk how much. If we used that whole 0.2 seconds to increase resolution gradually, we would likely slow down that process in the brain and cause some discomfort. But 0.1 s might be fine.
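
As a rough sketch of what that ramp could look like (the 0.1 s window and the scale range are just my guesses, nothing measured):

```cpp
#include <algorithm>

// Hypothetical: resolution scale for the newly fixated region, ramping from
// coarse to full detail over ~0.1 s after the saccade has landed, trying to
// stay inside the brain's own ~0.2 s "sharpening" window.
float resolutionScale(float secondsSinceSaccadeEnd)
{
    const float rampDuration = 0.1f;  // guessed budget, not a measured value
    const float minScale     = 0.25f; // start at quarter resolution
    const float maxScale     = 1.0f;  // end at full resolution

    float t = std::clamp(secondsSinceSaccadeEnd / rampDuration, 0.0f, 1.0f);
    return minScale + (maxScale - minScale) * t;
}
```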

So if we find ways to generate only the data we really need to see, the potential optimization is huge, and the cost on HW should be small.
But I'm not well informed. I just saw that PSVR2 gets a speedup of 3.6x from foveated rendering (probably implemented using VRS, but not utilizing lower LODs or a full reduction of resolution). And it also has a shooting game where you aim with the eye tracker; people say it works well.
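
If it really is VRS-based, the core of it is probably not much more than picking a coarser shading rate the further a tile is from the gaze point. A made-up sketch (the thresholds and the rate names are placeholders, not the actual PSVR2 implementation):

```cpp
#include <cmath>

// Placeholder shading rates, in the spirit of hardware VRS tiers.
enum class ShadingRate { Full1x1, Half2x2, Quarter4x4 };

// Pick a rate per screen tile from its angular distance to the gaze point
// (degrees). The thresholds are invented; a real title would tune them for
// the display and optics.
ShadingRate rateForTile(float tileYaw, float tilePitch, float gazeYaw, float gazePitch)
{
    float dx = tileYaw   - gazeYaw;
    float dy = tilePitch - gazePitch;
    float eccentricity = std::sqrt(dx * dx + dy * dy);

    if (eccentricity < 10.0f) return ShadingRate::Full1x1;   // foveal region: full rate
    if (eccentricity < 25.0f) return ShadingRate::Half2x2;   // near periphery
    return ShadingRate::Quarter4x4;                          // far periphery
}
```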

But there's also this, claiming a 95% compute reduction, which is what I personally expect and hope for:
 
There are two types of eye movement, I think: this compensation (which I mostly notice as missing from earlier attempts at character animation in games, where the eyes did not focus on anything but just followed the head bone, ugh :) ),
and a sudden change of focus to a region that was in the periphery before, mostly caused by unexpected movement or a change of our interest.

Tiny saccades can be accommodated within the area of crisp rendering. Vestibulo-ocular movements (the ones compensating for physical movement of the head relative to the view target, like head wobbles) can be too. However, when viewing a scene the eyes dart around, taking in information. If you're talking to someone, your eyes track their eyes, mouth, hands, etc. There's lots of eye motion that you aren't really aware of.

What you are talking about is large saccades with a 200 ms interval where you can reconstruct. However, smooth pursuit of a target moving across the scene won't have this interval. Consider a viewer seeing a wildlife scene, like HorizonZD. The viewer may watch a large dino, then jump to a flying saurus, follow it past trees, then jump to a robotic butterfly.

If you look up gaze plots, you'll see some pretty wild visual sampling going on!
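
For what it's worth, the usual way to split those movement types apart in gaze data is a simple velocity threshold (often called I-VT). A sketch with ballpark thresholds; real ones would need tuning per tracker:

```cpp
#include <cmath>

enum class EyeMovement { Fixation, SmoothPursuit, Saccade };

// Classify one gaze step from its angular velocity (deg/s). The thresholds
// are rough ballpark values, not tuned for any particular tracker.
EyeMovement classify(float dYawDeg, float dPitchDeg, float dtSeconds)
{
    float velocity = std::sqrt(dYawDeg * dYawDeg + dPitchDeg * dPitchDeg) / dtSeconds;

    if (velocity < 5.0f)  return EyeMovement::Fixation;      // essentially still
    if (velocity < 30.0f) return EyeMovement::SmoothPursuit; // tracking a moving target
    return EyeMovement::Saccade;                             // ballistic jump
}
```
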
But there's also this, claiming a 95% compute reduction, which is what I personally expect and hope for:
Nice approach, and demonstrated on screen by looking at the focal point, but I wonder how much the AI reconstruction is necessary versus just having the periphery blurred out or otherwise naively reconstructed?

Also, how does this apply to frame interpolation? Approximate updates in the periphery at lower frequency??
 
It doesn't apply to interpolation at all; interpolation is a complete dead end for VR.

It's in the realm of frameless rendering and similar techniques, extrapolation not interpolation.
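
The difference in a nutshell, sketched for head orientation only (Euler angles to keep it short; the names are mine, not from any particular runtime):

```cpp
// Interpolation needs the *next* frame, so it always adds latency:
//   shown = lerp(poseN, poseN+1, t)  ->  you must wait for poseN+1.
// Extrapolation predicts forward from data you already have:
//   shown = poseN + velocity * dt    ->  no waiting, which is why VR prefers it.

struct Pose { float yaw, pitch, roll; };

// Hypothetical: predict the head pose a few milliseconds ahead using the
// current angular velocity (deg/s), instead of waiting for the next frame.
Pose extrapolate(const Pose& current, const Pose& angularVelocity, float dtSeconds)
{
    return { current.yaw   + angularVelocity.yaw   * dtSeconds,
             current.pitch + angularVelocity.pitch * dtSeconds,
             current.roll  + angularVelocity.roll  * dtSeconds };
}
```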
 
What you are talking about is large saccades with a 200 ms interval where you can reconstruct. However, smooth pursuit of a target moving across the scene won't have this interval. Consider a viewer seeing a wildlife scene, like HorizonZD. The viewer may watch a large dino, then jump to a flying saurus, follow it past trees, then jump to a robotic butterfly.
Yes, but if we can track the eyes exactly and with low latency, there's no need for reconstruction in the tiny 200x200 pixel square we have to render at full res. We might even afford true multisampling in that region.
I'm mostly worried about tracking inaccuracy, so making a guess about the focus point when the input is noisy might help. Tracking from the top of a flat screen is surely much harder than inside a headset. (Also: a couple could no longer play such a game with just one couch, console, and display :( )
Reconstruction seems more useful for the peripheral regions, as shown in the video.
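
One way to fold that tracking worry into the renderer: just grow the full-resolution circle by the current tracker uncertainty (a sketch, all numbers invented):

```cpp
#include <algorithm>

// Hypothetical: size the full-resolution (or multisampled) foveal circle so
// it still covers the true gaze point when the tracker reading is noisy.
float fovealRadiusDeg(float baseRadiusDeg, float trackerNoiseStdDeg, float maxRadiusDeg)
{
    // Cover roughly 3 standard deviations of tracker error, but cap the region
    // so the speedup doesn't evaporate when tracking gets very noisy.
    return std::min(baseRadiusDeg + 3.0f * trackerNoiseStdDeg, maxRadiusDeg);
}
```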

I wonder how much the AI reconstruction is necessary versus just having the periphery blurred out or otherwise naively reconstructed?
For this question, I often try to analyze what I see in out-of-focus regions.
E.g. I look down at the ground, and there is soil with many tiny bright boulders on it. The boulders are high-frequency detail. I cannot see them precisely, but I still know it's many tiny boulders causing high contrast. I guess the brain makes up this information from knowledge, although the eyes can't actually see it.
If we did a naive foveated render, the boulders would appear blurred and the sharp high-frequency contrast would get lost. So eventually our brain would complain and detect the trick.
If so, advanced upscaling like neural techniques might be worth it to prevent this issue. I feel like we still have some sense of texture and high-frequency patterns, even though we can't see them sharply.
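
A cheap non-neural version of that idea (roughly what the foveation literature calls a metamer: keep the local statistics even if the exact pixels are wrong) would be to re-inject the contrast the blur removed as noise. A sketch on a single-channel image; everything here is invented for illustration:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical single-channel image, row-major.
struct Image { int w, h; std::vector<float> px; };

// Instead of shipping a plain blur to the periphery, add back the local
// contrast the blur removed as noise, so "many tiny bright boulders" still
// reads as high-frequency texture even if the exact pixels are wrong.
// blurred and localStdDev are assumed to be precomputed cheaply at low res.
Image addMatchedNoise(const Image& blurred, const Image& localStdDev, unsigned seed)
{
    Image out = blurred;
    std::mt19937 rng(seed);
    std::normal_distribution<float> noise(0.0f, 1.0f);

    for (std::size_t i = 0; i < out.px.size(); ++i)
        out.px[i] = blurred.px[i] + noise(rng) * localStdDev.px[i];

    return out;
}
```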

Also, how does this apply to frame interpolation? Approximate updates in the periphery at lower frequency??
I share this question. Ideally we can have both lower resolution and lower fps, and we can also use heavy temporal sampling to smooth out aliasing and jitter, so it's not confused with motion that triggers reflexes.
Personally I assume we won't get away with lower fps in the end, but I hope temporal accumulation is acceptable. Just guessing.
In the worst case, the requirements on temporal stability are so high that most of the advantage gets lost, and a speedup of 4x remains all we can expect in practice.
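
For the "heavy temporal sampling" part, the simplest form is an exponential moving average per peripheral pixel; a sketch (the blend factor is a guess, and a real implementation also needs reprojection with motion vectors, which is left out here):

```cpp
// Minimal exponential accumulation for a peripheral pixel.
// history: what we showed last frame (already reprojected in a real engine).
// sample:  the new low-resolution / jittered sample for this frame.
// alpha:   how much of the new sample to take; lower = smoother, more ghosting.
float accumulate(float history, float sample, float alpha = 0.1f)
{
    return history + alpha * (sample - history);
}
```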

Idk. But I assume the VR industry will pay the bill to figure it out, together with all the changes needed in realtime rendering. Once they have it working and widely adopted, people may start to consider it for flat screens too.
Could be a godsend for the gaming industry: AAA visuals on mobile HW, lifting the barrier between mobile and PC/console development, further growth of the market.
Maybe the bottleneck of future games is no longer gfx but the actual simulation of the game, which would make some sense.
 
You step back to DLSS 2. Except the pixels for the current frame aren't uniform any more.

Also, engines should start passing some more data, at least the G-buffer. For the reflection tricks from the paper Temporally Dense Ray Tracing, the integration with the engine needs to be even tighter.
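
Roughly the kind of per-frame payload that means exposing (the naming here is mine, not from DLSS or the paper):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-frame inputs an engine would hand to a foveated
// reconstruction pass; roughly what DLSS-style upscalers already want,
// plus the gaze point and some G-buffer extras.
struct ReconstructionInputs {
    std::vector<float>        color;         // sparse / non-uniform shaded samples
    std::vector<float>        depth;         // for reprojection and disocclusion checks
    std::vector<float>        motionVectors; // two floats per pixel, screen-space motion
    std::vector<float>        normals;       // from the G-buffer, helps edge-aware filtering
    std::vector<std::uint8_t> materialId;    // e.g. to treat reflective surfaces specially
    float gazeX, gazeY;                      // current tracked gaze in screen space
    float jitterX, jitterY;                  // subpixel jitter of this frame's samples
};
```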

No longer a simple bolt-on for lazy developers.
 
Temporal convergence is mostly acceptable with stationary displays. I have a big question mark over naive foveated rendering (high-resolution center). That's just too good to be true, and any win gets spread thin by fixing the SS artifacts. Closest to ground truth must be multiview/multiangular sensor output or rendering (who even does that). Then you're back to spreading thin.
 