Couldn't it be done like BFV's variable rate ray tracing, except based on the areas of the screen you are looking at? Or ... in some other way ... making the denoiser work extra hard and be more expensive in the focal area, and less expensive/less accurate in areas outside the viewer's focus?
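The idea above (spend samples where the viewer looks) could be sketched roughly like this. This is a hypothetical illustration, not any engine's actual API: the function name, the 10° acuity falloff, and the parameter set are all assumptions.

```cpp
#include <cmath>
#include <algorithm>

// Hypothetical sketch: allocate path-tracing samples per pixel by angular
// distance from the tracked gaze point. The exponential falloff constant
// (10 degrees) is an assumed stand-in for the eye's acuity curve.
int samplesPerPixel(float px, float py,     // pixel position
                    float gx, float gy,     // gaze position (from eye tracker)
                    float pixelsPerDegree,  // display angular density
                    int maxSpp, int minSpp)
{
    float distPx = std::hypot(px - gx, py - gy);
    float eccentricityDeg = distPx / pixelsPerDegree;
    // Weight falls off roughly exponentially with eccentricity (assumed).
    float w = std::exp(-eccentricityDeg / 10.0f);
    int spp = static_cast<int>(std::round(minSpp + w * (maxSpp - minSpp)));
    return std::clamp(spp, minSpp, maxSpp);
}
```

At the gaze point this yields the full budget; far into the periphery it decays to the minimum, which is where the denoising concerns discussed below come in.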
Sure, there are options, but you will not see a 10x speedup just from going down to 500x500 px.
I'm no expert here, but I see some interesting things that are not obvious at first thought:
When changing focus quickly to another section of the screen, it takes about 1/4 second until I see it sharply. This is great, because the previous low-res results from that area should still be good enough to get going from there. Also, human perception of motion in the focused area is 'laggy' in comparison to the peripheral border area (coming from the primal need to detect dangerous animals quickly, they say).
So we need high spatial quality in the center but temporally stable results at the borders, I guess. I assume laggy lighting is still acceptable everywhere, because lighting is likely not so important for detecting motion.
But the problem is that we have bad neighborhood information at the borders, because each pixel covers a large solid angle. This will break both denoising and TAA, which kind of defies the whole foveated rendering idea.
So this will not make high-quality path tracing cheap - you'd just need more samples per pixel than before.
A solution would be something like prefiltered voxels, for example. Here you could pick the voxel mip from the pixel solid angle, and there would be no aliasing or flickering (see Cyril Crassin's work before VCT - I'm not saying this is practical, but there are not many options to get prefiltered graphics).
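Picking the voxel mip from the pixel solid angle could look something like the following sketch, in the spirit of Crassin-style cone tracing. The function and parameter names are my own assumptions; the core idea is just that each mip doubles the voxel size, so the matching level is a log2 of the pixel's world-space footprint.

```cpp
#include <cmath>
#include <algorithm>

// Hypothetical sketch: choose the voxel LOD whose cell size matches the
// pixel's footprint at the sample distance, so peripheral (wide) pixels
// read coarser, prefiltered mips instead of aliasing.
float voxelMip(float pixelAngularSize,  // radians subtended by the pixel
               float distance,          // distance along the ray
               float baseVoxelSize,     // world-space size of a mip-0 voxel
               float maxMip)
{
    // World-space footprint of the pixel at this distance.
    float footprint = 2.0f * distance * std::tan(0.5f * pixelAngularSize);
    // Each mip level doubles the voxel size -> matching level is a log2.
    float mip = std::log2(std::max(footprint / baseVoxelSize, 1.0f));
    return std::min(mip, maxMip);
}
```

A fractional mip would then be fed to trilinear/quadrilinear filtering between voxel levels, which is what gives the flicker-free result.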
For the lighting, some world-space methods have similar properties that allow such good filtering, and I assume this works well here. Though, this would not benefit from the lower resolution.
Still, the win could be: expensive RT for high-frequency details like sharp reflections and hard shadows would only be necessary in the focused area at all, allowing for much higher quality in return. The requirement is that both lighting techniques are accurate enough to match each other, so they can be blended - VCT would fail here.
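The blending itself is the easy part, assuming the two techniques do match. A minimal sketch, where the 5°/15° transition band is an assumed tuning choice and all names are hypothetical:

```cpp
#include <cmath>
#include <algorithm>

struct Color { float r, g, b; };

// Hypothetical sketch: weight for the expensive ray-traced result as a
// function of eccentricity. A smoothstep over an assumed 5-15 degree band
// avoids a visible seam between the two lighting techniques.
float rtWeight(float eccentricityDeg,
               float innerDeg = 5.0f, float outerDeg = 15.0f)
{
    float t = std::clamp((eccentricityDeg - innerDeg) / (outerDeg - innerDeg),
                         0.0f, 1.0f);
    float s = t * t * (3.0f - 2.0f * t);  // smoothstep
    return 1.0f - s;                      // 1 in the fovea, 0 in the periphery
}

// Lerp between the ray-traced result and the cheap world-space fallback.
Color blend(Color rt, Color fallback, float w)
{
    return { rt.r * w + fallback.r * (1.0f - w),
             rt.g * w + fallback.g * (1.0f - w),
             rt.b * w + fallback.b * (1.0f - w) };
}
```

Any energy mismatch between the two techniques shows up as a ring in the transition band, which is exactly why something like VCT, with its systematic bias, would fail here.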
... far-fetched random thoughts, ofc
But we see similar dilemmas already now with DLSS: even if we could do RT at quarter resolution and upscale just that while the rasterization happens at full resolution, we would lose samples for denoising. So the current standard of upscaling the whole frame instead seems like a compromise between a lot of things.