Maybe Nvidia doesn't think it needs real performance anymore. If we can generate a bunch of extra frames between real ones and make sure everything is using DLSS3, then we only need 4050s to get 200fps at 4K!
I know you're being facetious and I have no idea if that's where Nvidia is headed. But I want to entertain your thought seriously because it may not be such an absurd vision.
At the end of the day, all of real-time CG is a hacky simulation attempting to convey enough information for our brains to interpret the scene as some facsimile of reality. Multiple factors contribute to the fidelity of this simulation, but the two broad buckets relevant to this discussion are (1) fidelity of the physical model (i.e., lighting, materials, geometry) and (2) display sampling rate (i.e., resolution and fps).
Let's stick to (2) for now. What's funny is that a lot of the desire for higher sampling is due to the horrible nature of sample-and-hold LCD/OLED displays. CRTs were much less affected because they would light a point for only a tiny fraction of the refresh window, and our brains would run the "DLSS3" frame generation to fill in the gaps. This led to a much smoother perception of motion than on LCDs, which keep displaying the same old sample for the entire duration of the refresh cycle, producing a jarring visual jump when it instantly switches to the next sample. Black-frame insertion on LCDs makes a weak attempt to approximate CRT behavior but loses brightness (because the screen is off half the time). CRTs didn't have this problem because each sample was insanely bright (which probably also made them much more strenuous on the eyes than LCDs).
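To put rough numbers on that, here's a back-of-the-envelope sketch in Python (assuming the common rule of thumb that perceived blur for an eye-tracked object is roughly its on-screen velocity times how long each sample stays lit; the velocity and persistence figures below are illustrative guesses, not measurements):

    # Perceived blur for an eye-tracked object ~= velocity * persistence.
    def perceived_blur_px(velocity_px_per_s: float, persistence_s: float) -> float:
        """Approximate smear width, in pixels, across the retina."""
        return velocity_px_per_s * persistence_s

    velocity = 960.0  # px/s, e.g. panning across a 1920px-wide scene in ~2 s

    # Sample-and-hold LCD/OLED at 60Hz: each sample stays lit the full 1/60 s.
    print(perceived_blur_px(velocity, 1 / 60))   # ~16 px of smear

    # CRT-style impulse display: phosphor lit for roughly 1-2 ms per refresh.
    print(perceived_blur_px(velocity, 0.0015))   # ~1.4 px of smear

    # Same sample-and-hold panel at 360Hz: persistence drops to 1/360 s.
    print(perceived_blur_px(velocity, 1 / 360))  # ~2.7 px of smear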
(This was a purely temporal argument, but the spatial argument is somewhat analogous).
I believe that frame generation is effectively making up for the sample-and-hold display's faults by simulating the brain's reconstruction behavior in code. Recent AI work has shown us that deep-learning models are excellent at exactly the kinds of perception problems the human brain is adept at, and I think motion interpolation falls into that category. How well DLSS3's current implementation works is a separate discussion, but on a fundamental level I don't think there's anything wrong with the approach. Today we're only generating one intermediate frame per rendered pair; I'm hopeful that some day we'll be able to reconstruct from 60Hz up to 360Hz or more.
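Just to illustrate the shape of the idea in Python (and to be clear, this is not how DLSS3 actually works -- it warps pixels using game-provided motion vectors and optical flow, while this toy just cross-fades two rendered frames):

    import numpy as np

    # Toy frame generation: synthesize a frame between two rendered ones by
    # blending. Real systems warp along motion vectors instead of cross-fading,
    # but the pipeline shape is the same: N rendered frames in, more displayed
    # frames out.
    def interpolate_midframe(frame_a: np.ndarray, frame_b: np.ndarray,
                             t: float = 0.5) -> np.ndarray:
        """Synthesize a frame at time t between frame_a (t=0) and frame_b (t=1)."""
        blended = (1.0 - t) * frame_a.astype(np.float32) + t * frame_b.astype(np.float32)
        return blended.astype(frame_a.dtype)

    # Example: two 1080p RGB frames -> one generated frame halfway between them,
    # turning a 60Hz render stream into a 120Hz display stream.
    frame_a = np.zeros((1080, 1920, 3), dtype=np.uint8)
    frame_b = np.full((1080, 1920, 3), 255, dtype=np.uint8)
    print(interpolate_midframe(frame_a, frame_b)[0, 0])  # [127 127 127]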
All of this matters even more because transistors are getting costlier, so simply increasing the raw sample rate will get commensurately costlier. That's not to say frame generation comes for free -- you need transistors for that too! But it seems to scale much better than raw rendering power, especially since it also cuts down on CPU cycles. If it works well, it can lead to more cost-effective GPUs (which we sorely need). Conversely, for high-end GPUs, it frees up transistors for higher modeling fidelity (lighting, materials) instead of chasing dumb sample rates.