A major part of this is probably due to the nature of Nanite and VSMs. When you increase the resolution of a classic game, you tend to just blow up the polygons. It adds some shading work, but the resolution of shadow maps, the detail of the geometry and so on generally do not increase proportionally. So the settings may say "4K" and you get (often overly) crisp albedo textures, but in reality most of the game is just... bigger polygons and even more undersampled/blurry shadows. Gamers are of course used to this by now, but it is obviously not the goal.
By contrast, both Nanite and VSM target polygon sizes and shadow sampling rates (respectively) that are proportional to the pixel sampling rate. That is, if you quadruple the primary pixel count (1080p -> 4K), you will often do the same to both the geometric detail (assuming the mesh detail is available in the source asset) and the shadow resolution, which obviously has a much greater impact on performance than classic resolution changes. But that's really the point - classic resolution changes are not some holy grail of correctness. In many ways you can think of them as their own kind of "upsampling": you are increasing the evaluation rate of part of the visibility and shading function (BRDF, textures) but not other parts (shadows, GI, etc). In reality, we really do want these rates to be directly related so that the nature of the image doesn't change fundamentally between low and high resolutions. The "side effect" of this is that resolution is now a big hammer, in terms of both performance and quality, for games that use these technologies, and that's a good thing. People will just need to adjust their expectations on that front.
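As a rough illustration of why that scaling hits harder, here is a small sketch (not engine code; the structure and names are hypothetical) comparing how the different kinds of work scale from 1080p to 4K when triangle density and shadow texel density are tied to the primary pixel rate versus left fixed:

```cpp
#include <cstdio>

// Hypothetical illustration of the proportionality argument above.
// None of these figures come from a real engine or profile.
struct Resolution { int width, height; };

long long pixelCount(Resolution r) { return 1LL * r.width * r.height; }

int main() {
    Resolution r1080 = {1920, 1080};
    Resolution r4k   = {3840, 2160};

    // 4K has 4x the pixels of 1080p.
    double pixelScale = double(pixelCount(r4k)) / double(pixelCount(r1080));

    // Classic pipeline: shading work scales with pixels, but triangle count
    // and shadow map size are fixed by the asset/LOD and a constant shadow
    // map resolution, so they do not scale with output resolution at all.
    printf("Classic:          shading x%.1f, triangles x1.0, shadow texels x1.0\n",
           pixelScale);

    // Nanite/VSM-style targets: triangle density and shadow sampling rate
    // track the primary pixel rate, so (asset detail permitting) they scale
    // by roughly the same factor as the pixel count.
    printf("Nanite/VSM-style: shading x%.1f, triangles ~x%.1f, shadow texels ~x%.1f\n",
           pixelScale, pixelScale, pixelScale);
    return 0;
}
```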
Now of course you can argue that you personally like the blown-up polygons and blurry shadows look, and that's fine. The new systems can of course be configured to undersample these parts more heavily if that is the goal, but I don't expect it to be the norm. For most people, in most situations - and specific game art aside - better lighting and cleverness applied to a bunch of well-distributed, stochastically sampled pixels plus smart upsampling is a (much) better use of performance than brute force. This was established in the research literature decades ago now, even for offline rendering. I don't think anyone I know has been particularly vague on this front... things like dynamic GI require an order of magnitude more hardware performance but can have a much greater impact on visual quality than simply brute forcing some more primary visibility, especially considering the better sampling patterns enabled by modern super-resolution/temporal AA (much like how MSAA can look better and be more stable than doubling the resolution with uniform sampling beyond a certain visual pixel density).
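To make the "better sampling patterns" point a bit more concrete, here is a minimal sketch of the kind of sub-pixel jitter sequence temporal AA / super-resolution commonly uses. A Halton(2,3) sequence is one common choice; the exact sequence, pattern length, and accumulation scheme vary by implementation, so treat this purely as an illustration:

```cpp
#include <cstdio>

// Radical inverse in a given base: the building block of a Halton sequence,
// often used to generate well-distributed sub-pixel jitter for temporal AA
// and temporal super-resolution.
double radicalInverse(int index, int base) {
    double result = 0.0;
    double f = 1.0 / base;
    while (index > 0) {
        result += f * (index % base);
        index /= base;
        f /= base;
    }
    return result;
}

int main() {
    // Over N frames, each pixel is effectively sampled at N distinct
    // sub-pixel positions; the temporally accumulated result approximates a
    // higher sampling rate without paying for it every frame.
    const int framesInPattern = 8; // hypothetical pattern length
    for (int frame = 0; frame < framesInPattern; ++frame) {
        double jitterX = radicalInverse(frame + 1, 2) - 0.5; // in (-0.5, 0.5)
        double jitterY = radicalInverse(frame + 1, 3) - 0.5;
        printf("frame %d: jitter (%.3f, %.3f)\n", frame, jitterX, jitterY);
    }
    return 0;
}
```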
The other consideration, one that consumers may not see, is that for games with more limited resources the actual production with technology like Nanite and Lumen can be significantly streamlined, allowing smaller teams to produce more content. I haven't played through Remnant 2, but from the video it doesn't look like there's a whole lot of dynamic lighting going on, so it could probably perform better with classic baked lighting. That said, baked lighting and manual LODs add a large amount of overhead to content production, and thus it's very possible that given a fixed team size and time frame they would not have been able to produce nearly as much content. Obviously this is a consideration that the end user doesn't really see directly, but it's a big part of the benefit of these more modern, automatic systems.