Maybe it's useful to talk about this in signals terms to come up with a minimal, objectively true statement. "Check my work" on this though. With "traditional rendering", some (if not all) of the equations of the scene are solved for each display pixel on each displayed frame. Per Nyquist, the display resolution and frame rate are then twice the spatial and temporal bandwidth of the signal the GPU is capable of generating. With DLSS-style upscaling, the spatial bandwidth can (in some cases) match that of traditional rendering thanks to sub-pixel jitter and temporal accumulation, but always at the cost of reduced temporal bandwidth. With frame gen, no new information about the scene is solved for, so the bandwidth is unchanged from that of the rendered frames. A 5070 at some resolution and display rate with 4x frame gen therefore has a smaller temporal bandwidth than a 4090 with 2x frame gen at that same resolution and display rate.
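To make that last comparison concrete, here's a minimal sketch of the arithmetic in Python. The 240 Hz display rate and the frame-gen multiples are assumptions for illustration, not measurements of real hardware; the only thing being compared is how many displayed frames are actually rendered.

    # Minimal sketch of the temporal-bandwidth arithmetic above.
    # Illustrative numbers only, not benchmarks of any real GPU.

    def rendered_temporal_bandwidth_hz(display_rate_hz, frame_gen_multiple):
        # Only 1 of every `frame_gen_multiple` displayed frames is actually
        # rendered; generated frames add no new scene information, so the
        # Nyquist limit is half the *rendered* frame rate.
        rendered_rate_hz = display_rate_hz / frame_gen_multiple
        return rendered_rate_hz / 2.0

    display_rate_hz = 240.0  # same display rate and resolution for both cards (assumption)

    bw_4x = rendered_temporal_bandwidth_hz(display_rate_hz, 4)  # 30 Hz
    bw_2x = rendered_temporal_bandwidth_hz(display_rate_hz, 2)  # 60 Hz
    print(f"4x frame gen: {bw_4x:.0f} Hz temporal bandwidth")
    print(f"2x frame gen: {bw_2x:.0f} Hz temporal bandwidth")

Same displayed output rate in both cases, but the 2x card's rendered signal can carry twice the temporal frequency content.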
Since not every signal will need the full bandwidth of the 4090, there may be no perceptible difference when that signal is produced with a 5070, even with a different frame-gen multiple. However, games will have signals with high-frequency discontinuities (edges, disocclusions, "light switch" style lighting changes), and those signals will always be better resolved, with fewer artifacts, on a GPU that can generate a signal with a larger bandwidth.
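As a rough illustration of the discontinuity point: with interpolation-style frame gen, a "light switch" event that lands just after a rendered frame can't appear on screen until the next rendered frame at the earliest, so the worst-case lag (and the window in which generated frames can ghost or blend across the change) scales with the rendered frame interval, not the displayed one. Again, the 240 Hz figure is just an assumption for the sake of the arithmetic, and this ignores any extra queuing latency a specific frame-gen implementation adds.

    # Rough sketch: worst-case time before a sudden scene change (a "light
    # switch") can show up, given that only rendered frames carry new
    # information. A lower bound, not a model of any particular implementation.

    def worst_case_event_lag_ms(display_rate_hz, frame_gen_multiple):
        rendered_interval_s = frame_gen_multiple / display_rate_hz
        return rendered_interval_s * 1000.0

    print(worst_case_event_lag_ms(240.0, 4))  # ~16.7 ms of displayed frames without the change
    print(worst_case_event_lag_ms(240.0, 2))  # ~8.3 ms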