It's not the number of assets but the rate of change that requires higher streaming performance. High detail requires lots of RAM to store it (assuming we're not using tiled resources). As the visible area changes, the new data needs to be swapped in. That could be many gigabytes of a new distant city one has teleported to, or many gigabytes of street-level assets such as people and cars. The I/O consideration is how much of the already-present data is made redundant by the change in view.
In the case of FS2020, the rate of change is low. In the other examples presented, going from flying high to dropping to street level, the rate of change is about as high as it'll get for a computer game. Whatever FS can achieve doesn't represent the limit of the challenge facing next-gen games with much more severe changes in view.
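To put rough numbers on that, here is a back-of-envelope sketch. The function name and every figure in it (working-set size, fraction replaced, time window) are made up for illustration, not measurements from FS2020 or any other game. It just divides the amount of resident data made redundant by the time available to replace it.

```python
def required_throughput_gb_per_s(working_set_gb, fraction_replaced, seconds):
    """Sustained read rate (GB/s) needed to swap in the replaced part
    of the resident working set within the given time window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if not 0.0 <= fraction_replaced <= 1.0:
        raise ValueError("fraction_replaced must be between 0 and 1")
    return working_set_gb * fraction_replaced / seconds


# Hypothetical numbers only: an 8 GB working set in both cases.
# Slow change (cruising at altitude): 5% of the data turns over in 10 s.
slow_change = required_throughput_gb_per_s(8.0, 0.05, 10.0)
# Fast change (dropping to street level): 90% turns over in 2 s.
fast_change = required_throughput_gb_per_s(8.0, 0.90, 2.0)

print(f"slow change: {slow_change:.2f} GB/s")
print(f"fast change: {fast_change:.2f} GB/s")
```

Same amount of data in RAM in both cases; only the turnover rate differs, and that alone moves the required read speed by roughly two orders of magnitude.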
Of course, in support of your view that streaming performance doesn't need to be high, we can see what SM and GTA are already achieving on slow HDDs.