Either the increased texture resolution will have diminishing returns because of low screen resolution, or demands on texturing will increase greatly - and texturing power usually scales with other parts of the chip.
Higher texture resolution = additional 2x2 higher quality mip levels. The GPU accesses the higher mips only for surfaces close to the camera (assuming roughly uniform texel density on all meshes). The cost of rendering further-away geometry thus remains the same.
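To make that concrete, here is a rough sketch of the mip selection math (everything here is illustrative: the function name, the 1 m^2 material and the ~1000 pixels per world unit at 1 m are assumptions, not engine values). It shows that a 4K version of a texture simply samples one mip deeper than the 2K version at every distance (same effective texel density), and its added top mip is only touched at roughly half the distance where the 2K top mip was.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Which mip does the GPU sample? Roughly: the one where the screen-space
// texel footprint is about one pixel. For a camera-facing surface that
// footprint scales linearly with distance. Mip 0 is the most detailed.
float sampledMip(float textureSize,             // texels along one axis
                 float texelWorldSize,          // world-space size of one texel
                 float distance,                // surface distance from camera
                 float pixelsPerWorldUnitAt1m)  // projection/resolution term
{
    // How many texels land inside one screen pixel at this distance.
    float texelsPerPixel = distance / (texelWorldSize * pixelsPerWorldUnitAt1m);
    float mip = std::log2(std::max(texelsPerPixel, 1.0f));
    float mipCount = std::log2(textureSize) + 1.0f;
    return std::min(mip, mipCount - 1.0f);
}

int main()
{
    // Same 1 m^2 material authored at 2K and 4K: at every distance the 4K
    // texture samples one mip deeper (same effective resolution); only very
    // close to the camera does its extra mip 0 actually get accessed.
    for (float dist : { 0.2f, 0.5f, 1.0f, 4.0f, 16.0f })
        std::printf("dist %5.1f m: 2K texture mip %.2f, 4K texture mip %.2f\n", dist,
                    sampledMip(2048.0f, 1.0f / 2048.0f, dist, 1000.0f),
                    sampledMip(4096.0f, 1.0f / 4096.0f, dist, 1000.0f));
}
```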
The background is in general more expensive to render than the foreground (more objects, more triangles per pixel, more discontinuities -> more texture cache misses). A surface close to the camera blocks a big chunk of the background -> you see a sudden frame rate improvement. If this surface has a 2x2 higher quality texture, it only makes the frame rate more even (reduces the max frame rate a bit, but has no effect on the min frame rate). Close-up geometry has bigger continuous surfaces -> fewer texture cache misses. Thus the performance degradation from super high resolution textures is minimal.
I of course agree with you that high res textures matter a lot less at lower output resolutions. 2x2 lower resolution (4K -> 1080p) means that the GPU accesses a 1 mip level lower version of each texture. 1 mip level = 2x2 higher resolution = 4x higher memory cost. But streaming of course makes the extra memory cost much more manageable. And it's worth noting that at 1080p the streaming system will load each mip at half the distance it would at 4K. Thus high resolution texture packs at 1080p are much friendlier to the streaming system than at 4K.
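A quick back-of-the-envelope table of the same point (the 1 m stream-in distance for mip 0 at 4K is an arbitrary reference, not a measured value):

```cpp
#include <cstdio>

// Dropping from 4K to 1080p output biases sampling by one mip level (1/4 of
// the texels touched), and every mip's stream-in distance halves, because the
// same texel-per-pixel ratio is reached closer to the camera.
int main()
{
    const int   mipCount = 4;      // just the most detailed few mips
    const float base4K   = 1.0f;   // assumed stream-in distance of mip 0 at 4K
    for (int mip = 0; mip < mipCount; ++mip)
    {
        float dist4K    = base4K * float(1 << mip);     // each coarser mip: 2x the distance
        float dist1080p = dist4K * 0.5f;                // 2x2 lower output resolution
        float relTexels = 1.0f / float(1 << (2 * mip)); // 4x fewer texels per mip step
        std::printf("mip %d: stream-in at %4.1f m (4K) / %4.1f m (1080p), "
                    "texels relative to mip 0: %.4f\n",
                    mip, dist4K, dist1080p, relTexels);
    }
}
```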
I'm not sure why. You can still increase texture quality without increasing rendering resolution. Meaning a low end card should still be able to use more memory without greatly increasing the rendering load.
Regards,
SB
Modern game engines have sophisticated texture streaming systems. You only stream in the highest mips for objects very close to the camera. Every added (2x2 more detailed) mip level is streamed in at half the distance of the previous mip level; this is how mip mapping works. The GPU doesn't even access the highest mip levels for objects that are not very close to the camera -> no need to have them in memory. Of course you add some guard band bias to the streaming to guarantee that the data is ready in VRAM before the sampler would access it.
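A minimal sketch of such a streaming heuristic, assuming an illustrative 1 m stream-in distance for mip 0 and a guard band expressed as a mip bias (function and parameter names are made up, not from any particular engine):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Returns the most detailed mip that should be resident for an object at the
// given distance. Mip 0 streams in at mip0Distance, and each coarser mip at
// double the previous distance. The guard band bias streams data in slightly
// earlier than the sampler strictly needs it.
int requiredResidentMip(float objectDistance, float mip0Distance,
                        int mipCount, float guardBandBias /* in mip levels */)
{
    // Ideal mip: 0 at mip0Distance, +1 for every doubling of distance.
    float ideal = std::log2(std::max(objectDistance / mip0Distance, 1.0f));
    // Bias towards more detail so the data is in VRAM before it is sampled.
    int resident = (int)std::floor(ideal - guardBandBias);
    return std::clamp(resident, 0, mipCount - 1);
}

int main()
{
    for (float d : { 0.5f, 1.0f, 2.0f, 5.0f, 20.0f, 100.0f })
        std::printf("object at %6.1f m -> most detailed resident mip %d\n",
                    d, requiredResidentMip(d, 1.0f, 12, 0.5f));
}
```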
Thus higher quality textures do not cost as much extra memory as people commonly think. 90%+ of the rendered objects are further away from the camera -> only lower mips are accessed.
Theoretically, you only need to have one texel loaded per pixel on the screen. So roughly 2M texels for 1080p and 8M texels for 4K. Assuming point sampling, this is easily proven by the pigeonhole principle: https://en.wikipedia.org/wiki/Pigeonhole_principle. Bilinear filtering needs to blend with neighbors. Fortunately most surfaces have continuous UVs and are significantly larger than a single pixel, so most bilinear accesses are shared with neighbors. Thus you only need to pay extra memory cost for discontinuities (object edges, UV seams, mips). As you increase the resolution, the area of each surface grows quadratically (x*x), while the count of edge pixels grows linearly (x). Thus edge pixels become an increasingly small percentage.

Mip mapping with trilinear filtering adds a significant cost, but this is a constant multiplier, independent of the number of textures or their sizes. When rounded towards the less detailed mip, there's a 25% extra cost, and when rounded towards the more detailed mip, there's a 5x cost multiplier. Of course these additional samples are also shared with neighbors, mitigating some of the cost.

Virtual texturing gets pretty close to this theoretical optimum. The biggest difference is that you can't load single individual texels (too many seeks); you need to group nearby texels into tiles (commonly 128x128 pixels). This further increases the impact of discontinuities. My research on virtual texturing tells me that screen pixel count x4 is a good upper estimate of the number of texels required to texture a single frame. You'd want to have at least a 4x larger cache to ensure no streaming when the player rotates the camera around.
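Plugging in numbers (0.5 bytes per texel, i.e. BC1-style compression, is an assumption on top of the estimates above):

```cpp
#include <cstdio>

// Texel budget estimate: screen pixel count x4 as an upper bound for texels
// touched per frame, rounded into 128x128 tiles, plus a 4x larger cache for
// camera rotation headroom.
int main()
{
    struct { const char* name; int w, h; } res[] = {
        { "1080p", 1920, 1080 }, { "4K", 3840, 2160 }
    };
    const double texelsPerPixel = 4.0;   // upper estimate from VT measurements
    const int    tileTexels     = 128 * 128;
    const double bytesPerTexel  = 0.5;   // e.g. BC1/DXT1 compressed
    const double cacheFactor    = 4.0;   // headroom for rotating the camera

    for (auto& r : res)
    {
        double pixels  = double(r.w) * r.h;
        double texels  = pixels * texelsPerPixel;
        double tiles   = texels / tileTexels;
        double cacheMB = texels * cacheFactor * bytesPerTexel / (1024.0 * 1024.0);
        std::printf("%-6s: %.1f Mpix -> ~%.0f Mtexels/frame, ~%.0f tiles, ~%.0f MB cache\n",
                    r.name, pixels / 1e6, texels / 1e6, tiles, cacheMB);
    }
}
```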
Sophisticated modern texture streaming systems get pretty close to virtual texturing. You don't get per-pixel occlusion (VT doesn't load hidden textures at all), but nearby objects require most of the memory, and usually there's not that much overlap near the camera. Most engines also use some kind of large scale occlusion culling / level partitioning technology to avoid loading all the further-away textures into memory. It is not as good as virtual texturing, but it is getting closer as the technology evolves. Too bad Microsoft artificially limited tiled resources to Windows 8+, meaning that no PC games use that feature yet. It would be trivial to load textures into system RAM (16 GB+ is common) and use tiled resources to change the active subset at fine granularity based on visibility. There would be no additional visible popping, as a CPU->GPU copy is an order of magnitude faster than loading textures on demand from an HDD.
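A conceptual sketch of that idea; every type and function here is a hypothetical placeholder rather than the actual tiled resources API, just to show the per-frame residency update driven by visibility:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Keep all texture tiles in system RAM and, every frame, map into the GPU
// tile pool only the tiles the visibility pass reported. The CPU->GPU copy is
// cheap compared to loading tiles on demand from HDD.
struct TileId { uint32_t texture, mip, x, y; };

inline bool operator==(const TileId& a, const TileId& b)
{
    return a.texture == b.texture && a.mip == b.mip && a.x == b.x && a.y == b.y;
}

struct TileIdHash
{
    std::size_t operator()(const TileId& t) const
    {
        return (std::size_t(t.texture) << 40) ^ (std::size_t(t.mip) << 32)
             ^ (std::size_t(t.x) << 16) ^ std::size_t(t.y);
    }
};

using TileSet = std::unordered_set<TileId, TileIdHash>;

// Hypothetical engine hooks (stubs here; a real implementation would read a
// GPU feedback buffer and call the graphics API's tile mapping/update calls).
std::vector<TileId> gatherVisibleTiles()        { return {}; }
void mapTileAndCopyFromSystemRam(const TileId&) {}
void unmapTile(const TileId&)                   {}

void updateResidency(TileSet& resident)
{
    TileSet wanted;
    for (const TileId& t : gatherVisibleTiles())
        wanted.insert(t);

    // Map newly visible tiles; their data is already sitting in system RAM.
    for (const TileId& t : wanted)
        if (!resident.count(t))
            mapTileAndCopyFromSystemRam(t);

    // Unmap tiles no longer referenced (a real engine would keep an LRU cache
    // instead of evicting immediately).
    for (const TileId& t : resident)
        if (!wanted.count(t))
            unmapTile(t);

    resident = std::move(wanted);
}

int main()
{
    TileSet resident;
    updateResidency(resident);  // would run once per frame
}
```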
Pascal P100 can already page fault from CPU memory. Knights Landing MCDRAM can also be configured as memory or as cache. 4 GB of fast VRAM (for example HBM2) as a cache + page faulting from 32+ GB of CPU memory (DDR4) should be more than enough for 4K and even 8K. Unfortunately the PC hardware and OS install base is too fragmented to make this a reality soon.
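Rough arithmetic behind that claim, reusing the screen-pixels-x4 estimate and 4x rotation headroom from above (0.5 bytes/texel compression is again an assumption); the rest of the 4 GB is left for render targets, geometry and other resources:

```cpp
#include <cstdio>

// The per-frame texture working set is small compared to a 4 GB VRAM cache,
// even at 8K, which is why paging the rest from CPU memory is plausible.
int main()
{
    const double texelsPerPixel = 4.0, cacheFactor = 4.0, bytesPerTexel = 0.5;
    struct { const char* name; double pixels; } res[] = {
        { "4K", 3840.0 * 2160.0 }, { "8K", 7680.0 * 4320.0 }
    };
    for (auto& r : res)
    {
        double gb = r.pixels * texelsPerPixel * cacheFactor * bytesPerTexel
                  / (1024.0 * 1024.0 * 1024.0);
        std::printf("%s: ~%.2f GB texture cache needed vs 4 GB HBM2\n", r.name, gb);
    }
}
```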