VT cache size scales linearly with screen pixel count. So 4K requires 4x larger cache. With PBR materials (4 bytes per pixel) the following cache sizes are enough: 720p = 64 MB, 1080p = 128 MB, 4K = 512 MB.
These cache sizes reserve 16 texels per each screen pixel. I have measured that filtering + geometry/UV discontinuities roughly multiply the pixel count by 4x. The remaining 4x is to ensure that a single scene takes at max 25% of the cache. This allows camera rotation, shaking, etc without loading new data. Also you need some leeway to load data ahead of the camera.
VR would need wider FOV render for VR page id texture. This pass would also provide wider FOV depth buffer, reducing screen edge issues with SSAO, specular occlusion and screen space reflections.
Checkerboarding or interlacing doesn't reduce virtual texture cache size requirement. Also these techniques don't help any other mesh lod or texture mip based streaming techniques. Streaming pools need to be as big.