Virtual Texture Issues and Limitations (Gen 9/PC)

As a layman, I naively buy into the promise of virtual textures wholeheartedly. No limits to texture size beyond storage. VRAM use is always a smallish, consistent page pool sized by screen resolution, etc.

Is it that straightforward? We still see PC titles, particularly console ports, running into VRAM issues with textures.

These days, should there only be one texture setting, 'ultra awesome'? šŸ˜

Is it just that we're still coming out of crossgen, where spinning discs were an issue for using VT ubiquitously?

There's also the question of tool maturity. UE's virtual texturing is only four years old, for instance.
 
These days, should there only be one texture setting, 'ultra awesome'? šŸ˜
No. We need those settings to achieve scaling to HW with less memory. There is no easier way to save memory, and reducing detail also has little impact on visuals, so this option won't go away. Same applies to geometric detail, and ideally we want to relate both settings. High res textures on low poly models look shitty, and then i even prefer low res textures too.

The second argument is storage. Personally i associate virtual texturing with unique detail everywhere, like Rage did (but no other game after that).
If we want this, we're obviously constrained by storage space. Or, if we would consider a streaming platform to avoid the client storage problem, we're still constrained by network bandwidth.
Looking at it this way, it also becomes clear that 'virtual texturing' itself does not solve any problem. People use the term for any mechanism which only loads the memory pages we currently need. But that's obvious, and can only be a low level implementation detail of solving a real problem on some higher level.
So to discuss this, you first need to make clear which promise of virtual texturing you see, and what problem you hope to solve with it.

There's also the question of tool maturity. UE's virtual texturing is only four years old, for instance.
I don't know much about UEs VT, only what Karis said in his talks about Nanite.
Iirc, he made UEs VT system, but initially he had problems convincing the company of the potential benefits. Before that they only had a system working at the granularity of mip levels, not texture pages, and they were fine with it.
The benefit then only showed up in combination with Nanite, which builds upon the VT system to store geometry data in the texture pages as well, i assume.

The 'limitations' of UE show if we compare it to Rage. UE does not try to achieve unique detail everywhere. Instead they build on the idea of instancing.
And that's a good example of why i think it's important to have context.
Think of a Nanite model of a rock, and we use 100 instances of this same rock across the scene, scattered across all distances.
Because Nanite relies on instancing, that's a likely case, and it applies to most other models we use as well. A column, a wall, all kinds of modular building blocks, e.g. each skyscraper window in the Matrix demo being an instance of one such model.
What happens with our VT system in this case? Because we have instances at all distances, we will just load all data of the model - all levels of detail for both texture and geometry.
There really is nothing wrong or suboptimal here. But we could say that our virtual memory management is not really utilized, and thus not needed, in this case.
That's not true for every case though, e.g. if complex models are used only once or a few times. That's also why VT is still worth it for them.
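To put rough numbers on that (a toy illustration only, not how Nanite actually selects levels - the formula below is made up):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <set>

// Pick which detail level an instance needs from its distance. The formula is
// invented for illustration; real systems derive it from projected texel density.
int requiredLevel(float distance) {
    return std::max(0, int(std::log2(std::max(distance, 1.0f))));
}

int main() {
    std::set<int> residentLevels;
    // 100 instances of the same rock, scattered from near to far.
    for (int i = 0; i < 100; ++i) {
        float distance = 1.0f + 10.0f * i;
        residentLevels.insert(requiredLevel(distance));
    }
    // With instances at every distance, effectively every level of the shared
    // texture/geometry data has to be resident at once.
    std::printf("distinct levels resident: %zu\n", residentLevels.size());
    return 0;
}
```

With instances spread from near to far, the union of the levels they request ends up being all of them, which is the point made above.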

But it's enough of an argument to illustrate why i personally associate VT much more with Rage, where they do not use instancing but unique detail everywhere.
To me Rage is the perfect example of what virtual texturing enables. But maybe you have other applications in mind.
 
Trials Evo had a unique-texel-for-each-inch-of-the-world kind of VT as in Rage, but instead of loading those texels from storage, they were generated at runtime. Blending different tiled texture layers, compositing decals, and other procedural texturing effects were done once in texture space and cached in a VT page. New pages would be generated as needed. In that game it allowed more complex texture layering and decal compositing than usual, at a much lower performance cost.
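A minimal sketch of that 'composite once, cache in a VT page' idea (my own illustration with hypothetical names, not the actual Trials code; on the GPU this would be a render-to-atlas pass rather than a CPU loop):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A requested page in the virtual texture: the region of the virtual UV domain
// it covers, and at which mip.
struct PageRequest {
    float u0, v0, u1, v1;
    int   mip;
};

// Hypothetical layer: a tiled source texture plus a blend mask, both evaluated per texel.
struct Layer {
    uint32_t (*sample)(float u, float v);  // layer colour at a virtual UV (tiled lookup in practice)
    float    (*weight)(float u, float v);  // blend weight 0..1 (splat map, slope, decal mask...)
};

// Composite all layers once for this page and return the texels, ready to be
// uploaded into the page cache and reused for as long as the page stays resident.
std::vector<uint32_t> compositePage(const PageRequest& req,
                                    const std::vector<Layer>& layers,
                                    int pageSize) {
    std::vector<uint32_t> texels(pageSize * pageSize, 0);
    for (int y = 0; y < pageSize; ++y) {
        for (int x = 0; x < pageSize; ++x) {
            float u = req.u0 + (req.u1 - req.u0) * (x + 0.5f) / pageSize;
            float v = req.v0 + (req.v1 - req.v0) * (y + 0.5f) / pageSize;
            float r = 0, g = 0, b = 0;
            for (const Layer& layer : layers) {
                float w = std::clamp(layer.weight(u, v), 0.0f, 1.0f);
                uint32_t c = layer.sample(u, v);
                // Simple "over" blend; decals are just another layer in this scheme.
                r = r * (1 - w) + ((c >> 16) & 0xFF) * w;
                g = g * (1 - w) + ((c >> 8) & 0xFF) * w;
                b = b * (1 - w) + (c & 0xFF) * w;
            }
            texels[y * pageSize + x] =
                (uint32_t(r) << 16) | (uint32_t(g) << 8) | uint32_t(b);
        }
    }
    return texels;
}
```

The payoff is exactly what's described above: the expensive layering runs once per page instead of once per pixel per frame.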

I think many games could benefit from that and don't realize it.

If there ever was a game that made comprehensive use of tessellated displacement-map surfaces everywhere, such a system would allow blending and modifying texture heightmaps in all sorts of ways, creating incredibly detailed geometry deformations.
 
If there ever was a game that made comprehensive use of tessellated displacement-map surfaces everywhere, such a system would allow blending and modifying texture heightmaps in all sorts of ways, creating incredibly detailed geometry deformations.
That's basically what i work on. Though i'm still just on the geometry side of things: basically a base mesh of quads, which then enables seamless texturing on a per-quad level.
This should enable all the things you said. We could even splat small 3D SDF models onto the displaced surface, or do whatever procedural stuff we come up with.
But there are non-obvious challenges:

Procedural stuff usually isn't that great. It's mostly based on noise functions, maybe Poisson disk distributions, or other stuff like that. But those things can not imitate the complex processes and flows we see in real world nature. So it gives us high frequency detail, but nothing interesting on the lower frequencies.
Thus my idea is to use samples, e.g. a heightmap of a rocky surface. We can scan this from the real world, or generate it with simulations or ML.
To break the repetition we usually get from this approach of triplanar mapped, tiled textures, i want to use something like texture synthesis. Just not per texel, but rather using blocks of image per quad. Across the quad boundaries i'll need some form of blending / distortion to match adjacent blocks of image. This won't be perfect, but for typical textures as seen in nature it should work pretty well, i hope.
It's not yet clear how much of this can be precomputed and stored. At least at the highest levels of detail, the whole system will have to run on the client as a background task together with streaming. Cost will be noticeable for sure. That's also another potential application of ML, but i feel too old to get into this and will work around it.

The other problem i see is LOD. If you imagine the idea as i've just described it, you usually imagine just one fixed level of detail, and you expect to get lower details from simple mip maps.
But this won't work. To create our rocky surface at some distance, we can not run the system at a much higher level of detail and then sample it down. It's not only inefficient, but we would need to load all the high detail base mesh and other data we try to avoid.
So we need a system that generates the same stuff at different levels of detail, and the outputs should match well enough across LOD transitions.
That's a pretty tough problem. We could solve it with an additive approach, where we calculate various frequency bands, one per level, and sum up the parent levels before refining with higher frequencies.
But i'm not sure how well this works. It may again reduce quality and interesting features, generating just the boring noise stuff we got from the ancient procedural texturing methods before, which are just not good enough.
I may also be overthinking this LOD problem because i try to be too general in terms of scaling. Games often don't do much about LOD at all, and they are still huge.
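To make the additive idea a bit more concrete, a toy sketch (my own, assuming a simple power-of-two LOD chain): each level upsamples its parent and adds one higher-frequency band of smaller amplitude, so a coarse level can be generated without ever touching finer data, and adjacent levels only differ by that band.

```cpp
#include <cstdint>
#include <vector>

// A square heightmap tile at one level of detail.
struct HeightTile {
    int res;
    std::vector<float> h;  // res * res samples
};

// Deterministic per-sample "detail" for a given level; stands in for whatever
// band-limited source (noise, scanned samples, synthesis) the system would use.
static float detail(int level, int x, int y) {
    uint32_t n = uint32_t(level) * 374761393u + uint32_t(x) * 668265263u +
                 uint32_t(y) * 2246822519u;
    n = (n ^ (n >> 13)) * 1274126177u;
    return ((n & 0xFFFF) / 65535.0f - 0.5f) / float(1 << level);  // amplitude shrinks per level
}

// Generate a tile at 'level' by upsampling the parent and adding one frequency band.
// Level 0 is the coarsest; generating level N only ever needs levels 0..N, never finer data.
HeightTile generate(int level, int baseRes) {
    if (level == 0) {
        HeightTile t{baseRes, std::vector<float>(baseRes * baseRes)};
        for (int y = 0; y < baseRes; ++y)
            for (int x = 0; x < baseRes; ++x)
                t.h[y * baseRes + x] = detail(0, x, y);
        return t;
    }
    HeightTile parent = generate(level - 1, baseRes);
    int res = parent.res * 2;
    HeightTile t{res, std::vector<float>(res * res)};
    for (int y = 0; y < res; ++y) {
        for (int x = 0; x < res; ++x) {
            // Nearest-neighbour upsample of the parent (bilinear in a real system),
            // refined with this level's higher-frequency band.
            float base = parent.h[(y / 2) * parent.res + (x / 2)];
            t.h[y * res + x] = base + detail(level, x, y);
        }
    }
    return t;
}
```

Whether such per-level bands can carry interesting low-frequency structure, rather than just noise, is exactly the open question above.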

Trials Evo had a unique-texel-for-each-inch-of-the-world kind of VT as in Rage
Is there a certain Trials game on PC you would recommend to look up? I've never played one... :)
 
No. We need those settings to achieve scaling to HW with less memory. There is no easier way to save memory, and reducing detail also has little impact on visuals, so this option won't go away. Same applies to geometric detail, and ideally we want to relate both settings. High res textures on low poly models look shitty, and then i even prefer low res textures too.

I have a particular fetish for low frequency but crisp high res textures. :)

Virtual Texturing is literally a way to save memory. Since textures are split into tiles, you're streaming a fairly constant number of tiles into the VT memory pool. Source texture size doesn't matter. Scene complexity doesn't matter much, since tiles are fetched based on visibility.

VT shouldn't be confused with the unique texturing of Megatextures. Even id dropped that for idTech 7. Eternal uses virtual pages for loading in tiles from discrete textures instead. It's worth noting that plenty of other games use VT for landscapes (Frostbite, Far Cry...). I assume they stop there just due to the seek limitations of Gen8 storage.
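As a rough sketch of the "tiles fetched based on visibility" part (hypothetical names, CPU side only; real engines read the requests back from a GPU feedback pass): each frame, the visible (texture, mip, tile) ids are compared against what is already resident, and only the misses are queued for streaming into the fixed-size pool.

```cpp
#include <cstdint>
#include <queue>
#include <unordered_set>
#include <vector>

// A virtual tile is identified by texture id, mip level and tile coordinates.
struct TileId {
    uint32_t texture, mip, x, y;
    bool operator==(const TileId& o) const {
        return texture == o.texture && mip == o.mip && x == o.x && y == o.y;
    }
};
struct TileIdHash {
    size_t operator()(const TileId& t) const {
        return (size_t(t.texture) * 73856093u) ^ (size_t(t.mip) * 19349663u) ^
               (size_t(t.x) * 83492791u) ^ (size_t(t.y) * 2654435761u);
    }
};

class TileCache {
public:
    explicit TileCache(size_t capacity) : capacity_(capacity) {}

    // Called once per frame with the tile ids the (GPU) feedback pass reported visible.
    void update(const std::vector<TileId>& visibleTiles) {
        for (const TileId& t : visibleTiles) {
            if (resident_.count(t)) continue;  // already in the pool
            if (pending_.count(t)) continue;   // already requested
            pending_.insert(t);
            requests_.push(t);                 // hand off to the streaming side
        }
    }

    // Called by the streaming side when a tile has been loaded and uploaded.
    void onTileLoaded(const TileId& t) {
        pending_.erase(t);
        if (resident_.size() >= capacity_) evictOne();
        resident_.insert(t);
        // A real system would also patch the indirection (page table) texture here.
    }

    bool popRequest(TileId& out) {
        if (requests_.empty()) return false;
        out = requests_.front();
        requests_.pop();
        return true;
    }

private:
    void evictOne() {
        // Placeholder eviction; a real cache tracks last-used frame and evicts LRU tiles.
        resident_.erase(resident_.begin());
    }

    size_t capacity_;
    std::unordered_set<TileId, TileIdHash> resident_, pending_;
    std::queue<TileId> requests_;
};
```

Because the set of visible tiles is bounded by screen resolution, the resident pool stays roughly constant regardless of source texture sizes.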

My expectation for Gen9 titles, and the related PC releases, is that texture memory will only be a small part of overall RAM use, and that it will not increase in future generations due to how it works.

As an example, The Coalition's Alpha Point demo used 4K textures. The overall texture memory pool, which includes regular streamed textures for VFX etc., was under 1GB.

(Embedded video, timestamped to the texture bit.)

Trials Evolution is the best Trials. It's a lot of game without getting bogged down.
 
Procedural stuff usually isn't that great. It's mostly based on noise functions, maybe Poisson disk distributions, or other stuff like that. But those things can not imitate the complex processes and flows we see in real world nature. So it gives us high frequency detail, but nothing interesting on the lower frequencies.
That would have been nearly unquestionably true a decade ago, but as I understand it, a lot of AAA studios started adopting advanced procedural material authoring tools in the PS4 era, such as Quixel Mixer, which rely a lot on different parametric functions, noise and Voronoi as a base, and A LOT of different interacting layers of further addition and subtraction of extra patterns, masks, height modulation, intersection, etc.

Your concerns about LOD are valid, though. The only robust solution, as you said, is procedural models that have deterministic results when going from low frequency down to high-freq... Not trivial, but not impossible either. For extra quality, one can set the engine to always generate the material at one mip higher than the target and downsample from that for rendering. The performance cost might be offset by doing it only once every few hundred frames per surface and leaving it in the VT cache, instead of running a complex shader every single frame for every single pixel as is the norm today.
 
From a hardware perspective it's worth noting that the lack of memory scaling (in terms of cost and consumer pricing/availability) is a relatively recent trend. Prior to 2016 you could argue that memory scaled faster than compute for sure. As such we'll probably see more conventional memory (as in RAM/VRAM) efficiency-driven techniques out of necessity.

In terms of supporting hardware, it's also worth noting that "high" core count CPUs, NVMe storage, and PCIe 4.0 (not necessarily for the SSD) only achieved widespread prevalence on the PC side in the last two years. Implementing with the above in mind would have been relatively niche prior to 2020.

In terms of limitations, a question I'd have is addressing scenes with more dynamic objects (especially if they vary in amount). This seems like it would incur a similar cost (compute time) issue that fast scene transitions would cause. As such, are virtual textures going to be more suited to environments and more static objects?
 
What are the downsides of VT? There must be some substantial limitations for it to have almost no adoption.
 
What are the downsides of VT? There must be some substantial limitations for it to have almost no adoption.

I don't know if this would be the inherent reason, but one aspect is that VT essentially trades compute efficiency for memory efficiency. Until somewhat recently, memory hadn't run into a scaling ceiling. Imagine if we, say, currently had 64GB if not 128GB on mainstream GPUs and consoles, but the same compute power.

My other question, on the asset authoring side, is whether VT is harder to work with in that respect. I remember when Rage came out there was mention that the megatexture implementation did have some tradeoffs with respect to content authoring (and, by extension, modding).
 
What are the downsides of VT? There must be some substantial limitations for it to have almost no adoption.

It's one more complex system to be implemented and maintained in code. Most devs would rather not bother.
 
My other question, on the asset authoring side, is whether VT is harder to work with in that respect. I remember when Rage came out there was mention that the megatexture implementation did have some tradeoffs with respect to content authoring (and, by extension, modding).

We should really just ignore Megatextures. No one that I'm aware of is using VT in that manner anymore. VT is either being used as an alternative streaming system to mips for loading discrete textures, or it's being used as a 2D space to blend and generate textures in realtime.

UE's system is the best documented I could find. They don't have any authoring constraints beyond textures needing to be a power of two, though they don't have to be square.
 
My other question, on the asset authoring side, is whether VT is harder to work with in that respect. I remember when Rage came out there was mention that the megatexture implementation did have some tradeoffs with respect to content authoring (and, by extension, modding).
We should really just ignore Megatextures. No one that I'm aware of is using VT in that manner anymore.
I see two main reasons for the mega texture failure:
1. Low res textures due to storage limits.
2. Their way to generate content was basically manual texture splatting. The artists had a library of texture samples, and they painted them over the geometry with some brush tool. Because detail is unique everywhere, they had to paint every spot of the big world manually. (more or less, i guess)

This raises two questions:
1. If compression is such a problem, why not ship the game with the (small) library of texture samples + splat positions, instead of the (huge) composited results?
2. If your game world is open and big, why attempt to paint it manually, instead of making automated procedural systems? That's better quality too. It can simulate erosion, spots where dirt sticks, fractured surfaces, etc., better than humans could.

I assume Rage was meant to be an intermediate step towards a longer-term plan and vision. But maybe after they saw that Rage as-is could not really compete with the results of traditional workflows and dynamic lighting, they decided to put the idea on hold.
But i think the idea becomes more attractive, the larger our worlds become.
Personally i'm not totally convinced about this 'unique detail everywhere' idea either, but because i have to develop the base technology anyway to get the data i need for my GI system, i should explore some options and see what i get.

What are the downsides of VT? There must be some substantial limitations for it to have almost no adoption.
The only limitation i see is the need to manage paging on CPU, contradicting the goal of GPU driven rendering.
But this only applies to the 'HW accelerated' version of VT. We can just do this in software to bypass the limitation, if it's a problem. Rage did it in software too, because the HW feature was not ready / widespread back then.
(not 100% sure about those points)
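For what it's worth, the 'software' path is basically one extra indirection per sample. A CPU-side sketch of the translation the shader would do (hypothetical layout: one page table for a single mip, a square physical page atlas; residency fallback to coarser mips omitted):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One page table entry: which slot of the physical page atlas currently holds
// this virtual page. (A real entry also encodes which mip is actually resident,
// so sampling can fall back to a coarser page when the requested one isn't loaded.)
struct PageEntry {
    uint16_t cacheX, cacheY;
};

struct SoftwareVirtualTexture {
    int pagesX, pagesY;                // page grid of this mip of the virtual texture
    int atlasPages;                    // physical atlas is atlasPages x atlasPages slots
    std::vector<PageEntry> pageTable;  // pagesX * pagesY entries

    // Translate a virtual UV (0..1) into a physical atlas UV (0..1).
    // A real shader also picks the mip from UV derivatives and handles page borders.
    void virtualToPhysical(float u, float v, float& outU, float& outV) const {
        int px = std::min(int(u * pagesX), pagesX - 1);
        int py = std::min(int(v * pagesY), pagesY - 1);
        const PageEntry& e = pageTable[py * pagesX + px];

        float fu = u * pagesX - px;    // position inside the virtual page
        float fv = v * pagesY - py;

        outU = (e.cacheX + fu) / float(atlasPages);
        outV = (e.cacheY + fv) / float(atlasPages);
    }
};
```

The page table itself can live in a small texture, so the lookup stays GPU friendly even without any hardware tiled-resource support.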

I have a particular fetish for low frequency but crisp high res textures. :)
If you want high res textures, you also want fewer textures. VT does nothing to reduce memory requirements - it only helps to avoid wasting memory on stuff you do not currently use.
Considering this, there is no way around runtime generation and composition of materials, because if you use fewer textures, the repetition otherwise becomes more obvious.

I mean, it's a classic problem for us: visible tiling of blocks making up Super Mario levels, visible tiling of textures in early 3D games, up to compromises such as using modular instances of rock surfaces to model the awesome Nanite demo, but all rocks have similar color and structure, so artists can only do some limited composition.
We sure want to solve this, but without a need for >1TB games (and 1TB still would not be enough at all). We also want to avoid the cost of content creation scaling up with higher resolutions.
That's a very important aspect, and VT is part of a solution, though only at a very low level.
 
My understanding is that in the end you still want to have enough VRAM for your typical scene. Maybe a bit smaller, but you really can't have a scene taking, say, twice the size of your VRAM.
The fundamental problem is that the interface between the GPU and the CPU (i.e. PCIe) is really not very fast. With PCIe 4.0 x16 we have ~30GB/s throughput, which means that if you want to load 8GB of data it'll take roughly 0.27 seconds, far too long to achieve anywhere near 60fps. Now if we think about getting to 60fps, that means you can't load more than 0.5GB of data per frame. It's really not that much data. Even with PCIe 5.0 you'll only double that to 1GB per frame (and in an ideal situation). Not to mention if you want 120fps, then it goes back to 0.5GB again.

So in the end you'll still want to make sure that the texture data in your scenes is not taking much more memory than your VRAM, and let streaming handle viewpoint movement and maybe a few textures in the distance. With this in mind, virtual texturing is really not that useful in reducing stuttering, because you still need to plan your scenes carefully.

I guess it might be easier to just use AI upscaling on textures.
 
My understanding is that in the end you still want to have enough VRAM for your typical scene. Maybe a bit smaller, but you really can't have a scene taking, say, twice the size of your VRAM.
The fundamental problem is that the interface between the GPU and the CPU (i.e. PCIe) is really not very fast. With PCIe 4.0 x16 we have ~30GB/s throughput, which means that if you want to load 8GB of data it'll take roughly 0.27 seconds, far too long to achieve anywhere near 60fps. Now if we think about getting to 60fps, that means you can't load more than 0.5GB of data per frame. It's really not that much data. Even with PCIe 5.0 you'll only double that to 1GB per frame (and in an ideal situation). Not to mention if you want 120fps, then it goes back to 0.5GB again.

So in the end you'll still want to make sure that the texture data in your scenes is not taking much more memory than your VRAM, and let streaming handle viewpoint movement and maybe a few textures in the distance. With this in mind, virtual texturing is really not that useful in reducing stuttering, because you still need to plan your scenes carefully.

I guess it might be easier to just use AI upscaling on textures.

The whole point of VT (or any other virtual memory management system) is that you don't need much more in memory than what fits on screen in a frame.

A 4K display has roughly 8.3 million pixels. Let's say you want your materials to have values for RGB albedo, RGB specular, 3 normal values, roughness, baked AO and RGB SSS color, all at a byte per channel. All that is less than 128MB for the whole screen.
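For reference, the arithmetic behind that figure (my own back-of-the-envelope check, using the channel layout above at one byte each):

```cpp
#include <cstdio>

int main() {
    constexpr long long pixels  = 3840LL * 2160;          // ~8.3 million pixels at 4K
    constexpr int bytesPerTexel = 3 + 3 + 3 + 1 + 1 + 3;  // albedo + spec + normal + rough + AO + SSS
    constexpr long long total   = pixels * bytesPerTexel; // bytes if every pixel had unique texel data
    std::printf("%lld bytes (~%.0f MB)\n", total, total / 1e6);  // ~116 MB, under the 128MB figure
    return 0;
}
```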

In the real world, your VT will load and cache more data than you need (sebbbi had said between 3x and 4x normally). Then again, you won't see the entire contents of the screen change every single frame either.

The 0.5 GB a frame from PCIe you called too little is the entirety of the RAM of the XB360.
 
The whole point of VT (or any other virtual memory management system) is that you don't need much more in memory than what fits on screen in a frame.
There are two options: load what you need for the current frame and camera angle as you say, or load what's nearby the current location of the camera.
The latter needs more RAM, but does not stutter from changing camera angle. It could only stutter if the camera moves too quickly, and we get a much bigger tolerance on latency.
The difference becomes even bigger if we have a high cost of generating textures procedurally. Then we likely want to cache the work done for as long as possible, and we may also generate speculative stuff in advance which we then do not use.
The argument also applies if we consider something like texture space shading. Having and caching only what's currently in view may not be enough.
It also relates to something like AI upscaled textures. Doing this while fetching a sample sounds expensive. We may want to cache what's in view, or even what's nearby.

That's a big streaming question mark for me.
 
There are two options: load what you need for the current frame and camera angle as you say, or load what's nearby the current location of the camera.
The latter needs more RAM, but does not stutter from changing camera angle. It could only stutter if the camera moves too quickly, and we get a much bigger tolerance on latency.
The difference becomes even bigger if we have a high cost of generating textures procedurally. Then we likely want to cache the work done for as long as possible, and we may also generate speculative stuff in advance which we then do not use.
The argument also applies if we consider something like texture space shading. Having and caching only what's currently in view may not be enough.
It also relates to something like AI upscaled textures. Doing this while fetching a sample sounds expensive. We may want to cache what's in view, or even what's nearby.

That's a big streaming question mark for me.

Agreed. Ray tracing or other Global Illumination systems will also require a broader (albeit tolerably much lower res) representation of the scene to be in the VT cache.

And yes, with runtime material generation and composition, you must manage not only the final composite material, but also all the base textures used to generate it.

My example was just an illustrative back-of-the-napkin calculation to show how little actual texel data is really needed to fill a 4K screen buffer. I imagine actual games pack their material attributes with thinner data layouts than my example (although I am not sure). I tried to be conservative on purpose.
 
The whole point of VT (or any other virtual memory management system) is that you don't need much more in memory than what fits on screen in a frame.

A 4K display has roughly 8.3 million pixels. Let's say you want your materials to have values for RGB albedo, RGB specular, 3 normal values, roughness, baked AO and RGB SSS color, all at a byte per channel. All that is less than 128MB for the whole screen.

In the real world, your VT will load and cache more data than you need (sebbbi had said between 3x and 4x normally). Then again, you won't see the entire contents of the screen change every single frame either.

The 0.5 GB a frame from PCIe you called too little is the entirety of the RAM of the XB360.

The thing is that it's difficult to predict beforehand which textures are going to be needed. The set could be very small or very large, or a completely different set of textures in the next frame.
Furthermore, you have to consider latency. Getting data from main memory through PCIe incurs latency, so it's very unlikely that you can load all the textures you need from main memory (let's say you need 1000 different textures) and fit all of them within one frame time (which is 1/60 of a second if you are targeting 60fps).

Furthermore, 0.5GB per frame is the max throughput of PCIe 4.0 x16. In the real world it's very unlikely you can achieve such speed. There's also the bandwidth of main memory to consider. DDR5-6000 provides ~90GB/s, but that's the current high end. A more common DDR4-based system will have much less bandwidth available, especially when loading scattered data.
 
The thing is that it's difficult to predict beforehand which textures are going to be needed. The set could be very small or very large, or a completely different set of textures in the next frame.
Furthermore, you have to consider latency. Getting data from main memory through PCIe incurs latency, so it's very unlikely that you can load all the textures you need from main memory (let's say you need 1000 different textures) and fit all of them within one frame time (which is 1/60 of a second if you are targeting 60fps).

Furthermore, 0.5GB per frame is the max throughput of PCIe 4.0 x16. In the real world it's very unlikely you can achieve such speed. There's also the bandwidth of main memory to consider. DDR5-6000 provides ~90GB/s, but that's the current high end. A more common DDR4-based system will have much less bandwidth available, especially when loading scattered data.

500MB per frame is better than you're thinking for virtual textures. The G-buffer is, let's say, 4 render targets built from textures at 16 bytes per pixel; 4K native is ~8 million pixels, so 128MB of VT traffic per frame, and that's assuming it's all new VT pages every frame, which it clearly isn't 99.9% of the time. Which means if we need to combine 8 textures per pixel at runtime we can, as we're doing maybe 10% new pages per frame.

Now, with real-world scenarios and heap transfers from CPU to GPU taking up a lot of traffic, that might start really cutting in, especially if you want 8K or 120Hz+ or something. But that's where PCIe 5 and on-GPU decompression come in, and if you don't have those then you're probably running temporal upscaling and/or 1440p etc.

Virtual texturing is a silver-bullet solution for texture streaming. You just reference and combine textures at runtime for the close mips and store the farthest mips as unique textures; even if you're doing a huge open world (or worlds) the farthest mips are tiny enough that you don't care about the disc size that much. It takes programmer time, but saves a ton of artist time.
 
500MB per frame is better than you're thinking for virtual textures. The G-buffer is, let's say, 4 render targets built from textures at 16 bytes per pixel; 4K native is ~8 million pixels, so 128MB of VT traffic per frame, and that's assuming it's all new VT pages every frame, which it clearly isn't 99.9% of the time. Which means if we need to combine 8 textures per pixel at runtime we can, as we're doing maybe 10% new pages per frame.

Now, with real-world scenarios and heap transfers from CPU to GPU taking up a lot of traffic, that might start really cutting in, especially if you want 8K or 120Hz+ or something. But that's where PCIe 5 and on-GPU decompression come in, and if you don't have those then you're probably running temporal upscaling and/or 1440p etc.

Virtual texturing is a silver-bullet solution for texture streaming. You just reference and combine textures at runtime for the close mips and store the farthest mips as unique textures; even if you're doing a huge open world (or worlds) the farthest mips are tiny enough that you don't care about the disc size that much. It takes programmer time, but saves a ton of artist time.

I really hope that's the case but I'm not that optimistic. For example, you can't just load exactly the texels you need, because the performance is likely to be very bad. However, if you have to load at least part of a texture (let's say in 4KB blocks), then the amount of data you might need could be very large. Not to mention that main memory is not that good at random access. If you load from a large data set randomly, you'll be lucky to get 10% (or even less) of the theoretical memory bandwidth.
From what I've seen, VT is not just "takes programmer time". It's likely to need a huge amount of engineering effort, and in the end you might still get random stuttering when the stars are not aligned.
 