Virtual Texture Issues and Limitations (Gen 9/PC)

That's presumably the root of a lot of crappy texture filtering across PS5 titles?
It happened for the longest time during last generation for any games doing software virtual texturing ...

This Siggraph presentation goes into clearer detail on the costs/tradeoffs of both software and hardware virtual texturing implementations. Hardware virtual texturing (PRT/tiled/sparse resources) looked interesting in the beginning, but years later the very same IHVs that were promoting the feature explicitly tell graphics programmers to NEVER use it in practice ...
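To make the software side concrete, here's a minimal sketch of the per-sample indirection that software virtual texturing performs. All names, tile sizes, and the page-table layout are illustrative, not from any shipping engine. The gutter handling is also where the filtering complaints above come from: without duplicated border texels, bilinear taps near a tile edge would read from an unrelated neighbouring tile in the atlas.

```python
TILE = 128      # payload texels per tile edge (illustrative choice)
BORDER = 4      # gutter texels so bilinear taps near edges stay inside the tile
PAGES = 8       # toy virtual texture is PAGES x PAGES tiles

def physical_uv(u, v, page_table):
    """Translate a virtual (u, v) in [0,1) into physical atlas texel coords.

    page_table[(px, py)] -> (ax, ay): atlas slot holding that virtual page.
    Returns None on a page fault (tile not resident)."""
    px, py = int(u * PAGES), int(v * PAGES)       # which virtual page
    slot = page_table.get((px, py))
    if slot is None:
        return None                               # page fault: request the tile
    # fractional position inside the page, in payload texels
    fx = (u * PAGES - px) * TILE
    fy = (v * PAGES - py) * TILE
    ax, ay = slot
    stride = TILE + 2 * BORDER                    # tiles are stored with gutters
    return (ax * stride + BORDER + fx,
            ay * stride + BORDER + fy)
```

For example, with `{(0, 0): (0, 0)}` resident, `physical_uv(0.0, 0.0, table)` lands at texel `(4.0, 4.0)`: the border offset into the first atlas slot. A software implementation runs this lookup (plus the fault feedback) in the shader, which is the per-sample cost hardware PRT was supposed to eliminate.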
 
That's quite an old presentation. Can you share details on the pitfalls current AMD and/or Nvidia GPUs face with hardware VT?
 
Here are some graphs illustrating the performance issues of updating the page table mappings in Vulkan ...
[Graph: vkQueueBindSparse completion time per submission, RTX 4090]
The first graph above depicts the behaviour of the RTX 4090.
[Graph: vkQueueBindSparse completion time per submission, RX 7900 XTX]

What you see in these graphs is how long it takes to complete a vkQueueBindSparse (UpdateTileMappings in D3D) call as the hardware queues get saturated over successive submissions. The horizontal axis represents each new vkQueueBindSparse submission while the vertical axis represents the total time in milliseconds ...

On the 4090, the first 50 calls of vkQueueBindSparse appear to have no performance impact, but the cost of each new vkQueueBindSparse then sharply increases from the 50th to the 150th submission. Just before the 4090 reaches the 300th submission, each new vkQueueBindSparse submission takes over 70 ms to complete ...

On the 7900 XTX, the cost of each new vkQueueBindSparse submission rises sharply from the very start until its ~40th iteration, after which every new vkQueueBindSparse submission takes over 100 ms to finish ...

So in each of these graphs we can clearly identify the performance problems of updating the page table mappings for hardware virtual texturing implementations on both vendors. Making API calls to update the page table mappings can become prohibitively expensive when done in succession over many frames, as is potentially the case with sparse/virtual shadow maps ...
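The experiment described above can be sketched as a simple timing harness. Here the Vulkan call is replaced by a hypothetical stand-in workload (`fake_bind`, my invention) so the harness itself runs anywhere; in the real test, `submit` would perform a vkQueueBindSparse submission, and the rising latency would come from hardware queue saturation rather than anything in the harness.

```python
import time

def measure_submissions(submit, n):
    """Time n successive submissions; return per-call durations in ms."""
    times_ms = []
    for i in range(n):
        t0 = time.perf_counter()
        submit(i)
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return times_ms

# Stand-in workload: cost grows once a fake queue backlog passes a threshold,
# loosely mimicking the saturation knee seen in the graphs above.
backlog = []

def fake_bind(i):
    backlog.append(i)
    if len(backlog) > 50:                    # queue "saturates" past ~50 submits
        _ = sum(range(200 * len(backlog)))   # busywork standing in for stalls

samples = measure_submissions(fake_bind, 120)
```

Plotting `samples` against the iteration index reproduces the shape of the experiment: flat at first, then a knee where per-submission cost starts climbing.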
 
Thx for info. I always wonder how such disconnects between API and the hardware happen when they should be in continual collaboration. Surely they knew of these pitfalls far before finalization of the API spec?
 
Believe there was some customization for that in the XS consoles. That and filtering.
I didn't think the paging would have been such a big issue but maybe it does have a big effect on the console.
 
Well some things require hindsight because we can't truly determine if feature xyz will turn out to be either good or bad without some passage of time ...

Ever since the start of D3D12 we've had several duds (ROVs/tiled resources/VRS) here and there, while even contentious topics such as PSOs have shown the benefit of making future hardware designs simpler and more cost effective, since it translates to implementing less hardware logic ...
Consoles have it even better with explicit access to virtual memory ...
 
There have been many disconnects in the past; DX9, DX10 and DX11 all had some, but DX12 went into overdrive. As stated above: Rasterizer Ordered Views, Tiled Resources, Variable Rate Shading, there was also Conservative Rasterization, and now there is Sampler Feedback. Each of these features has seen either limited implementations or none at all, but none of them harmed the user experience in any way. They just didn't advance the tech in the originally planned way, so they are more of a disappointment than anything else.

Of course, the biggest failure of all in terms of user experience has been PSOs (Pipeline State Objects), which single-handedly ruined the gameplay experience of hundreds of games for all players in a significant and lasting way. This motivated Vulkan to create new extensions to get rid of them, but DirectX 12 has yet to do any of that, despite users suffering for almost 10 years now.
 
Know there were issues in the tiled resource hardware last gen and was hoping they'd been ironed out as V2. But it sounds like consoles may not be too bad while PC is an issue. For something like this, if you implement a software version it's probably not worth implementing a hardware version on console. Unless you're console-first like Sony, maybe. Will see.

Other things like mesh shaders may be taking longer, as even current-gen-only titles were just doing the low-hanging-fruit improvements of SSD speed and GPU & CPU brute-force performance.
AW2 and Avatar were really the start, and that was at the end of last year.

So still having fingers crossed for hardware virtual texturing, but less hopeful after your posts.
 
Out of all the desktop graphics vendors, Intel of all IHVs has the best implementation of hardware virtual texturing. Not only do they have the highly prized GPU-driven page table mapping update functionality, but their more recent graphics architectures also implement a hardware path for fast page table mapping updates in existing APIs. However, that same path prevents them from concurrently accessing both their hardware compute and transfer queues, which means they can't do async compute/transfer while using the faster path for direct page table mappings ...

If graphics programmers want to use tiled resources there's always Intel graphics for them along with ROVs, geometry shaders, or other features that nobody else uses ...
 

Or leave the bloody "software features in hardware" alone and just make a processor that does what it's told. HW mesh shaders are already slower than shipping software in the most useful cases, and triangles are starting to go out the window for rendering altogether. HW should quit trying to do SW's job and stick to making the best, most accessible processor they can.
 
BTW, one of the contributors to the blogpost above attempted to implement hardware virtual shadow maps using tiled/sparse resources, but soon after he realized that no caching techniques would work with that implementation, so he eventually gave up and deleted the branch. The unhappy reality was that only software virtual shadow maps were compatible with shadow caching techniques ...
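For illustration, here's a toy sketch of the page-level caching that software virtual shadow maps rely on (all names here are made up): rendered depth tiles are kept keyed by (light, mip, tile x, tile y) and only re-rendered when an overlapping caster moves. With hardware tiled resources, each reused tile would additionally need a page table mapping update per bind, which is exactly the per-call cost the graphs above measure.

```python
class ShadowPageCache:
    """Toy page-level cache for virtual shadow map depth tiles."""

    def __init__(self):
        self.pages = {}    # (light, mip, x, y) -> rendered depth tile
        self.renders = 0   # count of tiles actually (re)rendered

    def get(self, key, render_fn):
        """Return the cached tile for key, rendering it on a miss."""
        if key not in self.pages:
            self.pages[key] = render_fn(key)
            self.renders += 1
        return self.pages[key]

    def invalidate(self, keys):
        """Drop pages touched by a moving caster so they re-render next frame."""
        for k in keys:
            self.pages.pop(k, None)
```

In a software implementation, a cache hit is essentially free (the tile just stays resident in the atlas); in a hardware implementation every resident tile still has to be bound through the API, which is where the vkQueueBindSparse overhead bites.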

Within the next day or so, he spitballed an experiment to test the performance of tiled/sparse binding updates. The graphs in the prior post came from collecting data on other configurations for that experiment ...
 