Maybe I'm missing something, but how would the ddx/ddy calculation work in that compute shader? How do you know the neighboring pixel is part of the same object, and what happens if the neighboring pixels all belong to different objects? The only safe way I can think of to implement this is to also store ddx/ddy, and then you aren't really saving much bandwidth, and your texture instructions run at 1/4th the normal speed because they work on individual pixels instead of quads...
I didn't explain it because I tried to keep my post as short as possible (this is off-topic discussion, after all)... but I failed miserably.
Let's first go through the easy case of bilinear filtering from a virtual texture. In this case, the texture coordinate already implicitly contains the mip level, as the indirection texture lookup transforms the texture coordinate to the correct 128x128 pixel page depending on the mip level (gradients). Basically the x,y texture coordinate pair contains all the info you need. Bilinear filtering isn't exactly a hot technique itself, but if you use the anisotropic mip hardware to calculate the LOD level based on the gradients (min of x,y clamped to max+1 instead of max), you will get higher detail on slopes. I call this "bilinear anisotropic", and we used it in Trials Evolution (hacks like this are required for 60 fps on current gen consoles).
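A minimal sketch of that LOD trick, assuming one reading of the "min clamped to max+1" rule (in mip index terms: at most one level sharper than the standard isotropic LOD):

```cpp
#include <algorithm>
#include <cmath>

// Sketch, not production code: inputs are the screen-space texture coordinate
// derivative lengths (assumed > 0). Standard hardware LOD uses the larger
// gradient; this hack uses the smaller one, clamped to at most one mip level
// sharper than the standard result, giving extra detail on sloped surfaces.
float bilinearAnisoLod(float ddxLen, float ddyLen)
{
    float lodMax = std::log2(std::max(ddxLen, ddyLen)); // standard isotropic LOD
    float lodMin = std::log2(std::min(ddxLen, ddyLen)); // sharper candidate
    return std::max(lodMin, lodMax - 1.0f);             // allow one extra level of detail
}
```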
Trilinear isn't much harder. The virtual texture indirection lookup basically truncates the mip value to floor(mip). That data is implicitly stored in the x,y texture coordinate pair for free. The only extra data you need to store is the frac(mip) portion. A 4-bit normalized [0,1] integer is enough for this purpose. We are blending between two adjacent mip levels after all, not two completely different images (**), so using 8 bits or even more (floats) is pure overkill. However, if you have a traditional texture atlas (and not a virtual textured atlas), the texture coordinate doesn't implicitly contain any extra information about the mip level, and you need extra bits to store the mip level.
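The 4-bit encoding is trivial; a sketch (function names are mine):

```cpp
#include <cmath>
#include <cstdint>

// floor(mip) is implicit in the indirection-translated coordinate, so only the
// fraction needs storage. 16 blend steps between two adjacent mips is plenty.
uint32_t encodeMipFraction(float mip)
{
    float f = mip - std::floor(mip);        // frac(mip) in [0,1)
    return (uint32_t)(f * 15.0f + 0.5f);    // quantize to 4 bits (0..15)
}

float decodeMipFraction(uint32_t bits)
{
    return (float)bits / 15.0f;             // blend weight between the two mips
}
```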
Anisotropic filtering with virtual texturing is still a topic that hasn't been researched a lot ("Carmack's Hack" being the state of the art for performance). Anisotropic filtering can be approximated just by using the trilinear version above and adjusting the mip calculation based on the gradients (just like we did for our "bilinear anisotropic" in Trials Evolution). This doesn't require any extra g-buffer storage, but it can sometimes result in slight oversampling (FXAA in Trials Evolution took care of that). Of course this isn't a perfect solution, and we absolutely need to do better in the future. Good texture filtering quality is as important as good antialiasing quality.
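A sketch of that adjustment, generalizing the earlier clamp to a tunable limit (maxAnisoLevels is an assumed parameter, not something we shipped):

```cpp
#include <algorithm>
#include <cmath>

// Keep the trilinear path, but bias the LOD toward the smaller gradient so
// anisotropic footprints sample a sharper mip. The clamp limits how far the
// bias can go; residual oversampling is left for post-AA (e.g. FXAA).
float approxAnisoLod(float ddxLen, float ddyLen, float maxAnisoLevels)
{
    float longAxis  = std::max(ddxLen, ddyLen);
    float shortAxis = std::min(ddxLen, ddyLen);
    float isoLod    = std::log2(longAxis);              // standard trilinear LOD
    float bias      = std::log2(longAxis / shortAxis); // anisotropy ratio in mip levels
    return isoLod - std::min(bias, maxAnisoLevels);     // sharpen, clamped
}
```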
If you want to do proper anisotropic filtering, you obviously need to store both gradients. The virtual texture indirection lookup points you to a location that stores the most detailed data you need (minimum of the gradients). Both gradients are increments relative to this value. The smaller gradient increment is always in the range [0,1] (when measured in mip levels); the larger can be more than that (but it's always positive). Again, 4 bits should be enough for the first one, and if we share a 16-bit value between them, we have 12 bits remaining. That's more than enough for the second gradient bias.
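A possible packing, with an assumed [0,16) fixed-point range for the larger increment (the exact range isn't pinned down above):

```cpp
#include <algorithm>
#include <cstdint>

// 16-bit packing sketch: the small increment is [0,1] mip levels in 4 bits,
// the large increment gets the remaining 12 bits with an assumed 16-level max.
uint32_t packGradientIncrements(float smallInc, float largeInc)
{
    uint32_t s = (uint32_t)(std::min(smallInc, 1.0f) * 15.0f + 0.5f);                    // 4 bits
    uint32_t l = (uint32_t)(std::min(largeInc * (1.0f / 16.0f), 1.0f) * 4095.0f + 0.5f); // 12 bits
    return (l << 4) | s;                                                                 // 16 bits total
}
```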
Another way to approach this problem is to prefilter 128x128 tiles as 128x64 and 64x128 anisotropic tiles. Now we also use the gradient values to adjust the indirection lookup before storing the texture coordinate to the g-buffer. We can store these tiles adjacent to the original 128x128 tile (basically splitting the cache into 256x128 tiles) if we do not want to increase the indirection texture size (the coordinate bias to the anisotropic pages is easy to calculate). Alternatively, we can use a hash instead of the indirection texture (cuckoo hashing guarantees O(1) lookups, can easily be coded with no branching/flow control, has no dependency chains, and benefits nicely from GPU latency hiding; see the sketch below). As an extra bonus, this technique saves bandwidth compared to standard anisotropic filtering, but it doubles the virtual texture cache atlas size.
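A minimal cuckoo-hash lookup sketch (the table layout and hash mixers are assumptions, not from any shipped code):

```cpp
#include <cstdint>

// Two hash functions, two probes, no branches on the lookup path: both reads
// can be in flight at once, which is exactly what GPU latency hiding wants.
struct Entry { uint64_t key; uint32_t pageCoord; };

uint32_t cuckooLookup(const Entry* table, uint32_t mask, uint64_t key)
{
    uint32_t h0 = (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 32) & mask; // assumed mixer 1
    uint32_t h1 = (uint32_t)((key * 0xC2B2AE3D27D4EB4Full) >> 32) & mask; // assumed mixer 2
    Entry e0 = table[h0];
    Entry e1 = table[h1];
    // Branchless select: a well-formed cuckoo table holds the key in exactly
    // one of its two slots; a miss returns 0 (assumed to mean "fallback page").
    uint32_t m0 = (uint32_t)0 - (uint32_t)(e0.key == key);
    uint32_t m1 = (uint32_t)0 - (uint32_t)(e1.key == key);
    return (e0.pageCoord & m0) | (e1.pageCoord & m1);
}
```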
The last and most ambitious way is to store no texture coordinate data at all. Use rasterization only for a depth pre-pass. The depth value is translated to a 3D coordinate in the lighting shader (all deferred renderers do this already). If you have unique mapping in the virtual texture (***), you can do a hash lookup using this world coordinate to get the virtual texture coordinate. The naive approach would be to add all virtual texture pixels to a hash based on their 3D world coordinates (and update the hash whenever a page is loaded). A better way would be to have a sparse multilayer volume texture from which the texture coordinates could be queried (this is basically a hash as well, but the hash nodes are 8x8x8 volumes instead of single pixels, and it would be easy to query if the GPU has paged virtual memory; AMD's PRT OpenGL extension, for example). It would contain only the surfaces visible on screen (or in the virtual texture cache, because that's a superset of the screen pixels). This kind of structure wouldn't need to be super high resolution, because texture coordinates are linearly interpolated along polygons (linear filtering from the volume texture would work just fine).
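To make the node addressing concrete, a sketch of quantizing a reconstructed world position into an 8x8x8 node key (cellSize and the 21-bit-per-axis packing are assumptions):

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// The world position comes from the usual deferred depth reconstruction (not
// shown). One node covers 8 voxels per axis; the key could feed a hash lookup
// like the cuckoo sketch above, or index a sparse volume page table.
uint64_t nodeKey(Vec3 worldPos, float cellSize)
{
    float nodeSize = cellSize * 8.0f;
    int64_t x = (int64_t)std::floor(worldPos.x / nodeSize);
    int64_t y = (int64_t)std::floor(worldPos.y / nodeSize);
    int64_t z = (int64_t)std::floor(worldPos.z / nodeSize);
    // Pack 21 bits per axis into 63 bits (assumes bounded world extents).
    return ((uint64_t)(x & 0x1FFFFF) << 42) |
           ((uint64_t)(y & 0x1FFFFF) << 21) |
            (uint64_t)(z & 0x1FFFFF);
}
```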
(**) When using trilinear filtering, the virtual texture atlas stores a single mip level below each page. This allows you to use hardware trilinear filtering to blend between the current level and the one below it. It increases virtual texture atlas memory consumption by 25% (the half-resolution mip is a quarter of the page size). That's usually not a big deal.
(***) You would want to have unique mapping for other purposes as well. It allows you to have unique decals on all objects in the world, and it allows you to precalculate object-based texture transformations into the virtual texture cache (for example colorization). Unique virtual mapping shouldn't be confused with unique physical mapping. You don't need to store all versions of the pages on the hard drive (like Rage does); you can burn the decals (and colorizations, etc.) into the pages during page loading.
Sure, but ultimately this will be solved by bindless textures/resources referenced in constant buffers or similar. i.e. you just store a pointer/offset to the material data in the G-buffer and look up into it in the deferred pass. There may be an interim time when people use virtual texturing for this, but in the long run there's no need for atlases/continuous address spaces for this kind of work. This will be far more efficient than redundantly dumping all this data into the G-buffer itself (much of it is constant), and basically reduces the role of the rendering pipeline to just rasterization (and perhaps displacement mapping) and some basic attribute interpolation.
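To make that concrete, a minimal sketch of the direction described (struct names and fields are assumptions):

```cpp
#include <cstdint>

// The G-buffer stores a small material reference instead of fat per-pixel
// material data; the deferred pass indexes a big material table with it.
struct MaterialRecord
{
    uint32_t albedoTexHandle;   // bindless texture handle (assumed)
    uint32_t normalTexHandle;
    float    roughness;
    float    metalness;
};

struct GBufferPixel
{
    uint32_t materialOffset;    // offset/index into the material table
    // ... packed UVs, normal, depth, etc.
};

// Constant material data is fetched once per pixel in the deferred pass rather
// than being redundantly written out during rasterization.
MaterialRecord fetchMaterial(const MaterialRecord* materialTable, GBufferPixel px)
{
    return materialTable[px.materialOffset];
}
```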
Absolutely. Fully featured GPU virtual memory and data addressing is the future. AMD is touting it with HSA, Nvidia is touting it with Kepler, and even ARM's Mali-T604 papers talk about GPU virtual memory. AMD's PRT OpenGL extensions are the first developer-controllable virtual memory API for GPUs. It's currently only available for texturing (and has pretty big 64 kB pages), but it's a very good first step. I hope we will soon have a unified 64-bit address space between CPU and GPU with same-sized (preferably small 4 kB) pages and total developer control over handling page faults and virtual mappings. That would allow us to do all kinds of crazy deferred rendering implementations.
Anyways, this is an interesting conversation but we're pretty off-topic for this thread... might be worth someone splitting this?
Agreed. This is indeed an interesting topic, and unfortunately something that hasn't been discussed enough.
--> Please someone move this discussion to its own thread. Thank you!