My understanding of how texture caches work is hopelessly out of date. Does anyone know of any papers that explore them on modern architectures?
The optimal design for a cache optimized for 4x bilinear sampling per clock (like Evergreen/Fermi) would, I'd guess, be something like this: fully associative, even/odd line ordered, 128-byte cache lines with 8x 128-bit banked ports, plus a couple of coalescing stages in the pipeline that accumulate several cycles' worth of texture accesses and form accesses with minimal bank conflicts. You could get away with not using banks, I guess (which saves you from putting 8 ports on the tag part of the cache), but they would be really nice for increasing the chances of getting hits on less neat accesses.
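To make the bank-conflict part concrete, here is a rough sketch of what a coalescing stage would have to count. It assumes 4-byte texels in a plain row-major layout and a bank index equal to the 16-byte word index modulo 8; none of that is taken from real hardware, and the function names (`bilinear_footprint`, `coalesce`, `cycles_needed`) are made up for illustration.

```python
from collections import defaultdict

LINE_BYTES = 128                       # cache line size from the post
BANK_BYTES = 16                        # one 128-bit port
NUM_BANKS = LINE_BYTES // BANK_BYTES   # 8 banks

def bilinear_footprint(x, y, pitch_bytes, texel_bytes=4):
    """Byte addresses of the 2x2 texels for one bilinear sample.
    Assumes a row-major linear layout (an assumption, not real hardware)."""
    return [(y + dy) * pitch_bytes + (x + dx) * texel_bytes
            for dy in (0, 1) for dx in (0, 1)]

def coalesce(addresses):
    """Merge accesses that land in the same 16-byte word, grouped per bank."""
    per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // BANK_BYTES
        per_bank[word % NUM_BANKS].add(word)
    return per_bank

def cycles_needed(addresses):
    """Each bank serves one word per cycle, so the busiest bank sets the cost."""
    per_bank = coalesce(addresses)
    return max((len(words) for words in per_bank.values()), default=0)

def quad_accesses(pitch_bytes):
    """All texel addresses touched by a 2x2 pixel quad of bilinear samples."""
    addrs = []
    for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        addrs += bilinear_footprint(x, y, pitch_bytes)
    return addrs

print(cycles_needed(quad_accesses(pitch_bytes=16)))    # rows fall in banks 0, 1, 2
print(cycles_needed(quad_accesses(pitch_bytes=1024)))  # all three rows hit bank 0
```

With a 1024-byte pitch the three texel rows of the quad all map to bank 0 and serialize over three cycles, while the 16-byte pitch case completes in one. That is the sort of conflict a smarter line ordering or bank mapping would be meant to avoid.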
What do you do with hits? Do you maintain a buffer of sample instructions and hits and just add the misses afterwards?
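One way to read that question is as an in-order queue: every sample request is recorded with a hit/miss flag, and results only retire from the front once earlier misses have been filled. The sketch below is just that reading, not a description of any actual GPU; `SampleQueue` and its methods are hypothetical names.

```python
from collections import deque

class SampleQueue:
    """In-order buffer of sample requests (hypothetical structure)."""

    def __init__(self):
        # Each entry is [request_id, cache_line, ready_flag].
        self.pending = deque()

    def issue(self, request_id, line, cached_lines):
        """Record a request; it is ready immediately if its line is cached."""
        ready = line in cached_lines
        self.pending.append([request_id, line, ready])
        return ready

    def fill(self, line):
        """A miss came back from memory: mark every waiting request ready.
        The caller is responsible for adding the line to its cache set."""
        for entry in self.pending:
            if entry[1] == line:
                entry[2] = True

    def retire(self):
        """Pop requests from the front while they are ready, keeping order."""
        done = []
        while self.pending and self.pending[0][2]:
            done.append(self.pending.popleft()[0])
        return done

cached = {1}
q = SampleQueue()
q.issue("a", 1, cached)   # hit
q.issue("b", 2, cached)   # miss
q.issue("c", 1, cached)   # hit, but stuck behind "b"
first = q.retire()        # only "a" can leave
q.fill(2)
cached.add(2)
rest = q.retire()         # "b" then "c"
```

The hit for "c" sits in the buffer until the miss for "b" is filled, so results come out in issue order at the cost of some buffering.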