"Thinking a bit about caches, I kind of wonder what a texture cache's efficiency is and how much a miss actually costs."
Texture caches themselves are very efficient, as texture access is typically quite regular and predictable (barring indirect access through a pixel shader that uses, say, a bumpmap to offset the final texels being looked up, in which case access patterns can become pretty chaotic). However, you still need to get the data into the texture cache before you can read from it, so basically every texturing operation involves a stall, which can last 1000+ clock cycles. To get around that, GPUs juggle thousands of these pixel (and vertex) job threads, most of which are bound to be stalled waiting for data at any one time. With enough of these threads in flight, enough data should in theory have trickled in that there is always something to do, and that keeps the hardware busy.
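To put a rough number on "enough threads", here is a back-of-the-envelope sketch in Python. It assumes a very simplified model that is mine, not anything a vendor documents: each thread does a fixed number of cycles of useful work, then waits a fixed number of cycles on a texture fetch, and the hardware can switch threads for free. The function name and the cycle counts are made up for illustration.

```python
import math

def threads_needed(stall_cycles: int, work_cycles_per_thread: int) -> int:
    """Threads that must be in flight so one of them always has work to do.

    Simplified model (an assumption): a thread runs for
    work_cycles_per_thread cycles, then waits stall_cycles on a fetch.
    While it waits, the other threads have to supply stall_cycles of work.
    """
    others = math.ceil(stall_cycles / work_cycles_per_thread)
    return 1 + others

# 1000-cycle stall, 10 cycles of math between fetches (illustrative numbers)
print(threads_needed(1000, 10))  # 101
# Less math per fetch means more threads are needed to cover the same stall
print(threads_needed(1000, 4))   # 251
```

Real hardware schedules in batches and stalls vary a lot, so treat this as an order-of-magnitude picture only. It does show why shaders that do little math per texture fetch need far more threads in flight.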
Because of all this juggling, it's probably very difficult, if not outright impossible, to accurately measure the latency of any one single set of pixel operations. It's probably not something AMD, Nvidia etc. document publicly (low-level hardware information is often surrounded by trade secrets and all that jazz), and besides, the GPU schedules these threads by itself. While the scheduling is presumably controlled by an algorithm of some sort (which may again be an undocumented trade secret), there's probably a lot of flexibility in the order in which it actually completes each batch of threads/pixels, which makes any single measurement unpredictable.
So AFAIK we don't know exact latencies, and it's probably hard to find out, but then again we don't really have to know either. It's not that interesting a number, except for the engineers who work on designing these things in the first place. As users, we want smooth framerates, so what counts is that drawing of each frame finishes in a short, even timespan.
"Perhaps it is possible to estimate how much texel bw is needed for 1280x720 anyways."
I believe the base formula per pixel is four texels per MIP level (bilinear) times two MIP levels for trilinear, and more, or possibly less, for anisotropic filtering. So generally 8 texels = 32 bytes for a 32-bit RGBA texture map, and more for "deep" format textures. But then there's texture compression, so you'll rarely hit these high numbers except when reading from render target buffers, which will be uncompressed since you're generating them in realtime. Repeat until you've textured every pixel of the whole screen. Of course, this doesn't include multitexturing, in which case you need to multiply by the number of layers per pixel for that particular polygon. ...And then there's overdraw, but that's highly variable, so it's hard to put any single number on it.
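Plugging that formula into a few lines of Python for 1280x720 gives a rough figure. The 60 fps, the single texture layer and the overdraw of 1.0 are my assumptions for the sake of the example, and the function name is made up. Note this counts the texels the filter asks for, not actual memory traffic: neighbouring pixels share most of their texels, so the texture cache serves a large part of this.

```python
def texel_bytes_per_frame(width, height, texels_per_pixel=8,
                          bytes_per_texel=4, layers=1, overdraw=1.0):
    """Bytes of texel data requested per frame, uncompressed.

    Defaults follow the post: trilinear = 4 texels x 2 MIP levels = 8
    texels, 32-bit RGBA = 4 bytes per texel. layers and overdraw are
    simple multipliers (assumed values, highly scene dependent).
    """
    return int(width * height * texels_per_pixel * bytes_per_texel
               * layers * overdraw)

per_frame = texel_bytes_per_frame(1280, 720)
per_second = per_frame * 60  # assuming 60 fps

print(per_frame)         # 29491200 bytes, about 28 MiB per frame
print(per_second / 1e9)  # 1.769472, so roughly 1.8 GB/s
```

With two texture layers and an overdraw of 1.5, the same call gives three times that, which shows how quickly the multipliers stack up.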
...Or my math's off, but then hopefully one of the 3D wizards in this forum will come flying in and stomp all over me.