The below is flawed. With trilinear filtering you always sample from one mip level that is at least as detailed as the screen LOD (n >= 1 below) and from one mip level that is less detailed than the screen LOD (n < 1).
Code:
[Bilinear]
Mip level a: n .... 2n .... 4n
Mip level b: n .... 2n .... 4n
Mip level c: n .... 2n .... 4n
Mip level d: n .... 2n .... 4n
[Trilinear]
Mip level a: n .... 2n .... 4n .... 8n
Mip level b: 0.5n .... n .... 2n .... 4n .... 8n
Mip level c: 0.5n .... n .... 2n .... 4n .... 8n
Mip level d: 0.5n .... n .... 2n .... 4n
Region A: |-------|
Region B: |-------|
Region B is too detailed. It should be:
Code:
[Trilinear]
Mip level a: n .... 2n .... 4n
Mip level b: 0.25n .... 0.5n .... n .... 2n .... 4n
Mip level c: 0.25n .... 0.5n .... n .... 2n .... 4n
Mip level d: 0.25n .... 0.5n .... n .... 2n .... 4n
Region B1: |--------|
Region A1: |--------|
Region B2: |--------|
Note that the units are in the size of the mip level in AREA, not length. Hence, each mip is 1/4 the size of its neighbor.
The LOD to mipmap area mapping is NOT linear, so we can't just average this. In addition, the screen area to LOD mapping isn't linear, which complicates things further. Thus the "obvious" average of 3.125 texels per pixel does not apply. If we factor out the LOD to texel area mapping of n^2 we get (2.25^2) texels per pixel, but this is again flawed because the screen to LOD mapping is not linear, and is scene dependent (looking down a cylindrical tunnel? Or two infinite planes?)
However, these nonlinearities can be factored out, since we are only looking for Trilinear texture samples / Bilinear texture samples ratio.
In this case, Trilinear covers EXACTLY 5x the texel area per filtered pixel. At every point, it samples the bilinear filter texel area plus the next more detailed mip level, which is always 4x the detail in texel count. Bilinear filtering is not defined as sampling the "closest" mip level, but rather the mip level that is less than or equal to the screen LOD.
Trilinear does a weighted average of this and the next most detailed mip level, which is by definition always greater than or equal in detail to the screen LOD, up to a max of four texels/pixel for this higher detail mipmap.
(ok, so it's clamped at just under 4 texels/pixel).
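To make the 5x figure concrete, here is a rough sketch of the arithmetic. The function name and the idea of passing the low-detail mip's texel density as a parameter are just my own illustration, not anything a real API exposes:

```python
def texel_area_per_pixel(low_density):
    """Texel area touched per filtered pixel, in texels/pixel.

    low_density: texels per pixel in the less detailed mip level
    (hypothetical parameter; somewhere between 0.25 and 1).
    """
    # The next more detailed mip level has 4x the texel count by area.
    high_density = 4.0 * low_density
    bilinear = low_density                   # one mip level only
    trilinear = low_density + high_density   # both mip levels
    return bilinear, trilinear


for density in (0.25, 0.5, 0.99):
    bilinear, trilinear = texel_area_per_pixel(density)
    print(density, trilinear / bilinear)
```

Whatever density you plug in, the ratio comes out as 5, which is why the nonlinear mappings drop out of the comparison.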
However, what does the texel area covered per pixel have to do with the texel bandwidth required?
Almost nothing, due to texture caches, yet a lot, due to texture caches.
The texture cache works equally well for both mip levels as long as it is designed intelligently (favors the larger mipmap if there is a collision or is always large enough to fit both).
How large does this cache have to be per texel pipe? Well, let's imagine a 32 pixel wide tile (all recent graphics hardware renders in tiles to get the best texel reuse from a cache).
The first line of rendering fetches at most (32 + 1) * 2 low res samples + 32 * 4 high res samples.
But each following line only requires (32 + 1) * 1 low res samples (the other possibly required texels were already fetched for the first rendered line)
and 32 * 4 high res samples, which in the worst case are never re-used because each pixel corresponds to exactly 4 texels.
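The per-line counts above work out like this. It is only a sketch of the worst-case arithmetic; the function name is made up for the example:

```python
def texels_fetched_per_line(tile_width):
    """Worst-case texel fetches for the first and for each following
    scanline of a tile, using the counts from the post."""
    low_first = (tile_width + 1) * 2   # two rows of low res texels
    low_next = (tile_width + 1) * 1    # one new row; the other is cached
    high = tile_width * 4              # 4 high res texels per pixel, no reuse
    return low_first + high, low_next + high


first, following = texels_fetched_per_line(32)
print(first, following)  # 194 161
```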
Then why cache these? Because this is the worst case; most of the time the higher res mip level does reuse texels, at a rate of at least 50%.
Overall, the total space needed to cache efficiently for trilinear filtering is slightly more than 5*2*Tile width, or 320 texels in the example above.
This number is per concurrent texture, per pipeline. That is a little more than 1K for a single-texturing, single-pipeline hardware renderer.
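As a quick check on the cache size, here is the same estimate in code. The 4 bytes per texel (32-bit texels) is my assumption to get from texels to bytes, and the function name is hypothetical:

```python
def trilinear_cache_size(tile_width, bytes_per_texel=4):
    """Rough cache size per concurrent texture, per pipeline.

    bytes_per_texel=4 assumes 32-bit texels (an assumption, not
    something fixed by the hardware being discussed).
    """
    texels = 5 * 2 * tile_width
    return texels, texels * bytes_per_texel


texels, size_bytes = trilinear_cache_size(32)
print(texels, size_bytes)  # 320 1280
```

Multiply by the number of concurrent textures and pipelines to scale it up for a multitexturing, multi-pipe part.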
Now what people were asking about was bandwidth, but this can't really be calculated: the worst case is 5x the bandwidth, but this is very unlikely. It depends on the scene, and especially on the texture reuse. If the cache is large enough, and many triangles and/or tiles worth of things are rendered that use the same textures, then the reuse will be high. With maximal reuse, only a few border cases call for texture reloads, and the 5x calculation depends on a full reload, not just a few missing samples that may be in one mip level or another and are not "tied" together in the 4:1 ratio that brings about the 5x ratio.
We are also not factoring in one very important effect: the high detail mip of one triangle can be the low detail mip of another, and vice versa. This will cause the apparent 5x bandwidth ratio to break, because it is a texture cache benefit that helps trilinear filtering but not bilinear filtering, which when shifting mip levels has to fetch new texels rather than retrieve older ones already in the cache.
The best case scenario is when an entire mip pyramid is required to render, say, an infinite plane stretching to the horizon. In this case, during rendering bilinear must fetch the whole texture, and all mip levels, at some point.
The same is true of trilinear. If cached well enough, both use exactly the same bandwidth: the whole texture mip pyramid, once.
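For a sense of scale, the whole mip pyramid is only about a third larger than the base level. A small sketch, assuming a square power-of-two texture (the function name is just for illustration):

```python
def pyramid_texels(base_size):
    """Total texels in a full mip pyramid for a square texture whose
    side is a power of two (assumption), from base level down to 1x1."""
    total = 0
    size = base_size
    while size >= 1:
        total += size * size
        size //= 2
    return total


base = 256
total = pyramid_texels(base)
print(total, total / (base * base))
```

In this best case that total is what both bilinear and trilinear end up fetching, once each.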