A small texture cache is extremely efficient for doing bilinear filtering. The exact figure varies with the angle and position of the surface (and any LOD bias), but the basic mipmap selection algorithm works out such that each additional bilinear fragment requires around one texel fetch from memory. In other words, with a texture cache, bilinear's bandwidth needs are just about the same as for point sampling.
This may seem a bit counterintuitive at first. Here's a quick, simplified (and not entirely correct) thought experiment that might help out. Imagine you want to texture a 100*100 pixel square parallel to the screen (i.e. "2d"). Using point sampling, it would be perfect if the texture you were using were 100*100. What about if you were using bilinear filtering? What size texture would you want then?
Well, if it were a 1*1 square you'd want a 4-texel (2*2) texture, obviously. If it were a 2*2 square you'd want 9 texels--a 3*3 texture. And so on. For the 100*100 pixel square you want a 101*101 texel texture. In the limit, you fetch 1 new texel for each rasterized pixel. And, in the limit, each texel gets sampled 4 times--that's 4 samples for the bandwidth price of 1, as long as it doesn't get evicted from the cache first. And how big would the cache have to be to prevent that? Well, if you think about it, not very big--just big enough to hold two texture scanlines' worth of texels.
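If you want to see the numbers fall out, here's a quick Python sketch of that thought experiment (it just counts texel addresses, it has nothing to do with how real hardware works): it walks a 100*100 screen-aligned quad in scanline order, does the four bilinear taps per pixel, and counts how many taps miss a tiny LRU cache sized to hold two texture scanlines.

```python
from collections import OrderedDict

def simulate(n, cache_texels=None):
    if cache_texels is None:
        cache_texels = 2 * (n + 1)           # roughly two texture scanlines
    cache = OrderedDict()                    # LRU set of texel addresses
    taps = misses = 0
    for y in range(n):                       # scanline order, like a rasterizer
        for x in range(n):
            # Bilinear footprint for a 1:1 mapping: the 2*2 texel block
            # at (x, y)..(x+1, y+1).
            for ty in (y, y + 1):
                for tx in (x, x + 1):
                    taps += 1
                    key = (tx, ty)
                    if key in cache:
                        cache.move_to_end(key)         # cache hit, refresh LRU
                    else:
                        misses += 1                    # texel fetched from memory
                        cache[key] = True
                        if len(cache) > cache_texels:
                            cache.popitem(last=False)  # evict least recently used
    return taps, misses

taps, misses = simulate(100)
print(f"taps per pixel:    {taps / 100**2:.2f}")     # 4.00
print(f"fetches per pixel: {misses / 100**2:.2f}")   # ~1.02
```

Run it and you get 4 taps per pixel but only about 1.02 memory fetches per pixel--the 4-samples-for-the-price-of-1 result above.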
Of course under real-world conditions the mipmap selection algorithm is much more complicated, since it needs to take into account the distance from the viewer and the angle of the surface in view space. Plus there's the fundamental difference that mipmaps only come in certain sizes--there isn't a "perfect size" texture just lying around for you to sample. But the overall point is that it averages out to around one texel fetch per pixel, and the above analysis gives a hint of why this is correct, or at least plausible.
Ok, so we've established that bilinear isn't really a bandwidth hit over point sampling. What about trilinear? Trilinear uses the same mipmap selection algorithm as bilinear, except that it samples from the two nearest mipmap levels--one on either side of the ideal size--instead of just the nearest one. In theory you'd think this would lead to exactly twice the required memory bandwidth. In practice the actual amount is a bit less than twice, because there exists a largest mipmap (namely the base texture): once the ideal size is larger than the base texture, both samples come from that same level.
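To make the two-taps-plus-a-blend structure concrete, here's a minimal Python model of a trilinear lookup. It's a software sketch, not how any particular GPU wires it up; mip_levels, bilinear_sample and the toy texture at the end are all made up for illustration.

```python
import math

# Toy trilinear lookup: mip_levels is a hypothetical list of square greyscale
# images (level 0 = base texture), each a list of rows; u, v are in [0, 1].

def bilinear_sample(level, u, v):
    size = len(level)
    x, y = u * (size - 1), v * (size - 1)            # texel-space position
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, size - 1), min(y0 + 1, size - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bot = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bot * fy                 # 4 texel reads

def trilinear_sample(mip_levels, u, v, lod):
    lod = max(0.0, min(lod, len(mip_levels) - 1))    # clamp to existing levels
    lo = math.floor(lod)
    frac = lod - lo
    if frac == 0.0:
        # e.g. magnification, where lod clamps to 0: one bilinear tap is
        # enough -- part of why trilinear costs "a bit less than twice".
        return bilinear_sample(mip_levels[lo], u, v)
    a = bilinear_sample(mip_levels[lo], u, v)        # bilinear tap #1
    b = bilinear_sample(mip_levels[lo + 1], u, v)    # bilinear tap #2
    return a * (1 - frac) + b * frac                 # blend between the levels

# Toy 2*2 base texture plus its 1*1 mip, sampled at the centre with lod 0.3.
mips = [[[0, 255], [255, 0]], [[128]]]
print(trilinear_sample(mips, 0.5, 0.5, 0.3))         # 127.65
```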
But it's important to realize that every GPU out there (except for the original GeForce 256, and that was due to a bug) is capable of one bilinear sample per TMU per clock. In other words, you need two TMU-clocks per trilinear fragment. And it will probably always be this way, even though no one in their right mind (um, except apparently ATI and Nvidia) would dream of using bilinear over trilinear in this day and age. The reason is that there are many ways textures are used other than just as colors to slop onto surfaces; and some of these ways have a use for bilinear filtering but no use for trilinear (e.g. light maps). (Similarly, some can't be used with any linear filtering, e.g. normal maps.) The obvious design compromise, and the one taken by every GPU, is to make bilinear the one-per-TMU-clock operation.
So while trilinear almost doubles the texture bandwidth requirements...it actually does double the fillrate requirements. And it does nothing to the other per-fragment bandwidth costs, like color writes and z read/writes. In other words, although trilinear significantly raises the required bandwidth per pixel, it actually lowers the required bandwidth per clock. Trilinear samples are well-behaved with respect to the texture cache, as well. (Although you might need a cache almost twice the size.)
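A toy calculation makes the per-pixel vs. per-clock distinction explicit. The byte counts below are illustrative assumptions only--32-bit texels, one 32-bit color write plus a z read/write per pixel, and the roughly-one-fetch-per-bilinear-tap-set result from the cache argument above:

```python
# Back-of-the-envelope numbers only: 32-bit texels, 32-bit color write,
# 32-bit z read + write per pixel, and ~1 texel fetched per bilinear tap-set.

cases = {
    # name: (texel fetches per pixel, clocks per pixel per TMU)
    "point":     (1.0, 1),
    "bilinear":  (1.0, 1),
    "trilinear": (2.0, 2),   # really "a bit less than 2" fetches
}

BYTES_PER_TEXEL = 4
COLOR_WRITE = 4
Z_TRAFFIC = 8                # z read + z write

for name, (texels, clocks) in cases.items():
    per_pixel = texels * BYTES_PER_TEXEL + COLOR_WRITE + Z_TRAFFIC
    per_clock = per_pixel / clocks
    print(f"{name:9s}  {per_pixel:5.1f} bytes/pixel  {per_clock:5.1f} bytes/clock")
```

With these (made-up but plausible) numbers, going from bilinear to trilinear raises the per-pixel bandwidth from 16 to 20 bytes, but drops the per-clock bandwidth from 16 to 10.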
If you analyze aniso, it comes out much the same. More samples and thus more bandwidth, but at the cost of more fillrate resources. Anisotropic samples may be less well behaved with respect to texture caches, depending on the sample distribution. But that's probably not enough to offset the much lower bandwidth-per-clock cost of spending so many clocks on each pixel. (Remember, AF is only applied to those pixels that need it, and only to the degree that they need it.)
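The per-pixel nature of that cost is easy to see if you sketch how a tap count might be chosen per pixel. This is a generic scheme based on the ratio of the screen-space footprint axes, not any particular GPU's algorithm, and the derivative values in the example are made up:

```python
import math

def aniso_taps(dudx, dvdx, dudy, dvdy, max_aniso=16):
    # Lengths of the texel footprint along the two screen axes.
    len_x = math.hypot(dudx, dvdx)
    len_y = math.hypot(dudy, dvdy)
    longer = max(len_x, len_y)
    shorter = max(min(len_x, len_y), 1e-8)
    ratio = min(longer / shorter, max_aniso)   # degree of anisotropy
    # Number of bilinear/trilinear probes along the major axis;
    # each probe costs roughly one (or two) TMU-clocks.
    return max(1, math.ceil(ratio))

# A surface seen head-on needs 1 tap; one seen at a steep angle needs many.
print(aniso_taps(1/256, 0, 0, 1/256))   # 1 tap
print(aniso_taps(1/256, 0, 0, 8/256))   # 8 taps
```

Screen-aligned pixels get one tap and pay nothing extra; only the stretched pixels pay for more, and they pay in clocks as well as in fetches.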
To sum up: although at first glance it would appear that better texture filtering would require greater memory bandwidth resources, in reality the opposite is true. The biggest cost is the on-chip logic, buses and cache to support sampling 4 texels per TMU per clock. And as you scale up to better filtering--trilinear and then anisotropic--the bandwidth costs rise more slowly than the fillrate requirements, at least with any sensible design.
End result--better filtering does not require more bandwidth (per clock)!