Is bilinear filtering "free"?

mikegi · Jun 29, 2008

I remember reading somewhere that bilinear texture filtering is effectively "free" on modern GPUs. Is that basically correct? For example, I'm working with a scalar texture (L16) in HLSL and want to perform some specific weighting of the values along the x-axis but simple linear filtering in the y-axis.

Originally, I went with point sampling to gather 4 texels in row1 and 4 texels in row2, weighted them, then linearly interpolated between the two. This meant 8 tex2D() calls and a bunch of math. When I tried to expand the filtering to 8 texels in each row, the 16 tex2D() calls + one tex1D() color lookup call made fxc choke with a "texture dependency chain is too complex" error when compiling for SM2.0.

I wasn't happy with all the tex2D() calls and the fact that the 8X sampling would only work on SM3+ so I decided to turn on bilinear filtering on the texture, force the x-axis coord to be on the texel center, but leave the y-axis coord unmodified so that the bilinear filtering in the GPU would do it for me. This cut out quite a few instructions in the pixel shader along with half the tex2D calls.
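The snapping trick described above can be sketched on the CPU. This is a minimal Python emulation, not shader code: it assumes unnormalized texture coordinates with texel centers at i + 0.5 and clamped edges, and the helper names (bilinear, snap_x, sample_column_lerp) are made up for illustration.

```python
import math

def bilinear(tex, x, y):
    """Bilinear sample of a 2D list `tex` (rows of scalars) at unnormalized
    coordinates (x, y). Texel centers sit at i + 0.5; edges are clamped."""
    h, w = len(tex), len(tex[0])
    xb, yb = x - 0.5, y - 0.5
    i, j = math.floor(xb), math.floor(yb)
    a, b = xb - i, yb - j  # fractional weights along x and y

    def t(ii, jj):
        # Clamp-to-edge addressing (an assumption of this sketch).
        return tex[min(max(jj, 0), h - 1)][min(max(ii, 0), w - 1)]

    return ((1 - a) * (1 - b) * t(i, j) + a * (1 - b) * t(i + 1, j)
            + (1 - a) * b * t(i, j + 1) + a * b * t(i + 1, j + 1))

def snap_x(x):
    """Move x onto the center of the texel that contains it."""
    return math.floor(x) + 0.5

def sample_column_lerp(tex, x, y):
    """With x snapped to a texel center the x weight is 0, so the bilinear
    unit reduces to a pure linear interpolation between two rows."""
    return bilinear(tex, snap_x(x), y)

tex = [[0, 10, 20],
       [100, 110, 120]]
value = sample_column_lerp(tex, 1.3, 1.0)  # halfway between rows 0 and 1, column 1
```

Because the x weight collapses to zero, any x inside the same texel gives the same result, and one filtered fetch replaces two point-sampled fetches plus a lerp.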

Obviously, I would have to write a short test app to determine which approach offers the best performance. However, I have to think that a highly optimized GPU function (bilinear filtering) has to be better than all those extra arithmetic and point-sampled tex2D() instructions.

Any thoughts?

Thanks,
Mike

MDolenc · Jun 29, 2008

Yes, bilinear filtering is effectively free (there is virtually no additional cost for filtering) as long as you don't thrash the texture cache.

Enforcer · Jun 29, 2008

Hardware filtering may have limited precision, depending on hardware.
For example:
Tex3D returned only about 8-10 bits of precision when sampling an L8 texture on G71 (GeForce 7900), even when texcoords were fp32. (Not sure about Tex2D.)

It works much better now on G92 in DX10, but probably still with less precision than lerp().

And yes it's free in 2D!

mikegi · Jun 30, 2008

I've noticed that the hardware bilinear filtering on ATI GPUs (up to my 3870) is far more grainy than on Nvidia GPUs (7900) when zooming in tightly.

I've been testing the new technique and it appears to make a noticeable difference in performance on my Go7800, so I'll stick with it and toss the completely manual filtering I tried first.

Note that this is a scientific app, not a game, so data display accuracy and quality are most important. That might appear to conflict with the type of filtering I'm talking about in this thread (to say the least) but I'm working with data samples that have a continuously varying aspect ratio (1:8 up through 8:1 or more). Simple bilinear filtering can't handle the extreme aspect ratios so I needed an adaptive technique.

Thanks for the info.

sebbbi · Jun 30, 2008

If you only need your program to work on ATI hardware, you can use Fetch4 (proprietary ATI DX9) or Gather (DX10.1) functionalities. Gather gets the four samples (red component only) that would be used for bilinear interpolation when sampling a texture (2x2 single channel texels packed to argb channels). It's very useful if you need to sample adjacent texels in a single channel format (such as the L16 you are using).
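As a rough illustration of what a gather-style fetch provides, here is a minimal Python emulation. It assumes unnormalized coordinates with texel centers at i + 0.5 and clamped edges; the function names and the (t00, t10, t01, t11) ordering are invented for this sketch and do not reflect the channel packing of real Fetch4 or Gather hardware.

```python
import math

def gather4(tex, x, y):
    """Return the 2x2 texel footprint that bilinear filtering would read at
    (x, y), together with the fractional weights (a, b).
    Ordering (t00, t10, t01, t11) is an assumption of this sketch."""
    h, w = len(tex), len(tex[0])
    xb, yb = x - 0.5, y - 0.5
    i, j = math.floor(xb), math.floor(yb)

    def t(ii, jj):
        # Clamp-to-edge addressing.
        return tex[min(max(jj, 0), h - 1)][min(max(ii, 0), w - 1)]

    texels = (t(i, j), t(i + 1, j), t(i, j + 1), t(i + 1, j + 1))
    return texels, (xb - i, yb - j)

def manual_bilinear(texels, weights):
    """Full-precision bilinear blend of four gathered texels."""
    t00, t10, t01, t11 = texels
    a, b = weights
    top = t00 + (t10 - t00) * a      # lerp along x in the upper row
    bottom = t01 + (t11 - t01) * a   # lerp along x in the lower row
    return top + (bottom - top) * b  # lerp along y

tex = [[1.0, 2.0],
       [3.0, 4.0]]
texels, weights = gather4(tex, 1.0, 1.0)
result = manual_bilinear(texels, weights)
```

With the four raw texels in hand, the x weighting can be replaced by any custom function while the y axis keeps a plain lerp, which is the split described in the first post.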

Also, you can become bilinear-filtering limited on newer chips too, even on ATI's new 4000 series. Here is a quote from Rage3D's review:

Fillrate

We've tested the RV770's fillrate with a number of apps, and always came up with numbers in the (19,20) GigaTexel interval...too low for a 10 TU/40 TA&TF part - should be near the 25 GigaTexel mark (625*40/1000). For all those of you who cried "FOUL!", relax for a bit - we actually went and asked ATi why we were getting this behavior and they explained that in spite of having 10 TUs on chip, there are only 32 bilinear interpolators, so the INT8 bilinear rate will be limited by that. Once you get filtering limited, with Anisotropic Filtering, or by filtering FP textures, the architecture should behave like a 10 TU part. We've verified this using FP texturing, because that has also allowed us to check the assumption that the RV770 does FP filtering at half-rate
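The arithmetic in the quote is easy to check. The sketch below uses only the figures quoted above (625 MHz core clock, 40 addressing/filtering units, 32 bilinear interpolators); the function name is hypothetical.

```python
def gigatexels_per_second(core_mhz, units_per_clock):
    """Peak rate if every unit produces one texel per clock."""
    return core_mhz * units_per_clock / 1000.0

# Expected peak if all 40 units could filter each clock.
addressing_limited = gigatexels_per_second(625, 40)

# Peak when only 32 bilinear interpolators are available.
interpolator_limited = gigatexels_per_second(625, 32)
```

The interpolator-limited figure works out to 20 GigaTexels/s, which sits at the top of the measured (19,20) interval, while the 40-unit figure gives the 25 GigaTexel mark the reviewers expected.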

Enforcer · Jun 30, 2008

I found some official info about precision, this is from CUDA but should give some idea about current or future GPUs:
http://developer.download.nvidia.com/compute/cuda/2.0-Beta2/docs/Programming_Guide_2.0beta2.pdf

Code:

 tex(x, y) = (1−α)(1−β)T[i, j] + α(1−β)T[i+1, j] + (1−α)βT[i, j+1] + αβT[i+1, j+1]
for a two-dimensional texture,
...
for a three-dimensional texture,
...
 i = floor(xB), α = frac(xB), xB = x − 0.5,
 j = floor(yB), β = frac(yB), yB = y − 0.5,
 k = floor(zB), γ = frac(zB), zB = z − 0.5.

α, β, and γ are stored in 9-bit fixed point format with 8 bits of fractional value.

Coordinates -> 8 bit...(samples fp32)
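To get a feel for what 8 fractional bits of weight precision means, here is a minimal Python sketch that quantizes the interpolation weight before blending. Round-to-nearest is an assumption (the quoted text does not say whether hardware rounds or truncates), and the function names are hypothetical.

```python
import math

def quantize_weight(w, frac_bits=8):
    """Snap a weight in [0, 1] to fixed point with `frac_bits` fractional
    bits. Round-to-nearest is an assumption of this sketch."""
    scale = 1 << frac_bits
    return math.floor(w * scale + 0.5) / scale

def quantized_lerp(t0, t1, w, frac_bits=8):
    """Blend two full-precision texels using a quantized weight."""
    return t0 + (t1 - t0) * quantize_weight(w, frac_bits)

def distinct_levels(frac_bits, steps=4096):
    """Count how many different weights survive a fine sweep from 0 to 1."""
    return len({quantize_weight(k / steps, frac_bits) for k in range(steps + 1)})

levels_8bit = distinct_levels(8)   # weights 0/256 .. 256/256
levels_6bit = distinct_levels(6)   # weights 0/64 .. 64/64
mid = quantized_lerp(0, 65535, 0.5)
```

However finely the texture coordinate varies between two texel centers, only 257 distinct blend weights exist at 8 fractional bits, so the filtered value moves in steps even though the samples themselves are full precision.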

cho · Jun 30, 2008

http://www.pcinlife.com/article/graphics/2008-06-30/1214766055d534_4.html

mikegi · Jun 30, 2008

Here are a couple of images showing the differences between ATI hardware bilinear sampler and Nvidia's when I zoom in tightly on the data. The first is from my Nvidia Go7800:

http://www.grlevelx.com/downloads/bilinear_nvidia7800.png

Not bad at all. A little choppy on the upper-right area. Now here's the same data and zoom level on my ATI 3870:

http://www.grlevelx.com/downloads/bilinear_ati3870.png

Ugly! You can clearly see the banding in the sampling in both x and y.

From Enforcer's post, it looks like Nvidia basically subdivides a texel into 256x256 point samples. I wonder what ATI does. I'm not sure where the "graininess" comes into the ATI bilinear sampler. They might use less precision in the position interpolator or less precision in the texture value equations. I know that performing the bilinear filtering manually in a pixel shader eliminates all the graininess in the hardware bilinear filtering.

Mike

CarstenS · Jul 11, 2008

Might be worth a try: Disable "Catalyst A.I." in the driver's options - maybe there's some... unexpected behaviour wrt your application.

According to AMD, A.I. is not supposed to reduce image quality. If it does, it's a bug.

Mintmaster · Jul 12, 2008

CarstenS said:
Might be worth a try: Disable "Catalyst A.I." in the driver's options - maybe there's some... unexpected behaviour wrt your application.

I can't see how driver settings have anything to do with what mikegi is seeing. This is a hardware deficiency.

ATI has about 1.5 fewer bits of precision when determining the filter weights. It's rarely a problem for image quality because it's not often that textures are that magnified with that much contrast between adjacent texels. Generally you only use filtering for games, as even NVidia's precision is insufficient for anything beyond that.

mikegi, I assume you used a CMP function on one channel of a texture to choose between yellow and green, right? Try using the alpha channel to see if there's any difference. If not, then there's a real possibility of alpha tested foliage or fences looking more jagged on ATI's hardware than NVidia's when it's close to the camera.

CarstenS · Jul 12, 2008

Mintmaster said:
I can't see how driver settings have anything to do with what mikegi is seeing. This is a hardware deficiency.

Some levels of A-Interference allegedly force some kind of texture compression - possibly not a lossless one. Thought that running a quick test with just a switch turned the other way might be worth a try. *shrugs*

mikegi · Jul 15, 2008

Mintmaster said:
mikegi, I assume you used a CMP function on one channel of a texture to choose between yellow and green, right? Try using the alpha channel to see if there's any difference. If not, then there's a real possibility of alpha tested foliage or fences looking more jagged on ATI's hardware than NVidia's when it's close to the camera.

No, I do the bilinear sampling on the L16 data texture then use the result to look up the output color in a 1D color table texture.

I'm not sure if the grainy ATI bilinear filtering is due to the hardware tex coord interpolation or the hardware data filtering (or both). As I noted earlier, if I do the bilinear filtering manually in the pixel shader then the result is perfectly smooth.

I'd say this is just an academic exercise but I actually saw the ATI grainy output on TV when a user zoomed in very close to a storm cell (my program is a weather radar viewer).

Mike

Mintmaster · Jul 16, 2008

mikegi said:
No, I do the bilinear sampling on the L16 data texture then use the result to look up the output color in a 1D color table texture.

I'm not sure if the grainy ATI bilinear filtering is due to the hardware tex coord interpolation or the hardware data filtering (or both). As I noted earlier, if I do the bilinear filtering manually in the pixel shader then the result is perfectly smooth.

To do bilinear filtering manually you need the texcoord, so that must be fine. It's the filtering units.

It's very rare that filtering weight precision is a problem. Even EVSM (exponential variance shadow mapping) and SAT-VSM, which are AFAIK the graphics techniques that need more texture precision than any other, don't really care about the precision of the weights.

For your application, is NVidia's precision good enough? If L16 is your data source for your lookup, manual filtering makes sense to me. I doubt you'll see an inordinate performance drop. ATI has Fetch4 as well to simplify everything.

mikegi · Jul 17, 2008

Nvidia's filtering is good enough and ATI's is fine 95+% of the time. Note that in the images I posted earlier you're looking at just a couple of texels. That's the level of zoom where the chunkiness becomes visible on ATI. I don't imagine that many 3D programs would do such a thing so you never see the precision problem in them.
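A back-of-the-envelope way to see why the steps only show up at extreme zoom: if the filter weight has a fixed number of fractional bits, each weight step covers a fixed fraction of a texel, and that fraction only exceeds a screen pixel once a texel spans many pixels. The sketch below is illustrative arithmetic; the 512-pixel texel span is an assumed figure, and 8 fractional bits is the value from the CUDA guide quoted earlier in the thread.

```python
def pixels_per_weight_step(texel_span_px, frac_bits):
    """Screen pixels covered by one filter-weight step when a single texel
    is magnified to `texel_span_px` pixels on screen."""
    return texel_span_px / (2 ** frac_bits)

# Assumed zoom: one texel stretched across 512 screen pixels.
step_at_8_bits = pixels_per_weight_step(512, 8)

# Typical magnification: one texel across 4 pixels, steps are sub-pixel.
step_at_low_zoom = pixels_per_weight_step(4, 8)
```

At the assumed zoom each weight step is 2 pixels wide and can become visible, whereas at ordinary magnification a step is a small fraction of a pixel; fewer weight bits would widen the steps further.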
