Angle-independent AF on GeForce 6

Hyp-X

I played around with pixel shaders a bit and found a (not too fast) way to achieve angle-independent anisotropic filtering on cards that don't support this feature.
It needs PS 2.x so unfortunately it doesn't work on currently available Radeon cards.

These are some of the results on a GeForce 6800 GT:

Original AF 16x


Angle-independent AF 16x


The trick is done with the following shader function:
Code:
float4 tex2D_ai(sampler2D tx, float2 coord)
{
	// screen-space derivatives of the texture coordinates
	float2 dx = ddx(coord);
	float2 dy = ddy(coord);
	// (a, b) is proportional to (cos, sin) of twice the angle of anisotropy
	float a = dot(dx, dx) - dot(dy, dy);
	float b = 2*dot(dx, dy);
	// normalize: a = cos(2*alpha)
	a = a / sqrt(a*a + b*b);
	// half-angle formulas: b = sin(alpha), a = cos(alpha)
	b = sqrt(0.5 - 0.5*a) * sign(b);
	a = sqrt(0.5 + 0.5*a);
	// rotate the derivatives by alpha and sample with them (texldd)
	return tex2D(tx, coord, dx * a + dy * b, dy * a - dx * b);
}

It's only good for square textures in its current form; it needs some scaling for rectangles (one possible fix is sketched below).
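For illustration, here is one way the rectangle scaling could look - a minimal sketch, not Hyp-X's code; the texSize parameter (texture dimensions in texels) is an assumed addition:
Code:
float4 tex2D_ai_rect(sampler2D tx, float2 coord, float2 texSize)
{
	float2 dx = ddx(coord);
	float2 dy = ddy(coord);
	// measure the anisotropy in texel space, where texels are
	// square, so non-square textures get the correct angle
	float2 sdx = dx * texSize;
	float2 sdy = dy * texSize;
	float a = dot(sdx, sdx) - dot(sdy, sdy);
	float b = 2*dot(sdx, sdy);
	a = a / sqrt(a*a + b*b);
	b = sqrt(0.5 - 0.5*a) * sign(b);
	a = sqrt(0.5 + 0.5*a);
	// the rotation still mixes the original UV-space derivatives
	return tex2D(tx, coord, dx * a + dy * b, dy * a - dx * b);
}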

I might release the test program after some cleanup.
 
Hyp-X said:
It needs PS 2.x so unfortunately it doesn't work on currently available Radeon cards.

What part of that is not supported by PS 2.0 or PS 2.0b?
 
What part of that is not supported by PS 2.0 or PS 2.0b?
ddx, ddy and tex2D with derivatives are not available on PS_2_0. They are available in PS_2_x though.
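For reference - assuming the DirectX 9 SDK's fxc compiler, with hypothetical file and entry-point names - compiling against the extended profile looks like:
Code:
fxc /T ps_2_a /E main shader.hlsl
Targeting ps_2_0 (or ps_2_b) instead makes the compiler reject the ddx/ddy/texldd instructions.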


Great, how is the performance?
That tex2D instruction with explicit derivatives takes 7 cycles on GeForce 6 and GeForce 7. Those sqrt() functions will take about a cycle each. So 14 cycles for the whole thing doesn't seem all that far-fetched.
 
I'll post some images tomorrow, as I'm sitting in front of an X800 at the moment.

I've plugged the code into our game shaders somewhere, and the performance hit was (unsurprisingly) quite large; I'll try to give some numbers later.
 
BRiT said:
I just assumed those were other functions you didn't list... Though it seems like tex2D(s,t,ddx,ddy) is supported in the normal ps_2_0 profile.

Well, it compiles to texldd, whose documentation says: "This instruction is only supported by ps_2_a. It is not supported by ps_2_b."

I love Microsoft's documentation (NOT).
 
Hyp-X: You did the impossible :D ...

3DCenter: The pattern of the anisotropic filtering of the GeForce 6800 range is similar to that of another well-known company. Formerly, Nvidia offered much better quality (with a more severe performance hit, of course). Is this new pattern fixed in hardware, or can we expect an option for full, "old-school" anisotropic filtering in future driver releases?

Luciano Alibrandi (nVidia): The GeForce 6800 hardware has a simplified hardware LOD calculation for anisotropic filtering, compared to the GeForce FX. This calculation is still fully compliant with the Microsoft DirectX specification and WHQL requirements, as well as matching the requirements for OpenGL conformance. This optimization saves both cost and performance, and makes the chip run faster for anisotropic filtering as well as be less expensive for consumers. Consequently, it may not be possible to re-enable the GeForce FX calculation in this hardware generation. We'll look into it.

I'd like to point out, though, that when our anisotropic quality was FAR better than the competitive product, no one in the website review community praised us for our quality of filtering – they only complained about the performance. In the future, we will plan to make a variety of choices available from maximum (as nearly perfect as we can make) quality to the most optimized performance. It's also interesting to note that although you can run tests that show the angular dependence of LOD for filtering, it is extremely difficult to find a case in a real game where the difference is visible. I believe that this is a good optimization, and benefits consumers.

"no one in the website review community praised us for our quality of filtering – they only complained about the performance"
ehm... that explains the shimmering a lot :rolleyes:
 
Well, I understand what he says - still, I felt disappointed seeing the direction NV40 took.

What I did, btw, does not produce the same results as the FX cards (or the reference rasterizer, for that matter); there is a small difference between them.

This solution has a far bigger performance hit than a hardware-implemented AI-AF would have.
 
Bob said:
ddx, ddy and tex2D with derivatives are not available on PS_2_0. They are available in PS_2_x though.

To clarify, the X800 supports ps2.x. There are really no shader models ps2.a or ps2.b; these are just HLSL profiles to match ATI's and nVidia's extensions to ps2.0. So ps2.x is just 2.0 with extensions, though which extensions are supported varies. The X800 doesn't support the derivatives, but it supports other extensions such as long shaders and no texture lookup limitations.
 
Hyp-X said:
The trick is done with the following shader function:
Code:
float4 tex2D_ai(sampler2D tx, float2 coord)
{
	float2 dx = ddx(coord);
	float2 dy = ddy(coord);
	float a = dot(dx, dx) - dot(dy, dy);
	float b = 2*dot(dx, dy);
	a = a / sqrt(a*a + b*b);
	b = sqrt(0.5 - 0.5*a) * sign(b);
	a = sqrt(0.5 + 0.5*a);
	return tex2D(tx, coord, dx * a + dy * b, dy * a - dx * b);
}

It's only good for square textures in its current form; it needs some scaling for rectangles.

I might release the test program after some cleanup.
What trick? You calculate the right MIP level, but this shader doesn't actually do proper 16x AF, as far as I can see.
 
Actually this computes the wrong mipmap level. As there will be no anisotropic filtering at these angles, in any case not enough of it to compensate for the huge shift in LOD (-3.25 in some places!), this is bound to produce massive texture shimmering.

Really, the only thing you've achieved is the right shape in the AF tester. But that alone is no solid indicator for texture quality, and it surely is no replacement for AF.
 
zeckensack said:
Actually this computes the wrong mipmap level. As there will be no anisotropic filtering at these angles, in any case not enough of it to compensate for the huge shift in LOD (-3.25 in some places!), this is bound to produce massive texture shimmering.

Really, the only thing you've achieved is the right shape in the AF tester. But that alone is no solid indicator for texture quality, and it surely is no replacement for AF.
That's what I thought at first, too. But the code Hyp-X used should result in both the correct mip level and the correct degree of anisotropy.

The trouble with angle-dependent AF starts when ddx and ddy are not perpendicular. That means the projection of a square pixel into texture space is not a rectangle, but (approximately) a parallelogram. In that case, a simplified AF algorithm fails to calculate the degree of anisotropy correctly. What Hyp-X does is adjust the derivatives so that they're perpendicular and form a rectangle that is aligned along the longer diagonal of the parallelogram. Additionally, the area of the rectangle as well as the degree of anisotropy are identical to those of the parallelogram.

The only problem I see here is that the line of anisotropy goes along the longer diagonal, so if your filter footprint is close to a rectangle, it might suddenly swap from one diagonal to the other, possibly resulting in a visible difference. Also, samples should be weighted according to their distance from the sample center and the shape of the pixel footprint when using this method.

I'd like to see how that looks on a GeForce FX.
 
Xmas said:
That's what I thought at first, too. But the code Hyp-X used should result in both the correct mip level and the correct degree of anisotropy.

The trouble with angle-dependent AF starts when ddx and ddy are not perpendicular. That means the projection of a square pixel into texture space is not a rectangle, but (approximately) a parallelogram. In that case, a simplified AF algorithm fails to calculate the degree of anisotropy correctly. What Hyp-X does is adjust the derivatives so that they're perpendicular and form a rectangle that is aligned along the longer diagonal of the parallelogram. Additionally, the area of the rectangle as well as the degree of anisotropy are identical to those of the parallelogram.

I don't follow your description well, so I'll add my own (probably equivalent).

Here's the screen space version of the algorithm description:

The problem with the simplified AF algorithm is that it gives good results only when the ideal angle of anisotropy is vertical, horizontal, or a 45-degree diagonal.
So what I do is compute the angle of anisotropy and rotate the derivatives by that angle, so that the angle of anisotropy becomes horizontal.
After this, the simplified algorithm realizes that the angle of anisotropy is horizontal and takes multiple texture samples horizontally - which is incidentally the correct anisotropic direction on the texture. (The rotation step is isolated in the sketch below.)
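Isolated from tex2D_ai, the rotation is just the last two arguments of the tex2D call. With hypothetical cosA/sinA names (in the shader they are the reused a and b):
Code:
// rotate the screen-space derivative pair by alpha; the hardware's
// simplified AF then sees a horizontal axis of anisotropy
float2 rdx = dx * cosA + dy * sinA;
float2 rdy = dy * cosA - dx * sinA;
return tex2D(tx, coord, rdx, rdy);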

The only problem I see here is that the line of anisotropy goes along the longer diagonal, so if your filter footprint is close to a rectangle, it might suddenly swap from one diagonal to the other, possibly resulting in a visible difference. Also, samples should be weighted according to their distance from the sample center and the shape of the pixel footprint when using this method.

You mean when there's no anisotropy?
That should be special-cased, as (a*a + b*b) is zero in that case.

I'd like to see how that looks on a GeForce FX.

Well, I've seen the reference rasterizer - which looks suspiciously similar to the FX - and it made a circle out of the square/flower shape.
 
aths said:
What trick? You calculate the right MIP level, but this shader doesn't actually do proper 16x AF, as far as I can see.

I do not calculate the MIP level. I only rotate the derivatives. MIP level calculation and AF filtering are left to the hardware.
 
Finding the angle of anisotropy.

Some math behind these things.
We want to find the angle of anisotropy (alpha).
alpha is the direction in screen space where the texture derivatives are the longest.

We know the horizontal derivatives (ux, vx) and the vertical ones (uy, vy).
The derivative in the direction alpha is:

u(alpha) = ux * cos(alpha) + uy * sin(alpha)
v(alpha) = vx * cos(alpha) + vy * sin(alpha)

The length of the derivative:

L(alpha) = sqrt(u(alpha)^2 + v(alpha)^2)

So we want to find the alpha value where L(alpha) has a maximum. This is the same place where L(alpha)^2 has its maximum.

L(alpha)^2 = (ux^2+vx^2) * cos(alpha)^2 + (uy^2+vy^2) * sin(alpha)^2 + (2*ux*uy+2*vx*vy) * sin(alpha)*cos(alpha)

We know that:

cos(alpha)^2 = 0.5 + 0.5 * cos(2*alpha)
sin(alpha)^2 = 0.5 - 0.5 * cos(2*alpha)
sin(alpha)*cos(alpha) = 0.5 * sin(2*alpha)

Which results in:

L(alpha)^2 = 0.5*((ux^2+vx^2+uy^2+vy^2) + (ux^2+vx^2-uy^2-vy^2)*cos(2*alpha) + (2*ux*uy+2*vx*vy) * sin(2*alpha))

Let's define:

A = ux^2+vx^2-uy^2-vy^2
B = 2*ux*uy+2*vx*vy

The following L2 function has its maximum at the same spot as L:

L2(alpha) = A*cos(2*alpha) + B*sin(2*alpha)

It's easy to see that this function has its maximum when:

cos(2*alpha) = A / sqrt(A^2 + B^2)
sin(2*alpha) = B / sqrt(A^2 + B^2)

From this, the half-angle equations reproduce sin(alpha) and cos(alpha):

cos(alpha) = sqrt(0.5 + 0.5*cos(2*alpha))
sin(alpha) = sqrt(0.5 - 0.5*cos(2*alpha)) * sign(sin(2*alpha))

These define the angle of anisotropy - and in this case can be used to perform the rotation.
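As a cross-check against the shader, the same derivation spelled out step by step in HLSL - a sketch with my own variable names, mirroring tex2D_ai but keeping the intermediate quantities visible:
Code:
void anisotropyAngle(float2 dx, float2 dy, out float cosA, out float sinA)
{
	float A = dot(dx, dx) - dot(dy, dy);   // ux^2+vx^2 - uy^2-vy^2
	float B = 2*dot(dx, dy);               // 2*ux*uy + 2*vx*vy
	float invLen = rsqrt(A*A + B*B);
	float cos2A = A * invLen;              // cos(2*alpha)
	float sin2A = B * invLen;              // sin(2*alpha)
	// half-angle formulas; the sign of sin(alpha) follows sin(2*alpha)
	cosA = sqrt(0.5 + 0.5*cos2A);
	sinA = sqrt(0.5 - 0.5*cos2A) * sign(sin2A);
}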
 
Hyp-X said:
You mean when there's no anisotropy?
That should be special cased as (a*a+b*b) is zero in that case.
No, I had my math wrong and thought your method would always result in one of the derivatives going along the longer diagonal of the parallelogram. But that happens only when the derivatives are equal in length (and if they're perpendicular as well, there's no anisotropy). Otherwise, the line of anisotropy will go somewhere in between the derivatives, leaning towards the longer one.

You can avoid having to special-case the division by zero by adding an epsilon value. Then the multiplication by a will result in zero, which is ok.
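A minimal sketch of that epsilon guard (the constant is illustrative, not tuned):
Code:
// avoid 0/0 in the isotropic case (a = b = 0); a then
// normalizes to zero instead of NaN
a = a / sqrt(a*a + b*b + 1e-6);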




Here's an image that might make it easier for some people to understand how it works:
gradient.png

Left is screen space, right is texture space. The dashed lines are the pixel grid, the white dots represent the pixel centers. The cyan arrow is ddx, the green one ddy. You can see that, in this case, a pixel projected into texture space results in a parallelogram whose longer diagonal is 2 times as long as the shorter one, and the two diagonals are perpendicular (2:1 anisotropy). Therefore, Hyp-X's method rotates the gradients so you get the yellow and blue arrows instead for ddx and ddy respectively. If you use these gradients, the orange dots describe where the two samples for 2:1 AF might be taken, and the dotted white boxes show the approximated filter kernel for each pixel.
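To make the figure concrete, here is a hand-picked numeric case with the same 2:1 geometry (values are mine, not read off the image): take dx = (1, 0.5) and dy = (1, -0.5). Then a = dot(dx, dx) - dot(dy, dy) = 1.25 - 1.25 = 0 and b = 2*dot(dx, dy) = 1.5, so cos(2*alpha) = 0, i.e. alpha = 45 degrees and cos(alpha) = sin(alpha) = sqrt(0.5). The rotated derivatives come out as (1.414, 0) and (0, -0.707): perpendicular, 2:1 in length, and aligned with the diagonals dx+dy = (2, 0) and dx-dy = (0, 1) of the parallelogram.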
 
That's really cool. Have you tried to see if the instruction cost of texldd can possibly be hidden by adding a bunch of non-dependent instructions after the texture access?
 
Chalnoth said:
That's really cool. Have you tried to see if the instruction cost of texldd can possibly be hidden by adding a bunch of non-dependent instructions after the texture access?

I fear that texldd has a high cost because the driver/hardware doesn't know that the passed ddx/ddy parameters are coherent across the quad, and basically calculates things 4 times.
I don't know how well-founded that theory is, or whether the driver could somehow tell the hardware that it's a coherent texldd and execute it faster (I guess no games use that yet, so it wasn't important to optimize for it).
 