How much angle-independent AF would be needed before a Summed Area Table is quicker?

bloodbob


How many AF samples on a plane (or planes) running horizontally and/or vertically along the screen would have to be used before a Summed Area Table would have better memory performance?
Edit: adaptive AF is okay, no need to do 16xAF (or whatever-xAF) on every pixel of the screen, just as long as it's axis-independent.

I'm talking simply about the memory problems rather than the computational ones.
Because for a 2048x2048 image you need at least 22 extra bits.

SAT would probably use 16 texture lookups (bilinear interpolation).

But the biggest problem with SAT texturing is the fact that it doesn't have good cache coherency.
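For reference, here's a minimal sketch (Python with NumPy, illustrative function names, not anyone's actual implementation) of how a SAT answers a box-filter query with four corner lookups, and where the 22 extra bits come from: a 2048x2048 texture of 8-bit texels can sum to just under 2^30, i.e. 22 bits on top of the original 8.

```python
# Sketch of a summed-area table (SAT) box filter over 8-bit texels.
# Function names are illustrative, not from any real API.
import numpy as np

def build_sat(texture):
    """Cumulative sum over both axes; entry (y, x) holds the sum of
    texture[:y+1, :x+1]."""
    return texture.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_average(sat, x0, y0, x1, y1):
    """Average over the inclusive texel rectangle [x0..x1] x [y0..y1]
    using the classic four-corner lookup."""
    total = sat[y1, x1]
    if x0 > 0:
        total -= sat[y1, x0 - 1]
    if y0 > 0:
        total -= sat[y0 - 1, x1]
    if x0 > 0 and y0 > 0:
        total += sat[y0 - 1, x0 - 1]
    return total / ((x1 - x0 + 1) * (y1 - y0 + 1))

# Worst-case sum for a 2048x2048, 8-bit texture:
# 2048 * 2048 * 255 < 2^30, i.e. ~22 bits beyond the original 8.
bits_needed = int(np.ceil(np.log2(2048 * 2048 * 255)))
```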

So can someone smarter and wiser give me an answer?
 
There are a number of other problems with SAT beyond the poor cache coherency and the large data element size. Cube maps are one IIRC (one paper suggested converting them to dual paraboloid maps), and filter kernels that run diagonally to the texture grid would seem to require sampling quite a few texels outside of the kernel. While the constant-time nature of SATs (as opposed to quadratic time with AF) seems nice, it's perhaps telling that no major IHV has added native hardware support for them to my knowledge.

As usual, grain of salt. I'm only a hobbyist graphics programmer to keep myself sane from all the mindless ASP/PHP/CF/JSP shit I do all day.
 
Yes, I know the limitations.
it's perhaps telling that no major IHV has added native hardware support for them to my knowledge
Yes, and no IHV supports 256x AF either, so it really doesn't say all that much.
 
Why would you want to compare angle-independent AF with angle-dependent AF via SAT? A rectangular kernel aligned with the texture space axes is a pretty heavy limitation, and though you can alleviate it a bit with an additional, diagonal SAT, it's still not the quality you'd expect from angle-independent AF.

However, if SAT lookups were directly implemented in the TMU, you could drop mipmapping, and maybe use a single lookup for axis-aligned anisotropy while taking multiple box samples along the line of anisotropy, like other AF methods, if it's not axis-aligned.

Regarding cache performance, sample reuse should be quite good (if the kernel of one pixel starts where the kernels of neighboring pixels end). It's just that those texels in a block that aren't used mean wasted bandwidth. If you could read arbitrary 2x2 texel blocks from memory, that would be a huge improvement.
 
Xmas said:
Why would you want to compare angle-independent AF with angle-dependent AF via SAT? A rectangular kernel aligned with the texture space axes is a pretty heavy limitation, and though you can alleviate it a bit with an additional, diagonal SAT, it's still not the quality you'd expect from angle-independent AF.
Ehh, correct me if I'm wrong, but wouldn't you still have exactly the same problems with angle-dependent AF? Since its dependence isn't on the texture, it's on the orientation of the triangle in screen space (vertical and horizontal triangles get full AF whereas other angles don't), or am I completely losing it? If I'm not going insane, then why did I want angle independence? Well, because I wanted to simplify things so I don't have to worry about the orientation of the triangle.
 
I'm not sure what you're trying to get at. I'm just saying that the original question, "How much angle-independent AF would be needed before a Summed Area Table is quicker?", is somewhat skewed, because angle-independent AF gives better quality than SAT: SATs are inherently angle-dependent when used for AF (though in this case it's the angle in texture space, as opposed to the angle between the gradients with other angle-dependent AF methods).
 
Xmas said:
I'm not sure what you're trying to get at. I'm just saying that the original question, "How much angle-independent AF would be needed before a Summed Area Table is quicker?", is somewhat skewed, because angle-independent AF gives better quality than SAT: SATs are inherently angle-dependent when used for AF (though in this case it's the angle in texture space, as opposed to the angle between the gradients with other angle-dependent AF methods).
Fine, okay, I'll rephrase my question so that I get angle-independent AF regardless of whether or not you're using angle-independent AF.

How many AF samples on a plane (or planes) running horizontally and/or vertically along the screen would have to be used before a Summed Area Table would have better memory performance?

You'll still have all the texture-space problems, but now there won't be screen-space problems (which are extremely implementation-specific). Happy?
 
On a related note to this thread, I've always wondered why the IHVs don't combine rip-mapping with their AF implementations to save time when they can. Slightly off-axis AF footprints can be covered using fewer samples this way, and in the worst case the hardware will just do what it currently does.

Will thrashing due to sampling so many different rip maps be too much of a problem?
 
One issue might be the fact that ripmaps (assuming you store every possible LOD level) take up four times as much memory as the original texture does, versus only ~33% more for standard mipmapping.
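Both storage figures fall out of geometric series. A quick check (hypothetical Python, with the level count chosen for an assumed 2048x2048 texture):

```python
# Quick check of the ~33% vs 4x storage overhead figures.
levels = 12  # log2(2048) + 1 levels for a 2048x2048 texture

# Mipmapping: 1 + 1/4 + 1/16 + ... -> 4/3 of the base texture's size,
# i.e. ~33% extra.
mip_total = sum(0.25 ** k for k in range(levels))

# Rip-mapping: every (i, j) combination of halving width i times and
# height j times, i.e. (sum over i of 1/2^i)^2 -> 4x the base texture.
rip_total = sum(0.5 ** i * 0.5 ** j
                for i in range(levels) for j in range(levels))
```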
 
With a Summed Area Table AF algorithm you would presumably sample in nearly the same locations for neighboring pixels, so you will get a bit of cache coherence, but less than bi/trilinear AF gives. So let's say a ~2x bandwidth hit compared to standard bilinear lookups due to weak coherence.

The fact that you need to store numbers at such great precision (+22 bits per color component) for SATs will give you an additional ~4x bandwidth hit, assuming RGBA32 textures.

With SATs, you need 4 bilinear-filtered accesses, so we get a relative cost of 2*4*4=32.

With trilinear aniso, you need to perform 2 bilinear-filtered accesses with relatively good coherence, so the relative cost will be about 2.

As a result: The cutoff where SATs will be faster than repeated trilinear goes at >16X AF; very few pixels in an ordinary frame will require anisotropy THAT high, and at such high levels the angle-dependency of SATs will kill you 90% of the time anyway.
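The arithmetic above can be written out as a tiny model (all constants are the post's rough estimates, not measured numbers):

```python
# Back-of-the-envelope cost model from the post above.
sat_coherence_penalty = 2   # ~2x bandwidth from weak cache coherence
sat_precision_penalty = 4   # RGBA32 vs RGBA8: 4x wider texels
sat_lookups = 4             # bilinear-filtered corner fetches

sat_cost = sat_coherence_penalty * sat_precision_penalty * sat_lookups

# Trilinear aniso: ~2 coherent bilinear fetches per 2x of anisotropy.
trilinear_cost_per_tap = 2

# SAT breaks even when repeated trilinear needs this many taps,
# i.e. roughly this AF level.
breakeven_aniso = sat_cost // trilinear_cost_per_tap
```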
 
Good analysis arjan.

I'd say the coherence issue may be bigger because there's also granularity to consider. Memory access efficiency is highest when you can cluster your reads better. With ordinary texturing, you can get a 4x4 tile of texture values, which in fact is necessary for compressed formats anyway, and expect that most or all of the values will be used while it's in the cache. With SATs, you'll probably only use the 4 samples involved in a bilinear filter. This problem gets worse when you need to access multiple tiles.

Using the bias trick mentioned in the HDR demo thread, the additional precision may not be as great of an issue as you say. I think you could get away with 2x precision.

Overall, though, I think your analysis is nearly correct.
 
Well, if you want to drop angle dependence, you're probably better off just going with RIP mapping and getting better performance (hey, memory is cheap).
 
Chalnoth said:
Well, if you want to drop angle dependence, you're probably better off just going with RIP mapping and getting better performance (hey, memory is cheap).
Except you get worse quality. With SAT, for vertical and horizontal (in texture space) footprints you can actually integrate (well, there are a couple of other little conditions, but it's close enough) rather than simply down-filter, so basically you're comparing down-filtering with analytical filtering in those cases. You could use ripmaps to accelerate integration, though on average, when there is no boundary overlap, SAT would be quicker than ripmaps.

Mintmaster said:
This problem gets worse when you need to access multiple tiles.
Took you lot long enough :p
 
bloodbob said:
Except you get worse quality. With SAT, for vertical and horizontal (in texture space) footprints you can actually integrate (well, there are a couple of other little conditions, but it's close enough) rather than simply down-filter, so basically you're comparing down-filtering with analytical filtering in those cases.
Er, with RIP mapping you're basically doing the same math as you would be doing with the summed area table. Bilinear filtering is going to look better for the base MIP than the summed area table, and for minification, you won't notice the averaging.
 
Chalnoth said:
Er, with RIP mapping you're basically doing the same math as you would be doing with the summed area table. Bilinear filtering is going to look better for the base MIP than the summed area table, and for minification, you won't notice the averaging.
Ehh, how is bilinear going to look better than bilinear? (I clearly said bilinear SAT in the first post.)

As for your claim that you won't notice the blurriness, I disagree.
 
Chalnoth said:
What blurriness?
On-axis sampling in texture space (i.e. vertically or horizontally along the texture), assuming a rectangular sample coverage: rip maps will oversample by up to an extra 50% of the texels. (Say you're trying to sample a 16x3 footprint: you can either take 16x2 or 16x4 with ripmaps, whereas with SAT you can take 16x3.) Edit: doing bilinear ripmaps.

Okay, here is a paper in which they implemented a 4D precomputed pyramid to help compute anisotropy (a rather different implementation from graphics cards). I'm fairly certain this 4D pyramid would also include all the data that would be included in ripmaps. (Total memory usage = 16 times the base texture.)

Here is a screenshot from the end of the paper showing bilinear, trilinear, anisotropic and SAT (this implementation has additional reads etc. to compensate for diagonals).
http://www.users.on.net/~mccann/SAT.png
IMHO the samples which are close to the axes in texture space are blurrier with the anisotropic filtering than with the SAT. Though I must admit this really isn't a good example, but I'm not going to write a software renderer to argue with you.
 
Chalnoth said:
There's also more aliasing in the SAT shot, though.
I'm assuming you're talking about, say, along the diagonals (in texture space)? Geez, how could we fix that? Oh, we make it blurrier along the diagonals by doing plain SAT.

If you have a pixel that covers the texels marked with X, with SAT you can directly find the average value. (X = 8)
Code:
OOOOOOOO OOOO OO O
OOOOOOOO OOOO OO O
OOOOXXXO OO84 O6 3
OOOOXXXO OO84 O6 3
OOOOXXXO OO84 O6 3
OOOOXXXO OO84 O6 3
OOOOXXXO OO84 O6 3
OOOOOOOO OOOO OO O

OOOOOOOO OOOO OO O
OOOO888O OO84 O6 3
OOOO888O OO84 O6 3
OOOO444O OO62 O4 2

OOOO444O OO62 O4 2
OOOO666O OO62 O4 2

OOOO555O OO62 O4 2

Exactly which texels would you end up sampling with a ripmap? (I'd suggest it would have to be the entire bottom-right quarter of the texture.) (Depending on what you sampled you might come up with the right answer, though then all I'd need to do is go back and make X non-uniform.)

In the end, SAT can sample any set of texels that rip mapping can, plus more (even off-axis).
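To make the footprint point above concrete, here's a self-contained sketch (Python with NumPy, illustrative names, not any shipping implementation): the SAT reproduces the exact average of an arbitrary 16x3 footprint, while ripmap levels only store power-of-two box sizes (16x2 or 16x4 here).

```python
# SAT vs ripmap footprints: the SAT can average an exact 16x3
# rectangle; ripmap levels only offer power-of-two box heights.
import numpy as np

rng = np.random.default_rng(0)
tex = rng.integers(0, 256, size=(32, 32))

# Entry (y, x) holds the sum of tex[:y+1, :x+1].
sat = tex.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def sat_avg(x0, y0, x1, y1):
    # Four-corner rule over the inclusive rectangle; edge handling is
    # omitted by keeping the rectangle away from row/column 0.
    s = (sat[y1, x1] - sat[y0 - 1, x1]
         - sat[y1, x0 - 1] + sat[y0 - 1, x0 - 1])
    return s / ((x1 - x0 + 1) * (y1 - y0 + 1))

# Exact 16-wide, 3-tall footprint: x in [4, 19], y in [8, 10].
footprint_avg = sat_avg(4, 8, 19, 10)

# The nearest ripmap levels can only give 16x2 or 16x4 box averages,
# under- or over-covering the true footprint.
ripmap_under = tex[8:10, 4:20].mean()
ripmap_over = tex[7:11, 4:20].mean()
```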
 
There are two things that you're missing here:
1. Highly anisotropic cases will never in practice have texels aligned in a rectangle in pixel space.
2. Even with summation over all of the right samples, you can still get aliasing.
 