Does this Antialiasing algorithm make sense?

Arun

Hey everyone,

First a disclaimer: This is a messy explanation. Sorry if I'm not very clear at times.

A while ago, I asked for some feedback on an AA algorithm, but didn't describe it very well. I promised to try to implement it but, to this day, I still haven't written any code for it.
The reason is quite simply that the algorithm seems too expensive: it significantly increases the per-triangle workload. My guesstimate is about 5x the transistor count for identical triangle throughput.
And so the algorithm stopped interesting me as soon as I realized how expensive it is.

Now, since some of you may be interested in the exact idea, I'm going to explain it here. I have not found any similar work on the web - but I must admit I didn't research extensively. If any similar approach has already been tried, I would appreciate references.

First, let's begin by saying the memory bandwidth costs for this approach are EXACTLY the same as for traditional antialiasing. It should be compatible with both MultiSampling and SuperSampling.
Each pixel has X subpixels ( although they aren't traditional subpixels, as explained later ), with the same characteristics as a normal subpixel: Z, Color, ...

Vertex Shading or T&L occurs in a perfectly traditional fashion. The differences begin just afterwards, and they end as soon as you enter the Pixel Shader.

A traditional triangle system with Ordered Grid Antialiasing is the following:
1. Calculate slopes based on x/y
2.
A. Increment by 1 on the y axis, and increment by the slope on the x axis - determine the best X
B. Calculate interpolation based on position
C. Go through the sub-pixel row, eventually determining which pixel each subpixel belongs to and such for MSAA - do interpolation here too, based on the other end of the triangle at this Y position. For each subpixel, do Z checks and so on
D. End when you meet the other end of the triangle.

There. 1 is traditionally called Triangle Setup, while 2 is traditionally called Rasterization. My explanation of 2 is obviously oversimplified - many methods exist to optimize texture caching by not doing the whole row at once, for example. And I'm assuming the "other end of the triangle" is known as if by magic, too, while it should actually be calculated in parallel.
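Purely as illustration, the traditional steps above can be sketched in C ( names like `Edge` and `count_covered_subpixels` are mine, not from any actual hardware; shading, Z checks and the MSAA bookkeeping are stubbed out, and both edges are assumed precomputed in setup ):

```c
typedef struct { float x0, y0, x1, y1; } Edge;

/* Step 1 (setup): slope dx/dy per edge. Step 2: walk one subpixel row at a
 * time, round each edge's x to get the "best X" span, and visit every
 * subpixel in between. Instead of doing real per-subpixel work, this sketch
 * just returns how many subpixels the spans touched. */
static int count_covered_subpixels(const Edge *l, const Edge *r, int sub_rows)
{
    float lslope = (l->x1 - l->x0) / (l->y1 - l->y0);   /* step 1          */
    float rslope = (r->x1 - r->x0) / (r->y1 - r->y0);
    float lx = l->x0, rx = r->x0;
    int covered = 0;

    for (int sy = 0; sy < sub_rows; sy++) {      /* step 2A: +1 on y       */
        int begin = (int)(lx + 0.5f);            /* best X = rounded x     */
        int end   = (int)(rx + 0.5f);
        for (int sx = begin; sx <= end; sx++)    /* step 2C: walk the row  */
            covered++;                           /* Z test, shading, ...   */
        lx += lslope;                            /* step 2A: x += slope    */
        rx += rslope;
    }
    return covered;
}
```

For a trivial axis-aligned quad half ( left edge at x=0, right edge at x=4, 4 subpixel rows ) this walks 5 subpixels per row.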


Now, as you may see, besides doing one color calculation per pixel for MSAA, no use at all is made of the fact that there really are pixels and subpixels.
What I'm trying to do here is to use that. This algorithm is fully based on this goal.

My triangle system is the following:
Note: I'm using Bresenham, so don't be surprised by what "error" means - Also, there is both a Horizontal Samples value and a Vertices Samples value.

1. Calculate slopes for minor axis / major axis ( you know what I'm talking about - if X > Y, then do Y/X, stuff like that )

2.
A. Increment by 1 on the major axis, increment by the slope on the y axis ( on a subpixel level, potentially on a pixel level too since it might be more optimal not to convert subpixel to pixel every time )
B. Determine coverage for the two possible PIXELs using a Wu-derived algorithm. Required precision depends on number of samples.

Okay, so I'm gonna stop here and do some explaining. This is done on a pixel level, so you get coverage for the two pixels. This value is then going to be used to adjust the number of subpixels that would be obtained with a traditional algorithm, so that the coverage matches. This is not done randomly ( in fact, much of the algorithm's cost is in determining where to do it ) - more on that later.

Now, you need to realize that you're actually going to do the traditional thing with a *line* algorithm, thus determining the best position and error at all points. This is done on a subpixel level, and right now, we're only doing it for all the subpixels in the pixel.

C1. For every subpixel in the pixel ( that is, for 4x AA, not 4 but 2, because it's only either horizontal or vertical - you could say it's either a subpixel row or column in the pixel ), determine the best position on the minor axis. This is *stored* in cache. The smallest and largest error values are also stored ( only store if it's smaller / larger than the current smallest / largest value! ) - note that more than one value should be stored beyond 4x AA, because the difference between the perfect case and the worst one then becomes 2. It increases even further at huge sample counts, but that's above 16x AA, and I really don't think more than 16x AA is very practical for real-time rendering.

Okay, so what we also did here is store the error. The error is what we're using to determine which subpixel's minor axis should be incremented / decremented. Now, something quite nasty also has to be done per subpixel:

C2: At the same time, for every subpixel row/column ( in the pixel ), calculate how many of the subpixels are on each subpixel ( or only one, that's a good optimization opportunity ) - this is a quite annoying calculation IMO. I won't go into the details, and must admit my ideas for it probably aren't too optimal. One approach is to take the number of samples on the minor axis, determine which pixel it is on and where its extreme is on it, at a sample level. Then you add/subtract ( based on direction ) the number of samples, and check if it goes beyond the maximum or below the minimum. You get the idea, I guess...

There. So that enables us to calculate a total of the number of subpixels on each pixel.

C3: Add the number of subpixels for every subpixel row/column in each pixel. So you get two values, which are the number of subpixels in each pixel.

See where I'm going? I've got the number of subpixels, and the number of subpixels for a perfect image. Now, just adding subpixels randomly on one pixel won't work. Also understand that by "adding", I mean incrementing or decrementing the position on the minor axis of a subpixel row/column. You've got to use the error values you got from above. I won't go into the details of how to use those error values. The greatest error value should, for example, be used when you want to add to the pixel which is nearest to the next pixel ( you can check that based on direction ).

Okay, so you've got the edges, at subpixel level, in a small table. Seems easy to do what follows, right? Wrong!

The problem is that traditional rasterization uses a begin X and end X for each Y line. But using this algorithm, there's no guarantee there won't be gaps. The solution?
Well, I must admit mine probably isn't too great. You store all the "exception" subpixels which must either be rendered or not in cache, but still use the traditional Y/X method for rasterization, and you check that the following six subpixels are not exception subpixels:
1. Begin subpixel
2. End subpixel
3. Begin subpixel - 1
4. End subpixel + 1
5. Begin subpixel + 1
6. End subpixel - 1

Okay, so that's awfully suboptimal. I had one or two other lame and slow ideas for it too, but I'm too lazy to type them here.


Anyway, sorry if this "explanation" is horribly messy, confusing and lacking. I hope you at least understood a small part of what I was talking about :)
Yes, this algorithm DOES have problems in real-world situations, I guess. But so does RGSS, really.
Also, some problems with nearby triangles are kinda solved by the whole idea of using the largest/smallest error value, because that means for example that if you 'fix' in one end of the pixel for one triangle, well you'll 'fix' at the other end of the pixel for the other triangle. That's because in one case the error value increases, in the other it decreases.

Anyway...
Feedback, comments, flames? If this is idiotic and makes no sense, please feel free to say so! :)


Uttar
 
Uttar said:
Hey everyone,

First a disclaimer: This is a messy explanation. Sorry if I'm not very clear at times.

Indeed

Each pixel has X subpixels ( although they aren't traditional subpixels, as explained later ), with the same characteristics as a normal subpixel: Z, Color, ...

A traditional triangle system with Ordered Grid Antialiasing is the following:
1. Calculate slopes based on x/y
2.
A. Increment by 1 on the y axis, and increment by the slope on the x axis - determine best X

What is best X?
The leftmost X in all the subpixel rows?
(Assuming left-to-right rasterizing.)

You should calculate the left_x for all subsample rows (that's 2 for 4x OG and 4 for 4x RG).

B. Calculate interpolation based on position
C. Go through the sub-pixel row, eventually determining which pixel each subpixel belongs to and such for MSAA - do interpolation here too, based on the other end of the triangle at this Y position. For each subpixel, do Z checks and so on

You mean go through the pixel row, don't you?
...determine which subsamples are covered by checking whether their x position is within the left_x, right_x of their subpixel row.

Interpolation doesn't have to use "the other end" as you can calculate the horizontal per pixel increment in the triangle setup (it is constant for all the rows of the triangle.)
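Hyp-X's point can be made concrete: every linearly interpolated attribute over a triangle lies on a plane, so its x-gradient is a constant that triangle setup can compute once. This is the standard plane-equation derivation, not code from this thread ( the function name is mine ):

```c
/* d(attr)/dx over a triangle with vertices (x_i, y_i) carrying attribute
 * values a_i. The denominator is twice the signed triangle area. The
 * gradient is identical at every pixel, so the per-pixel horizontal
 * increment can be computed once in triangle setup instead of
 * re-interpolating from the other edge on every row. */
static float attr_gradient_x(float x0, float y0, float a0,
                             float x1, float y1, float a1,
                             float x2, float y2, float a2)
{
    float det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
    return ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det;
}
```

For a right triangle with the attribute going from 0 at x=0 to 1 at x=1, the gradient is simply 1 per unit of x.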

D. End when you meet the other end of the triangle.

There. 1 is traditionally called Triangle Setup, while 2 is traditionally called Rasterization. My explanation of 2 is obviously oversimplified - many methods exist to optimize texture caching by not doing the whole row at once, for example. And I'm assuming the "other end of the triangle" is known as if by magic, too, while it should actually be calculated in parallel.

There's no magic in it, if you can calculate the left end you should have no trouble calculating the right end.

Now, as you may see, besides doing one color calculation per pixel for MSAA, no use at all is made of the fact that there really are pixels and subpixels.

It is missing from your explanation, yes. :)

What I'm trying to do here is to use that. This algorithm is fully based on this goal.

My triangle system is the following:
Note: I'm using Bresenham, so don't be surprised by what "error" means - Also, there is both a Horizontal Samples value and a Vertices Samples value.

Which is?

1. Calculate slopes for minor axis / major axis ( you know what I'm talking about - if X > Y, then do Y/X, stuff like that )

What if the left and right edges have different "major axes"?
How would you rasterize the triangle?


I'll continue the reply after supper... :)
 
Thanks for the reply!

The best X is really just the rounded value - sorry for not mentioning it. So it's (int) (X + 0.5f).

Pixel row / Subpixel row stuff: Well, I guess I should have decided to either explain for MSAA or SSAA... In the case of SSAA, it's a subpixel row, isn't it? Or am I wrong again?

Interpolation: Didn't know that, oopsy.
Magic: Of course it isn't magic. I just meant I didn't explain it clearly - eh, even that wasn't clear! :)

Using the fact there's pixels/subpixels: Missing from my explanation??? Well, at least, it isn't used in the way I'm using it.

Oh, sorry, it's Vertical Samples value. And it's just the number of Horizontal / Vertical samples.

Rasterization: See the end of the post. It's kinda ugly - you've got to *redo* Y/X, and handle "exception subpixels" and such. Although I'm sure even a trained squirrel could find a better solution! :)


Uttar
 
Ok, I'm lost in your description.

What do you mean by determining the best position?
Like determining the position of the line along the minor axis at the center of the pixel or at the subsample row/column?

What is the error value?

Also, if you fix these sentences, then there might be a chance of understanding:

calculate how many of the subpixels are on each subpixel
I've got the number of subpixels, and the number of subpixels for a perfect image.

Ok, it would be good if you describe the final dataset you got, and how you produce the final pixel from it.
I assume you have 4 subpixels.

You have 4 color/Z values.
What else do you have at the end?

I saw you had a horizontal position (?) for both subpixel rows, and two vertical positions (?) for both subpixel rows.
Or can it be only one of them?

I can see how the final pixel is produced if it's only one of them, but I can't see the algorithm working this way.
That would have problems especially at the corners of the triangle.

...to be continued
 
Okay, so I won't explain it all again - I agree it's wayyy too messy to be understandable by someone who didn't already know about the algorithm before reading...

So I'm going to summarize my idea. Won't summarize the traditional system, since you probably even know more about it than me :)

First, though, you asked what the error value is. The error value is used for Bresenham's line drawing algorithm - the most widely used line drawing algorithm. The error value begins at 0 and is increased at every step. The system used in Michael Abrash's Wu implementation is:
one ErrorAdj value, which is added to an ErrorAcc value at every step along the line ( ErrorAcc begins at 0 ).
ErrorAdj is a 16-bit unsigned integer, and the calculation for it is:
"ErrorAdj = ((unsigned long) DeltaX << 16) / (unsigned long) DeltaY;" ( switch DeltaY & DeltaX if DeltaX > DeltaY )

To simplify, that means ErrorAdj is DeltaX / DeltaY ( or DeltaY / DeltaX ), with a maximum value of 1 - so it's in 0->1.
Whenever ErrorAcc ( which means Error Accumulator, BTW ) is lower than or equal to the last ErrorAcc, that means you went beyond 1 - so it's time to increment Y or X, depending on direction.
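Spelling that scheme out as runnable code ( the function name and the step-counting wrapper are mine; the fixed-point trick itself is straight from the Abrash/Wu code quoted above, and delta_minor < delta_major is assumed, as in that code ):

```c
#include <stdint.h>

/* ErrorAdj is the minor/major slope stored as a 16-bit fixed-point
 * fraction. ErrorAcc deliberately wraps around 16 bits; detecting the wrap
 * ( the new value is <= the old one ) is the "fraction passed 1.0" signal
 * that steps the minor axis. Returns the total number of minor-axis steps
 * taken over delta_major major-axis steps. */
static int minor_axis_steps(int delta_major, int delta_minor)
{
    uint16_t err_adj =
        (uint16_t)(((uint32_t)delta_minor << 16) / (uint32_t)delta_major);
    uint16_t err_acc = 0;
    int steps = 0;

    for (int i = 0; i < delta_major; i++) {
        uint16_t prev = err_acc;
        err_acc = (uint16_t)(err_acc + err_adj);  /* wraps mod 2^16       */
        if (err_acc <= prev)                      /* wrapped: passed 1.0  */
            steps++;
    }
    return steps;
}
```

A line with slope 5/10 takes 5 minor-axis steps over 10 major-axis steps, as expected.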

Summary

You first do calculations based on Minor Axis / Major Axis, for both subpixels & pixels. During those calculations, you use Wu Antialiasing to determine coverage ( not color ) on a pixel level for both possible pixels. You can then transform that into the number of subpixels required in each pixel.

Then you determine actual subpixel coverage with a traditional per-subpixel system, while also storing the biggest and smallest error values ( more than 1 value required for each if using more than 4x AA ) - remember to store the position of the determined subpixel value too; it's going to be adjusted later, then used for rasterization!

Okay, so you've got actual subpixel coverage for each pixel, and required coverage for each pixel. Now all you need to do is determine where to "cheat" in order to increase visual quality.

This is where the error values come into play. Biggest = in the direction of the destination point. Smallest = in the opposite direction.

I won't explain rasterization now - too lazy today... So right now, you've got the adjusted minor-axis position of all subpixels of the line ( the major axis always increases by 1, so there's no point storing it ).


Uttar
 
I'm starting to understand it.

So you determine sample coverage traditionally and add or subtract some of the samples to make up for the coverage error.

Since joint edges of polygons are symmetrical, there won't be cracks there. (good)

But I think it gets messy at pixels with a vertex ("triangle corner") within them.
 
First, an additional disclaimer: I'm assuming here that pretty much everything is floating point. In a real architecture, that isn't the case, for obvious cost reasons. That's why you've got things like "subpixel accuracy" - 12-bit for the Quadro FX and 4-bit for the R300, for example.

Oh, and another one: I'm assuming that the center of the pixel is at x.0 - this isn't correct in many recent architectures. You should be able to make it work for x.5 quite easily though - or anyway, I hope so.

Hyp-X said:
I'm starting to understand it.

So you determine sample coverage traditionally and add or subtract some of the samples to make up for the coverage error.

Since joint edges of polygons are symmetrical, there won't be cracks there. (good)

But I think it gets messy at pixels with a vertex ("triangle corner") within them.

Eh, glad it's becoming clearer :)

Oh, yes, pixels with a vertex within them. Actually thought about that when I was thinking about implementing it, sounds like I forgot to talk about it here.

There's a trick for this, but it of course costs transistors to implement...

You actually need special-case code for the first and last pixel of the lines ( remember that you're actually drawing 3 lines in the "first pass" - the triangle is really the "second pass" )

You've got to realize first that practically, coverage is determined at the *center* of the pixel. The problem with this though, is that part of the pixel is not covered, so:
1. Maximum coverage is not 100% - and you can't cover some subpixel row/columns!
2. The center is not at the same location.

So, you've got to determine what the maximum coverage is first. I'm assuming you've got the starting position in FP here, BTW, which isn't really accurate - I'm just doing it to simplify the matter :) Must admit I didn't think much about how to do it in a more performance-friendly way ( = subpixel accuracy ) there, though - shouldn't be too hard for someone who knows how it works on current architectures, I hope.

Okay, so you'd have a begin vertex position like 4.63.
If you're actually going to the left, that is in the direction of pixel 4, you've got 63% coverage. But if you're going to the right, you've got 100% - 63% coverage - that is, 37% coverage.

So, you actually have to transform that coverage into a number of subpixels that can be accessed, too ( the subpixel you begin with is kinda obvious - it depends on direction, as always ).

Now, you can actually determine the number of samples covered. But you can't actually determine the number of samples *required* !

The problem is that the coverage at the pixel center isn't correct if what you need is coverage for what you draw - and that IS what you need. So you've got to use the center of what you draw.
The problem is that computing all that can be kinda expensive. My solution is not the best one IQ-wise, but it should work pretty much fine, since you don't need a LOT of accuracy:

You do coverage determination at the pixel equivalent of the middle subpixel of the subpixels that can be drawn.
That means one way to do it is dividing the subpixel value by the number of samples on the major axis.
If the number of subpixels that can be drawn is not even, then you simply take the first one. Not perfect, but how much precision do you really need? Not a lot. Oh, sure, it might not always give the very best result this way, but if you really care about IQ, you could always do some even more messy & complex thingies for this.


Yes, it's messy for pixels with a vertex within them - but it's possible, as you may see :)


Uttar
 