Hey everyone,
First a disclaimer: This is a messy explanation. Sorry if I'm not very clear at times.
A while ago, I asked for some feedback on an AA algorithm, but didn't describe it very well. I promised to try to implement it but, to this day, I still haven't written any code for it.
The reason is quite simply that the algorithm seems too expensive. It significantly increases the per-triangle workload. My guesstimate is about 5x the transistor count for identical triangle throughput.
And so the algorithm stopped interesting me as much once I realized how expensive it is.
Now, since some of you might be interested in the exact idea, I'm going to explain it here. I haven't found any similar work on the web - but I must admit I didn't research extensively. If a similar approach has already been tried, I'd appreciate references.
First, let's begin by saying the memory bandwidth costs for this approach are EXACTLY the same as for traditional antialiasing. It should be compatible with both MultiSampling and SuperSampling.
Each pixel has X subpixels (although they aren't traditional subpixels, as explained later), with the same characteristics as a normal subpixel: Z, color, and so on.
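To make that concrete, here's a minimal Python sketch of the per-subpixel storage I have in mind. The field names and the 4x value are just placeholders for illustration, not an actual layout:

```python
from dataclasses import dataclass

# Hypothetical per-subpixel storage: the same fields a normal AA subpixel
# would carry. Field names are placeholders, not an actual implementation.
@dataclass
class SubPixel:
    z: float                      # depth value
    color: tuple                  # (r, g, b, a)

SAMPLES_PER_PIXEL = 4             # e.g. 4x AA
pixel = [SubPixel(z=1.0, color=(0, 0, 0, 0)) for _ in range(SAMPLES_PER_PIXEL)]
print(len(pixel))                 # -> 4
```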
Vertex Shading or T&L occurs in a perfectly traditional fashion. The differences begin just afterwards, and they end as soon as you enter the Pixel Shader.
A traditional triangle system with Ordered Grid Antialiasing is the following:
1. Calculate slopes based on x/y.
2.
A. Increment by 1 on the y axis, and increment by the slope on the x axis - determine the best X.
B. Calculate interpolation based on position.
C. Go through the subpixel row, eventually determining which pixel each sample is part of, and so on for MSAA - do interpolation here too, based on the other end of the triangle at this Y position. For each subpixel, do Z checks and the like.
D. End when you meet the other end of the triangle.
There. 1 is traditionally called Triangle Setup, while 2 is traditionally called Rasterization. My explanation of 2 is obviously oversimplified - many methods are used to optimize texture caching by not doing the whole row at once, for example. And I'm assuming the "other end of the triangle" is known as if by magic, while it should actually be calculated in parallel.
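To make the loop above a bit more concrete, here's a rough Python sketch of it - just as oversimplified as the description: the left and right edges are modelled as straight lines x = x0 + slope * y, the "other end" is simply the right edge, and interpolation and Z checks are only hinted at in comments:

```python
# Rough sketch of the traditional ordered-grid loop above. The triangle's
# left and right edges are modelled as straight lines x = x0 + slope * y
# (step 1 gives the slopes).
def rasterize_ogss(left_x0, left_slope, right_x0, right_slope,
                   y_top, y_bottom, samples):
    covered = []                                    # (subpixel x, subpixel y) hits
    for y_sub in range(y_top * samples, y_bottom * samples):   # 2A: step the y axis
        y = y_sub / samples
        x_begin = left_x0 + left_slope * y          # best X on this subpixel row
        x_end = right_x0 + right_slope * y          # "other end" of the triangle
        for x_sub in range(int(x_begin * samples), int(x_end * samples)):  # 2C
            # 2B would interpolate attributes here; 2C would also do Z checks
            covered.append((x_sub, y_sub))
    return covered

# A small trapezoid from y=0 to y=2 with 2x2 samples per pixel:
print(len(rasterize_ogss(0.0, 0.25, 3.0, -0.25, 0, 2, 2)))
```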
Now, as you may see, aside from doing one color calculation per pixel for MSAA, no use at all is made of the fact that there really are pixels and subpixels.
What I'm trying to do here is exploit that. This algorithm is entirely built around that goal.
My triangle system is the following:
Note: I'm using Bresenham, so don't be surprised by what "error" means. Also, there is both a Horizontal Samples value and a Vertical Samples value.
1. Calculate slopes for minor axis / major axis (you know what I'm talking about - if X > Y, then do Y/X, and so on).
2.
A. Increment by 1 on the major axis, and increment by the slope on the minor axis (on a subpixel level, potentially on a pixel level too, since it might be more optimal not to convert subpixel to pixel every time).
B. Determine coverage for the two possible PIXELS using a Wu-derived algorithm. Required precision depends on the number of samples.
Okay, so I'm going to stop here and do some explaining. This is done at the pixel level, so you've got coverage for the two pixels. This value is then going to be used to adjust the number of subpixels that would be obtained with a traditional algorithm, so that the coverage is the same. This is not done randomly (in fact, much of the algorithm's cost is in determining where to do it) - more on that later.
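Here's a minimal sketch of what I mean by a Wu-derived coverage split, with made-up numbers: at a given major-axis step, the fractional part of the exact minor-axis position says how the coverage divides between the two candidate pixels:

```python
import math

# Wu-style coverage split: at one major-axis step, the exact minor-axis
# position falls between two adjacent pixels; the fractional part says
# how much coverage belongs to each.
def coverage_split(minor):
    frac = minor - math.floor(minor)
    return 1.0 - frac, frac          # (coverage of lower pixel, coverage of upper pixel)

# e.g. a minor-axis position of 5.3 -> pixel 5 gets 70%, pixel 6 gets 30%
print(coverage_split(5.3))
```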
Now, you need to realize that you're actually going to do the traditional thing with a *line* algorithm, thus determining the best position and error at every point. This is done at the subpixel level, and right now, we're only doing it for the subpixels in the current pixel.
C1. For every subpixel row/column in the pixel (that is, for 4x AA, not 4 but 2, because it's only either horizontal or vertical - you could say it's either a subpixel row or a subpixel column of the pixel), determine the best position on the minor axis. This is *stored* in cache. The smallest and largest error values are also stored (only store if it's smaller/larger than the current smallest/largest value!). Note that more than one value should be stored beyond 4x AA, because the difference between the perfect case and the worst case then becomes 2. It increases even further at very high sample counts, but that's above 16x AA, and I really don't think more than 16x AA is very practical for real-time rendering.
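A minimal sketch of C1, using a simple DDA-style step instead of pure integer Bresenham to keep it short - the "error" here is just the signed distance between the exact line and the snapped subpixel:

```python
# Sketch of C1 for one pixel: walk the line through that pixel's subpixel
# rows/columns, storing the best minor-axis position for each, plus the
# smallest and largest error seen. 'rows' is the number of subpixel
# rows/columns per pixel along the major axis (2 for 4x AA).
def step_pixel(minor_start, slope, rows):
    best_positions = []                            # cached per subpixel row/column
    err_min, err_max = float("inf"), float("-inf")
    minor = minor_start
    for _ in range(rows):
        best = round(minor)                        # best subpixel on the minor axis
        err = minor - best                         # signed error for this row/column
        best_positions.append(best)
        err_min = min(err_min, err)                # only keep it if smaller than current
        err_max = max(err_max, err)                # only keep it if larger than current
        minor += slope                             # advance one subpixel on the major axis
    return best_positions, err_min, err_max

print(step_pixel(2.1, 0.6, 2))     # -> ([2, 3], smallest error, largest error)
```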
Okay, so what we also did here is store the error. The error is what we're using to determine which subpixel row/column's minor-axis position should be incremented / decremented. Now, something quite nasty also has to be done per subpixel row/column:
C2: At the same time, for every subpixel row/column (in the pixel), calculate how many of its subpixels land on each pixel (or on only one of them - that's a good optimization opportunity). This is a quite annoying calculation IMO. I won't go into the details, and must admit my ideas for it probably aren't too optimal. One approach is to take the number of samples on the minor axis, determine which pixel the row/column's extreme is on and where it sits on it at the sample level, then add/subtract (based on direction) the number of samples, and check if the result goes beyond the maximum or below the minimum. You get the idea, I guess...
There. So that enables us to calculate a total of the number of subpixels on each pixel.
C3: Add up the numbers of subpixels from every subpixel row/column in each pixel. So you've got two values: the number of subpixels in each of the two pixels.
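Here's a rough sketch of C2 and C3 combined, assuming the triangle's interior lies on the lower-coordinate side of the edge (the orientation handling is left out):

```python
# Sketch of C2 + C3: for each subpixel row/column, the best minor-axis
# position from C1 says where the edge crosses, so the samples below it
# land on one pixel and the rest on the other. Summing over the
# rows/columns gives the two per-pixel subpixel totals. Assumes the
# interior is on the lower-coordinate side of the edge.
def count_subpixels(best_positions, pixel_index, samples_minor):
    pixel_base = pixel_index * samples_minor       # first subpixel of this pixel
    total_this, total_next = 0, 0
    for best in best_positions:
        inside = best - pixel_base                 # subpixels before the crossing
        inside = max(0, min(samples_minor, inside))
        total_this += inside
        total_next += samples_minor - inside
    return total_this, total_next

# Using positions like the ones from the C1 sketch, for pixel 1 with 2
# samples on the minor axis:
print(count_subpixels([2, 3], 1, 2))               # -> (1, 3)
```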
See where I'm going with this? I've got the number of subpixels, and the number of subpixels for a perfect image. Now, just adding subpixels randomly to one pixel won't work. Also understand that by "adding", I mean incrementing or decrementing the position on the minor axis of a subpixel row/column. You've got to use the error values you got above. I won't go into the details of how to use those error values; the greatest error value should, for example, be used when you want to add to the pixel which is nearest to the next pixel (you can check that based on direction).
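A minimal sketch of that adjustment, assuming the Wu coverage has already been turned into a target subpixel count for this pixel, and that "adding" a subpixel means pushing one row/column's crossing one subpixel further; the decrement case is symmetric and left out:

```python
# Sketch of the adjustment: if the Bresenham totals give the pixel fewer
# subpixels than its Wu coverage says it should get, nudge the minor-axis
# position of the row/column with the largest error - i.e. the one already
# closest to spilling into the next pixel - until the totals match.
# 'rows' is a list of (best_position, error) pairs from C1.
def adjust_to_coverage(rows, actual, target):
    rows = list(rows)
    while actual < target:
        i = max(range(len(rows)), key=lambda k: rows[k][1])
        best, err = rows[i]
        rows[i] = (best + 1, err - 1.0)            # push one subpixel across the edge
        actual += 1
    return rows

# e.g. Bresenham gave this pixel 1 subpixel, but Wu coverage says 2:
print(adjust_to_coverage([(2, 0.1), (3, 0.4)], actual=1, target=2))
```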
Okay, so you've got the edge subpixels in a small table. Seems easy to do what follows, right? Wrong!
The problem is that traditional rasterization uses a begin X and end X for each Y line. But using this algorithm, there's no guarantee there won't be gaps. The solution?
Well, I must admit mine probably isn't too great. You store all the "exception" subpixels - those which must either be rendered or not - in cache, but you still use the traditional Y/X method for rasterization, and you check that the following six subpixels are not exception subpixels (a sketch of the check follows below):
1. Begin subpixel
2. End subpixel
3. Begin subpixel - 1
4. End subpixel + 1
5. Begin subpixel + 1
6. End subpixel - 1
Okay, so that's awfully suboptimal. I had one or two other lame and slow ideas for it too, but I'm too lazy to type them here.
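Still, here's a minimal sketch of that check, assuming the "exception" subpixels flipped by the adjustment step are kept in a simple set:

```python
# Sketch of the fixup during rasterization: the row/column is still walked
# the traditional begin-X / end-X way, and only the six boundary subpixels
# listed above are checked against the exception cache and flipped.
def row_subpixels(begin_x, end_x, exceptions, y):
    boundary = {begin_x, end_x, begin_x - 1, end_x + 1, begin_x + 1, end_x - 1}
    covered = set(range(begin_x, end_x + 1))
    for x in boundary:
        if (x, y) in exceptions:
            covered.symmetric_difference_update({x})   # flip: remove it or add it
        # interior subpixels are never exceptions, so no check is needed there
    return sorted(covered)

# e.g. the adjustment removed subpixel 3 and added subpixel 8 on this row:
print(row_subpixels(3, 7, {(3, 0), (8, 0)}, y=0))      # -> [4, 5, 6, 7, 8]
```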
Anyway, sorry if this "explanation" is horribly messy, confusing and lacking. I hope you at least understood a small part of what I was talking about.
Yes, this algorithm DOES have problems in real-world situations, I guess. But so does RGSS, really.
Also, some problems with nearby triangles are kind of solved by the whole idea of using the largest/smallest error value, because it means, for example, that if you 'fix' at one end of the pixel for one triangle, you'll 'fix' at the other end of the pixel for the other triangle. That's because in one case the error value increases, and in the other it decreases.
Anyway...
Feedback, comments, flames? If this is idiotic and makes no sense, please feel free to say so!
Uttar