Need a thorough explanation please...

Neeyik

Okay folks - after sifting through the SDK, the MSDN newsgroups, various books, website articles, and so on, I am no closer to finding a full and complete explanation of the process behind multisample antialiasing via the D3D interface. I know roughly how it works (from reading the various aforementioned material) but I really want to know the fundamental differences between the DX, NVIDIA, ATI, and OpenGL methods of applying MSAA.

The reason why I'm asking is that I'm currently working on a big update to our hardware vocabulary (link); there is much in there that I find highly unsatisfactory, but I was running out of time to meet an important deadline. Any help would be greatly appreciated - you can email me at neeyik@futuremark.com if you do not wish for your comments to be public. Yes, I know there are parts of the vocab that are actually wrong... that is what I'm working to correct!
 
AFAIK, both D3D and OpenGL have "multisampling" defined in such a way that it can cover both supersampling and what is usually referred to as multisampling on this board. Both methods store multiple samples - sets of color/Z/stencil values - per pixel, with each set corresponding to a given location within the pixel. The main difference is that supersampling performs a full set of computations (Z, Gouraud color, texture lookup, pixel shading in general) for every sample, whereas multisampling performs the computations only once per pixel and applies this result to all the samples for the pixel (that is, all the samples that are deemed to be within the polygon currently being rendered).
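In rough C, that difference boils down to something like this (a minimal sketch; shadeAt and interpolateZ are hypothetical stand-ins for the full per-fragment computation and the Z interpolation, and "off" holds the sample positions within the pixel):

    struct Sample { unsigned color; float z; };

    void shadeAt(float x, float y, Sample* out);   // texture lookup, shading, Z - the works
    float interpolateZ(float x, float y);          // Z from the triangle's plane equation

    // Supersampling: the whole computation runs once per sample.
    void supersamplePixel(Sample s[], int n, float px, float py,
                          const float off[][2]) {
        for (int i = 0; i < n; ++i)
            shadeAt(px + off[i][0], py + off[i][1], &s[i]);
    }

    // Multisampling: one computation per pixel, its result copied to every
    // sample the polygon covers; only Z still varies per sample.
    void multisamplePixel(Sample s[], int n, float px, float py,
                          const float off[][2], unsigned coverage) {
        Sample result;
        shadeAt(px + 0.5f, py + 0.5f, &result);    // once, at the pixel center
        for (int i = 0; i < n; ++i)
            if (coverage & (1u << i)) {
                s[i] = result;
                s[i].z = interpolateZ(px + off[i][0], py + off[i][1]);
            }
    }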

It is not 100% correct to say that supersampling is the same as 'rendering a picture at a higher resolution and downsampling it', as this doesn't correctly describe rotated/skewed/sparse grid supersampling.
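For illustration, 4x sample positions (relative to the pixel) might look like this - the exact offsets here are made up, and real patterns vary per chip:

    // 4x ordered grid: a plain 2x2 sub-grid - this one really is just
    // "render at twice the resolution in x and y, then downsample".
    static const float ordered4x[4][2] = {
        {0.25f, 0.25f}, {0.75f, 0.25f},
        {0.25f, 0.75f}, {0.75f, 0.75f},
    };
    // 4x rotated/sparse grid: every sample has a unique x AND a unique y,
    // so a near-horizontal or near-vertical edge produces four distinct
    // coverage steps instead of two.
    static const float rotated4x[4][2] = {
        {0.375f, 0.125f}, {0.875f, 0.375f},
        {0.125f, 0.625f}, {0.625f, 0.875f},
    };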

As for NV/ATI:
  • Most Nvidia chips (except possibly RIVA 128 and older) appear to have supported ordered grid supersampling at some point in their lifetime.
  • NV20 and upwards support 2x and 4x multisampling (rotated grid for 2x, only ordered grid for 4x), plus some hybrid super/multisampling modes (4xS, 6xS, 8xS), which perform the full set of computations/pixel shading multiple times per pixel, though not once for every sample.
  • R200/RV250 support only supersampling, with 2 to 6 samples per pixel and a programmable sample pattern.
  • R300 and higher support multisampling, with 2, 4 or 6 samples per pixel and a programmable sample pattern. Supersampling is supposedly supported in HW, but unavailable in any driver set released so far.

Not sure if this helps you any, as to the best of my knowledge this is mostly just common knowledge in this forum ....
 
arjan de lumens said:
It is not 100% correct to say that supersampling is the same as 'rendering a picture at a higher resolution and downsampling it', as this doesn't correctly describe rotated/skewed/sparse grid supersampling.
Yes, you're quite right to point this out. As I said in the first post, I was running short on time and so I had purposely left stuff that I thought I knew the most about until the end! As it turned out, the entry for anti-aliasing was, to me at least, the worst of the lot. I will be updating it to include information about sample grids, etc.
arjan de lumens said:
The main difference is that supersampling performs a full set of computations (Z, Gouraud color, texture lookup, pixel shading in general) for every sample, whereas multisampling performs the computations only once per pixel and applies this result to all the samples for the pixel (that is, all the samples that are deemed to be within the polygon currently being rendered).
That much I understand - it's sort of the "bit in between" that documents are murky on. What I'm looking for is the blanks in the procedure that I have in my mind:

  1. Frame is "prepared" at an increased resolution for rendering.
  2. A mask is used to find out which sub-samples lie along triangle edges.
  3. Next bit I'm unclear on. Those sub-samples that are along an edge are rendered to a separate off-screen buffer, each with a unique colour and z-value? Those that aren't are rendered to a separate off-screen buffer, all with the same colour and z-depth? What resolution are these buffers? Are the non-edge sub-samples rendered to the back buffer?
  4. Is the z-depth information about the edge sub-samples used to determine anything? Is the final blending of these samples weighted by the amount of coverage each sub-sample has of the triangle edge, or is it also weighted by additional information?
As you can see, I know naff all about multisampling :oops: !
 
Starting with a simplified way to rasterize triangles:
You have three vertices. You determine which one is topmost and which one is bottommost. For every scanline between them, you determine where the triangle starts on the left and where it ends on the right. Then, for each scanline, you render every pixel between those start and end positions.
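As a code sketch of that (leftEdgeX/rightEdgeX are hypothetical helpers returning where the triangle's edges cross scanline y; shadeAt and writePixel stand in for the shading and output routines):

    #include <math.h>

    float leftEdgeX(int y);
    float rightEdgeX(int y);
    unsigned shadeAt(float x, float y);
    void writePixel(int x, int y, unsigned color);

    void rasterizeTriangle(int yTop, int yBottom) {
        for (int y = yTop; y <= yBottom; ++y) {
            int xStart = (int)ceilf(leftEdgeX(y));    // first pixel inside on this line
            int xEnd   = (int)floorf(rightEdgeX(y));  // last pixel inside on this line
            for (int x = xStart; x <= xEnd; ++x)
                writePixel(x, y, shadeAt(x + 0.5f, y + 0.5f));
        }
    }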

If you want to add multisampling, one way to accomplish it is as follows (a code sketch follows the list):
- Instead of determining one start and end position per line of pixels, you determine N start and end positions, with vertical offset according to the sample positions.
- You now start rendering at the leftmost starting position of those N "sub-lines".
- For every sample of a pixel, you now check if the horizontal sample position lies between the start and end of the related "sub-line". If it does, set the nth bit of the sample coverage mask for that pixel.
- Calculate one color for that pixel, taking the center of the pixel as the texture sampling point.
- Determine a Z-value per sample.
- Do a Z test per sample. If the test fails, clear the nth bit of the coverage mask.
- For every sample that has its coverage bit set, write the color (calculated per pixel) and the Z value (calculated per sample) to the multisample buffer.
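Put together as a rough C sketch (subLeftX/subRightX are hypothetical helpers returning the edge crossings of sub-line i; buffer sizes and the sample pattern are arbitrary):

    enum { N = 4, W = 1024, H = 768 };       // samples per pixel, buffer size
    extern unsigned colorBuffer[H][W][N];    // one color per sample...
    extern float    zBuffer[H][W][N];        // ...and one Z per sample
    extern const float off[N][2];            // x/y offsets of the sample pattern

    float subLeftX(int y, int i);            // edge crossings of sub-line i
    float subRightX(int y, int i);
    unsigned shadeAt(float x, float y);
    float interpolateZ(float x, float y);

    void shadePixelMS(int x, int y) {
        // Build the coverage mask: one bit per sample lying inside its sub-line.
        unsigned coverage = 0;
        for (int i = 0; i < N; ++i) {
            float sx = x + off[i][0];
            if (sx >= subLeftX(y, i) && sx < subRightX(y, i))
                coverage |= 1u << i;
        }
        if (!coverage) return;
        // One color for the whole pixel, sampled at the pixel center.
        unsigned color = shadeAt(x + 0.5f, y + 0.5f);
        // Per-sample Z and Z test; only covered samples that pass get written.
        for (int i = 0; i < N; ++i) {
            if (!(coverage & (1u << i))) continue;
            float z = interpolateZ(x + off[i][0], y + off[i][1]);
            if (z >= zBuffer[y][x][i]) continue;   // Z test failed for this sample
            zBuffer[y][x][i]     = z;
            colorBuffer[y][x][i] = color;
        }
    }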


Color, Z, and stencil are stored per sample. Because the color and stencil values are equal in non-edge pixels, you can easily compress them.
But you still need to store them per sample, because on an IMR you never know whether the edge of a following triangle goes through that pixel. There is no real difference between edge and interior pixels, only that the coverage mask of the latter consists of only 1s.

After the rendering is finished, you take all the samples per pixel and blend them together, usually with equal weighting.
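Continuing the sketch above, the downfilter ("resolve") is just that equal-weight average (pack and frontBuffer are hypothetical):

    extern unsigned frontBuffer[H][W];
    unsigned pack(float r, float g, float b);  // repack the averaged channels

    void resolve(void) {
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                float r = 0, g = 0, b = 0;
                for (int i = 0; i < N; ++i) {      // sum each pixel's N samples
                    unsigned c = colorBuffer[y][x][i];
                    r += (c >> 16) & 0xFF;
                    g += (c >>  8) & 0xFF;
                    b +=  c        & 0xFF;
                }
                frontBuffer[y][x] = pack(r / N, g / N, b / N);
            }
    }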
 
Neeyik said:
That much I understand - it's sort of the "bit in between" that documents are murky on. What I'm looking for is the blanks in the procedure that I have in my mind:
Well, let me try to answer/explain ...
1. Frame is "prepared" at an increased resolution for rendering.
Yes, by initializing all the subsamples of every pixel to a user-specified default value.
2. A mask is used to find out which sub-samples lie along triangle edges.
A bit-mask is generated on a per-pixel basis during rendering, telling which subsamples of the given pixel the polygon covers. For interior pixels this mask will be all 1s, indicating that all subsamples of the pixel in the framebuffer are updated with the computed color of the pixel; for edge pixels, the mask will indicate that only some subsamples should be updated. Note that this mask is only kept internally in the chip until the affected subsamples of the pixel have been updated. It is NOT stored in the multisample buffer itself - the multisample buffer only contains the actual subsamples themselves (color/Z/stencil values), with no information about which samples belong to which polygon or where triangle edges pass.
3. Next bit I'm unclear on. Those sub-samples that are along an edge are rendered to a separate off-screen buffer, each with a unique colour and z-value? Those that aren't are rendered to a separate off-screen buffer, all with the same colour and z-depth? What resolution are these buffers? Are the non-edge sub-samples rendered to the back buffer?
If two or more subsamples in a pixel are covered by the same polygon, they are all assigned the same color/Z/stencil values, no matter how many subsamples are actually covered. All sub-samples are treated the same whether they are covered by a polygon edge or not - edge and non-edge subsamples are always stored in the same buffer and are not distinguishable in any way after they have been placed in the buffer.
4. Is the z-depth information about the edge sub-samples used to determine anything? Is the final blending of these samples weighted by the amount of coverage each sub-sample has of the triangle edge, or is it also weighted by additional information?
Z depth is used the same way as for standard, non-multisampled rendering: you use it to determine, for each subsample, whether the incoming samples are 'behind' the samples already in the buffer, and not draw them if they are in fact behind. Final blending (which is done just before the buffer is displayed on the screen) usually just takes an average of the subsamples' color values for each pixel, although there are exceptions (e.g. Nvidia's 'quincunx'). The final blending, in any case, only uses the contents of the subsamples and no additional information.
 
arjan de lumens said:
If two or more subsamples in a pixel are covered by the same polygon, they are all assigned the same color/Z/stencil values
The same color, yes, but not the same Z value (stencil is different in that you usually do not write a specific value, but modify the current content of the stencil buffer, e.g. by incrementing or decrementing it).

If all samples of a pixel had the same Z value, you could not anti-alias polygon intersection edges, as the result of the Z test would be the same for all samples, i.e. either all pass or all fail.
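Concretely, per-sample Z falls out of the triangle's plane equation (dzdx/dzdy being the triangle's Z gradients - names made up for the sketch):

    // Z varies linearly across the pixel, so each sample gets its own value.
    // Two intersecting triangles have different gradients, so near the
    // intersection some samples pass the Z test for one triangle and some
    // for the other - which is exactly what anti-aliases the intersection.
    float sampleZ(float zCenter, float dzdx, float dzdy, float dx, float dy) {
        return zCenter + dzdx * dx + dzdy * dy;   // dx/dy: sample offset from center
    }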
 
Okie-dokie...it's starting to make a lot more sense now 8) Presumably, the "non-maskable" type of multisampling doesn't bother with the bit-mask, just renders to a higher resolution and samples down to the required size...oh, hang on - that's supersampling! *ping* (bulb comes on)

Just one more quick question, on a related topic: when, in the pipeline sequence, is the z-buffer first written out? Come to think of it, when exactly are the mask checks done (for MSAA)? Ooh, thought of some more too - the sub-samples are all "within" a pixel, (disregarding Quincunx, etc) so what sort of sampling precision is needed for really high levels of multisampling? D3D MSAA is supposed to be one pass rendering; how many additional clock cycles would one estimate low level MSAA adds to the length of the pass?

Sorry for the random burbling now but I'm mildly tipsy (sadly, after only one beer!) and I'm in "work-work-work" mood at the moment too!
 
Neeyik, maybe my layman's way of explaining it makes sense: :oops:

The key thing about [Direct3D] multisampling is that it happens during the rasterizer stage, where a triangle is converted into pixels.

If no multisampling is used, the pixels are just sent further down the pixel pipeline untouched, but if you enable multisampling another step is added before you get to those lovely pixels: this step increases the resolution of polygon edges (and therefore also of depth and stencil values) and renders a certain number of sub-pixel samples (2x, 4x, 6x, etc.) that are later blended into one pixel when the frame is resolved. (Some, like ATI, also add a programmable pattern that changes where these sub-pixel samples are taken from.)

Anyway, since all this happens in dedicated silicon before the pixel pipelines, there is no (and can be no) hit to pixel fillrate at all. All those sub-pixel samples/fragments do require memory bandwidth, but that has been covered in detail already.

I hope that I didn't muddy the waters too much with this. ;)
 
Neeyik said:
Okie-dokie...it's starting to make a lot more sense now 8) Presumably, the "non-maskable" type of multisampling doesn't bother with the bit-mask, just renders to a higher resolution and samples down to the required size...oh, hang on - that's supersampling! *ping* (bulb comes on)
The "non-maskable" multisampling D3D provides is just a generic way to expose any kind of antialiasing to be set by the application. It is, however, not "controllable" by the application during the rendering process.
"Maskable" multisampling has the following properties:
- It provides n samples for every pixel on screen (unlike Matrox's FAA, which is non-maskable)
- You can use a bit mask that is ANDed with the coverage mask to limit rendering to writing to certain samples/sample buffers only. This is useful for depth-of-field, motion blur and similar "T-Buffer effects".
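In D3D terms that per-draw mask is the D3DRS_MULTISAMPLEMASK render state. A motion-blur-style use on a 4-sample buffer might look like this (renderScene and the time arguments are hypothetical application code):

    // Each pass writes to a different subset of the samples; the final
    // resolve averages all four, giving a crude two-step motion blur.
    device->SetRenderState(D3DRS_MULTISAMPLEMASK, 0x3);   // samples 0-1 only
    renderScene(timeA);
    device->SetRenderState(D3DRS_MULTISAMPLEMASK, 0xC);   // samples 2-3 only
    renderScene(timeB);
    device->SetRenderState(D3DRS_MULTISAMPLEMASK, 0xF);   // back to all samples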

OpenGL offers a special mode with multisampling called alpha-to-coverage. It converts the alpha value to a bit mask containing a number of 1s proportional to the alpha value. With enough samples, this is a means of rendering transparent polygons without blending (sub-pixel dithering).
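In GL code that is roughly (via ARB_multisample; drawTransparentPolygons is a hypothetical application function):

    // The fragment's alpha sets a proportional number of coverage bits, so
    // "50% alpha" writes to roughly half the samples - no blending, and
    // therefore no back-to-front sorting, required.
    glEnable(GL_MULTISAMPLE_ARB);
    glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE_ARB);
    drawTransparentPolygons();
    glDisable(GL_SAMPLE_ALPHA_TO_COVERAGE_ARB);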

Neeyik said:
D3D MSAA is supposed to be one pass rendering; how many additional clock cycles would one estimate low level MSAA adds to the length of the pass?
None, if it's not bandwidth limited.
 