"Pure and Correct AA"

Yeah I saw that too, but what I didn't see is if there's a way to query how many MSAA samples there are at a single pixel... or whether you just have to resolve them all, which seems rather wasteful (considering that it's supposed to be MSAA, not SSAA).
From the API point of view the number of samples per pixel is fixed. The hardware may use some kind of sample compression though. That information would have to be sampled as well, and how do you present it to the shader?
I guess it would be possible to add another sample method to Texture2DMS that returns a bool. If the return value is true, all samples for that pixel are identical so you only need to read sample 0. But reading any other sample # would return the same value.
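Something like this is what I'm imagining -- to be clear, the query method below does not exist in D3D10 HLSL, it's purely made up for illustration (4x assumed):

// Hypothetical -- no such query exists in D3D10 HLSL, 4x assumed.
Texture2DMS<float4, 4> tex;

float4 FetchPixel(int2 coord)
{
    if (tex.AllSamplesIdentical(coord))   // made-up method returning the proposed bool
        return tex.Load(coord, 0);        // sample 0 stands in for all of them

    // Otherwise touch every sample (here just averaging them).
    float4 sum = 0;
    [unroll]
    for (int s = 0; s < 4; ++s)
        sum += tex.Load(coord, s);
    return sum / 4;
}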
 
I guess it would be possible to add another sample method to Texture2DMS that returns a bool. If the return value is true, all samples for that pixel are identical so you only need to read sample 0. But reading any other sample # would return the same value.
That's my point: if you're processing every sample per-pixel even if they are all the same, then you're doing just as much work as supersampling, although the hardware can potentially save BW where all of the samples are equal.
 
That's my point: if you're processing every sample per-pixel even if they are all the same, then you're doing just as much work as supersampling, although the hardware can potentially save BW where all of the samples are equal.
Presumably this is in some kind of "post-processing" shader, after the heavy lifting of the primary pixel-shading has already been performed.

Additionally, surely it isn't much trouble to determine if all the samples are equal before launching into whatever complexities the post-processing shader performs.

Jawed
 
I took for granted that you can't know in advance whether all the samples that belong to a pixel were generated by the same primitive (and hence are equal), so I was thinking about a simple and (hopefully) fast rendering pass which fetches all the samples and generates an 'edge' mask to be bound later as a stencil buffer, so that one can use early stencil rejection to process either one sample per pixel or all of them.
[addendum] it obviously makes sense only if we have to process those samples in some complex way (i.e. deferred rendering, etc.), otherwise the cost isn't worth it.
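Roughly what I have in mind for that mask pass, as a sketch only -- 4x assumed, one G-buffer channel used for the equality test, all names made up:

// Sketch of the mask pass -- gbufDepth is a made-up name for a G-buffer
// channel (e.g. linear depth written to one of the MRTs).
Texture2DMS<float, 4> gbufDepth;

float4 EdgeMaskPS(float4 pos : SV_Position) : SV_Target
{
    int2 coord = int2(pos.xy);
    float s0 = gbufDepth.Load(coord, 0);

    bool edge = false;
    [unroll]
    for (int s = 1; s < 4; ++s)
        edge = edge || (gbufDepth.Load(coord, s) != s0);

    // Colour writes would be disabled; with the stencil op set to REPLACE on
    // pass, only the surviving (edge) pixels mark the stencil buffer.
    if (!edge)
        discard;
    return 0;
}

The per-sample pass then uses a stencil test of 'equal' and the per-pixel pass 'not equal' (or vice versa), so early stencil rejection skips the pixels each one doesn't need to touch.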
 
Presumably this is in some kind of "post-processing" shader, after the heavy lifting of the primary pixel-shading has already been performed.
Not if you're doing deferred shading, which is arguably one of the most important uses of this functionality!

Additionally, surely it isn't much trouble to determine if all the samples are equal before launching into whatever complexities the post-processing shader performs.
Again, even that can be pretty expensive if you have to read *all* of the G-buffer elements and compare them. Really, shouldn't there be a low-cost way to check whether all samples are equal? Maybe there isn't, but it seems like there ought to be... Otherwise it looks like I'm sticking to my SSAA guns for deferred shading ;)
 
?? what do you mean ??
It's nice to be able to do postprocessing on a multisampled buffer, but for the earlier mentioned shader aliasing you want to be able to dynamically switch between supersampling and multisampling inside the shader (and thus to be able to write to individual samples).
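For the deferred case the read side of that switch would look something like this sketch (4x assumed, ShadeSample() is just a stand-in for the real lighting and gbufNormal is a made-up G-buffer binding) -- the part that's still missing is being able to write individual samples from the shader:

Texture2DMS<float4, 4> gbufNormal;   // one of the G-buffer MRTs, made-up name

// Stand-in for the real per-sample lighting; here it just fetches the normal.
float4 ShadeSample(int2 coord, int s)
{
    return gbufNormal.Load(coord, s);
}

float4 ShadePixel(int2 coord)
{
    float4 n0 = gbufNormal.Load(coord, 0);

    // Edge test: does any sample differ from sample 0?
    bool edge = false;
    [unroll]
    for (int s = 1; s < 4; ++s)
        edge = edge || any(gbufNormal.Load(coord, s) != n0);

    if (!edge)
        return ShadeSample(coord, 0);   // "multisampling" path: shade once

    // "supersampling" path: shade every sample and average
    float4 sum = 0;
    [unroll]
    for (int s = 0; s < 4; ++s)
        sum += ShadeSample(coord, s);
    return sum / 4;
}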
 
Hmm with a super-sampled deferred renderer, not only do you lose the nice MSAA sample patterns for edges, but you also lose the bandwidth-savings when writing the MRTs that constitute the G-buffer.

Jawed
 
I took for granted that you can't know in advance whether all the samples that belong to a pixel were generated by the same primitive (and hence are equal), so I was thinking about a simple and (hopefully) fast rendering pass which fetches all the samples and generates an 'edge' mask to be bound later as a stencil buffer, so that one can use early stencil rejection to process either one sample per pixel or all of them.
If the hardware is capable of using a compressed MSAA surface as a texture then the TMU would need access to compression flags, obviously. So returning a boolean flag indicating whether the texel position is in a fully compressed tile should be possible in theory. Such a flag would be conservative of course, i.e. if it's false the samples could still be identical.
 
If the hardware is capable of using a compressed MSAA surface as a texture then the TMU would need access to compression flags, obviously.
But the MSAA render target(s) has to be "copied" to make the "MSAA texture" doesn't it? So wouldn't the on-chip compression flags be lost? The surface can't be bound for reading and writing at the same time, so as soon as a new surface is bound for writing these flags are kaput...

Jawed
 
Hmm with a super-sampled deferred renderer, not only do you lose the nice MSAA sample patterns for edges
Not if you do jittered super-sampling by offsetting the projection matrix (Humus had a demo that did this - we also discussed it a fair bit in a recent thread).
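The jitter itself is cheap to apply -- e.g. a vertex shader along these lines, which is equivalent to offsetting the projection matrix (just a sketch, names made up; jitterNDC is the per-pass offset in NDC units):

// Per-pass sub-pixel jitter applied in the vertex shader -- equivalent to
// folding the offset into the projection matrix. jitterNDC would be
// something like (offsetX * 2 / width, offsetY * 2 / height).
cbuffer PerPass
{
    float4x4 viewProj;
    float2   jitterNDC;
};

float4 JitterVS(float3 posWorld : POSITION) : SV_Position
{
    float4 clip = mul(float4(posWorld, 1.0f), viewProj);
    clip.xy += jitterNDC * clip.w;   // sub-pixel shift after the perspective divide
    return clip;
}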

but you also lose the bandwidth-savings when writing the MRTs that constitute the G-buffer.
True, but in exchange you gain shader AA, potentially some temporal AA (also discussed in the recent thread), and better texture filtering.

The cost is (at worst) N times the cost of a single frame render, although with no increase in memory footprint. This can be lessened somewhat by amplifying geometry and sending it to different slices using the GS, but then more memory is required.
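A rough sketch of that GS amplification, assuming a render target array and a per-slice jitter table (untested, names and slice count made up):

// Replicate each triangle into the slices of a render target array,
// one sub-pixel jitter per slice.
cbuffer Jitters
{
    float2 jitterNDC[4];
};

struct GSVert { float4 clip : SV_Position; };
struct GSOut  { float4 clip : SV_Position; uint slice : SV_RenderTargetArrayIndex; };

[maxvertexcount(12)]   // 3 vertices x 4 slices
void AmplifyGS(triangle GSVert input[3], inout TriangleStream<GSOut> stream)
{
    for (int i = 0; i < 4; ++i)
    {
        for (int v = 0; v < 3; ++v)
        {
            GSOut o;
            o.clip = input[v].clip;
            o.clip.xy += jitterNDC[i] * o.clip.w;   // per-slice jitter, as in the VS version
            o.slice = i;
            stream.Append(o);
        }
        stream.RestartStrip();
    }
}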

It's not "pure and correct AA" though - just jittered super-sampling - so I guess that's a bit off-topic ;)
 
But the MSAA render target(s) has to be "copied" to make the "MSAA texture" doesn't it? So wouldn't the on-chip compression flags be lost? The surface can't be bound for reading and writing at the same time, so as soon as a new surface is bound for writing these flags are kaput...

Why is this different from normal render-to-texture? No copy there.
 
If the hardware is capable of using a compressed MSAA surface as a texture then the TMU would need access to compression flags, obviously. So returning a boolean flag indicating whether the texel position is in a fully compressed tile should be possible in theory. Such a flag would be conservative of course, i.e. if it's false the samples could still be identical.
A little misunderstanding: I'm perfectly aware that such a thing is possible in theory (reading back a compression flag), but I took for granted that it's not possible with current APIs/HW since I could not find any reference to it.
 
Why is this different from normal render-to-texture? No copy there.
I suppose I should have called it a "swap" in terms of the render target (or MRTs) - when new render targets are defined the existing ones "lose focus", some kind of pointer swapping takes place and the compression flags are reset for the new render targets.

Jawed
 
I suppose I should have called it a "swap" in terms of the render target (or MRTs) - when new render targets are defined the existing ones "lose focus", some kind of pointer swapping takes place and the compression flags are reset for the new render targets.
Pardon me if this is a stupid question (I don't really understand how the hardware works), but might the compression flags be associated with the multisampled depth buffer rather than there just being one global set? It seems to me that if you can get access to the "pre-resolved" fragments, the hardware isn't going to decompress the MSAA buffers when you go to read from them.
 
Pardon me if this is a stupid question (I don't really understand how the hardware works), but might the compression flags be associated with the multisampled depth buffer rather than there just being one global set? It seems to me that if you can get access to the "pre-resolved" fragments, the hardware isn't going to decompress the MSAA buffers when you go to read from them.
My understanding is that these compression flags are held in on-die memory - in a similar way to hierarchical-Z.

One of the restrictions under D3D10 is that there is only one Z/stencil buffer active at any one time, which means that all concurrently active MRTs share that Z/stencil and so all MRTs have the same dimensions (x,y).

All of this seems consistent with the concept of a single set of compression flags. A brief skim of the diagrams and the tables in the text of this patent application:

Method and apparatus for anti-aliasing using floating point subpixel color values and compression of same

should give an overview of how compression is architected.

Jawed
 
There's nothing preventing you from flushing the compression flags to memory. It's even required when you want to do AA resolve on scanout. Though it adds a level of indirection when reading the samples.
 
There's nothing preventing you from flushing the compression flags to memory. It's even required when you want to do AA resolve on scanout. Though it adds a level of indirection when reading the samples.
The problem is the granularity of the flags, i.e. how big are the tiles if the hardware uses a tiled compression scheme?

Also, if the flags are conservative by tile (they have to be, really), where/when is any saving going to accrue? It seems to me the cost of reading all the samples and testing them to find out if a destination pixel is interior or edge is slight in comparison with, for example, the "dynamic branching" coherency cost or the cost of reading from all the textures that form a G-buffer.

It seems to me that conservative compression flags only help the maximum FPS, not the minimum, because it's only when the edge-complexity is low that there'll be a speed-up.

Jawed
 
It seems to me that conservative compression flags only help the maximum FPS, not the minimum, because it's only when the edge-complexity is low that there'll be a speed-up.
The fact that hardware uses framebuffer compression should indicate that there is enough to be gained from it, even in complex scenes. Of course for a simple downsample filter in a shader you don't save much, but reading the flag plus an if clause wouldn't cost much either.
 