D3D10 Deferred Shading: AA possible?

Jawed

Over here:

http://www.beyond3d.com/forum/showpost.php?p=830639&postcount=16

[attached slide: b3d58.jpg]


the subject of MSAA with deferred shading has come up.

So, how would D3D10 allow MSAA to be implemented in a deferred shading engine?

How would it perform? Is it likely to be viable any time soon?

Would MSAA in such games be a compelling reason to upgrade to Vista?

Or is deferred shading destined to be nothing more than a medium-term fad, a bit like stencil shadowing?

Jawed
 
Maybe the main purpose of making MSAA textures available to the PS is deferred shading...

But resolving the MRT MSAA texture objects is somewhat like SSAA, and I think it's inefficient: most samples in a pixel are the same, but they have to be resolved individually.
 
Hehe, stencil shadows :) Go tell that to Saints Row.

Anyway.

The last three algorithm ideas I've had, one of which was an 'AA filter', all ended up being doable on SM3 with one tiny exception somewhere. So personally I cannot wait for D3D10.

I would assume as well as getting access to the subsamples, you also get access to the number of samples (colour compression?). Maybe? Otherwise it'd just be supersampling right?

The 'AA filter' idea I mentioned wasn't actually multi-sampled. It's more like the 'deferred AA' example currently floating around in the coding forum, except that it doesn't rely on the CPU to preprocess geometry and doesn't affect pixels adjacent to edge pixels.
 
I would assume as well as getting access to the subsamples, you also get access to the number of samples (colour compression?). Maybe? Otherwise it'd just be supersampling right?
It's only supersampling if you run a pixel shader for each sample. If the samples are stored in separate surfaces it's just a memory organization detail.
 
Ah, the penny has dropped (or I think it has).

In DX9 the application has no control over MSAA sample resolve. This means that a colour surface containing the deferred operands is "blindly" munged (blended) by the MSAA resolve, as though the operands are colours. The result is that these edge pixels contain meaningless operands that are some peculiar "average" of the triangles' operands.

In D3D10 the application can ask for the un-resolved Z samples and colours (operands). So in the final shading pass, as the final pixel colour is being written to the "screen", the code can detect a pixel containing multiple triangles from the un-resolved MSAA data. It can then loop over the triangles that make up the "edge-pixel", producing each triangle's final colour based upon the operands, and then perform a manual blend of the resulting colours (whether from 2, 3 ... triangles). When a pixel contains one triangle, the computation of the final colour is obviously 1/4 as intensive, and doesn't require the blend operation.
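In rough SM4.0 HLSL the final pass might look something like this (a sketch only: 4x AA assumed, and the G-buffer names and the Shade() placeholder are made up for illustration, not real engine code):

Code:
#define NUM_SAMPLES 4

// Unresolved MSAA G-buffers, bound as multisampled textures (hypothetical layout).
Texture2DMS<float4, NUM_SAMPLES> GBufNormal : register(t0);
Texture2DMS<float4, NUM_SAMPLES> GBufAlbedo : register(t1);

static const float3 LightDir = float3(0.408, 0.816, 0.408);

// Placeholder lighting: stands in for whatever the engine's real shading does.
float3 Shade(float4 normal, float4 albedo)
{
    return albedo.rgb * saturate(dot(normalize(normal.xyz), LightDir));
}

float4 main(float4 pos : SV_Position) : SV_Target
{
    int2 p = int2(pos.xy);

    // Do all the samples in this pixel carry the same operands?
    float4 n0 = GBufNormal.Load(p, 0);
    float4 a0 = GBufAlbedo.Load(p, 0);
    bool interior = true;
    [unroll]
    for (int i = 1; i < NUM_SAMPLES; ++i)
        interior = interior && all(GBufNormal.Load(p, i) == n0)
                            && all(GBufAlbedo.Load(p, i) == a0);

    // Interior pixel: one triangle covers it, so shade once and skip the blend.
    if (interior)
        return float4(Shade(n0, a0), 1);

    // Edge pixel: shade each sample's operands and blend (average) manually.
    float3 sum = 0;
    [unroll]
    for (int s = 0; s < NUM_SAMPLES; ++s)
        sum += Shade(GBufNormal.Load(p, s), GBufAlbedo.Load(p, s));
    return float4(sum / NUM_SAMPLES, 1);
}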

Hmm, so pretty simple. Is that right?

Jawed
 
Sounds about right. You know - the other thing this would seem to bring within reach is the PS subsuming the ROPs.

I would be surprised if you actually get info about which of a pixel's samples are identical, though - I suspect that the pixel shader will always see 1, 2, 4, 8 or 16 samples per pixel, regardless of how well they were compressed. I don't think even the ATI products have PS branching granularity fine enough to make optimizations based on sample count a win (for a short MSAA resolve or - if the PS is doing double duty as an ROP - a blend-type op).
 
Yeah, I think you're right: just a list of samples per pixel, unordered, with no concept of which triangle each sample belongs to. I was thinking samples with the same Z would all logically belong to the same triangle, but that isn't the case.

It's only the ROPs that need to work with compression and unravel a pixel's samples. Logically the render target is uniformly read at the sample resolution.

But in terms of performance, I suppose it would be preferable to identify groups of samples per pixel (whether the pixel is edge or interior), in order to avoid having a potentially complex shader iterate over up to 16 samples for each and every pixel.
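For the edge/interior split, something like this could do the classification cheaply (again just a sketch, reusing the hypothetical 4x G-buffer from the earlier snippet): write the result to a mask or stencil, then only run the per-sample loop where it says "edge".

Code:
#define NUM_SAMPLES 4
Texture2DMS<float4, NUM_SAMPLES> GBufNormal : register(t0);

float main(float4 pos : SV_Position) : SV_Target
{
    int2 p = int2(pos.xy);
    float4 s0 = GBufNormal.Load(p, 0);
    [unroll]
    for (int i = 1; i < NUM_SAMPLES; ++i)
        if (any(GBufNormal.Load(p, i) != s0))
            return 1.0;   // edge pixel: take the expensive per-sample path
    return 0.0;           // interior pixel: a single shading evaluation is enough
}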

Jawed
 
Ah, the penny has dropped (or I think it has).

In DX9 the application has no control over MSAA sample resolve. This means that a colour surface containing the deferred operands is "blindly" munged (blended) by the MSAA resolve, as though the operands are colours. The result is that these edge pixels contain meaningless operands that are some peculiar "average" of the triangles' operands.

Hmm, I wonder if that's what sireric had in mind when he described D3D AA as "broken" way back when.
 
Jawed - I agree that a 1-bit flag that says whether all samples belong to the same triangle or not would be a good idea.

Do you know if the PS can access a fragment's coverage mask in SM4.0 (needed to emulate a traditional ROP)?

Serge
 
nAo - care to explain a little further? :)
Once you enable centroid sampling, the GPU is forced not to interpolate any texture coordinate outside your primitive when MSAA is active.
Now imagine having previously rendered something with MSAA off, and then re-rendering the same stuff with MSAA on: you can now sample your non-MSAA rendered image using centroid sampling (NOT using the SM3 VPOS register), which will make your samples more correct along edges (i.e. you will fetch less stuff that does not belong to your MSAAed primitive).
It's not perfect, but it can help in some situations (deferred shadow mapping...).
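Roughly like this (written here in D3D10 HLSL for consistency with the rest of the thread, although the trick itself doesn't need D3D10; the texture, sampler and the screen-space UV passed down from the VS are assumptions for illustration):

Code:
Texture2D    NonMsaaTex : register(t0);   // the earlier pass, rendered with MSAA off
SamplerState PointClamp : register(s0);

struct PSIn
{
    float4          pos      : SV_Position;
    // 'centroid' keeps the interpolated coordinate inside the covered part of
    // the primitive on edge pixels, unlike the pixel-centre position you get
    // from VPOS/SV_Position, so edge pixels fetch less foreign data.
    centroid float2 screenUV : TEXCOORD0;
};

float4 main(PSIn i) : SV_Target
{
    return NonMsaaTex.Sample(PointClamp, i.screenUV);
}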

Marco
 
But centroid sampling can also induce over-filtering artifacts. An example is a square rendered as two triangles. You may not want the interior edge to be centroid sampled as that may cause filtering artifacts where the triangles meet.
 
You don't need a flag when you have centroid sampling
Centroid sampling is, at best, a hack to work around the failings of another hack (multisampling). The real solution to the edge sampling problem is to simply super-sample edges. Then you get correct LODs/derivatives and you also don't sample outside the boundaries of your triangle.


That said, for the problem at hand (determining if your current fragment is fully-covered or not), you can use centroid for a partial solution: just interpolate the same attribute using both centroid and non-centroid sampling. If they match, your fragment is fully-covered for sure. If they don't match, then your fragment is possibly partially uncovered.
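A minimal sketch of that test, assuming the VS simply outputs the same attribute twice (the names are made up):

Code:
struct PSIn
{
    float4          pos          : SV_Position;
    float2          attrDefault  : TEXCOORD0;   // default interpolation (pixel centre)
    centroid float2 attrCentroid : TEXCOORD1;   // same value, centroid interpolation
};

float4 main(PSIn i) : SV_Target
{
    // Match => every sample of this pixel lies inside the triangle (fully covered).
    // Mismatch => the pixel is possibly only partially covered.
    bool fullyCovered = all(i.attrDefault == i.attrCentroid);
    return fullyCovered ? float4(1, 1, 1, 1) : float4(1, 0, 0, 1);
}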
 
A problem here is that an edge in one G-buffer is not an edge in another G-buffer: e.g. two triangles abut (sharing a single pixel) with identical Z and identical normal, but different colour.

So the detection of edges needs to be executed for each G-buffer independently. Doesn't it?

As to SS'd edges versus MSAA'd, isn't this always going to be a question of ordered-grid versus sparse-sampling? Is it possible to rasterise around edges (at the super-sampling res) with a non-ordered-grid pattern? Wouldn't that simply shift the artefacts of edge-AA further towards the interior?

Jawed
 
I'm thinking in terms of reading an MSAAed surface - loading an entire pixel's worth of subsamples - and then processing them in the shader. The 1-bit flag would tell you whether all the samples available are identical or not (*), but nothing at all about the way the triangle currently being rasterized covers the current pixel. Isn't this what the slide is referring to, or am I being an idiot?

(*) I assume that the MSAAed surface is stored in compressed form using the following simple scheme: if all sub-samples have identical values, store that value once; otherwise, store a value for each sub-sample. Use some bits lying around somewhere to track whether a pixel is compressed or not... So more or less the equivalent of color-compression, but for buffers in some arbitrary format.
 
But centroid sampling can also induce over-filtering artifacts. An example is a square rendered as two triangles. You may not want the interior edge to be centroid sampled as that may cause filtering artifacts where the triangles meet.
Isn't color a centroid-sampled interpolant? I've never seen color interpolation artifacts on edges.
OK, the situation is different here: we are actually sampling something that can change abruptly along an edge, for example a normal.
Bob said:
Centroid sampling is, at best, a hack to work around the failings of another hack (multisampling)
I never said it's a solution, but it can help (and it's basically free and fast).
The real solution to the edge sampling problem is to simply super-sample edges. Then you get correct LODs/derivatives and you also don't sample outside the boundaries of your triangle.
Often real solutions are simple and expensive, thus not really interesting... at least on current GPUs.
Jawed said:
Is it possible to rasterise around edges (at the super-sampling res) with a non-ordered-grid pattern? Wouldn't that simply shift the artefacts of edge-AA further towards the interior?
Yep, it is: just send your geometry N times, slightly modifying your projection matrix in order to shift your samples on screen.
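For example (a sketch only, with the shift applied in the vertex shader rather than by editing the matrix on the CPU - adding a sub-pixel offset in clip space amounts to the same thing; the constant names and the render-N-times/accumulate loop around it are assumptions):

Code:
cbuffer PerPass
{
    float4x4 WorldViewProj;
    float2   JitterNdc;    // per-pass offset, e.g. samplePosInPixels * 2 / viewportSize
};

float4 main(float3 posObj : POSITION) : SV_Position
{
    float4 posClip = mul(float4(posObj, 1.0), WorldViewProj);
    posClip.xy += JitterNdc * posClip.w;   // shift the whole image by a sub-pixel amount
    return posClip;
}

The application then renders the scene once per sample position (which can follow a sparse pattern rather than an ordered grid) and averages the results.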
 
I'm thinking in terms of reading an MSAAed surface - loading an entire pixel's worth of subsamples - and then processing them in the shader. The 1-bit flag would tell you whether all the samples available are identical or not (*), but nothing at all about the way the triangle currently being rasterized covers the current pixel. Isn't this what the slide is referring to, or am I being an idiot?
Yeah.

Except that there's no "triangle currently being rasterised". There's no scene geometry being processed in the final phase of deferred shading, when G-buffers are being interpreted to produce final shader effects (and MSAA in this case). This phase simply uses a screen-covering quad to iterate over the contents of the set of G-buffers.

When I referred to triangles earlier, I mistakenly thought they were relevant to the interpretation of MSAA samples. They're not: only the set of samples and their values is needed. "Grouping by triangle" is really just grouping by equivalence of values.

Jawed
 
(*) I assume that the MSAAed surface is stored in compressed form using the following simple scheme: if all sub-samples have identical values, store that value once; otherwise, store a value for each sub-sample. Use some bits lying around somewhere to track whether a pixel is compressed or not... So more or less the equivalent of color-compression, but for buffers in some arbitrary format.
As far as I know, MSAA compression works merely in terms of how much data is written to memory (or read-back, when performing MSAA compares).

The actual buffer, in memory, consumes the precise amount of space determined by the dimensions of the surface, the colour format (FX8, FP16, etc.) and the degree of AA.

So there are 1-bit flags floating around, so that the ROPs can determine when a pixel contains fully compressed samples (i.e. all samples are the same) or when the pixel consists of 2 to n samples (n being the degree of AA).

So, "compression" is a mild misnomer - the amount of data written is compressed (like run-length encoding), but the memory space occupied by the buffer is constant.

Jawed
 
I realise, after posting that, that you prolly understood that anyway. You aren't interested in compression of storage, merely the existence of a 1-bit flag.

Sorry. So, ahem, just ignore that.

Jawed
 