"Deferred MSAA" compatible with any kind of Render Targets

RacingPHT

I know many render targets are currently not compatible with MSAA. For example, MRT, FP16, or even FP32 targets can't be multisampled on some hardware, if not all. Some papers on deferred shading don't give good advice on this either.

I spent a morning thinking about this problem and came up with an answer. I don't know whether it has been used before or whether there is a better way, so I decided to share it here ;)

The procedure is as follows:
1: Render the scene with MSAA off, and store color and Z in two textures.
2: Turn on FSAA, using a matching multisampled back buffer and Z/stencil buffer.
3: Render all the visible geometry again, with one shared shader (described below), texture set, and render state. Great for batching, isn't it?
(In DX10 it's even better with stream output.) You can even use occlusion queries, or whatever.
4: In the shader:
1) Generate texcoords that map the fragment to its screen-space texel.
2) Compare the primitive's interpolated Z with the Z texture at this texel and its 4 neighbors, and find the nearest Z.
3) Output the color corresponding to that nearest Z.
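Steps 4.2) and 4.3) can be modelled on the CPU to check the selection logic. This is a Python sketch, not the author's code: `deferred_fetch` is a hypothetical name, images are nested lists indexed [row][column], borders use clamp addressing (an assumption), and it uses the same 0.01 early-out as the posted SM3 shader.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def deferred_fetch(color_tex: Sequence[Sequence[Color]],
                   z_tex: Sequence[Sequence[float]],
                   x: int, y: int, frag_z: float,
                   threshold: float = 0.01) -> Color:
    """Return the pass-1 colour whose stored Z best matches this fragment's Z.

    color_tex / z_tex: the non-AA colour and depth images from pass 1.
    (x, y): the pixel being shaded in pass 2.
    frag_z: the primitive's interpolated depth at that pixel.
    """
    h, w = len(z_tex), len(z_tex[0])
    best = color_tex[y][x]
    best_dz = abs(z_tex[y][x] - frag_z)
    if best_dz <= threshold:
        # The centre texel already belongs to this surface: no search needed.
        return best
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        # Clamp to the image, like CLAMP texture addressing.
        nx = min(max(x + dx, 0), w - 1)
        ny = min(max(y + dy, 0), h - 1)
        dz = abs(z_tex[ny][nx] - frag_z)
        if dz < best_dz:
            best, best_dz = color_tex[ny][nx], dz
    return best
```

Take a 3x1 image where only the left pixel's centre was covered by a red foreground triangle (Z = 0.2) and the rest is blue background (Z = 0.9). A foreground fragment at the middle pixel still recovers red from its left neighbour, while the background fragment at the same pixel keeps blue.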

Now, you get CORRECT MSAA.

The main idea (inspired by alpha-to-coverage) is:
1: All surviving AA samples in a pixel share the same shader execution, even if only one sample survives.
2: So a "half-covered" pixel gets some AA samples that fetch its "right" color, which is the same for all of that primitive's samples in the pixel.
3: Finally, the HW resolves it...
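Points 1 to 3 can be illustrated with a toy model of one 4-sample pixel. This is a Python sketch with a hypothetical function name. A constant depth per fragment and a plain box-filter resolve are simplifying assumptions. Each fragment's single shader output stands in for the colour fetched in pass 2, and it lands only on the samples that fragment covers:

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def shade_msaa_pixel(fragments: Sequence[Tuple[float, int, Color]],
                     num_samples: int = 4,
                     clear: Color = (0.0, 0.0, 0.0)) -> Color:
    """Toy MSAA pixel: fragments are (depth, coverage_mask, shader_output)."""
    sample_z = [float("inf")] * num_samples
    sample_c: List[Color] = [clear] * num_samples
    for depth, mask, color in fragments:
        # One shader execution per fragment: the same colour goes to every
        # covered sample that passes its per-sample depth test.
        for s in range(num_samples):
            if mask & (1 << s) and depth < sample_z[s]:
                sample_z[s] = depth
                sample_c[s] = color
    # Box-filter resolve: the hardware averages the samples.
    return tuple(sum(c[i] for c in sample_c) / num_samples for i in range(3))
```

A red foreground fragment covering 2 of 4 samples over a blue background resolves to an even red/blue mix. Covering only 1 sample gives 25% red. In the deferred scheme, that red is what the pass-2 lookup fetched for the foreground triangle, even when the non-AA texel at this pixel's centre was background.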

I've seen some "Smart Shader AA" on the net, but it may not get correct sub-pixel coverage information, because it just post-processes the image in texture space and blurs the edges.

PS: I'm from Beijing, so please excuse my poor English :)
 
The SM3 version of the shader code is here:
Code:
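float4 ShaderAAPS_SM3( SceneVS_Output Input) : COLOR
{
    // Clip-space position -> normalized device coords; .z is this fragment's
    // depth (z/w), which must match what pass 1 wrote into the Z texture.
    float3 scrTexCrd = Input.PosPS.xyz / Input.PosPS.w;

    // NDC [-1,1] -> texture space [0,1] (y flipped), plus D3D9's half-texel
    // offset so each pixel lands on its texel centre. pixelSize = 1 / RT size.
    scrTexCrd.x = scrTexCrd.x * 0.5 + 0.5 + pixelSize.x * 0.5;
    scrTexCrd.y = -scrTexCrd.y * 0.5 + 0.5 + pixelSize.y * 0.5;

    // The 4 direct neighbours; w = 0 selects mip level 0 in tex2Dlod.
    float4 offset[] = { float4(scrTexCrd.x + pixelSize.x, scrTexCrd.y, 0, 0),
                        float4(scrTexCrd.x - pixelSize.x, scrTexCrd.y, 0, 0),
                        float4(scrTexCrd.x, scrTexCrd.y + pixelSize.y, 0, 0),
                        float4(scrTexCrd.x, scrTexCrd.y - pixelSize.y, 0, 0)};

    // Start with the non-AA texel under this pixel.
    float4 color = tex2Dlod(sampColor, float4(scrTexCrd.xy, 0, 0));
    float zInit = tex2Dlod(sampZ, float4(scrTexCrd.xy, 0, 0)).x;
    float minDetZ = abs(zInit - scrTexCrd.z);

    // Only edge pixels (centre texel belongs to another surface) search the neighbours.
    if (minDetZ > 0.01)
    {
        for (int i = 0; i < 4; i++)
        {
            float4 curColor = tex2Dlod(sampColor, offset[i]);
            float curZ = tex2Dlod(sampZ, offset[i]).x;
            float detZ = abs(curZ - scrTexCrd.z);
            // Keep the colour whose stored depth best matches this fragment's depth.
            if (detZ < minDetZ)
            {
                color = curColor;
                minDetZ = detZ;
            }
        }
    }
    return color;
}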
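The texcoord setup at the top of the shader can be checked with a little arithmetic. This Python sketch uses hypothetical helper names and assumes D3D9's convention that pixel centres sit at integer screen coordinates, so pixel i's centre is at NDC x = 2i/W - 1. Mapping to [0,1] gives i/W, and the extra half pixelSize moves it to the texel centre (i + 0.5)/W:

```python
import math


def ndc_to_texcoord(ndc_x, ndc_y, width, height):
    """Python mirror of the scrTexCrd setup in ShaderAAPS_SM3."""
    pixel_w, pixel_h = 1.0 / width, 1.0 / height   # the shader's pixelSize
    u = ndc_x * 0.5 + 0.5 + pixel_w * 0.5          # [-1,1] -> [0,1], plus half a texel
    v = -ndc_y * 0.5 + 0.5 + pixel_h * 0.5         # y flipped: NDC up, texture down
    return u, v


def pixel_centre_ndc(i, j, width, height):
    """NDC of pixel (i, j)'s centre, assuming D3D9 integer pixel centres."""
    return 2.0 * i / width - 1.0, 1.0 - 2.0 * j / height


def maps_to_texel_centre(i, j, width, height):
    """True if pixel (i, j) samples the centre of texel (i, j)."""
    u, v = ndc_to_texcoord(*pixel_centre_ndc(i, j, width, height), width, height)
    return (math.isclose(u, (i + 0.5) / width, abs_tol=1e-12)
            and math.isclose(v, (j + 0.5) / height, abs_tol=1e-12))
```

Without the `pixel_w * 0.5` term, every lookup would land on a texel corner, and the bilinear-filtered colour and Z would mix four texels.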

If I can upload the demo, I'll put it here :)
 
It's a very interesting idea. I'd very much like to see how it looks. The amount of blurring it introduces should be minimal and the results should be very nice.

It is a method that would probably work very well on Xbox 360 because of its limited framebuffer size.
 

Yes, there is no blur ;)
It may work well on Xbox 360, but I haven't developed on the console yet.

Here is the demo (I don't know if the upload link will work :) ):
http://gamesir.enorth.com.cn/AttachFile-335859
 
Here's the picture, but I can't edit my thread... Is there any way?

[Screenshot attachment: AttachFile-335902]
 
I see no difference, GeForce 7800 GTX, 91.31 drivers. What hardware and software platform are you testing with?
 

In most cases HW AA and deferred AA look the same, except for pixels on edges clipped by the near plane.
My card is a GF6600, driver 91.45.
 
Interesting. So you are saying: render the original pixels into your high-precision render target, then use a lower-precision target with MSAA, and in the shader compare depths against the original image to choose the appropriate pixel colour. This last bit won't be very efficient, though, as the branch is only taken for edge pixels. You'd also lose tiny sub-pixel geometry that MSAA would pick up.

My next thought is that this probably won't work with alpha testing or blending.
It may also cause problems with rotated-grid AA?

It also requires rendering the geometry twice, plus an extra frame buffer.

You would need to do a tone mapping pass/whatever on the original render target before passing it through your shader.

However this has given me an idea. I'll have a think.

[edit]

Don't get me wrong, I'm not saying it doesn't work. I'm just sorting out the pros and cons in my mind... via interwebyness.
 
Now, you get CORRECT MSAA.
Unfortunately, it fails on thin lines where only a few pixels are drawn without AA (simply because the right color/depth is nowhere to be found in the pixel neighborhood). You might also get artifacts where there are sub-pixel gaps between polygon edges, and there could be depth precision issues; both problems are more likely with distant objects.
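The thin-line failure can be shown with a simplified CPU model of the pass-2 lookup (Python, hypothetical function name, no early-out threshold). A line that covers some MSAA samples of the middle pixel but never hit a pixel centre in pass 1 leaves no trace in the colour/Z textures, so its fragment can only pick up the background:

```python
def nearest_z_colour(color_tex, z_tex, x, y, frag_z):
    """Simplified pass-2 lookup: centre + 4 neighbours, pick the closest stored Z.

    The centre is tested first, so on ties it wins, as in the posted shader.
    Out-of-range neighbours are skipped.
    """
    h, w = len(z_tex), len(z_tex[0])
    candidates = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    best = None
    for cx, cy in candidates:
        if 0 <= cx < w and 0 <= cy < h:
            dz = abs(z_tex[cy][cx] - frag_z)
            if best is None or dz < best[0]:
                best = (dz, color_tex[cy][cx])
    return best[1]
```

If the 3x3 neighbourhood holds only background (Z = 0.9), a line fragment at Z = 0.3 gets the background colour. MSAA coverage found the line, but the shader has no colour to give it. The line only shows up if some neighbouring pixel centre was covered in pass 1.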
 
Same for me. No AA, whether AA in the driver panel is set to application controlled or forced. X1600Pro, Cat6.8.

RacingPHT,
what's the point of using tex2Dlod to sample from mip level zero?? Surely your render target textures do not have mipmaps at all. And even if you wanted to sample level zero of a texture with mipmaps, set the mip filter to NONE instead.
 

Probably to prevent the Microsoft HLSL compiler from moving the texture instructions outside of dynamic branches.
 
*thinks* An alternative idea might be to use centroid sampling, with texture coords computed in a vertex shader to match the screen-space coords, to work out where in the pixel it is being sampled from. If the pixel is being sampled from left of centre, grab the output from the pixel to the left in the non-MSAA buffer, and so on.

It should be possible to get it to work if centroid works as ATI have shown in this PDF http://www.ati.com/developer/SIGGRAPH04/ShadingCourse2004_HLSL_Slides.pdf
 
to Rys, Xmas, SuperCow:
I've tested it on several machines and it works. If it doesn't work for you, here is a new demo that sets 4xAA itself and has a bit of "optimization" (which may be slower):
http://gamesir.enorth.com.cn/AttachFile-336012

to Xmas: The compiler's behavior makes sense: the GPU needs ddx/ddy, which requires tex2D to run for a whole block of pixels, so I have to use tex2Dlod. You can check ATI's papers. And "CORRECT" means the coverage information is correct, not the color :p
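The ddx/ddy point can be sketched with a simplified, isotropic version of the usual mip-LOD estimate. This is a Python sketch with a hypothetical helper name; real hardware uses its own approximations. The derivatives are differences across a 2x2 pixel quad, which is why an implicit-LOD fetch needs all four pixels of the quad to compute its coordinates. For the screen-aligned lookups here, with textures the same size as the render target, the estimate comes out as LOD 0, so `tex2Dlod(..., 0)` fetches the same texels `tex2D` would:

```python
import math


def implicit_lod(u_quad, v_quad, tex_w, tex_h):
    """Simplified mip LOD for one 2x2 quad.

    u_quad / v_quad hold the quad's texture coordinates as
    [[top_left, top_right], [bottom_left, bottom_right]].
    ddx/ddy are finite differences between neighbouring pixels in the quad.
    """
    dudx = (u_quad[0][1] - u_quad[0][0]) * tex_w
    dvdx = (v_quad[0][1] - v_quad[0][0]) * tex_h
    dudy = (u_quad[1][0] - u_quad[0][0]) * tex_w
    dvdy = (v_quad[1][0] - v_quad[0][0]) * tex_h
    # Texels stepped per pixel along the larger screen axis.
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho))
```

With u = (x + 0.5)/W and v = (y + 0.5)/H on a W x H texture, each pixel step moves exactly one texel, so rho is 1 and the LOD is 0. Sampling a texture twice as large would give LOD 1.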

to Graham: yes, alpha doesn't work, but if you are using deferred shading you can put alpha in this pass :)

to Colourless: I'm thinking... my method has a problem: it doesn't work with intersecting faces where the Z values are almost the same... your idea is great...
 
The new version works for me with hardware AA. Shader AA mostly works, but shows what appear to be Z-fighting errors on the model geometry. I'll take a screenshot in the morning to show you.
 
to Colourless: thanks for your idea, and it works. Just replace the shader with this function:
Code:
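// Add to SceneVS_Output: float4 PosPS_cent : TEXCOORD2_centroid;
// PosPS_cent.xy must hold clip-space xy already scaled to texture space,
// i.e. (0.5 * x, -0.5 * y), presumably set up in the vertex shader (not shown).

float4 ShaderAAPS_SM3( SceneVS_Output Input, float2 ScrPos : VPOS ) : COLOR
{
    // Texel centre of this pixel (VPOS holds integer pixel coords in D3D9).
    float2 scrTexCrd = (ScrPos + 0.5) * pixelSize;

    // Same position, but interpolated at the centroid of the covered samples.
    float2 scrTexCrd_cent = Input.PosPS_cent.xy / Input.PosPS.w;
    scrTexCrd_cent = scrTexCrd_cent + 0.5 + pixelSize * 0.5;

    // Move the texcoord 1 texel "inwards", toward the centroid. The tiny z keeps
    // normalize() defined for fully covered pixels (centroid == centre), which
    // then sample their own texel.
    float2 crd = normalize(float3(scrTexCrd_cent - scrTexCrd, 0.0001)).xy * pixelSize + scrTexCrd;
    return tex2D(sampColor, crd);
}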

This method is much faster, but it has a problem: the color on any polygon edge may shift one pixel inward.
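The inward shift comes from the `crd` line of the centroid shader. Here it is mirrored in Python (hypothetical helper name, texcoords as (u, v) tuples). A fully covered pixel samples its own texel. A partially covered pixel steps toward its centroid, that is, toward the inside of the polygon:

```python
import math


def centroid_fetch_coord(centre, centroid, pixel_size, eps=0.0001):
    """Mirror of: normalize(float3(cent - centre, eps)).xy * pixelSize + centre.

    centre:   texcoord of the pixel centre (from VPOS)
    centroid: texcoord derived from the centroid-interpolated position
    """
    dx, dy = centroid[0] - centre[0], centroid[1] - centre[1]
    # eps keeps the vector non-zero when centroid == centre (full coverage).
    length = math.sqrt(dx * dx + dy * dy + eps * eps)
    nx, ny = dx / length, dy / length
    return (centre[0] + nx * pixel_size[0], centre[1] + ny * pixel_size[1])
```

At 640 pixels wide, a quarter-pixel centroid offset is about 0.0004 in texcoord units, close to eps. So the step comes out slightly shorter than a full texel (about 0.97 of one) rather than exactly one.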
 
About the Z-fighting-like errors in shader AA: you are right. Some pixels may lose subtle detail, or choose the wrong color.
As a matter of fact, because of its "deferred" nature, this method cannot obtain all the information that normal HW AA easily can. The main purpose of deferred AA is to smooth discontinuous edges, which I think is where jaggies are most visible.

With a crazy render target (A32B32G32R32F, maybe several of them? :rolleyes:), the idea of doing AA only on a lightweight format may be more useful, even if the HW supports MSAA on the heavy format (at least it saves memory)...
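Here is a rough illustration of the memory saving. It counts only the colour and Z-texture surfaces and ignores depth-stencil buffers, padding, and compression. The 1280x720 resolution, the R32F Z texture, and the A8R8G8B8 MSAA target (which assumes colours are tone-mapped before pass 2) are hypothetical choices, not taken from the demo:

```python
def surface_bytes(width, height, bytes_per_sample, samples=1):
    """Size of one render target / texture in bytes (no padding or compression)."""
    return width * height * bytes_per_sample * samples


W, H = 1280, 720  # hypothetical resolution

# Direct 4x MSAA on an A32B32G32R32F target (16 bytes per sample).
direct = surface_bytes(W, H, 16, samples=4)

# Deferred AA: non-AA A32B32G32R32F colour + R32F Z texture from pass 1,
# plus a 4x MSAA A8R8G8B8 target for pass 2.
deferred = (surface_bytes(W, H, 16)
            + surface_bytes(W, H, 4)
            + surface_bytes(W, H, 4, samples=4))

print(direct, deferred)  # 58982400 33177600
```

With several FP32 render targets (MRT), only the single lightweight target needs multisampling, so the gap grows.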
 