What does DX10.1 add vs. DX10 that makes such a big difference? [deferred shading]

No, I have never described SSAA except in reply to SuperCow who raised the concept as a workaround for D3D10 - and I showed why it sux.

I'm not interested in workarounds. I'm simply trying to describe why D3D10.1's provision of per-sample depth read back is important to the quality of edges in deferred shading algorithms that want to use MSAA.

Jawed
 
Well, since you obviously don't believe me on any part of the above post (can't blame you, since this isn't exactly the first time we disagreed on something), hopefully one or more people will be willing to argue with/against all this and side with one of us... I don't think there's any other way out of this now (besides pretending we never had this discussion), since we obviously both think we're right, heh! :)
 
Heh, I remember that being your response (or thereabouts) last time that Ars link was discussed around here back in May. The meat of the rest of the link was gutted too, IIRC.

OT some more, but this thread's gotten huge since I last looked about a week ago. Must delve in and see what's what :cool: A Mintmaster/Jawed/Arun 3-way is always fun to watch :p
 
Mintmaster, what you are essentially proposing is supersampling the shadowing signal but not the lighting signal. However, the advantages of the former are nearly always minimal (and performance might not be that nice compared to simply a better filter), while the latter can matter a fair bit. Specular, anyone?
Yeah, but the latter is a lot more expensive.

Supersampling the shadow isn't expensive in deferred shading, as I learned from that KillZone2 presentation. You divide your shadow map samples between the subsamples. But, like I said earlier, you can correct for this with per pixel normal, and you're back to a performance penalty.
 
I'm not talking about deferred shadowing, but deferred shading. Thanks, hopefully Arun will get it now.
Let me ask you again: What do you need per-sample Z for in deferred shading?
No, I have never described SSAA except in reply to SuperCow who raised the concept as a workaround for D3D10 - and I showed why it sux.
We understand that, but you are clearly valuing accurate per-sample Z. What for?

If you're not supersampling everything (since you told Arun that you're not), and you're not supersampling just the shadowing (since that's not what you're talking about), just what the heck are you supersampling that benefits from slightly different Z on MSAA pixels?

Or are you making a case for using MSAA while filling the G-buffer, but doing lighting/shadowing for each subsample (i.e. SSAA) when doing the compositing/lighting passes?
I trust Jack. See the post I linked earlier.

Jawed
Nothing Jack said in there supports your points. He said it's a big deal, but it's a stretch to think that statement supports the specific claims that you are making.

I personally think the in-shader multisample mask output is a much bigger deal, and Arun agrees there also.
 
Yes, this would be a supersampling pass. It'd be "fast" since you'd have no pixel shader code to execute.

The cons:
  • It would require that all the geometry be submitted twice - the second time to generate the G-buffer.
  • This second pass would be incapable of getting an "early-Z" benefit from the first pass, since the resolution of the first pass is supersampled.
  • Supersample positions don't line up with MSAA sample positions.
So the whole thing is a joke.

Jawed

Jawed you're only listing the "second pass" solution here - whereas the more obvious workaround would be to "simply" output depth into an additional render target in your G-Buffer. By doing so the three cons you list don't apply anymore and you are able to access each individual depth sample from this additional render target which means each sample will have its unique Z value for lighting. Thus you'd be able to achieve the same visual results in D3D10 and D3D10.1 (in the case where you re-use the MSAA depth buffer as a texture instead of writing it out into a color render target in D3D10).
So D3D10.1 saves you the additional cost of writing out Z data into a color render target since you can now access it directly via the MSAA depth buffer bound as a texture. Unfortunately this also means you cannot use the depth buffer during the lighting pass (as the API disallows a resource being bound as both depth buffer and shader resource simultaneously), which is a huge limitation since it prevents light volume Z optimizations. The workaround to this is to copy the MSAA depth buffer into another resource so that both can be used...
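To put a rough number on the extra cost of the D3D10 workaround: some back-of-the-envelope arithmetic (purely illustrative - the resolution, sample count and assumption of zero compression are all made up for the sake of the example):

```python
# Rough per-frame write cost of duplicating Z into an R32F MSAA
# render target under D3D10 (illustrative; ignores any compression).
width, height = 1280, 1024
samples = 4               # 4xMSAA
bytes_per_sample = 4      # R32F

extra_writes = width * height * samples * bytes_per_sample
print(extra_writes / 2**20)   # 20 MiB of extra writes per frame
```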
 
Jawed you're only listing the "second pass" solution here - whereas the more obvious workaround would be to "simply" output depth into an additional render target in your G-Buffer.
See the previous discussion between Arun, Jawed, and later on myself. He's arguing that it's not the same, and technically he's right.

Practically speaking, though, his point is rather moot.
 
See the previous discussion between Arun, Jawed, and later on myself. He's arguing that it's not the same, and technically he's right.
Outputting depth into an MSAA depth buffer or into an MSAA color render target can be made the same as long as the formats match (e.g. using a 32-bit floating-point depth buffer in D3D10 you can certainly calculate the Z-buffer-equivalent value and write it out to an R32F component). Once your Z data is in your render target you can access individual samples and therefore apply lighting calculations on each sample, since they have individual Z. Was there any other reason why it was suggested the two solutions wouldn't produce the same results?
 
Outputting depth into an MSAA depth buffer or into an MSAA color render target can be made the same as long as the formats match (e.g. using a 32-bit floating-point depth buffer in D3D10 you can certainly calculate the Z-buffer-equivalent value and write it out to an R32F component). Once your Z data is in your render target you can access individual samples and therefore apply lighting calculations on each sample, since they have individual Z. Was there any other reason why it was suggested the two solutions wouldn't produce the same results?
A multisampled depth buffer is actually supersampled as it always contains per-sample depth values. A multisample "color" buffer that is used for storing depth only contains one depth value per polygon per pixel, taken at the pixel center or centroid.
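A toy numeric illustration of that difference (the plane gradients and sample offsets below are arbitrary, not any real hardware pattern):

```python
# Toy model: a tilted triangle's depth plane z(x, y) = z0 + gx*x + gy*y.
# The MSAA depth buffer evaluates it at every sample position; writing
# "depth" into an MSAA color target stores one pixel-centre value,
# replicated to the covered samples.
gx, gy, z0 = 0.01, 0.02, 0.5   # arbitrary plane gradients (assumed values)
offsets = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]

def plane_z(x, y):
    return z0 + gx * x + gy * y

per_sample = [plane_z(x, y) for x, y in offsets]   # four distinct depths
replicated = [plane_z(0.0, 0.0)] * 4               # one depth, copied four times
print(per_sample, replicated)
```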
 
A multisampled depth buffer is actually supersampled as it always contains per-sample depth values. A multisample "color" buffer that is used for storing depth only contains one depth value per polygon per pixel, taken at the pixel center or centroid.

Very true. Although it seems to me that in the context of multisampling with Deferred Shading it may actually be more correct to use this single depth (evaluated at pixel center or centroid) for lighting calculations since it would be consistent with the other G-Buffer components that are also stored this way.
 
Let me ask you again: What do you need per-sample Z for in deferred shading?
If the G-buffer contains surface specularity factor, how are you supposed to calculate the surface's specular without the surface's Z?

If you use Z from another surface, then the specular calculation will be wrong.

Or are you making a case for using MSAA while filling the G-buffer, but doing lighting/shadowing for each subsample (i.e. SSAA) when doing the compositing/lighting passes?
Of course. Even in D3D10 the shading pass is a supersampled pass if you wrote G with MSAA switched on. DX9 prevents the programmer from accessing the MSAA samples of the G-buffer, so there's no opportunity.

Nothing Jack said in there supports your points. He said it's a big deal, but it's a stretch to think that statement supports the specific claims that you are making.
I'm sorry, but he couldn't be more explicit: "The depth/stencil readback for MSAA and deferred shading is a big deal [...]".

Jawed
 
If the G-buffer contains surface specularity factor, how are you supposed to calculate the surface's specular without the surface's Z?

If you use Z from another surface, then the specular calculation will be wrong.
Specularity is not sensitive to small changes in Z position. Have you ever seen highlights on a flat surface (i.e. uniform normal) that were only 1-2 pixels wide on the screen? Besides, you can correct for it like I told you by using per pixel normal.

Of course. Even in D3D10 the shading pass is a supersampled pass if you wrote G with MSAA switched on.
Only if you want there to be a massive hit from enabling AA and are dumb enough not to supersample the G-buffer for greatly reduced shader aliasing (specularity) with little additional perf hit. KZ2 does it this way because it's only doing 2xAA (Quincunx) and the devs haven't got around to eliminating unnecessary lighting calcs.

If you're trying to minimize the AA hit by using MSAA, you only do the shading pass on all subsamples if they're sufficiently different, and you have several choices in how to determine and separate those pixels.

You either supersample it all or just the pixels with different subsamples from MSAA. Your method has the quality of the latter with performance almost as bad as the former, i.e. the worst of both worlds.
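The selective scheme above can be caricatured in a few lines (pure illustration - a real renderer would do the classification on the GPU, e.g. with stencil marking, rather than anything like this):

```python
# Toy model of MSAA-aware deferred lighting: shade once per pixel when the
# G-buffer subsamples agree, and per sample (SSAA-like) only on edge pixels.
def shade(sample):                         # stand-in for the lighting equation
    return sample * 2.0

def light_pixel(subsamples):
    if len(set(subsamples)) == 1:          # interior pixel: one lighting eval
        return [shade(subsamples[0])] * len(subsamples)
    return [shade(s) for s in subsamples]  # edge pixel: per-sample lighting

interior = light_pixel([0.3, 0.3, 0.3, 0.3])   # one shading evaluation
edge     = light_pixel([0.3, 0.3, 0.7, 0.7])   # four shading evaluations
print(interior, edge)
```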
I'm sorry, but he couldn't be more explicit: "The depth/stencil readback for MSAA and deferred shading is a big deal [...]".
No need for you to repeat the exact thing that I quoted from him. Read my post again. How does that statement support your specific point that DX10.1 produces deferred shading quality that DX10.0 can't match due to supersampled Z?

Quite simply it doesn't. He could mean a lot of things, including access to the stencil buffer (e.g. for antialiased shadow volumes), or the performance gain.
 
Specularity is not sensitive to small changes in Z position.
Huh? We're talking about two arbitrary triangles here, e.g. one 1m from the camera and the other 10m away.

Only if you want there to be a massive hit from enabling AA and are dumb enough not to supersample the G-buffer for greatly reduced shader aliasing (specularity) with little additional perf hit.
MSAA during G-buffer creation will be significantly faster than supersampling.

No need for you to repeat the exact thing that I quoted from him. Read my post again. How does that statement support your specific point that DX10.1 produces deferred shading quality that DX10.0 can't match due to supersampled Z?
Supersampled-Z isn't available in D3D10 unless you supersample the entire G-buffer creation phase, which is radically slower than G-buffer creation with MSAA. MSAA deferred shading in D3D10.1 provides all the quality of supersampling (in fact more, because ordered-grid AA looks worse) and the bandwidth hit during G-buffer creation isn't crippling like it is with supersampling.

Quite simply it doesn't. He could mean a lot of things, including access to the stencil buffer (e.g. for antialiased shadow volumes), or the performance gain.
No, he just mentioned MSAA, deferred rendering and Z readback in the same breath because he likes throwing around buzzwords. You seem so determined to argue with me for some reason when the slide is there for you to read, making exactly the same point.

Jawed
 
Huh? We're talking about two arbitrary triangles here, e.g. one 1m from the camera and the other 10m away.
Two arbitrary triangles generate two different values even when you write Z into an MSAA render target. There is no advantage in supersampled Z here.

Your entire argument rests on proving the value of having slightly different Z for all samples belonging to the same polygon. You haven't done that yet.
MSAA during G-buffer creation will be significantly faster than supersampling.

Supersampled-Z isn't available in D3D10 unless you supersample the entire G-buffer creation phase, which is radically slower than G-buffer creation with MSAA. MSAA deferred shading in D3D10.1 provides all the quality of supersampling (in fact more, because ordered-grid AA looks worse) and the bandwidth hit during G-buffer creation isn't crippling like it is with supersampling.
You have any tests to prove that? We don't even know if MSAA textures use compression, and we don't know how fast we can fill them. When you're running the lighting shader on all the subsamples of a pixel, I seriously doubt that doubling the G-buffer creation time (best case on R600/RV670, in low-poly situations only) is going to make much of a dent on overall framerate. The primary justification used for deferred rendering is that G-buffer creation takes far less time than the lighting does. If A<<B, then percentage-wise 2A+4B is only a bit smaller than 4A+4B.
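The arithmetic in that last sentence, spelled out (the A and B values are made up; the only assumption is that lighting dominates):

```python
# If G-buffer fill time A is much smaller than lighting time B, doubling
# the fill cost barely moves the total frame time (numbers are illustrative).
A, B = 1.0, 10.0        # assumed: lighting is 10x the fill cost
msaa = 2 * A + 4 * B    # 2x-cost MSAA fill plus per-sample lighting
ssaa = 4 * A + 4 * B    # full 4x supersampled fill plus the same lighting
print(ssaa / msaa)      # ~1.05: under 5% apart
```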

MSAA does not offer the quality of supersampling. You don't have per sample normals (which greatly reduces specular aliasing), and rotated grid can be obtained by rendering rotated into the render target at the cost of a little wasted space. It won't precisely match the MSAA pattern, but it'll be much better than ordered grid.

Anyway, forget about that. Let's consider your scheme. You have yet to show why supersampled Z is superior to multisampled Z (via render target) for shading.
No, he just mentioned MSAA, deferred rendering and Z readback in the same breath because he likes throwing around buzzwords. You seem so determined to argue with me for some reason when the slide is there for you to read, making exactly the same point.
JHoxley and the slide say that Z/stencil subsample access is useful for deferred rendering. Neither Arun nor I am denying that, and in fact we've confirmed it.

You are saying DX10.1 enables deferred rendering to have quality unattainable without it. NOWHERE has ANYONE made that claim besides you.
 
The rest of your post is so ridiculous I'm simply not going to bother.
You are saying DX10.1 enables deferred rendering to have quality unattainable without it. NOWHERE has ANYONE made that claim besides you.
No, my point is that D3D10.1 provides better IQ with less performance hit. I suggest when you've calmed down, say in a few days, and come back to cover this stuff again you'll appreciate the value of depth readback - and I never said it was a revolution, merely that it polishes off the final bit of IQ in MSAA.

I mean, really, MSAA textures using compression. What a load of baloney - next you'll be saying that fp32 textures use compression. I'm truly shocked ... you know per-sample normal is another value in the G-buffer (for poly edges), don't you? "Rendering rotated at the cost of a little wasted space"? Yeah, an extra 100% for a 45 degree rotation (based on 1280x1024, which grows to 1630x1630), etc.
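For what it's worth, the 45-degree bounding-box arithmetic checks out (illustrative only):

```python
import math

# Axis-aligned bounding box of a 1280x1024 frame rotated by 45 degrees.
w, h = 1280, 1024
theta = math.radians(45)
bw = abs(w * math.cos(theta)) + abs(h * math.sin(theta))
bh = abs(w * math.sin(theta)) + abs(h * math.cos(theta))
print(round(bw), round(bh))   # roughly 1630 on each side
print(bw * bh / (w * h))      # just over 2x the pixels, i.e. ~100% extra
```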

Jawed
 
The rest of your post is so ridiculous I'm simply not going to bother.
:LOL:

What, proven wrong so now you're running away? You know your two triangle example is a load of crap, but won't acknowledge it. You still haven't shown why the difference between MSAA Z (i.e. hardware supersampled) and MSAA rendertarget Z gives better quality in shading. You haven't explained why anyone with half a brain would run the lighting shader on all subsamples if Z is the only thing that's slightly different.
and I never said it was a revolution, merely that it polishes off the final bit of IQ in MSAA.
And the latter part is all Arun and I have a problem with. DX10.1 does nothing visible for IQ except for subtle shadow correction, which you said isn't even what you're talking about.

The performance aspect is arguable too, as you completely ignored Arun's point there. Storing distance in a 32-bit rendertarget has far more precision than Z in the depth buffer, so trying to skip this in DX10.1 comes at an IQ cost.
I mean, really, MSAA textures using compression. What a load of baloney - next you'll be saying that fp32 textures use compression. I'm truly shocked ... you know per-sample normal is another value in the G-buffer (for poly edges), don't you? "Rendering rotated at the cost of a little wasted space"? Yeah, an extra 100% for a 45 degree rotation (based on 1280x1024, which grows to 1630x1630), etc.
:rolleyes: Who was the one that said "bandwidth hit during G-buffer creation isn't crippling like it is with supersampling"? How do you save on BW without color compression?

Know any deferred renderer that doesn't store the normal in the G-buffer? MSAA or SSAA, it's the same space. Only with the latter, though, do you get different values for each subsample. Normal not only changes far more than Z within a pixel, but a small change also affects lighting result far more.

You don't render at 45 degrees. That would be moronic. I'm suggesting the same thing the IHV's are:
http://download.nvidia.com/developer/presentations/GDC_2004/D3DTutorial_Sim.pdf (page 34, and there are several angles possible)
 
The performance aspect is arguable too, as you completely ignored Arun's point there. Storing distance in a 32-bit rendertarget has far more precision than Z in the depth buffer, so trying to skip this in DX10.1 comes at an IQ cost.
I agree with the rest, but I don't think this is true for an FP32 depth buffer.

:rolleyes: Who was the one that said "bandwidth hit during G-buffer creation isn't crippling like it is with supersampling"? How do you save on BW without color compression?
You write Z to the depth buffer anyway, so writing it to another render target as well is wasted bandwidth. However, as SuperCow already pointed out, you can't use that buffer simultaneously for depth testing and for the lighting calculations, so if you want to profit from depth testing when rendering the light extents you may need to create a copy of the buffer. Copying the buffer obviously has no overdraw, but it's a read and a write instead of just a write.
 
You write Z to the depth buffer anyway, so writing it to another render target as well is wasted bandwidth. However, as SuperCow already pointed out, you can't use that buffer simultaneously for depth testing and for the lighting calculations, so if you want to profit from depth testing when rendering the light extents you may need to create a copy of the buffer. Copying the buffer obviously has no overdraw, but it's a read and a write instead of just a write.

Let us think of a “smart driver”

After we have finished writing to the depth buffer we will issue a copy command to the driver. Then we disable depth writes and go ahead rendering our objects, using the depth values from the copy.

A “smart driver” can do the following:

Instead of doing the copy, the driver can alias the target resource to the original depth buffer. As long as we don't re-enable depth writes this is perfectly valid, as the depth buffer cannot change. If we reactivate depth writes and still use the "copy" depth texture in a following draw call, the real copy finally needs to be done. But otherwise a full read and write can be saved.
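The lazy-copy idea can be sketched as copy-on-write in miniature (a toy model only, not actual driver behaviour):

```python
# Toy copy-on-write model of the "smart driver" trick: the copy command
# only creates an alias; the real copy happens on the first depth write
# that follows while the alias is still in use.
class DepthBuffer:
    def __init__(self, data):
        self.data = data
        self.alias = None                      # outstanding aliased "copy"

    def copy(self):
        self.alias = DepthBuffer(self.data)    # shares storage, nothing copied
        return self.alias

    def write(self, i, z):
        if self.alias is not None and self.alias.data is self.data:
            self.alias.data = list(self.data)  # deferred real copy happens now
        self.data[i] = z

buf = DepthBuffer([0.5, 0.6])
snap = buf.copy()             # cheap: no read/write traffic yet
buf.write(0, 0.1)             # re-enabled depth write forces the real copy
print(buf.data, snap.data)    # [0.1, 0.6] [0.5, 0.6]
```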
 
:LOL:

What, proven wrong so now you're running away? You know your two triangle example is a load of crap, but won't acknowledge it.
Actually it's the use case for MSAA Z readback but until the penny drops I see no point in pursuing this with you further. If your lighting equation is dependent on scene Z then you want the frequency of Z to match the sample frequency. For the best image quality on edges, each triangle that falls within a pixel must have a 1:1 relationship between Z and albedo, normal, specularity etc. Lower resolution Z creates an edge artefact for those aspects of lighting that are dependent on Z.

You still haven't shown why the difference between MSAA Z (i.e. hardware supersampled) and MSAA rendertarget Z gives better quality in shading.
I think you need to use more precise language because that's just a blur.

You haven't explained why anyone with half a brain would run the lighting shader on all subsamples if Z is the only thing that's slightly different.
If your lighting pass is per sample anyway it's detecting whether the samples in a pixel are identical (non-edge pixel) or that they're different (edge pixel). Whether you decide to attach any meaning to a pixel where all the samples are the same but Z varies (Z will always vary unless the triangle is square to the camera) is up to you - but my interest here is in enhanced triangle edge quality in deferred shading.

And the latter part is all Arun and I have a problem with. DX10.1 does nothing visible for IQ except for subtle shadow correction, which you said isn't even what you're talking about.
When triangles from two different objects share a pixel, a shared Z is meaningless. In conventional MSAA resolve this is irrelevant - but deferred rendering's shading pass must have Z in order to obtain any meaning from the G-buffer. If the shading pass reads the G-buffer at 2560x2048 (4xMSAA on 1280x1024) but reads Z at 1280x1024 then you will get subtle rendering errors where triangles from two different meshes meet within a pixel. They're subtle errors but they're there nonetheless.
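A numeric caricature of that edge case (the attenuation function and distances are made up for illustration):

```python
# Pixel straddling a wall 1m away (samples 0-1) and a wall 10m away
# (samples 2-3), with a made-up distance-dependent lighting term.
def attenuation(z):
    return 1.0 / (z * z)

per_sample_z = [1.0, 1.0, 10.0, 10.0]
per_sample_lit = [attenuation(z) for z in per_sample_z]  # far samples dim
pixel_z = per_sample_z[0]                  # single 1280x1024-rate Z read
pixel_rate_lit = [attenuation(pixel_z)] * 4    # far samples lit as if 1m away
print(per_sample_lit, pixel_rate_lit)
```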

The performance aspect is arguable too, as you completely ignored Arun's point there.
G-buffer creation is bandwidth bound - MSAA'd creation, particularly in making use of the GPU's compression features which are there to save bandwidth, is a big win. Any kind of supersampled G-buffer creation is just a crushing waste of bandwidth because none of the bandwidth savings that come from using MSAA are available.

Storing distance in a 32-bit rendertarget has far more precision than Z in the depth buffer, so trying to skip this in DX10.1 comes at an IQ cost.
32-bit Z to the rescue.

:rolleyes: Who was the one that said "bandwidth hit during G-buffer creation isn't crippling like it is with supersampling"? How do you save on BW without color compression?
The saving is during creation, not on read-back.

Know any deferred renderer that doesn't store the normal in the G-buffer? MSAA or SSAA, it's the same space. Only with the latter, though, do you get different values for each subsample. Normal not only changes far more than Z within a pixel, but a small change also affects lighting result far more.
Two overlapping triangles 10m apart have quite different Z values.

You don't render at 45 degrees. That would be moronic.
If you render at significantly less of an angle you get the cost of 4xSS at the IQ of ~2xSS. Great.

Jawed
 