What does DX10.1 add vs. DX10 that makes such a big difference? [deferred shading]

Jawed

Mod Edit: This discussion was plucked from the RV670 speculation thread with an eye toward easier access.

What does DX10.1 add vs. DX10 that makes such a big difference? I was under the impression DX10 had everything necessary for MSAA with deferred shading. Sure, DX10.1 makes it slightly more convenient, but I hadn't noticed anything that makes a possible/impossible kind of difference, or even a significant performance difference.
Deferred rendering algorithms would prefer to be able to read the depth of each sample, as far as I can tell. D3D10 doesn't give the developer access to depth, just colour.

The fallback in D3D10 is to use a second render target and write Z there.

So, D3D10.1 should make this much faster.

Jawed
 
Deferred rendering algorithms would prefer to be able to read the depth of each sample, as far as I can tell. D3D10 doesn't give the developer access to depth, just colour.
While that is all very nice in theory, all current DX9/DX10 games that use deferred rendering or shadowing write Z or position separately anyway, because the Z-Buffer isn't precise enough.

Even a FP32 depth buffer (introduced in DX10) would be slightly lower precision than just writing Z separately for a multitude of reasons. However, I'll admit it's probably enough in most cases, but I'd love for someone with personal experience to chime in here.

Regarding STALKER specifically, if the GPU Gems 2 chapter on the subject is not outdated, they're actually writing XYZ position in *FP16*, rather than FP32 Z. One reason for that is it's cheaper in terms of ALU computations (and they made that design decision in the NV30/R300 era, most likely...) and it allows them to implement a very neat shadowing 'Virtual Position' trick.
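The precision trade-off of storing position in FP16 can be illustrated with a quick sketch (purely illustrative, nothing to do with STALKER's actual code). Python's `struct` module supports the IEEE 754 half-precision format via the `'e'` format character, so we can round-trip a view-space coordinate through FP16 and watch the quantization error grow with distance:

```python
import struct

def quantize_fp16(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Quantization error of storing a view-space coordinate in FP16,
# at a few representative distances (arbitrary world units).
for z in (1.0, 10.0, 100.0, 1000.0):
    v = z * 1.0001
    err = abs(quantize_fp16(v) - v)
    print(f"z ~ {z:7.1f}: FP16 round-trip error ~ {err:.5f}")
```

The absolute error scales with the magnitude of the stored value, which is why FP16 XYZ position is only workable when the designers accept that far-field precision loss (or keep the stored range small).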

So, for a DX10 STALKER, this wouldn't be an advantage at all. However, for UE3, I actually think it's *probably* usable. However, here comes another trade-off: what if Epic is still using an INT24 depth buffer in DX10? Then using FP32 there would increase bandwidth requirements for not only depth writing (still less than writing FP32 Z separately), but also depth testing.

It's still an interesting feature though, and I'm sure it'll result in nice performance gains for a game or another down the road. It won't result in mind-blowing differences though, and it wouldn't even be useful for STALKER, for example. But it's certainly a nice feature to have, and I doubt many are going to complain about ATI trying to get it implemented in games! :)
 
http://download.microsoft.com/downl...elopment Drilldown - Direct3D 10 and 10.1.zip

[attached slides: b3da003.gif through b3da006.gif]

The peculiar thing there is multifrequency shading. I've seen ATI patents that appear to be on this subject and they're pretty interesting - the idea being you can vary the sampling density across the image.

I presume being able to create compressed textures and to write to RGBE textures is important, but I dunno how they'll be used.

Jawed
 
I'm not convinced being able to write bitwise to compressed/RGBE textures is very useful (it's just a way to simplify compression for dynamic create-once-use-many-times assets) but multifrequency shading looks VERY interesting indeed! :)

If that works as I'd expect it to, certainly the 10.1 feature with the most potential IMO.
 
It's still an interesting feature though, and I'm sure it'll result in nice performance gains for a game or another down the road.
How do deferred rendering algorithms currently deal with per-sample Z for correctly rendered MSAA?

Jawed
 
For DX10, in exactly the way you described previously: they write depth separately as an FP32 value. For a 'fully deferred' engine ala STALKER, writing XYZ position directly is also an option. DX10.0 has no problem reading that back per sample, since it's "colour" data and not depth per se.

P.S.: And the API giving you MSAA sample positions is a gimmick, it's easy as hell to get those manually. Being able to choose them might come in handy, however, but that's not specifically useful for deferred renderers as far as I can tell. Per-MRT blend modes might come in handy in some cases, but it wouldn't be in STALKER/UE3.
 
So how does a deferred engine generate 4 distinct Zs per pixel, when doing 4xMSAA?
It doesn't, and why should it? It will generate as many Zs as there are *unique* samples.

However, the more I think about it, the more important I think multifrequency shading is. It's probably the most important feature in 10.1 overall, and the most important one for deferred renderers certainly. At least if it works as I presume it does, which would allow you to easily only shade 'unique' samples without ugly hacks or brute force.

EDIT: Not that being able to readback the MSAA depth samples isn't very useful in some cases too, such as screen-space ambient occlusion ala Crysis (are we sure they aren't doing that per-pixel anyway though? I mean, they are blurring the occlusion anyway, so would it really matter?), of course. My previous posts were specifically regarding deferred rendering.
 
It doesn't, and why should it?
Because without per-sample Z the deferred renderer produces subtle errors at edges, when it assumes that all samples in a pixel have the same Z.

It will generate as many Zs as there are *unique* samples.
And under D3D10 the deferred render is unable to use that data. Only D3D10.1 provides access to multiple Zs per pixel. D3D10 is forced to assume that all samples have the same Z.

However, the more I think about it, the more important I think multifrequency shading is. It's probably the most important feature in 10.1 overall, and the most important one for deferred renderers certainly. At least if it works as I presume it does, which would allow you to easily only shade 'unique' samples without ugly hacks or brute force.
Prolly worth a thread but there seems to be less than scant info out there on this.

Jawed
 
Because without per-sample Z the deferred renderer produces subtle errors at edges, when it assumes that all samples in a pixel have the same Z.
No it wouldn't, and no it doesn't. Sigh...

If all samples are covered by the same triangle, all Zs will be the same but you don't need distinct Zs, because this is multisampling, not supersampling: you only need to shade each pixel ONCE if all the samples are identical.

Prolly worth a thread but there seems to be less than scant info out there on this.
Yup, it probably is, if only to tempt some of those who might know to come forward... :) This would be a fairly huge benefit to quite a few algorithms if it allows you to do per-unique-sample shading one way or another.
 
Since you still don't seem to understand this, here is a relatively naive way to implement deferred shadowing:
Code:
apply shading program to sample 0
if (samples are not all identical)
{
    apply shading program to all other samples
    average the per-sample results
}
Obviously, this is sub-optimal, because dynamic branching granularity isn't good enough (and even if it was, it wouldn't handle less than the maximum number of unique samples optimally).

However, that's not the point. The point is that in the case of deferred shadowing for example, you don't need to do it 4 times per pixel for 4x MSAA unless all the samples are unique. There's no real point in doing that: why would you apply lighting once per pixel and shadowing once per sample?

There are many hacks you can use even in 10.0 to optimize things, though. So, you have your 4x MSAA buffers and your final 1-sample/pixel framebuffer. You could go do fullscreen passes that write stencil in the final framebuffer, but only if all the samples are identical. Then, use early stencil rejection to shade all those pixels once in another fullscreen pass, and then do yet another fullscreen pass to shade the pixels with different samples.
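The classification idea above can be sketched on the CPU (illustrative only, not actual D3D10 code; the stencil marking and fullscreen passes are replaced by a simple branch): pixels whose MSAA samples are all identical get one shading evaluation, and only the "edge" pixels are shaded per sample and then resolved.

```python
# Illustrative CPU-side sketch of the stencil-classification idea:
# shade "interior" pixels (all MSAA samples identical) once, and
# only "edge" pixels per sample.

def resolve_deferred(gbuffer, shade):
    """gbuffer: list of pixels, each a list of per-sample G-buffer values.
    shade: function mapping one G-buffer sample to a final colour."""
    out = []
    for samples in gbuffer:
        if all(s == samples[0] for s in samples):
            # interior pixel: one shading evaluation covers all samples
            out.append(shade(samples[0]))
        else:
            # edge pixel: shade each sample, then box-filter (resolve)
            out.append(sum(shade(s) for s in samples) / len(samples))
    return out

# 4x MSAA: first pixel is interior, second straddles a triangle edge
gbuffer = [[0.25] * 4, [0.25, 0.25, 0.75, 0.75]]
colours = resolve_deferred(gbuffer, shade=lambda albedo: albedo * 0.5)
print(colours)  # [0.125, 0.25]
```

On real hardware the win depends entirely on how cheaply the branch can be expressed, which is exactly the early-stencil-rejection question raised above.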

I haven't tested that myself on DX10 hardware though. It shouldn't be a problem at all on R6xx, because the early-stencil rejection hardware is amazing, fast and works exactly as expected AFAIK. On G80, however, early stencil rejection only happens in specific cases (it's geared towards Doom 3...) and while I'm sure this should be possible to implement, it might be much harder and less extensible.

It might also be possible to just write Z (int16 buffer to minimize the cost?) in the pixel shader and then use that for early rejection. I think this should work because Z is going to be checked before shading anyway, even if the tile has Z compression disabled (which would likely be the case if you wrote its depth manually in a previous pass!)

Hopefully this should make it clear I'm not just smoking crack when I say you don't need unique Zs...
 
Hopefully this should make it clear I'm not just smoking crack when I say you don't need unique Zs...
In a D3D10 deferred renderer an MSAA'd G-buffer is separate from Z with Z shared per pixel, because Z at the sample level is not available in the pixel shader, nor in the post-resolve view of the G-buffers (i.e. when the G-buffer is consumed by the lighting pass).

D3D10.1 allows the lighting pass to identify the Z for each sample. This allows it to correlate albedo/normal/specularity etc. with the depth of that sample. So, for example, you can calculate the speculars for the two triangles within a pixel correctly, then average those results to produce the AA-resolved pixel. So the specular for 3 samples that lie on one triangle could be quite a different value from the specular on the second triangle, which has just one sample within the pixel. e.g. the first triangle could produce no specular, but the second has an intense specular value.

Under D3D10 the lighting pass cannot discern Z for each sample, therefore it has to assume that Z is shared by all samples. This results in slightly incorrect polygon-edge rendering. In this example, the first triangle's Z may take priority - causing the specular on the second triangle's single sample to be entirely lost. The result being a subtly incorrect MSAA'd pixel.
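The specular example can be put into numbers with a toy sketch (hypothetical shading function, not any real engine code). Three of a pixel's four samples land on a far triangle with no highlight; the fourth lands on a near triangle that catches an intense one:

```python
# Toy numeric sketch of the specular-at-edges argument.

def specular(z):
    # hypothetical: only the near surface (small z) catches the highlight
    return 1.0 if z < 5.0 else 0.0

sample_z = [10.0, 10.0, 10.0, 2.0]   # per-sample depths (the D3D10.1 view)

# D3D10.1: shade each sample with its own Z, then resolve (average)
correct = sum(specular(z) for z in sample_z) / len(sample_z)

# D3D10: one shared Z per pixel (say, the majority triangle's Z)
shared = specular(10.0)

print(correct, shared)  # 0.25 vs 0.0: the lone sample's highlight is lost
```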

Now, of course, if you want to go round the houses to kludge a solution to this, then feel welcome. The point is that D3D10.1 simplifies-away all this kludging.

Jawed
 
Under D3D10 the lighting pass cannot discern Z for each sample, therefore it has to assume that Z is shared by all samples.
What the heck are you talking about again? Here are the capabilities of each API:
- DX9: Fetch resolved C/Z.
- DX10.0: Fetch C samples and resolved Z.
- DX10.1: Fetch C/Z samples.

Thus, for deferred shadowing...
- DX9: You need to write a separate Z (INT24 is not precise enough). MSAA is not possible except via smart hacks (hello Heavenly Sword!) and even then it's not perfect.
- DX10.0: You can have an INT24 depth buffer and write a separate FP32 Z buffer (as a colour buffer). You can readback individual samples of that FP32 buffer, and there will be as many unique samples as there are visible triangles for that pixel. You must manually compute the number of unique samples by considering samples with identical values to be non-unique.
- DX10.1: You can have a FP32 depth buffer and readback individual samples from it. There will be as many unique samples as your MSAA level, but it is still possible to recognize unique samples based on the color buffer you need to readback too anyway.

DX10.0 allows you to have the *EXACT* same final results with deferred shadowing as a non-deferred renderer would have. The only thing DX10.1 buys you is higher performance, and as I said this is *partially* compensated by forcing you to use a FP32 depth buffer when an INT24 depth buffer might be deemed enough by the developer otherwise. DX10.1 also allows you to have negligibly *higher* image quality at an extra performance cost.
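The "manually compute the number of unique samples" step above can be sketched like this (illustrative only; a real shader can't use a hash map, but the dedup-and-weight logic is the same): shade each unique sample value once, weighted by how many of the pixel's MSAA samples it covers.

```python
# Sketch of per-unique-sample shading from a readback of the
# per-sample FP32 Z colour buffer.

from collections import Counter

def shade_unique(samples, shade):
    """Shade each *unique* sample once and weight it by how many of
    the pixel's MSAA samples share that value."""
    counts = Counter(samples)
    n = len(samples)
    return sum(shade(value) * count / n for value, count in counts.items())

# 4x MSAA pixel covered by two triangles: 3 samples at z=0.6, 1 at z=0.2
result = shade_unique([0.6, 0.6, 0.6, 0.2], shade=lambda z: 1.0 - z)
print(result)  # two shading evaluations instead of four
```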

Of course, being able to readback the real depth buffer samples can have other uses, but those are mostly performance-related, and don't make some new fundamentally new algorithms possible.
 
Under D3D10 the lighting pass cannot discern Z for each sample, therefore it has to assume that Z is shared by all samples.
But in D3D10 you'd have to implement a special code path to output per-sample Z anyway (typically by outputting Z into a color render target), so this means you can access the stored Z value for each sample from this color RT. It seems to me the case you speak of would correspond to the (unlikely) scenario whereby the scene is re-rendered into a non-MSAA depth buffer which can then be bound as a texture? (and why would you re-render the scene then?)
 
But in D3D10 you'd have to implement a special code path to output per-sample Z anyway (typically by outputting Z into a color render target), so this means you can access the stored Z value for each sample from this color RT.
Yes, this would be a supersampling pass. It'd be "fast" since you'd have no pixel shader code to execute.

The cons:
  • It would require that all the geometry be submitted twice - the second time to generate the G-buffer.
  • This second pass would be incapable of getting an "early-Z" benefit from the first pass, since the resolution of the first pass is supersampled.
  • Supersample positions don't line up with MSAA sample positions.
So the whole thing is a joke.

Jawed
 
No it wouldn't, and no it doesn't. Sigh...

If all samples are covered by the same triangle, all Zs will be the same but you don't need distinct Zs, because this is multisampling, not supersampling: you only need to shade each pixel ONCE if all the samples are identical.
Arun, Jawed is right here. He may be greatly overstating its significance, but he's right.

Imagine if you had a function that took screenspace XYZ position and could exactly determine whether it's in shadow or not. A shadow edge drawn using slightly different Z in each subsample will be antialiased. An edge drawn using samples with the same Z for each pixel will either all pass or all fail the shadow test, and will thus be aliased. (Yes, you can take the XY position into account for each subsample, but even then you only get correct visibility results if you're lucky, like if the light is behind you).

In reality, though, there is no exact shadow test. The jitter of your samples into the shadow map (along with the shadow map resolution itself) usually has a much bigger impact than the differences in subsample Z. You could store a slope factor per pixel to eliminate this problem if you really wanted to. So again, we're back to DX10.1 being just a performance advantage w.r.t. deferred rendering.
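Mintmaster's shadow-edge point reduces to a small numeric sketch (toy shadow test, entirely hypothetical): with per-sample Z, a pixel straddling a shadow boundary resolves to a partial coverage value; with one shared Z, all four samples pass or fail together.

```python
# Toy sketch: per-sample Z antialiases a shadow edge; shared Z aliases it.

def in_shadow(z, boundary=0.5):
    # toy test: everything behind the boundary depth is shadowed
    return z > boundary

# per-sample depths straddling the boundary (exact binary fractions)
sample_z = [0.375, 0.4375, 0.5625, 0.625]

# per-sample Z: shadow-test each sample, then resolve
per_sample = sum(in_shadow(z) for z in sample_z) / len(sample_z)

# shared Z: one test decides the whole pixel
shared = float(in_shadow(sum(sample_z) / len(sample_z)))

print(per_sample, shared)  # 0.5 (antialiased) vs 0.0 (all-or-nothing)
```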
DX10.0 allows you to have the *EXACT* same final results with deferred shadowing as a non-deferred renderer would have.
This I will agree with.

Now, Jawed:
I'm not talking about deferred shadowing.

Lighting/shading.

Jawed
How does lighting/shading depend on per sample Z?
 
Okay, I wrote three replies to this already, scrapped one and lost two through bad luck (mostly my connection screwing up) - anyway, I'll try not to make this too long.

Jawed, you're defending SSAA against MSAA, not 10.1 against 10.0. Yes, SSAA has advantages over MSAA. But MSAA's disadvantages are just as true for shading as for shadowing. None of your arguments apply only to what you are arguing for.

Mintmaster, what you are essentially proposing is supersampling the shadowing signal but not the lighting signal. However, the advantages of the former are nearly always minimal (and performance might not be that nice compared to simply a better filter), while the latter can matter a fair bit. Specular, anyone?

And finally, Mintmaster's last point was (AFAICT) that you don't need per-sample Z for shading, unless you also plan to supersample that. And then, err, what's left? You're literally supersampling everything, and if what you wanted was SSAA for deferred shading, you could already do that in DX9.

You claim I moved the discussion to deferred shadowing, while you were thinking of deferred shading. However, my points for shadowing apply nearly just as well to shading. On the other hand, you suddenly switched the discussion to the advantages of SSAA over MSAA, perhaps without even realizing that's what you are really saying. So, errr!

As for MSAA Z readback being worthwhile, yes, but not to the extent you think it is, nor for the reasons you think it is. And mostly not for the things you're probably thinking of either. I'm sorry if this post and the previous ones might seem rude, but I'm sure as hell not going to say you're right unless I think you are... And I'm not going to say I'm not sure if I actually am, either.
 