Downplaying of DX10.1

From my point of view, DX10.1 has lots of really useful features. Our new renderer is fully deferred, and DX10.1 seems to have focused on addressing the shortcomings of that approach.

Subsample reading allows me to finally do proper antialiasing. Currently we need to either do an edge-detect blur filter in a post process pass, or render everything at 2x1 (or 2x2) resolution and downsample the result in the combine pass (or both for best quality). Subsample reading both improves the AA quality and speeds up the rendering, and it helps shadow map rendering as well. This is hands down the biggest feature in DX10.1.

Separate blend modes for MRTs are also useful when rendering complex volumetric effects to deferred buffers (you need separate blending modes for particle normals, volume density accumulation and other parameters). But it's not that important for basic rendering. I can easily live without it (and we'll most likely spend our resources on something else instead).

It's also good that they finally properly support Gather4, instead of the current Fetch4 hack implementation. We are using Fetch4 in lots of our post process filters, so it's a good thing to have it on the DX10.1 feature list. Soon all the Fetch4 optimizations will be usable on GeForces as well (or so I'm hoping, at least).
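Here's roughly what that looks like in SM4.1 (names made up for illustration): one Gather call returns the four texels of the bilinear footprint of a single-channel map, which is exactly the pattern we currently lean on Fetch4 for:

Texture2D<float> depthMap;   // any single-channel map, e.g. depth or distance
SamplerState pointSampler;

float4 GatherFootprint(float2 uv)
{
    // Returns the four texels that bilinear filtering would read at uv,
    // in one instruction (SM4.1 / D3D10.1).
    return depthMap.Gather(pointSampler, uv);
}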

For me, cube maps in texture arrays are just a small and natural incremental addition to texture array operation. They can be used to improve ambient lighting quality, but storing a huge amount of cube maps is going to consume a lot of memory, and is not going to be feasible in most real games (compared to a small one-room ATI tech demo). It's a fun feature to play around with in the future, but certainly not something that resolves the whole realtime global illumination problem, like the PR and marketing departments like to say :)
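For reference, sampling one of those array cube maps is a one-liner in SM4.1 (names here are just illustrative); the w component of the location vector selects the cube:

TextureCubeArray ambientProbes;   // illustrative probe array
SamplerState linearSampler;

float3 SampleProbe(float3 dir, float probeIndex)
{
    // xyz is the lookup direction, w selects the array slice
    return ambientProbes.Sample(linearSampler, float4(dir, probeIndex)).rgb;
}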
 
Well, you get the performance hit of alpha to coverage, perhaps even a bit more, because you have to generate a bit mask instead of letting the hardware generate one from the alpha value for you.
Right, but I don't think TrAA does that. It supersamples. I may be using the wrong terminology, but I'm trying to refer to the artifact-free, bulletproof transparency antialiasing.

Alpha to coverage doesn't always work as a replacement for alpha tests, particularly for clever rounded boundaries using low-resolution textures, or during magnification of textures.

One of the other reasons I really liked this coverage control is the idea of being able to apply it to techniques like this:
http://fabio.policarpo.nom.br/docs/Curved_ReliefMapping.pdf
There are flaws in the technique that the author doesn't mention (which I uncovered when independently "inventing" a similar technique just before this paper came out :mad: ), but I think there is some promise.
Although this at least gives you a choice between dithered and non-dithered alpha to coverage.
That's what I was referring to, mostly. For alpha testing it's better to properly determine coverage with additional texture samples. Also, when using alpha to coverage for hacky order-independent transparency, the dithering is rather ugly, so some shader control could probably ameliorate that. Combining alpha to coverage and alpha blending could be a nice compromise in certain situations, too.
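As a rough sketch of the kind of shader control I mean (assuming 4x MSAA; names are made up), in D3D10.1 the pixel shader can output its own sample mask through SV_Coverage instead of taking the fixed-function dither:

Texture2D diffuseMap;
SamplerState linearSampler;

struct PSOut
{
    float4 color : SV_Target;
    uint   mask  : SV_Coverage;   // shader-written sample mask (SM4.1)
};

PSOut main(float4 pos : SV_Position, float2 uv : TEXCOORD0)
{
    PSOut o;
    o.color = diffuseMap.Sample(linearSampler, uv);

    // Convert alpha into a number of covered samples (4x MSAA assumed)
    // and emit a contiguous mask rather than a dithered pattern.
    uint covered = (uint)round(saturate(o.color.a) * 4.0);
    o.mask = (1u << covered) - 1u;   // 0x0, 0x1, 0x3, 0x7 or 0xF
    return o;
}

Whether a contiguous mask actually looks better than the hardware's dither pattern would need testing, of course.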
It's only free if hardware resources are idle in the non-AA case.
True. I was thinking more about VSM or other techniques where you write custom values instead of just Z. Though even there subsample access probably won't give anything better looking than resolving first, so I guess I overstated the benefit.

Could you explain that in more detail? Where does the sample mask come in?
Nevermind, I had a bit of a brainfart there. :oops:
 
Subsample reading allows me to finally do proper antialiasing. Currently we need to either do an edge-detect blur filter in a post process pass, or render everything at 2x1 (or 2x2) resolution and downsample the result in the combine pass (or both for best quality). Subsample reading both improves the AA quality and speeds up the rendering, and it helps shadow map rendering as well. This is hands down the biggest feature in DX10.1.
What was holding you back from simply adding Z (actually, distance) as another rendertarget in the G-buffer creation? You can read subsamples of that in DX10.0, right?
 
Right, but I don't think TrAA does that. It supersamples. I may be using the wrong terminology, but I'm trying to refer to the artifact-free, bulletproof transparency antialiasing.
Ok, "TrAA" isn't really specific. :)

However, transparency supersampling isn't artifact-free or bulletproof, as it's still just a threshold test per sample. At some point in the mipmap pyramid the texels will all fall on one side of the threshold, so in the distance you either get an opaque surface or none at all. What you really want is blending without the sorting problem, and alpha to coverage does a pretty good job of that. The magnification problem remains, but Humus' transparency AA demo shows that there's a solution for that, too.
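The trick, roughly in the spirit of Humus' demo (these names are illustrative, not his actual code), is to sharpen alpha around the threshold by its screen-space derivative, so the transition stays about one pixel wide under magnification:

Texture2D foliageMap;
SamplerState linearSampler;

float4 main(float2 uv : TEXCOORD0) : SV_Target
{
    float4 c = foliageMap.Sample(linearSampler, uv);

    // Rescale alpha around the 0.5 threshold; fwidth keeps the edge
    // roughly one pixel wide regardless of how magnified the texels are.
    c.a = saturate((c.a - 0.5) / max(fwidth(c.a), 1e-4) + 0.5);
    return c;
}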

One of the other reasons I really liked this coverage control is the idea of being able to apply it to techniques like this:
http://fabio.policarpo.nom.br/docs/Curved_ReliefMapping.pdf
There are flaws in the technique that the author doesn't mention (which I uncovered when independently "inventing" a similar technique just before this paper came out :mad: ), but I think there is some promise.
Yes, enabling AA in silhouette-modifying shaders should be one of the main use cases for a shader controllable sample mask.

That's what I was referring to, mostly. For alpha testing it's better to properly determine coverage with additional texture samples. Also, when using alpha to coverage for hacky order-independent transparency, the dithering is rather ugly, so some shader control could probably ameliorate that.
The dithering should actually work out quite nicely if you apply a filter like ATI's "edge detect" downsampling. Since the samples are different all those dithered areas should be filtered as edges, turning them into really smooth gradients. Though I've yet to see this in practice.
 
What was holding you back from simply adding Z (actually, distance) as another rendertarget in the G-buffer creation? You can read subsamples of that in DX10.0, right?
Honestly, I'd suspect colour-sample readback is (or was, at least) so badly documented in 10.0 that many devs don't even realize it's fully supported...
 
Honestly, I'd suspect colour-sample readback is (or was, at least) so badly documented in 10.0 that many devs don't even realize it's fully supported...
Yeah, that one really irritated me. Admittedly I didn't spend long on it, but there were a bunch of SM4 instructions that I couldn't get to compile due to the documentation and I'm also lucky enough to have the original documentation (functional spec)...

Cheers :cool:
 
Yeah, that one really irritated me. Admittedly I didn't spend long on it, but there were a bunch of SM4 instructions that I couldn't get to compile due to the documentation and I'm also lucky enough to have the original documentation (functional spec)...

Cheers :cool:

Don't mention it.
 
What was holding you back from simply adding Z (actually, distance) as another rendertarget in the G-buffer creation? You can read subsamples of that in DX10.0, right?

I actually have an r32f distance buffer already, as I need it on hardware with no depth-stencil sampling support. I could easily make the color distance buffer the default path on all hardware. However, I was under the assumption that SM4.0 does not properly support reading multiple samples separately, as the DirectX 10 documentation was very minimalistic in this area, and there were no examples either. The texture object's (unfiltered) Load method is the only place where this feature is mentioned at all, and the documentation doesn't clearly state how it should be used to get the samples separately. I can give the sample amount as a parameter, but the return value is only a single float4 vector, and I don't want it to resolve the samples for me. Being able to resolve a multisample buffer's pixels is not enough to implement proper antialiasing in a deferred shader (all samples blended together is no good when the buffer contains something other than color data).

However, I have most likely overlooked some minor detail, as I haven't spent that much time on DirectX 10 yet. DirectX 9 is still the top priority platform for us.
 
sebbi: As I said, DX10.0 supports reading the individual samples; Call of Juarez actually uses that to do manual downsampling (to properly handle HDR). However, it's awfully documented, and I'll admit I'm not sure how to activate it myself.
 
First you need to create a shader resource view of the type D3D10_SRV_DIMENSION_TEXTURE2DMS or D3D10_SRV_DIMENSION_TEXTURE2DMSARRAY.

In your shader you need a Texture2DMS object.
Texture2DMS<float4, SAMPLES> tsampler;

Then you can read all the samples with the Load method:
for (int i = 0; i < SAMPLES; i++)
{
    // coord is the texel position as an int2; i is the sample index
    float4 v = tsampler.Load(coord, i);
    // ... combine the samples here, e.g. average them for a custom resolve
}

SAMPLES needs to be a define that contains the number of samples in your buffer, so you need to compile this shader once for every AA level you support.
 
float4 v = tsampler.Load(coord, i);

The SDK documentation states that the texture object's Load() method takes the "number of samples" as a parameter, not the sample index. So it must be a typo in the SDK documentation then. Too bad this is the only line of documentation written about the subject in the whole SDK :)

Thanks for the help! I'll experiment with the feature next week. It seems that I can implement AA for the lighting passes this way. The light pass output doesn't have any subpixel info, so after that point I can't really use the subsamples anymore. But that's most likely not needed, as most post process filters (depth of field, motion blur and blooming) are all heavily blur based, and don't need to be done at subpixel accuracy. Screen-space (fake) ambient occlusion is done before the lighting pass (I need the ambient multiplier for the light passes). It most likely needs to be done at the subpixel level (to prevent edge artifacts), and might be a problem since I cannot write directly to subpixels (but I can use the rgba channels of the rendertarget to store 4 subsamples, as the ambient multiplier is just a scalar; see the sketch below).
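Something like this is what I have in mind for the packing (the buffer name and ComputeAO are just placeholders, not our actual code):

Texture2DMS<float, 4> distanceBuffer;   // 4x MSAA r32f distance buffer

float ComputeAO(int2 coord, float dist)
{
    // Placeholder for the real screen-space AO estimate, which would
    // sample neighboring distances around coord here.
    return 1.0;
}

float4 main(float4 pos : SV_Position) : SV_Target
{
    int2 coord = int2(pos.xy);
    float4 ao;

    // One AO value per subsample, packed into the rgba channels of a
    // regular (non-multisampled) rendertarget.
    [unroll]
    for (int i = 0; i < 4; i++)
    {
        float dist = distanceBuffer.Load(coord, i);
        ao[i] = ComputeAO(coord, dist);
    }
    return ao;
}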
 