Curious why no hybrid MSAA/SSAA?

OpenGL guy said:
There's a difference between HiZ and early Z, at least on R300 and R420 based products. HiZ can be enabled while alpha test is enabled because HiZ only does early rejection, not early acceptance (conservative algorithm). Early Z test cannot be enabled because it actually would update the Z value if the pixel passed, which would cause problems if it were later killed by the alpha test.
Ah, I knew it :D
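
To make that distinction concrete, here is a minimal C++ sketch of the two behaviours as I understand them (hypothetical names, not any vendor's actual hardware interface): HiZ only ever rejects conservatively and never touches the stored depth, while a full early Z unit commits the write at test time, before the alpha test has had a chance to kill the pixel.

#include <vector>

// Hypothetical depth buffer type, for illustration only.
struct DepthBuffer {
    int w, h;
    std::vector<float> z;                       // one depth value per sample
    float& at(int x, int y) { return z[y * w + x]; }
};

// HiZ-style conservative rejection: it may only say "definitely hidden" and
// never writes anything, so it stays safe with alpha test enabled.
bool hiZReject(float tileMaxZ, float fragMinZ)
{
    return fragMinZ > tileMaxZ;                 // assuming a less-than depth test
}

// Full early Z as described in the quote: the write is committed at test
// time, which is wrong if the alpha test later discards the fragment.
bool earlyZTestAndWrite(DepthBuffer& db, int x, int y, float fragZ)
{
    if (fragZ < db.at(x, y)) {
        db.at(x, y) = fragZ;                    // committed before shading/alpha test
        return true;                            // fragment continues down the pipe
    }
    return false;                               // rejected early
}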

Regarding switching between SSAA and MSAA on the fly... I have tried several ideas here none of which work acceptably for various reasons.
Could you elaborate? I don't see a reason other than performance.
 
No. I wasn't concerned with the issues of handling hierarchical Z - that's an extra problem. You can do early Z tests with a bog-standard Z buffer.

Then I still don't get it.
Early z test meaning test before shading/texturing/etc?
Unless your shader writes to z, you can always do early z, right?
Whether you have alphatest or not, if the z is greater than what's in the zbuffer, the pixel will not be visible, so it is not necessary to perform any alphatest at all. Only if the z-test passes, you have to do the alphatest.
So without a hierarchical zbuffer, I don't see any kind of problem?
 
Scali said:
No. I wasn't concerned with the issues of handling hierarchical Z - that's an extra problem. You can do early Z tests with a bog-standard Z buffer.

Then I still don't get it.
Early z test meaning test before shading/texturing/etc?
Unless your shader writes to z, you can always do early z, right?
Whether you have alphatest or not, if the z is greater than what's in the zbuffer, the pixel will not be visible, so it is not necessary to perform any alphatest at all. Only if the z-test passes, you have to do the alphatest.
So without a hierarchical zbuffer, I don't see any kind of problem?

In NVidia's early Z implementation you certainly can (or at least could, circa NV2X).

However I know of at least one implementation where it wouldn't work, and early Z has to be disabled to render alpha tested geometry.

As for the mixed MSAA/SSAA, NV2A can, strictly speaking, do this, but the cost of changing MS->SS and back is significant, so it really requires application support for it to work properly.

i.e. the application would ideally batch all the SS geometry together to minimise the number of switches.

Doing this in the driver has the potential to be obnoxiously slow, or to produce incorrect results if it attempts to optimise and affects the draw order.
 
As long as the hardware keeps all the alpha blended textures in the same order they came in, the rest of the order doesn't matter, so if it's deferring it might as well optimize for the fewest writes. But hehe, Chalnoth isn't going to like the suggestion of an IMR being deferred :p

Edit: This is for doing it in the driver of course

Also, wouldn't any wise programmer batch all of his alpha tested polys together, as well as all of his alpha blended polys at the end? There's already a state change when enabling or disabling alpha testing, so an additional state change at the same time shouldn't really be a performance hit (besides the hit for SSAA, of course).
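
For what it's worth, that kind of batching is cheap for the application to do itself. A rough sketch (hypothetical DrawCall record, not a real engine structure) of grouping the alpha tested draws into one contiguous run while preserving the order within each group:

#include <algorithm>
#include <vector>

// Hypothetical draw-call record, for illustration only.
struct DrawCall {
    bool alphaTested;   // needs the alpha test / supersampled path
    int  id;            // whatever handle the app uses to submit it
};

// stable_partition keeps the relative order inside each group, so draws that
// must stay ordered still land in submission order, while all alpha tested
// draws end up in one contiguous batch at the end - i.e. at most one
// alpha-test (or MS<->SS) state switch per frame.
void batchByAlphaTest(std::vector<DrawCall>& calls)
{
    std::stable_partition(calls.begin(), calls.end(),
                          [](const DrawCall& c) { return !c.alphaTested; });
}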
 
Scali said:
Then I still don't get it.
Early z test meaning test before shading/texturing/etc?
Unless your shader writes to z, you can always do early z, right?
Whether you have alphatest or not, if the z is greater than what's in the zbuffer, the pixel will not be visible, so it is not necessary to perform any alphatest at all. Only if the z-test passes, you have to do the alphatest.
So without a hierarchical zbuffer, I don't see any kind of problem?
The problem is that, as Dio said, in a certain architecture Z writes might be committed at the point of test, and not deferred until after the shading. But since the alpha test could discard a pixel (so a Z write should not occur), this is not possible with early Z test.
 
OpenGL guy said:
Sorry, I can't right now.
Hm, that could be a good or a bad sign... ;)

Anyway, any application can do this:
- set multisample mask (one bit set)
- adjust transformation to shift polygons a bit
- render object with alpha test
- repeat for each sample
Why can't the driver do that?
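
In code, that per-sample loop might look roughly like this; setSampleMask, setSubPixelOffset and drawAlphaTestedGeometry are hypothetical stand-ins for whatever API or driver-internal calls actually set the mask, nudge the projection and submit the geometry:

#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real API/driver calls (declarations only).
void setSampleMask(unsigned mask);
void setSubPixelOffset(float dx, float dy);
void drawAlphaTestedGeometry();

struct SamplePos { float dx, dy; };             // sub-pixel offset of each MSAA sample

void renderAlphaTestedPerSample(const std::vector<SamplePos>& samples)
{
    for (std::size_t i = 0; i < samples.size(); ++i) {
        setSampleMask(1u << i);                 // enable only sample i
        setSubPixelOffset(samples[i].dx, samples[i].dy); // shift the polygons a bit
        drawAlphaTestedGeometry();              // render the object with alpha test
    }
    setSampleMask(~0u);                         // restore all samples
    setSubPixelOffset(0.0f, 0.0f);
}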
 
Xmas said:
OpenGL guy said:
Sorry, I can't right now.
Hm, that could be a good or a bad sign... ;)

Anyway, any application can do this:
- set multisample mask (one bit set)
- adjust transformation to shift polygons a bit
- render object with alpha test
- repeat for each sample
Why can't the driver do that?
Because when you shift the polygons a bit, you've also shifted your depth samples... You want to move the texture lookup location to match the currently enabled depth sample. However, you then need to move that depth sample to match the texture lookup (i.e. to the center of the pixel). Once you start shifting around the MSAA samples, you've broken Z compression and other stuff. Disabling Z compression while AA is enabled causes a large performance hit, as you would expect. Depending on how early in the frame you have to decompress the Z buffer, you can end up at supersampling speed... so you may as well have just done supersampling to begin with. Actually, this is probably slower than supersampling because the tiling modes probably aren't favorable for AA, plus the fact that you're running through the geometry multiple times...

Hope I haven't said too much.
 
The problem is that, as Dio said, in a certain architecture Z writes might be committed at the point of test, and not deferred until after the shading. But since the alpha test could discard a pixel (so a Z write should not occur), this is not possible with early Z test.

Well, that's the thing I don't get... What does the z-write have to do with the early z-test?
The z value that is either written or not, depending on the alphatest, is known before shading (still assuming that the shader does not modify it).
And as I said before, pixels are discarded if alphatest fails OR ztest fails.
So if either one fails, you can discard the pixel. Meaning that you can still do the early z test, and discard before shading. If it is not discarded by the z-test, it can later be discarded during shading because of the alphatest, but what does that matter for the early z-test? Are the z-test and z-write not independent operations for some reason that is not apparent? I really don't get it. Is this just a practical limitation because of a 'bad' design in certain architectures? If so, which architectures are we talking about? And is there a reason why it is implemented this way?
 
Scali said:
The problem is that, as Dio said, in a certain architecture Z writes might be committed at the point of test, and not deferred until after the shading. But since the alpha test could discard a pixel (so a Z write should not occur), this is not possible with early Z test.
Well, that's the thing I don't get... What does the z-write have to do with the early z-test?
Because the Z unit typically writes out passing Z values immediately to keep things coherent.
The z value that is either written or not, depending on the alphatest, is known before shading (still assuming that the shader does not modify it).
Nope. Picture two overlapping polygons being drawn with alpha test. Because of pipelining, it's possible for all the Z values to be computed before you do a single shader operation. If you used the wrong Z values to reject pixels of one polygon, you can't get them back.
And as I said before, pixels are discarded if alphatest fails OR ztest fails.
This is why I differentiated between HiZ (early Z rejection) and early Z test. Early Z test is basically the full Z unit.

I've had this discussion on the opengl.org message forums in the past.
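
A toy illustration of that overlap hazard (hypothetical names, nothing like real hardware): the Z unit tests and writes fragments as they arrive, while shading - and therefore the alpha test - runs much later, behind a deep FIFO.

#include <queue>

// Hypothetical fragment record, for illustration only.
struct Fragment { int x, y; float z; };

// The early Z stage tests *and* writes as fragments arrive; the alpha test
// only runs once the fragment comes out the other end of the shader FIFO.
void earlyZStage(std::queue<Fragment>& shaderFifo, float& storedZ, Fragment f)
{
    if (f.z < storedZ) {                        // assuming a less-than depth test
        storedZ = f.z;                          // committed now, to keep later tests coherent
        shaderFifo.push(f);                     // alpha test happens much later
    }
    // If polygon B's fragments arrive while polygon A's are still in the FIFO,
    // B is tested against A's Z values even though A's pixels may yet be killed
    // by the alpha test - and those rejected B fragments cannot be recovered.
}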
 
OpenGL guy said:
Because when you shift the polygons a bit, you've also shifted your depth samples... You want to move the texture lookup location to match the currently enabled depth sample. However, you then need to move that depth sample to match the texture lookup (i.e. to the center of the pixel). Once you start shifting around the MSAA samples, you've broken Z compression and other stuff. Disabling Z compression while AA is enabled causes a large performance hit, as you would expect. Depending on how early in the frame you have to decompress the Z buffer, you can end up at supersampling speed... so you may as well have just done supersampling to begin with. Actually, this is probably slower than supersampling because the tiling modes probably aren't favorable for AA, plus the fact that you're running through the geometry multiple times...
Of course you end up at supersampling speed, since you're effectively doing supersampling, but only for some polygons. But I don't see why you'd "have to decompress the Z buffer". Data in the Z buffer don't suddenly become useless only because you shift the samples, especially as you're wanting to shift the samples in a way that exactly matches the MS sample pattern.
Rendering with multisample mask certainly isn't the best thing for performance because it makes compression almost useless for those tiles affected, but it's better than supersampling for the whole frame.
 
Because the Z unit typically writes out passing Z values immediately to keep things coherent.

Okay, so it is a result of the hardware design, as I suspected.

Nope. Picture two overlapping polygons being drawn with alpha test. Because of pipelining, it's possible for all the Z values to be computed before you do a single shader operation. If you used the wrong Z values to reject pixels of one polygon, you can't get them back.

Hum, so hardware basically runs shaders independently from the rasterization? I mean, the rasterizer starts the shaders for the pixels, but doesn't wait for them to finish, moving on to the next poly if possible?
And this would not be possible when effectively the shaders write z, because of the alphatest. Well, that makes sense.
But only if you know how hardware is actually implemented in detail :)
Do you know this because you are in some way related to the design of hardware, or is there any info available on this sort of stuff? I've never seen any in-depth info on how 3d-cards actually work, and apparently this also differs from model to model, vendor to vendor.
 
Xmas said:
OpenGL guy said:
Because when you shift the polygons a bit, you've also shifted your depth samples... You want to move the texture lookup location to match the currently enabled depth sample. However, you then need to move that depth sample to match the texture lookup (i.e. to the center of the pixel). Once you start shifting around the MSAA samples, you've broken Z compression and other stuff. Disabling Z compression while AA is enabled causes a large performance hit, as you would expect. Depending on how early in the frame you have to decompress the Z buffer, you can end up at supersampling speed... so you may as well have just done supersampling to begin with. Actually, this is probably slower than supersampling because the tiling modes probably aren't favorable for AA, plus the fact that you're running through the geometry multiple times...
Of course you end up at supersampling speed, since you're effectively doing supersampling, but only for some polygons. But I don't see why you'd "have to decompress the Z buffer". Data in the Z buffer don't suddenly become useless only because you shift the samples, especially as you're wanting to shift the samples in a way that exactly matches the MS sample pattern.
But if your compressed value depends on sample location...
Rendering with multisample mask certainly isn't the best thing for performance because it makes compression almost useless for those tiles affected, but it's better than supersampling for the whole frame.
But, as I mentioned, you'd have to decompress the whole buffer, meaning very poor performance... worse than supersampling since supersampling would benefit from compression.
 
Scali said:
OpenGL guy said:
Nope. Picture two overlapping polygons being drawn with alpha test. Because of pipelining, it's possible for all the Z values to be computed before you do a single shader operation. If you used the wrong Z values to reject pixels of one polygon, you can't get them back.
Hum, so hardware basically runs shaders independently from the rasterization? I mean, the rasterizer starts the shaders for the pixels, but doesn't wait for them to finish, moving on to the next poly if possible?
And this would not be possible when effectively the shaders write z, because of the alphatest. Well, that makes sense.
Sure. You have FIFOs between different stages of the pipeline. This is to ensure that you don't get stalls when one part of the pipe slows down or speeds up. Say the pixel shader is backed up doing some very complex shader. Down come your two alpha tested polygons... The Z data could be processed long before shading for these two polygons begins.
But only if you know how hardware is actually implemented in detail :)
Do you know this because you are in some way related to the design of hardware, or is there any info available on this sort of stuff? I've never seen any in-depth info on how 3d-cards actually work, and apparently this also differs from model to model, vendor to vendor.
Well, early Z isn't something that is generally discussed because that's not covered by the OpenGL pipeline :) At least not in the version I learned! HW implementations and optimizations differ from platform to platform, but once you know one design, you can speak with some confidence about aspects of other designs (i.e. when you probably want/need FIFOs).

I am not a HW engineer, but I have experience with several different designs.
 
Scali said:
The problem is that, as Dio said, in a certain architecture Z writes might be committed at the point of test, and not deferred until after the shading. But since the alpha test could discard a pixel (so a Z write should not occur), this is not possible with early Z test.
Well, that's the thing I don't get... What does the z-write have to do with the early z-test?
How can they not be linked?

I described earlier how one 'solution' to this problem is to buffer the Z writes until the shader has completed execution. This solution would have two problems:

1. It needs a latency compensation FIFO to hold the Z writes during the shader operation. This is likely to be millions of transistors because the latency of the shader is very large.

2. New pixels coming in may overlap old pixels. The old pixel's value may be inside the latency compensation FIFO - which means you have to have logic to block the thread until the write retires or some way of extracting these updated Z values from the FIFO. Neither is easy and probably neither is cheap.

By keeping the test and the write in the same place we can enforce read-write coherency with relatively simple control and some tweaks to the required Z cache logic.
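
To see why point 2 is the nasty one, here is a toy sketch of the check the 'buffer the Z writes' alternative would need (hypothetical names, blocking option only): before testing a new fragment, the in-flight writes have to be scanned for the same pixel, and the thread stalled until any match retires. Doing this per fragment, against a FIFO deep enough to cover shader latency, is the cost being pointed at.

#include <deque>

// Hypothetical record of a Z write that has been tested but not yet retired
// (i.e. its shader and alpha test have not finished).
struct PendingZ { int x, y; float z; };

// Returns true if an unretired write to this pixel is still in flight, in
// which case the new fragment's Z test would have to stall.
bool mustStallForPendingWrite(const std::deque<PendingZ>& inFlight, int x, int y)
{
    for (const PendingZ& p : inFlight)
        if (p.x == x && p.y == y)
            return true;
    return false;
}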
 
Simon F said:
It's even more fun if the application changes the Z compare mode.

Heh, curious: does anyone have any idea of the performance hit with z greater-than? Since I'm sure it's all been highly optimized for z less-than :)
 
There are certain modes, and particularly certain combinations of modes that can interfere with HiZ a bit - so we recommend picking one mode per render target and sticking to it if at all possible - but most of the common cases should work fine.
 
OpenGL guy said:
But if your compressed value depends on sample location...
I don't get it. If you adjust the sample positions, then you're trying to put sample No. x at exactly the same location where it is in multisampling operation.

And even if that's not possible, it's not too bad. Leave the sampling positions where they are. Ok, your texture and geometry sampling points are not in the same location then, but this is the case with multisampling, too. It may give slight overlap artifacts if polygons with alpha test are connected to polygons without alpha test, but considering what alpha test edges look like, I think it is a worthy tradeoff for many games.

Rendering with multisample mask certainly isn't the best thing for performance because it makes compression almost useless for those tiles affected, but it's better than supersampling for the whole frame.
But, as I mentioned, you'd have to decompress the whole buffer, meaning very poor performance... worse than supersampling since supersampling would benefit from compression.
You don't have to decompress the Z buffer if you don't mess with the sample positions, but only use multisample mask, do you?
 
If all you want is a driver toggle to try and force anti-aliasing for alpha tested textures, why not simply route alpha to coverage? Simple, no hardware inefficiencies (unless switching the alpha-to-coverage state is hideously expensive) and you get decent AA (as good as you get on normal polys).
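
For reference, in OpenGL terms that suggestion is just a couple of state toggles; GL_SAMPLE_ALPHA_TO_COVERAGE comes from ARB_multisample / GL 1.3, so this sketch assumes a header and driver that expose it:

#include <GL/gl.h>

// Instead of a hard alpha-test cutoff, let the fragment's alpha drive its
// per-sample coverage, so the edge is resolved by the normal MSAA downsample.
void drawAlphaTestedAsCoverage()
{
    glDisable(GL_ALPHA_TEST);                   // drop the hard cutoff...
    glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE);      // ...and turn alpha into sample coverage
    // ... issue the formerly alpha-tested draws here ...
    glDisable(GL_SAMPLE_ALPHA_TO_COVERAGE);
}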
 