Unimpressed by Antialiasing (continued)

Mintmaster

Veteran
I apologize for creating a new thread, but I kept getting a message saying "Failed to send email" every time I tried to post since Thursday. Normally I would let it go, but I spent some time typing in a post, and didn't want to see it go to waste.

The original thread is here.

My new post:

Chalnoth said:
Mintmaster said:
First of all, the main reason I didn't like MSAA on GF4 was because it had nearly the same performance hit as SSAA,

Try looking again. The MSAA on the GF4 has a much lower performance hit than SSAA, usually around 50% for 4x FSAA. Additionally, the performance delta is increased when you enable anisotropic filtering.

50% for 4xMSAA on the GF4? Yeah, sure, when the no-FSAA score is CPU or TCL limited. You have to make a fair, fill-rate limited comparison, because that's what any type of FSAA impacts.

Q3 scores at 1600x1200: 138.8 fps without FSAA, 41.5 fps with FSAA.
Source: Tom's Hardware's Parhelia review

That's a 70% performance hit. If you did SSAA with the same RAMDAC blending that the Geforce4 has, that would be a 75% hit (i.e. 1/4 performance). The Radeon 8500 has a serious performance hit with SSAA because HyperZ gets disabled, so you can't really compare its scores with GF4's MSAA.
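Spelling out the arithmetic (the fps figures are from the quoted review; the SSAA case is the theoretical quarter-rate bound, ignoring the extra texture bandwidth):

```python
fps_no_aa = 138.8     # Quake III, 1600x1200, from the quoted review
fps_4x_msaa = 41.5

msaa_hit = 1 - fps_4x_msaa / fps_no_aa  # ~0.70: a 70% hit
ssaa_hit = 1 - 1 / 4                    # 0.75: naive 4x SSAA, 1/4 rate
print(round(msaa_hit, 2), ssaa_hit)     # 0.7 0.75
```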

However, your point about anisotropic filtering with FSAA on the GF3/4 is completely valid.

First, you have alpha compare tests. Now, Chalnoth, I have heard you repeatedly say you can just use alpha blending instead. Do you know what grass looks like with alpha blending when you're up close?

Yes, I have. It looks better than when there is an alpha test, because at the edges of the alpha test, instead of having a relatively smooth (although blurry) border, you will see "rounded" edges of each texture pixel. And using larger textures isn't unreasonable. It's being done.

Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like (if you alpha-blend the same alpha-tested texture, you'd get a blurry step-like border, not smooth). The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.

As for larger textures being reasonable, think a bit more practically. If you want alpha-blended grass not to get blurry when you get near it, texel spacing has to be about the same as screen resolution. That means if you get close to a bush that fills the screen at 1600x1200, you'll need roughly a 1024x1024 texture (1MB compressed) for the bush branches to get non-blurry edges with alpha blending. If you're allowed to walk through the bush (or a field with tall grass) and a leaf on a bush branch fills a large part of the screen, even this is very inadequate, and you need a high-res texture for each bush, branch, or stalk of grass. If you do this for every type of tree/bush/grass, that's a lot of space just for bush textures.
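Rough arithmetic behind the 1MB figure, assuming an 8-bits-per-texel compressed format (the gradient alpha needed for blending rules out 4-bit DXT1):

```python
width = height = 1024
bits_per_texel = 8   # DXT3/DXT5-class: gradient alpha needs 8 bits/texel

size_bytes = width * height * bits_per_texel // 8
print(size_bytes // (1024 * 1024))  # 1 -> about 1 MB per bush texture
```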

A good example of this is the nature test in 3DMark2001. Each swaying branch in the trees has one fairly low-res alpha-tested texture, containing many leaves, on a couple of polys. They look like hundreds of polygons with this effect. The same goes for the grass, because the alpha-tested texture has many blades of grass on it. No matter how close the camera gets to the leaves, they don't get fuzzy edges. If you use alpha blending, you will either get blurry leaves when they get close (or even at mid-distance), or you'll need huge textures for each type of branch/grass there is, and you can see that there are a lot (it's not the same texture used everywhere).

Even with the huge texture requirements of the CodeCreatures benchmark, grass still gets blurry when close, and leaves are done with alpha testing. I suppose you could argue that it looks like a depth of field effect, except that if you're looking at something else nearby, it wouldn't make sense.

3D graphics is generally the pursuit of reality, not the pursuit of your fondness for blurry things. Alpha tests are quite essential in representing things realistically and cheaply. The only substitute for the same effect is a bunch of polygons, which is very expensive performance-wise.

Next you have pixel shaders with CMP/CND, which result in the same hard edges. Some papers have suggested the idea of a smooth-step function rather than the discontinuous CMP/CND, but that needs to have a dynamically changing slope according to how big the texture is relative to on screen pixels, or you get that same blur when close. You then wind up with a complex pixel shader that may reduce performance to SSAA levels anyway.
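A minimal sketch of what such a gradient-scaled soft compare might look like, assuming the screen-space gradient magnitude of the compared value were somehow available (obtaining it is exactly the hard part):

```python
def smoothstep(e0, e1, x):
    """Standard smoothstep: cubic ramp, 0 at e0, 1 at e1."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def soft_compare(value, threshold, grad_mag):
    """Replace a hard compare (value >= threshold ? 1 : 0) with a ramp
    whose width tracks how fast 'value' changes per screen pixel, so
    the transition stays ~1 pixel wide at any magnification.
    'grad_mag' is assumed to come from a DDX/DDY-style derivative of
    the compared quantity."""
    w = max(grad_mag, 1e-6)  # half-width of the transition band
    return smoothstep(threshold - w, threshold + w, value)
```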

If it "reduces performance to SSAA levels anyway" then adding SSAA will make the performance unplayable.

Edit: Sorry, misunderstood the post. No, this won't reduce performance to "SSAA levels anyway." If all the calculation is done within the shader, there is less of a memory bandwidth hit, and no need to couple edge AA with texture AA. Additionally, if only part of the scene would benefit from this filtering, the performance would be significantly higher than with SSAA because only part of the scene would use the additional sampling.

A dynamic smoothstep function is not easy to implement at all - I don't even think you know what I mean. You would have to factor in the slope of the polygon, the distance from the camera, the resolution, and the gradients of the functions being compared with respect to the screen. This is basically next to impossible, and there is no way in hell developers will spend so much time for every shader with CMP or CND. In the rare cases it is possible, you'll need significant computational power, requiring extra cycles.

If you are using CND or CMP, MSAA can't produce the same image as SSAA. Period. Replacing the CND or CMP with a non-aliasing function requires way too much effort, isn't robust, and slows things down due to computational requirements.

Chalnoth, you were also whining about how the 9700 doesn't have true branching in the pixel shader. Well, if you were using dynamic branching, you'd have aliased edges everywhere with MSAA, except for certain situations. You could program around it, but again it's quite hard.

Again, it's still more efficient to just do it before pixel output. For example, it may be possible (don't know if it is on the NV30 or not) to go ahead and take multiple texture samples, effectively doing super-sampling before outputting the pixel.

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen. It just means supersample antialiasing, and so it can be done dynamically/selectively. If you are executing a shader at multiple points in the pixel, that's supersampling. Multisampling in our context is using the same single pixel shader output value for each sub-sample, and only taking extra samples from the Z buffer. It saves fillrate in this way, but doing multiple pixel shader runs doesn't save fillrate, so you're back to doing supersampling. If you want to make an argument for MSAA, you have to stick with this definition.
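The definitional difference can be sketched like this (illustrative Python, not any real pipeline):

```python
def msaa_pixel(shade, coverage):
    """Multisampling: run the shader once at the pixel center and reuse
    that one color for every covered subsample (only Z/coverage is
    per-subsample)."""
    color = shade(0.5, 0.5)
    return [color if covered else None for covered in coverage]

def ssaa_pixel(shade, coverage, offsets):
    """Supersampling: run the shader once per covered subsample, at
    that subsample's own position."""
    return [shade(x, y) if covered else None
            for covered, (x, y) in zip(coverage, offsets)]
```

With four subsamples, MSAA costs one shader run per pixel; SSAA costs up to four, which is why multiple shader runs per pixel is supersampling no matter where the blending happens.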

As for "doing supersampling before outputting the pixel", I have no idea what you're talking about. Even multisampling requires a full size frame-buffer, although you can compress it a bit better through various techniques. Complex pixel shaders with branching would rarely be bandwidth limited anyway, because they take so many cycles to complete.


I forgot about one other important situation: dependent texture reads. Using bumped cube-mapping can cause a lot of aliasing, especially since you can't filter normal maps without creating an incorrect, hacked image. You can use the 4 reflection rays from a 2x2 block to select a mip-map from the cube texture, and maybe even do aniso with the reflection rays, but it still isn't sufficient, since adjacent 2x2 blocks have no interaction with each other in the mip-map selection. Other pixel shaders with different uses of dependent texture reads can't be solved by this. The only thing you can do is supersample.
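An illustrative sketch of the 2x2-ray idea: use the angular spread of the block's four reflection rays to pick a cube-map mip level. The angle-to-mip mapping and face resolution here are assumptions, not any real hardware's method:

```python
import math

def cube_mip_from_rays(rays, face_size=256):
    """Pick a cube-map mip level from the angular spread of the four
    reflection rays of a 2x2 pixel block (rays are unit 3-vectors).
    Wider spread -> higher (blurrier) mip. Purely illustrative."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # Largest pairwise angle between the four rays
    max_angle = max(math.acos(max(-1.0, min(1.0, dot(a, b))))
                    for i, a in enumerate(rays) for b in rays[i + 1:])
    # Texels subtended, treating a cube face as spanning ~90 degrees
    texels = max_angle / (math.pi / 2) * face_size
    return max(0.0, math.log2(max(texels, 1.0)))
```

Identical rays give mip 0 (sharpest); rays 90 degrees apart span a whole 256-texel face and select the 1x1-ish mip. Nothing here looks at neighbouring blocks, which is precisely the insufficiency the post describes.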

I'm not saying multisampling is useless - in fact, it's a very good idea that works most of the time. What I am saying is that there are situations where it's not sufficient, and has big drawbacks. Heck, you were the one saying ATI's aniso is no good because the few cases in which it doesn't work makes it unusable. I do not hold such an extreme view about MSAA however.

I'm also saying the GF4's implementation is not worth those drawbacks, since it offers only a marginal performance improvement over SSAA. However, it looks like the 9700 did it right, or very close to right. NV30 may be even better (assuming those 4xFSAA performance estimates on Reactor Critical are wrong).
 
Mintmaster said:
You have to make a fair, fill-rate limited comparison, because that's what any type of FSAA impacts.
If you define fillrate as the rate at which you can shade pixels, then MSAA likely doesn't affect fillrate at all. MSAA requires more bandwidth to read/write the extra Z and color values, but since the same color is used for each subsample that passes the depth test, you aren't doing any extra shading work; this is the main advantage of MSAA. Now, if you have to wait for the extra samples to be written, then your fillrate may go down, but it isn't always the case that you are waiting, especially if the pixel shader program is relatively complex.
 
I think you're confusing the symptoms of many different types of problems.

fill-rate limited comparison, because that's what any type of FSAA impacts.

MSAA is bandwidth-limited, not fillrate limited. That is the primary argument in favor of MSAA. The two bottlenecks interact in complex ways; however, they are two very different things.

Q3 scores at 1600x1200: 138.8 fps without FSAA, 41.5 fps with FSAA.
Source: Tom's Hardware's Parhelia review

That's a 70% performance hit. If you did SSAA with the same RAMDAC blending that the Geforce4 has, that would be a 75% hit (i.e. 1/4 performance). The Radeon 8500 has a serious performance hit with SSAA because HyperZ gets disabled, so you can't really compare its scores with GF4's MSAA.

Quake III at 1600x1200 with 4x AA on the GeForce 4 is heavily bandwidth limited. That is why 1600x1200 (with 4xAA especially) costs so much more relative to lower resolutions with the same degree of multisampling applied. 4X SSAA on the GeForce 4 would cause a bigger hit than just 75%, since texture read bandwidth is quadrupled, too.

Complex pixel shaders with branching would rarely be bandwidth limited anyway, because they take so many cycles to complete.

Once shaders become more complicated (so that bandwidth is irrelevant), multisampling will be even more attractive, since it doesn't compound the fillrate hit caused by large shaders, like SSAA. This is why most Renderman renderers use MSAA, rather than SSAA.

If you want alpha-blended grass not to get blurry when you get near it

The popular real-time solutions to this problem would be either adding a (mostly noise) detail texture, tiling the grass texture, or simply changing your goals (it can also be done procedurally on new hardware, if you've got shader cycles to burn).

but it still isn't sufficient, since adjacent 2x2 blocks have no interaction with each other in the mip-map selection

Which is why you use trilinear filtering on cube maps. Even low-degree supersampling won't add enough samples to overcome the horrible foreshortening problem you've described.
 
Mintmaster said:
Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like (if you alpha-blend the same alpha-tested texture, you'd get a blurry step-like border, not smooth). The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.

No, rounded edges are not "generally good." Quick example: Imagine a chain-link fence, viewed up close. The chains would generally be a diagonal string of pixels, and a plain alpha test up close would cause them to look wavy. Regardless, it should remain obvious that the tradeoff between an alpha blend and an alpha test can be considered more or less of a toss-up when viewed closely, but the answer is immediately obvious once you start considering textures whose texels are much smaller than screen pixels (i.e. once you start using MIP maps other than the first).

As for larger textures being reasonable, think a bit more practically. If you want alpha-blended grass not to get blurry when you get near it, texel spacing has to be about the same as screen resolution. That means if you get close to a bush that fills the screen at 1600x1200, you'll need roughly a 1024x1024 texture (1MB compressed) for the bush branches to get non-blurry edges with alpha blending.

You can use many smaller, repeating textures instead. As nVidia's tree demos from way back showed, this can be done pretty well and look quite good.

A dynamic smoothstep function is not easy to implement at all - I don't even think you know what I mean. You would have to factor in the slope of the polygon, the distance from the camera, the resolution, and the gradients of the functions being compared with respect to the screen. This is basically next to impossible, and there is no way in hell developers will spend so much time for every shader with CMP or CND. In the rare cases it is possible, you'll need significant computational power, requiring extra cycles.

I believe all that you need for this are the variables that are automatically calculated anyway for texture filtering (such as DDX/DDY that the NV30 exposes). Additionally, I was talking more about using a supersampling-like method within the pixel shader on a per-primitive basis (i.e. the programmer decides that pixel shader X applied to surface Z needs supersampling, and so he/she implements it in to the shader).

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen.

Right. What I'm arguing against is using SSAA as an FSAA method.

As for "doing supersampling before outputting the pixel", I have no idea what you're talking about.

I'm talking about outputting only one color for the pixel, despite the possibility of doing something like supersampling within the pixel shader.
 
OK, for anyone making the fillrate vs. bandwidth argument, you are totally missing the point.

When I said fillrate limited, I meant effective fillrate, which is defined by bandwidth. All I meant was that the test can't be CPU or TCL limited, as is the case with 1024x768. If the Geforce4 Ti4600 had a much faster CPU to feed it data, it would score 340 fps, according to the 1600x1200 score on Tom's Hardware. Since the 1024x768 score with 4xFSAA is 111 fps, you're still seeing about a 70% performance hit.
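Checking that projection against the quoted numbers:

```python
fps_1600_no_aa = 138.8
scale = (1600 * 1200) / (1024 * 768)   # ~2.44x the pixel count
projected_1024 = fps_1600_no_aa * scale
print(round(projected_1024))  # ~339 fps, roughly the 340 claimed
```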

Now I have to make a very important point that nearly every reviewer screws up. 1600x1200 is no more bandwidth limited than 1024x768, provided you aren't CPU/TCL limited at 1024x768. In fact, you are generally LESS bandwidth limited.

Each pixel that passes the Z-test requires a Z-read, Z-write, colour buffer write, and texture reads (some pixels are alpha blended and need colour-buffer reads, but we'll ignore those for now). Assuming a 2:1 z-compression ratio, that's 8 bytes per pixel + texture bandwidth. That's independent of resolution - per pixel means just that. Higher resolutions actually wind up having a higher Z-compression ratio, because fewer pixels are polygon edges, and the planar interior of polygons are what Z-compression algorithms are based on.
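The 8-byte figure, spelled out under the stated assumptions (32-bit Z and colour, 2:1 average Z compression):

```python
z_bytes = 4          # 32-bit depth value
color_bytes = 4      # 32-bit colour value
z_compression = 2.0  # 2:1 average compression, as assumed above

# Z read + Z write (both compressed) + colour write, per pixel drawn
per_pixel = (z_bytes + z_bytes) / z_compression + color_bytes
print(per_pixel)  # 8.0 bytes, plus texture bandwidth on top
```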

As for texture bandwidth, if you are always sampling a mipmap lower than the top level (this hardly ever happens), it will stay the same per pixel; otherwise texture bandwidth will go down with resolution. Think about it - the same textures (whatever their top-level mipmap) get spread over more pixels, so the average bandwidth per pixel goes down. Caching makes the repeated texel fetches between adjacent pixels effectively free.

It doesn't really matter how much bandwidth is required per frame. What matters is bandwidth required per pixel. That is how it affects fillrate. The way the Geforce4 Ti4600 has its memory and core clock speeds set up, it theoretically has 69.3 bits of memory access per pixel pipe per clock. Since pixel rendering requires about 64 bits per pixel + texture bandwidth (which varies) and you never get 100% memory utilization, more often than not you are bandwidth limited, lowering the pixel rate from 4 per clock. Nonetheless, resolution doesn't play into this, except for a bit the other way around from what everyone says.
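The 69.3-bit figure follows from the Ti4600's published clocks (300 MHz core, 4 pixel pipes, 128-bit memory bus at 325 MHz DDR):

```python
core_mhz = 300            # Ti4600 core clock
pipes = 4                 # pixel pipelines
mem_bus_bits = 128        # memory bus width
mem_mhz_effective = 650   # 325 MHz DDR

bits_per_core_clock = mem_bus_bits * mem_mhz_effective / core_mhz
bits_per_pipe_per_clock = bits_per_core_clock / pipes
print(round(bits_per_pipe_per_clock, 1))  # 69.3
```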

Look at the 4xFSAA scores from Tom's Hardware. 42 / 111 = 0.38. 1024x768 has 0.41 times as many pixels as 1600x1200. Since the FSAA scores scale nearly perfectly with resolution, you've got pretty much the same pixel rate at either resolution.
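The same check in numbers:

```python
fps_1600 = 42.0
fps_1024 = 111.0
fps_ratio = fps_1600 / fps_1024             # FSAA score ratio
pixel_ratio = (1024 * 768) / (1600 * 1200)  # pixel count ratio
print(round(fps_ratio, 2), round(pixel_ratio, 2))  # 0.38 0.41
```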


When talking about the GF4's MSAA, I am telling you that the pixel rate goes down to 30% of its original, say from 3.3 down to 1.0 (a guess, but the 70% performance hit is based on empirical evidence as already shown), regardless of resolution. Sure, the theoretical fillrate with unlimited bandwidth is still 4 pixels per clock, but the actual fillrate goes down. With SSAA, you would go from 3.3 down to 0.83. At 1024x768, if the Geforce4 could do SSAA it would be 83% of the MSAA score. In my opinion, that small performance decrease is well worth it.

However, the Radeon 9700 has a much lower decrease with MSAA due to more bandwidth and colour buffer compression. This makes MSAA more useful, as SSAA will probably be less than half the MSAA speed.

OpenGL guy, you are completely right about longer pixel shader programs hiding the bandwidth. I am strictly arguing about the lackluster performance of GF4's implementation of MSAA with normal texturing situations, not MSAA in general. However, one pixel shader program run per screen pixel may not be enough to avoid aliasing, such as when CND/CMP or branching is used, or dependent texture reads are done.
 
Chalnoth said:
Mintmaster said:
Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like (if you alpha-blend the same alpha-tested texture, you'd get a blurry step-like border, not smooth). The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.

No, rounded edges are not "generally good." Quick example: Imagine a chain-link fence, viewed up close. The chains would generally be a diagonal string of pixels, and a plain alpha test up close would cause them to look wavy. Regardless, it should remain obvious that the tradeoff between an alpha blend and an alpha test can be considered more or less of a toss-up when viewed closely, but the answer is immediately obvious once you start considering textures whose texels are much smaller than screen pixels (i.e. once you start using MIP maps other than the first).

As for larger textures being reasonable, think a bit more practically. If you want alpha-blended grass not to get blurry when you get near it, texel spacing has to be about the same as screen resolution. That means if you get close to a bush that fills the screen at 1600x1200, you'll need roughly a 1024x1024 texture (1MB compressed) for the bush branches to get non-blurry edges with alpha blending.

You can use many smaller, repeating textures instead. As nVidia's tree demos from way back showed, this can be done pretty well and look quite good.

Rounded edges are an option with alpha tests. To get straight edges, imagine these alpha values in a 2x2 square on the texture:

0.6 -- 0.2
|...........|
|...........|
1.0 -- 0.6

If you used a compare value of 0.6, you'd get a perfectly straight diagonal line from the top left to the bottom right. Basically, you have control over how you want to do curvature.
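This checks out under bilinear filtering: with those corner alphas, the interpolated value along the whole top-left-to-bottom-right diagonal is exactly 0.6, so a compare value of 0.6 cuts a perfectly straight line. A quick check:

```python
def bilerp(a00, a10, a01, a11, x, y):
    """Bilinear filter over the unit square: a00 top-left, a10
    top-right, a01 bottom-left, a11 bottom-right."""
    return (a00 * (1 - x) * (1 - y) + a10 * x * (1 - y)
            + a01 * (1 - x) * y + a11 * x * y)

# Corner alphas from the diagram: 0.6, 0.2 on top; 1.0, 0.6 below
diagonal = [bilerp(0.6, 0.2, 1.0, 0.6, t, t)
            for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Every sample on the diagonal interpolates to exactly 0.6
```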

Using mip-maps other than the first only happens when the "texels are much smaller than screen pixels", as you mentioned. This only happens when the textures are big enough, and if you get close enough to the texture, blurriness will always happen.

As for using repeating textures, this isn't always an option, as the effect is completely different. If you wanted to do the "nature" demo with many different types of grass (yellow, green, wheat, different shapes, i.e. realistic) or flowers, repeating textures just don't cut it. Besides, you can repeat alpha-tested textures too. What if, with repeating textures and using alpha tests, you were maxed out on texture space? What then?

A dynamic smoothstep function is not easy to implement at all - I don't even think you know what I mean. You would have to factor in the slope of the polygon, the distance from the camera, the resolution, and the gradients of the functions being compared with respect to the screen. This is basically next to impossible, and there is no way in hell developers will spend so much time for every shader with CMP or CND. In the rare cases it is possible, you'll need significant computational power, requiring extra cycles.

I believe all that you need for this are the variables that are automatically calculated anyway for texture filtering (such as DDX/DDY that the NV30 exposes). Additionally, I was talking more about using a supersampling-like method within the pixel shader on a per-primitive basis (i.e. the programmer decides that pixel shader X applied to surface Z needs supersampling, and so he/she implements it in to the shader).


DDX/DDY will calculate things like the texture coordinate gradient (as is used for anisotropic filtering), not a texture gradient. For dynamic branching, the input to an "if" statement will usually be a texture sample or some other per-pixel calculation, and this is what you need a gradient of. Without this info, you have no idea how to blend between the two or more branches of the if statement. This gradient is nearly always impossible to find correctly.

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen.

Right. What I'm arguing against is using SSAA as an FSAA method.

Well then why didn't you say this in the first place? You were saying that there is no need for SSAA with good texture filtering, and so SSAA is useless. If that's your stance, then I agree with you.

As for "doing supersampling before outputting the pixel", I have no idea what you're talking about.

I'm talking about outputting only one color for the pixel, despite the possibility of doing something like supersampling within the pixel shader.

Supersampling within the pixel shader costs just as much as supersampling. This isn't MSAA, anyway, as you are arguing it to be.
 
gking said:
Once shaders become more complicated (so that bandwidth is irrelevant), multisampling will be even more attractive, since it doesn't compound the fillrate hit caused by large shaders, like SSAA. This is why most Renderman renderers use MSAA, rather than SSAA.

Yes, you're right. However, branching will cause aliasing. The only way to fix that is to supersample. Of course MSAA is faster - it just isn't sufficient sometimes.

If you want alpha-blended grass not to get blurry when you get near it

The popular real-time solutions to this problem would be either adding a (mostly noise) detail texture, tiling the grass texture, or simply changing your goals (it can also be done procedurally on new hardware, if you've got shader cycles to burn).

How would a detail texture stop an alpha edge from being blurry? With an alpha test, you gradually get a blend of the background from the transparent texels to the opaque texels. Detail textures and procedural textures do nothing for this fuzzy edge.

but it still isn't sufficient, since adjacent 2x2 blocks have no interaction with each other in the mip-map selection

Which is why you use trilinear filtering on cube maps. Even low-degree supersampling won't add enough samples to overcome the horrible foreshortening problem you've described.

Yes, that was implied. The question is, how do you choose the mip-map level? You need a mechanism for determining that, and it isn't trivial. The 2x2 solution is only halfway there. When a ripple of the ocean is sub-pixel, you need many samples to get the reflection correct. Filtering the normal maps doesn't achieve the same effect, and is a hack. The pixel shader demo in 3DM2K1, for example, doesn't filter the normals completely (i.e. incomplete mip chain). That's why you have so much shimmering, and it would still shimmer (although less) if you had trilinear filtered cube maps, because the normal map has many texels per screen pixel. If you did filter the normal maps completely, the water would look like a smooth mirror in the distance.

Besides, while this case of dependent texture reads has a method for determining mip-map selection (rays from the 2x2 group), other uses of dependent texture reads have no obvious solution.


Again, remember my argument - Multisampling is good. It just isn't perfect. You are losing something by not doing multiple samples for each pixel in the interior of a polygon.
 
Mintmaster said:
OK, for anyone making the fillrate vs. bandwidth argument, you are totally missing the point.
No, I think you didn't state things clearly.
When I said fillrate limited, I meant effective fillrate, which is defined by bandwidth.
And now you do, but no one is going to agree with this statement.

If you take a chip that can do 1 pixel per cycle and clock the engine at 100 MHz, no amount of bandwidth is going to raise that fillrate over 100 megapixels. Bandwidth doesn't define fillrate, but it can be a limiting factor.
 
OpenGL guy said:
Mintmaster said:
OK, for anyone making the fillrate vs. bandwidth argument, you are totally missing the point.
No, I think you didn't state things clearly.
When I said fillrate limited, I meant effective fillrate, which is defined by bandwidth.
And now you do, but no one is going to agree with this statement.

If you take a chip that can do 1 pixel per cycle and clock the engine at 100 MHz, no amount of bandwidth is going to raise that fillrate over 100 megapixels. Bandwidth doesn't define fillrate, but it can be a limiting factor.

Nice worst-case scenario for his example, but you have to admit that in almost all recent (and not so recent) cards, bandwidth HAS determined effective fillrate.
I think you are trying to NOT see his point.
 
Althornin said:
Nice worst-case scenario for his example, but you have to admit that in almost all recent (and not so recent) cards, bandwidth HAS determined effective fillrate.
So tell me, how much fillrate does a card with 15.56 GB/s bandwidth have? See what I mean?
I think you are trying to NOT see his point.
Thank you for thinking for me. :rolleyes:
 
Nice worst-case scenario for his example, but you have to admit that in almost all recent (and not so recent) cards, bandwidth HAS determined effective fillrate

And the "worst case" is going to be more popular on newer cards. DX9 games will have shaders that run at 30 clocks/pixel (or worse). Suddenly, 2.4GPixels/sec is just 80MPixels/sec, and bandwidth is _not_ a limiting factor. MSAA is essentially a free way to improve edge quality in these cases.
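The arithmetic behind that claim:

```python
raw_fill = 2.4e9         # pixels/sec of a hypothetical DX9-class part
clocks_per_pixel = 30    # a plausible DX9 shader length, per the post

shaded_fill = raw_fill / clocks_per_pixel
print(shaded_fill / 1e6)  # 80.0 MPixels/sec
```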

DDX/DDY will calculate things like the texture coordinate gradient (as is used for anisotropic filtering), not a texture gradient

No, you just sample the texture, shove it in a temporary, and call the derivative functions.

Yes, that was implied. The question is, how do you choose the mip-map level? You need a mechanism for determining that, and it isn't trivial.

The solution _is_ there by using partial derivatives. The problem is that you've defined a slope where the MIP level changes significantly at every pixel. Not only is this a highly unlikely scenario, the only solution that doesn't require shading dozens of samples is direct texture convolution.

However, branching will cause aliasing. The only way to fix that is to supersample

Or you sample all of your textures exactly as much as needed (projecting the pixel filter kernel onto the texture).

Besides, while this case of dependent texture reads has a method for determining mip-map selection (rays from the 2x2 group), other uses of dependent texture reads have no obvious solution

But most do, provided that you have access to high-quality texture filtering and derivative functions.

Besides, while this case of dependent texture reads has a method for determining mip-map selection (rays from the 2x2 group), other uses of dependent texture reads have no obvious solution

Or, you do something clever like computing normal distributions (based on the frequency of the waves and the solid angle subtended by a wave in a pixel), and then use that computation to look up a specific MIP level in a reflection map (and probably blend between a diffuse and a specular reflection map). In fact, this solution is probably more accurate (with fewer temporal aliasing problems) than just using supersampling.

How would a detail texture stop an alpha edge from being blurry?

Just some additional high-frequency noise to break up the obvious lerping artifacts.

Again, remember my argument - Multisampling is good. It just isn't perfect. You are losing something by not doing multiple samples for each pixel in the interior of a polygon.

In some cases, yes -- the only solution is to take more samples. However, in most cases, doing direct convolution on the texture maps coupled with intelligent shaders (that handle aliasing problems as much as possible internally) gets you almost all of the way there -- certainly more so than doing supersampling on dumb shaders -- at a fraction of the cost.
 
Rounded edges are an option with alpha tests. To get straight edges, imagine these alpha values in a 2x2 square on the texture:

0.6 -- 0.2
|...........|
|...........|
1.0 -- 0.6

If you used a compare value of 0.6, you'd get a perfectly straight diagonal line from the top left to the bottom right. Basically, you have control over how you want to do curvature.

Any compare value would give you a diagonal line with those alpha values.

I really do not see how it is possible for the mere selection of a compare value to allow for alpha tests that always result in straight lines. I think of the alpha values as similar to lines on a contour map of a region. Selecting a specific compare value just picks out one of the lines on the contour map. While it is possible that, depending on the angle and compare value, straight lines can be generated, I simply do not see how it will happen the same with every texture.

As for using repeating textures, this isn't always an option, as the effect is completely different.

Even in today's games, the only normal gaming situations where texture pixels are much larger than screen pixels are those where it's intentional, such as in lightmaps, or in cloud or fog textures. You don't normally crouch in the grass in Serious Sam, or stare at a wall close-up in any game. In other words, the main drawback of the alpha blend will only show up in either manufactured situations, or in very short spaces of time where it is fairly less noticeable. The situations where alpha tests have major drawbacks are much more common and more noticeable.

DDX/DDY will calculate things like the texture coordinate gradient (as is used for anisotropic filtering), not a texture gradient. For dynamic branching, the input to an "if" statement will usually be a texture sample or some other per-pixel calculation, and this is what you need a gradient of. Without this info, you have no idea how to blend between the two or more branches of the if statement. This gradient is nearly always impossible to find correctly.

I really don't see why this is an issue. I was under the impression that the only factors that went into choosing a method of texture filtering had to do with texture orientation. I suppose if your if statement includes a branch between two textures of different sizes you might have a problem, but I don't see why that can't be avoided.

Anyway, I'd like to put a little disclaimer here. I don't know for certain the full capabilities of DX9 hardware, such as whether it is possible to do super-sampling within the shader.

Well then why didn't you say this in the first place? You were saying that there is no need for SSAA with good texture filtering, and so SSAA is useless. If that's your stance, then I agree with you.

Because previously I'd only ever thought of SSAA as an FSAA method. I've never encountered it in any other form.

Supersampling within the pixel shader costs just as much fillrate as ordinary supersampling. This isn't MSAA, anyway, as you are arguing it to be.

It doesn't have the same memory bandwidth hit, by far, since all blending is done internally. Since it doesn't use the external framebuffer, there's little problem (beyond programming constraints) in making a dynamic method.

OK, for anyone making the fillrate vs. bandwidth argument, you are totally missing the point.

When I said fillrate limited, I meant effective fillrate, which is defined by bandwidth.

Limited by bandwidth would be a better definition of effective fillrate. I'm certain that different benchmarks have different stresses on pure fillrate and memory bandwidth.

Particularly when anisotropic filtering is used, the GeForce3/4 architecture is not nearly as memory bandwidth-limited (which can be shown by the reduction of the performance hit when aniso is used in conjunction with FSAA).

With SSAA, you would go from 3.3 down to 0.83. When at 1024x768, if the Geforce4 could do SSAA it would be 83% of the MSAA score. In my opinion, that small performance decrease is well worth it.

This is far from true when you factor in anisotropic filtering. The decoupling of edge AA and texture AA will always result in a higher performance/image quality ratio than that given by SSAA when used as an FSAA method.

Even on the GeForce4, to talk about the MSAA being "lackluster" is meaningless when you're not taking anisotropic filtering into account.
 
Again, remember my argument - Multisampling is good. It just isn't perfect. You are losing something by not doing multiple samples for each pixel in the interior of a polygon.

How do you figure? What is it that's being lost? The only possible benefit is a bit of texture antialiasing. The extra samples would be better spent on higher quality filtering.

I've read on the board there's a problem with alpha textures, but I still don't get it. I think it's probably an implementation-specific issue.
 
Again, about my fillrate arguments:

Do you guys remember where this argument started? It was with me posting the impact of MSAA on the GF4.

Point 1: You cannot be TCL limited or CPU limited in any benchmark when comparing how fast a video card can fill pixels. That's why 1600x1200 is needed for assessing the impact of MSAA on the GF4.

Point 2: Resolution has hardly any impact on how bandwidth limited you are, unless you are TCL or CPU limited in lower resolutions and not in higher, in which case you aren't fillrate limited either. Bandwidth requirements actually go slightly down per pixel with higher resolutions.

Point 3: OpenGL guy, bandwidth does define fillrate - for a specific GPU doing a specific test, i.e. GF4 doing a Q3 time demo, which is what I am talking about in this whole argument. You can't take my arguments out of the context in which they apply. There is a small dependency on resolution too, but it's the other way around from what people think, and can be ignored for the most part.

Now whatever you want to call it - effective fillrate, actual fillrate, bandwidth limited fillrate, real-world fillrate, whatever the **** you want to call it - the rate at which a GF4 Ti4600 outputs pixels drops 70% with MSAA. This is what I have shown you guys evidence of, this is my point, and this is related to what the thread is about.
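As a sanity check, here is the arithmetic behind that 70% figure, just plugging in the Tom's Hardware Quake 3 scores quoted earlier in the thread:

```python
# Quake 3 at 1600x1200 on a GF4 Ti4600 (Tom's Hardware Parhelia review):
no_fsaa_fps = 138.8   # no FSAA
msaa_fps = 41.5       # 4x FSAA enabled

msaa_hit = 1 - msaa_fps / no_fsaa_fps   # fraction of pixel output rate lost
ssaa_hit = 1 - 1 / 4                    # 4x SSAA fills 4x the pixels

print(f"MSAA hit: {msaa_hit:.0%}")  # MSAA hit: 70%
print(f"SSAA hit: {ssaa_hit:.0%}")  # SSAA hit: 75%
```

So the measured MSAA hit is within five percentage points of what brute-force 4x supersampling would cost in theory, which is the whole point being argued.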

Now stop arguing for the sake of arguing. Althornin is absolutely right - you guys are intentionally not seeing my point and sidestepping the issue.
 
About CND/CMP/alpha tests and branching:

I really do not see how it is possible for the mere selection of a compare value to allow for alpha tests that always result in straight lines

I never said that. Why do you need this? When designing a texture, you are the author. You define the compare value AND the alpha values of the texture. That's how they get the round edges to look like leaves. That's how they get a grate to have capsule shaped holes or rectangular holes.

Alpha tests are important. With unlimited texture space you can get rid of them, but the fact is that you need much more space to get an alpha blended texture to have the same hard edges, depending on your maximum distance to the texture. This is the drawback, plain and simple. If you can work around the drawback, fine. MSAA is fine in this case. This is not always the case, however.


I really don't see why this is an issue. I was under the impression that the only factors that went into choosing a method of texture filtering had to do with texture orientation.

You have to remember what we are talking about here - replacing a CND or CMP instruction with a smoothstep function. These instructions conditionally choose either argument. There is no blend between them, it's just one or the other. This causes aliasing because they are discontinuous functions. When talking about branching, it could be similar to the CND/CMP situation, or it could be even more complex due to multiple branches (i.e. if ... else if .... else if .... else).

The way you program around it to avoid aliasing is by using a smoothstep function instead of a CND/CMP. This is a function of the compare variable that, instead of jumping from one to the other, blends gradually about the compare value.
http://www.microsoft.com/mscorp/corpevents/meltdown2001/ppt/DXGLighting.ppt starting on slide 49. Sorry for not posting this earlier - I assumed you all knew what I meant by "replacing CND with a smoothstep function".

If you want to get the same effect as CND/CMP but without the aliasing, you need a dynamic smoothstep function, in which the slope depends on the variables I discussed earlier. Otherwise, a fixed smoothstep won't antialias some conditions (sharp angles), and will blur others (like if you're too close). You also need the gradients with respect to screen axes of the functions that are the inputs to CND/CMP, or else you will get aliasing and blurring in different points in the image, even if they are subject to the same conditions (distance, angle). The inputs define how fast you ascend the smoothstep function with respect to the screen. You want the slope to span one screen pixel, and then you will get an ideal antialiased edge.
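To make that concrete, here is a minimal Python sketch (my own made-up function names, not anyone's shader code) of the difference between a hard CND-style compare and a dynamic smoothstep whose transition width is tied to how fast the compare input changes per screen pixel:

```python
def cnd(x, threshold, a, b):
    # Hard compare: pick a or b outright. Discontinuous, hence aliasing.
    return a if x >= threshold else b

def smoothstep(edge0, edge1, x):
    # Standard smoothstep: 0 below edge0, 1 above edge1, smooth in between.
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def smooth_cnd(x, threshold, a, b, change_per_pixel):
    # Dynamic smoothstep: size the transition so the blend spans roughly
    # one screen pixel, based on how fast x varies across the screen.
    half = max(abs(change_per_pixel) * 0.5, 1e-6)
    t = smoothstep(threshold - half, threshold + half, x)
    return a * t + b * (1.0 - t)
```

With a fixed transition width instead of `change_per_pixel`, you get exactly the failure modes described above: too steep a slope still aliases, too shallow a slope blurs.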

No, you just sample the texture, shove it in a temporary, and call the derivative functions

I don't know what your math background is, but there is no way in hell you can calculate a derivative from a single value. You need at least neighbouring points to approximate the derivative. And even if NV30 by some miracle is smart enough to take neighbouring texture samples, who says we're limited to texture samples? A branching condition, if dynamic, will often be based on a mathematical expression, which can in turn involve texture inputs. Unless NV30 is given this expression, it can't calculate DDX/DDY. I think these functions are for the derivatives of the texture coordinates, but I'm not sure. I am 100% sure that you can't get the derivative of an arbitrary function from a single value.
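For illustration, a one-variable sketch of why a derivative needs neighbouring evaluations. The forward-difference form here is an assumption about how DDX/DDY-style instructions could be approximated, not a claim about how NV30 actually works:

```python
def ddx(f, x, y, dx=1.0):
    # Forward difference across one pixel in x. A slope simply cannot be
    # recovered from f(x, y) alone; a second, neighbouring sample is needed.
    return (f(x + dx, y) - f(x, y)) / dx

def ddy(f, x, y, dy=1.0):
    # Same thing, one pixel down in y.
    return (f(x, y + dy) - f(x, y)) / dy
```

(Shading pixels in 2x2 blocks conveniently makes such neighbouring values available in hardware, which is presumably how any real implementation would do it.)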


Multisampling basically does extra samples of the edges and intersections of polygons, because that's where 99% of the edges are, and they are very simple and cheap to look for (just some interpolation for the subsamples and Z-accesses are needed). It is a very good idea. However, we are starting to see edges within polygons via the methods mentioned above. You cannot avoid them without extreme additional effort, which doesn't take care of every situation anyway. I am also willing to bet that we will never see a game developer fuss with all of these complex algorithms just to avoid aliasing, especially if they have drawbacks and the supersample solution is still there when needed.
 
I think everyone can agree that for most normal circumstances Multisampling with good texture filtering should be good enough. However there are those few circumstances where the quality of texture filtering will just not help. In those cases you'd want to be using Supersampling.

A simple solution would be to have hardware with conditional Multisampling/Supersampling. It really should be something controlled by the app developer. For example in OpenGL it would be controlled by a hint.

For Multisampling you'd do glHint(GL_FSAA_HINT, GL_FASTEST)
and
For Supersampling you'd do glHint(GL_FSAA_HINT, GL_NICEST)

In cases where FSAA is being forced by the driver, you could attempt to automatically detect which method to use. If alpha testing is enabled you'd use Supersampling, if doing dependent texture reads you might want it, and if doing conditionals in the pixel shader you'd also possibly want it.
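A sketch of such a driver-side heuristic (hypothetical flag names, just mirroring the rules above):

```python
def pick_fsaa_mode(alpha_test, dependent_reads, shader_conditionals):
    # Fall back to supersampling only for surfaces that multisampling
    # cannot antialias; everything else gets the cheaper mode.
    if alpha_test or dependent_reads or shader_conditionals:
        return "SUPERSAMPLE"
    return "MULTISAMPLE"
```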

Now, there already were some rumours going round a while back that ATI might even be doing something like that since they claimed that R300 wouldn't have aliasing with Alpha tests. If ATI can do that, then I would think that exposing it to developers would be a nice idea.

-Colourless
 
Mintmaster said:
You have to remember what we are talking about here - replacing a CND or CMP instruction with a smoothstep function. These instructions conditionally choose either argument. There is no blend between them, its just one or the other. This causes aliasing because they are discontinuous functions. When talking about branching, it could be similar to the CND/CMP situation, or it could be even more complex due to multiple branches (i.e. if ... else if .... else if .... else).

You have to remember that the NV30 hardware will not actually branch in the pixel shader. Instead, it executes all branches, with the final result chosen at the end. Additionally, since all dynamic branches must be based on mathematical data, it should be possible to easily calculate a smooth step function, at no additional performance hit, by simply blending instead of switching between the different textures or rendering paths.

Shouldn't it be possible to use the texture filtering variables (At least two of which are exposed by DDX, DDY) to calculate the smooth function?

What I think is kind of funny here, is that you're essentially talking about the same thing as the alpha test/blend issue.

It is a very good idea. However, we are starting to see edges within polygons via the methods mentioned above. You cannot avoid them without extreme additional effort, which doesn't take care of every situation anyway.

It doesn't take "extreme additional effort." I implemented the alpha blend in UT's OpenGL renderer with just a couple of hours of work (mostly just had a few problems debugging a couple of stupid mistakes related to the management of the many different blending algorithms). It's just the replacement of a compare with a blend. You can remove most aliasing in this fashion.

For example, with the alpha blend, there won't be any aliasing beyond that inherent within the texture filtering. I really do not see why any similar application of a smooth step function would need to be different. If those situations do arise, it doesn't seem to me to be much of a problem to just do supersampling in the pixel shader for that surface.
 
Btw, one additional little thing. It is possible to do anti-aliasing on an alpha-tested surface without resorting to supersampling or even an alpha blend.

Here's how:
1. Compute different alpha values for each MSAA sample based on the same MIP maps used for texture filtering.

2. Do an alpha test on each alpha value computed.

3. Only write the pixel sub-samples for those alpha tests that succeed.

The main benefit is that if the alpha testing hardware is only 8 bits in accuracy (I don't really see a need for more), then such hardware only takes a very tiny amount of die space compared to the overall rasterizer.
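A sketch of steps 1-3 above (hypothetical names; `sample_alpha` stands in for whatever filtered alpha value the hardware would compute at each subsample position):

```python
def alpha_test_coverage(sample_alpha, subsample_offsets, alpha_ref):
    # One alpha value per MSAA subsample, one alpha test per value;
    # only the passing subsamples get written (a per-pixel coverage mask).
    return [sample_alpha(ox, oy) >= alpha_ref for (ox, oy) in subsample_offsets]

# 4x ordered-grid subsample positions (offsets from the pixel centre):
offsets_4x = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
```

A pixel straddling the alpha edge then ends up partially covered instead of all-or-nothing, which is what removes the stair-stepping.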
 
Chalnoth said:
Here's how:
1. Compute different alpha values for each MSAA sample based on the same MIP maps used for texture filtering.

2. Do an alpha test on each alpha value computed.

3. Only write the pixel sub-samples for those alpha tests that succeed.

Um, that basically is supersampling. Instead of supersampling the entire texture you are only supersampling the alpha values. It could be argued that the extra transistors to do that are a waste. Adding extra scalar pipelines would be a waste. They would need to be able to do exactly the same operations as the normal pipeline too. It would be a real problem when using pixel shaders.
 
Chalnoth said:
Mintmaster said:
You have to remember what we are talking about here - replacing a CND or CMP instruction with a smoothstep function. These instructions conditionally choose either argument. There is no blend between them, it's just one or the other. This causes aliasing because they are discontinuous functions. When talking about branching, it could be similar to the CND/CMP situation, or it could be even more complex due to multiple branches (i.e. if ... else if .... else if .... else).

You have to remember that the NV30 hardware will not actually branch in the pixel shader. Instead, it executes all branches, with the final result chosen at the end. Additionally, since all dynamic branches must be based on mathematical data, it should be possible to easily calculate a smooth step function, at no additional performance hit, by simply blending instead of switching between the different textures or rendering paths.

NV30 doesn't actually branch? Well then that's exactly the same as CND/CMP. These instructions just choose the output. In this case, I don't see why you need mathematical data as input to the branch condition, since CND/CMP don't have this restriction. Still, I think future hardware will branch, but I'm not completely sure about it.

Again, the problem is determining how to blend between the two options. That's what you need a dynamic smoothstep function for. Yet again I am telling you that this is extremely hard, because unless you get the slope of the smoothstep function exactly right on a per-pixel basis, then you will either get aliasing (too steep slope) or blurring (too shallow slope).

Shouldn't it be possible to use the texture filtering variables (At least two of which are exposed by DDX, DDY) to calculate the smooth function?

What I think is kind of funny here, is that you're essentially talking about the same thing as the alpha test/blend issue.

Yes, this is the same issue. However, an alpha test is based only on the alpha component of a texture sample. It is the simplest example of using a compare function. CND and CMP can be based on anything. One example is a per-pixel dot product. You can choose different shading properties based on an angle this way. There are plenty of other options.

The smoothstep function needs to know how this function that is used in the input of CND/CMP varies across the screen. Specifically, the derivative it needs to know is the rate of change of this input function with respect to the x and y axes of the screen. Then it can use the value of this function (rather than just the yes/no compare result) to blend the two options if it is very close to the compare constant. When you do something like a dot product of texture inputs involved in matrix multiplies, you can see that this derivative is very complex, not precomputable, and can only be approximated by taking neighboring samples and finding the slope.

So now you've already taken multiple samples of the compare function to get derivatives, and now you have to do more math to figure out the approximate slope. Then you have to figure out the blend factor, and finally you can blend between the options of the CND/CMP. Even then, this only works when the slope of the compare function does not vary much over the whole pixel footprint, or aliasing will once again ensue.
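Chaining those steps together in a sketch (Python; `f` is the compare input and `c` the compare constant — the exact blend formula here is my own assumption):

```python
import math

def antialiased_cnd(f, x, y, c, a, b):
    # The whole chain: extra samples of f to approximate its screen-space
    # slope, a blend width from that slope, then a blend instead of a switch.
    # Only valid while the slope is roughly constant over the pixel footprint.
    centre = f(x, y)
    slope = math.hypot(f(x + 1, y) - centre, f(x, y + 1) - centre)
    if slope < 1e-9:
        return a if centre >= c else b   # flat input: a plain compare is safe
    t = min(max(0.5 + (centre - c) / slope, 0.0), 1.0)
    return a * t + b * (1.0 - t)
```

Note the cost: two extra evaluations of `f` per pixel before any blending happens, which is why this may not save anything over simply supersampling the surface.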

Way too much effort, doesn't work all the time, and it probably won't save any time over supersampling, since you've already done some supersampling of the functions determining the outcome of branching/compare.

It is a very good idea. However, we are starting to see edges within polygons via the methods mentioned above. You cannot avoid them without extreme additional effort, which doesn't take care of every situation anyway.

It doesn't take "extreme additional effort." I implemented the alpha blend in UT's OpenGL renderer with just a couple of hours of work (mostly just had a few problems debugging a couple of stupid mistakes related to the management of the many different blending algorithms). It's just the replacement of a compare with a blend. You can remove most aliasing in this fashion.

You are going back to alpha blending again. I told you that alpha blending is an option if you are willing to live with the drawbacks. That was the first part of my last post. The extreme effort is in implementing a dynamic smoothstep function.

For example, with the alpha blend, there won't be any aliasing beyond that inherent within the texture filtering. I really do not see why any similar application of a smooth step function would need to be different. If those situations do arise, it doesn't seem to me to be much of a problem to just do supersampling in the pixel shader for that surface.

Yes, supersample that surface only. I cannot agree with you more, and this is what I have been saying from the beginning. Supersampling is necessary in some cases only, and on a per surface basis it will be a very useful supplement to multisampling. That is all I want - you to say that there are some circumstances where supersampling is useful. You said "There's still no reason for SSAA", and that was not correct, and is what I am illustrating for you.

Colourless said:
However there are those few circumstances where the quality of texture filtering will just not help. In those cases you'd want to be using Supersampling.

Thank you! You must have been actually understanding my point before arguing away.
 