Unimpressed by Antialiasing

The whole argument that AA is merely a gimmick is just flawed. Everything in 3D is a gimmick. Every new feature is designed to improve image quality in a gimmicky way. Without gimmicks, you have no 3D. The better the gimmicks, the better the image. Rendering at a higher resolution isn't even a gimmick, since it does not improve per-pixel quality.

When CGI meets real-time 3D, anti-aliasing is going to be a vital part of it, because it deals with the more subtle parts of human perception (spatial/temporal AA), and bumping the resolution up to 1600x1200 just won't cut it.
 
Chalnoth said:
If it "reduces performance to SSAA levels anyway" then adding SSAA will make the performance unplayable.


Wow, I guess he went way over your head there. :) Try reading his post again.
The methods are meant to replace each other, not to be used in conjunction.

Anyway, I assume that NV30 won't have THAT much improved texture filtering, so I still want RGSSAA on my R300/NV30.
And I personally still prefer SSAA over MSAA with the current generation of cards, at least from a quality point of view.

So in the past and present I still think SSAA is better. And from what I can gather, the Radeon 9700 at least won't change that situation, and I'm fairly confident that the NV30 won't change it either.
The argument is pretty silly when it comes down to "if we had this thing that doesn't exist, MSAA would be the best option".
Let us know when the magical texture filtering that touches every part of the screen arrives and I'll be all ears (or rather all eyes).


And still I haven't heard about the areas that normally go "untouched" by today's filtering. (This is the fourth and final time I'm asking; maybe you're just ignoring me?)
 
Chalnoth said:
Mintmaster said:
First of all, the main reason I didn't like MSAA on GF4 was because it had nearly the same performance hit as SSAA,

Try looking again. The MSAA on the GF4 has a much lower performance hit than SSAA, usually around 50% for 4x FSAA. Additionally, the performance delta is increased when you enable anisotropic filtering.

50% for 4xMSAA on the GF4? Yeah, sure, when the no-FSAA score is CPU or TCL limited. You have to make a fair comparison.

Q3 scores at 1600x1200: 138.8 fps without FSAA, 41.5 fps with FSAA.
Source: Tom's Hardware's Parhelia review

That's a 70% performance hit. If you did SSAA with the same RAMDAC blending that the GeForce4 has, that would be a 75% hit (i.e. 1/4 performance). The Radeon 8500 has a serious performance hit with SSAA because HyperZ gets disabled, so you can't really compare its scores with the GF4's MSAA.
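To put rough numbers on that (just a back-of-the-envelope check in Python, using the fps figures quoted above and the idealized 1/4-fillrate assumption for 4x SSAA):

# Sanity check of the FSAA performance-hit figures quoted above.
no_fsaa = 138.8   # Q3 at 1600x1200, no FSAA (Tom's Hardware Parhelia review)
msaa_4x = 41.5    # same test with 4x FSAA enabled

hit_msaa = 1.0 - msaa_4x / no_fsaa
print(f"4x MSAA hit: {hit_msaa:.0%}")            # ~70%

# Idealized 4x SSAA: four times the pixels rendered, so roughly 1/4 the frame rate.
ssaa_4x = no_fsaa / 4.0
hit_ssaa = 1.0 - ssaa_4x / no_fsaa
print(f"Idealized 4x SSAA hit: {hit_ssaa:.0%}")  # 75%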

However, your point about anisotropic filtering is completely valid.

First, you have alpha compare tests. Now, Chalnoth, I have heard you repeatedly say you can just use alpha blending instead. Do you know what grass looks like with alpha blending when you're up close?

Yes, I have. It looks better than when there is an alpha test, because at the edges of the alpha test, instead of having a relatively smooth (although blurry) border, you will see "rounded" edges of each texture pixel. And using larger textures isn't unreasonable. It's being done.

Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like. The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.

As for larger textures being reasonable, think a bit more practically. If you want grass not to get blurry when you get near it, texel spacing has to be about the same as the screen's pixel spacing. That means if you get close to, say, a bush at 1600x1200, you'll need roughly a 1024x1024 texture to get non-blurry edges with alpha blending. If you do this for every type of tree/bush/grass, that's a lot of space for bush textures. Not only does it consume memory, it will also slow down performance a lot, though not as much as SSAA.

3D graphics is the pursuit of reality, not the pursuit of your fondness for blurry things. And alpha tests are quite essential in representing things realistically.

Next you have pixel shaders with CMP/CND, which result in the same hard edges. Some papers have suggested the idea of a smooth-step function rather than the discontinuous CMP/CND, but that needs to have a dynamically changing slope according to how big the texture is relative to on screen pixels, or you get that same blur when close. You then wind up with a complex pixel shader that may reduce performance to SSAA levels anyway.

If it "reduces performance to SSAA levels anyway" then adding SSAA will make the performance unplayable.

Edit: Sorry, misunderstood the post. No, this won't reduce performance to "SSAA levels anyway." If all the calculation is done within the shader, there is less of a memory bandwidth hit, and no need to couple edge AA with texture AA. Additionally, if only part of the scene would benefit from this filtering, the performance would be significantly higher than with SSAA because only part of the scene would use the additional sampling.

Chalnoth, you were also whining about how the 9700 doesn't have true branching in the pixel shader. Well, if you were using dynamic branching, you'd have aliased edges everywhere with MSAA, except for certain situations. You could program around it, but again it's quite hard.

Again, it's still more efficient to just do it before pixel output. For example, it may be possible (don't know if it is on the NV30 or not) to go ahead and take multiple texture samples, effectively doing super-sampling before outputting the pixel.

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen. It just means supersample antialiasing, and so it can be done dynamically, or selectively. What you are describing is basically just a performance optimization for SSAA. If you are executing a shader at multiple points in the pixel, that's effectively supersampling. Multisampling in our context is copying a single pixel shader output value for each sub-sample, and only supersampling the Z buffer. It saves fillrate in this way, but doing multiple pixel shader runs as you imply doesn't save fillrate, so you're back to doing supersampling.
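To spell out the distinction being drawn here, a tiny Python sketch of the per-pixel work under this simplified model (the numbers are a cartoon, not measurements of any actual card):

def per_pixel_work(mode, samples=4):
    # Cartoon model: what gets done per covered pixel at 4x.
    if mode == "MSAA":
        shader_evals = 1        # one shader result, copied to the covered sub-samples
    elif mode == "SSAA":
        shader_evals = samples  # shader runs once per sub-sample
    else:
        raise ValueError(mode)
    z_tests      = samples      # both schemes test/update Z per sub-sample
    color_writes = samples      # both store a color value per sub-sample
    return {"shader": shader_evals, "z": z_tests, "color": color_writes}

for mode in ("MSAA", "SSAA"):
    print(mode, per_pixel_work(mode))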

I'm not saying multisampling is useless - in fact, it's a good idea that works most of the time. What I am saying is that there are situations where it's not sufficient, and has drawbacks. I'm also saying the GF4's implementation is not worth those drawbacks. However, it looks like the 9700 did it right.
 
Mintmaster said:
50% for 4xMSAA on the GF4? Yeah, sure, when the no-FSAA score is CPU or TCL limited. You have to make a fair comparison.

It depends on where the limits in the engine are. The more fillrate-limited, the less of a hit you'll get. The more memory bandwidth-limited, the more of a hit you'll get.

Huh? Rounded edges are generally good.

Apparently you haven't seen what I'm talking about. Imagine a diagonal texture that, unfiltered, would have a jagged edge. With an alpha test enabled, it will basically have a wavy-rounded edge instead of jagged edge. This is not desirable for situations where you'd rather have a smooth edge.

You can also make the alpha test give you square edges if you set the compare value right.

I don't see how, since the alpha value is linearly-interpolated.
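To make the geometry of this disagreement concrete, here's a toy Python sketch (purely illustrative, not anything either card does): bilinearly filter a binary alpha neighbourhood with three opaque texels and one transparent one, then apply the alpha test.

def bilerp(a00, a10, a01, a11, x, y):
    # Standard bilinear interpolation of the four surrounding texel alphas.
    top = a00 * (1 - x) + a10 * x
    bot = a01 * (1 - x) + a11 * x
    return top * (1 - y) + bot * y

# One texel cell: opaque everywhere except the bottom-right texel.
corners = dict(a00=1.0, a10=1.0, a01=1.0, a11=0.0)
ref = 0.5

for y in (0.6, 0.8, 1.0):
    row = ""
    for i in range(17):
        x = i / 16
        row += "#" if bilerp(x=x, y=y, **corners) >= ref else "."
    print(row)

# The pass/fail boundary is the curve x*y = ref, i.e. a rounded contour rather
# than an axis-aligned step; changing 'ref' moves the curve but doesn't square it off.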

As for larger textures being reasonable, think a bit more practically.

First of all, it's not realistic to actually get very close to grass. Additionally, alpha blending makes more sense simply because you usually want to fade the grass out in the distance. In a game like Serious Sam, how often do you actually duck within the grass in the middle of the game?

The greatest place for alpha blending is in textures whose details are often much smaller than pixel size. For example, a chain-link fence. If you attempt to make a chain-link fence using an alpha test, and make it fairly realistically sized, I guarantee you that you'll end up with massive aliasing artifacts at pretty much any resolution, FSAA or no (depending on distance, of course). If you instead use an alpha blend, it will look significantly superior.
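A crude Python illustration of the fence case (made-up numbers, just to show the difference in behaviour once the wires are thinner than a pixel):

# Once the wires are thinner than a pixel, mip filtering reduces the texture to
# a fractional coverage value; the two transparency modes treat it very differently.
avg_alpha = 0.2   # filtered alpha: wires cover ~20% of this pixel (made-up figure)

# Alpha test: the pixel is either entirely fence or entirely background, and
# which one you get flickers with tiny sub-pixel shifts -> aliasing/sparkle.
ref = 0.5
test_result = 1.0 if avg_alpha >= ref else 0.0

# Alpha blend: the pixel shows a stable 20% fence / 80% background mix.
fence_color, background = 1.0, 0.0
blend_result = avg_alpha * fence_color + (1.0 - avg_alpha) * background

print("alpha test :", test_result)    # 0.0 -> the fence drops out entirely
print("alpha blend:", blend_result)   # 0.2 -> a faint but stable fence remains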

That means if you get close to, say, a bush at 1600x1200, you'll need about a 1024x1024 resolution texture or so to get non-blurry edges with alpha blending. If you do this for every type of tree/bush/grass, that's a lot of space for bush textures.

So? 1024x1024 textures are coming. Quake3 used some 512x512 textures. UT had 1024x1024 compressed textures. How many years ago were those games released? Why is it so hard to believe that 1024x1024 textures will be used fairly commonly in upcoming game engines, at least with TC enabled?
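For reference, the storage cost of a single 1024x1024 texture works out roughly as follows (simple arithmetic; S3TC/DXT1 stores 4 bits per texel, DXT3/5 stores 8):

texels = 1024 * 1024
mip_factor = 4 / 3                      # a full mip chain adds about a third

for name, bits_per_texel in (("32-bit RGBA", 32), ("DXT1", 4), ("DXT3/5", 8)):
    total_bytes = texels * bits_per_texel / 8 * mip_factor
    print(f"{name:12s} ~ {total_bytes / 2**20:.2f} MB")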

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen. It just means supersample antialiasing, and so it can be done dynamically, or selectively. What you are describing is basically just a performance optimization for SSAA. If you are executing a shader at multiple points in the pixel, that's effectively supersampling.

Yes, it is a form of supersampling, but it's more related to texture filtering, as all processing is done before pixel output. What I was attempting to describe was a technique that would not require the memory bandwidth hit of supersampling. After all, current multisampling methods use quite a bit more memory bandwidth than is necessary. The crucial differences from the supersampling we see today are the ability to decouple edge AA from texture AA, and the memory bandwidth.
 
Chalnoth said:
Mintmaster said:
First of all, the main reason I didn't like MSAA on GF4 was because it had nearly the same performance hit as SSAA,

Try looking again. The MSAA on the GF4 has a much lower performance hit than SSAA, usually around 50% for 4x FSAA. Additionally, the performance delta is increased when you enable anisotropic filtering.

50% for 4xMSAA on the GF4? Yeah, sure, when the no-FSAA score is CPU or TCL limited. You have to make a fair comparison.

Q3 scores at 1600x1200: 138.8 fps without FSAA, 41.5 fps with FSAA.
Source: Tom's Hardware's Parhelia review

That's a 70% performance hit. If you did SSAA with the same RAMDAC blending that the GeForce4 has, that would be a 75% hit (i.e. 1/4 performance). The Radeon 8500 has a serious performance hit with SSAA because HyperZ gets disabled, so you can't really compare its scores with the GF4's MSAA.

However, your point about anisotropic filtering is completely valid.

First, you have alpha compare tests. Now, Chalnoth, I have heard you repeatedly say you can just use alpha blending instead. Do you know what grass looks like with alpha blending when you're up close?

Yes, I have. It looks better than when there is an alpha test, because at the edges of the alpha test, instead of having a relatively smooth (although blurry) border, you will see "rounded" edges of each texture pixel. And using larger textures isn't unreasonable. It's being done.

Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like. The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.

As for larger textures being reasonable, think a bit more practically. If you want grass not to get blurry when you get near it, texel spacing has to be about the same as the screen's pixel spacing. That means if you get close to, say, a bush at 1600x1200, you'll need roughly a 1024x1024 texture to get non-blurry edges with alpha blending. If you do this for every type of tree/bush/grass, that's a lot of space for bush textures. Not only does it consume memory, it will also slow down performance a lot, though not as much as SSAA. The memory footprint is the main concern.

3D graphics is the pursuit of reality, not the pursuit of your fondness for blurry things. And alpha tests are quite essential in representing things realistically and cheaply. The only substitute for the same effect is a bunch of polygons, which is very expensive performance-wise.

Next you have pixel shaders with CMP/CND, which result in the same hard edges. Some papers have suggested the idea of a smooth-step function rather than the discontinuous CMP/CND, but that needs to have a dynamically changing slope according to how big the texture is relative to on screen pixels, or you get that same blur when close. You then wind up with a complex pixel shader that may reduce performance to SSAA levels anyway.

If it "reduces performance to SSAA levels anyway" then adding SSAA will make the performance unplayable.

Edit: Sorry, misunderstood the post. No, this won't reduce performance to "SSAA levels anyway." If all the calculation is done within the shader, there is less of a memory bandwidth hit, and no need to couple edge AA with texture AA. Additionally, if only part of the scene would benefit from this filtering, the performance would be significantly higher than with SSAA because only part of the scene would use the additional sampling.

A dynamic smoothstep function is not easy to implement at all - I don't even think you know what I mean. You would have to factor in the slope of the polygon, the distance from the camera, the resolution, and the gradients of the functions being compared with respect to the screen (not the same as the texture coordinate gradients used for AF). This is basically next to impossible, and there is no way in hell developers will spend so much time for each shader that they have with CMP or CND. In the rare cases it is possible, you'll need lots of computational power, requiring extra cycles.
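For the sake of illustration, here is roughly the per-pixel bookkeeping such a replacement would need, written as a Python stand-in rather than ps.1.x code (there is no instruction for these screen-space derivatives, which is the whole problem):

def aa_compare(value, ref, ddx, ddy):
    # Soft version of CMP(value - ref): fades over roughly one pixel.
    # value    : the quantity being compared at this pixel
    # ddx, ddy : how much 'value' changes to the neighbouring pixel in x and y
    #            (these fold in polygon slope, distance and resolution)
    width = max(abs(ddx), abs(ddy))
    if width == 0.0:
        return 1.0 if value >= ref else 0.0     # degenerate case: plain compare
    t = (value - ref) / (2.0 * width) + 0.5     # map +-1 pixel of change onto 0..1
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)              # smoothstep

# Far away the compared value sweeps quickly across a pixel -> wide fade;
# up close it changes slowly -> essentially the original hard edge again.
print(aa_compare(0.52, 0.5, ddx=0.20, ddy=0.05))    # distant: partial coverage
print(aa_compare(0.52, 0.5, ddx=0.001, ddy=0.001))  # close-up: ~1.0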

If you are using CND or CMP, MSAA can't produce the same image as SSAA. Period. Replacing the CND or CMP with a non-aliasing function requires way too much effort, isn't robust, and slows things down.

For example, think of using the Mandelbrot set (to a limited number of iterations) as a texture. You have to supersample it, plain and simple. No texture tricks will help you, because there are no textures.
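A minimal Python version of that example, just to show why nothing short of extra samples helps:

def mandel(cx, cy, max_iter=256):
    # Bounded-iteration Mandelbrot test: 1.0 inside the set, 0.0 if it escapes.
    x = y = 0.0
    for _ in range(max_iter):
        if x * x + y * y > 4.0:
            return 0.0
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return 1.0

def pixel(cx, cy, size, sub=1):
    # Evaluate the "procedural texture" at sub x sub points inside the pixel
    # and average; sub=1 is the plain single-sample (aliasing) case.
    total = 0.0
    for j in range(sub):
        for i in range(sub):
            sx = cx + (i + 0.5) / sub * size - size / 2.0
            sy = cy + (j + 0.5) / sub * size - size / 2.0
            total += mandel(sx, sy)
    return total / (sub * sub)

# A pixel straddling the set boundary: a single sample gives an all-or-nothing
# answer, while 4x4 sub-samples give a coverage estimate you can actually filter.
print(pixel(-0.75, 0.1, size=0.05, sub=1))
print(pixel(-0.75, 0.1, size=0.05, sub=4))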

Chalnoth, you were also whining about how the 9700 doesn't have true branching in the pixel shader. Well, if you were using dynamic branching, you'd have aliased edges everywhere with MSAA, except for certain situations. You could program around it, but again it's quite hard.

Again, it's still more efficient to just do it before pixel output. For example, it may be possible (don't know if it is on the NV30 or not) to go ahead and take multiple texture samples, effectively doing super-sampling before outputting the pixel.

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen. It just means supersample antialiasing, and so it can be done dynamically/selectively. What you are describing is basically just a performance optimization for SSAA. If you are executing a shader at multiple points in the pixel, that's effectively supersampling. Multisampling in our context is copying a single pixel shader output value for each sub-sample, and only supersampling the Z buffer. It saves fillrate in this way, but doing multiple pixel shader runs as you imply doesn't save fillrate, so you're back to doing supersampling.

I'm not saying multisampling is useless - in fact, it's a very good idea that works most of the time. What I am saying is that there are situations where it's not sufficient, and has drawbacks. I'm also saying the GF4's implementation is not worth those drawbacks. However, it looks like the 9700 did it right, or very close to right. NV30 may be even better (assuming those 4xFSAA performance estimates on Reactor Critical are wrong).
 
Chalnoth said:
Mintmaster said:
First of all, the main reason I didn't like MSAA on GF4 was because it had nearly the same performance hit as SSAA,

Try looking again. The MSAA on the GF4 has a much lower performance hit than SSAA, usually around 50% for 4x FSAA. Additionally, the performance delta is increased when you enable anisotropic filtering.

50% for 4xMSAA on the GF4? Yeah, sure, when the no-FSAA score is CPU or TCL limited. You have to make a fair comparison.

Q3 scores at 1600x1200: 138.8 fps without FSAA, 41.5 fps with FSAA.
Source: Tom's Hardware's Parhelia review

That's a 70% performance hit. If you did SSAA with the same RAMDAC blending that the GeForce4 has, that would be a 75% hit (i.e. 1/4 performance). The Radeon 8500 has a serious performance hit with SSAA because HyperZ gets disabled, so you can't really compare its scores with the GF4's MSAA.

However, your point about anisotropic filtering is completely valid.

First, you have alpha compare tests. Now, Chalnoth, I have heard you repeatedly say you can just use alpha blending instead. Do you know what grass looks like with alpha blending when you're up close?

Yes, I have. It looks better than when there is an alpha test, because at the edges of the alpha test, instead of having a relatively smooth (although blurry) border, you will see "rounded" edges of each texture pixel. And using larger textures isn't unreasonable. It's being done.

Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like (if you alpha-blend the same alpha-tested texture, you'd get a blurry step-like border, not smooth). The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.

As for larger textures being reasonable, think a bit more practically. If you want alpha-blended grass not to get blurry when you get near it, texel spacing has to be about the same as the screen's pixel spacing. That means if you get close to a bush that fills the screen at 1600x1200, you'll need roughly a 1024x1024 texture for the bush branches to get non-blurry edges with alpha blending. If you're allowed to walk through the bush (or a field of tall grass) and a leaf on a branch fills a large part of the screen, you'll need 16Kx16K textures or more for each bush branch or stalk of grass. If you do this for every type of tree/bush/grass, that's a lot of space just for bush textures.
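The arithmetic behind those figures, with the assumptions spelled out (the leaf-to-texture ratio below is a guess purely for illustration):

screen_w = 1600                    # horizontal resolution

# Bush roughly filling the screen: one texel per screen pixel means the alpha
# texture needs about screen_w texels across, i.e. a 1024-2048 wide texture.
print("bush texture  ~", screen_w, "texels across")

# Walking through the bush: say one leaf fills about half the screen width,
# but that leaf only covers ~1/20 of the branch texture's width (assumed ratio).
leaf_screen_px    = screen_w // 2
leaf_tex_fraction = 1 / 20
print("branch texture ~", int(leaf_screen_px / leaf_tex_fraction), "texels across")   # ~16000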

A good example of this is the nature test in 3DMark2001. Each swaying branch in the trees has one fairly low-res alpha-tested texture that covers many leaves. They look like hundreds of polygons with this effect. The same goes for the grass, because the alpha-tested texture has many blades of grass on it. No matter how close the camera gets to the leaves, they don't get fuzzy edges. If you use alpha blending, you will either get blurry leaves when they get close (or even at mid-distance), or you'll need huge textures for each type of branch/grass there is, and you can see that there are a lot (it's not the same texture used everywhere). Even with the huge texture requirements of the CodeCreatures benchmark, grass still gets blurry when close.

3D graphics is generally the pursuit of reality, not the pursuit of your fondness for blurry things. Alpha tests are quite essential in representing things realistically and cheaply. The only substitute for the same effect is a bunch of polygons, which is very expensive performance-wise.

Next you have pixel shaders with CMP/CND, which result in the same hard edges. Some papers have suggested the idea of a smooth-step function rather than the discontinuous CMP/CND, but that needs to have a dynamically changing slope according to how big the texture is relative to on screen pixels, or you get that same blur when close. You then wind up with a complex pixel shader that may reduce performance to SSAA levels anyway.

If it "reduces performance to SSAA levels anyway" then adding SSAA will make the performance unplayable.

Edit: Sorry, misunderstood the post. No, this won't reduce performance to "SSAA levels anyway." If all the calculation is done within the shader, there is less of a memory bandwidth hit, and no need to couple edge AA with texture AA. Additionally, if only part of the scene would benefit from this filtering, the performance would be significantly higher than with SSAA because only part of the scene would use the additional sampling.

A dynamic smoothstep function is not easy to implement at all - I don't even think you know what I mean. You would have to factor in the slope of the polygon, the distance from the camera, the resolution, and the gradients of the functions being compared with respect to the screen. This is basically next to impossible, and there is no way in hell developers will spend so much time for every shader with CMP or CND. In the rare cases it is possible, you'll need lots of computational power, requiring extra cycles.

If you are using CND or CMP, MSAA can't produce the same image as SSAA. Period. Replacing the CND or CMP with a non-aliasing function requires way too much effort, isn't robust, and slows things down due to computational requirements.

Chalnoth, you were also whining about how the 9700 doesn't have true branching in the pixel shader. Well, if you were using dynamic branching, you'd have aliased edges everywhere with MSAA, except for certain situations. You could program around it, but again it's quite hard.

Again, it's still more efficient to just do it before pixel output. For example, it may be possible (don't know if it is on the NV30 or not) to go ahead and take multiple texture samples, effectively doing super-sampling before outputting the pixel.

Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen. It just means supersample antialiasing, and so it can be done dynamically/selectively. If you are executing a shader at multiple points in the pixel, that's supersampling. Multisampling in our context is using the same single pixel shader output value for each sub-sample, and only taking extra samples from the Z buffer. It saves fillrate in this way, but doing multiple pixel shader runs as you imply doesn't save fillrate, so you're back to doing supersampling.

As for "doing supersampling before outputting the pixel", I have no idea what you're talking about. Even multisampling requires a full size frame-buffer, although you can compress it better through various techniques. Complex pixel shaders with branching would rarely be bandwidth limited anyway, because they take so many cycles to complete.


I forgot about one other important situation: dependent texture reads. Using bumped cube-mapping can cause a lot of aliasing, especially since you can't filter normal maps without creating an incorrect, hacked image. You can use the 4 reflection rays from a 2x2 block to select a mip-map from the cube texture, and maybe even do aniso with the reflection rays, but it still isn't sufficient, since adjacent 2x2 blocks have no interaction with each other in the mip-map selection. Other pixel shaders with different uses of dependent texture reads can't be solved by this. The only thing you can do is supersample.
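A sketch of that per-quad workaround in Python (a stand-in, not something exposed by hardware of this generation), which also shows why it only sees variation inside its own 2x2 block:

import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def mip_from_quad(rays, cube_face_size=256):
    # Pick a cube-map mip level from how far apart the four reflection rays
    # of one 2x2 pixel block point.  Variation *between* blocks is invisible here.
    rays = [normalize(r) for r in rays]
    max_angle = 0.0
    for i in range(4):
        for j in range(i + 1, 4):
            dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(rays[i], rays[j]))))
            max_angle = max(max_angle, math.acos(dot))
    texel_angle = (math.pi / 2) / cube_face_size     # rough angular size of one texel
    footprint = max(max_angle / texel_angle, 1.0)    # footprint in texels
    return math.log2(footprint)                      # mip level ~ log2(footprint)

# Bumpy surface: the rays diverge a lot inside one quad -> blurry (high) mip.
print(mip_from_quad([(1, 0.3, 0), (1, -0.3, 0.1), (0.9, 0.2, -0.3), (1, 0, 0.4)]))
# Nearly flat surface: rays almost parallel -> sharpest (0) mip.
print(mip_from_quad([(1, 0.002, 0), (1, -0.002, 0), (1, 0, 0.002), (1, 0, -0.002)]))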

I'm not saying multisampling is useless - in fact, it's a very good idea that works most of the time. What I am saying is that there are situations where it's not sufficient, and has drawbacks. I'm also saying the GF4's implementation is not worth those drawbacks. However, it looks like the 9700 did it right, or very close to right. NV30 may be even better (assuming those 4xFSAA performance estimates on Reactor Critical are wrong).
 