Chalnoth said:
Mintmaster said:
First of all, the main reason I didn't like MSAA on GF4 was because it had nearly the same performance hit as SSAA,
Try looking again. The MSAA on the GF4 has a much lower performance hit than SSAA, usually around 50% for 4x FSAA. Additionally, the performance delta is increased when you enable anisotropic filtering.
50% for 4xMSAA on the GF4? Yeah, sure, when the no-FSAA score is CPU or TCL limited. You have to make a fair comparison.
Q3 scores at 1600x1200: 138.8 fps without FSAA, 41.5 fps with FSAA.
Source: Tom's Hardware's Parhelia review
That's a 70% performance hit. If you did SSAA with the same RAMDAC blending that the GeForce4 has, that would be a 75% hit (i.e. 1/4 performance). The Radeon 8500 takes a serious performance hit with SSAA because HyperZ gets disabled, so you can't really compare its scores with the GF4's MSAA.
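The arithmetic behind those percentages is easy to check (a quick sketch using the Q3 figures quoted above; the 1/4 figure for SSAA is the fill-rate-limited upper bound, not a measurement):

```python
# Performance hit = 1 - (fps with AA / fps without AA)
fps_no_aa = 138.8    # Q3 at 1600x1200, no FSAA (Tom's Hardware figures above)
fps_4x_msaa = 41.5   # same test with 4x FSAA enabled

msaa_hit = 1 - fps_4x_msaa / fps_no_aa
print(f"4x MSAA hit: {msaa_hit:.0%}")   # ~70%

# Naive 4x SSAA renders 4x the pixels, so a purely fill-limited
# scene runs at best at 1/4 the frame rate: a 75% hit.
ssaa_hit = 1 - 1 / 4
print(f"4x SSAA fill-limited hit: {ssaa_hit:.0%}")  # 75%
```

This is why the comparison only looks favorable to MSAA when the no-FSAA score is CPU or TCL limited: the denominator is inflated, not the AA made cheaper.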
However, your point about anisotropic filtering is completely valid.
First, you have alpha compare tests. Now, Chalnoth, I have heard you repeatedly say you can just use alpha blending instead. Do you know what grass looks like with alpha blending when you're up close?
Yes, I have. It looks better than when there is an alpha test, because at the edges of the alpha test, instead of having a relatively smooth (although blurry) border, you will see "rounded" edges of each texture pixel. And using larger textures isn't unreasonable. It's being done.
Huh? Rounded edges are generally good. Real leaves are round and opaque at the edges, not blurry and step-like (if you alpha-blend the same alpha-tested texture, you'd get a blurry step-like border, not smooth). The same goes for grass. You can also make the alpha test give you square edges if you set the compare value right.
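The point about the compare value can be shown with a toy calculation (a sketch, not any particular API: `bilerp_alpha` models the bilinearly filtered alpha across one texel boundary, and the test is a greater-or-equal compare like GL's alpha test):

```python
def bilerp_alpha(a0, a1, t):
    """Filtered alpha between an opaque texel (a0=1) and a transparent
    texel (a1=0) as you move across the boundary, t in [0, 1]."""
    return a0 * (1 - t) + a1 * t

def alpha_test(alpha, ref=0.5):
    """Alpha compare test: the pixel survives iff alpha >= ref."""
    return alpha >= ref

# With ref = 0.5 the hard edge falls exactly halfway between the
# opaque and transparent texels, tracking the texture's intended
# silhouette instead of bulging around each texel.
edge = [alpha_test(bilerp_alpha(1.0, 0.0, t))
        for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(edge)  # [True, True, True, False, False]
```

Push the reference value toward 0 or 1 and the edge slides toward the transparent or opaque texel, which is what produces the "rounded texel" look when it's set carelessly.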
As for larger textures being reasonable, think a bit more practically. If you want alpha-blended grass not to get blurry when you get near it, texel spacing has to be about the same as screen pixel spacing. That means if you get close to a bush that fills the screen at 1600x1200, you'll need roughly a 1024x1024 texture for the bush branches to get non-blurry edges with alpha blending. If you're allowed to walk through the bush (or a field with tall grass) and a single leaf on the branch fills a large part of the screen, you'll need 16Kx16K textures or more for each bush branch or stalk of grass. Do this for every type of tree, bush, and grass, and that's a huge amount of storage just for foliage textures.
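The texture-size arithmetic above can be made concrete (a sketch; the 10x close-up factor is an illustrative assumption for "a leaf fills a large part of the screen"):

```python
screen_width = 1600  # horizontal resolution from the example above

# Case 1: the whole branch texture fills the screen and you get no
# closer. Texels across must ~match pixels across: ~1600 texels,
# i.e. a 1024x1024 or 2048x2048 texture.
texels_bush = screen_width * 1
print(texels_bush)   # 1600

# Case 2: you can walk into the bush until one leaf, covering ~1/10
# of the texture, fills the screen. The whole texture then needs 10x
# the texel density: 16000 texels across, i.e. a 16Kx16K texture.
texels_close = screen_width * 10
print(texels_close)  # 16000
```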
A good example of this is the nature test in 3DMark2001. Each swaying branch in the trees has one fairly low-res alpha-tested texture that covers many leaves, yet it looks like hundreds of polygons. The same goes for the grass, because each alpha-tested texture holds many blades of grass. No matter how close the camera gets to the leaves, they don't get fuzzy edges. With alpha blending, you would either get blurry leaves up close (or even at mid-distance), or you'd need huge textures for each type of branch and grass, and you can see there are a lot of them (it's not the same texture used everywhere). Even with the huge texture requirements of the CodeCreatures benchmark, grass still gets blurry up close.
3D graphics is generally the pursuit of reality, not the pursuit of your fondness for blurry things. Alpha tests are quite essential in representing things realistically and cheaply. The only substitute for the same effect is a bunch of polygons, which is very expensive performance-wise.
Next you have pixel shaders with CMP/CND, which produce the same hard edges. Some papers have suggested a smooth-step function rather than the discontinuous CMP/CND, but its slope needs to change dynamically according to how large the texture appears relative to on-screen pixels, or you get that same blur up close. You then wind up with a complex pixel shader that may reduce performance to SSAA levels anyway.
If it "reduces performance to SSAA levels anyway" then adding SSAA will make the performance unplayable.
Edit: Sorry, misunderstood the post. No, this won't reduce performance to "SSAA levels anyway." If all the calculation is done within the shader, there is less of a memory bandwidth hit, and no need to couple edge AA with texture AA. Additionally, if only part of the scene would benefit from this filtering, the performance would be significantly higher than with SSAA because only part of the scene would use the additional sampling.
A dynamic smoothstep function is not easy to implement at all - I don't even think you know what I mean. You would have to factor in the slope of the polygon, the distance from the camera, the resolution, and the gradients of the functions being compared with respect to the screen. This is basically next to impossible, and there is no way in hell developers will spend so much time for every shader with CMP or CND. In the rare cases it is possible, you'll need lots of computational power, requiring extra cycles.
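To make the difficulty concrete, here is the idea in scalar form (a sketch; in a real shader the `grad` input would have to come from screen-space derivatives that fold in polygon slope, distance, and resolution, which is exactly the part that's hard to get right per shader):

```python
def cmp_select(x, a, b, threshold=0.0):
    """Discontinuous CMP: hard switch between a and b -> aliased edges."""
    return a if x >= threshold else b

def smoothstep(e0, e1, x):
    """Standard cubic smoothstep, clamped to [0, 1]."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def smooth_select(x, a, b, threshold, grad):
    """Antialiased replacement: blend over ~one pixel's worth of change
    in x. `grad` is |dx per pixel|, the screen-space gradient of x; it
    varies with distance, polygon slope, and resolution, and must be
    computed per pixel for the filter width to stay correct."""
    w = max(grad, 1e-6)  # filter half-width ~ one pixel footprint
    t = smoothstep(threshold - w, threshold + w, x)
    return a * t + b * (1.0 - t)
```

Far from the threshold `smooth_select` matches `cmp_select` exactly; near it, it blends over one pixel. Get `grad` wrong and you either alias anyway (too small) or blur up close (too large), which is the failure mode described above.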
If you are using CND or CMP, MSAA can't produce the same image as SSAA. Period. Replacing the CND or CMP with a non-aliasing function requires way too much effort, isn't robust, and slows things down due to computational requirements.
Chalnoth, you were also whining about how the 9700 doesn't have true branching in the pixel shader. Well, if you were using dynamic branching, you'd have aliased edges everywhere with MSAA, except for certain situations. You could program around it, but again it's quite hard.
Again, it's still more efficient to just do it before pixel output. For example, it may be possible (don't know if it is on the NV30 or not) to go ahead and take multiple texture samples, effectively doing super-sampling before outputting the pixel.
Now you are basically making arguments for SSAA. SSAA doesn't necessarily mean the entire screen. It just means supersample antialiasing, and so it can be done dynamically/selectively. If you are executing a shader at multiple points in the pixel, that's supersampling. Multisampling in our context means using the same single pixel shader output value for each sub-sample, and only taking extra samples for the Z buffer. That's how it saves fillrate; doing multiple pixel shader runs as you imply doesn't save fillrate, so you're back to doing supersampling.
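The distinction comes down to how many times the shader runs per pixel, which a toy model makes explicit (a sketch; `shade` stands in for an arbitrary pixel shader and is purely hypothetical):

```python
calls = {"shader": 0}

def shade(x, y):
    """Stand-in for an arbitrary pixel shader (hypothetical toy)."""
    calls["shader"] += 1
    return ((x * 3 + y * 7) % 10) / 10.0

SUBSAMPLES = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

def msaa_pixel(x, y, covered):
    """Multisampling: ONE shader run per pixel; the same color goes to
    every covered sub-sample, and only Z/coverage is per-sample. Any
    discontinuity inside the shader (CMP/CND, alpha test) is unfiltered."""
    color = shade(x + 0.5, y + 0.5)
    hits = [color for sx, sy in SUBSAMPLES if covered(x + sx, y + sy)]
    return sum(hits) / len(SUBSAMPLES)

def ssaa_pixel(x, y, covered):
    """Supersampling: the shader runs at every covered sub-sample, so
    shader-internal edges get filtered too, at up to 4x the shading cost."""
    hits = [shade(x + sx, y + sy)
            for sx, sy in SUBSAMPLES if covered(x + sx, y + sy)]
    return sum(hits) / len(SUBSAMPLES)

full = lambda x, y: True        # polygon fully covers the pixel
msaa_pixel(0, 0, full)
n_msaa = calls["shader"]        # 1 shader run
ssaa_pixel(0, 0, full)
n_ssaa = calls["shader"] - n_msaa  # 4 shader runs
print(n_msaa, n_ssaa)  # 1 4
```

Running the shader at several sub-sample positions, whatever you call it, is the right-hand path: 4x the shading work, i.e. supersampling.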
As for "doing supersampling before outputting the pixel", I have no idea what you're talking about. Even multisampling requires a full size frame-buffer, although you can compress it better through various techniques. Complex pixel shaders with branching would rarely be bandwidth limited anyway, because they take so many cycles to complete.
I forgot about one other important situation: dependent texture reads. Using bumped cube-mapping can cause a lot of aliasing, especially since you can't filter normal maps without creating an incorrect, hacked image. You can use the 4 reflection rays from a 2x2 block to select a mip-map from the cube texture, and maybe even do aniso with the reflection rays, but it still isn't sufficient, since adjacent 2x2 blocks have no interaction with each other in the mip-map selection. Other pixel shaders with different uses of dependent texture reads can't be solved by this. The only thing you can do is supersample.
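A sketch of that 2x2-block mip selection (my own illustrative formulation of the scheme described above, not any hardware's actual method: mip level from the angular divergence of the quad's normalized reflection rays):

```python
import math

def angle_between(u, v):
    """Angle between two normalized 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def quad_mip(rays, cube_face_size):
    """Pick a cube-map mip level from the divergence of the four
    reflection rays in one 2x2 pixel quad. The flaw it illustrates:
    each quad chooses independently, so adjacent quads can disagree
    and the filter footprint is discontinuous across quad boundaries."""
    spread = max(angle_between(rays[i], rays[j])
                 for i in range(4) for j in range(i + 1, 4))
    texel_angle = (math.pi / 2) / cube_face_size  # ~angle per face texel
    covered = max(spread / texel_angle, 1.0)      # texels under the quad
    return math.log2(covered)                     # mip 0 = base level

# Coherent rays (flat surface) -> base mip; divergent rays off a bumpy
# normal map -> a much coarser mip, with no blending between quads.
flat = [(0.0, 0.0, 1.0)] * 4
mip_flat = quad_mip(flat, 256)

s, c = math.sin(0.1), math.cos(0.1)
bumpy = [(s, 0, c), (-s, 0, c), (0, s, c), (0, -s, c)]
mip_bumpy = quad_mip(bumpy, 256)
print(mip_flat, mip_bumpy)
```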
I'm not saying multisampling is useless - in fact, it's a very good idea that works most of the time. What I am saying is that there are situations where it's not sufficient, and has drawbacks. I'm also saying the GF4's implementation is not worth those drawbacks. However, it looks like the 9700 did it right, or very close to right. NV30 may be even better (assuming those 4xFSAA performance estimates on Reactor Critical are wrong).