Unimpressed by Antialiasing (continued)

Now whatever you want to call it - effective fillrate, actual fillrate, bandwidth limited fillrate, real-world fillrate, whatever the **** you want to call it - the rate at which a GF4 Ti4600 outputs pixels drops 70% with MSAA.

I think this is probably where the problems arise. In DX8 and DX9 titles, fillrate is/will be primarily limited by shaders (and therefore, clockspeed), rather than bandwidth. Bandwidth-limited fillrate is primarily an issue for DX7-era engines that the hardware can run at full speed.

I don't know what your math background is, but there is no way in hell you can calculate a derivative from a single value. You need at least neighbouring points to approximate the derivative. And even if NV30 by some miracle is smart enough to take neighbouring texture samples, who says we're limited to texture samples? A branching condition, if dynamic, will often be based on a mathematical expression, which can in turn involve texture inputs. Unless NV30 is given this expression, it can't calculate DDX/DDY. I think these functions are for the derivatives of the texture coordinates, but I'm not sure. I am 100% sure that you can't get the derivative of an arbitrary function from a single value.

I'm not sure what your background with hardware engineering is, but the hardware already knows neighboring values for expressions inside shaders - what do you think the other 3 pipes in a 4 pipe system (or 7 in an 8 pipe system) are calculating? This is precisely the recommended method for computing the MIP level (and line of anisotropy), extended to allow inputs from temporary registers.
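To make the point about neighbouring pipes concrete, here is a minimal sketch of how derivatives could be approximated from a 2x2 block of pixels that all evaluate the same expression. The quad layout, the function names and the sample expression are assumptions for illustration only; nothing here is confirmed as the way NV30 or any other chip actually implements DDX/DDY.

```python
# Hypothetical sketch: finite-difference derivatives across a 2x2 pixel quad.
# The layout and names are illustrative, not any vendor's documented design.

def quad_derivatives(quad):
    """quad is [[top_left, top_right], [bottom_left, bottom_right]], holding
    the same shader expression evaluated at four adjacent pixels."""
    (tl, tr), (bl, br) = quad
    ddx = tr - tl  # change over a one-pixel step in x
    ddy = bl - tl  # change over a one-pixel step in y
    return ddx, ddy

def expr(x, y):
    # Stand-in for an arbitrary value sitting in a temporary register.
    return 3.0 * x + 0.5 * y * y

x0, y0 = 10, 20
quad = [[expr(x0, y0), expr(x0 + 1, y0)],
        [expr(x0, y0 + 1), expr(x0 + 1, y0 + 1)]]
ddx, ddy = quad_derivatives(quad)
```

Each pixel still only holds a single value; the derivative comes from differencing against what the neighbouring pixels computed for the same expression. The result is an approximation (a forward difference), not an exact analytic derivative.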

Again, about my fillrate arguments:

All you've shown is that a GeForce 4, playing Quake III at 1600x1200, 32bpp (I assume) suffers a 70% performance hit with 4xMSAA.

However (hypothetically), what if ZCompression only works at resolutions less than 1600x1200? What if some option you've selected causes Z rejection to be disabled?

Unless you know exactly what the driver and hardware are doing behind the scenes, you can't derive any broad conclusions from your sample. What if 1280x1024 2xAA is significantly better optimized than 1600x1200 4x? The various execution units in the hardware interact in complex ways (which are completely undocumented to the public). Just because one arrangement of game settings causes a 70% drop in performance by enabling AA doesn't mean all will.
 
Mintmaster said:
NV30 doesn't actually branch? Well then that's exactly the same as CND/CMP. These instructions just choose the output. In this case, I don't see why you need mathematical data as input to the branch condition, since CND/CMP don't have this restriction. Still, I think future hardware will branch, but I'm not completely sure about it.
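As a rough illustration of "just choose the output", the sketch below mimics a CMP-style select in Python: both candidate results are always computed, and the condition only picks which one is kept. The helper names and the sample shading math are made up for this example; the select rule (first value when the condition is greater than or equal to zero, otherwise the second) follows the DX pixel shader `cmp` instruction.

```python
# Illustrative only: a select instead of a real branch.

def cmp_select(cond, if_nonneg, if_neg):
    # Mirrors cmp semantics: choose if_nonneg when cond >= 0, else if_neg.
    return if_nonneg if cond >= 0.0 else if_neg

def shade(value, threshold):
    bright = value * 2.0   # "then" result, computed regardless of the condition
    dark = value * 0.25    # "else" result, computed regardless of the condition
    return cmp_select(value - threshold, bright, dark)
```

Here `shade(1.0, 0.5)` keeps the bright result and `shade(0.25, 0.5)` keeps the dark one, but in both cases the work for both sides has been done, which is the cost difference compared with true branching.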

It's not a limitation...it's the nature of the issue. Consider that all input that is not stored in the form of either vertex data (same for entire primitive) or constant data will come from textures. I really don't see how data pulled from textures can be anything but a greater than/less than switch. As long as the texture that is the source of the compare is filtered, then there should be little problem with aliasing.

The smoothstep function needs to know how this function that is used in the input of CND/CMP varies across the screen.
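A small sketch of what this claim means in practice, assuming the per-pixel rate of change of the compared value is available (for example from a DDX-style derivative). The function names and the choice of a one-pixel-wide transition are assumptions for illustration, not something specified in the thread.

```python
# Illustrative sketch: hard compare versus a smoothstep whose width
# is derived from how fast the input changes per pixel.

def smoothstep(edge0, edge1, x):
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def hard_step(threshold, f):
    # What a plain CND/CMP-style compare produces: 0 or 1, nothing between.
    return 1.0 if f >= threshold else 0.0

def filtered_step(threshold, f, df_dx):
    # Width of the transition = how much f changes over one pixel.
    # The small floor avoids a zero-width (divide-by-zero) transition.
    w = max(abs(df_dx), 1e-6)
    return smoothstep(threshold - 0.5 * w, threshold + 0.5 * w, f)
```

With `df_dx = 0.25`, a pixel whose value sits exactly on the threshold gets a blend of about 0.5 rather than snapping to 0 or 1, while pixels well clear of the threshold still come out fully 0 or fully 1. Without the derivative there is no way to pick the transition width.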

I still don't see why this is necessary. As long as the source texture is filtered, there shouldn't be any aliasing problems with a very simple smooth step function. Can you produce a specific scenario?

Yes, supersample that surface only. I cannot agree with you more, and this is what I have been saying from the beginning. Supersampling is necessary in some cases only, and on a per surface basis it will be a very useful supplement to multisampling. That is all I want - you to say that there are some circumstances where supersampling is useful. You said "There's still no reason for SSAA", and that was not correct, and is what I am illustrating for you.

There is no reason for SSAA that is downsampled in the framebuffer. Doing it within the pixel shader would be an ideal use of performance...only if it were necessary for that particular situation.
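For what "doing it within the pixel shader" might look like, here is a minimal sketch that evaluates an aliasing-prone function at several sub-pixel positions and averages the results, only for the surface that needs it. The sample grid, the function names and the hard-edged test function are all assumptions made for this example.

```python
# Illustrative sketch: supersampling inside the shader for one surface only.

def hard_edge(x, y):
    # Aliasing-prone shading: 1 on one side of the line x = 10.5, else 0.
    return 1.0 if x >= 10.5 else 0.0

def supersample(shade, px, py, n=2):
    """Average shade() over an n x n grid of sample points inside the
    pixel whose top-left corner is (px, py)."""
    total = 0.0
    for j in range(n):
        for i in range(n):
            sx = px + (i + 0.5) / n
            sy = py + (j + 0.5) / n
            total += shade(sx, sy)
    return total / (n * n)
```

The pixel that the edge cuts through (`px = 10`) comes out as a blend of both sides, while pixels entirely on one side are unaffected. The extra evaluations are only paid for on surfaces whose shader calls `supersample`, rather than across the whole framebuffer.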
 