The issue in the non-AA case lies not with the output but with the inputs.
With Z/stencil-only passes, NV3x/NV4x effectively bypass the shader pipeline, and the quad dispatch engine can "double pump" quads to the ROPs. ATI can't do this, but with FSAA the Z data is processed at the AA sample level (i.e. at 2x AA, the Z buffer holds twice as many samples). With AA it is still working on a single quad per clock, but each quad carries more Z detail.
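As a rough illustration, here is a hypothetical per-clock model of the argument above. The numbers are assumptions, not measured figures: a quad is 2x2 pixels, the baseline rate is one quad per clock, "double pumping" means two quads per clock, and it only applies to Z/stencil passes without AA. The function names are made up for this sketch.

```python
# Hypothetical per-clock model of one quad pipeline's Z/stencil throughput.
# Assumptions (not from the post): a quad is 2x2 pixels, the baseline rate is
# one quad per clock, and "double pumping" means two quads per clock.

PIXELS_PER_QUAD = 4  # a quad is a 2x2 block of pixels


def quads_per_clock(can_double_pump: bool, aa_samples: int) -> int:
    """Quads sent to the ROPs per clock in a Z/stencil-only pass."""
    # Assumption: double pumping only applies without AA; with AA the
    # hardware still works on a single quad per clock.
    return 2 if can_double_pump and aa_samples == 1 else 1


def z_samples_per_clock(can_double_pump: bool, aa_samples: int) -> int:
    """Z samples processed per clock: quads x pixels per quad x AA samples."""
    return quads_per_clock(can_double_pump, aa_samples) * PIXELS_PER_QUAD * aa_samples


if __name__ == "__main__":
    for aa in (1, 2, 4):
        nv = z_samples_per_clock(can_double_pump=True, aa_samples=aa)
        ati = z_samples_per_clock(can_double_pump=False, aa_samples=aa)
        print(f"{aa}x AA: NV3x/NV4x-style {nv}, ATI-style {ati} Z samples/clock")
```

Under these assumptions the double-pumped design gets twice the Z samples per clock without AA. At 2x AA and above, both process one quad per clock with more samples per pixel, so the gap closes.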