Mintmaster said:
Yeah, but even NVidia said that compression is not very significant without AA. With AA, compression only reduces the increase. When you add everything together, even with fairly ideal 4:1 Z compression you need 48 bits written per normal 3D pixel. Alpha adds another 32 bits, and textures can also be significant, although compression helps a lot. Finally, getting 100% out of your memory controller is nearly impossible. There will only be very isolated circumstances where NVidia will be able to achieve 8 pixels per clock.
You forgot to mention that it isn't going to be all that common for a pixel to actually be written each and every clock on the GeForce FX. In DOOM3, for example, one pixel will be written each clock (assuming perfect efficiency in memory bandwidth, etc.) only during the initial z-only pass, which will take very little memory bandwidth. Outside that pass, essentially every pixel in this game will have many textures applied, meaning it will take many clocks to calculate.
In other words, what you're describing is only a problem if what is being written is single-textured trilinear-filtered polygons (no anisotropic). This is just not the case today, and I see no reason for it to be the case often in the future.
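The per-pixel arithmetic from the quoted post, combined with the point about pixels taking several clocks, can be sketched roughly as follows. This is only an illustration: the split of the 48 bits into a 32-bit colour write plus a compressed Z read and write is an assumption, since the quoted post gives only the total, and the function names are made up for this example.

```python
def bits_per_pixel(color_bits=32, z_bits=32, z_compression=4.0, alpha_blend=False):
    """Framebuffer traffic for one pixel, in bits.

    Assumed breakdown (not spelled out in the thread): one colour write,
    one Z read plus one Z write reduced by the compression ratio, and an
    extra colour read when alpha blending is enabled.
    """
    color_write = color_bits
    z_traffic = (2 * z_bits) / z_compression
    blend_read = color_bits if alpha_blend else 0
    return color_write + z_traffic + blend_read


def required_bits_per_clock(pixels_in_flight, clocks_per_pixel, **kwargs):
    """Average framebuffer traffic per clock when each pipeline needs
    `clocks_per_pixel` clocks to finish a pixel."""
    return pixels_in_flight * bits_per_pixel(**kwargs) / clocks_per_pixel


print(bits_per_pixel())                    # 48.0
print(bits_per_pixel(alpha_blend=True))    # 80.0
print(required_bits_per_clock(8, 1))       # 384.0, one pixel per pipeline per clock
print(required_bits_per_clock(8, 4))       # 96.0, four clocks per pixel
```

Under these assumptions the framebuffer traffic per clock falls in proportion to the number of clocks each pixel takes, which is the point being made about multitextured pixels.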
True, but in this case 4x2 would have saved die space. Also, how many of today's games use 5-6 textures at once? Today's games rarely use more than two, and a lot even use just one for the majority of the pixels.
But 4x2 would have been less efficient, primarily for DOOM3 (or any game in the future that will do an initial z pass).
And most games today use at least two textures per pass, with many of the more recent ones using far more (Serious Sam, UT2K3, for example).
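As a rough illustration of the 8x1 versus 4x2 comparison, here is a simplified throughput model. It assumes each pipeline applies one texture per texture unit per clock and ignores filtering cost, bandwidth, and everything else; the function name and the model itself are assumptions made for this sketch, not a description of how either chip actually schedules work.

```python
import math


def pixels_per_clock(pipelines, tmus_per_pipeline, textures):
    """Peak pixels completed per clock in a simplified pipeline model.

    A z-only pass (textures == 0) completes one pixel per pipeline per
    clock. Otherwise each pipeline needs ceil(textures / TMUs) clocks.
    """
    if textures == 0:
        return float(pipelines)
    clocks = math.ceil(textures / tmus_per_pipeline)
    return pipelines / clocks


for textures in (0, 1, 2, 3, 4):
    print(textures,
          pixels_per_clock(8, 1, textures),   # 8x1 layout
          pixels_per_clock(4, 2, textures))   # 4x2 layout
```

In this model the two layouts tie for even texture counts, while 8x1 comes out ahead for the z-only pass and for odd texture counts.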
I think you are completely wrong about the GeForce4 not being bandwidth limited with 2xFSAA. If you were right, 2xFSAA would hardly have any performance hit.
It doesn't.
Also, the GeForce4's FSAA scores nearly halve going from 2xFSAA to 4xFSAA, unless you are CPU/T&L bound. 4xFSAA doubles the Z and colour buffer bandwidth compared to 2xFSAA, so the bandwidth requirements are nearly doubled. Taken together, these two facts mean that both 2xFSAA and 4xFSAA saturate the memory bandwidth, since any excess bandwidth at 2xFSAA would prevent such a proportional drop.
Okay, so the GeForce4 begins to be bandwidth limited at 2xFSAA. The point still stands. For any game that uses more than a single texture per pixel, the GeForce FX will be no less efficient in using its memory bandwidth than the GeForce4. And with its improved compression under FSAA, it will be quite a bit more efficient.
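The scaling argument in the quoted post can be sketched with a toy model in which the frame rate is capped either by memory bandwidth or by some other limit (CPU, T&L, fill rate). All numbers below are made-up illustrative units, not measurements from any card.

```python
def frame_rate(bandwidth, traffic_per_frame, other_limit):
    """Frames per second when capped by bandwidth or by another limit."""
    return min(other_limit, bandwidth / traffic_per_frame)


BANDWIDTH = 1000      # hypothetical units of traffic per second
TRAFFIC_2X = 10       # hypothetical traffic per frame at 2xFSAA
TRAFFIC_4X = 20       # doubled Z and colour traffic at 4xFSAA

# Bandwidth-limited at both settings: the score halves.
saturated = frame_rate(BANDWIDTH, TRAFFIC_4X, 200) / frame_rate(BANDWIDTH, TRAFFIC_2X, 200)

# Spare bandwidth at 2xFSAA (something else caps it at 80 fps): the drop is smaller.
headroom = frame_rate(BANDWIDTH, TRAFFIC_4X, 80) / frame_rate(BANDWIDTH, TRAFFIC_2X, 80)

print(saturated)  # 0.5
print(headroom)   # 0.625
```

In this model a near-exact halving only appears when the 2xFSAA case is already bandwidth-bound, which is the inference the quoted post draws.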