Movieman:
Welcome aboard!
Unfortunately, your post is almost entirely incorrect.
(Don't worry, it still happens to me quite a lot.)
movieman said:
For AF, yes, since that's almost entirely down to memory bandwidth.
No. It varies based on scene characteristics and the behavior of the texture cache, but AF is generally more of a fillrate hit than a bandwidth hit. Quick explanation:
Modern GPUs are all capable of taking one bilinear sample per pipe per clock. Applying AF (indeed, applying trilinear) therefore means spending multiple clocks on those fragments where it is needed. AF does mean fetching more texels, but assuming the texture cache is behaving as intended this doesn't necessarily mean more fetches from external memory. Meanwhile, spending multiple clocks on a single fragment means less bandwidth is being devoted to the stuff that only gets done once per fragment--z fetch, z write, and color write. As these tend to be the main consumers of DRAM bandwidth, your bandwidth requirements per clock would tend to lessen.
Except that AF also allows you to change the LOD calculation to fetch more detailed mipmaps, which does mean higher texture traffic. In general, AF tends to be at least as much of a fillrate hit as a bandwidth hit, if not more, but, as I said, this can vary somewhat. What is not true is that it is primarily a bandwidth hit.
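To put rough numbers on the per-clock argument, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the 4-byte z and color sizes, the lack of compression, the 8-sample AF case, and the name `dram_bytes_per_clock` are all made up for this toy model and don't describe any particular chip.

```python
# Toy model: every constant here is an illustrative assumption, not a measurement.
Z_READ_BYTES = 4       # assumed 32-bit depth value, uncompressed
Z_WRITE_BYTES = 4
COLOR_WRITE_BYTES = 4  # assumed 32-bit color


def dram_bytes_per_clock(clocks_per_fragment, texture_miss_bytes=0.0):
    """Average external-memory traffic per pipe per clock.

    clocks_per_fragment: bilinear samples (one per clock) spent on a fragment.
    texture_miss_bytes: texel bytes per fragment that miss the texture cache
        and have to come from DRAM (0.0 models a perfectly behaving cache).
    """
    if clocks_per_fragment < 1:
        raise ValueError("a fragment takes at least one clock")
    once_per_fragment = Z_READ_BYTES + Z_WRITE_BYTES + COLOR_WRITE_BYTES
    return (once_per_fragment + texture_miss_bytes) / clocks_per_fragment


for label, clocks in [("bilinear", 1), ("trilinear", 2), ("AF, 8 samples", 8)]:
    print(f"{label}: {dram_bytes_per_clock(clocks):.2f} bytes/clock")
```

With a well-behaved cache the per-clock DRAM demand drops as the clocks per fragment go up. Raising `texture_miss_bytes` models the extra texture traffic from the more detailed mipmaps, which is where the "it varies" comes from.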
In any case, this is partially irrelevant as AF is only applied to color textures, not to many of the textures used as inputs to shaders, nor to the procedural output of shaders. Turning on AF will impact both workload and IQ less in a fragment shader heavy scene than in a fixed-function scene.
For AA, unlikely, since without clever optimizations in the hardware the chip will have to run the pixel shaders multiple times to calculate the pixel color (e.g. 4 times for 4xAA).
If we were talking supersampling AA (SSAA), you'd be right. But we're talking multisampling (MSAA). MSAA only samples z-values multiple times per fragment; color is calculated only once. Assuming you have the requisite z-samplers and z-calculators per pipe (which it turns out is not entirely true for R3x0's 6xMSAA mode, although the impact will be very slight or nonexistent), the performance hit from MSAA is entirely a matter of bandwidth--extra z reads and writes, and extra color writes of the same color value. (Compression does help a great deal.)
As such, MSAA only antialiases the edges of joined or overlapping polygons. In general, one can't compare MSAA to SSAA directly. Rather you need to compare MSAA + AF as a team against SSAA; while SSAA does the entire scene (inefficiently), MSAA handles the edge aliasing while AF handles the texture aliasing, each more efficiently than SSAA would.
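The SSAA/MSAA difference can be sketched as a per-pixel work count. This is a simplified model for a pixel fully covered by one polygon, with no compression; the function name `per_pixel_work` and the dictionary keys are hypothetical, chosen only for this example.

```python
def per_pixel_work(mode, samples=1):
    """Count the work done for one covered pixel under a simplified model.

    mode: "none", "ssaa" or "msaa".
    samples: number of AA samples per pixel (ignored for "none").
    """
    if samples < 1:
        raise ValueError("need at least one sample")
    if mode == "none":
        return {"shader_runs": 1, "z_samples": 1, "color_writes": 1}
    if mode == "ssaa":
        # Supersampling: color is computed separately for every sample.
        return {"shader_runs": samples, "z_samples": samples, "color_writes": samples}
    if mode == "msaa":
        # Multisampling: color is computed once, then the same value is
        # written to each sample; only z is evaluated per sample.
        return {"shader_runs": 1, "z_samples": samples, "color_writes": samples}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("none", "ssaa", "msaa"):
    print(mode, per_pixel_work(mode, samples=4))
```

The shader count stays at one for MSAA regardless of the sample count, so in this model the extra cost shows up only in the z and color traffic.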
The only problem is what I alluded to before--AF only antialiases textures, not polygon interiors in general. With sufficiently advanced fragment shaders (particularly if their output is a high contrast "texture"), this can leave some areas of the screen in need of antialiasing. The solution in such cases is to build the antialiasing into the shader calculation itself; this is the way it's done for high-end offline renderers. Not that I think this issue will show up in HL2 or any other early DX9 games.
So it all depends on what the limit is: if it's vertex shaders then AA probably won't affect performance much. If it's CPU, then AA won't affect it much.
True. However, based on the performance of the NV3x cards, and what we know about their performance characteristics (i.e. they are nearly on par with their R3x0 counterparts in VS 2.0, but woefully underperform in PS 2.0), we know that neither is the case with those cards at these settings.
If it's pixel shaders, AA will probably be too slow to be playable.
Nope. MSAA won't have much impact on performance unless bandwidth becomes a problem, which it likely won't until 4xMSAA or so.
And I can't see it being CPU-limited if the difference between the FX and 9600 Pro is so high: it would have to be GPU-limited somehow.
It is clearly almost entirely GPU limited at these settings for the NV3x cards, and also, although probably less so, for the 9600 Pro. It is clearly CPU limited at these settings for the 9800 Pro. Whether something is CPU limited, GPU limited, or what have you applies only to a particular combination of game, scene, platform, GPU and settings.
Equally, as someone pointed out, the difference in performance between the 9600 Pro and 9800 Pro is similar to their relative shader performance.
Not at all. The 9800 Pro is getting 1.27x the 9600 Pro's performance, but it has 1.9x the fragment shader resources. (Double the pipes at 95% the clock rate.) If the 9600 Pro is entirely fragment shader limited, we'd expect the 9800 Pro to score around 90fps at these settings with an infinitely fast CPU. If, as is more likely, the 9600 Pro is slightly CPU limited as well, the 9800 Pro's innate theoretical performance would be even higher.
Of course this is irrelevant until someone comes up with an infinitely fast CPU, but it does mean that the 9800 Pro should have a good deal of headroom to increase graphical settings without losing much if any performance.
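For anyone who wants to check the arithmetic, a small sketch in Python. The inputs are the figures from this post (double the pipes, 95% of the clock, a measured 1.27x, a projected 90fps); the function names are hypothetical, and the "headroom" figure is just the ratio of the two multipliers under the assumption that the 9600 Pro is entirely fragment shader limited.

```python
PIPE_RATIO = 2.0       # 9800 Pro has double the pipes of the 9600 Pro
CLOCK_RATIO = 0.95     # at 95% of the clock rate
MEASURED_RATIO = 1.27  # measured 9800 Pro / 9600 Pro frame rate
PROJECTED_FPS = 90.0   # projected 9800 Pro score with an infinitely fast CPU


def shader_resource_ratio(pipe_ratio, clock_ratio):
    """Relative fragment shader throughput: pipes times clock."""
    return pipe_ratio * clock_ratio


def headroom(resource_ratio, measured_ratio):
    """How much more shader work the faster card could absorb before its
    frame rate drops, assuming the slower card is purely shader limited."""
    return resource_ratio / measured_ratio


ratio = shader_resource_ratio(PIPE_RATIO, CLOCK_RATIO)
# Baseline backed out of the projection above, not a quoted benchmark score.
implied_9600_fps = PROJECTED_FPS / ratio

print(f"shader resources: {ratio:.2f}x")
print(f"implied 9600 Pro baseline: {implied_9600_fps:.1f} fps")
print(f"headroom: {headroom(ratio, MEASURED_RATIO):.2f}x")
```

The headroom comes out at roughly 1.5x, which is the slack the 9800 Pro has before the GPU rather than the CPU becomes the limit.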