How can AF be implemented effectively on consoles/RSX?

darkblu said:
year 2000 was the last year of the past millennium. because the year count started from 1AD, not 0AD. so the year number is actually an index, 1st year, 2nd year, etc, etc, 2000th year - the last index of the 2nd millennium.

That is not actually always true. Whether or not a year 0 is counted is a tricky issue in ancient chronology (a lot of my academic studies have been focused on BCE history). The Gregorian calendar may have "started" at 1 AD when created, but when other chronologists backfilled data, did they go from 1 BCE to 1 CE or did they add a year 0? There are a lot of counting problems with this system when trying to align regnal years to a new arbitrary system based on the (wrong!) birthdate of a historical figure, and disparities of one year are frequently caused by the various treatments of the year 0 by later historians.

As for the millennium proper or popular, I will defer to Wiki and the entire debate, although I personally don't know anyone who celebrated 2001 as the beginning of the new millennium. If you have hard feelings one way or another I can concede the 2000 year.

So let me ask: What is the motivation for disregarding the PS2 within the context of discussing texture filtering on 2nd generation 3D consoles? There is no reason not to include it in such a comparison, specifically because it does tell us about what is acceptable as standard on consoles.

On Topic: And no, the Xbox 360 does not have worse filtering issues compared to the GCN. Like I said, I cannot say much for all the Xbox1 titles, but the MP ones I have played were pretty nasty. Of the half dozen 360 titles I have played I cannot say any were worse than my GCN's best games. I am sure there are exceptions on both extremes (the 360's worst, the GCN's best), but saying:

"none of them come close" to having as many texture filtering IQ issues as the 360 is pretty laughable. Maybe others don't have their old consoles hooked up, but I have my GCN on a CRT and run it in Progressive Scan. Of course what else can I say? Besides going through a library of 500 games and screenshots to "prove" there is an issue? Based on post history I can say right now that is a losing battle. But most would agree last gens consoles had some poor texture IQ.

Not that I am defending the lack of AF or low levels of AF in many 360 titles. It is lame, and AF adds a LARGE amount of quality to an image. As the PC is my primary platform for gaming I quite love my AF, and the R5x0 series HQ AF is a big selling point to me on the PC end of things. Yet it has been pointed out that not every bad shot has been a texture issue, and ERP has said he has not heard of any issues with Xenos AF. Some games use it well, and some do not. So the continued push to blame the hardware seems premature, as I stated before.
 
Ben Skywalker said:
You have an interesting take on balance Faf, not a critique at all- but you certainly don't have the same perspective that a lot of others do. Absolutely no worries about FB bandwidth, and multi pass bilinear...... and that doesn't seem unbalanced to you?
Bilinear (and trilinear) were single-pass - I think you're confusing things with DC here (which used two passes for trilinear). The 2xFill for untextured writes is not really different than NVidia/ATI boosting speed for their Zixel fills, or should we call it a lack of balance on PC chips because they optimize for special cases? Not to mention both ATI/NVidia also run FP16 at half speed.

I guess where I would argue lack of balance is on two of the tradeoffs (blendmodes and mipmapping implementation). If those were done right, the IQ argument would be gone for other platforms.
 
Sorry, darkblu, missed your post.
darkblu said:
ok, i see now, i originally thought you were referring to increasing the performance hit of aniso vis-a-vis aniso at limited bw, but you actually meant in relation to isotropic.
Well, the performance hit relative to trilinear is what matters when considering whether to enable it or not (assuming such a consideration takes place). Assume a dev doesn't like the 20% drop from aniso. If increasing GPU bandwidth gives trilinear a bigger boost than aniso (so that the drop is more than 20% now), is that going to make a dev more or less likely to enable aniso? He'll stay with trilinear and probably throw a bigger load at the GPU.
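As a rough sketch of that trade-off (all numbers below are invented purely for illustration, not measurements from any real GPU), model frame time as whichever limit bites first, clocks or bandwidth, and see what extra bandwidth does to the relative AF hit:

```python
# Minimal model of the argument above, with made-up numbers.
# Assumption: per-frame GPU time is the max of a clock-bound time and a
# bandwidth-bound time; aniso costs extra clocks, while trilinear is more
# likely to be sitting on the bandwidth limit.

def frame_time(compute_ms, bytes_per_frame, bw_gb_s):
    """Frame time is whichever limit bites first: shader/texture clocks or DRAM."""
    bw_ms = bytes_per_frame / (bw_gb_s * 1e6)   # GB/s -> bytes per millisecond
    return max(compute_ms, bw_ms)

# Hypothetical workload: trilinear is bandwidth-bound, aniso adds clocks.
tri_compute, tri_bytes = 12.0, 350e6    # ms of clocks, bytes touched per frame
af_compute,  af_bytes  = 16.0, 380e6    # aniso: more clocks, a few more texels

for bw in (22.4, 32.0):                 # GB/s: a baseline console vs. a "boosted" one
    tri = frame_time(tri_compute, tri_bytes, bw)
    af  = frame_time(af_compute,  af_bytes,  bw)
    print(f"{bw:4.1f} GB/s  trilinear {tri:5.2f} ms  aniso {af:5.2f} ms  "
          f"AF hit {100 * (af - tri) / tri:4.1f}%")
```

The extra bandwidth shaves more time off the trilinear frame than the aniso frame, so the relative cost of enabling AF grows.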

actually they don't have to be so heavy. if you have a simple shader doing aniso and another simple one doing trilinear you'd experience increased bw requirements in the aniso case as soon as both shaders take sufficiently similar clocks to execute (ideally same clocks)
And how exactly will a simple shader with aniso take a similar number of clocks as a simple shader with trilinear? There's a very narrow range of texture:ALU op ratios you need to make this happen and also be bandwidth bound with aniso; moreover, your texel/pixel ratio (which changes dynamically across the surface) must be just right also.

not necessarily. there's nothing wrong with a unified shader architecture where you have N universal shader units and N decoupled TMU units. if at a given moment you have 50:50 fragment vs vertex shaders, and the latter do not use any TMUs, you can end up with a 2 TMUs per fragment ratio.
Again, that's a very specific scenario. The point I was making was that aniso generally takes more clocks, so increased BW per pixel does not generally translate into increased bandwidth per clock. When it doesn't take more clocks, you're usually not BW bound.
 
BenSkywalker said:
Overwhelmingly those games seem to lack the problems of the 360 native titles. Burnout Revenge as an example doesn't have the kind of issues with filtering that the 360 native titles have- likely due to their native platforms (although it has what is likely the poorest HDR implementation I have ever seen, but I digress). I would not utilize last gen ports as examples of what I'm talking about.
You're reading into my words a strict definition which was not meant. Cross-generational doesn't necessarily mean the game is on both generations of hardware but rather that the mindset used to develop it has not fully matured to the next generation level. The PS2 got this effect as well.
 
BenSkywalker said:
Actually, there were several that utilized more than two. Giants, Sacrifice and Evolva spring quickly to mind. It didn't seem to help out the R100 much at all though, IIRC.
Are you sure those were 3 textures and not 4? Given the dominance of the Geforces at that time (and other dual texturing architectures), three textures would be an odd number to pick. Also, they may not have even coded for more than 2 textures per pass for the same reason. Anyway, like I said, many factors are different now that make 3:1 TMU:ROP no issue at all.

Six ROPs, based on those numbers. They peak with 2x MSAA enabled too- no matter how perfect the compression implementation, you aren't going to gain bandwidth by enabling MSAA.

As far as the filtering on Xenos- what platform released this millennium has more texture aliasing than the 360? I own them all, and I certainly haven't seen one that comes close. You take a title like 'The Outfit' (using that as I was just playing it for a few hours)- it has serious texture aliasing issues for the first couple of mip levels, then goes blurry, all while using bilinear filtering. If the 360 is supposed to be so developer friendly- why are they having these issues?
Because some people are idiots who think an LOD boost looks good? I saw this done with the NFS games to make the road look "clearer" out of the box, and it looked downright ugly. I even saw an NVidia driver tweaker make a set of drivers this way that were labelled as having crystal clear image quality.

We started benchmarking AF on the PC back in 2001, right? How long did it take for PC devs to put an AF option in the game? This was for a PC audience, too, where a large portion of game buyers actually know what it is.

Unless someone with a XB360 dev kit tells us that AF runs at one quarter the speed of an X800XT, I refuse to believe there is a hardware problem. I used AF globally on R300 at 1600x1200. Half the texturing units, half the clockspeed, twice the resolution. Translation? AF is doable with one eighth the texturing power per pixel. ATI would have had to make a huge blunder for AF not to be doable.
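The back-of-the-envelope version of that claim, using the rough ratios quoted above (assumed round numbers, not exact specs):

```python
# Rough arithmetic behind the "one eighth" figure, under stated assumptions:
# R300 ~ half the texture units and ~half the clock of Xenos, pushing roughly
# twice the pixels at 1600x1200 versus 1280x720.  None of these are exact.

units_ratio  = 0.5                              # R300 texture units vs Xenos (assumed)
clock_ratio  = 0.5                              # R300 core clock vs Xenos (assumed)
pixels_ratio = (1600 * 1200) / (1280 * 720)     # ~2.08x the pixels to shade

texel_rate_per_pixel = (units_ratio * clock_ratio) / pixels_ratio
print(f"R300 texturing power per pixel vs Xenos: ~1/{1 / texel_rate_per_pixel:.0f}")
```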

There's every reason to believe this is a dev problem and none to believe it is a hardware problem.
 
Mintmaster said:
Sorry, darkblu, missed your post.

absolutely no hard feelings ; )

Mintmaster said:
Well, the performance hit relative to trilinear is what matters when considering whether to enable it or not (assuming such a consideration takes place). Assume a dev doesn't like the 20% drop from aniso. If increasing GPU bandwidth gives trilinear a bigger boost than aniso (so that the drop is more than 20% now), is that going to make a dev more or less likely to enable aniso? He'll stay with trilinear and probably throw a bigger load at the GPU.

actually the way it works is, you have your scene assets and you have your shading/sampling requirements, the work of the graphics engine dev is to make those somehow meet at the desired framerate, if possible. so no, when you get more BW you don't necessarily say, 'hey, i can throw in more trilinear!', much more likely you'd say 'phew, we actually may make it to the target fps with the right shading/sampling!' so an increase of BW that alleviates your AF may turn out to be the difference between roads with trilinear and roads with AF in your title (versus more roads with trilinear ; )

And how exactly will a simple shader with aniso take a similar number of clocks as a simple shader with trilinear?

by using the extra clocks in the isotropic shader. let's return to your hypothetical architecture example - 1 clock per trilinear and 2 clocks per aniso sampler. you can always get your isotropic shader to do something for that extra clock, say, just use a fancy op that does not co-issue so nicely and voila - your isotropic shader is clocks-equal to your anisotropic shader. but the latter eats more BW.
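here's a toy version of that in code (the numbers are invented just to illustrate the point, not real hardware figures):

```python
# Toy version of the example above: a trilinear sample takes 1 clock and 8
# texels, a 2x aniso sample takes 2 clocks and 16 texels.  Padding the
# trilinear shader with one non-co-issuing ALU clock makes both shaders take
# the same number of clocks, but the aniso one still moves more data per clock.

TEXEL_BYTES = 4   # assume uncompressed 32-bit texels and ignore cache hits

shaders = {
    "trilinear + filler ALU op": {"clocks": 1 + 1, "texels": 8},
    "2x aniso, simple shader":   {"clocks": 2,     "texels": 16},
}

for name, s in shaders.items():
    print(f"{name:27s} {s['clocks']} clocks, {s['texels']} texels, "
          f"{s['texels'] * TEXEL_BYTES / s['clocks']:.0f} bytes/clock")
```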

There's a very narrow range of texture:ALU op ratios you need to make this happen and also be bandwidth bound with aniso; moreover, your texel/pixel ratio (which changes dynamically across the surface) must be just right also.

yes, you're right about the degree-of-aniso factor. but the thing is, you don't have to deliberately provide for that, you just have to have a statistical case that meets this scenario, i.e. 'statistically i have a sufficient number of fragments in my scene that take a 2nd degree aniso.'

you'd be surprised how often the 'just right' alignment happens on certain GPUs that you become BW/latencies limited. all you need is a UMA ; )

Again, that's a very specific scenario. The point I was making was that aniso generally takes more clocks, so increased BW per pixel does not generally translate into increased bandwidth per clock. When it doesn't take more clocks, you're usually not BW bound.

aniso implementations so far may have been generally taking more clocks. but this tells nothing about the nature of the algorithm, as there's no fundamental principle that states aniso takes more clocks (the fact that it's computationally more complex does not automatically translate to more clocks), yet there's one that says aniso takes more samples. ergo higher BW per fragment. how you spread that across clocks is totally irrelevant to the algorithm, as long as you don't try to get more BW than the hw can deliver ATM. at the end of the day you want your fragments to meet their deadlines. whether you have infinite BW that you utilize in one clock and then do ALUs or you spread BW across clocks is a detail of GPU and setup specifics. ergo largely a matter of semantics.
 
darkblu, you say "the way it works is..." and "when you get more BW", but how often does a console get a sudden unexpected specification boost in bandwidth per clock? ;)

Consider two independent scenarios where you missed your target, where scenario 1 got a 20% boost by disabling AF and scenario 2 got a 30% boost by disabling it. They have different targets (independent scenarios), but both miss by the same amount. Scenario 2 is not more likely to keep the feature.

darkblu said:
'statistically i have a sufficient number of fragments in my scene that take a 2nd degree aniso.'

you'd be surprised how often the 'just right' alignment happens on certain GPUs that you become BW/latencies limited. all you need is a UMA ; )
We're not looking for BW limited cases, because those happen all the time. We're looking for cases where aniso doesn't increase clock cycles much (i.e. math heavy shaders or weird scheduling) and is also BW bound (i.e. texture heavy or pretty short). Only then will you increase BW per clock. For UMA to affect the argument, you need a BW heavy CPU task to kick in simultaneously which is not alleviated by the increase in BW that we're talking about in the first place.

Yes, in the worst case maybe statistically 5% of BW bound pixels meet the tight simultaneous conditions set out above that the shaders consume more BW per clock with AF. But remember that statistically maybe 80% covers all the other BW bound pixels that consume less BW per clock with AF enabled. So you can see that statistics actually favours my stance. You could be right in some contrived case, but not for a real world statistical distribution.
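To put rough numbers on that weighting (the fractions and per-pixel factors below are invented purely for illustration, not profiled from anything):

```python
# Crude weighted average of the statistical claim above; the fractions and
# per-pixel factors are invented for illustration only.

cases = [
    # (fraction of BW-bound pixels, BW-per-clock with AF relative to trilinear)
    (0.05, 1.3),   # rare: clock count barely grows, so AF raises BW per clock
    (0.80, 0.7),   # common: AF adds clocks, so BW per clock actually drops
    (0.15, 1.0),   # the rest: roughly unchanged either way
]

avg = sum(frac * factor for frac, factor in cases)
print(f"average BW per clock with AF vs trilinear: {avg:.2f}x")   # comes out below 1
```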

EDIT: <smacks head> Stupid me, why argue when the data is there?
NoAF, 8xAF. Radeon 9700: 10% hit. 9500 Pro: 6% hit.
NoAF, 8xAF. Radeon 9700: 37% hit. 9500 Pro: 30% hit.

Regarding aniso implementations, sure, you could be foolish enough to add 80% of the logic needed for double speed texturing whilst only getting double speed AF on pixels that need it. ;) Let's say you get 100 fps with trilinear and 50 fps with aniso (i.e. a brutal texture heavy case). Doubling the texture units gets you up to, say, 90fps. On average, doubling aniso speed halves the extra cycles, so you're up to 66fps. Which looks like a more effective use of transistors? I really doubt you'll see any IHV do this, doubt it even more so than single-cycle trilinear.
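Spelling out the arithmetic behind those fps figures (the base/extra split is an assumption made for the example, not a measurement):

```python
# Arithmetic behind the 100/50/90/66 fps figures above.  The frame is split
# into a "base" cost and "extra aniso cycles"; the split is assumed, not measured.

base_ms  = 1000 / 100           # 10 ms: trilinear frame (texture-heavy case)
aniso_ms = 1000 / 50            # 20 ms: same frame with aniso enabled
extra_ms = aniso_ms - base_ms   # 10 ms of extra aniso cycles

# Option A: double every texture unit -> roughly halve all texturing time
# (a little overhead keeps it from a perfect 2x, hence ~90 rather than 100 fps).
option_a_ms = aniso_ms / 2 * 1.11
# Option B: hardware that only runs aniso twice as fast -> halve just the extra cycles.
option_b_ms = base_ms + extra_ms / 2

print(f"double all texture units : ~{int(1000 / option_a_ms)} fps")   # ~90
print(f"double aniso speed only  : ~{int(1000 / option_b_ms)} fps")   # ~66
```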

I was thinking that this was a rather lengthy diversion from that on-the-side quote of mine, but this last paragraph relates well to the topic at hand. There's nothing we can really do about the performance hit of AF. Just use it selectively.
 
Mintmaster said:
EDIT: <smacks head> Stupid me, why argue when the data is there?
NoAF, 8xAF. Radeon 9700: 10% hit. 9500 Pro: 6% hit.
NoAF, 8xAF. Radeon 9700: 37% hit. 9500 Pro: 30% hit.
Note that you are comparing the difference in un-vsynced average framerates here, which is a much different situation than the vsynced, locked 30fps or 60fps standards which console games strive for. Benchmarks showing differences in minimum framerates would be much more relevant to the topic at hand.
 
So let me ask: What is the motivation for disregarding the PS2 within the context of discussing texture filtering on 2nd generation 3D consoles? There is no reason not to include it in such a comparison, specifically because it does tell us about what is acceptable as standard on consoles.

A couple of different reasons. First off is the GPU vs rasterizer element. The functionality of the GS is not comparable to the NV2A or Flipper, and it isn't fair to compare them. Next is that the PSX lacked 3D hardware. Most people consider the dawn of PC 3D gaming to have started with GLQuake- despite the fact that numerous software-based titles predated it. Some incredibly talented developers pulled off some amazing things with the PS2, but they have a very hard time standing up to Wreckless on the XBox, which seemed to be mainly composed of sample code copied and pasted from nV's dev rel site.

On Topic: And no, the Xbox 360 does not have worse filtering issues compared to the GCN. Like I said, I cannot say much for all the Xbox1 titles, but the MP ones I have played were pretty nasty. Of the half dozen 360 titles I have played I cannot say any were worse than my GCN's best games. I am sure there are exceptions on both extremes (the 360's worst, the GCN's best), but saying:

I have all the consoles hooked up to the same HDTV right now- I stand by my statement. The aliasing on the 360 is significantly worse overall than either the GC or original XB. Yesterday I picked up Dynasty Warriors 5 (there is a serious lack of titles on the 360 right now) and it lacks issues with aliasing altogether. Of course, it uses decidedly last gen graphics (although no slowdown with ~50-100 characters on screen is nice). Currently I have 15 games for my 360 and the overwhelming majority of them have serious issues with texture aliasing.

But most would agree last gen's consoles had some poor texture IQ.

I'll quote myself here-

After that the GCN and XBox have horrific texture filtering without a doubt- what they lack is the amount of texture aliasing that the 360 exhibits in numerous titles.

I have not said that last gen's consoles had great texture IQ; what I have been talking about is the need for AF. This applies to the PS3 and Wii as well as the 360. The difference is that right now we have the 360 along with a library of games to see what it is doing. As of this point, I would say that it (the lack of AF) is by far its weakest area.

Are you sure those were 3 textures and not 4?

AFAIK, there was base, lightmap and Dot3. It is possible they were set to utilize two TMUs per pass, but IIRC PlanetMoon stated that they did set up the game to run using 3 TMUs if available (Giants).

Unless someone with a XB360 dev kit tells us that AF runs at one quarter the speed of an X800XT, I refuse to believe there is a hardware problem.

If its performance hit were comparable to AA, why wouldn't MS be telling devs to turn it on? I'm not saying there is a catastrophic performance hit, but look at the R3x0 benches, a ~30% performance hit using 16x adaptive AF- 8 ROP parts don't have the best track record in terms of performance when utilizing AF.
 
BenSkywalker said:
I'm not saying there is a catastrophic performance hit, but look at the R3x0 benches, a ~30% performance hit using 16x adaptive AF- 8 ROP parts don't have the best track record in terms of performance when utilizing AF.
Check my previous post. ;)
 
kyleb said:
Note that you are comparing the difference in un-vsynced average framerates here, which is a much different situation than the vsynced, locked 30fps or 60fps standards which console games strive for. Benchmarks showing differences in minimum framerates would be much more relevant to the topic at hand.
Okay, kyle, you're just being silly. Do you honestly believe that the AF % hit for minimum framerate is so much less on the 9700 than the 9500P? Even when it's the opposite for average framerate? Minimum framerates aren't caused by the bizarre set of conditions that darkblu is suggesting. They're caused by a higher workload and more pixels of all sorts. I'd bet you $1000 you couldn't find me a game where all the following hold:
A) The %age performance hit of AF on average framerate is more on the 9700
B) The %age performance hit of AF on minimum framerate is more on the 9500 Pro
C) A and B happened reproducibly and well outside the margin of error.

The AF hit is always a bit more for higher bandwidth architectures. Here's another example (6600GT vs. 6800). The reason is simple: If you're throttled by BW then it's more likely you have texture units sitting idle when doing trilinear.

Oh yeah, and vsync is irrelevant if devs just enable triple buffering and find a way to manage with 3.7MB less space. ;)
 
I just want to say that hardocp has a lot of reviews where they list min and max.

For instance, their 6600 review:

[benchmark chart from HardOCP's 6600 review showing minimum and maximum framerates]

I'm not sure if that helps with the discussion.
 
BenSkywalker said:
AFAIK, there was base, lightmap and Dot3. It is possible they were set to utilize two TMUs per pass, but IIRC PlanetMoon stated that they did set up the game to run using 3 TMUs if available (Giants).
Okay, googling turned up these:
http://www.aceshardware.com/Spades/read.php?article_id=25000229
http://gear.ign.com/articles/316/316226p1.html
So R100 does have a substantial advantage over NV15 in triple-textured scenarios. In any case, these games didn't use 3 textures everywhere, so it made sense to abandon it, especially with the loopback ability of R200 and NV30 onwards. More importantly, there's no reason for RSX to be at any disadvantage with a 3:1 TMU:ROP ratio. Remember how small G71 is and how its pipelines can dual-issue when not texturing, so it doesn't make sense to remove texture units when they worked so hard to make them compact and multitasking.

BenSkywalker said:
If its performance hit were comparable to AA, why wouldn't MS be telling devs to turn it on?
Nobody said the hit should be comparable to AA. They're two different things. You can remove the performance hit for AA with simple architecture changes. You can't do that for 16xAF unless you make it horribly inefficient (transistor-wise) for every pixel needing less than 16xAF. The MSAA workload is constant for each pixel on the screen (though BW can change). The AF workload changes dynamically. AA and AF are two fundamentally different things. If the dev and hardware don't fool around with LOD, then AF will provide no better resistance against aliasing than trilinear. This seems to be a big misunderstanding you have. AF provides more detail, not less aliasing.

In fact, a lot of people on these boards are complaining about NV4x/G7x's texture aliasing with AF enabled. IMO NVidia does it to reduce cycle count for their non-decoupled TMUs, which probably explains why they don't trounce the TMU-deficient ATI parts, esp. with AF enabled. I'm not really sure though.

BenSkywalker said:
I'm not saying there is a catastrophic performance hit, but look at the R3x0 benches, a ~30% performance hit using 16x adaptive AF- 8 ROP parts don't have the best track record in terms of performance when utilizing AF.
How many freaking times do we have to tell you ROPs have absolutely nothing to do with it? Go look up some 6600GT benchmarks (e.g. NoAF, 8xAF, 4% drop). A whopping FOUR pixels per clock, the same as the original Geforce.
 
Mintmaster said:
Okay, kyle, you're just being silly. Do you honestly believe that the AF % hit for minimum framerate is so much less on the 9700 than the 9500P?
I'm not being silly; I am speaking honestly from the experience of having used a 9700 Pro and playing many games both with and without AF. In that experience I found that while AF often had a noticeable effect on my maximum framerate, and hence my average framerate as well, in many cases it had little to no effect on my minimum framerate.
Mintmaster said:
Even when it's the opposite for average framerate? Minimum framerates aren't caused by the bizarre set of conditions that darkblu is suggesting. They're caused by a higher workload and more pixels of all sorts. I'd bet you $1000 you couldn't find me a game where all the following hold:
A) The %age performance hit of AF on average framerate is more on the 9700
B) The %age performance hit of AF on minimum framerate is more on the 9500 Pro
C) A and B happened reproducibly and well outside the margin of error.

The AF hit is always a bit more for higher bandwidth architectures. Here's another example (6600GT vs. 6800). The reason is simple: If you're throttled by BW then it's more likely you have texture units sitting idle when doing trilinear.
I respect what you are saying here, but if you are throttled by BW then wouldn't AF also inflate that problem? Regardless, my point was not to contest your point, but rather simply to point out that minimum framerates are more relevant to performance considerations in console games.
Mintmaster said:
Oh yeah, and vsync is irrelevant if devs just enable triple buffering and find a way to manage with 3.7MB less space. ;)
Triple buffering doesn't change the fact that vsync caps the framerate, and that is all I was referring to in regard to vsync.
 
Looks like the Giants situation ended up looking pretty good for the R100- the benches I was thinking of were from when the game launched (it could have been that PlanetMoon patched in 3 TMU support and that is why it was discussed; it has been a while). The game does utilize three layers on pretty much all surfaces in the game (though obviously the sky doesn't).

You can't do that for 16xAF unless you make it horribly inefficient (transistor-wise) for every pixel needing less than 16xAF.

Are you honestly saying that having several MBs of eDRAM is efficient transistor-wise? It is for what they are doing; if they had budgeted a hundred million or so transistors to AF instead, we certainly could have seen free AF. That was their design decision and I understand why they did it- but to discount the staggering cost of eDRAM in terms of transistors isn't an effective approach to looking at the situation. AA workload also changes dynamically, but you still need to have the maximum space possible for a non-compressed Z-buffer. We are used to considering the worst case when talking about AA- and if we were talking about some sort of FSAA then it would have the side benefit of additional filtering for pixels (not to mention a known workload requirement that doesn't fluctuate as MSAA and AF do).

In fact, a lot of people on these boards are complaining about NV4x/G7x's texture aliasing with AF enabled.

ATi has the same thing going on with all of their parts since the R100 (I see it every time I fire up a game on my main rig- although it is much better than it was, while nV is much worse than they were). What I'm talking about with the 360 is the near-proximity mip levels; those are the problematic ones (although, as I mentioned, not in every game).

How many freaking times do we have to tell you ROPs have absolutely nothing to do with it? Go look up some 6600GT benchmarks (e.g. NoAF, 8xAF, 4% drop). A whopping FOUR pixels per clock, the same as the original Geforce.

http://www.beyond3d.com/reviews/bfg/6600gtoc/index.php?p=12

Or a ~25% drop compared to a 16-ROP part-

http://www.beyond3d.com/previews/nvidia/g70/index.php?p=19

You are honestly going to say that the number of ROPs has nothing to do with AF performance, all else being equal? If what you are saying is that, were they to redesign the entire chip around the hypothetical design that many here have envisioned, it wouldn't make much of a difference, then of course. But at this point is there anything that the people here have not changed from the available NV4X parts when discussing RSX? Pretty much the entire configuration has been reworked; is that what people are honestly expecting for RSX at this point?
 
Are you honestly saying that having several MBs of eDRAM is efficient transistor-wise? It is for what they are doing; if they had budgeted a hundred million or so transistors to AF instead, we certainly could have seen free AF.
You just don't get it. Any architecture with free AF is just stupid. It's like a 300HP pickup truck that only uses 100HP if not towing, simply so that the 0-60 time stays at 15 seconds whether towing or not. If you can take 4 free extra samples for AF, then you have 80% of the hardware needed to sample 4 different textures without AF. But instead, you're limiting yourself to only one sample without AF.

You mentioned something about FSAA as opposed to MSAA as well (implying supersampling, I guess). Making supersampling free is also very stupid. You have to multiply every pixel-related part of the hardware by 4 to get free 4xSSAA, which means you're equipped to do No-SSAA four times as fast also. Why would you restrict the No-SSAA rendering to one quarter the speed the hardware is capable of?

Regarding eDRAM: Let's say MS didn't put it on there and used those 105M transistors for more shading power. Now all the data transfer - framebuffer, vertex, texture, CPU - is going over a tiny 128-bit bus. The shading power is probably useless. It should be like the PS3, you say? So now we need a couple hundred more pins on the CPU (making it harder to shrink also) for the second bus, another memory controller on the CPU, another bus on the mainboard, compression logic on Xenos, lower yields from a bigger chip instead of two smaller ones, etc. Even with all that, you're still not guaranteed to have better performance. It is not clear at all if an equally performing system without eDRAM would be any cheaper, and I'd guess no. Remember that eDRAM is not just for AA. AA gets a bigger benefit, but regular rendering consumes a lot of bandwidth as well and also gets a big boost.

AA workload also changes dynamically
Not really. Every pixel needs to z-test each subsample. Every pixel needs to blend each subsample, unless the colour is the same for all subsamples in destination and source. Every block of pixels needs to be checked for compression suitability. A few parts of MSAA hardware can change utilization by a factor of 4 during rendering, but most of it doesn't. With AF, the workload changes by a factor of up to 16 for everything except the initial AF calculation (which is trivial in size compared to the texture samplers). Making AF free means you'll have a lot of hardware sitting idle a lot of the time.
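To illustrate the idle-hardware point with some invented numbers (the distribution of aniso degrees below is made up; real scenes will vary):

```python
# Sketch of the "idle hardware" point.  The mix of anisotropy degrees is an
# invented, illustrative distribution - most pixels need few or no extra taps.

aniso_mix = {1: 0.55, 2: 0.25, 4: 0.12, 8: 0.06, 16: 0.02}   # taps -> pixel fraction
MAX_TAPS = 16   # hardware sized so that 16xAF is "free" (one clock per pixel)

avg_taps = sum(taps * frac for taps, frac in aniso_mix.items())
print(f"average taps actually used per pixel : {avg_taps:.2f}")
print(f"utilisation of the free-16x sampler hardware : {100 * avg_taps / MAX_TAPS:.0f}%")
```

Contrast that with MSAA, where every pixel touches every subsample, so the equivalent hardware stays busy.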


Let me get this straight. You're comparing the AF hit of a 6600GT in Doom3 with that of a 7800GTX in 3DMark03? :LOL: And from that you conclude more ROPs decreases the AF hit?!?

WTF are you smoking? That's like saying a Civic slows down in mud more than a Lancer Evo because the latter has a bigger exhaust. With this kind of conclusion jumping it's a good thing you aren't a scientist or a politician. I hope...

You are honestly going to say that the number of ROPs has nothing to do with AF performance, all else being equal?
Yes, I am. Dave is saying the same. Every hardware designer will tell you the same. If anything, more ROPs will increase the performance hit, because a deficiency of ROPs could mask the extra cycles of AF. And how the hell is "all else equal" in your example? The benchmark is different, the shading pipelines are different, the memory bus is different...
 
kyleb said:
I'm not being silly; I am speaking honestly from the experience of having used a 9700 Pro and playing many games both with and without AF. In that experience I found that while AF often had a noticeable effect on my maximum framerate, and hence my average framerate as well, in many cases it had little to no effect on my minimum framerate.
You have to understand the purpose of the benchmarks I posted. It's not to show the impact of AF on playability; rather, it's to show the impact of bandwidth on the AF hit. The 9700 and 9500 Pro are identical in all ways except bandwidth. Discussing minimum framerates as opposed to maximum framerates is quite irrelevant in this context.

I respect what you are saying here, but if you are throttled by BW then wouldn't AF also inflate that problem?
I think you should take a closer look at the second paragraph of this post. In an alternate universe with a higher bandwidth XB360, devs would probably be less inclined to enable AF.

This is the main point: For all hardware ever made by ATI and NVidia that supports AF, a higher bandwidth product would not have a lower performance hit for AF.
 
Mintmaster, your posts are very interesting and the 9700/9500 Pro benchmarks were certainly enlightening, but there is one thing I don't understand (honestly!). If I read your last two posts correctly, you are arguing that a) adding ROPs will increase the relative performance cost of AF and b) increasing bandwidth will also increase the relative performance cost of AF.

Your arguments seem sound overall, but what, then, will that mean for the future? Are we looking at ever-increasing (relative) costs of AF with increasing hardware capabilities? Is there any hardware part on GPUs (except making AF cost fewer cycles in hardware) that can be improved and will actually decrease the AF hit?

Hmm, after thinking about it some more, you don't say anything about increasing both the number of ROPs and the available bandwidth. In that case the relative cost would just stay the same, so I guess that answers my earlier questions ;)
 
I am a little bit confused as well... If you increase BW, all things being equal or whatever... can latency actually increase?
 
The simplest way to make the hit for AF more acceptable is to get rid of fixed-function units; it's the static balance between shaders and samplers which amplifies the problem.
 