How can AF be implemented effectively on consoles/RSX?

Dave Baumann · Jun 3, 2006

BenSkywalker said:
I meant multiple cycles, not actual passes

This is normal graphics operation - every chip produced by ATI and NVIDIA that is capable of AF (with NV10 being the only aberration) only takes one bilinear sample per cycle per texture unit, and any form of increased sampling requirement (trilinear, bilinear AF, trilinear AF) just takes more cycles.
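The cycle counting described above can be sketched as a simple worst-case model. It assumes one bilinear sample per texture unit per clock, two bilinear samples for a trilinear probe (one from each adjacent mip level) and N probes for N:1 anisotropy. Real adaptive AF takes fewer probes on surfaces that aren't steeply angled, so these are upper bounds, and the function names are hypothetical.

```python
import math


def bilinear_samples(filter_mode: str, aniso: int = 1) -> int:
    """Worst-case bilinear samples for one texture lookup (no adaptive savings)."""
    if aniso < 1:
        raise ValueError("aniso must be >= 1")
    if filter_mode == "bilinear":
        per_probe = 1
    elif filter_mode == "trilinear":
        per_probe = 2  # one bilinear sample from each of two adjacent mip levels
    else:
        raise ValueError(f"unknown filter mode: {filter_mode}")
    return per_probe * aniso  # N:1 anisotropy takes N probes along the major axis


def cycles_per_lookup(filter_mode: str, aniso: int = 1, bilinear_per_clock: int = 1) -> int:
    """Clocks a single texture unit spends on one lookup."""
    return math.ceil(bilinear_samples(filter_mode, aniso) / bilinear_per_clock)


if __name__ == "__main__":
    for mode, n in [("bilinear", 1), ("trilinear", 1), ("bilinear", 8), ("trilinear", 16)]:
        print(f"{mode:9s} {n:2d}x AF -> {cycles_per_lookup(mode, n)} clocks")
```

The extra cost lands entirely on the texture units; nothing in this model touches the ROPs.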

I would assume with the relatively speaking low raw fill rate that stalling the pipes would have a rather nasty performance hit to go with it.

In many operations you aren't going to be ROP limited - if we take shaders out of the equation and just look at textures and pixels, then the more ROPs you have, the more likely you are to be texture limited where AF is applied to all your textures.

If the chip needed extra cycles for AF then having additional ROPs could grant them 'free' AF under most circumstances.

No. If you are limited by the texture samples then you are limited by the sampling end of the chip (texture samplers) not by the pixel output end of the chip (ROPs), so having extra ROPs isn't going to make much difference in this case. If you want cheaper AF then you want more texture sampling capabilities.

Also, am I mistaken, or is the entire assumption that RSX has only 8 ROPs based on one specific output format they have listed in a dev doc? Do we know whether it is accurate?

It has fillrate graphs that are consistent with 8 pixels per clock at 32bpp (although described as bandwidth limited), and it also states that FP16 writes drop to 4 per cycle and FP32 to 2, which is consistent with 8 NV4x/G7x ROPs.
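The arithmetic behind that inference can be sketched as follows. As a simplification (not a documented spec), it assumes each NV4x/G7x-style ROP writes one 32bpp pixel per clock and that wider formats take proportionally more clocks, i.e. FP16 RGBA (8 bytes) at half rate and FP32 RGBA (16 bytes) at quarter rate.

```python
FORMATS = {"RGBA8": 4, "FP16 RGBA": 8, "FP32 RGBA": 16}  # bytes per pixel


def pixels_per_clock(rops: int, bytes_per_pixel: int, bytes_per_rop_clock: int = 4) -> float:
    """Colour writes per clock if each ROP moves bytes_per_rop_clock bytes per clock."""
    return rops * bytes_per_rop_clock / bytes_per_pixel


if __name__ == "__main__":
    for rops in (8, 16):
        rates = {fmt: pixels_per_clock(rops, bpp) for fmt, bpp in FORMATS.items()}
        print(rops, "ROPs:", rates)
```

Under this assumption only the 8-ROP configuration reproduces the 8/4/2 figures from the dev doc; a 16-ROP part would show 16/8/4.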

BenSkywalker · Jun 3, 2006

No. If you are limited by the texture samples then you are limited by the sampling end of the chip (texture samplers) not by the pixel output end of the chip (ROPs), so having extra ROPs isn't going to make much difference in this case. If you want cheaper AF then you want more texture sampling capabilities.

It has fillrate graphs that are consistent with 8 pixels per clock at 32bpp (although described as bandwidth limited), and it also states that FP16 writes drop to 4 per cycle and FP32 to 2, which is consistent with 8 NV4x/G7x ROPs.

I know these were two separate replies, but I wanted to lump them together. The doc states that they are bandwidth limited in writes - is that why it is assumed they won't use more than 8 ROPs? This goes back to the other portion that I quoted: if you have excess ROPs idling due to limited bandwidth for writes, then doubling the number of cycles spent on sampling will not have a negative performance impact. The GS was imbalanced horribly in terms of its raw fill compared to what it could, by typical standards, do with it. Also, is it not likely that the RSX will need to fully emulate the GS? I would assume (though I don't know for certain) that they would want to have a comparable number of pipelines. I'm not trying to say that it will or won't happen, but I don't see why they would cripple a chip they can easily build by cutting its transistor count by so much. I could perhaps see a 12-pipe card (one quad disabled to improve yields), but it doesn't make much sense to me that Sony would decide to cripple the potential processing power of RSX to keep it balanced (as I have yet to see them design anything balanced to date for their consoles).

Dave Baumann · Jun 3, 2006

The doc states that they are bandwidth limited in writes - is that why it is assumed they won't use more than 8 ROPs?

The fillrate numbers are one indicator, but as I said, the fact that it states FP16 writes are at 4 per clock and FP32 at 2 is consistent with G7x having 8 ROPs, and I don't see that the ROP structure would change for this, given that it hasn't for anything else.

but I don't see why they would cripple a chip they can easily build by cutting its transistor count by so much.

It won't be crippling it. Even in the few cases where you might be ROP limited, with a 128-bit bus you'll then be bandwidth limited, so there isn't a point where they would be used effectively, and there's no point in having the silicon for them.

kyleb · Jun 3, 2006

Nesh said:
Yeah, but COD2 I think has AF on the PC version while the 360 doesn't. Or am I wrong?

The PC as a platform, like the 360, has AF. The difference is that on the PC version you enable it at will, whereas with the 360 it was up to the developers, and they used some really crappy texture filtering. I can understand why they did, though, as the framerate in CoD2 can get really nasty when running HD output, and enabling AF would almost certainly have made that worse.

Acert93 · Jun 3, 2006

kyleb said:
I'm guessing you misunderstood him there. Developers can certainly get more efficient results by applying AF selectively, but I highly doubt that the 360 prevents anyone from applying AF globally to their games; at least not for any reason other than the fact that doing so would almost certainly make the games run slower than they do without it.

Maybe I did misunderstand, but this is where I got the impression from:

Dave Baumann said:
Again, the difference between PCs and consoles is that PCs are mainly utilising AF in a very inefficient way and applying it across everything, but it's up to the console developers to pick and choose the textures and AF levels selectively.

(Note: I meant Dave, not Mint.) Maybe Dave can clarify? We learned in 2005 that it did not have a "level" of AF (e.g. 2x, 4x, 8x, 16x) but was set per surface by the developer.

What about anisotropic filtering? What would be the standard and what other levels of anisotropic filtering will developers be able to achieve in real world situations?

Todd Holmdahl: We support a custom adaptive anisotropic filtering. So, the "level" is not really relevant here.

Anyhow, browsing the old thread, a comment from ERP caught my eye which may explain some of what is going on:

ERP said:
What doesn't seem like a significant cost on a PC can be a dramatic cost on a console. PCs are rarely pushing the envelope graphically and the devs usually just let the users decide. On a console you pick a framerate and you trade things off to make it work.

Devs are making decisions to trade off polygon counts, texture layers, shader complexity, rendering features etc etc etc.

...

It should be noted that it's also something easy to turn off, and if you are having performance issues and don't have time to track down what they are, it's an easy switch to throw to ship your game.

...

I've never heard anyone complain about the excessive aniso cost on Xenos

As I mentioned above, some PS3 games lacked AF as well, yet we know G71 handles AF very well. So performance may be an issue, but it may be a bit broader of an issue as well.

kyleb · Jun 3, 2006

Dave's comment states that "it's up to the console developers to pick and choose the textures and AF levels selectively" in respect to the previous portion of that sentence, which states "PCs are mainly utilising AF in a very inefficient way and applying it across everything." That implies that selective AF is more efficient, which it is; but the statement does nothing to suggest that it cannot be enabled globally.

As for Todd Holmdahl's comment, past the marketing speak, I read that to say:

Unlike the 4xAA spoken of in your last question, which we (falsely) claim will be required, developers can use AF how they choose (not that they necessarily will at all).

Inane_Dork · Jun 3, 2006

kyleb said:
...but the statement does nothing to suggest that it cannot be enabled globally.

Look, if your programming department cannot figure out how to specify AF for every texture filter, you should fire them. You don't need a master switch.

kyleb · Jun 3, 2006

Note the difference between not needing and not being an option.

Shifty Geezer · Jun 4, 2006

Acert93 said:
Anyhow, browsing the old thread, a comment from ERP caught my eye which may explain some of what is going on:
...
As I mentioned above, some PS3 games lacked AF as well, yet we know G71 handles AF very well. So performance may be an issue, but it may be a bit broader of an issue as well.

Which ties in with my original query - what could be added to consoles to enable AF as standard? In the same way Xenos has eDRAM to provide AA and transparent fill, is there a technique/system that could be added to a console to add AF without adding considerable cost and without taking away from the rest of the shading performance?

Nemo80 · Jun 4, 2006

All I can see is that GRAW single-player has no AF but has AA, while multiplayer has AF (although with extremely ugly optimized patterns) but no AA (and no HDR). So I guess it definitely is a performance issue - at least on Xenos.

Dave Baumann · Jun 4, 2006

kyleb said:
Dave's comment states that "it's up to the console developers to pick and choose the textures and AF levels selectively" in respect to the previous portion of that sentence, which states "PCs are mainly utilising AF in a very inefficient way and applying it across everything." That implies that selective AF is more efficient, which it is; but the statement does nothing to suggest that it cannot be enabled globally.

The point being that with PC games you have control panel overrides and more options in game these days - if you are using the control panel then it's just applying the hardware AF mechanism across every texture. PC developers may choose to implement AF correctly and actually set it on certain textures, as was the case with Doom 3, but that appears not to be too common at the moment.

Console titles don't have the options of PC titles, as they are sticking to a fixed platform and the developers are in control of what goes in and what goes out for the performance they are targeting - it's not wise to enable global AF because it will cost performance and in many cases it just won't make any appreciable difference, so it's up to the developers to pick which textures it should or shouldn't be enabled on and also consider the performance implications. Although NV2A had decent AF, the performance wasn't there until this generation, so it's just another thing that developers are going to need to pick up and learn how to utilise effectively (along with massive leaps in shader power in relation to fill-rate, tiling, etc.).
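To put rough numbers on the global-versus-selective trade-off, here is a toy cost model. Everything in it is hypothetical: the material list, the screen-coverage fractions, the `grazing` flag and the 8x level are made up for illustration. It counts worst-case trilinear probes (two bilinear samples each) and ignores the fact that adaptive AF hardware already takes fewer probes on surfaces facing the camera, so it overstates the cost of the global setting.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Material:
    name: str
    coverage: float  # hypothetical fraction of shaded pixels that sample this texture
    grazing: bool    # hypothetical flag: usually seen at steep angles (floors, roads, walls)


def avg_bilinear_per_pixel(materials: List[Material], aniso_for: Callable[[Material], int]) -> float:
    """Worst-case bilinear samples per pixel with trilinear filtering plus N:1 AF."""
    return sum(m.coverage * 2 * aniso_for(m) for m in materials)


SCENE = [
    Material("ground", 0.40, grazing=True),
    Material("walls", 0.20, grazing=True),
    Material("characters", 0.25, grazing=False),
    Material("sky", 0.15, grazing=False),
]


def global_af(m: Material) -> int:
    return 8  # control-panel style: same level on every texture


def selective_af(m: Material) -> int:
    return 8 if m.grazing else 1  # developer picks which textures get AF


if __name__ == "__main__":
    print("global:   ", avg_bilinear_per_pixel(SCENE, global_af))
    print("selective:", avg_bilinear_per_pixel(SCENE, selective_af))
```

With these made-up numbers the selective policy needs about 35% fewer bilinear samples than 8x everywhere; the real saving depends entirely on how much of the screen is at an angle where AF is visible.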

Shifty Geezer said:
Which ties in with my original query - what could be added to consoles to enable AF as standard? In the same way Xenos has eDRAM to provide AA and transparent fill, is there a technique/system that could be added to a console to add AF without adding considerable cost and without taking away from the rest of the shading performance?

Endless bandwidth, unbounded sampling capabilities and gobs of texture cache, at a stab...

More of those elements is really the only way to make it cheaper. However, these are silicon costs for diminishing returns - there's a lot more texture information that goes in than you'll actually notice as requiring high-quality filtering. You'll end up wasting an awful lot of silicon for a fairly limited use.

The flip side, and one that I think is more pertinent right now, is not what we need more of, but how to make better use of the resources that are now there. For instance, as we become more shader limited, the cost of higher-quality filtering becomes lower. There are a number of operations that have previously been done via texture lookups but can be achieved through shaders instead, which provides the benefits of reduced bandwidth overhead and more sampling capability for other texture operations. Given that Xenos and RSX have such large leaps in shader capabilities in relation to their local bandwidths (even in comparison to current PC graphics processors), developers aren't likely to be utilising that part of the chip fully yet - at least not on first-gen titles (X800, which was in the initial dev kits, had a 1:1 texture:shader ratio; Xenos has a 1:3 ratio).

I'd say that nothing is likely to significantly change: a) because it's a waste for the majority of sampling that's actually required, and b) because there is still plenty to be done on the development side (utilising it on the correct textures, making more use of the shader hardware that's available).

Nemo80 said:
All I can see is that GRAW single-player has no AF but has AA, while multiplayer has AF (although with extremely ugly optimized patterns) but no AA (and no HDR). So I guess it definitely is a performance issue - at least on Xenos.

AFAIK the single and multiplayer elements were done by different dev teams.

Nemo80 · Jun 4, 2006

Dave Baumann said:
AFAIK the single and multiplayer elements were done by different dev teams.

Yes, but do you really think they did two different engines for one game? That's extremely unlikely. It seems more like a downgrade of gfx to retain better multiplayer performance (similar to GOW).

supervegeta · Jun 4, 2006

Shifty Geezer said:
The subject of AF appears every now and again, especially regarding a general lack thereof in XB360 screenshots, as they're the most prolific source of screenshots at the mo'. It's said of Red Steel for Wii that it'll have 8xAF. It's also been suggested by some that PS3 is showing AF where XB360 isn't. Now I am not saying this is the case (no XB360 vs. PS3 rubbish, thanks), but, if so, whether on PS3 or Wii, what hardware trickery can be used to get over the massive texture BW demands of AF? Are we looking at large texture caches on the GPU, perhaps? Hasn't as much been suggested of RSX? Is BW the major limiting factor, and are there any other special devices that could be used to add this IQ enhancement?

Since Lost Planet doesn't have any blurry textures in the distance, and doesn't have any jaggies either, I guess you'll have to put this question to Capcom.

A few examples of Lost Planet images with no blurred textures in the distance:

http://media.teamxbox.com/games/ss/1369/full-res/1140223372.jpg

http://media.teamxbox.com/games/ss/1369/full-res/1145036145.jpg

Dave Baumann · Jun 4, 2006

Nemo80 said:
Yes, but do you really think they did two different engines for one game? That's extremely unlikely. It seems more like a downgrade of gfx to retain better multiplayer performance (similar to GOW).

They may have used two different builds, I don't know. However, the different dev teams will have different targets and priorities in mind, as well as different amounts they need to achieve in the timescale. The use of AA in one mode does not preclude the use of AF - AA on Xenos is mainly a vertex cost operation, entirely separate from texturing, and the more it's limited at this end of the pipeline, the less it is going to be texture limited.

Shifty Geezer · Jun 4, 2006

supervegeta said:
A few examples of Lost Planet images with no blurred textures in the distance:

I get the impression these aren't actual in-game shots but promo shots (with touchup/artificially high IQ), unless XB360 really is rendering this

http://media.teamxbox.com/games/ss/1369/full-res/1134401028.jpg

at 2560x1440 pixels with lots of AA and AF...

supervegeta · Jun 4, 2006

Shifty Geezer said:
I get the impression these aren't actual in-game shots but promo shots (with touchup/artificially high IQ), unless XB360 really is rendering this

http://media.teamxbox.com/games/ss/1369/full-res/1134401028.jpg

at 2560x1440 pixels with lots of AA and AF...

But I have the demo, and it looks exactly like that, despite the resolution.

Fafalada · Jun 4, 2006

Ben Skywalker said:
The GS was imbalanced horribly in terms of its raw fill compared to what it could, by typical standards, do with it.

Imbalanced how? Software used that raw fill to good advantage/effect - imbalanced would imply something else in the design made it impossible to utilize it.

Also, is it not likely that the RSX will need to fully emulate the GS? I would assume (though I don't know for certain) that they would want to have a comparable number of pipelines.

Not of any importance whatsoever.
There are a number of things on GS that cost a couple of cycles but typically take hundreds on other chips; I don't see how any consumer chip currently available could do 1:1 GS emulation at full speed.
Either you need some kind of workaround for GS-friendly stuff (which, if possible, would involve some kind of dynamic recompilation of display lists), or you just accept that you will be incompatible with some stuff (this might be an option if PS3 were an MS machine).
In either case, matching stuff like the number of pixel pipelines is obviously irrelevant.

Anyway, in normal rendering 8 ROPs @ 550 MHz has roughly 5x higher fill than GS; if emulation isn't possible with that, I don't see how 16 would make a difference.

I'm not trying to say that it will or won't happen, but I don't see why they would cripple a chip they can easily build by cutting its transistor count by so much.

So I take it you consider Xenos to be crippled as well?

(as I have yet to see them design anything balanced to date for their consoles).

I couldn't disagree more - but then what do I know about internals of Sony consoles

Acert93 · Jun 4, 2006

Nemo80 said:
All I can see is that GRAW single-player has no AF but has AA, while multiplayer has AF (although with extremely ugly optimized patterns) but no AA (and no HDR). So I guess it definitely is a performance issue - at least on Xenos.

You said this in the other thread and were shown to be wrong. Why repeat it? Here are some MP shots that clearly show some anti-aliasing:

Shot 1
Shot 2
Shot 3
Shot 4
Shot 5

As nice as 2xMSAA or 4xMSAA may be, they still don't remove all the jaggies. This demonstration (bottom of the page) shows how even 8x MSAA is not enough in all situations. The above GRAW shots, while still having some minor aliasing, do appear to have some MSAA applied.

Nemo said:
Yes, but do you really think they did two different engines for one game? That's extremely unlikely.

GRAW MP was developed on the GR2 engine, and once the gameplay was how they wanted it, "artists went into production and maximized the power of the Xbox 360 to get the coolest graphics." We do know that the MP was co-developed by another team.

scooby_dooby · Jun 5, 2006

Yes, Red Storm designed the MP engine and the co-op maps for the campaign mode.

Anyone who'd played GRAW would realise these were two completely different engines.

Mintmaster · Jun 5, 2006

darkblu said:
Am I reading you correctly here that, all conditions being equal, greater bandwidth would increase the performance hit? How so?

Because on most hardware and most workloads, regular trilinear filtering will benefit more from increased bandwidth than anisotropic filtering. Imagine rendering a single textured object on a single ROP, single TMU architecture. Suppose you need 6 bytes per pixel for colour/z (RGBA8, 4:1 z-compression), and texture data is 1 byte/pix with trilinear and 4 bytes/pix with 2x aniso (I'm purposely handicapping the aniso).

Then with infinite bandwidth on tap, the hardware takes 1 cycle/pix for the trilinear case and 2 cycles/pix for aniso. You get 1 pix/clk for tri, 0.5 pix/clk for aniso. With 4 bytes of BW/clock, you get 0.57 pix/clk for tri, and 0.4 pix/clk for aniso. So the infinite BW scenario has a 50% drop for aniso, and the limited BW scenario has a 30% drop. This is what I was claiming.
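These numbers can be reproduced with a two-limit model: throughput is the lower of the TMU limit (1 / cycles per pixel) and the bandwidth limit (bytes per clock / bytes per pixel). The 6 bytes of colour/z and the 1 vs 4 bytes of texture traffic are the assumptions stated above; the function names are just for illustration.

```python
import math


def pixel_rate(tex_bytes: float, cycles_per_pixel: float,
               fb_bytes: float = 6.0, bw_per_clock: float = math.inf) -> float:
    """Pixels per clock for a single-ROP, single-TMU pipeline."""
    tmu_limit = 1.0 / cycles_per_pixel
    bw_limit = bw_per_clock / (fb_bytes + tex_bytes)  # inf / x stays inf
    return min(tmu_limit, bw_limit)


def aniso_drop(bw_per_clock: float = math.inf) -> float:
    """Fractional throughput lost going from trilinear to the handicapped 2x aniso case."""
    tri = pixel_rate(tex_bytes=1, cycles_per_pixel=1, bw_per_clock=bw_per_clock)
    aniso = pixel_rate(tex_bytes=4, cycles_per_pixel=2, bw_per_clock=bw_per_clock)
    return 1.0 - aniso / tri


if __name__ == "__main__":
    for bw in (math.inf, 4.0):
        tri = pixel_rate(1, 1, bw_per_clock=bw)
        ani = pixel_rate(4, 2, bw_per_clock=bw)
        print(f"BW={bw}: tri={tri:.2f} aniso={ani:.2f} drop={aniso_drop(bw):.0%}")
```

The scarcer the bandwidth, the more the trilinear case is already throttled by memory traffic, so the relative cost of aniso shrinks.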

Now, I suppose there are some math-heavy cases where a decoupled TMU wouldn't take more cycles per pixel, and thus BW per clock would increase. But because you're math heavy, BW shouldn't really matter then. On Xenos there would be a very small chance of a shader saturating BW while aniso is enabled but not with trilinear. On RSX the TMUs aren't decoupled so I think enabling aniso always increases the cycle count (I'm not positive, but this data suggests so).

OK, how about an aniso sampler that produces the sampling coords in a single clock and passes them to multiple isotropic units - how would that not be more bandwidth per fragment?

Since you're using multiple isotropic units, the number of pixels using those isotropic units simultaneously is reduced. In your system, more BW per fragment, but fewer fragments per clock. It's exactly the same as one isotropic unit per pixel being used over multiple cycles.

Either way, it's just one bilinear sample per TMU per clock, regardless of filtering method. The only reason aniso could use more BW per clock is mipmap level (see below).
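The equivalence can be checked with a tiny throughput model. It assumes a pool of texture units that each return one bilinear sample per clock, pixels that each need k samples, and a unit count divisible by k. `ganged=True` stands for the hypothetical sampler proposed above that fans one pixel's k samples out to k isotropic units in a single clock; mip level effects are not modelled.

```python
import math


def texture_throughput(units: int, k: int, pixels: int, ganged: bool):
    """Return (clocks, bilinear samples fetched per clock) for a batch of pixels."""
    if units % k:
        raise ValueError("model assumes units is a multiple of k")
    if ganged:
        # k units cooperate on one pixel, so units // k pixels finish each clock
        clocks = math.ceil(pixels / (units // k))
    else:
        # each unit owns one pixel and spends k consecutive clocks on it
        clocks = math.ceil(pixels / units) * k
    return clocks, pixels * k / clocks


if __name__ == "__main__":
    print(texture_throughput(16, 4, 1024, ganged=True))
    print(texture_throughput(16, 4, 1024, ganged=False))
```

Both arrangements fetch one sample per unit per clock, so the bandwidth per clock is identical; only the grouping of samples into fragments differs.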

Keep in mind that with anisotropy you may easily get much less texel reuse than in a properly mipmapped isotropic case, where texel reuse between two adjacent fragments is 25-50% at non-magnification; count in the rest of the neighbours and you get ~100% reuse (poly edges notwithstanding), which is far from the case with aniso.

If you read my post, I did keep that in mind, as I said I was comparing to a trilinear surface viewed head on. I think it's stupid to compare anisotropic filtering to blurred trilinear. For trilinear filtering you can reduce bandwidth by applying an LOD bias for a lower mipmap, but you don't see devs doing that.

We know Xenos is fine with looking at a surface head on, and the detail in aniso is the same as the detail of a surface viewed head on. So I prefer to compare aniso to head on trilinear, i.e. same mipmap, since it is more "apples-to-apples" in my book.
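Why aniso keeps the head-on level of detail can be sketched with the reference LOD computation from the OpenGL `EXT_texture_filter_anisotropic` specification. With footprint lengths Px and Py (texels covered per pixel along screen x and y), isotropic trilinear picks log2(max(Px, Py)), while aniso takes N = min(ceil(Pmax/Pmin), maxAniso) probes and picks log2(Pmax/N). Real hardware uses its own approximations, so treat this as an illustration rather than how Xenos or RSX compute it.

```python
import math


def trilinear_lod(px: float, py: float) -> float:
    """Isotropic LOD: the mip level is chosen by the longer footprint axis."""
    return math.log2(max(px, py))


def aniso_lod(px: float, py: float, max_aniso: int = 16):
    """Anisotropic LOD and probe count, following the EXT spec's reference formula."""
    pmax, pmin = max(px, py), min(px, py)
    n = min(math.ceil(pmax / pmin), max_aniso)
    return math.log2(pmax / n), n


if __name__ == "__main__":
    # head-on: 1 texel per pixel on both axes
    print(trilinear_lod(1, 1), aniso_lod(1, 1))
    # floor at a glancing angle: 4 texels per pixel along y
    print(trilinear_lod(1, 4), aniso_lod(1, 4))
```

For the tilted floor, trilinear drops two mip levels (quarter resolution, the blurred case), whereas 4x aniso stays on mip 0 like the head-on surface at the cost of four probes.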
