Is AF a bottleneck for Xenos?

Titanio said:
Well, if you can give us a better potential explanation as to why it's just not being used in many games, do tell us?

I'm surprised myself because the theoretical bandwidth requirement I've seen quoted isn't THAT high. Which is why I'm wondering if things are so tight in terms of bw that even a small amount extra would be the straw that broke the camel's back in many of these cases. The cost could be small, yet still too much, if BW was sufficiently scarce.
Bandwidth is not the issue, and has never been the issue. Especially on Xenos.

I already gave an explanation. It costs clock cycles. If a triangle is sufficiently tilted with respect to the camera (e.g. a road, far away), Xenos can only render these single-textured pixels at a rate of 500 MPix/sec. Add more textures, include overdraw, etc. and it can be a problem. Still, AF should be entirely usable by developers, and the cost of enabling AF on the base texture should be negligible at 720p.

My guess is it's just not a priority for console developers. Enabling it globally usually isn't a good idea. In Test Drive, for example, enabling it only for the base texture in the road for the final rendering of the scene (and not during cube-map rendering or post processing) should make a big difference with little performance impact.

BTW, this is one area that RSX should have a theoretical advantage over Xenos because its texturing rate is 65% higher, but that depends on a number of factors (and there is data out there to suggest that this may not be the case).
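For anyone who wants to check the 65% figure, here is a rough sketch of the arithmetic. The unit counts and clocks are assumptions taken from commonly quoted specs (16 texture units at 500 MHz for Xenos, 24 at an announced 550 MHz for RSX), not numbers from this thread, and peak rates say nothing about real-world efficiency.

```python
# Peak bilinear texel rate = texture units x core clock.
# Unit counts and clocks are assumed public specs, not measured values.
XENOS_TMUS, XENOS_MHZ = 16, 500
RSX_TMUS, RSX_MHZ = 24, 550  # announced clock; assumption


def texel_rate_mtex(tmus, mhz):
    """Peak bilinear samples per second, in millions."""
    return tmus * mhz


xenos = texel_rate_mtex(XENOS_TMUS, XENOS_MHZ)
rsx = texel_rate_mtex(RSX_TMUS, RSX_MHZ)
advantage = rsx / xenos - 1.0

print(f"Xenos: {xenos} MTex/s, RSX: {rsx} MTex/s")
print(f"RSX theoretical advantage: {advantage:.0%}")
```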
 
Mintmaster said:
Bandwidth is not the issue, and has never been the issue. Especially on Xenos.

I already gave an explanation. It costs clock cycles. If a triangle is sufficiently tilted with respect to the camera (e.g. a road, far away), Xenos can only render these single-textured pixels at a rate of 500 MPix/sec. Add more textures, include overdraw, etc. and it can be a problem. Still, AF should be entirely usable by developers, and the cost of enabling AF on the base texture should be negligible at 720p.

My guess is it's just not a priority for console developers. Enabling it globally usually isn't a good idea. In Test Drive, for example, enabling it only for the base texture in the road for the final rendering of the scene (and not during cube-map rendering or post processing) should make a big difference with little performance impact.

Well finally, there's a decent, different explanation, thank you! :LOL:

I'd be kind of surprised if it were a computational thing, but as you explain it, it seems plausible.

It would be preferable if Xenos could typically handle global AF rather than relying on developers to apply it selectively, as most appear not to bother doing that. And the effectiveness of selective application may be limited depending on how close they are to the cross-over point to unacceptable performance (i.e. how much they can use it could be quite limited... but yeah, in a racer's example, the road ought to be a top priority at least).
 
aaaaa00 said:
ERP already explained this.

If you're running behind schedule, and you've got performance problems, and you don't have time to figure out what the real problem is and fix it, you just start turning things off to make your ship date.

AF is one easy thing to turn off, and even if it gets you just 1% back, that's 1% closer to your ship criteria than you were before.
Thank you for boiling it down.

I'm sure the future is bright for adding these features (if found necessary) with more time on Xenos.

Of course, I don't play screenshots and would hardly notice this in the heat of a game. :p

Which may explain:
Originally Posted by Mintmaster
My guess is it's just not a priority for console developers.
 
Mintmaster said:
Bandwidth is not the issue, and has never been the issue. Especially on Xenos.

Maybe framebuffer B/W, but I wouldn't say texture B/W is "never" an issue, especially sharing it with XeCPU...

I already gave an explanation. It costs clock cycles. If a triangle is sufficiently tilted with respect to the camera (e.g. a road, far away), Xenos can only render these single-textured pixels at a rate of 500 MPix/sec. Add more textures, include overdraw, etc. and it can be a problem. Still, AF should be entirely usable by developers, and the cost of enabling AF on the base texture should be negligible at 720p.

Isn't that the whole point of fully decoupling the TMUs and having out of order shading/ texturing and dynamic load balancing so that those extra cycles don't become an issue?

Btw, please could you explain the 500 Mpixel/sec number.

...
My guess is it's just not a priority for console developers. Enabling it globally usually isn't a good idea...

Maybe, but we'll have to wait and see if Rev and PS3 devs have the same mentality...
 
Jaws said:
Maybe framebuffer B/W, but I wouldn't say texture B/W is "never" an issue, especially sharing it with XeCPU...
We're talking about AF here, Jaws. Look at the topic title, look at the context of the conversation. In those calculations, 8GB/s is all the rendering bandwidth needed from main memory. All the rest is there for XeCPU and vertex traffic (which shouldn't be much when AF limited).

Isn't that the whole point of fully decoupling the TMUs and having out of order shading/ texturing and dynamic load balancing so that those extra cycles don't become an issue?

Btw, please could you explain the 500 Mpixel/sec number.
Jaws, you have to lay off your excitement for this decoupled architecture (as displayed when R520 was released). It doesn't bring you any miracles. All it does is let you do arithmetic ops in parallel if you need to, and the "ultra-threading" can avoid data dependency issues. If you have 16 texture units, then you have 16 texture units. Period.

If you need to do 16 filtered texture accesses per pixel for an extremely angled surface, then Xenos can only put out a net of 1 pixel per clock. Hence 500Mpix/s.
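Spelled out as a quick sketch: the 16 texture units and the 1 pixel per clock worst case are from the post above, the 500 MHz clock is what that implies, and the 60 fps target is purely an assumption for illustration.

```python
TEXTURE_UNITS = 16   # from the post above
CLOCK_MHZ = 500      # implied by 1 pixel/clock -> 500 Mpix/s


def pixel_rate_mpix(samples_per_pixel):
    """Texture-limited pixel rate in Mpix/s for single-textured pixels."""
    return TEXTURE_UNITS * CLOCK_MHZ / samples_per_pixel


# Worst case: 16 filtered accesses per pixel on a steeply angled surface.
worst_case = pixel_rate_mpix(16)

# Pixels per second needed to cover a 720p screen once per frame.
# 60 fps is an assumed target, not a figure from the thread.
needed = 1280 * 720 * 60 / 1e6

# How many full-screen worst-case AF layers fit in that budget.
layers = worst_case / needed

print(f"worst-case AF rate: {worst_case:.0f} Mpix/s")
print(f"720p at 60 fps needs {needed:.1f} Mpix/s per full-screen layer")
print(f"roughly {layers:.1f} such layers per frame")
```

That is only a ceiling for the texture units. Extra textures, overdraw and everything else in the frame eat into the same budget, which is why one AF'd base texture looks cheap but global AF may not be.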
 
Jaws said:
Maybe framebuffer B/W, but I wouldn't say texture B/W is "never" an issue, especially sharing it with XeCPU...
As has been explained numerous times in the thread, he's saying it's not a bandwidth issue because AF itself isn't really a bandwidth-intensive operation: it's very cache efficient and, much of the time, it's just going to be cycling through taking samples directly from the texture cache.

Isn't that the whole point of fully decoupling the TMUs and having out of order shading/ texturing and dynamic load balancing so that those extra cycles don't become an issue?
Yes. If you're backed up by pixel fillrate, vertex shading, pixel shader operations or any operation that isn't texturing, then texture operations become that much cheaper, especially AF, which is cache efficient. I think that Mintmaster was probably talking about a hypothetical texture-limited situation (Note: I also suspect that current titles are probably quite texture limited because they aren't making full use of the ALU horsepower yet, as the ratio of math to texture ops is the highest seen on any graphics hardware until X1900, and developers, especially console developers, aren't used to that yet - there may be plenty of mileage left in switching some texture ops over to math ops).
 
Mintmaster said:
We're talking about AF here, Jaws. Look at the topic title, look at the context of the conversation. In those calculations, 8GB/s is all the rendering bandwidth needed from main memory. All the rest is there for XeCPU and vertex traffic (which shouldn't be much when AF limited).

DaveBaumann said:
As has been explained numerous times in the thread, he's saying it's not a bandwidth issue because AF itself isn't really a bandwidth-intensive operation: it's very cache efficient and, much of the time, it's just going to be cycling through taking samples directly from the texture cache.

Sorry, but I have issues with sweeping, declarative statements like "never". If it were rephrased as "shouldn't be an issue", then that's different.

Mintmaster said:
Jaws, you have to lay off your excitement for this decoupled architecture (as displayed when R520 was released). It doesn't bring you any miracles. All it does is let you do arithmetic ops in parallel if you need to, and the "ultra-threading" can avoid data dependency issues. If you have 16 texture units, then you have 16 texture units. Period.

My excitement's in check, thank you. Call it hype generated on this board...

DaveBaumann said:
Yes. If you're backed up by pixel fillrate, vertex shading, pixel shader operations or any operation that isn't texturing, then texture operations become that much cheaper, especially AF, which is cache efficient. I think that Mintmaster was probably talking about a hypothetical texture-limited situation (Note: I also suspect that current titles are probably quite texture limited because they aren't making full use of the ALU horsepower yet, as the ratio of math to texture ops is the highest seen on any graphics hardware until X1900, and developers, especially console developers, aren't used to that yet - there may be plenty of mileage left in switching some texture ops over to math ops).

I'll definitely give it the benefit of the doubt. I'd like to see what this architecture can pull off when it's singing...
 
Dave Baumann said:
If you're backed up by pixel fillrate, vertex shading, pixel shader operations or any operation that isn't texturing, then texture operations become that much cheaper, especially AF, which is cache efficient. I think that Mintmaster was probably talking about a hypothetical texture-limited situation (Note: I also suspect that current titles are probably quite texture limited because they aren't making full use of the ALU horsepower yet, as the ratio of math to texture ops is the highest seen on any graphics hardware until X1900, and developers, especially console developers, aren't used to that yet - there may be plenty of mileage left in switching some texture ops over to math ops).
Succinctly put. This was something I had not guessed at the beginning of the thread. Because of the nature of the system, a certain number of ALU ops are "free" per texture fetch.
 
For the sake of education, can someone derive the 8GB/s figure? :???: Is that just for 2xAF? The calculations I had seen before don't check out with that.

It also seems to be a higher number than I would have expected, even peak. That would be a significant chunk of a theoretical 22.4GB/s pipe, IMO, assuming you needed to always accommodate it as a potential worst case.
 
This thread is very informative. I want rep points! :p

Mintmaster said:
My guess is it's just not a priority for console developers.
But half the games up there are from PC developers making non-launch titles.

Titanio said:
For the sake of education, can someone derive the 8GB/s figure? Is that just for 2xAF?

It also seems to be a higher number than I would have expected, even peak. That would be a significant chunk of a theoretical 22.4GB/s pipe, IMO, assuming you needed to always accommodate it as a potential worst case.
QFT.
 
The cache efficiency of aniso isn't THAT high. Given a well-designed texture cache and a Feline-like multipass algorithm, it should be roughly the same as what you get with trilinear when facing the texture straight on (which should lead to texture memory traffic of about 1.5 texels per cycle per bilinear-filtering-texture-mapper-unit, barring any LOD-bias mumbo-jumbo).

For games that use compressed textures or very math-heavy shaders, this is not likely to be a problem. If you use uncompressed textures (so that you can saturate bandwidth on textures alone) then I suspect that Xenos will take a much heavier performance hit with aniso than PC GPUs, mainly because Xenos has a much higher ratio of texture to non-texture traffic than PC GPUs.
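To put rough numbers on that, here is a sketch of peak texture traffic at 1.5 texels per cycle per texture unit. It assumes 16 texture units at 500 MHz all busy every cycle, so it is a ceiling rather than a typical load. The per-texel sizes are the standard ones for these formats (DXT1 at 4 bits, DXT5 at 8 bits, RGBA8 at 32 bits), and 22.4 GB/s is the main memory figure quoted earlier in the thread.

```python
TMUS = 16                        # assumed texture unit count
CLOCK_HZ = 500e6                 # assumed core clock
TEXELS_PER_CYCLE_PER_TMU = 1.5   # estimate from the post above
MAIN_MEMORY_GBS = 22.4           # figure quoted in the thread

# Storage cost per texel for a few common formats.
BYTES_PER_TEXEL = {"DXT1": 0.5, "DXT5": 1.0, "RGBA8": 4.0}


def texture_traffic_gbs(bytes_per_texel):
    """Peak texture fetch traffic in GB/s with every unit busy each cycle."""
    texels_per_sec = TMUS * CLOCK_HZ * TEXELS_PER_CYCLE_PER_TMU
    return texels_per_sec * bytes_per_texel / 1e9


for fmt, size in BYTES_PER_TEXEL.items():
    gbs = texture_traffic_gbs(size)
    share = gbs / MAIN_MEMORY_GBS
    print(f"{fmt}: {gbs:.1f} GB/s peak ({share:.0%} of {MAIN_MEMORY_GBS} GB/s)")
```

Under those assumptions the compressed formats stay well inside the pipe, while uncompressed 32-bit textures could in principle ask for more than main memory can deliver.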
 
tema said:
But half the games up there are from PC developers making non-launch titles.

Most PC developers don't care about Aniso either, at least not to the point of applying it selectively to each texture. I think most developers either let users tweak it through the Control Panel, or just provide a global setting that is applied to all textures.
 
Mintmaster said:
My guess is it's just not a priority for console developers.
Which is just sad, as it is one of the cheapest, easiest and at the same time most effective ways to improve the visual quality of a game.

Giving the artists an AF on/off checkbox for texture layers and adding three lines of code to set the AF level per texture hardly takes even a day. If that's not worth doing, I don't know what is.

The same is true for AA.
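A minimal sketch of the AF checkbox idea, with everything in it hypothetical: the `TextureLayer` type, the pass names and the default level of 8 are made up for illustration and do not come from any real engine or SDK. On a Direct3D 9 style API the returned value would typically end up in the sampler's max-anisotropy state.

```python
from dataclasses import dataclass


@dataclass
class TextureLayer:
    """Hypothetical material layer with an artist-facing AF checkbox."""
    name: str
    use_af: bool = False


def max_anisotropy(layer, render_pass, af_level=8):
    """Pick the anisotropy level for one layer in one render pass.

    AF is only enabled for flagged layers in the final scene pass,
    not for cube-map rendering or post processing.
    """
    if layer.use_af and render_pass == "final":
        return af_level
    return 1  # 1 means no anisotropic filtering


road = TextureLayer("road_base", use_af=True)
detail = TextureLayer("road_detail")

for render_pass in ("cube_map", "final", "post"):
    for layer in (road, detail):
        level = max_anisotropy(layer, render_pass)
        print(f"{render_pass:8s} {layer.name:12s} -> {level}x")
```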
 
nAo said:
Umh..I'm not so sure about that..
Of course it depends on the system whether the cost is prohibitive, but in most cases at least some MSAA is pretty damn cheap.
Code changes are fairly minor as well: enabling AA when creating a render target is simple, and so is downsampling a render target if required. In many cases it is already a big improvement to use it only on the backbuffer/main render target.
I don't know about the tiling required on Xenos, though.

It makes me bang my head on the keyboard when devs don't enable AA even in cases where it is absolutely free. And the difference is just night and day for me.
 
MS has stated that to properly use the eDRAM and get the advertised FREE AA, among other things, the game engine needs to be built for tiling. Early games have either been ports or games built with a very narrow development time in order to reach the launch window. Once devs are given time to really tweak code and build for the Xenos strengths they may need to cut corners a bit.

Considering the time devs were given, the first-gen games on the 360 look really good, and games like Condemned do have AA+HDR+AF and that game
 
swanlee said:
MS has stated that to properly use the eDRAM and get the advertised FREE AA, among other things, the game engine needs to be built for tiling. Early games have either been ports or games built with a very narrow development time in order to reach the launch window. Once devs are given time to really tweak code and build for the Xenos strengths they may need to cut corners a bit.
That's a theory that I used to subscribe to until I read that emulation of tiling was present even on alpha kits (IIRC). There was nothing stopping a new engine being written with predicated tiling in mind, and from talk on this forum, actually writing for tiling isn't too hard as long as you design for that from the beginning.

Also, does the eDRAM have any role to play in AF? I don't think it does as it's a matter for the texturing. So in this particular query about AF on XB360, I don't think 'learning to use Xenos properly' can apply unless it does AF in a very weird and unconventional way.
 
Xmas said:
Of course it depends on the system whether the cost is prohibitive, but in most cases at least some MSAA is pretty damn cheap.
I see your point but I was not referring to MSAA cost per se..(since we were talking about consoles..)
I'm just saying that sometimes it's not as easy as turning a switch on.
 
Shifty Geezer said:
Also, does the eDRAM have any role to play in AF? I don't think it does as it's a matter for the texturing.

I was about to say. I don't think tiling and the apparent implementation issues surrounding that is relevant here.
 