Is AF a bottleneck for Xenos?

"and from talk on this forum actually writing for tiling isn't too hard as long as you design for that from the beginning."

Yeah, but a lot of games were ports and/or were developed under very tight time constraints. I'm willing to bet that the majority of games were started on PCs before dev kits were available, and the code was then ported. Some games, like Condemned, have all the good next-gen features. I think it's simply a case of first-gen titles and not a problem with the hardware.
 
I think it's relevant: if a dev did not build their engine for tiling but still wanted to add AA, there would be some slowdown with AF turned on due to potential bandwidth issues for these effects, so most probably kept AA and left out AF. I just don't think it's correct to fully judge the capability of the 360 based on 1st-gen titles produced under serious time constraints.
 
swanlee said:
I think it's relevant: if a dev did not build their engine for tiling but still wanted to add AA, there would be some slowdown with AF turned on due to potential bandwidth issues for these effects, so most probably kept AA and left out AF.

Even if you don't use tiling, any bandwidth consumption for AA is still limited to the daughter die. You wouldn't be suddenly trading off main memory bandwidth for AA versus AF.
 
Shifty Geezer said:
in this particular query about AF on XB360, I don't think 'learning to use Xenos properly' can apply unless it does AF in a very weird and unconventional way.

Shifty, did you see my post? I'm curious what you make of this. PR BS?
 
That statement just says they have adaptive AF (per object, rather than global like on the PC), so you can't call the AF level per game 4x or 16x. Hence they can't attribute a standard requirement for AA, as it depends on the title and the scene, which answers the interviewer's question. It gives no insight into how AF is achieved, what the demands are, or how that eats into other resources, and it doesn't help explain why AF isn't featuring in these early titles.
 
Titanio said:
For the sake of education, can someone derive the 8GB/s figure? :???: Is that just for 2xAF? The calculations I had seen before don't check out with that.
...

There are a couple of ways you can look at this. You can break it down into b/w usage per frame and resolution, or just look at 16 TMUs being utilised 100% for the worst case with its peak texel fillrate. E.g. for 2xAF (16 samples, 32bit textures),

16 TMUs x 0.5 GHz x 16 samples per texel x 4 bytes per sample

~ 512 GB/sec for uncompressed textures

for 1:8 texture compression,

~ 64 GB/sec
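
For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the same worst-case calculation. The function name and its parameters are hypothetical, and it uses only the figures from this post (16 TMUs, 0.5 GHz, 16 samples per texel, 4 bytes per sample, 1:8 compression). It assumes every TMU is 100% busy and says nothing about what a texture cache would save.

```python
def peak_texel_bandwidth_gb(tmus, clock_ghz, samples_per_texel, bytes_per_sample):
    """Worst-case texel traffic in GB/sec, assuming every TMU is 100% busy."""
    return tmus * clock_ghz * samples_per_texel * bytes_per_sample

# Figures taken from the post: 2xAF treated as 16 samples, 32-bit textures.
uncompressed = peak_texel_bandwidth_gb(16, 0.5, 16, 4)
compressed = uncompressed / 8  # 1:8 texture compression

print(uncompressed)  # 512.0
print(compressed)    # 64.0
```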

Titanio said:
I'm not saying tiling is trivial, I'm wondering if it is relevant to AF, or how relevant it

Overlapping triangles from tiling will need to be shaded again, therefore AF applied again and b/w usage again. The hit will depend on triangle sizes and number of tiles...

EDIT:

Left out colour at 4 bytes/sample
 
Jaws said:
Overlapping triangles from tiling will need to be shaded again, therefore AF applied again and b/w usage again. The hit will depend on triangle sizes and number of tiles...

Thanks for the calcs Jaws!

I didn't know before that redundant polygon processing went all the way to shading of those polygons, but if that's the case, this also makes sense. But even amongst some games that don't tile at all, we've seen poor texture filtering. I guess I should revise my statements to say "it would be disappointing if X360 games could not more typically afford the cost of global AF + the (small?) extra cost of AF incurred when processing redundantly while tiling" ;)
 
It doesn't go as far as Pixel Shading (hence applying textures), because once transformed and projected into screen (tile) space pixels that are outside of that tile will be clipped and not rendered.
 
No, a triangle that occupies more than one tile does not lead to any pixels being shaded twice.

Such a triangle is clipped to each tile, so there's no overlap.
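
To illustrate the clipping point, here is a rough Python sketch. It uses an axis-aligned rectangle as a stand-in for a triangle and a hypothetical three-band split of a 1280x720 target; none of the names or tile sizes come from the hardware. Since the tiles don't overlap, clipping the primitive to each tile in turn means every covered pixel is shaded exactly once.

```python
def clip_rect(rect, tile):
    """Intersect two (x0, y0, x1, y1) rectangles; x1/y1 are exclusive."""
    x0, y0 = max(rect[0], tile[0]), max(rect[1], tile[1])
    x1, y1 = min(rect[2], tile[2]), min(rect[3], tile[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def shade_counts(rect, tiles):
    """Count how many times each pixel of rect is shaded across all tiles."""
    counts = {}
    for tile in tiles:
        clipped = clip_rect(rect, tile)
        if clipped is None:
            continue  # primitive does not touch this tile
        for y in range(clipped[1], clipped[3]):
            for x in range(clipped[0], clipped[2]):
                counts[(x, y)] = counts.get((x, y), 0) + 1
    return counts

# Hypothetical 1280x720 target split into three horizontal bands.
tiles = [(0, 0, 1280, 240), (0, 240, 1280, 480), (0, 480, 1280, 720)]
primitive = (100, 200, 140, 500)  # 40x300 pixels, spans all three tiles

counts = shade_counts(primitive, tiles)
print(len(counts), max(counts.values()))  # 12000 1
```

The geometry is still submitted once per tile it touches, so the redundant cost is in vertex work, not in pixel shading or texturing.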

As for texel bandwidth - any one texel will be re-used multiple times in the shading of multiple pixels. That's what texture cache is for. If those bandwidth calculations were meaningful then AF would still be a distant dream.
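
A toy Python sketch of that texel re-use, under assumptions that are mine and not anything known about Xenos: plain bilinear filtering, a 1:1 pixel-to-texel mapping, and sample points sitting on texel corners so that each fetch touches a 2x2 footprint. It only counts how many texel reads a block of pixels issues against how many distinct texels those reads cover.

```python
def bilinear_footprint(px, py):
    """The 2x2 texels a bilinear fetch touches for pixel (px, py).

    Assumes a 1:1 pixel-to-texel mapping with sample points on texel
    corners, so each fetch straddles four texels.
    """
    return [(px + dx, py + dy) for dy in (0, 1) for dx in (0, 1)]

def texel_reuse(block):
    """Return (total fetches, unique texels) for a block x block pixel patch."""
    fetched = [t for y in range(block) for x in range(block)
               for t in bilinear_footprint(x, y)]
    return len(fetched), len(set(fetched))

total, unique = texel_reuse(8)
print(total, unique)  # 256 81
```

For an 8x8 block that is 256 reads covering only 81 distinct texels, which is the kind of overlap a texture cache exploits. Real AF footprints and cache behaviour are more complicated than this.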

Jawed
 
Jawed said:
As for texel bandwidth - any one texel will be re-used multiple times in the shading of multiple pixels. That's what texture cache is for. If those bandwidth calculations were meaningful then AF would still be a distant dream.

True.

I thought it didn't go all the way down to pixel shading, which I guess leaves us back at square one as far as the relevancy of tiling is concerned (to AF). Thanks to both of you for the clarification.
 
Dave Baumann said:
It doesn't go as far as Pixel Shading (hence applying textures), because once transformed and projected into screen (tile) space pixels that are outside of that tile will be clipped and not rendered.

Jawed said:
No, a triangle that occupies more than one tile does not lead to any pixels being shaded twice.

Such a triangle is clipped to each tile, so there's no overlap.

Okay, this makes more sense.

Jawed said:
As for texel bandwidth - any one texel will be re-used multiple times in the shading of multiple pixels.

Yep, obviously, the purpose of the texture cache is to reduce b/w demand.

Jawed said:
If those bandwidth calculations were meaningful then AF would still be a distant dream.

It's as meaningful as ATI stating 256 GB/sec for framebuffer b/w for Xenos. AA would be a distant dream for anything less. Point being, as already stated, worst case...
 
Titanio said:
I thought it didn't go all the way down to pixel shading, which I guess leaves us back at square one as far as the relevancy of tiling is concerned (to AF). Thanks to both of you for the clarification.

Tiling is a concern because a complex rendering engine will lose great chunks of general performance if the dev simply flicks the switch to turn on automatic tiling, e.g. as render targets get generated three times instead of once. Automatic tiling is not a panacea as far as I can make out, unless the engine is simple or doesn't demand huge amounts of performance.

Ultimately predicated tiling is more workload for devs - in some ways it's analogous to the "enforced DMA with double-/triple-buffering" style of working that goes along with SPE memory management.
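
As a rough guide to how many tiles a framebuffer needs, here is a small Python sketch. The assumptions are mine and are not stated in this thread: 10 MB of EDRAM, 4 bytes of colour plus 4 bytes of Z/stencil per sample, and a 1280x720 target. It also ignores any tile alignment rules, so treat the results as approximate.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # assumption: 10 MB of EDRAM

def tiles_needed(width, height, aa_samples, bytes_per_sample=8):
    """Tiles needed to fit colour + Z (assumed 4 + 4 bytes per sample) in EDRAM."""
    framebuffer_bytes = width * height * aa_samples * bytes_per_sample
    return math.ceil(framebuffer_bytes / EDRAM_BYTES)

for aa in (1, 2, 4):
    print(aa, tiles_needed(1280, 720, aa))
# 1 1
# 2 2
# 4 3
```

Under those assumptions, 720p with 4xAA comes out at three tiles, which would fit the "three times instead of once" remark above if that is what it refers to.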

Jawed
 
Jawed said:
Tiling is a concern because a complex rendering engine will lose great chunks of general performance if the dev simply flicks the switch to turn on automatic tiling, e.g. as render targets get generated three times instead of once. Automatic tiling is not a panacea as far as I can make out, unless the engine is simple or doesn't demand huge amounts of performance.

Ultimately predicated tiling is more workload for devs - in some ways it's analogous to the "enforced DMA with double-/triple-buffering" style of working that goes along with SPE memory management.

This is all true, but unless the impact on performance of "automatic tiling" relates to the same bound as that which holds back AF (whatever that is), is it directly relevant to AF?
 
Jaws said:
There are a couple of ways you can look at this. You can break it down into b/w usage per frame and resolution, or just look at 16 TMUs being utilised 100% for the worst case with its peak texel fillrate. E.g. for 2xAF (16 samples, 32bit textures),

16 TMUs x 0.5 GHz x 16 samples per texel x 4 bytes per sample

~ 512 GB/sec for uncompressed textures

for 1:8 texture compression,

~ 64 GB/sec

AFAIK, the Xenos GPU's texture units are not capable of taking more than 4 samples per texel per clock per texture unit (+1 for a second, unfiltered texture IIRC) - 2xAF for 16 pixels will take 4 cycles, not 1.

Also, the number you get this way is NOT the bandwidth from memory to TMUs but instead bandwidth from texture-cache to the TMUs. When doing bi/tri-linear filtering, the texels that you need to access to filter adjacent pixels usually overlap to a large degree and thus will benefit greatly from caching (the only way to avoid this overlap is to grossly undersample the texture, which is not even close to possible with trilinear unless you set a large negative LOD bias). For the usual trilinear-aniso scenario (assuming no LOD bias), the texture cache will normally reduce the texture bandwidth needed by a factor of roughly 2.5 to 3.
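
Putting the two corrections into numbers, as a hedged Python sketch. The constants are the figures quoted in this thread (16 TMUs, 0.5 GHz, 4 bytes per sample, at most 4 samples per clock per TMU, cache saving of roughly 2.5x to 3x). Applying the cache factor directly to the peak cache-to-TMU rate is my own simplification, and texture compression is left out entirely.

```python
import math

SAMPLES_PER_CLOCK_PER_TMU = 4   # limit stated above (AFAIK)
TMUS = 16
CLOCK_GHZ = 0.5
BYTES_PER_SAMPLE = 4

def cycles_per_pixel(samples_needed):
    """Clocks one TMU spends on a pixel needing samples_needed samples."""
    return math.ceil(samples_needed / SAMPLES_PER_CLOCK_PER_TMU)

# Peak cache-to-TMU rate in GB/sec, capped at 4 samples per clock per TMU.
cache_to_tmu_gb = TMUS * CLOCK_GHZ * SAMPLES_PER_CLOCK_PER_TMU * BYTES_PER_SAMPLE

# Memory-side traffic if the cache cuts demand by a factor of 2.5 to 3.
memory_side_gb = [cache_to_tmu_gb / factor for factor in (2.5, 3.0)]

print(cycles_per_pixel(16))                   # 4
print(cache_to_tmu_gb)                        # 128.0
print([round(v, 1) for v in memory_side_gb])  # [51.2, 42.7]
```

So on these numbers the 16-sample case costs 4 clocks per pixel, and the peak figure that matters for main memory is well below the 512 GB/sec worked out earlier.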
 
Jaws said:
It's as meaningful as ATI stating 256 GB/sec for framebuffer b/w for Xenos. AA would be a distant dream for anything less. Point being, as already stated, worst case...

That comparison is only meaningful if you have one GPU with EDRAM and another without. It would be like one GPU with texture caching and another without, though the difference afforded by EDRAM wouldn't be as extreme.

Your workings for filtering bandwidth consumption are completely irrelevant. You might as well count bus interface pins for all the good it'll do you.

Your only hope is to try to assess the cache architecture, the pipeline architecture and their quantities. On top of that you have the out of order scheduling in Xenos, which adds another layer of intractability.

We don't even know if Xenos has an L1-only or an L1/L2 cache structure (I happen to think the latter).

Jawed
 
Titanio said:
This is all true, but unless the impact on performance of "automatic tiling" relates to the same bound as that which holds back AF (whatever that is), is it directly relevant to AF?

It could be relevant if you make your render engine so solidly ALU-bound that texturing performance itself suffers.

You can see this effect when comparing R520 and R580 performance. At the same clock speeds, R580 can perform texturing 15-50% faster than R520, simply because it's not ALU-bound while R520 is.

That's an extreme scenario. There's been talk in this thread already of games that seemingly aren't using tiling and which show no AF. Though it's worth remembering that just because a game doesn't have AA doesn't mean it isn't being forced to use tiling (EDRAM isn't necessarily enough to support the backbuffer + render targets, e.g. realtime cubemaps for reflections). But, again, I don't know how likely that scenario is.

Jawed
 