Is AF a bottleneck for Xenos?

Dave Baumann said:
No, it was just a pointless comment. Of course it's not going to perform AF for free; these things will depend entirely on the usage scenario and whether it fits in the performance envelope of the targets the developers have set for the title.
I don't think I need to be educated by you about xenos or anisotropic filtering, it was just a joke, deal with it and calm down, it's just a gpu!
 
I'm wondering if it's a bandwidth issue.

Just a hypothetical...

If Xenon consumed 5GB/s on average from main memory, that would leave 17.4GB/s available to Xenos. An X1800 XT has 48GB/s of bandwidth available to it. In an attempt to normalise Xenos's eDram usage for FSAA, alpha-blending, etc., I'll work with percentages I've seen tossed around from time to time as to what the eDram saves on bandwidth consumption: something between 30% and 40%, so I'll go with 35%.

48GB/s x 0.35 = 16.8GB/s, approximately what the eDram saves.

48GB/s - 16.8GB/s = 31.2GB/s

31.2GB/s would be where the X1800 XT would sit if it had Xenos's eDram. An ATI fellow, Michael Doggett/D-something, noted a while back that Xenos is comparable to the X1800 XT in power, so that's why I used it as a baseline for comparison. Xenos has 22.4GB/s of bandwidth available, which would be 8.8GB/s short of what it takes to feed a GPU of comparable power. If we take into account Xenon's bandwidth consumption, with what I think is a modest 5GB/s, that would leave Xenos with around 17.4GB/s. This would be 13.8GB/s short of the bandwidth available to a GPU of comparable power.

If we say the eDram can be normalized to 50% of the total bandwidth consumption, that would save 24GB/s for the X1800 XT while leaving a need for another 24GB/s. Still assuming Xenon consumes 5GB/s from main memory, this leaves Xenos needing another 6.6GB/s to be fed as readily as the X1800 XT.
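
The arithmetic above can be redone as a small Python sketch. All inputs are the figures assumed in this post (48GB/s for the X1800 XT, 22.4GB/s for the shared bus, a guessed 5GB/s for Xenon, and eDram savings of 35% or 50%). The function name is only for illustration and none of these are measured numbers.

```python
def bandwidth_shortfall(reference_bw, edram_saving, shared_bus_bw, cpu_share):
    """Return (needed, available, shortfall) in GB/s.

    needed:    what the reference GPU would still pull from memory if the
               eDram removed `edram_saving` of its traffic (an assumption).
    available: what is left of the shared bus after the CPU takes its share.
    """
    needed = reference_bw * (1.0 - edram_saving)
    available = shared_bus_bw - cpu_share
    return needed, available, needed - available


if __name__ == "__main__":
    # Assumed figures from the post: X1800 XT 48GB/s, bus 22.4GB/s, CPU 5GB/s.
    for saving in (0.35, 0.50):
        needed, available, short = bandwidth_shortfall(48.0, saving, 22.4, 5.0)
        print(f"saving {saving:.0%}: need {needed:.1f}, "
              f"have {available:.1f}, short {short:.1f} GB/s")
```

With a 35% saving the gap comes out at 13.8GB/s and with 50% at 6.6GB/s, matching the numbers above. Changing `cpu_share` or `edram_saving` shows how sensitive the conclusion is to those two guesses.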

From what I understand both Xenos and the X1800 XT have 16 TMUs. Both utilize hyper-threading, but the X1800 XT's is a little finer grained and it has a more advanced memory subsystem in its ring bus. IIRC the TMUs are decoupled in both GPUs. Dave's article noted the AF has been improved over previous generations in Xenos. I would take that to mean Xenos and R520 over R420/R300 etc., as I think both use the same TMUs.

All things being equal, this would suggest to me that it's a bandwidth issue, if it's an issue at all beyond early games going through their growing pains. Perhaps the extra samples AF requires are problematic on top of those needed for pixel shaders etc. and will require better stewardship of resources if AF is to be used in conjunction with everything else.

I think I was generous with how much value I gave Xenos's eDram and conservative in how much bandwidth Xenon would consume in the system. If anyone would be so kind please correct me if I've been unfair.
 
Shompola said:
I remember a lot of bitching about Xbox games lacking AF... You would think MS would take that into account. But no, lack of AF is still present. I am very disappointed. Lack of AA I could live with, but lack of AF can make an otherwise very good looking game look uneven.

To be fair this is not exactly MS' problem.
The tools are/were there, you just have to turn it on and deal with the cost. Same's true for PS3/X360.

What doesn't seem like a significant cost on a PC can be a dramatic cost on a console. PC's are rarely pushing the envelope graphically and the devs usually just let the users decide. On a console you pick a framerate and you trade things off to make it work.

Devs are making decisions to trade off polygon counts, texture layers, shader complexity, rendering features etc etc etc.

On Xbox we used to selectively set the aniso level based on the shader, for the subset of geometry we actually aniso filtered it cost about 5% of a frame, IMO that was a reasonable tradeoff.
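
As a rough illustration of what "setting the aniso level based on the shader" can look like, here is a minimal Python sketch. The shader names and levels are hypothetical and not taken from any shipped title; a real engine would pass the value on to whatever sampler state call its graphics API provides.

```python
# Hypothetical shader names and aniso levels, for illustration only.
ANISO_BY_SHADER = {
    "road_surface": 8,   # viewed at grazing angles, benefits the most
    "terrain": 4,
    "building_wall": 2,
}


def max_anisotropy_for(shader_name, default=1):
    """Return the max anisotropy for a shader; 1 means no aniso filtering."""
    return ANISO_BY_SHADER.get(shader_name, default)
```

Anything not listed falls back to 1, so only the chosen subset of geometry pays for the extra samples.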

To put that in perspective I've rewritten a rendering engine to save less than that.

It should be noted that it's also something easy to turn off, and if you are having performance issues and don't have time to track down what they are, it's an easy switch to throw to ship your game.

I haven't sat down and benchmarked Xenos in any meaningful way, but from my understanding of its architecture, whether it's a significant cost depends on shader complexity (ratio of ALU to tex ops), texture formats in use, LOD bias etc etc. These are the same things that dictate it on PC cards.

The texture cache can clearly thrash, as can any cache, but that's unlikely to be an issue on simple regular texture fetches unless you have a LOT of source textures or you're using inefficient source formats. Start using a big noise texture to randomly dereference a large second texture and I've yet to see any texture cache do anything but thrash.
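
The difference between regular fetches and noise-driven dependent fetches can be sketched with a toy cache model in Python. Everything here is an assumption for illustration: a fully associative LRU cache, 16 texels per cache line, 512 lines, and 1D texel addressing. Real texture caches are organised differently, so only the trend is meaningful.

```python
import random
from collections import OrderedDict

TEXELS_PER_LINE = 16   # assumption: texels covered by one cache line
CACHE_LINES = 512      # assumption: illustrative cache capacity in lines


def hit_rate(texel_indices, cache_lines=CACHE_LINES):
    """Simulate a fully associative LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for texel in texel_indices:
        line = texel // TEXELS_PER_LINE
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(texel_indices)


def regular_fetches(count):
    """Neighbouring texels in order, like a simple regular texture fetch."""
    return list(range(count))


def noise_driven_fetches(count, texture_texels, seed=0):
    """Texel indices picked at random, like a noise-texture dereference."""
    rng = random.Random(seed)
    return [rng.randrange(texture_texels) for _ in range(count)]


if __name__ == "__main__":
    print("regular:", hit_rate(regular_fetches(16000)))
    print("noise:  ", hit_rate(noise_driven_fetches(16000, 1024 * 1024)))
```

In this model the regular walk hits on 15 of every 16 fetches, while the random walk over a 1024x1024 texture almost never finds its line in the cache.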

I've never heard anyone complain about the excessive aniso cost on Xenos.
 
Guilty Bystander said:
I remember reading a few months ago that Xenos has problems with its texture engines; can't give you a link on that.
So maybe the texture engines in the Xenos are actually so bad it can't even do Trilinear filtering or Anisotropic filtering without dropping to very low framerates when doing Bump-mapping, Offset-mapping, Normal-mapping etc.
I'm still scratching my head on this one. I don't recall reading anything even remotely close to this about Xenos. Xenos hardware was in development for 2 years or more and meant to serve inside a closed box for 4-5 years, so if there is a single GPU out there that is least likely to house any major design errors, it's Xenos.
 
Dave Baumann said:
Texture cache is 32KB on Xenos, which is actually quite a lot by many standards - first generation DX9 cards were about 256B per quad. Larger texture caches, of course, also lower texture bandwidth use, as AF reuses local texture data a lot and this operation can be very cache efficient. (Note: G71's texture cache has been increased - in relation to G70 - to 32KB as well, from what I understand).
Ah, well, there goes that hypothesis. Thanks for the info.

Again, the difference between PCs and consoles is that PCs are mainly utilising AF in a very inefficient way and applying it across everything, but it's up to the console developers to pick and choose the textures and AF levels selectively.
Oh, I'm quite well aware of that. And I know that AF is unlikely to see heavy use till developers get a handle on the system. I was just generating technical reasons why AF is not common on the 360 yet. Heck, I don't even know if that's true or not.



Luminescent said:
I'm still scratching my head on this one. I don't recall reading anything even remotely close to this about Xenos.
There was this technology/engine demo from a Japanese developer that claimed their main rendering shader ran over the texture cache, IIRC. They moved shadow computation to another pass to get performance up. It was a quirky little demo, as I recall. Maybe someone remembers the specifics.
 
ERP said:
I've never heard anyone complain about the excessive aniso cost on Xenos.

Well maybe they should turn it on more often then :p

Seriously, there is a bit of an unfortunate trend emerging, and people are beginning to notice (and complain) even if developers aren't.

I get the point though, that something doesn't need to be excessively expensive to be a non-runner.
 
Guilty Bystander said:
Again only low AF but no HDR and no MSAA.
And worst of all that the framerate is something like 10-15fps in that screen.
Ridiculous.
Christian Allen talks about mp
GRAW is in parallel development, meaning that whenever we make any changes it affects all maps and not individual missions. For example, the High Dynamic Range wasn't integrated until later on. It was then integrated across all maps in one version, followed by fine tuning. On the other hand, some games are developed map by map, so you will get final quality early on in the development cycle for a couple of maps but in later maps you won't normally see them playable or at a 100% final quality until much later.
http://xbox360.ign.com/articles/687/687911p1.html

mp uses at least 2aa (while sp 4aa); 720 mp
http://media.teamxbox.com/games/ss/1157/full-res/1140835044.jpg
 
Would "geometry" road markings have aliased edges? If so, shouldn't that be an easy give-away?

Jawed
 
scificube said:
I'll work with percentages I've seen tossed around from time to time as to what the eDram saves on bandwidth consumption. Something between 30-40% so I'll go with 35%.

Pardon me, but... WTF?? The EDRAM practically removes all the framebuffer traffic, as it only takes a few hundred megabytes per second to copy the tiles into the front buffer. And framebuffer traffic is quite a lot more than 35% of the total bandwidth use in any system... so the rest of your post is, unfortunately, quite worthless.
 
I also bumped it up to 50% and I also asked for correction if I were wrong.

I like to talk about things and I'm here to learn. You seem offended. I don't know why, but if so cool out cause I don't know what's wrong with talking about these things.

I never even got into predicated tiling nor did I suggest it would cause for great bandwidth consumption and I'm well aware it's a negligible hit on the bandwidth available.

You know you could try to let the folks know what's going on instead of being condescending.
 
Ridiculous.
Christian Allen talks about mp

Good for him. I have the final game running on my Xbox 360 right now.
When you're crouched in the grass and you use the scope then GRAW only runs like 10-15fps.

Pardon me, but... WTF?? The EDRAM practically removes all the framebuffer traffic, as it only takes a few hundred megabytes per second to copy the tiles into the front buffer. And framebuffer traffic is quite a lot more than 35% of the total bandwidth use in any system... so the rest of your post is, unfortunately, quite worthless.

You're right that's why it's so odd there is no AF or only 2-4x AF cause bandwidth certainly isn't an issue.
Maybe like Cell the Xenos is a little difficult to get into.
The Xenos specs certainly aren't that bad.

Xenos specs (off the top of my head):
256bit VPU R500 Fedos
48 fragment unified Shader ALU's
16 unfiltered and filtered textures
8 ROPs
SM3.0+ Shader technology
8GTexel/s fillrate
4GPixel/s fillrate
10MB eDRAM with 256GB/s bandwidth used as a framebuffer for Z-testing, MSAA, HDR etc.
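
The two fillrate figures in the list are consistent with the unit counts if you assume a 500 MHz core clock and one operation per unit per clock. The clock is my assumption here, it is not stated in the list. A quick Python check:

```python
CORE_CLOCK_GHZ = 0.5   # assumption: 500 MHz core clock, not stated in the post
TEXTURE_UNITS = 16     # from the spec list
ROPS = 8               # from the spec list


def peak_rate(units, clock_ghz=CORE_CLOCK_GHZ):
    """Peak throughput in G-operations/s, assuming one per unit per clock."""
    return units * clock_ghz


if __name__ == "__main__":
    print("texel rate:", peak_rate(TEXTURE_UNITS), "GTexel/s")
    print("pixel rate:", peak_rate(ROPS), "GPixel/s")
```

These are theoretical peaks only; they say nothing about what AF costs in a real frame.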

There isn't anything I can see in the specs which should cause any problems.
Maybe it's just in order to really utilise the Xenos power you need work with the Xenos specific features.
 
ERP said:
To be fair this is not exactly MS' problem.
The tools are/were there, you just have to turn it on and deal with the cost. Same's true for PS3/X360.

What doesn't seem like a significant cost on a PC can be a dramatic cost on a console. PC's are rarely pushing the envelope graphically and the devs usually just let the users decide. On a console you pick a framerate and you trade things off to make it work.

Devs are making decisions to trade off polygon counts, texture layers, shader complexity, rendering features etc etc etc.

On Xbox we used to selectively set the aniso level based on the shader, for the subset of geometry we actually aniso filtered it cost about 5% of a frame, IMO that was a reasonable tradeoff.

To put that in perspective I've rewritten a rendering engine to save less than that.

It should be noted that it's also something easy to turn off, and if you are having performance issues and don't have time to track down what they are, it's an easy switch to throw to ship your game.

I haven't sat down and benchmarked Xenos in any meaningful way, but from my understanding of its architecture, whether it's a significant cost depends on shader complexity (ratio of ALU to tex ops), texture formats in use, LOD bias etc etc. These are the same things that dictate it on PC cards.

The texture cache can clearly thrash, as can any cache, but that's unlikely to be an issue on simple regular texture fetches unless you have a LOT of source textures or you're using inefficient source formats. Start using a big noise texture to randomly dereference a large second texture and I've yet to see any texture cache do anything but thrash.

I've never heard anyone complain about the excessive aniso cost on Xenos.

How similar is this description to turning on vsync? This is another trend I'm seeing in 360 games that I don't like (I think that was what Titanio is alluding to). I guess the main question is whether AA and vsync are features that are particularly prohibitive on Xenos (because of eDRAM or whatever) or just have very real tradeoffs in console development in general.

If it's the latter then I'm sure we'll see things improve quite a bit in this regard, given the improvement we've seen in just these launch-plus games and the comments from devs like Bizarre, et al. on how much better they could do things in their second wave of titles.
 
Shifty Geezer said:
Having read that, some people are going to be expecting buckets of cache on RSX. But ignoring wishful thinking and just talking hypothetically, what are the costs/benefits of larger caches and why aren't they being used? Let's say 128 Kb cache per quad, perhaps. Would that have a lot of benefit in AF, but create a stupidly expensive GPU, or what?

Larger caches and buffers in the GPU are going to be necessary in a closed-box system constrained by a 128-bit memory interface. In order to compensate for the latency, larger caches built into the RSX's pipelines will help keep things flowing in the RSX and between the RSX and CELL. If, like in PCs, the memory interface was 256 bits wide (or greater), this wouldn't be necessary and more of the transistor budget would be allocated to core logic. But because it is important that the console GPUs take easily to a die shrink for cost reduction purposes (thus the 128-bit memory interface), this isn't the case.

IMO this will be the high-level feature set of the RSX:

-8 single issue vertex shader units
-24 dual issue pixel shader units
-larger than typical caches found in PC parts
-128-bit memory interface with GDDR3 memory
-128-bit interface with CELL
-logic that allows for the enabling of lockstepping between shader units and SPEs
-DMA controller
-FlexIO
-550 MHz internal clock speed
 
Titanio said:
Yeah, I've noticed this too. The worst offender I've seen lately is Test Drive.

It's probably an issue with main memory bandwidth - filtering is taking multiple texture samples for each pixel, which requires increasing amounts of bandwidth depending on the sampling level, and it may be that things are typically too tight on the main pipe to allow for higher levels of filtering. Others here may have a better explanation, but that's all I can come up with for now.

In addition, its UMA memory means that XeCPU has to share B/W with Xenos. The 1MB cache for XeCPU for 6 threads across 3 cores seems too little to me and a likely cause of cache thrashing. Increased cache misses will also require more external B/W to be consumed. This would add a degree of unpredictability to the B/W that Xenos would contend for...
 
ROG27 said:
...
IMO this will be the high-level feature set of the RSX:

-8 single issue vertex shader units

G7x VS units are already dual issue.

-24 dual issue pixel shader units

G7x PS units are classified as 5 issue by NV (includes 16bit normalise).

-128-bit interface with CELL

If you're referring to the FlexIO interface, then IIRC, it can clock very high and each lane ~ 8bit. I.e.

35 GB/sec implies 7 lanes of 5 GB/sec each, i.e. 4 lanes (20 GB/sec) outbound and 3 lanes (15 GB/sec) inbound.

7 lanes x 8bit ~ 56 Bit wide FlexIO would be my guess...
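
The guess above is just lane counting, sketched here in Python. The 5 GB/sec per lane and 8 bits per lane are the assumptions made in this post, not confirmed FlexIO figures.

```python
LANE_BW_GBS = 5.0      # GB/sec per lane, as assumed in the post
LANE_WIDTH_BITS = 8    # bits per lane, as assumed in the post


def flexio_estimate(outbound_lanes, inbound_lanes):
    """Return per-direction and total bandwidth plus the implied bus width."""
    lanes = outbound_lanes + inbound_lanes
    return {
        "outbound_gbs": outbound_lanes * LANE_BW_GBS,
        "inbound_gbs": inbound_lanes * LANE_BW_GBS,
        "total_gbs": lanes * LANE_BW_GBS,
        "width_bits": lanes * LANE_WIDTH_BITS,
    }


if __name__ == "__main__":
    print(flexio_estimate(outbound_lanes=4, inbound_lanes=3))
```

With 4 lanes out and 3 lanes in this gives 20 + 15 = 35 GB/sec over a 56-bit wide link, which is where the 56-bit guess comes from.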
 
Jaws said:
The 1MB cache for XeCPU for 6 threads across 3 cores seems too little to me and a likely cause of cache thrashing.
"Average" 170KB of L2 cache per thread, plus 16KB instruction and 16KB data cache = 202KB of cache per thread, with support for data to be read from memory direct into L1 without consuming L2, and for data to be written direct into L2 (aka cache locking) and be consumed by the GPU without touching memory.

The L2 cache is 8-way, so some threads will have more while others have less.
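
The per-thread average works out as follows, as a small Python sketch. It assumes 1MB means 1024KB and splits everything evenly across the 6 threads, which is only an average since, as noted, the real split between threads varies.

```python
L2_TOTAL_KB = 1024          # assumption: 1MB L2 taken as 1024KB
THREADS = 6                 # 3 cores x 2 hardware threads
L1_INSTR_PER_THREAD_KB = 16  # per-thread figure quoted in the post
L1_DATA_PER_THREAD_KB = 16   # per-thread figure quoted in the post


def average_cache_per_thread_kb():
    """Return (average L2 per thread, average total per thread) in KB."""
    l2_per_thread = L2_TOTAL_KB / THREADS
    total = l2_per_thread + L1_INSTR_PER_THREAD_KB + L1_DATA_PER_THREAD_KB
    return l2_per_thread, total


if __name__ == "__main__":
    l2, total = average_cache_per_thread_kb()
    print(f"L2 per thread: {l2:.1f}KB, total per thread: {total:.1f}KB")
```

Rounded down, that is the 170KB and 202KB quoted above.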

If the XB360 cache model was as naive as you paint it, then maybe it would be in trouble.

Jawed
 