Is AF a bottleneck for Xenos?

Discussion in 'Console Technology' started by tema, Mar 11, 2006.

  1. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    I don't think I need to be educated by you about xenos or anisotropic filtering, it was just a joke, deal with it and calm down, it's just a gpu!
     
  2. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    I'm aware of what it was nAo. And I'm perfectly calm, thanks.
     
  3. scificube

    Regular

    Joined:
    Feb 9, 2005
    Messages:
    836
    Likes Received:
    9
    I'm wondering if it's a bandwidth issue.

    Just a hypothetical...

    If Xenon consumed 5GB/s on average from main memory, that would leave 17.4GB/s available to Xenos. An X1800 XT has 48GB/s of bandwidth available to it. In an attempt to normalise Xenos's eDram usage for FSAA, alpha-blending, etc., I'll work with percentages I've seen tossed around from time to time for what the eDram saves on bandwidth consumption: something between 30-40%, so I'll go with 35%.

    48GB/s x 0.35 = 16.8GB/s, approximately what the eDram saves.

    48GB/s - 16.8GB/s = 31.2GB/s

    31.2GB/s is where the X1800 XT would sit if it had Xenos's eDram. An ATI fellow, Michael Doggett/D-something, noted a while back that Xenos is comparable to the X1800 XT in power, so that's why I used it as a baseline for comparison. Xenos has 22.4GB/s of bandwidth available, which would be 8.8GB/s short of what it takes to feed a GPU of comparable power. If we take into account Xenon's bandwidth consumption, with what I think is a modest 5GB/s, that would leave Xenos with around 17.4GB/s. This would be 13.8GB/s short of the bandwidth available to a GPU of comparable power.

    If we say the eDram can be normalized to 50% of the total bandwidth consumption, that would save 24GB/s for the X1800 XT while leaving a need for another 24GB/s. Still assuming Xenon consumes 5GB/s from main memory, this leaves Xenos needing another 6.6GB/s to be fed as readily as the X1800 XT.
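
For anyone who wants to plug in their own guesses, here is a minimal Python sketch of the arithmetic above. The bandwidth figures are the ones quoted in this post; the eDram saving fractions (35% and 50%) and the 5GB/s CPU share are the post's own assumptions, not measured numbers, and the function name is just made up for illustration.

```python
# Figures quoted in the post, in GB/s. The CPU share is a guess, not a measurement.
X1800XT_BW = 48.0
XENOS_BW = 22.4
XENON_CPU_SHARE = 5.0

def shortfall(edram_saving):
    """Return (bandwidth an eDram-equipped X1800 XT would still need,
    Xenos shortfall ignoring the CPU, Xenos shortfall with the CPU share removed)."""
    needed = X1800XT_BW * (1.0 - edram_saving)
    gap_no_cpu = needed - XENOS_BW
    gap_with_cpu = needed - (XENOS_BW - XENON_CPU_SHARE)
    return (round(needed, 1), round(gap_no_cpu, 1), round(gap_with_cpu, 1))

for saving in (0.35, 0.50):
    needed, gap_no_cpu, gap_with_cpu = shortfall(saving)
    print(f"saving {saving:.0%}: need {needed} GB/s, "
          f"short {gap_no_cpu} GB/s without CPU traffic, "
          f"short {gap_with_cpu} GB/s with it")
```

The whole result hangs on the saving fraction, so that is the number to argue about.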

    From what I understand both Xenos and the X1800 XT have 16 TMUs. Both utilize hyper-threading, but the X1800 XT's is a little finer grained, and it has a more advanced memory subsystem in its ring bus. IIRC the TMUs are decoupled in both GPUs. Dave's article noted the AF has been improved over previous generations in Xenos. I would take that to mean Xenos and R520 over R420/R300 etc., as I think both use the same TMUs.

    All things being equal, it would suggest to me that this would be a bandwidth issue, if an issue at all, other than early games going through their growing pains. Perhaps the extra samples AF requires are problematic on top of those needed for pixel shaders etc., and will require better stewardship of resources if AF is to be used in conjunction with everything else.

    I think I was generous with how much value I gave Xenos's eDram and conservative in how much bandwidth Xenon would consume in the system. If anyone would be so kind please correct me if I've been unfair.
     
    #63 scificube, Mar 11, 2006
    Last edited by a moderator: Mar 11, 2006
  4. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    To be fair this is not exactly MS' problem.
    The tools are/were there, you just have to turn it on and deal with the cost. Same's true for PS3/X360.

    What doesn't seem like a significant cost on a PC can be a dramatic cost on a console. PCs are rarely pushing the envelope graphically and the devs usually just let the users decide. On a console you pick a framerate and you trade things off to make it work.

    Devs are making decisions to trade off polygon counts, texture layers, shader complexity, rendering features etc etc etc.

    On Xbox we used to selectively set the aniso level based on the shader. For the subset of geometry we actually aniso filtered, it cost about 5% of a frame; IMO that was a reasonable tradeoff.

    To put that in perspective I've rewritten a rendering engine to save less than that.
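
To give the 5% figure some scale, a quick sketch in Python. The 30 and 60 fps targets are assumptions picked for illustration (the post doesn't say what framerate the game targeted), and the function name is hypothetical.

```python
def aniso_cost_ms(target_fps, fraction_of_frame=0.05):
    """Milliseconds per frame spent on aniso if it costs the given fraction of the frame."""
    frame_ms = 1000.0 / target_fps
    return frame_ms * fraction_of_frame

for fps in (30, 60):
    print(f"{fps} fps: {aniso_cost_ms(fps):.2f} ms of a {1000.0 / fps:.2f} ms frame")
```

Under those assumptions, 5% is somewhere between roughly 0.8 and 1.7 ms per frame.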

    It should be noted that it's also something easy to turn off, and if you are having performance issues and don't have time to track down what they are, it's an easy switch to throw to ship your game.

    I haven't sat down and benchmarked Xenos in any meaningful way, but from my understanding of its architecture, whether it's a significant cost depends on shader complexity (number of ALU to Tex ops), texture formats in use, LOD bias, etc. These are the same things that dictate it on PC cards.

    The texture cache can clearly thrash, as can any cache, but that's unlikely to be an issue on simple regular texture fetches, unless you have a LOT of source textures or you're using inefficient source formats. Start using a big noise texture to randomly dereference a large second texture and I've yet to see any texture cache do anything but thrash.
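
The regular vs. dependent fetch difference can be shown with a toy model. This is only a sketch: a 1D, fully-associative LRU cache in Python, with a made-up line size and capacity. Real texture caches (Xenos included) are 2D-tiled and organised differently, so only the trend means anything, not the numbers.

```python
import random
from collections import OrderedDict

def hit_rate(addresses, cache_lines=256, texels_per_line=16):
    """Fraction of texel fetches served by a toy fully-associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        line = addr // texels_per_line
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)

TEXELS = 1 << 20                 # hypothetical texture, far larger than the cache
rng = random.Random(0)           # fixed seed so runs are repeatable

regular = list(range(100_000))                               # neighbouring texels in order
dependent = [rng.randrange(TEXELS) for _ in range(100_000)]  # noise-driven lookups

print(f"regular fetches:   {hit_rate(regular):.1%} hits")
print(f"dependent fetches: {hit_rate(dependent):.1%} hits")
```

The regular walk reuses each line for every texel after the first, while the random lookups almost never land on a line that is still resident.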

    I've never heard anyone complain about the excessive aniso cost on Xenos.
     
  5. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    I'm still scratching my head on this one. I don't recall reading anything even remotely close to this about Xenos. Xenos hardware was in development for 2 years or more and meant to serve inside a closed box for 4-5 years, so if there is a single gpu out there that is least likely to house any major design errors, it's Xenos.
     
    #65 Luminescent, Mar 11, 2006
    Last edited by a moderator: Mar 11, 2006
  6. Inane_Dork

    Inane_Dork Rebmem Roines
    Veteran

    Joined:
    Sep 14, 2004
    Messages:
    1,987
    Likes Received:
    46
    Ah, well, there goes that hypothesis. Thanks for the info.

    Oh, I'm quite well aware of that. And I know that AF is unlikely to see heavy use till developers get a handle on the system. I was just generating technical reasons why AF is not common on the 360 yet. Heck, I don't even know if that's true or not.



    There was this technology/engine demo from a Japanese developer that claimed their main rendering shader ran over the texture cache, IIRC. They moved shadow computation to another pass to get performance up. It was a quirky little demo, as I recall. Maybe someone remembers the specifics.
     
    #66 Inane_Dork, Mar 11, 2006
    Last edited by a moderator: Mar 11, 2006
  7. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Well maybe they should turn it on more often then :p

    Seriously, there is a bit of an unfortunate trend emerging, and people are beginning to notice (and complain) even if developers aren't.

    I get the point though, that something doesn't need to be excessively expensive to be a non-runner.
     
  8. Lysander

    Regular

    Joined:
    Sep 3, 2005
    Messages:
    532
    Likes Received:
    5
    Ridiculous.
    Christian Allen talks about mp
    http://xbox360.ign.com/articles/687/687911p1.html

    mp uses at least 2x AA (while sp uses 4x AA); 720 mp
    http://media.teamxbox.com/games/ss/1157/full-res/1140835044.jpg
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Would "geometry" road markings have aliased edges? If so, shouldn't that be an easy give-away?

    Jawed
     
  10. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,455
    Location:
    Budapest, Hungary
    Pardon me, but... WTF?? The EDRAM practically removes all the framebuffer traffic, as it only takes a few hundred megabytes per second to copy the tiles into the front buffer. And framebuffer traffic is quite a lot more than 35% of the total bandwidth use in any system... so the rest of your post is, unfortunately, quite worthless.
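
A back-of-the-envelope check of the "few hundred megabytes per second" figure, as a small Python sketch. The 1280x720 resolution, 4 bytes per pixel and 60 fps are assumptions chosen for illustration, not numbers from the post.

```python
def resolve_bandwidth_mb_s(width=1280, height=720, bytes_per_pixel=4, fps=60):
    """MB/s needed to copy one finished frame per refresh from eDRAM to main memory."""
    return width * height * bytes_per_pixel * fps / 1e6

print(f"60 fps: {resolve_bandwidth_mb_s():.1f} MB/s")
print(f"30 fps: {resolve_bandwidth_mb_s(fps=30):.1f} MB/s")
```

Under those assumptions the copy-out comes to roughly 221 MB/s at 60 fps, which is small next to the 22.4GB/s main memory figure quoted earlier in the thread.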
     
  11. scificube

    Regular

    Joined:
    Feb 9, 2005
    Messages:
    836
    Likes Received:
    9
    I also bumped it up to 50% and I also asked for correction if I were wrong.

    I like to talk about things and I'm here to learn. You seem offended. I don't know why, but if so cool out cause I don't know what's wrong with talking about these things.

    I never even got into predicated tiling, nor did I suggest it would cause great bandwidth consumption, and I'm well aware it's a negligible hit on the bandwidth available.

    You know you could try to let the folks know what's going on instead of being condescending.
     
  12. Asher

    Regular

    Joined:
    Jul 1, 2005
    Messages:
    976
    Likes Received:
    10
    Location:
    Seattle, WA
    What is a more accurate number? 50%? 75%?
     
  13. Guilty Bystander

    Newcomer

    Joined:
    Sep 21, 2005
    Messages:
    101
    Likes Received:
    1
    Good for him, I have the final game running on my Xbox 360 right now.
    When you're crouched in the grass and use the scope, GRAW only runs at like 10-15fps.

    You're right, that's why it's so odd there is no AF or only 2-4x AF, 'cause bandwidth certainly isn't an issue.
    Maybe, like Cell, Xenos is a little difficult to get into.
    The Xenos specs certainly aren't that bad.

    Xenos specs (off the top of my head):
    256bit VPU R500 Fedos
    48 fragment unified Shader ALU's
    16 unfiltered and filtered textures
    8 ROPs
    SM3.0+ Shader technology
    8GTexel/s fillrate
    4GPixel/s fillrate
    10MB eDRAM with 256GB/s bandwidth used as a framebuffer for Z-testing, MSAA, HDR etc.
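
The two fillrate lines in the list follow from unit count times clock. A minimal Python sketch, assuming a 500 MHz core clock (the clock isn't stated in the list above, so treat it as an assumption) and one texel or pixel per unit per clock:

```python
CORE_CLOCK_HZ = 500e6  # assumed core clock, not given in the spec list

def fillrate_g(units, clock_hz=CORE_CLOCK_HZ):
    """Peak rate in G-ops/s, assuming each unit produces one result per clock."""
    return units * clock_hz / 1e9

print(f"texel fillrate: {fillrate_g(16):.0f} GTexel/s")  # 16 texture units
print(f"pixel fillrate: {fillrate_g(8):.0f} GPixel/s")   # 8 ROPs
```

These are peak numbers only; they say nothing about what AF costs in practice.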

    There isn't anything I can see in the specs which should cause any problems.
    Maybe it's just that in order to really utilise the Xenos power you need to work with the Xenos specific features.
     
  14. expletive

    Veteran

    Joined:
    Jun 4, 2005
    Messages:
    3,592
    Likes Received:
    69
    Location:
    Bridgewater, NJ
    How similar is this description to turning on vsync? This is another trend i'm seeing in 360 games that i don't like (i think that was what Titanio is alluding to). I guess the main question is whether AA and vsync are features that are particularly prohibitive on Xenos (because of edram or whatever) or just have very real tradeoffs in console development in general.

    If it's the latter then i'm sure we'll see things improve quite a bit in this regard, given the improvement we've seen in just these launch-plus games and the comments from devs like bizarre, et al on how much better they could do things in their second wave of titles.
     
  15. Nemo80

    Banned

    Joined:
    Sep 5, 2005
    Messages:
    128
    Likes Received:
    3
  16. Lysander

    Regular

    Joined:
    Sep 3, 2005
    Messages:
    532
    Likes Received:
    5
    :roll: nevermind
     
  17. ROG27

    Regular

    Joined:
    Oct 27, 2005
    Messages:
    572
    Likes Received:
    4
    Larger caches and buffers in the GPU are going to be necessary in a closed-box system constrained by a 128-bit memory interface. In order to compensate for the latency, larger caches built into the RSX's pipelines will help keep things flowing in the RSX and between the RSX and CELL. If, like in PCs, the memory interface was 256-bit wide (or greater), this wouldn't be necessary and more of the transistor budget would be allocated to core logic. But because it is important that console GPUs take easily to a die shrink for cost reduction purposes (thus the 128-bit memory interface), this isn't the case.

    IMO this will be the high-level feature set of the RSX:

    -8 single issue vertex shader units
    -24 dual issue pixel shader units
    -larger than typical caches found in PC parts
    -128-bit memory interface with GDDR3 memory
    -128-bit interface with CELL
    -logic that allows for the enabling of lockstepping between shader units and SPEs
    -DMA controller
    -FlexIO
    -550 MHz internal clock speed
     
  18. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    In addition, its UMA memory means that XeCPU has to share B/W with Xenos. The 1MB cache for XeCPU for 6 threads across 3 cores seems too little to me and a likely cause of cache thrashing. Increased cache misses will also require more external B/W to be consumed. This would add a degree of unpredictability to B/W consumption that Xenos would contend for...
     
  19. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    G7x VS units are already dual issue.

    G7x PS units are classified as 5 issue by NV (includes 16bit normalise).

    If you're referring to the FlexIO interface, then IIRC, it can clock very high and each lane ~ 8bit. I.e.

    35 GB/sec implies 7 lanes of 5 GB/sec each, i.e. 4 lanes (20 GB/sec) outbound and 3 lanes (15 GB/sec) inbound.

    7 lanes x 8bit ~ 56 Bit wide FlexIO would be my guess...
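
The guess above written out as a small Python sketch. The 8-bit lane width is the "IIRC" figure from this post and the lane split is the post's own; none of it is a confirmed spec, and the function name is made up.

```python
LANE_WIDTH_BITS = 8        # "IIRC" figure from the post, not a confirmed spec
LANE_BANDWIDTH_GBS = 5.0   # GB/s per lane, from the post
OUTBOUND_LANES = 4
INBOUND_LANES = 3

def flexio_summary():
    """Totals implied by the per-lane guesses above."""
    lanes = OUTBOUND_LANES + INBOUND_LANES
    return {
        "outbound_gbs": OUTBOUND_LANES * LANE_BANDWIDTH_GBS,
        "inbound_gbs": INBOUND_LANES * LANE_BANDWIDTH_GBS,
        "total_gbs": lanes * LANE_BANDWIDTH_GBS,
        "width_bits": lanes * LANE_WIDTH_BITS,
        # bytes per second divided by bytes per transfer gives transfers per second
        "implied_gtransfers_per_s": LANE_BANDWIDTH_GBS / (LANE_WIDTH_BITS / 8),
    }

for name, value in flexio_summary().items():
    print(f"{name}: {value}")
```

If the lane really is 8 bits wide, 5 GB/sec per lane implies about 5 billion transfers per second per lane, which is what "can clock very high" would have to mean here.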
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    "Average" 170KB of L2 cache per thread, plus 16KB instruction and 16KB data cache = 202KB of cache per thread, with support for data to be read from memory direct into L1 without consuming L2, and for data to be written direct into L2 (aka cache locking) and be consumed by the GPU without touching memory.

    The L2 cache is 8-way, so some threads will have more while others have less.
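
The per-thread averages quoted above, as a quick Python sketch. All inputs are the figures from this thread (1MB L2, 6 threads, 16KB instruction and 16KB data cache per thread); the variable names are just for illustration.

```python
L2_TOTAL_KB = 1024   # 1MB shared L2
HW_THREADS = 6       # 3 cores x 2 threads
L1_INSTR_KB = 16     # per thread, as quoted above
L1_DATA_KB = 16      # per thread, as quoted above

l2_per_thread_kb = L2_TOTAL_KB / HW_THREADS
total_per_thread_kb = l2_per_thread_kb + L1_INSTR_KB + L1_DATA_KB

print(f"average L2 per thread:    {l2_per_thread_kb:.1f} KB")
print(f"average cache per thread: {total_per_thread_kb:.1f} KB")
```

The exact values are about 170.7KB and 202.7KB; the 170KB and 202KB in the post are those rounded down. These are averages only, since the L2 is shared rather than partitioned per thread.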

    If the XB360 cache model was as naive as you paint it, then maybe it would be in trouble.

    Jawed
     