ATI R500 patent for Xenon GPU?

Discussion in 'Console Technology' started by j^aws, Apr 2, 2005.

  1. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8

    Multi-thread graphic processing system

    Sounds like the load balancing control for vertex and pixel threads for the ATI R500, aka Xenon GPU...?
     
  2. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,975
    Likes Received:
    79
    Location:
    20001
    maybe that's how they are implementing unified shaders... that arbiter will determine on the fly, based on prioritization of the multiple threads it's receiving, how many ALUs (out of 48) to dedicate to pixel shading versus vertex... makes the ratio of one to the other completely fluid and dynamic...

    Of course that is layperson understanding. :D
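
    A minimal sketch of that arbiter idea, assuming a simple proportional split of the 48 ALUs by queue depth; the policy and the names here are illustrative guesses, not details from the patent:

    ```python
    # Hypothetical arbiter: each scheduling cycle, split 48 unified ALUs
    # between vertex and pixel work in proportion to the pending threads.
    TOTAL_ALUS = 48

    def allocate(vertex_pending: int, pixel_pending: int) -> tuple[int, int]:
        """Return (vertex_alus, pixel_alus) for this cycle."""
        total = vertex_pending + pixel_pending
        if total == 0:
            return 0, 0
        vertex_alus = min(round(TOTAL_ALUS * vertex_pending / total), vertex_pending)
        pixel_alus = min(TOTAL_ALUS - vertex_alus, pixel_pending)
        return vertex_alus, pixel_alus

    print(allocate(90, 30))    # vertex-heavy moment -> (36, 12)
    print(allocate(100, 380))  # pixel-heavy moment  -> (10, 38)
    ```

    The point is only that the vertex:pixel ratio is recomputed continuously rather than fixed in silicon.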
     
  3. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8
    ^^ Yep, sounds like it...

    The ALU can receive either vertex or pixel threads from the arbiter...
     
  4. Inane_Dork

    Inane_Dork Rebmem Roines
    Veteran

    Joined:
    Sep 14, 2004
    Messages:
    1,987
    Likes Received:
    46
    Some thoughts (unordered):

    1) The interleaving of threads on an ALU was an interesting twist I did not expect. Anything to boost the total number of instructions executed per second, I guess (see the toy model after this list).

    2) The introduction of multiple reservation stations makes things interesting. The obvious split is between pixel and vertex threads, but go further. Presume for a second that the X2 has eDRAM and this patent applies to the GPU in the X2. If one of the reservation stations is in memory, they might be able to do deferred tiled rendering. The point, of course, would be to only process vertices or pixels within the portion of the screen currently in the eDRAM. It's sort of a crazy idea, but the pieces fit together rather well.
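
    On point 1, a toy model of what interleaving buys: two reservation stations feed one ALU round-robin, so the ALU stays busy as long as either station has work. The station names and the round-robin policy are assumptions for illustration only.

    ```python
    from collections import deque

    # Two reservation stations feeding one ALU; each entry is
    # (thread id, instructions remaining).
    stations = {"vertex": deque([("V0", 3), ("V1", 2)]),
                "pixel":  deque([("P0", 4)])}

    cycle = 0
    while any(stations.values()):
        for kind in ("vertex", "pixel"):      # simple round-robin policy
            if stations[kind]:
                tid, left = stations[kind].popleft()
                print(f"cycle {cycle}: ALU executes an instruction of {tid}")
                cycle += 1
                if left > 1:                  # re-queue unfinished threads
                    stations[kind].append((tid, left - 1))
    ```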
     
  5. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Good find Jaws.

    Inane_Dork, interesting comments. The R400 was a radical new design, and the R500 is a console follow-up part. I think most of us have assumed the radical new concept was Unified Shaders... but maybe there is more? Right now it seems memory is a huge limitation. Sure, procedural textures may help some, but we have a ton of high quality features (high res normal maps, HDR lighting, high levels of AA) that most of us expect Next Gen hardware to do, yet the memory bandwidth is just not there.

    Dave has hinted at some ideas about the R520 having some focus on memory issues. It would be really nice to get some real advancements into the new consoles, stuff that sets a high standard for years to come.

    Again Jaws, good info there. Hopefully the patent turns into something beneficial.
     
  6. one

    one Unruly Member
    Veteran

    Joined:
    Jul 26, 2004
    Messages:
    4,823
    Likes Received:
    153
    Location:
    Minato-ku, Tokyo
    Unified ALUs will be nice for saving silicon budget, but it seems to work optimally only when the arbiter is intelligent enough. I assume you could also use them directly in a manual configuration mode by assigning a fixed number of Pixel/Vertex Shader units.
     
  7. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    It would be nice to know how much bigger a unified vertex/pixel ALU is than a dedicated vertex ALU and a dedicated pixel ALU. The arbiter(s) is(are) going to be more complex than before, so even this will eat some more transistors.
    If this cost is cheap, say less than 5-10% of ALU die area, it could be a big win for ATI.
    We should also note that a unified ALU could have some trade-off between area and speed, so even if it doesn't get that much bigger it could be slower in some areas..
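
    Some back-of-envelope numbers for that trade-off, with made-up figures just to show the shape of the argument:

    ```python
    dedicated_vs = 8          # dedicated vertex ALUs (illustrative)
    dedicated_ps = 24         # dedicated pixel ALUs (illustrative)
    unified_overhead = 0.10   # assume a unified ALU is 10% bigger

    budget = dedicated_vs + dedicated_ps                  # 32 ALU-equivalents
    unified_count = int(budget / (1 + unified_overhead))  # 29 unified ALUs

    # In a purely pixel-bound scene the dedicated design has only its 24
    # pixel ALUs doing useful work; every unified ALU can pitch in.
    print(f"busy ALUs when pixel-bound: dedicated={dedicated_ps}, unified={unified_count}")
    ```

    So a sub-10% area tax can be paid back whenever the workload leans hard one way or the other.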
     
  8. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Maybe someone can correct me, but in theory doesn't dynamic load balancing with unified shaders help with frame rate stability? I would think that if they are implemented well they would improve the minimum frame rate, because hiccups associated with PS and VS bottlenecks could be smoothed over. Being able to dynamically load balance at any time would, I believe, make the framerate much more stable in shader limited games.

    Another advantage is diversity--both in a single game and from game to game. Designers can decide to be pixel shader intensive or vertex shader intensive, or somewhere in between.

    I really am excited to see what ATi crunched into the R500. It will be neat to see how it performs.
     
  9. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    It may help in some cases, but most of the time frame rate instability is caused by a lack of fill rate (as with tons of huge transparent particles) or a sudden increase in vertex or pixel shading demand (without a decrease in the counterpart..)
    I bet edram is more effective in reducing frame rate hiccups than unified ALUs ;)
     
  10. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    I will take both please... and while you are at it, can you throw in some high performance TBR that Inane_Dork mentioned? ;)
     
  11. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    What Inane_Dork mentioned could be used to defer rendering, but the point is: why would you do that?
    If you're going after a TBDR you need to take very special care of all the information you'd need to save and re-use, so even if it could be done 'that' way it doesn't mean it would be efficient. How much bandwidth is needed to restore the thread state? Would it be feasible to do that with external memory? Dunno..
    To save memory I'd prefer to split the viewport n times, render n viewports and do a final composite pass.
    Moreover, once you have a big pool of edram and you have designed your GPU around that, you already have a lot of the advantages a TBDR has, like multisampling AA (almost) for free.
    Features that a TBDR can provide, such as no overdraw and automatic non-opaque fragment sorting, would be nice to have, but these things don't come for free once you have deferred the rendering phase ;)
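
    A rough sketch of that viewport-splitting scheme; the 10 MB buffer and the 4x-multisampled 1280x720 target are assumed figures for illustration, not known specs:

    ```python
    WIDTH, HEIGHT = 1280, 720
    BYTES_PER_PIXEL = 8 * 4                  # 32-bit colour + 32-bit Z, 4x MSAA
    EDRAM_BYTES = 10 * 1024 * 1024           # assumed on-chip buffer size

    frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
    passes = -(-frame_bytes // EDRAM_BYTES)  # ceiling division -> 3 passes
    band = -(-HEIGHT // passes)              # rows per pass

    for i in range(passes):
        y0, y1 = i * band, min(HEIGHT, (i + 1) * band)
        # A real renderer would draw only the geometry overlapping rows
        # [y0, y1) here, then resolve the band to main memory for the
        # final composite pass.
        print(f"pass {i}: rows {y0}..{y1 - 1}")
    ```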
     
  12. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    If what I've been hearing is interpreted accurately, then unified shading architectures (at the hardware level) are scheduled to go all the way to mobile phones fairly soon (in mobile development terms), so I'm guessing the control unit isn't that sizeable.
     
  13. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    I wasn't referring to ALU die area increase due to the more complex control unit, but due to intrinsic inefficiency in a design that tries to solve 2 similar but different problems.
    (obviously I'm talking about pixel and vertex shading here).
    In a couple of years or so I expect nvidia to have GPUs with more ALUs than ATI GPUs for a given transistor budget, with the ATI part being able to sustain a bigger IPC than the NVIDIA part, which should sport more cores..
    It will be an interesting battle ;)
    As I already stated, if ATI can master both problems without trading away too much performance and die area, they will win this battle, imho.
     
  14. Farid

    Farid Artist formely known as Vysez
    Veteran Subscriber

    Joined:
    Mar 22, 2004
    Messages:
    3,844
    Likes Received:
    108
    Location:
    Paris, France
    On paper, and therefore theoretically speaking, Nvidia would have to screw something up in order to be beaten on both fronts. If Nvidia does everything correctly, and the same goes for Ati, each one should have its own forte.
     
  15. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    What do you think those ALU issues are?

    In terms of ALUs, both PS and VS are required to operate on the same instructions (under a WGF2.0 or greater environment, at least), which suggests that the only ALU differences between the two would be what you implement as native instructions and what you implement as macros - what instructions would benefit more from one type of processing than another? Does it also mean that every ALU is necessarily exactly the same?

    David Kirk highlighted texturing demands as one issue; however, looking at this it seems that this could be negated (as was indicated in the reply we sourced from ATI in answer to Kirk's comments) as there are separate texture and ALU command queues - this also seems like a fairly neat way of avoiding texture latencies, as the ALU operations (be they vertex or pixel) are being executed on the ALU pool whilst other commands are waiting for texture reads (the results of which can then just go back into the ALU command queue).
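
    A toy model of that two-queue scheme, assuming (purely for illustration) a single ALU pool and a fixed fetch latency; none of the names come from ATI:

    ```python
    from collections import deque

    alu_q = deque([("T0", "alu"), ("T1", "tex"), ("T2", "alu")])
    tex_q = deque()            # (thread id, cycle its fetch completes)
    TEX_LATENCY = 3

    for cycle in range(8):
        # A completed texture read puts its thread back in the ALU queue.
        if tex_q and tex_q[0][1] <= cycle:
            alu_q.append((tex_q.popleft()[0], "alu"))
        if not alu_q:
            continue
        tid, op = alu_q.popleft()
        if op == "tex":
            tex_q.append((tid, cycle + TEX_LATENCY))
            print(f"cycle {cycle}: {tid} parked waiting on a texture read")
        else:
            print(f"cycle {cycle}: {tid} runs on the ALU pool")
    ```

    Note the pool keeps draining T0 and T2 while T1's fetch is in flight, which is exactly the latency-hiding being described.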
     
  16. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Operating on the same instructions doesn't mean having the same behaviour, or even the same implementation (see x86 ISA history..)
    Once I wrote in this forum that it seems nvidia wants to have 2 basic designs: one to cope with high latency operations and another one to cope with low latency operations.
    Since I'm not a hardware guy I'm really guessing here.. but I expect these 2 designs to be quite different if one tries to push the envelope and starts to make assumptions (this is what I do as a software guy..)

    That's not the case, of course.

    Well.. even an uneducated guy like me proposed something like that 2 or 3 years ago in a previous iteration of this forum, so it's nothing groundbreaking.
    The problem is.. it sounds like a cool and relatively simple thing to do, and if you look at the nv40 pixel pipes nvidia is already doing something like that, since one ALU out of two is also used for texture sampling, even if that design is not unified.
    Nvidia even has a patent that is quite similar to the one posted in this thread, but they have different functional units for vertices and pixels, fed by a 'central' threads manager.
    At the end of the game I don't think future designs by NVIDIA and ATI will be so different at the functional level.
    Nvidia could even switch to a fully unified design very quickly once their shading model is unified too, assuming they extend the pixel pipe ALUs to vertex processing (and primitive assembly too!)
    I say this with a reference to this patent:
    System and method for reserving and managing memory spaces in a memory resource
    ciao,
    Marco
     
  17. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,811
    Likes Received:
    478
    I'd take a 10% worst case slowdown over the slowdown something like texture sampling in the vertex shader is going to cause in an architecture optimized for vertex shaders using only low latency streamed data.
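
    The rough arithmetic behind that trade, with invented latencies just to show the scale:

    ```python
    vertex_instrs = 50   # instructions in a hypothetical vertex shader
    fetch_stall = 200    # assumed cycles an unhidden texture fetch stalls
    fetches = 2          # vertex texture fetches per vertex

    dedicated = vertex_instrs + fetches * fetch_stall  # stalls exposed: 450
    unified = vertex_instrs * 1.10                     # 10% unification tax: 55
    print(f"cycles/vertex: dedicated={dedicated}, unified={unified:.0f}")
    ```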
     
  18. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    It would mean less performance than what you can achieve in a pixel shader, not just poor performance tout court ;) (as it is in the current nv40 design)
     
  19. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    It strikes me that the high latency instructions are those that are dealing with textures – these are what Kirk has singled out as well. This is addressed in this scenario.

    Is the texture ALU able to operate whilst waiting on the texture latency? (i.e. still interleave instructions whilst waiting to address the texture) It strikes me that having an independent texture address processor, such as in the R300 model, is more similar.

    Personally I’m beginning to think that NVIDIA’s reluctance to go to unified shaders is more driven by looking at a future involving multi-chip implementations than anything else.
     
  20. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Yeah.. I was speaking about texture sampling; it was so clear to me I forgot to mention it.
    AFAIK, yes.
    Who cares if it's more or less similar if they're functionally doing the same thing and achieving the same results?
    I don't want to blame nvidia if their architecture is not too similar to what another IHV proposes and designs ;) (if they achieve the same results, and at this time ATI and NVIDIA seem to be quite on par on many fronts, imho)
    Oh well.. do you mean what if nvidia has an IC dedicated to geometry and another one dedicated to pixels?!
    Where did you get this idea about multichip implementations?
     