Looking for feedback: SPU Shaders

Discussion in 'CellPerformance@B3D' started by Mike Acton, Sep 26, 2007.

  1. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
About SPURS: Arwin, Mike does not seem to love SPURS, but I got a more "hopeful" vibe from his words... well, maybe not about SPURS on PS3, but on future hardware, maybe...

While it might be overkill for a six-SPE Cell BE dedicated to games, it is not a bad solution if you think of it as a forward-looking one: once you have about 4-5x the SPEs to manage, an evolved SPURS might start to be much more appealing.

I did not get the idea that SPURS sucks, but rather that it could be useful for managing a larger sea of SPUs, while you can achieve better performance by optimizing for the relatively low number of SPEs in the actual PS3 system.

Mike, if PS4 had 32 SPUs, say 30 available full-time to developers, would you re-evaluate SPURS versus SPU Shaders? Would you envision some particular little touches to SPURS on such a system?
     
  2. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
I have been chewing on the presentation. This is my understanding so far... I am still not entirely sure yet.

It seems that SPURS, the implementation, (i) provided too much abstraction, and (ii) was not mature enough (e.g., originally only one instance was allowed?). (ii) may have been addressed, but (i) is a fundamental shift from the original design principle of Cell.

Even for managing/scheduling a large number of cores, there will likely be a non-trivial hit, because we may need to implement an additional software layer (extra overhead) to mitigate the dynamism/unpredictability introduced by SPURS (e.g., a software cache, or keeping track of context info like who scheduled me and what the current policies/states are).

The concept of a job manager is not wrong (and would really help to manage a large number of cores for some problem classes). But SPURS seems to carry extra baggage at the single-core level, and the problem will compound with more cores. I think the Insomniac guys' misgiving about SPURS is that, as middleware, it did not exploit the architectural advantages; instead it took some of them away. [EDIT: I think the current incarnation of SPURS is also somewhat associated with super-fast culling. It has value and can be quick (see Uncharted). Maybe some developers used SPURS everywhere without thinking through the subtle issues properly?]

OTOH, SPU Shaders are very, very specific. I think the goal is to handle all the similar data operations (on packed data) together in one shot to maximize the Cell advantage. These lumped operations may contain data and logic from different parts of the application, but they are carefully slotted and executed together (by design) to exploit the hardware's advantages and meet the near-real-time schedule of a game.

However, I wonder about the supporting infrastructure/framework the SPU Shaders need. If everything is custom, how much complexity are we looking at in practice? How much time does it take? What level of expertise? How long did it take Insomniac to preach this to the engineering team at large, etc.? The slides are not very rich in the "Disadvantages" category. Maybe it's really that good, but I wish Mike would beef up the "Caveats" and "Case Studies" slides (more details). Then I feel it would be an even more compelling _and_ persuasive read.

The other item on my wishlist is "potential parallelism": how much parallelism is still possible? A block-diagram roadmap of how SPU Shaders can be applied to parallelize a complete game would be interesting too. But that may be too much to share, I guess :(
     
    #22 patsu, Feb 5, 2008
    Last edited by a moderator: Feb 5, 2008
  3. jonathan

    Newcomer

    Joined:
    Jan 31, 2008
    Messages:
    3
    Likes Received:
    0
    I just noticed the new presentation up on the R&D page[1] - thanks!

    The shader model presented is an approach that has been around for a while (in one form or another - see [2]) but it seems to make a lot of sense for SPU processing. It's a pity that the actual implementation is rather clumsy at the moment in terms of the compile/disassemble/embed process.


    [1] Any chance the RSS feed can be fixed?
    [2] http://domino.watson.ibm.com/tchjr/journalindex.nsf/0/cde711e5ad6786e485256bfa00685a03?OpenDocument
     
  4. bknafla

    Newcomer

    Joined:
    Feb 6, 2008
    Messages:
    2
    Likes Received:
    0
My interpretation of the slides is that SPU shaders should scale to any number of available SPUs. A shader can't really assume anything about how it is run, other than the interface it has to adhere to and therefore the context info handed to it through its parameters. You can thus think of it as a kernel in a stream-processing environment (which it is) or as a shader for a GPU: if a system is structured to use more than one SPU, or even all available SPUs, it could spawn an SPU shader instance on each SPU while partitioning the data between the shader instances, just as is done on a GPU for fragment shaders.

What SPU shaders don't tell you is how many SPUs the system will use, but that is deliberately outside the design of the shaders, to keep them simple.

    Thank you Insomniac and Mr. Acton for these great slides and the willingness to share your knowledge and insights!

    Though I have one question: what does "ea" mean on the slides (data_ea or frags_ea etc.)?

    Cheers,
    Bjoern
     
  5. archangelmorph

    Veteran

    Joined:
    Jun 19, 2006
    Messages:
    1,551
    Likes Received:
    11
    Location:
    London
That's exactly my feeling too, and I'd certainly like to see more information on this.
     
  6. Mike Acton

    Mike Acton CellPerformance
    Newcomer

    Joined:
    Jun 6, 2006
    Messages:
    47
    Likes Received:
    2
    Location:
    Burbank, CA
Exactly so. We certainly could have called them "SPU processing kernels", but "kernel" is as overloaded as "shader" (and could just as easily mean the core bit of code that controls the system, i.e. the opposite of a "shader").

    Yes, how many SPUs might call a particular shader or set of shaders is outside the scope of their design. It completely depends on how many SPUs are running the system that calls any particular set of shaders.

    "ea" = "effective address"

i.e., the address of the memory that you're talking about in the mapped address space. Usually main RAM, but not necessarily (it could refer to memory on another SPU, for instance).

    Mike.
     
  7. bknafla

    Newcomer

    Joined:
    Feb 6, 2008
    Messages:
    2
    Likes Received:
    0
    Quoted text originally posted by Mike Acton:
    Ah, thanks!


Do you allow multiple shaders per "shader slot" in a system? The way you control the dataflow into and out of a shader seems to allow for this quite easily.

You confirmed that a system could distribute shaders over many SPUs. I don't know much (nearly nothing) about the PS3/Cell memory system, but I started to wonder how you coordinate memory accesses by one shader running on different SPUs. You explicitly hand in the DMA-access functions: do you collect DMA calls from different shader instances (one shader on different SPUs) to coalesce the memory transfers, and then split the transferred memory among the designated shader instances?
This might introduce coupling in the form of a shader synchronization point, and therefore a bottleneck, but might help memory transfer speeds. Or is such memory coalescing without advantage on the Cell processor?

Hm, the longer I think about this, the more I suspect that you wouldn't try to auto-organize memory transfers. The easiest system would allow completely independent memory (DMA) accesses by the shaders. If memory coalescing is profiled and measured to be advantageous, the whole philosophy behind the shaders suggests you would build the memory access/transfer into the system and then specialize and split the shaders. No auto-adapting system would be needed; the system has full control over what is going on, and the shaders can be kept even simpler.
Am I overcomplicating this, or totally off the mark?

I really look forward to the information published at GDC, to learn more about the SPU shaders: how many shaders are currently in use, where in a system shaders are typically introduced, what tasks shaders inject into the system, and how many shaders really need their own DMA memory access.

    Cheers,
    Bjoern
     