The SPE as general purpose processor

Discussion in 'Console Technology' started by Frank, Mar 7, 2006.

  1. danteye

    Newcomer

    Joined:
    May 25, 2005
    Messages:
    33
    Likes Received:
    0
    OK, I understand! Thank you!
     
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,525
    Likes Received:
    15,980
    Location:
    Under my bridge
    Nope. I have here a processor that runs at 3 GHz, with 32 'ADD' units that add, can only add, and that's all this processor does. It performs 96 billion integer instructions per second. Is it good at integer work?
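    The headline figure in this hypothetical is just clock rate times issue width, assuming each of the 32 adders completes one add every cycle:

```latex
R_{\text{peak}} = f_{\text{clk}} \times n_{\text{ADD}}
  = (3 \times 10^{9}\ \text{cycles/s}) \times (32\ \text{adds/cycle})
  = 9.6 \times 10^{10}\ \text{adds/s} \approx 96\ \text{billion adds/s}
```

    Nothing in that product accounts for loads, stores, compares or branches.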

    Others have explained the full demands of integer work well, but I felt the obvious illustration was worth adding to show the flaw in your conclusion.
     
  3. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,939
    Likes Received:
    42
    You have a processor with a crippled ISA.

    SPUs and PPEs have a full ISA, which should be obvious.

    Edge's point was clearly that SPUs didn't suck at integer MATH.
     
  4. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,586
    Likes Received:
    981
    Not clear at all.

    Only his last post detailed integer arithmetic.

    Before that he just talked about the SPUs not having inferior integer performance.

    Cheers
     
  5. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,939
    Likes Received:
    42
    Well the fact that he mentioned integer "instruction" was a dead giveaway as referring to math instructions. Maybe I've seen this too often on this forum for it to be obvious...
     
  6. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,586
    Likes Received:
    981
    Could very well be that he meant arithmetic, my mistake then. But I was certainly misled by him mentioning "integer performance" and "integer work" in the posts above, which traditionally has a different meaning (see above) when discussed in an integer vs. floating point context.

    Cheers
     
  7. DarkRage

    Newcomer

    Joined:
    Jul 25, 2005
    Messages:
    70
    Likes Received:
    1
    Location:
    Spain
    Come on, you are counting integer operations within the vector capabilities as integer general purpose capabilities.

    They are not.

    Again, something as common as a = b[i+1]+c[i+2] is described by IBM as requiring a significant number of operations, with performance penalties on top, and you are using the full vector register for it, with the other three 32-bit "ALUs" in the vector doing useless work.
     
  8. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,939
    Likes Received:
    42


    Well what do you expect? SPUs are unified scalar/vector processors. If it's doing scalar work, then it can't be doing vector work and vice versa... this is why you have so many on a die...
     
  9. add n to (x)

    Newcomer

    Joined:
    Dec 5, 2004
    Messages:
    13
    Likes Received:
    1
    Location:
    London, United Kingdom
    Ok, so you're talking about regular C-like syntax, such as addressing a scalar array of values.

    Now it's true that the SPEs can only load and store quadwords (128-bits) at a time, so accessing individual scalar values will introduce some overhead:

    Loading a scalar value from local store will cause the entire quadword that contains that value to be loaded, and a rotation in order to put the value into the "preferred slot" within the register (so that's one additional instruction).

    Writing to a scalar value in local store will introduce a couple of extra instructions, since the quadword you're writing to has to be loaded, the value inserted into the correct place within the quadword and then written back to memory (that's an additional two instructions).
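    A minimal C sketch of the two access patterns described above, assuming a toy "local store" that can only be read or written in 16-byte-aligned quadwords. The quadword type and helper names are hypothetical; the byte-by-byte loop stands in for what is a single rotate on the real SPU, and host byte order is used throughout, so this models the data movement, not SPU endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: memory is only accessible as 16-byte-aligned quadwords. */
typedef struct { uint8_t b[16]; } quadword;

static quadword load_qword(const uint8_t *ls, uint32_t addr)
{
    quadword q;
    memcpy(q.b, ls + (addr & ~15u), 16);   /* address truncated to a 16-byte boundary */
    return q;
}

static void store_qword(uint8_t *ls, uint32_t addr, quadword q)
{
    memcpy(ls + (addr & ~15u), q.b, 16);
}

/* Scalar load: fetch the enclosing quadword, then rotate left by the byte
   offset so the 32-bit value lands in the preferred slot (bytes 0..3). */
static uint32_t load_scalar_u32(const uint8_t *ls, uint32_t addr)
{
    assert(addr % 4 == 0);                 /* naturally aligned word */
    quadword q = load_qword(ls, addr);     /* the load itself */
    quadword r;
    unsigned shift = addr & 15u;
    for (unsigned i = 0; i < 16; i++)      /* extra step: the rotate */
        r.b[i] = q.b[(i + shift) & 15u];
    uint32_t v;
    memcpy(&v, r.b, 4);                    /* value now sits in the preferred slot */
    return v;
}

/* Scalar store: read-modify-write of the enclosing quadword. */
static void store_scalar_u32(uint8_t *ls, uint32_t addr, uint32_t v)
{
    assert(addr % 4 == 0);
    quadword q = load_qword(ls, addr);     /* extra step: load the containing quadword */
    memcpy(q.b + (addr & 15u), &v, 4);     /* extra step: insert the word in place */
    store_qword(ls, addr, q);              /* the store itself */
}
```

    In these terms, a = b[i+1]+c[i+2] costs two scalar loads (each a quadword load plus a rotate), one add, and one read-modify-write store, and the store leaves the other three words of its quadword untouched.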

    Now three extra instructions isn't exactly what I'd call "significant". And with the large (128) register file, general variables like indices, loop counters etc. aren't flushed to memory very often.
     
  10. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,586
    Likes Received:
    981
    Fundamentally it is kind of pointless to argue whether or not a SPU can be a stand alone/general purpose CPU. It is not intended to work like that, it is very clearly intended to crunch through regular sized chunks of data at great speed, and it is very good at that.

    While it can act (almost: it only has a slave MMU and cannot set up its own translation tables) as a fully fledged CPU, it is deficient (feature-wise and therefore performance-wise) in a whole bunch of ways that keep it from ever being useful as a GP CPU.

    I think you're right when you say that converting scalar stores into read-modify-writes is insignificant, I too consider this to be the least of its deficiencies. The archaic memory model (complete with lack of automatic memory coherence) is what ultimately renders the SPU braindead as a general purpose CPU.

    Cheers
     
    #50 Gubbi, Mar 10, 2006
    Last edited by a moderator: Mar 10, 2006
  11. PeterT

    Regular

    Joined:
    May 14, 2002
    Messages:
    702
    Likes Received:
    14
    Location:
    Austria
    Has there actually been a consensus in the earlier thread about what "general purpose" is?
    And if so, are there any general purpose tasks that really require all that much performance?

    I agree with your points in principle for a very specific definition of "general purpose" - I just don't see how that kind of work is very relevant at all when discussing SPEs in a console setting. So, this thread's main point of contention may be interesting technically, but it has little bearing on PS3.
     
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,525
    Likes Received:
    15,980
    Location:
    Under my bridge
    That wasn't clear to me. Being good at integer math doesn't really settle whether SPEs are good at 'integer workloads' either, as it doesn't take into account the other non-maths functions, which is the crux of the discussion, though I think it took a bit of a detour there with someone doubting SPEs' int-math performance.

    It's one of those subjects that really can only be settled with benchmarks, I think. Take a useful real-world routine from a normal processor whose integer performance you care about and port it verbatim to the SPE, with maybe some memory management beyond pure on-demand fetching (there's a software cache solution, IIRC, that could be employed). Then see how well the SPE copes with branching and random memory access compared with the PPE, and whether it is fast enough to be usable or so slow as to be a 'last resort'.
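    The "software cache" idea can be sketched as a small direct-mapped, read-only cache kept in local store. This is a hypothetical illustration, not the software cache shipped with any SDK: the line size and line count are assumptions, memcpy stands in for a DMA transfer, and there is no write-back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128u   /* bytes per cache line (assumed) */
#define NUM_LINES 64u    /* 8 KB of local store set aside (assumed) */

typedef struct {
    const uint8_t *main_mem;              /* stands in for main memory */
    size_t main_size;
    uint8_t data[NUM_LINES][LINE_SIZE];   /* would live in local store */
    uint64_t tag[NUM_LINES];
    int valid[NUM_LINES];
    unsigned hits, misses;
} sw_cache;

static void sw_cache_init(sw_cache *c, const uint8_t *main_mem, size_t main_size)
{
    memset(c, 0, sizeof *c);
    c->main_mem = main_mem;
    c->main_size = main_size;
}

/* Read one byte at address ea, fetching its whole line on a miss. */
static uint8_t sw_cache_read_u8(sw_cache *c, uint64_t ea)
{
    assert(ea < c->main_size);
    uint64_t line_addr = ea / LINE_SIZE;
    unsigned idx = (unsigned)(line_addr % NUM_LINES);
    if (!c->valid[idx] || c->tag[idx] != line_addr) {
        /* Miss: on an SPE this would be a DMA get followed by a wait. */
        size_t base = (size_t)line_addr * LINE_SIZE;
        size_t n = c->main_size - base < LINE_SIZE ? c->main_size - base : LINE_SIZE;
        memcpy(c->data[idx], c->main_mem + base, n);
        c->tag[idx] = line_addr;
        c->valid[idx] = 1;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[idx][ea % LINE_SIZE];
}
```

    Counting hits and misses for a real routine, rather than just timing it, would show how much of the SPE's slowdown comes from random memory access.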
     
  13. add n to (x)

    Newcomer

    Joined:
    Dec 5, 2004
    Messages:
    13
    Likes Received:
    1
    Location:
    London, United Kingdom
    Is the SPE a "general-purpose-processor" in the same sense that the PPE or Xenon CPU or a P4 or whatever is? No, of course not. Those were designed to simplify the writing of code for them by providing features like MMU, caches etc. Some of them implement more advanced features such as OOOE & branch-prediction in order to speed things up even more. The SPEs were designed to give maximum performance on a particular set of workloads with the minimum number of transistors. But they can also run "general-purpose-code" (whatever that is) to a certain extent. Yes, you have to manage the local store yourself instead of relying on cache, but for a lot of tasks they're surprisingly quick even given pretty poor C code (in SPE terms anyway). Calling them braindead is unjustified in my opinion.

    I'll stop now before I get myself into trouble ;)
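    Managing the local store yourself usually means streaming data through it in chunks. A minimal sketch of one common pattern, double buffering, follows: the next chunk is requested before the current one is processed. It is hypothetical and portable: dma_get_sim is a synchronous memcpy standing in for an asynchronous DMA, so the overlap that makes this pay off on an SPE does not actually happen here, and the chunk size is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 1024u   /* elements per buffer (assumed) */

/* Stand-in for an asynchronous DMA get; here it is a plain synchronous copy. */
static void dma_get_sim(uint32_t *ls_buf, const uint32_t *ea, size_t count)
{
    memcpy(ls_buf, ea, count * sizeof *ls_buf);
}

/* Sum an array in main memory through two local-store buffers. */
static uint64_t sum_double_buffered(const uint32_t *main_mem, size_t n)
{
    static uint32_t buf[2][CHUNK];   /* the two local-store buffers */
    uint64_t total = 0;
    size_t done = 0;
    int cur = 0;
    size_t cur_n = n < CHUNK ? n : CHUNK;

    assert(main_mem != NULL || n == 0);
    if (n == 0) return 0;
    dma_get_sim(buf[cur], main_mem, cur_n);          /* prime the first buffer */
    while (done < n) {
        size_t next_off = done + cur_n;
        size_t next_n = 0;
        if (next_off < n) {
            next_n = n - next_off < CHUNK ? n - next_off : CHUNK;
            /* Request the next chunk before touching the current one;
               on an SPE this transfer would overlap with the loop below. */
            dma_get_sim(buf[cur ^ 1], main_mem + next_off, next_n);
        }
        for (size_t i = 0; i < cur_n; i++)
            total += buf[cur][i];
        done += cur_n;
        cur ^= 1;                                    /* swap buffers */
        cur_n = next_n;
    }
    return total;
}
```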
     
    #53 add n to (x), Mar 10, 2006
    Last edited by a moderator: Mar 10, 2006
  14. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,525
    Likes Received:
    15,980
    Location:
    Under my bridge
    Not that I've seen ;). Perhaps it's just a case of non-maths processing counting? Bitwise transformations, comparisons and load/stores, which have nothing to do with adding and multiplying numbers (and all combinations thereof), count as general purpose computing.

    So I guess a processor workload can be divided into
    Floating Point Math - Ability to Add+Mul etc. floating point (non-integer) values
    Integer Math - Ability to Add+Mul etc. integer values
    General Purpose - Everything else

    'Integer Performance' is maybe a catch-all for everything not related to Floating Point Math. AFAIK the term was coined by MS in response to Cell, was it not? Has it been used before then? They certainly made no effort to define the term!

    Perhaps an example of 'general purpose' performance would be a bubble sort? That's all load/store/compare. Write a bubble sort to sort 500 KB of data (larger than the LS) on the PPE and on an SPE and see how they perform. Would that be a fair comparison?
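    For reference, a plain bubble sort really is nothing but loads, compares and stores. The sketch below (hypothetical helper names) also encodes why a 500 KB data set is a meaningful test: an SPE's local store is 256 KB, so the data cannot be resident all at once and an SPE version would have to stream or cache it. The size check ignores the space code and stack also need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_STORE_BYTES (256u * 1024u)   /* SPE local store size */

/* Classic bubble sort: only loads, compares and stores. */
static void bubble_sort_u32(uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2) return;
    for (size_t end = n - 1; end > 0; end--) {
        bool swapped = false;
        for (size_t i = 0; i < end; i++) {
            if (a[i] > a[i + 1]) {
                uint32_t t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
                swapped = true;
            }
        }
        if (!swapped) break;   /* already sorted: stop early */
    }
}

/* Would a working set of this size fit in local store? */
static bool fits_in_local_store(size_t bytes)
{
    return bytes <= LOCAL_STORE_BYTES;
}
```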
     
  15. deathkiller

    Newcomer

    Joined:
    Jul 24, 2005
    Messages:
    186
    Likes Received:
    4
    No, because the PPE has a 512 KB L2 cache :wink:.

    Off-topic: I have found the TRE demo movie: http://www.kevinevans.net/ibmcell/tre_demo_movie.html Is it old?
     
  16. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    'Integer performance' usually refers to performance in non-FP situations, and it long predates MS as a term. In the old days integer math was the limiting factor; these days it's basically free.

    A bubble sort would be a trivial test. But you really need to be running a large application to get a good picture: a lot of general performance is dictated by cache architecture and the processor's ability to hide load/store and instruction latencies. The majority of an application's code does nothing more than shuffle data, and a lot of that data is generally not ideally structured for the cache, so execution time becomes dominated by cache misses. This is why Intel and AMD have invested so much of their R&D in improving their caches. It's OK to say things like "well, restructure the data so it's more cache friendly", but in the real world it's often not practical.
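    One common form of that restructuring is switching from an array of structures (AoS) to a structure of arrays (SoA). The sketch below is a hypothetical illustration assuming a 128-byte cache line; it sums one field in each layout and counts how many distinct lines such a loop touches.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 128u   /* assumed cache-line size in bytes */

/* Array of structures: each x sits beside fields this loop never reads. */
typedef struct {
    float x, y, z;
    int flags;
    char name[48];
} particle_aos;

/* Structure of arrays: all x values are contiguous. */
typedef struct {
    float *x, *y, *z;
    int *flags;
} particles_soa;

static float sum_x_aos(const particle_aos *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;          /* stride = sizeof(particle_aos) */
    return s;
}

static float sum_x_soa(const particles_soa *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p->x[i];         /* stride = sizeof(float) */
    return s;
}

/* Distinct cache lines touched when reading one field at a fixed stride from
   a line-aligned base, assuming the field never straddles a line. */
static size_t lines_touched(size_t n, size_t stride)
{
    assert(stride > 0);
    size_t lines = 0, last = (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        size_t line = (i * stride) / CACHE_LINE;
        if (line != last) { lines++; last = line; }
    }
    return lines;
}
```

    With 1024 elements and a 64-byte structure (which particle_aos is on common ABIs), the AoS loop touches 512 lines while the SoA loop touches 32, even though both read the same 4 KB of useful data. The catch ERP points out is that real code rarely reads just one field in one predictable order.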
     
  17. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,939
    Likes Received:
    42
    Well let me explain why it should've. Firstly he was talking about integer instructions per second which implied maths. Secondly, he only used 1 instruction per cycle per SPU, even though the SPUs are dual issue. Thirdly I even subsequently derived these numbers... I've probably seen it too many times for it to be obvious though...

    I think, to avoid confusion, integer instructions should mean integer maths, especially on a 3D site, and FP instructions should mean FP maths. I think Gubbi summarised it well earlier.

    Devs will be experimenting with different algorithms and what works best. Hopefully in the next few years, we'll see the fruits of that labour...
     
    #57 j^aws, Mar 10, 2006
    Last edited by a moderator: Mar 10, 2006
  18. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,525
    Likes Received:
    15,980
    Location:
    Under my bridge
    For sure, SPEs' general purpose computing performance isn't really an issue, because that's not what they're going to be used for. It'd still be nice to have a comparison with a conventional processor just to know how well SPEs can cope, and how much of a reason there is for a PPE over another SPE or two, which is the point of this thread after all!
     
  19. Edge

    Regular

    Joined:
    Apr 26, 2002
    Messages:
    613
    Likes Received:
    10
    You're just being silly. I can drag all kinds of specialized processors in for the sake of endless argument.

    You stick with your 32 ADD unit processor for your next console, and I will stick with CELL.
     
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,525
    Likes Received:
    15,980
    Location:
    Under my bridge
    Yep :p. Just saying that all the int operations in the world don't mean good performance if they're not meaningful, useful operations. A processor capable of a trillion int ops per second is not a good integer performer if only 20,000 of those ops can load data into registers where it is needed.

    I was mixing Int Performance with General Processing though, so if you meant it as just maths, and we know Cell has a full Int maths ISA, your reasoning was fair and I just didn't follow the plot very well!
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.