End of Cell for IBM

Discussion in 'Console Industry' started by Butta, Nov 20, 2009.

  1. JardeL

    Regular

    Joined:
    Aug 8, 2006
    Messages:
    545
    Likes Received:
    2
    Location:
    Istanbul
    ...
     
  2. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    That's more interesting. So they're going to update Cell significantly, I assume, to keep up with the competition. I mean, if I remember right, the 32i was going to have improved SPUs, so I guess IBM figures the 32i iteration of Cell isn't going to cut it.
     
  3. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Another possibility is that IBM calls the PPU and its PowerPC architecture the "core" of the Cell technology :lol:
     
  4. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    The SPUs are both the strength and the weakness of CELL. Strength in that they are what gives CELL its extraordinary computing density. Weakness in that they make CELL CPUs impossibly hard to virtualize, limiting them to a single-user (and single-application!) environment - fine for game consoles and HPC, but useless everywhere else.

    There isn't a lot of room for CELL designers to manoeuvre. The size and latency of the local store are effectively part of the architecture spec now, that is, fixed.

    Cheers
     
  5. Weaste

    Newcomer

    Joined:
    Nov 13, 2007
    Messages:
    175
    Likes Received:
    0
    Location:
    Castellon de la Plana
    And can never be changed or updated?
     
  6. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    The local store can't be decreased in size, since that would break existing programs.

    It could be increased, but that would increase latency as well, making programs slower - especially existing ones, which expect a six-cycle latency. And unlike a bigger cache, a bigger local store has zero benefit for existing programs.

    The latency of the local store is a function of its size. Lowering it is out of the question, since signal propagation delays increase with smaller geometry. And increasing it is bad for existing code, because code is statically scheduled by the compiler (or manually by the coder, *ugh*) around the six-cycle latency.
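    To illustrate what "statically scheduled" means here, a rough sketch in plain C (not real SPU intrinsics - the array simply stands in for the local store and the names are made up): the load for the next iteration is issued early so that independent work covers the fixed load-to-use latency.

        /* Software-pipelined loop: issue the load for iteration i+1 up front,
         * then do independent work on iteration i, so a fixed load-to-use
         * latency (six cycles on the SPU) is overlapped instead of stalled on. */
        float sum_of_squares(const float *local_store, int n)
        {
            if (n <= 0)
                return 0.0f;
            float cur = local_store[0];          /* first load issued ahead of the loop */
            float sum = 0.0f;
            for (int i = 0; i < n - 1; i++) {
                float next = local_store[i + 1]; /* early load fills the latency slot   */
                sum += cur * cur;                /* independent work overlaps the load  */
                cur = next;
            }
            return sum + cur * cur;              /* last element                        */
        }

    Grow the latency and the compiler has to find more independent work to stuff into that slot; code tuned for six cycles simply stalls on a slower local store.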

    Cheers
     
  7. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    17,682
    Likes Received:
    1,200
    Location:
    Maastricht, The Netherlands
    Honest question - is it theoretically impossible to increase the local store without increasing latency? Isn't it possible to, say, use faster memory to compensate?
     
  8. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    OK, pardon my ignorance, but how is it then possible that the first level caches have kept increasing in size while maintaining the same low latency?
     
  9. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    17,682
    Likes Received:
    1,200
    Location:
    Maastricht, The Netherlands
    By the way, the title of this topic should be amended - at the very least a question mark should be added: 'End of Cell for IBM?'
     
  10. Vitaly Vidmirov

    Newcomer

    Joined:
    Jul 9, 2007
    Messages:
    108
    Likes Received:
    10
    Location:
    Russia
    Nonsense. The PS3's GameOS / Linux are virtualized (including the SPUs, of course).
    Backing up SPU state is pretty easy.

    Increasing? The last time Intel bumped the L1 cache (32KB) in their CPUs was seven years ago. AMD has lived with 64KB for over a decade.
    And do you know of any examples of a 256KB L1 cache? =)
     
    #130 Vitaly Vidmirov, Nov 24, 2009
    Last edited by a moderator: Nov 24, 2009
  11. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,751
    Likes Received:
    127
    Location:
    Taiwan
    There were some CPUs in ancient times, the PA-RISC line, which had huge L1 caches (1MB data + 0.5MB instruction). Some even boasted 1-cycle access latency, though their clock frequencies were not fast compared to other CPUs of their generation. :)
     
  12. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    Easy as in "a few lines of code", yes. Easy as in "only takes a few cycles", no.

    Cheers
     
    #132 Gubbi, Nov 24, 2009
    Last edited by a moderator: Nov 24, 2009
  13. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    So they have flattened out - is that due to the effects Gubbi referred to?

    That was never the question. BTW, accessing the LS does not involve any table lookups before accessing a position either, so I am not really impressed by the 6-cycle latency, considering L1 caches do quite a bit of logic in fewer cycles.
     
  14. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    All modern caches are virtually indexed. That means the virtual address is used to initiate the load from the cache array immediately. Tags are checked in parallel and used one or a few cycles later to determine whether the data found (if any) is a hit or not.
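    As a rough sketch in C (the geometry - 32KB, 8-way, 64-byte lines - and all the names here are just assumptions for illustration): the set index comes straight from the low address bits, so the array access can start at once, and the tag compare decides hit or miss a cycle or so later.

        #include <stdbool.h>
        #include <stdint.h>

        #define LINE_BITS 6                     /* 64-byte lines                   */
        #define SET_BITS  6                     /* 64 sets = 32KB / (8 ways * 64B) */
        #define WAYS      8

        typedef struct { uint64_t tag; bool valid; } line_t;   /* data array omitted */
        static line_t cache[1 << SET_BITS][WAYS];

        bool cache_hit(uint64_t vaddr, uint64_t paddr)
        {
            /* Index taken straight from the (virtual) address: the data array
             * read can start immediately, before translation has finished.   */
            unsigned set = (vaddr >> LINE_BITS) & ((1u << SET_BITS) - 1);

            /* Tag compare against the translated (physical) address runs in
             * parallel / a cycle later and decides whether it was a hit.     */
            uint64_t tag = paddr >> (LINE_BITS + SET_BITS);
            for (int w = 0; w < WAYS; w++)
                if (cache[set][w].valid && cache[set][w].tag == tag)
                    return true;
            return false;
        }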

    Cheers
     
  15. Vitaly Vidmirov

    Newcomer

    Joined:
    Jul 9, 2007
    Messages:
    108
    Likes Received:
    10
    Location:
    Russia
    Since when does virtualized context switching take "a few cycles"? Does it require an L2 cache flush?
    90us or so to back up an SPU context is OK. That's ~5K switches (save+restore) per second.
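    A quick check of that figure, assuming the restore costs about as much as the save (the 90us number is from above, everything else is just illustration):

        #include <stdio.h>

        int main(void)
        {
            /* Assumption: ~90us to save an SPU context and about the same to restore it. */
            const double save_us = 90.0, restore_us = 90.0;
            double pairs_per_sec = 1e6 / (save_us + restore_us);
            printf("~%.0f save+restore pairs per second\n", pairs_per_sec);   /* ~5,555, i.e. "~5K" */
            return 0;
        }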
     
  16. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Dead again, per Ars :lol:

    The authors usually respond to comments regarding factual accuracy, but none have commented on the links about IBM clarifying. Maybe Cell is dead because Ars wants it dead (conspiracy and all!). Anyhow, their take is always an interesting overview.
     
  17. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    As I was mentioning earlier in the thread, signal processing is one of Cell's strong suits. The article itself mentions the tasks for the cluster:

    As for the supposed speed delta between a potential GPU solution and Cell, I think it has more to do with the project being green-lighted in 2008. I think if an institution were to begin evaluating various architectures today, GPGPU would look stronger than even a year ago, simply due to OpenCL, increased FLOPS, and increased DP per card. Not that Cell hasn't recently come under the OpenCL fold as well, of course.

    Even today, if we look at the price of a FireStream 9270 and consider the host system needs, on a pure cost basis - roughly three PS3s (three nodes) for the cost of a single card - I think it would still be a worthwhile choice in certain situations.
     
  18. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    I think Jon's been off on Cell since day 1 personally... he never seemed to 'get' it. Even that article, painting the picture as if IBM 'sold' Sony on Cell, reflects a warped understanding of the chip's origins, since IBM essentially had to be dragged to the SPE party.

    But I do think that as a branch, Cell's ball will be picked up by a different architecture over at IBM. The Driverheaven article is just the most positive spin on what is essentially the same non-denial denial out of David Turek that all these sites are working with.

    I've said it before in other threads, but I find it a bit ironic that the greatest beneficiary of the architecture may ultimately be IBM, who of course wanted something more 'standard' at the outset. Cell has given them a quick start off the line in the world of many-core architectures, plus the supporting tools and plain old experience/R&D. Whatever comes next for them, I'm hoping for an interesting chip.

    As an aside, I think the HPCWire article linked within Jon's makes a great sort of "memories of Cell" piece:

    http://www.hpcwire.com/features/Will-Roadrunner-Be-the-Cells-Last-Hurrah-66707892.html

    It's interesting to note that the top 6 supercomputers on the Green 500 are all Cell-based systems.

    http://www.green500.org/lists/2009/11/top/list.php
     
  19. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Not to derail, but does anyone have a relative size comparison between a single Larrabee core and an SPE? What is going to be the cost of taking a more traditional core (sans OOOe) with cache and latching a honking vector unit onto it, compared to a clean-slate SPE design?

    Anyhow, it will be interesting to see how chip communication matures. You don't hear a lot of complaints about Cell in this regard.
     
  20. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    A normal processor's explicit context is on the order of half to one kilobyte. An SPU's context is almost three decimal orders of magnitude larger.

    It's ok because CELL is used in environments where you on average run only one application (a game) at a time.

    For reference, on bog standard Linux switching time is less than one microsecond.

    A typical server has around 1,000-2,000 context switches per second per core, but can go much higher (e.g. Citrix servers do). Your typical Vista desktop has around 1,000-4,000 switches/second at any given time, just web browsing, listening to streaming music, etc.

    One thing is the amount of time used, another is bandwidth: 2,000 switches per SPU per second times 512KB per context switch (256KB out, 256KB in) is roughly 1GB/s per SPU, so across seven SPUs you would spend about 7GB/s of bandwidth just on context switches.
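    A back-of-the-envelope check of that figure (the per-SPU switch rate and the seven usable SPUs are the assumptions stated above):

        #include <stdio.h>

        int main(void)
        {
            /* Assumptions from the post: 2,000 switches per SPU per second,
             * 256KB out + 256KB back in per switch, seven usable SPUs.      */
            const double switches_per_spu = 2000.0;
            const double bytes_per_switch = 2.0 * 256.0 * 1024.0;
            const int    spus             = 7;

            double gb_per_sec = switches_per_spu * bytes_per_switch * spus / 1e9;
            printf("~%.1f GB/s spent on SPU context switches\n", gb_per_sec);  /* ~7.3 */
            return 0;
        }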

    Cheers
     