Design your own Cell

Discussion in 'Console Industry' started by semitope, Nov 26, 2009.

  1. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    OMG Acert, that paragraph complete with bolded letters is like maximum eye roll! :razz:

    nAo's architectural/instructions improvement list aside, my own entry into the mix in this thread would be a pure 'mild' evolution. On 32nm HKMG as I proposed, I'd say 2 Cells essentially on a single die - with the PPE cores evolved to more robust/functional entities, and the SPEs being of the PowerXCell variety by default (minimum logic increase vs the originals to begin with). Assuming density and thermal gains come in line with the hopes for the new process, I'm pegging this 'Cell Evo' chip at ~150mm^2 with thermals at roughly the present 45nm SOI Cell's... maybe hopefully even below. Clockspeeds I'll leave at 3.2 GHz in consideration of the beefier PPE replacements, memory controller of course updated... and here I think we have a decent chip to serve as CPU in a console. Yes, in this vision the CPU and GPU remain separate once again, but with the additional SPE power I do envision a smart design having a GPU more specifically tailored to the environment.

    Now... so with this ~150mm^2 Cell Evo (2 Power cores, 16 SPEs), for IBM's HPC purposes I see the new PPE replacement cores being sufficiently up to the task that blades like the QS series no longer require outboard Opterons, for instance, to coordinate workloads. IBM is freed to pursue more 'pure' Cell-based design options, the unification of the SP and DP Cell variants ensures that all Cells are commodity Cells in terms of internal costs, and to make up for the focus on small die size and low thermals, IBM can of course go for glued-die chips, or just straight up MCMs to compete with more monolithic competitors. The great scaling should favor a many-chip environment for Cell quite well in HPC, so I don't view it as having to compete with 500mm^2 chips to offer benefits in certain environments.
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Is it just me, or did you just ask for AMD's Fusion chip? :wink: Or even Larrabee fused with a Nehalem core on the same die? :lol:
     
  3. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    If a new Cell is to be developed, it will most likely be by IBM, so it has to fit their needs. Basically I can see Larrabee (if it turns out well, which given Intel's workforce is a matter of time) as the biggest threat to them. They could pass on the graphics side of things and focus on compute only.
    Low power consumption is much wanted in the HPC market, but I feel like IBM could afford a higher TDP than Cell's and still be competitive.
    I could see Xenon as a good start, not really for the core but for the memory hierarchy. You have registers, L1 and L2, but the L2 actually works like what would be an L3 in most current CPU/x86 designs: it runs at half speed and it's shared. The interesting part for me is that you can design "grapes" (clusters of cores), and I feel that it would be easier to handle coherency and feed, for example, 8 "grapes" of four cores than 32 cores each with their local subset of L2 (could be completely wrong tho...).
    IBM could:
    Design a shorter-pipeline CPU, adapting the clock speed if needed (with regard to power consumption/TDP).
    Design it properly; a lot of effort has been put into the SPU, I'm not sure we can say the same about the PX/PPU.
    Design brand new 256-bit SIMD units supporting integers and floats (why not based on the SPU ISA, but a superset of AltiVec sounds better, or at least something able to run in a compatibility mode with degraded performance).
    Hyper-threading support for the SIMD units (2 or 4 threads).
    Reduce the number of registers to keep the chip from overgrowing (say 64 with 2 hardware threads / 32 with 4).
    Fix LHS (load-hit-store).
    Have really good prefetching capability.
    Have the L2 running at the same clock speed as the chip.
    Greater control over cache behaviour.
    Provide high bandwidth between the different memory levels: L1, L2, RAM.

    I did a mock-up of a 4-core Xenon (in Paint). It wasn't super accurate for sure (... :lol:), but it was at least clear that the chip could most likely be a bit smaller than Cell was at launch, or the same size.

    I'll use the Cell's size for a rough calculation: assuming 60% scaling per node, @ 32nm that's ~50mm² per grape. IBM wants the Cell to be a bit tinier, so they could choose 4 grapes => ~200mm² @ 2GHz.
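    A rough sketch of the arithmetic behind an estimate like this, not the poster's actual working. It assumes a launch (90nm) Cell die of about 230mm², a figure not given in the post, and reads "60% scaling per node" as the area shrinking to 0.6x at each of the three steps from 90nm to 32nm.

```python
# Back-of-the-envelope die-area scaling; every input here is an assumption.
CELL_90NM_AREA_MM2 = 230.0   # assumed launch Cell die size (not stated in the post)
AREA_FACTOR_PER_NODE = 0.6   # "60% scaling per node" read as area x 0.6 per shrink
NODES_NM = [90, 65, 45, 32]  # process steps from launch Cell to the proposed part


def scaled_area(area_mm2: float, shrinks: int,
                factor: float = AREA_FACTOR_PER_NODE) -> float:
    """Area after applying the same shrink factor `shrinks` times."""
    return area_mm2 * factor ** shrinks


shrinks = len(NODES_NM) - 1               # 90->65, 65->45, 45->32
grape_area = scaled_area(CELL_90NM_AREA_MM2, shrinks)
chip_area = 4 * grape_area                # four grapes per chip, as in the post

print(f"per grape: ~{grape_area:.0f} mm^2, four grapes: ~{chip_area:.0f} mm^2")
```

    With these inputs the sketch lands close to the ~50mm² per grape and ~200mm² per chip quoted above; a different starting die size or shrink factor moves both numbers proportionally.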

    That's 512 GFLOPS SP and 256 DP, and I think a lower TDP than x86 parts. Both Intel and AMD plan to release OoO quad-core (or more) parts, also supporting 8-wide SIMD units, running at around 3GHz in the near future. The IBM alternative could end up using quite a bit less power, running cooler and being tinier (slightly tinier cores and a lot less cache); it could use the same IO as POWER7, or an improved version, for great scalability.
    So not much of a raw power advantage, but way greater scalability, lower TDP and power consumption, and with the chip being pretty tiny, possibly a money maker in the HPC market.
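    One way to arrive at peak figures like these, as a sketch under stated assumptions rather than the poster's own calculation: 4 grapes of 4 cores at 2GHz, the 256 SIMD units from the list above read as 256 bits wide, and (an assumption not in the post) one SIMD unit per core issuing one fused multiply-add per lane per cycle.

```python
# Peak-throughput arithmetic; inputs marked "assumed" are not stated in the post.
GRAPES = 4             # grapes per chip, from the post
CORES_PER_GRAPE = 4    # four cores per grape, from the post
CLOCK_GHZ = 2.0        # clock speed from the post
SIMD_BITS = 256        # assumed reading of the "256 SIMD unit" wish-list item
FLOPS_PER_LANE = 2     # assumed: one fused multiply-add per lane per cycle


def peak_gflops(element_bits: int) -> float:
    """Theoretical peak GFLOPS for an element width (32 = SP, 64 = DP)."""
    lanes = SIMD_BITS // element_bits
    cores = GRAPES * CORES_PER_GRAPE
    return cores * CLOCK_GHZ * lanes * FLOPS_PER_LANE


sp = peak_gflops(32)   # 16 cores * 2 GHz * 8 lanes * 2 flops
dp = peak_gflops(64)   # 16 cores * 2 GHz * 4 lanes * 2 flops
print(f"SP: {sp:.0f} GFLOPS, DP: {dp:.0f} GFLOPS")
```

    Under those assumptions the result matches the 512 SP / 256 DP figures; these are theoretical peaks and say nothing about sustained throughput.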

    Things could get better if they can use eDRAM for the L2 => way tinier chip => give up some of that advantage and optimize for power/heat instead of density => aim for higher clocks and/or add some cores. Hitting a TFLOPS could be nice, or simply having better FLOPS/watt characteristics (my favoured take for the intended market).

    They may ship something like 4/8 chips per blade (4 sockets per blade, one or two chips per socket) with a bunch of memory. Not a crazy-looking design in terms of raw numbers (1 or 2 TFLOPS in DP), but a homogeneous design, cooler, pretty cheap to produce, a well-known ISA, able to run existing code, etc.
     
    #43 liolio, Nov 28, 2009
    Last edited by a moderator: Nov 28, 2009
  4. assen

    Veteran

    Joined:
    May 21, 2003
    Messages:
    1,377
    Likes Received:
    19
    Location:
    Skirts of Vitosha
    Are you willing to spend your development time turning problems into streaming versions, instead of solving new problems? Are you willing to bankroll your programmers doing so? Just betting isn't good enough ;-)
     
  5. Mobius1aic

    Mobius1aic Quo vadis?
    Veteran

    Joined:
    Oct 30, 2007
    Messages:
    1,683
    Likes Received:
    259
    I remember first hearing about the Loongson series a while back. Any products from China containing them? So far the only Loongson product I can foresee coming to the US is in a missile guidance system, if you get my drift :razz:
     
  6. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    Sounds neat :), could be better than LRB hehe ;).
     
  7. T.B.

    Newcomer

    Joined:
    Mar 11, 2008
    Messages:
    156
    Likes Received:
    0
    While I agree with most points you make, don't you think this is more a compiler issue? I mean there is non-optimized debug code, and then there is going out of your way to bloat code-size beyond anything reasonable.

    Would be interesting to see what kind of instructions people would want, apart from DIV. ;) I'm somewhat partial toward logical boolean, to get rid of all those ceq instructions. Take some pressure off of EVEN.

    As for instruction latency, I don't see it as a problem in most cases. There are software solutions around that.

    The rest.... yeah, pretty much. :)
     
  8. Weaste

    Newcomer

    Joined:
    Nov 13, 2007
    Messages:
    175
    Likes Received:
    0
    Location:
    Castellon de la Plana
    What new instructions do you think that IBM had in mind when they said "Performance per SPE equal or better - significantly better on applications that benefit from new instructions." in the old roadmap?
     
  9. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    I don't agree, I think Polaris is more relevant to where Intel is heading ;)
     
  10. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,065
    Likes Received:
    1,660
    Location:
    Maastricht, The Netherlands
    Well, in my ideal world I would design an engine completely based on the idea of streaming in the first place. It would probably result in new ideas automatically.
     
  11. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Unfortunately developers don't live in ideal worlds--then again, if they did, we may see new games every 4-6 years from a studio.
     
  12. EvilOne666

    Newcomer

    Joined:
    Apr 25, 2008
    Messages:
    48
    Likes Received:
    1
    in Soviet Russia cell designs you
     
  13. semitope

    Banned

    Joined:
    Oct 24, 2009
    Messages:
    180
    Likes Received:
    0
    :lol:

    This thread turned out fine IMO. I really just wanted to see what ppl would come up with
     
  14. EvilOne666

    Newcomer

    Joined:
    Apr 25, 2008
    Messages:
    48
    Likes Received:
    1
    :lol: fair enough xd :lol:
     