Cell vs Tesla 10 vs Firestream 9250

Discussion in 'CellPerformance@B3D' started by randomhack, Jun 22, 2008.

  1. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    Pardon my ignorance wrt Cell but I wonder why the cell based accelerator boards are so damned expensive compared to the GPU based boards? I also wonder whether Cell based systems offer any performance advantage over the GPU boards.
    Is it the power draw that's attractive on the Cell?
     
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,484
    Likes Received:
    10,844
    Location:
    Under my bridge
    I think because Cell is out and has been out for a couple of years, and the GPU boards aren't. ;) Prior to these latest developments there was nothing to challenge Cell's sustained performance. Now that the GPU IHVs are adjusting their designs for general purpose processing, and competing with each other, they are clearly in a price struggle to attract performance dollars - if Firestream was the only accelerated option out there, do you think it'd remain so 'cheap'?

    This is something of a make-or-break point for Cell I think. It hasn't managed to take off and establish itself in a broad market or the CE space. Its current niche of performance computing is being challenged by the GPU guys. The advantage to Cell at the moment is in performance per watt, but in specialist applications custom ASICs do a good job already. In the PC space, such as Toshiba's laptop venture, Cell is likely to be surpassed by integrated GPU hardware that can transition between jobs. Functioning as standard PC GPUs, they'll get mainstream adoption, and with all these GPUs present, people will actually start targeting them, and unlike Cell, they'll get the software that has the performance advantages.

    I'm not feeling optimistic for Cell at the moment :(
     
  3. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    For CE space, what are the performance figures for NEC's and Broadcom's SoCs for Blu-ray players ? These devices need to run general programs on HD assets. They would be Cell challengers on another front.


    For specialized business and scientific applications, the GPGPUs already churn out impressive numbers for work that matches well to their architecture. I am also waiting for some comparison numbers from Folding@Home (there will be a new PS3 client too, according to Vijay Pande). The problem is that these SIMD nodes may have a higher FLOP count, but the effective work done may not be as great (compared to the FLOP count). For F@H, it is said that the GPUs need to recalculate some values a few times because they can't store these intermediate values.

    The current incarnation of GPGPUs also has limited application areas. Nonetheless, the sheer number of cores may still provide the necessary performance advantage despite the wastage. I also believe that integrated GPUs will likely kill any prospect for SpursEngine and Cell accelerator boards here.


    For supercomputing, power consumption has become one of the key limiting factors. Does Cell still offer better performance/watt compared to GPUs ? If true, they will hold a strategic advantage there (since they can add more nodes to improve performance and likely still achieve good efficiency due to Cell's software design). I know of clusters that caused frequent brown-outs in their neighbourhood in Asia.

    In general, energy use will become more and more problematic moving forward.
     
    #3 patsu, Jun 22, 2008
    Last edited by a moderator: Jun 22, 2008
  4. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    Cell offers a number of advantages, especially in the supercomputing space. That's due mainly to interconnects; where GPUs are slaved to the PCIe slots, Cell is not, and thus communication node-to-node and with the 'system' in general is at a much lower latency than with GPU-based systems. Cell itself as an architecture reflects a number of supercomputing concepts, and the idea of 'interconnects' is at the core of it... the EIB and the low-latency LS.

    GPUs on a targeted level are of course great performers, and I certainly don't think the average person should go out and buy a Cell add-in board when they can work with a GPU instead. But for scaled HPC and supercomputing solutions on the enterprise scale, the combination of IBM's reputation and BladeCenter system, the improving SDK tools and 'transparency' offered in heterogeneous arrangements, and actual real performance advantages in certain situations keeps Cell quite competitive in the present term, and I think that IBM is well-positioned at the moment with it.

    How approachable the Firestream is I'm not sure myself; certainly its DP performance is quite strong, but against the Tesla 10, Cell compares favorably on both DP performance and wattage. For example, NVidia's rack-mount server offering at 700 watts would put the QS22 at rough parity in terms of DP performance for just a couple thousand more while consuming half the watts.
     
  5. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    I believe Clearspeed's accelerator boards existed long before CELL accelerator boards, and they've proven themselves in Linpack sustained performance testing too. In fact they were added to the TSUBAME supercomputer at Tokyo Institute of Technology in Japan to boost sustained performance back in 2006 while only consuming a small amount of additional power. Clearspeed has a new processor out now, the CSX700, with 96 GFLOPS of performance that consumes only 25W of power. That is pretty darn amazing.

    http://www.clearspeed.com/docs/resources/ClearSpeed_e710_Marketing_Brochure_5-08.pdf

    What is CELL's performance/watt?
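
For a rough sense of the figures quoted above, here is a minimal sketch of the performance-per-watt arithmetic. It uses only the peak numbers for the CSX700 as quoted in this post (96 GFLOPS, 25 W); the function name is made up for illustration, and peak ratings say nothing about sustained throughput or the host system's power draw.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return GFLOPS per watt of rated power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


# Peak figures as quoted in the post for the CSX700 (vendor numbers, not measured).
csx700 = gflops_per_watt(96.0, 25.0)
print(f"CSX700 peak: {csx700:.2f} GFLOPS/W")
```

The same function can be applied to any other chip or board once a GFLOPS rating and a wattage are known for it; no Cell figures are given in the thread at this point, so none are assumed here.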
     
    #5 RudeCurve, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  6. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    Clearspeed's application utility is limited in the HPC space, in part due again to the PCIe limitations, and in part simply due to its architecture; something that has been discussed many a time on these forums, especially around the time of Cell's original introduction. At the same time, though, Clearspeed obviously only has utility in the HPC space... so its function is more as a very efficient situational boost computation-wise.

    It is a very good Flop/watt performer, but I think the fact that no system of note uses it at its core should tell you all you need to know. Along with the PCI-express hurdles of both GPGPU and Clearspeed solutions, it should be noted that on the memory front the PowerXCell 8i is in a much better position as well. The QS22 accommodates up to 32GB of RAM per blade, where the Clearspeed card is limited to 2GB per card.
     
  7. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    You are comparing a blade with a card?

    CATS 700 1U blade with 24GB of ECC RAM, 1.1TFLOPS double precision, 500Watts power draw.

    http://www.clearspeed.com/docs/resources/ClearSpeed_CATS_700_5-08.pdf

    Who's using Clearspeeds?

    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_BAE_Agreement_070904.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_Meraka_Institute_Feb5_07.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_2006_11_13.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_Tao.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_Warwick.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_Sun.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_HP_approved_071113.php
    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_CATS_071113.php

    CELL isn't being used everywhere; it's being used in HPC and PS3. It's only being used by Toshiba because they've invested so much money in it and want to get something tangible out of it, and the same goes for the CELLs supposedly going into SONY TVs. CELL is not all that convincing in the CE space. Toshiba and SONY promised to use them in their CE devices because...well, they developed it along with IBM. Do you see anybody else using CELLs in their CE devices? No, everybody else is using ASICs.
     
    #7 RudeCurve, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  8. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    Impressive FLOP count and energy efficiency !!! Admittedly, I have not tracked ClearSpeed closely.

    According to the latest Top 500 supercomputer list, the highest ranked ClearSpeed-equipped system placed at #24 (with 12344 nodes). Ranked above it (#23) is a "Dell" Xeon cluster with 9600 nodes on InfiniBand. With such superior specs, how did ClearSpeed lag behind a smaller Xeon cluster ? Perhaps the host system limited ClearSpeed's advantages in some ways ? Did PCIe cap the overall performance ?

    There was also no power rating numbers given. Why not flaunt it ?

    * ClearSpeed is a co-processor while Cell is a general purpose CPU. So the application area is somewhat different.

    * Is ClearSpeed based on a SIMD programming model ? (Their whitepaper seems to imply so.) If true, it will have the same specialization/limitation as GPGPU, but with a much better performance/watt rating. OTOH, GPGPU may come "free/integrated" one day, potentially squeezing Cell and ClearSpeed accelerators out. The current Cell prospect indeed lies within IBM, Toshiba and Sony until they evolve the business further. Where does ClearSpeed intend to go from here onwards ? Who do you think is likely to acquire ClearSpeed (the company) ?

    * What is the up-time for a typical ClearSpeed system ? Is it robust enough for mission critical applications ?

    * Are they expensive ?
     
    #8 patsu, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  9. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
  10. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    That's very cheap.

    I don't know how the Cell vendors are going to position their solutions, but it looks like ClearSpeed competes more with GPGPU and Cell co-processor boards only. A ClearSpeed node will require a host CPU (so the total power consumption is the sum of the host CPU's and the ClearSpeed coprocessor's draw). It will also require another interconnect to hook the host CPUs together, in addition to ClearSpeed's PCIe interface. These are all extra costs, space and power.

    If necessary, a Cell node may in fact run on its own without a host. Its MIMD + SIMD design also allows it to speed up a wider range of problems. However, the DP performance/watt is lower than ClearSpeed's.

    Would be interesting to see the 2009 Top 500 list. After dropping from #5 to #24, it may be time for ClearSpeed to gun for top 5 again.
     
  11. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    Oh my! Clearspeed are only using 90nm accelerator chips too!

    Well here is the reason.

    This upgrade consisted of only 360 Clearspeed boards or 720 Clearspeed processors. In other words they were able to achieve an additional 9TFLOPS from just 720 CS processors. So basically the Clearspeed processors accounted for only about 10%-20% of the total number of processors that make up TSUBAME. Also keep in mind this was back in 2006 when Dell's ABE didn't even exist on the top 10. With that said it's not really surprising that ABE barely beat out TSUBAME. If you were to build a supercomputer today from the ground up with the new Clearspeed processors, you'd be able to build a PFLOPS machine very easily and cheaply and consume only a fraction of the power and space of existing systems.

    Yes, but in the HPC sector the CELLs are being used as coprocessors; they still need host CPUs. Roadrunner uses dual-core Opterons...thousands of them. The Power core in the CELL as used in Roadrunner is used for other needed duties; it doesn't just sit idle.
     
    #11 RudeCurve, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  12. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    If the new processor performs as expected, I'd imagine ClearSpeed aims to take the #1 spot in 2009 (or 2010).

    In the RoadRunner configuration, there are about 2 Cells to 1 Opteron node. The latter feeds data and runs the network. For ClearSpeed, I'd imagine you need at least 1 to 1 to keep the coprocessor active.

    For smaller scale deployment, the users should be able to use a cluster of Cell servers "as is" though (e.g., I believe Georgia Tech, U. of Maryland have these).

    In any case, Cell is still a more general architecture with a wider range of applications, but your mileage will vary.
     
    #12 patsu, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  13. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    Hmm that gives roughly 25 gflops per board measured versus 96 gflops peak? Or were these numbers using some older board or something? If these are with the current boards, then I am not terribly impressed.

    If 25 gflops is with the current board, then that's roughly $120/gflop, not a terribly attractive cost even if the power requirement is only 1W/gflop. On the cost side, you could do a lot better by buying a generic Intel or AMD quad-core part. On the power side, you also have to include some power on the host side when using Clearspeed.
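
The back-of-the-envelope numbers in this post can be reproduced with a short sketch. All inputs are figures quoted in this thread (9 TFLOPS added by 360 boards carrying 720 processors, a 96 GFLOPS peak for the new board, roughly $120/gflop and 1W/gflop); the variable names are illustrative, and the board price and board wattage below are only what those estimates imply, not quoted specifications.

```python
# Figures quoted in this thread (not independently verified).
ADDED_SUSTAINED_TFLOPS = 9.0   # gain attributed to the ClearSpeed upgrade
BOARDS = 360
PROCESSORS = 720               # two processors per board in that upgrade
NEW_BOARD_PEAK_GFLOPS = 96.0   # peak rating quoted for the newer board
COST_PER_GFLOP_USD = 120.0     # estimate given in this post
WATTS_PER_GFLOP = 1.0          # estimate given in this post

added_gflops = ADDED_SUSTAINED_TFLOPS * 1000.0
gflops_per_board = added_gflops / BOARDS
gflops_per_processor = added_gflops / PROCESSORS

# Derived values: implied by the estimates above, not quoted prices or specs.
implied_board_price_usd = COST_PER_GFLOP_USD * gflops_per_board
implied_board_watts = WATTS_PER_GFLOP * gflops_per_board

# Only meaningful if the measured figure and the peak refer to the same board.
sustained_vs_new_peak = gflops_per_board / NEW_BOARD_PEAK_GFLOPS

print(f"sustained per board:     {gflops_per_board:.1f} GFLOPS")
print(f"sustained per processor: {gflops_per_processor:.1f} GFLOPS")
print(f"implied board price:     ${implied_board_price_usd:,.0f}")
print(f"implied board power:     {implied_board_watts:.0f} W")
print(f"sustained / new peak:    {sustained_vs_new_peak:.0%}")
```

Host CPU power and cost are deliberately left out, so the per-board figures are a lower bound on what a complete node would draw and cost.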
     
  14. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    Yes, those were the older boards based on the CSX600 processors, which get 25 GFLOPS per processor. The new boards have only 1 processor and are rated at 96 GFLOPS. These new boards themselves don't even have fans because of the low power consumption.
     
    #14 RudeCurve, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  15. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    That blade is just a bunch of cards; it doesn't address any of the problems I brought up with the solution (PCI-express, memory per chip), just provides them in a denser footprint.

    No, Cell isn't being used everywhere. But its impact on HPC - and architecting in general - has been more significant than that of Clearspeed, and the fact is that a system based on Cell is simply more versatile than one based on Clearspeed in terms of the workloads it can address.

    As for Toshiba and the SpursEngine, I believe that the chip offers some very tangible differentiation. Since it's at the heart of their post-HD DVD media strategy as well, it obviously has merit on the performance level vs competing available ASICs to boot.
     
  16. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    Who in the HPC sector said it's a problem? Show me one scientific problem that the Clearspeeds couldn't solve with a high degree of efficiency due to its *memory and PCIe problems*.

    By the way, having more memory per chip doesn't outweigh CELL's disadvantages, which are heat, power consumption, and footprint. A 1U Clearspeed blade provides over 1TFLOPS of compute power. A 1U CELL blade could only provide a fraction of that. If I could run my molecular simulations with 1TFLOPS of compute power and 24GB of memory with a high degree of efficiency, why would I want to run it on a CELL blade? What advantage does a CELL blade offer? More memory? Have you ever thought that maybe CELL needs more memory to be as efficient? Is the higher memory capacity per CELL chip a tangible feature or is it just a marketing bullet point?

    That's like claiming basketball player A is taller than basketball player B...by 1 inch...hardly convincing.

    Toshiba is using CELL to do upscaling and image enhancement/processing in their CE devices; that's not a tangible differentiation over an ASIC designed to do the same thing. Now if Toshiba releases a TV that you could use to surf the internet, that would be a tangible difference.
     
    #16 RudeCurve, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  17. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    Memory addressability directly relates to the nature and scope of the problems a chip is able to work on, as it provides a direct limit as to the amount of data in play. Your bringing up the 24GB of memory again vs the obviated 2GB per chip/card within the server itself makes me wonder whether you're grasping the point here. With the QS22, it's 16GB per chip; with the CATS 700, it's 2GB.
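
To make the per-chip comparison concrete, here is a minimal sketch of the arithmetic. The totals (32GB per QS22 blade, 24GB per CATS 700) and the per-chip figures (16GB, 2GB) come from this thread; the chip and card counts are not stated anywhere in the thread and are inferred here by dividing one by the other, so treat them as assumptions.

```python
def memory_per_chip_gb(total_gb: float, chips: int) -> float:
    """Return the memory directly available to each accelerator chip, in GB."""
    if chips <= 0:
        raise ValueError("chips must be positive")
    return total_gb / chips


# Chip/card counts are inferred from the thread's figures, not quoted specs:
# 32 GB per blade / 16 GB per chip -> 2 chips; 24 GB / 2 GB per card -> 12 cards.
QS22_CHIPS = 2
CATS700_CARDS = 12

qs22 = memory_per_chip_gb(32.0, QS22_CHIPS)
cats700 = memory_per_chip_gb(24.0, CATS700_CARDS)

print(f"QS22:     {qs22:.0f} GB per chip")
print(f"CATS 700: {cats700:.0f} GB per card")
print(f"ratio:    {qs22 / cats700:.0f}x")
```

The ratio only describes how much data a single chip can hold locally; whether that limit matters depends on how well a given problem partitions across chips.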

    Further as the system scales out across several nodes (or even intra-node), the PCI bus becomes a crucially limiting factor as the latency comes into play, and the latency is enormous - over a hundred times greater. HPC and supercomputing on a large scale are as much about the interconnects as they are about the chips themselves, and managed communication across the system as a whole.

    The results are tangibly different, that's for sure, or they would just be using an ASIC. The magic mirror demo, the MPEG-2 tiling, and the super-upscaling are all applications that I've not seen replicated elsewhere on a cheap ASIC, and I'll point out what should be the obvious point: if Toshiba could achieve the same result outside of the SpursEngine using cheaper hardware, that's what they'd be doing.

    I'll end by addressing this. If you're here to bash Cell, you're in the wrong place. This sub-forum is explicitly for the discussion of Cell in terms of its ecosystem and programming. This thread itself is 50/50 in terms of whether it should be here or not, but I opted to let it stay for comparative purposes. What seems to be happening though is that someone with a seemingly derisive view of the architecture is taking this as an opportunity to rail against it instead. Your admiration for Clearspeed is noted; indeed it's an admirable architecture. But you need to either a) change your tone within this sub-forum, or b) take the rant to a different sub-forum.
     
  18. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    I think ClearSpeed's strategy ties in well with the rise of the "generic node" (off-the-shelf) supercomputers. The exceptionally high FLOP count is a hallmark of SIMD processors, and complements the host processor's generality.

    The power rating is impressive too, but it is in fact diluted by the typically hungry host processor. Nonetheless, the averaged performance/watt should still be attractive (Otherwise, life will be tough).

    Its real world impact depends on various factors such as effectiveness (as with GPGPUs), robustness, adoption, etc. If it is as good as advertised, we should see some shuffling in the Top 500 list in 2009 and 2010.

    At the moment, I think their key problem is business execution: they have had 2 years since their first breakthrough product launched, but they have not made a major impact yet.



    Cell is a different animal because its vision is based on the message passing model. Being a self-contained, predictable, power and space efficient compute element, it can scale from CE devices to a server node amazingly well. It has also been used to speed up assorted problems, from tree traversals to number crunching. Unfortunately, this generalization also makes the basic unit less performant (or too expensive) compared to highly specialized devices for a given task.

    I think both ClearSpeed and Cell will need at least two to three more years before we can conclude their success in their intended use.

    I'd say for now Sony looks the most committed and well-positioned to take advantage of Cell (There are more Cell software/services to come ;-) ). Toshiba is just starting (since they have always wanted to remove the PPE and have finally done so). IBM is probably pushing mini-RoadRunners aggressively to other installations as we speak, and also re-grouping/re-evaluating their position.

    Because Cell is actively deployed and used on a large scale, it can bring interesting dynamics to the ecosystem. Today, the Folding@Home project is an interesting exercise in a public, fully distributed Cell network.


    I think Toshiba mentioned that Cell allows them to do Super-Upconversion. Indeed more features are coming soon. Surfing the net and video conferencing are just 2 examples highlighted. If you look at the bundled software on their SpursEngine laptop, you will also see potential for UI breakthroughs and other home media server uses.

    In Sony's case, they are also starting to add tru2way capability to HD TVs (i.e., built-in Java set-top boxes). This is similar to PS3's BD-Live foundation and its PlayTV accessory.

    I think a closed P2P (Cell) network for content distribution is interesting too.
     
    #18 patsu, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  19. Carl B

    Carl B Friends call me xbd
    Moderator Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    Clearspeed's core market is the same as the market that the GPGPU makers are initially playing in; the workstation-as-supercomputer market, for lack of a better term. When a single add-in card is all you need to really manage on a system level, what you find yourself with is an incredible price/performance value. Certainly the compute resources of a Clearspeed product or a Firestream or a Tesla achieved through 'traditional' means would be much more expensive and require a much larger footprint. What is essentially created is a thriving cottage industry where players in the HPC space who would previously have found equivalent systems cost-prohibitive are now able to do serious work on cheap setups.

    But as we go into the massive rack-server environments, the solutions that on the desktop provide an unequaled value begin to strain in areas that are important at scale. Utility is still there to be had, but it becomes constrained in relation to theoretical non-board bound implementations of those same architectures, and their competition.

    Beyond that, the fact is that a lot of additional factors play into the appeal of an architecture.

    Each of the four architectures being discussed here (Cell, Radeon, GeForce, Clearspeed) has a clear set of differentiating positive qualities, but some qualities can be weighted as more important than others, and some have a longer list of such qualities than others.

    To its benefit, the fact is that Cell has the weight of the preeminent supercomputing manufacturer behind it (and the support/service that brings), an evolving SDK that has made code porting increasingly simple, an out-of-the-box heterogeneous support structure, Bladecenter inclusion, and the fact that the PS3 provides a full cheap work environment... all are factors that are playing to the PowerXCell 8i's favor right now in the HPC space, and certainly the chip has gained a lot of momentum.

    For NVidia, the deal is that it's cheap, accessible, and CUDA has leveraged those facts to become a respectably established tool in a very short amount of time. CUDA essentially is the face of GPGPU at the moment, and the scope of individuals who have delved into it speaks to its promise as a field.

    AMD doesn't have the established programmability head-start that NVidia does, but the DP performance is stellar, wattage is under control, and everyone knows what the deal is essentially.

    Clearspeed has been brought up time and again around here, and the fact is that for DP performance they are the Flop/watt leader. But the constraints of the board model and the disadvantages in proliferation and programmability they face when compared to the above players put them at a comparative disadvantage in terms of "taking off." This new card and a supposedly much-improved SDK may be what they need to break out; we'll see what happens.

    But with the above players and the arrival of Larrabee in the near future, there is no question the Top500 list a couple of years from now is going to look wholly changed.

    I think both can already be qualified as successes, industry-dependent. Clearspeed has always occupied a certain niche which it certainly performs well in; whether GPGPU squeezes it or not, I think it will have to be considered as a 'success' at what it did. For Cell, frankly the gains in HPC I think are at or beyond what many cynically believed would be the case when the architecture launched. In the consumer-electronics industry though, it's not looking very prolific at all.

    I don't think that has anything to do with Cell though. :)
     
  20. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    I understand the concept just fine. The "more memory is better" rule also has limitations; ever heard of diminishing returns? On the other hand, you don't seem to be understanding my point, and you still haven't offered any proof to support your claims. If 2GB per processor is a problem, then show me some real-world HPC problems that perform poorly because of your invented memory capacity and PCIe bottleneck.

    Another basic concept that has no real-world evidence with respect to Clearspeed's design implementation. Show me where PCIe has dampened Clearspeed's real-world performance. I'd like to see some numbers.

    The magic mirror is a niche application. Why design an ASIC for such a niche market? It's not worth it. MPEG2 tiling...how is that tangible? Wouldn't a consumer need multiple input sources running at the same time to be able to feed this MPEG2 tiling engine? Why would a normal person have such a setup?

    Super Upscaling...let's see the results first before we claim it's something only CELL could provide. Again, for whatever reason you keep ignoring my point. Toshiba is not using ASICs for these things because they're niche applications; it's neither practical nor economical to design different ASICs for niche applications that have limited market value. It's not tangible.

    The only application that will see volume is Super Upscaling; again, let's see the results first to see how it compares to the high-end upscaling ASICs already out on the market. If it ends up being the same then it's just reinventing the wheel.

    I'm just responding to uninformed statements that were made about CELL's supposed advantages. The CELL boards are competing in the same market as the Clearspeed and GPGPU boards whether we want to accept it or not. If you think the CELL blades are not competing for the same HPC dollars as the Clearspeed blades then you're in for a rude awakening. Finally, if we don't have real numbers to verify supposed advantages/disadvantages, it's all FUD.
     
    #20 RudeCurve, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008