Toshiba unveils "SpursEngine" stream processor derived from Cell/B.E.

Discussion in 'CellPerformance@B3D' started by one, Sep 20, 2007.

  1. ebola

    Newcomer

    Joined:
    Dec 13, 2006
    Messages:
    99
    Likes Received:
    0
    This almost looks like an analogue of "the PS3 that never happened", with video replacing the 'visualizers' (and the PPE ripped out, obviously, for use in a PC).

    It does seem like the sort of thing Larrabee could be doing?

    I suppose in the marketplace it's a bit like a physics card too: an accelerator specialized for one task, but one that will face competition from CPU+GPU. That said, I think SPEs are better suited to video than a CPU or GPU, due to their tightly integrated int/float.
     
  2. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    Maybe it could, but probably not at 10-20 Watt.

    It will be really interesting to see what market they are aiming for and which "consumer electronics" they are specifically targeting with this. I don't believe hair salons are their only target market.

    Does it fit the requirements of a CPU for a TV?
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,492
    Likes Received:
    10,854
    Location:
    Under my bridge
    Why should it? Cell works on different design principles, and as a result can attain higher peak throughput for workloads that map well to it. I don't know what algorithms this system is using, but there will be cases where Cell can run stuff faster than a quad-core, multi-hundred-dollar CPU, and vice versa.
     
  4. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    Yeah, but this isn't Cell, it's a half-speed, half-width Cell. So 1/4 of a Cell performing a task that a modern quad core couldn't? Even just looking at raw GFLOPs, the SpursEngine falls quite a way short of a decent quad core.

    I'm not saying it's impossible, but I'm certainly hard pressed to believe it. I don't doubt that, written in a certain way, this software would run better on a Cell-type architecture; that's just a detail of how it's implemented. But are we saying that there is literally no way to pull off something similar on a quad core (or even a dual core, for that matter)? I.e. no one can launch a competing product with a similar function, because you absolutely need the SpursEngine to run it?

    That's what I have a hard time believing.
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,492
    Likes Received:
    10,854
    Location:
    Under my bridge
    Not exactly. A best-case implementation on Cell will attain more performance than a best-case implementation on a traditional processor. It is not the case that both compete on the same footing as long as the implementation is tailored to each.
    You'll be able to create something similar, but how similar is the question. Cut back the frame rate here, the polygon count there, and it's similar, but is it then a product you want? Plus, as pointed out already, SpursEngine does it in 10-20 watts. You could take a very cheap, low-spec PC and add SpursEngine for a cool, quiet kiosk unit, where you'd need a hot, expensive, beefy PC to compete on traditional hardware.
     
  6. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    Yeah, no doubt it's more energy efficient (assuming that you do need a beefy CPU to pull this off), but I'm still not sold on the raw performance aspect. The fact that they seem to be targeting this at laptops and CE devices, where energy and heat are a concern, rather than at desktops reinforces my skepticism.
     
  7. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    It could be for business reasons, like laptops being a higher-margin, higher-growth market for Toshiba to pursue. It is also a sweet spot for Cell in terms of packaging.

    Other companies may be able to copy the software but they may have a harder time trying to fit in a small form factor solution.

    Anyway... I am really happy for STI. It seems that they are sticking to their plans after all. I wonder where Sony is now, since they don't have to rev a custom chip. Assuming it's a breeze to use, more consumer applications like this will help launch Cell, Toshiba and Sony into the ranks of premium consumer brands.

    I wonder if Sony's "Dress" application (Everybody's Fashion Entertainment ?) will be similar.
     
    #27 patsu, Sep 24, 2007
    Last edited by a moderator: Sep 25, 2007
  8. one

    one Unruly Member
    Veteran

    Joined:
    Jul 26, 2004
    Messages:
    4,823
    Likes Received:
    153
    Location:
    Minato-ku, Tokyo
    They are not mutually exclusive. You should imagine what you can do with a quad-core PC with a SpursEngine PCI-e card installed :wink:
     
  9. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    From here we get:

    Let us make some rough deductions.

    3 Cell at 3.2 GHz = 140 Core Duo at 2 GHz
    + Assuming 1 Cell at 3.2 GHz = 4 SpursEngines at 1.5 GHz =>

    1 SpursEngine at 1.5 GHz = 11.7 Core Duo at 2.0 GHz for this particular image processing application.

    Assume 1 Quad core at 3.0 GHz = 3 Core Duo at 2.0 GHz

    And then we have 1 SpursEngine at 1.5 GHz = 3.9 Quad Core at 3.0 GHz for a comparable image processing application. :grin:
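
    The chain of deductions above can be checked with a few lines of Python. This is only a sketch of the arithmetic: every ratio is one of the assumptions stated in the post for this one image processing workload, not a measured benchmark, and the variable names are made up for illustration.

```python
# Equivalence chain from the post; all ratios are the poster's assumptions.
CORE_DUOS_PER_3_CELLS = 140   # "3 Cell at 3.2 GHz = 140 Core Duo at 2 GHz"
SPURSENGINES_PER_CELL = 4     # assumed: 1 Cell = 4 SpursEngines at 1.5 GHz
CORE_DUOS_PER_QUAD = 3        # assumed: 1 quad core at 3.0 GHz = 3 Core Duo

core_duos_per_cell = CORE_DUOS_PER_3_CELLS / 3
core_duos_per_spursengine = core_duos_per_cell / SPURSENGINES_PER_CELL
quads_per_spursengine = core_duos_per_spursengine / CORE_DUOS_PER_QUAD
quads_per_cell = core_duos_per_cell / CORE_DUOS_PER_QUAD

print(f"1 Cell        = {core_duos_per_cell:.1f} Core Duo")         # 46.7
print(f"1 SpursEngine = {core_duos_per_spursengine:.1f} Core Duo")  # 11.7
print(f"1 SpursEngine = {quads_per_spursengine:.1f} quad cores")    # 3.9
print(f"1 Cell        = {quads_per_cell:.1f} quad cores")           # 15.6
```

    The result is only as good as the two "assume" steps. Changing either ratio moves the final figure proportionally.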
     
    #29 Crossbar, Sep 25, 2007
    Last edited by a moderator: Sep 25, 2007
  10. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    Yeah, I guess that's some pretty strong evidence to suggest Cell's architecture really can give massive benefits.

    Still, these are very specific cases where specific algorithm types are just very inefficient on desktop CPUs. It's not like we are talking raw performance here, just efficiency of execution.

    I would like to know if there are other ways to approach the problem that aren't so inefficient on a regular CPU.

    After all, isn't that what we always say about Cell? Very slow when you use an x86 approach, but it can be much faster when you tailor your approach to the architecture's strengths.

    I'm not saying there aren't specific problems that just naturally sit better on Cell's architecture, but I think it's clear that the cases where a single Cell is as fast as 16 top-end quad-core CPUs, regardless of how you approach the problem, are extremely, extremely rare. After all, I think we can all agree that a single Cell doesn't have remotely close to 64x the raw power of a single Core 2 core.
     
  11. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    It depends on how you look at it (Glass is half full or half empty).

    The fact that the x86 approach runs slowly on Cell has no implications for how fast Cell can go given its new way of problem solving. And it is more than "just" efficiency.

    To me, the Cell architecture is a re-interpretation of existing problems. It is not a brute-force increase in clock speed, cache size, number of cores, etc. (Those are simply benefits -- not founding principles -- of the Cell concept). So it does not make sense to force the old way of doing things on Cell, except for backward compatibility.

    I think the overall performance is a function of CPU + memory access. So far we have a few "generic" areas where Cell was supposed to fall down, but instead it outran traditional CPUs when the problem was framed in the right way (e.g., breadth-first search).

    The truth is probably somewhere in between. People are starting to apply Cell to more real-life problems (or to reach previously unachievable performance levels at a fixed price), so we should know more in a couple of years.
     
    #31 patsu, Sep 25, 2007
    Last edited by a moderator: Sep 28, 2007
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,492
    Likes Received:
    10,854
    Location:
    Under my bridge
    Wrong.

    It's not just the algorithm, but the available processing resources. SPUs are SIMD processors, which deal with multiple computations at a time. From the article about the image recognition software:
    In order to use the SIMD parallel processing of Cell you need to re-architect your software. That's where the new algorithms come in. When you have mapped your algorithms well to the hardware, you have at your disposal a large number of logic units that can do lots of sums at the same time.

    x86 doesn't have this. You can't redesign your software to go from 15 clocks per function to 3 (unless it was terribly written in the first place!) because there aren't enough execution units to perform that many calculations simultaneously.

    Cell's performance is a combination of lots of processing units that work on parallel data (8 SPEs doing maths on 4 values per clock each means 32 calculations per clock cycle. You only get that with lots of cores) and a memory system that, properly managed, can supply these processing units with data so they aren't hanging around waiting.
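
    As a rough sketch of that per-clock arithmetic: the 3.2 GHz and 1.5 GHz clocks are the figures used elsewhere in this thread, the 4 SPEs for SpursEngine are an assumption based on the "half width" description earlier, and counting a multiply-add as two floating-point operations is a common convention rather than something stated here. These are peak numbers, not what any real program sustains.

```python
def peak_rates(spes, clock_ghz, lanes=4, flops_per_lane=2):
    """Peak SIMD throughput for a set of SPEs.

    lanes: values processed per SIMD instruction (4 x 32-bit, per the post).
    flops_per_lane: 2 assumes one multiply-add per lane per clock (assumption).
    Returns (calculations per clock, billions of lane-ops/s, peak GFLOPS).
    """
    calcs_per_clock = spes * lanes
    giga_lane_ops = calcs_per_clock * clock_ghz
    return calcs_per_clock, giga_lane_ops, giga_lane_ops * flops_per_lane

cell = peak_rates(spes=8, clock_ghz=3.2)
spurs = peak_rates(spes=4, clock_ghz=1.5)   # 4 SPEs is an assumption

print("Cell:        %d calcs/clock, %.1f G lane-ops/s, %.1f peak GFLOPS" % cell)
print("SpursEngine: %d calcs/clock, %.1f G lane-ops/s, %.1f peak GFLOPS" % spurs)
```

    Under these assumptions the 8-SPE case gives the 32 calculations per clock quoted above, and the half-speed, half-width part comes out at roughly a quarter of that peak rate.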

    Power is an ill-defined term. It depends what you're doing and how you're doing it. A flat statement 'CPU x is n times faster than CPU y' is only ever valid when the only difference is clock speed! The moment the architectures differ, and the system they're attached to (Cell has 25 GB/s available via XDR. PCs on DDR2 can't manage half that), things get very complicated.

    Suffice to say, Cell outperforming a big expensive CPU in some applications isn't surprising. Nor should it be assumed that the big expensive CPU can compete with Cell if you just use a different algorithm that fits it well (which all algorithms do, as it were. It's Cell code that needs re-engineering, because code is traditionally designed with x86 in mind). It's no different to seeing a GPU render graphics really quickly and then saying a quad-core x86 should be able to match it if you use a different algorithm. A GPU has shed-loads more calculation units than a CPU, and you have to write code that uses those units. When you do, unlocking its performance, the GPU has no equal. Cell sits somewhere between the CPU and GPU, providing more execution units than a standard CPU, being more flexible than a GPU, and needing a different approach to code creation than either of them.

    The fact Cell is so 'magically' quick is why there's some excitement over it! If a conventional quad-core CPU could perform as well without needing whole new ways of developing software and algorithms, we wouldn't need to waste our time reinventing wheels on Cell, would we? ;)
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    But that's exactly what I'm saying. Cell doesn't have the processing resources of 16 quad-core CPUs, as these figures would suggest. It doesn't even have remotely close to that. At maximum it peaks at <2.5x the processing resources in terms of GFLOPs, and probably less in most other areas. That's why I'm saying the vast majority of that performance isn't down to raw power, but rather to the efficiency of the algorithm on Cell's architecture.

    Even the reference you gave only quotes Cell being able to perform key operations 3 times faster than the CPU, so where is the rest of that performance coming from? Memory? But like you said, Cell is little more than twice as fast there, not 140x faster. It's got to be down to massive inefficiencies in how those types of algorithms run on regular CPUs vs a Cell-like CPU, rather than raw power. In the best of circumstances raw power might be able to account for 2 or even 3 times quad-core performance, but not what's being talked about here (>50x).
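
    One way to frame this argument is to split the claimed speedup into the part that peak arithmetic could explain and a residual that has to come from somewhere else. The sketch below uses only numbers from this thread (the 140 Core Duo figure, the assumed 3 Core Duo per quad core, and the <2.5x GFLOPs bound). It says nothing about where the residual comes from, and the variable names are illustrative.

```python
# Claimed speedup, per the rough deductions earlier in the thread.
core_duos_per_cell = 140 / 3               # 3 Cells = 140 Core Duo
claimed_vs_quad = core_duos_per_cell / 3   # assumed: 1 quad core = 3 Core Duo

# Upper bound on the raw GFLOPs advantage, as stated in the post above.
raw_gflops_ratio = 2.5

# Whatever raw arithmetic cannot explain: memory system, algorithm fit, etc.
residual = claimed_vs_quad / raw_gflops_ratio

print(f"claimed speedup vs quad core: {claimed_vs_quad:.1f}x")  # 15.6x
print(f"explained by raw GFLOPs:      {raw_gflops_ratio:.1f}x")
print(f"residual factor:              {residual:.1f}x")         # 6.2x
```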
     
  14. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    ...which is why I suggested we may have to look at it from a system architecture point of view (at least CPU + memory access). Comparing the CPU alone doesn't give the full picture.

    On a good day, the speed gain is likely due to a combination of parallel raw math power, up to an order of magnitude faster data + instruction access, a high rate of sustained data stream via async DMA and single purpose application (The SPUs don't need to run any OS layer or general housekeeping code).

    The algorithm is efficient in the sense that it allows Cell to maximize its strength.

    On super linear speed up, wikipedia has this to say...

    So efficient and fast memory access can affect the outcome drastically (because the time saved is multiplied by the number of accesses, not the number of cores).
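
    A toy model makes the "multiplied by the number of accesses" point concrete. Everything here is hypothetical: the function, the access count and the latencies are made-up illustrative values, and the model ignores overlap between computation and memory traffic, which async DMA is specifically meant to provide.

```python
def run_time_ns(n_accesses, latency_ns, compute_ns):
    """Toy serial model: compute time plus full latency paid on every access."""
    return compute_ns + n_accesses * latency_ns

# Hypothetical workload: same compute cost, different per-access latency.
N_ACCESSES = 1_000_000
COMPUTE_NS = 10_000_000

slow = run_time_ns(N_ACCESSES, latency_ns=100, compute_ns=COMPUTE_NS)
fast = run_time_ns(N_ACCESSES, latency_ns=10, compute_ns=COMPUTE_NS)

saved = slow - fast
print(f"time saved: {saved} ns = {N_ACCESSES} accesses x 90 ns each")
print(f"overall speedup: {slow / fast:.1f}x")
```

    In this model the saving scales with the access count alone, so a memory-bound workload can speed up by far more than the ratio of core counts or clock speeds would suggest.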
     
    #34 patsu, Sep 25, 2007
    Last edited by a moderator: Sep 25, 2007
  15. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,492
    Likes Received:
    10,854
    Location:
    Under my bridge
    The efficiency of the architecture. Not the algorithms. Algorithms are like drivers where CPUs are like cars. If you take a car with a 140 mph top speed and give it to a good driver, he might make a lap in 3 minutes. A 240 mph car with a good driver could do it in, say, one and a half minutes. Stick a bad driver in the fast car and they may well take longer than the slower car. The algorithms on Cell are just about putting a good driver in the driver's seat, instead of a conventional driver. Say conventional CPUs are motorbikes and Cell is an F1 car. Stick a motorbike rider in an F1 car and they won't be so hot. Put an F1 driver on a motorbike and they likely won't be as fast as the motorcyclist, even though they're used to going much faster. Put an x86 algorithm on Cell and it won't be so hot. But that's not a limit of the hardware. The hardware architecture, the design of the CPU, is a go-faster design that needs go-faster code to use it.

    5x faster.

    You're really not understanding. You've got way too simple a view of these machines. ;) The RAM is 2x-3x faster. The internal bandwidth working on LS is way, way faster, hundreds of GB a second. If you have the data in LS, you have amazingly fast BW. Then you're comparing LS performance to cache, and all sorts of complexities. It's not just a case of looking at a few separate numbers and deriving a performance comparison from how much bigger the numbers are. A quad-core QX6700 has twice as many transistors as Cell, so it must be twice as powerful! :D

    No, it's an aggregate performance increase from multiple improvements across the board in terms of chip and system design. Cell adds faster memory access, faster working memory, more execution units and more SIMD processing, which together improve things by more than the sum of the parts. This design is a clever one. Cell was developed without any need for backwards compatibility or legacy code support, so it could be designed to approach problem solving from a pure performance perspective. This same design philosophy is being pursued by Intel too, who are aiming for massive performance increases by going a more Cell-like route. Intel themselves know lots of x86 cores aren't going to be fast compared to true performance processors. x86 is locked to an old way of thinking and doing things. 500 million transistors of quad-core x86 is going to be a performance black hole compared to 500 million transistors of streamlined, re-targeted processor.

    At the end of the day, though, there are lots and lots of benchmarks out there. Sure, some may well be quite spun, such as those from IBM, but there are independent developments too. Unless you want to think that everyone who compares Cell with x86 and finds Cell faster is just not trying on x86, the numbers speak for themselves. They are a testament to the smart design that STI invested lots of money into, tackling the major issues that are bottlenecking conventional processors and being first to market with this new trend in processors.
     
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    Isn't Larrabee x86?
     
  17. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
  18. dantruon

    Regular

    Joined:
    Apr 5, 2004
    Messages:
    487
    Likes Received:
    2
    From reading this thread and other threads about Cell on this forum, the sentiment is that Cell is very efficient, flexible and powerful. The question now is when we will see it implemented in a notebook or a desktop as a replacement for the x86 CPU, or whether that will never happen?
     
  19. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    Cell is unlikely to replace workhorse PCs. The best it can do is carve out a niche for itself (e.g., in the entertainment space). There are also no third-party Cell applications today... until perhaps in the future, when Sony has built a sizable base of PS3 owners.
     
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,492
    Likes Received:
    10,854
    Location:
    Under my bridge
    The first step to Cell computers would be content-creation workstations that run Cell apps like Maya. If they appear, you may, if you're lucky, get other systems coming out alongside Cell Linux development on PS3. But given how long this is likely to take, and that Intel will have their competing chips out, I don't expect Cell to break into the mass consumer CPU space.
     