Toshiba unveils "SpursEngine" stream processor derived from Cell/B.E.
http://www.toshiba.co.jp/about/press/2007_09/pr2001.htm
Toshiba to demonstrate prototype of new "SpursEngine™" processor at CEATEC JAPAN 2007
20 September, 2007
Derived from Cell/B.E.™ technology, will be applied to video processing
TOKYO--Toshiba Corporation today announced development of the "SpursEngine™", a high-performance stream processor integrating Synergistic Processing Element (SPE) cores derived from the Cell Broadband Engine™ (Cell/B.E.™). The SpursEngine is expressly designed to bring the powerful capabilities of the Cell/B.E. technology to consumer electronics, and to take video processing in digital consumer products to new levels of realism and image quality.
The prototype of SpursEngine will be unveiled at CEATEC JAPAN 2007, at Makuhari Messe, Japan, from October 2nd. Notebook PCs integrating SpursEngine will be used in the world's first public demonstration of the processor's capabilities in 3D image processing and manipulation: real-time transformations of hair styles and makeup that instantaneously recognize and process changes in position, angle, and facial expression, and render them as computer graphics. Toshiba also plans to demonstrate concept notebook PCs integrating the SpursEngine.
SpursEngine, a co-processor that works in cooperation with a host CPU, fuses Cell/B.E.'s high performance multi-core technology with Toshiba's advanced image processing technology to perform stream processing of video sources--image recognition and processing--at the increasingly sophisticated level required by new generations of digital consumer products.
The new co-processor integrates four of Cell/B.E.'s high performance RISC core SPEs, half the number of the full configuration, plus hardware dedicated to decoding and encoding MPEG-2 and H.264 video. By combining the high level, real time processing software of the SPEs with the hardware video codecs, the SpursEngine realizes an optimized balance of processing flexibility and low power consumption. The prototype of SpursEngine operates at a clock frequency of 1.5GHz and consumes 10 to 20 watts of power.
SpursEngine also adopts XDR™ DRAM as working memory, supporting the high data transfer rates needed for large volumes of media data.
Toshiba will bring SpursEngine to market after CEATEC, for application in various digital consumer products, and for use by customers and Toshiba itself, as soon as it completes specifications for commercial production.
About Cell Broadband Engine
The revolutionary Cell/B.E., jointly developed by IBM, Sony Group and Toshiba, is a breakthrough design featuring a central processing core based on IBM's Power Architecture technology and eight synergistic processing elements (SPE). Cell/B.E. brings an unseen level of broadband processing power to digital products.
About SPE
The Synergistic Processor Element is a processor core with high performance floating point computation capability and an original instruction set architecture, optimized for processing multiple media applications.
http://www.toshiba.co.jp/about/press/2007_09/imgdat/img2002.gif
http://www.toshiba.co.jp/about/press/2007_09/imgdat/img2003.jpg
Other pics
http://www.watch.impress.co.jp/av/docs/20070920/toshiba.htm
Vitaly Vidmirov
20-Sep-2007, 12:12
Very cool!
But as far as I understand, it can only work as a PCI-e accelerator?
Seems so, it's a co-processor. Future CE products from Toshiba may incorporate it without PCI-e.
This is hilarious. Dedicated decode/encode chips instead of using Cell to do the whole thing :?:
Jawed
Specialised cores will always be more efficient. But you are right: Cell was intended to be flexible yet efficient enough to replace specialised solutions such as the "SpursEngine" (i.e. good enough that the gains of a specialised solution wouldn't justify its development costs).
So it's quite ironic to see Toshiba themselves come up with it.
I think part of it is that Toshiba itself has always been rather opposed to the inclusion of the Power core as it applied to their own purposes; beyond that, no need for a full Cell chip here since it's going to be an x86 laptop with a 'primary' CPU anyway. So, while you're cutting it down to begin with, might as well only keep as many full SPEs as you need and include lesser silicon for the functions which lesser silicon can perform.
I'm personally pretty excited about the development - it'll help push the SPE API forward into the CE space, where it should at least perform well comparatively, and it shows a spread if not of the chip then at least of the architecture. I'd love to see Toshiba sell this chip into the add-in board market, where vendors putting forth a little effort could make it both an A/V accelerator for video/imaging and a physics accelerator for gaming. Might be positive for workstation use as well. Toshiba recently demonstrated an SPE fabbed on their 65nm CMOS process; I imagine this chip is fabbed on it as well, so it should be pretty cheap to produce.
Maybe people should remember what the Emotion Engine, also designed by Toshiba, was like ;)
MIPS core + DMAC + VU0 + VU1 + IPU (MPEG2 decoder)
AlStrong
20-Sep-2007, 17:07
Rough die size. The angle will throw off the estimation a bit, but.... good enough I say!
http://i192.photobucket.com/albums/z211/Alstrong/SpursEngineDieSize.jpg
Shifty Geezer
20-Sep-2007, 18:45
Interesting. Toshiba ditched the PPE that they never wanted in the first place! Not sure how this fits into their CE targets. Presumably they'll have an existing processor they're used to and add the SPEs alongside. On the plus side, it shows some flexibility in the Cell design, but on the downside it's not so much advancing the family as creating a schism. Is it Cell if it's just SPUs connected to an arbitrary processor? If AMD were to integrate SPEs onto their next chip as coprocessors, would that make it a Cell? The lack of full IPA compatibility concerns me. Will this SpursEngine be able to contribute to a networked processing system as envisaged with Cell, with Cell devices working to augment each other? Or have they just created a standalone part for CE goods where Cell was too pricey? In that respect it is a smart move, as Carl says. It uses SPE code and promotes SPE development. Anything achieving that now has to be good for the long-term health of the platform; it's better than not using anything Cell at all.
Yes, it is good news (and very important!) as long as they build up the software library and expertise for SPU development, and can compete with cheaper solutions on price/performance. If deployed for real, the move will keep Cell viable and visible on the marketing front. I wonder if it makes sense to have two SPE versions.
It is not so good news on the cost saving side (for people who thought they could repurpose the "broken" Cell chips for CE applications).
Shifty Geezer
20-Sep-2007, 19:55
That might not be a cost saving choice. I mean, if all those broken Cells are lying around, it will be cheaper to use them than to make some new chips and chuck those Cells in the bin!
Could it be more a choice of power consumption and heat? SpursEngine is set to burn 10-20 watts. That sounds a lot less than you'd expect from Cell, although they're clocking much lower. What do we think a CBE with 4 working SPEs at 1.5GHz will consume?
A better comparison is with 5 working SPEs, the 5th one for MPEG2/H.264 processing. Or 6 SPEs, depending on the performance of the full-HD video unit in SpursEngine.
Shifty Geezer
20-Sep-2007, 21:03
True. My guess is a 1:6 CBE would be running at least double the wattage.
That might not be a cost saving choice. I mean, if all those broken Cells are lying around, it will be cheaper to use them than to make some new chips and chuck those Cells in the bin!
Yep! That's what I meant. Now they've gone and made another custom Cell chip. The "broken" Cells probably remain unused because of power consumption and cost (for CE deployment).
Meanwhile, as long as they keep doing more with just 4 SPEs, a 7-SPE system like the PS3 gains more headroom as a result (e.g., it can run other work in parallel while playing Blu-ray).
Maybe people should remember what Emotion Engine also designed by Toshiba was like ;)
MIPS core + DMAC + VU0 + VU1 + IPU (MPEG2 decoder)
AFAIK the IPU doesn't decode MPEG2 by itself.
Yeah, let me correct it to "MPEG2 decoding accelerator" that does things like iDCT conversion, macro-block decoding, texture decompression etc.
3dilettante
21-Sep-2007, 15:55
Yep! That's what I meant. Now they've gone and made another custom Cell chip. The "broken" Cells probably remain unused because of power consumption and cost (for CE deployment).
That and the lack of PCI-E would make it problematic to use as an accelerator card. The market's pretty small when it comes to platforms that use FlexIO, so reusing a bad Cell would require some kind of specialty bridge chip (which also costs money).
Imagine the power and ground pins saved by shaving off most of Cell. I don't know if it's even possible to just not connect power and ground pins, even if half the chip is turned off.
Without per-core power planes, the chip might not function.
Is it even possible for Cell as it is used in the PS3 to initialize properly without the PPE active?
I wonder if the hardware makes some assumptions about functionality that would not be preserved if forced into this more specialized role.
Crossbar
24-Sep-2007, 07:55
http://www.watch.impress.co.jp/av/docs/20070920/tos02.jpg
http://www.watch.impress.co.jp/av/docs/20070920/tos03.jpg
It looks like Toshiba are serious about going commercial with their Magic Mirror.
If they can do it with just 4 SPUs at 1.5 GHz, the 6 available SPUs at 3.2 GHz in the PS3 Cell should run circles around that application. I am really looking forward to that demonstration.
pjbliverpool
24-Sep-2007, 11:27
It's a kick-ass application, there is no doubt about that. I can see women the world over (my girlfriend being one of them) going crazy for this thing - I mean seriously crazy!! Hell, I think my gf would love the PC more than me if I had this on it!
But I do wonder, does it really need the SpursEngine to work? Surely a 4 SPE 1.5GHz processor can't do anything a powerful desktop CPU isn't already capable of.
Are we saying a quad core couldn't run this? Because I would sure as hell rather invest in one of those than a separate add-in board.
This almost looks like an analogue of "the PS3 that never happened", with video replacing the 'visualizers' (and the PPE ripped out, obviously, for use in a PC).
Does seem like the sort of thing Larrabee could be doing?
I suppose in the marketplace it's a bit like a physics card too: an accelerator specialized to one task, but one that will face competition from CPU+GPU... although I perceive SPEs as more suited to video than CPUs or GPUs, due to their highly integrated int/float.
Crossbar
24-Sep-2007, 12:07
Are we saying a quad core couldn't run this? Because I would sure as hell rather invest in one of those than a separate add-in board.
Maybe it could, but probably not at 10-20 watts.
It will be really interesting to see what market they are aiming for and which "consumer electronics" they are specifically targeting with this item; I don't believe hair salons are their only target market.
Does it fit the requirements of a CPU for a TV?
Shifty Geezer
24-Sep-2007, 15:06
Are we saying a quad core couldn't run this...
Why should it? Cell works on different design principles, and as a result can attain higher peak throughput for workloads that map well to it. I don't know what algorithms this system is using, but there will be cases where Cell can run stuff faster than a multi-hundred-dollar quad-core CPU, and vice versa.
pjbliverpool
24-Sep-2007, 17:35
Yeah, but this isn't Cell, it's a half-speed, half-width Cell. So a quarter of a Cell performing a task that a modern quad core couldn't? Even just looking at raw GFLOPs, the SpursEngine falls quite a way short of a decent quad core.
I'm not saying it's impossible but I'm certainly hard pressed to believe it. I don't doubt that written in a certain way this software would run better on a Cell-type architecture, that's just a detail of how it's implemented. But are we saying that literally there is no way to pull something similar to this off on a quad core (or even dual core for that matter)? I.e. no-one can launch a competing product with a similar function as you absolutely need the SpursEngine to run it?
That's what I have a hard time believing.
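The raw-GFLOPs point can be checked with a back-of-the-envelope peak calculation. This is a minimal sketch under stated assumptions, not vendor figures: it assumes each SPE retires one 4-wide single-precision fused multiply-add per clock (counted as 8 FLOPs), that a Core 2 core retires one 4-wide SSE add plus one 4-wide SSE multiply per clock (also 8 FLOPs), and the 3.0 GHz quad core is a hypothetical example part. These are theoretical peaks, not sustained throughput.

```python
def peak_gflops(cores, flops_per_clock, clock_ghz):
    """Theoretical single-precision peak: cores x FLOPs per clock x clock in GHz."""
    return cores * flops_per_clock * clock_ghz


# Assumption: 8 single-precision FLOPs per clock per core for both designs.
spurs_engine = peak_gflops(cores=4, flops_per_clock=8, clock_ghz=1.5)  # 4 SPEs at 1.5 GHz
core2_quad = peak_gflops(cores=4, flops_per_clock=8, clock_ghz=3.0)    # hypothetical 3.0 GHz quad core

print(f"SpursEngine: {spurs_engine:.0f} GFLOPS")     # 48
print(f"Core 2 quad: {core2_quad:.0f} GFLOPS")       # 96
print(f"ratio: {core2_quad / spurs_engine:.1f}x")    # 2.0x
```

On these assumptions the quad core has roughly twice the SpursEngine's peak, which is the gap being described; the rest of the thread is about how much of each peak a real workload actually sustains.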
Shifty Geezer
24-Sep-2007, 18:56
I'm not saying it's impossible but I'm certainly hard pressed to believe it. I don't doubt that written in a certain way this software would run better on a Cell-type architecture, that's just a detail of how it's implemented.
Not exactly. A best-case implementation on Cell will attain more performance than a best-case implementation on a traditional processor; it's not that both compete on the same footing as long as the implementation is tailored to each.
But are we saying that literally there is no way to pull something similar to this off on a quad core (or even dual core for that matter)? I.e. no-one can launch a competing product with a similar function as you absolutely need the SpursEngine to run it?
You'll be able to create something similar, but how similar is the question. Cut back frame-rate here, polygon count there, and it's similar, but is it then a product you want? Plus, as pointed out already, SpursEngine does it in 10-20 watts. You could take a very cheap, low-spec PC and add a SpursEngine for a cool, quiet kiosk unit, where you'd need a hot, expensive, beefy PC to compete on traditional hardware.
pjbliverpool
24-Sep-2007, 20:05
Yeah, no doubt it's more energy efficient (assuming that you do need a beefy CPU to pull this off) but I'm still not sold on the raw performance aspect. The fact that they seem to be targeting this at laptops and CE devices, where energy/heat is a concern, and not desktops reinforces my skepticism.
It could be for business reasons, like laptops being a higher-margin, higher-growth market for Toshiba to pursue. It is also a sweet spot for Cell in terms of packaging.
Other companies may be able to copy the software but they may have a harder time trying to fit in a small form factor solution.
Anyway... I am really happy for STI. It seems that they are sticking to their plans after all. I wonder where Sony is now, since they don't have to rev a custom chip. Assuming it's a breeze to use, more consumer applications like this will help establish Cell, Toshiba and Sony as premium consumer brands.
I wonder if Sony's "Dress" application (Everybody's Fashion Entertainment ?) will be similar.
Yeah, no doubt it's more energy efficient (assuming that you do need a beefy CPU to pull this off) but I'm still not sold on the raw performance aspect. The fact that they seem to be targeting this at laptops and CE devices, where energy/heat is a concern, and not desktops reinforces my skepticism.
They are not mutually exclusive. Imagine what you could do with a quad-core PC with a SpursEngine PCI-e card installed :wink:
Crossbar
25-Sep-2007, 11:29
From here (http://forum.beyond3d.com/showthread.php?p=1069508#post1069508) we get:
The team spent about eight months on the project first implementing the algorithm on a 2 GHz Intel Core 2 Duo processor. Using the PC, the team showed machine vision that could recognize in three minutes a bar stool in an image of an office setting.
Using a network of three PlayStation 3 consoles linked to a PC, the team was able to speed the recognition rate up to just one second.
...
Overall the three consoles handled the work at rates up to 140 times the speed of the single PC processor, Felch said.
Let us make some rough deductions.
3 Cells at 3.2 GHz = 140 Core 2 Duos at 2 GHz
Assuming 1 Cell at 3.2 GHz = 4 SpursEngines at 1.5 GHz =>
1 SpursEngine at 1.5 GHz = 11.7 Core 2 Duos at 2.0 GHz for this particular image processing application.
Assuming 1 quad core at 3.0 GHz = 3 Core 2 Duos at 2.0 GHz
And then we have 1 SpursEngine at 1.5 GHz = 3.9 quad cores at 3.0 GHz for a comparable image processing application. :grin:
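The chain of rough deductions above can be written out so each step is visible. This sketch only restates the post's figures: the 140x speedup comes from the quoted article, while "1 Cell ~ 4 SpursEngines" and "1 quad core ~ 3 Core 2 Duos" are the post's own assumptions, so the results inherit all of them and apply only to this one workload.

```python
# Figures from the quoted article and the post's stated assumptions.
cells = 3                   # three PS3 consoles
speedup_vs_core2duo = 140   # "up to 140 times the speed of the single PC processor"
spurs_per_cell = 4          # assumption: 1 Cell at 3.2 GHz ~ 4 SpursEngines at 1.5 GHz
core2duo_per_quad = 3       # assumption: 1 quad core at 3.0 GHz ~ 3 Core 2 Duos at 2.0 GHz

core2duo_per_cell = speedup_vs_core2duo / cells          # ~46.7
core2duo_per_spurs = core2duo_per_cell / spurs_per_cell  # ~11.7
quads_per_spurs = core2duo_per_spurs / core2duo_per_quad # ~3.9

print(f"1 SpursEngine ~ {core2duo_per_spurs:.1f} Core 2 Duos")  # 11.7
print(f"1 SpursEngine ~ {quads_per_spurs:.1f} quad cores")      # 3.9
```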
pjbliverpool
25-Sep-2007, 19:42
Yeah, I guess that's some pretty strong evidence to suggest Cell's architecture really can give massive benefits.
Still, they are very specific cases where specific algorithm types are just very inefficient on desktop CPUs. It's not like we are talking raw performance here, just efficiency of execution.
I would like to know if there are other ways to approach the problem that aren't so inefficient on a regular CPU.
After all, isn't that what we always say about Cell? Very slow when you use an x86 approach but can be much faster when you tailor your approach to the architecture's strengths.
I'm not saying there aren't specific problems that naturally sit better on Cell's architecture, but I think it's clear that the cases where a single Cell is as fast as 16 top-end quad-core CPUs regardless of how you approach the problem are extremely, extremely rare. After all, I think we can all agree that a single Cell doesn't have remotely close to 64x the raw power of a single Core2 core.
After all, isn't that what we always say about Cell? Very slow when you use an x86 approach but can be much faster when you tailor your approach to the architecture's strengths.
It depends on how you look at it (glass half full or half empty).
The fact that the x86 approach runs slowly on Cell has no implications for how fast Cell can go given its new way of problem solving. And it is more than "just" efficiency.
To me, the Cell architecture is a re-interpretation of existing problems. It is not a brute-force increase in clock speed, cache size, number of cores, etc. (Those are simply benefits, not founding principles, of the Cell concept.) So it does not make sense to force the old way of doing things on Cell, except for backward compatibility.
I'm not saying there aren't specific problems that naturally sit better on Cell's architecture, but I think it's clear that the cases where a single Cell is as fast as 16 top-end quad-core CPUs regardless of how you approach the problem are extremely, extremely rare. After all, I think we can all agree that a single Cell doesn't have remotely close to 64x the raw power of a single Core2 core.
I think the overall performance is a function of CPU + memory access. So far we have a few "generic" areas where Cell was supposed to fall short, but instead it outran traditional CPUs when framed in the right context (e.g., breadth-first search).
The truth is probably somewhere in between. People are starting to apply Cell to more real-life problems (or to performance levels previously unachievable at a given price), so we should know more in a couple of years.
Shifty Geezer
25-Sep-2007, 21:45
Still, they are very specific cases where specific algorithm types are just very inefficient on desktop CPUs. It's not like we are talking raw performance here, just efficiency of execution.
Wrong.
I would like to know if there are other ways to approach the problem that aren't so inefficient on a regular CPU. After all, isn't that what we always say about Cell? Very slow when you use an x86 approach but can be much faster when you tailor your approach to the architecture's strengths.
It's not just the algorithm, but the available processing resources. SPUs are SIMD processors, which deal with multiple computations at a time. From the article about the image recognition software:
Thanks to its on-board accelerators, the Cell processor in the consoles was able to handle key computations in three cycles that the Intel chip had to compute sequentially in 15 cycles.
In order to use the SIMD parallel processing of Cell you need to re-architect your software. That's where the new algorithms come in. When you have mapped your algorithms well to the hardware, you have at your disposal a large number of logic units that can do lots of sums at the same time.
x86 doesn't have this. You can't redesign your software to go from 15 clocks per function to 3 (unless it was terribly written in the first place!) because there aren't enough execution units to calculate that number of simultaneous calculations.
Cell's performance is a combination of lots of processing units that work on parallel data (8 SPEs doing maths on 4 values per clock each means 32 calculations per clock cycle. You only get that with lots of cores) and a memory system that, properly managed, can supply these processing units with data so they aren't hanging around waiting.
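The "32 calculations per clock cycle" figure counts SIMD lanes. A minimal sketch of how that turns into a peak number, assuming 128-bit SPU registers holding four 32-bit floats and counting a fused multiply-add as 2 FLOPs (a common peak-counting convention, not something stated in the post); the 6-SPE case is the PS3 figure quoted earlier in the thread.

```python
SIMD_WIDTH = 4  # assumption: 128-bit registers holding four 32-bit floats


def lanes_per_clock(spes):
    """Scalar results per clock if every SPE issues one 4-wide SIMD op per cycle."""
    return spes * SIMD_WIDTH


def spe_peak_gflops(spes, clock_ghz, fma=True):
    """Peak single-precision GFLOPS; a fused multiply-add is counted as 2 FLOPs."""
    flops_per_lane = 2 if fma else 1
    return lanes_per_clock(spes) * flops_per_lane * clock_ghz


print(lanes_per_clock(8))                     # 32, the figure in the post
print(f"{spe_peak_gflops(8, 3.2):.1f} GFLOPS")  # full Cell/B.E., 8 SPEs at 3.2 GHz
print(f"{spe_peak_gflops(6, 3.2):.1f} GFLOPS")  # the 6 SPEs cited for PS3
```

With these assumptions the eight SPEs peak at 204.8 GFLOPS, and the six PS3 SPEs at 153.6 GFLOPS; sustaining anything near that depends on the memory side described next.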
After all, I think we can all agree that a single Cell doesn't have remotely close to 64x the raw power of a single Core2 core.
Power is an ill-defined term. It depends what you're doing and how you're doing it. A flat statement 'CPU x is n times faster than CPU y' is only ever valid when the only difference is clock speed! The moment the architectures differ, along with the systems they're attached to (Cell has 25 GB/s available via XDR; PCs on DDR2 can't manage half that), things get very complicated.
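The memory bandwidth comparison is simple arithmetic: transfer rate times bytes per transfer times channels. This is a sketch under stated assumptions: Cell's XDR interface is taken as 64 bits wide at 3.2 GT/s, and the PC is assumed to be a typical 2007 desktop with dual-channel DDR2-800 (64-bit channels). These are theoretical peaks; sustained rates are lower on both.

```python
def bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Peak bandwidth in GB/s (10^9 bytes): transfers/s x bytes per transfer x channels."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: Cell's XDR interface, 64 bits wide at 3200 MT/s.
xdr = bandwidth_gb_s(3200, 64)
# Assumption: dual-channel DDR2-800, 64 bits per channel.
ddr2 = bandwidth_gb_s(800, 64, channels=2)

print(f"XDR:  {xdr:.1f} GB/s")    # 25.6
print(f"DDR2: {ddr2:.1f} GB/s")   # 12.8
print(f"ratio: {xdr / ddr2:.1f}x")  # 2.0x
```

This only covers main memory; the Local Store bandwidth discussed further down the thread is a separate, much larger figure.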
Suffice to say, Cell outperforming a big expensive CPU in some applications isn't surprising. Nor should we assume the big expensive CPU could compete with Cell if you just used a different algorithm that fits it well (which all algorithms do, as it were: it's Cell code that needs re-engineering, because code is traditionally designed with x86 in mind). It's no different to seeing a GPU render graphics really quickly and then saying a quad-core x86 should be able to match it if you use a different algorithm. A GPU has shed-loads more calculation units than a CPU, and you have to write code that uses those units. When you do, unlocking its performance, the GPU has no equal. Cell sits somewhere between the CPU and GPU, providing more execution units than a standard CPU, being more flexible than a GPU, and needing a different approach to code creation than either of them.
The fact Cell is so 'magically' quick is why there's some excitement over it! If a conventional quad-core CPU could perform as well without needing whole new ways of developing software and algorithms, we wouldn't need to waste our time reinventing wheels on Cell, would we? ;)
pjbliverpool
25-Sep-2007, 23:11
Wrong.
It's not just the algorithm, but the available processing resources. SPUs are SIMD processors, which deal with multiple computations at a time. From the article about the image recognition software
But that's exactly what I'm saying. Cell doesn't have the processing resources of 16 quad-core CPUs as these figures would suggest. It doesn't even have remotely close to that. At most it peaks at <2.5x the processing resources in terms of GFLOPs, and probably less in most other areas. Hence why I'm saying the vast majority of that performance isn't down to raw power but rather it's down to the efficiency of the algorithm on Cell's architecture.
Even the reference you gave only quotes Cell being able to perform key operations 3 times faster than the CPU, so where is the rest of that performance coming from? Memory? But like you said, Cell is little more than twice as fast there, not 140x faster. It's got to be down to massive inefficiencies in how those types of algorithms run on regular CPUs vs a Cell-like CPU rather than raw power. In the best of circumstances raw power might be able to account for 2 or even 3 times quad-core performance, but not what's being talked about here (>50x).
...which is why I suggested we may have to look at it from a system architecture point of view (at least CPU + memory access). Comparing the CPU alone doesn't give the full picture.
On a good day, the speed gain is likely due to a combination of parallel raw math power, up to an order of magnitude faster data + instruction access, a high rate of sustained data stream via async DMA and single purpose application (The SPUs don't need to run any OS layer or general housekeeping code).
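The "sustained data stream via async DMA" point is about overlapping data transfers with computation, usually by double buffering. The sketch below is only a plain-Python analogy, not the SPE programming API: it assumes a single background thread stands in for the SPE's DMA engine and a list slice stands in for a transfer into Local Store; `fetch` and `process` are hypothetical placeholders. In CPython the overlap is illustrative rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(data, start, size):
    """Hypothetical stand-in for a DMA transfer from main memory into a local buffer."""
    return list(data[start:start + size])


def process(chunk):
    """Hypothetical stand-in for the SPU kernel: here just a sum of squares."""
    return sum(x * x for x in chunk)


def double_buffered(data, chunk_size):
    """Request the next chunk before computing on the current one (double buffering)."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma:  # one background 'DMA engine'
        pending = dma.submit(fetch, data, 0, chunk_size)  # prime the first buffer
        for start in range(0, len(data), chunk_size):
            current = pending.result()  # wait until this buffer has arrived
            next_start = start + chunk_size
            if next_start < len(data):
                # Start the next transfer so it overlaps with the compute below.
                pending = dma.submit(fetch, data, next_start, chunk_size)
            total += process(current)
    return total


print(double_buffered(list(range(10)), chunk_size=3))  # 285
```

The design choice is that the compute loop only ever waits on a transfer that was issued one iteration earlier, so when transfers and compute take similar time, neither sits idle for long.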
The algorithm is efficient in the sense that it allows Cell to maximize its strength.
On superlinear speedup (http://en.wikipedia.org/wiki/Speedup), Wikipedia has this to say...
According to Amdahl's law, the theoretical maximum speedup of using N processors would be N, namely linear speedup. However, it is not uncommon to observe more than N speedup on a machine with N processors in practice, namely super linear speedup. One possible reason is the effect of cache aggregation. In parallel computers, not only does the number of processors change, but so does the size of accumulated caches from different processors. With the larger accumulated cache size, more or even the entire data set can fit into caches, dramatically reducing memory access time and producing an additional speedup beyond that arising from pure computation.
So efficient and fast memory access can affect the outcome drastically (because the time saved is multiplied by the number of accesses, not the number of cores).
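A toy model makes the cache-aggregation argument concrete. This is a hypothetical illustration with made-up parameters, not a model of Cell or of the image recognition workload: work and memory accesses are split evenly across n cores, and every access is cheap only if the whole dataset fits in the aggregate cache (n times the per-core cache). Amdahl's law, 1 / ((1 - p) + p / n), is included as the linear baseline.

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def run_time(n, work=1000, accesses=1000, dataset_mb=4, cache_per_core_mb=1,
             hit_cost=1, miss_cost=10):
    """Toy model (hypothetical parameters): work and accesses split evenly over n cores.
    If the dataset fits in the aggregate cache (n x cache per core),
    every access costs hit_cost; otherwise every access costs miss_cost."""
    fits = dataset_mb <= n * cache_per_core_mb
    cost = hit_cost if fits else miss_cost
    return work / n + (accesses / n) * cost


for n in (1, 2, 4):
    print(n, amdahl_speedup(1.0, n), run_time(1) / run_time(n))
# Last line printed: 4 4.0 22.0 (Amdahl bound 4, toy-model speedup 22)
```

With 2 cores the data still doesn't fit and the speedup is exactly linear; with 4 cores it fits, the per-access cost drops tenfold, and the speedup jumps well past the core count.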
Shifty Geezer
26-Sep-2007, 16:41
Hence why I'm saying the vast majority of that performance isn't down to raw power but rather it's down to the efficiency of the algorithm on Cell's architecture.
The efficiency of the architecture, not the algorithms. Algorithms are like drivers where CPUs are like cars. If you take a car with a 140 mph top speed and give it to a good driver, he might make a lap in 3 minutes. A 240 mph car with a good driver could do it in, say, a minute and a half. Stick a bad driver in the fast car and they may well take longer than the slower car. The algorithms on Cell are just about putting a good driver in the driver's seat, instead of a conventional driver. Think of conventional CPUs as motorbikes and Cell as an F1 car. Stick a motorbike rider in an F1 car and they won't be so hot. Put an F1 driver on a motorbike and they likely won't be as fast as the motorcyclist, even though they're used to going much faster. Put an x86 algorithm on Cell and it won't be so hot. But that's not a limit of the hardware. The hardware architecture, the design of the CPU, is a go-faster design that needs go-faster code to use it.
Even the reference you gave only quotes Cell being able to perform key operations 3 times faster than the CPU
5x faster (15 cycles versus 3).
so where is the rest of that performance coming from? Memory? But like you said, Cell is little more than twice as fast there, not 140x faster.
You're really not understanding. You've got way too simple a view of these machines. ;) Main memory bandwidth is 2x-3x higher. The internal bandwidth working on the Local Store (LS) is way, way higher, hundreds of GB a second. If you have the data in LS, you have amazingly fast bandwidth. Then you're comparing LS performance to cache, and all sorts of complexities. It's not just a case of looking at a few separate numbers and deriving a performance comparison by how much bigger the numbers are. A quad-core QX6700 has twice as many transistors as Cell, so it must be twice as powerful! :D
It's got to be down to massive inefficiencies in how those types of algorithms run on regular CPUs vs a Cell-like CPU rather than raw power.
No, it's an aggregate performance increase from multiple improvements across the board in chip and system design. Cell adds faster memory access, faster working memory, more execution units and more SIMD processing, which together improve things by more than the sum of the parts. This design is a clever one. Cell was developed without any need for backwards compatibility or legacy code support, so it could be designed to approach problem solving from a pure performance perspective. This same design philosophy is being pursued by Intel too, who are aiming for massive performance increases by going a more Cell-like route. Intel themselves know lots of x86 cores aren't going to be fast compared to true performance processors. x86 is locked into an old way of thinking and doing things. 500 million transistors of quad-core x86 are going to be a performance black hole compared to 500 million transistors of a streamlined, re-targeted processor.
At the end of the day, though, there are lots and lots of benchmarks out there. Sure, some have a chance of being quite spun, such as those from IBM, but there are also independent developments. Unless you want to think that everyone who compares performance between Cell and x86 and finds Cell faster just isn't trying on x86, the numbers speak for themselves, and are a testament to the smart design that STI invested lots of money into: tackling the major issues that bottleneck conventional processors, and being first to market with this new trend in processors.
pjbliverpool
26-Sep-2007, 18:56
This design is a clever one. Cell was developed without any need for backwards compatibility or legacy code support, so it could be designed to approach problem solving from a pure performance perspective. Intel is pursuing this same design philosophy too, aiming for massive performance increases by going a more Cell-like route. Intel themselves know lots of x86 cores aren't going to be fast compared to true performance processors. x86 is locked into an old way of thinking and doing things. 500 million transistors of quad-core x86 is going to be a performance black hole compared to 500 million transistors of a streamlined, re-targeted processor.
Isn't Larrabee x86?
Crossbar
26-Sep-2007, 19:16
Isn't Larrabee x86?
The cores will implement a subset of the x86 ISA that includes some GPU-specific extensions.
http://arstechnica.com/articles/paedia/hardware/clearing-up-the-confusion-over-intels-larrabee.ars
Likely focused on the SSE part as it is intended to be integrated in GPUs as well, like this one.
http://media.arstechnica.com/news.media/larrabee-gpu.gif
dantruon
29-Sep-2007, 01:34
From reading this thread and other threads about Cell on these forums, the sentiment is that Cell is very efficient, flexible and powerful. Now the question is: when will we see it implemented in a notebook or a desktop as a replacement for the x86 CPU, or will that never happen?
It's unlikely for Cell to replace workhorse PCs. The best it can do is carve a niche for itself (e.g., in the entertainment space). There are also no third-party Cell applications today... until perhaps in the future, when Sony has built a sizable base of PS3 owners.
Shifty Geezer
29-Sep-2007, 10:09
The first step towards Cell computers would be content-creation workstations that run Cell apps like Maya. If they appear, you may, if you're lucky, alongside Cell Linux development on PS3, get other systems coming out. But given how long this is likely to take, and how Intel will have their competing chips out, I don't expect Cell to break into the mass consumer CPU space.
From: http://www.xbitlabs.com/news/cpu/display/20071003232118.html
Spurs Engine can be used in both consumer electronics and computer applications. For example, in the PC space it could process graphics or physics. Nevertheless, considering that four SPEs can offer tangible advantages over dual-core x86 chips, but would hardly rival contemporary graphics processors that feature 64 – 128 or even more processing engines, Toshiba’s new development will hardly find home in general-purpose PCs.
Uhh?
Hmm.... they should compare laptop GPUs with SpursEngine. What's the state of the art?
The former definitely have the software and OS integration advantages; the latter requires custom applications like FaceMation to shine.
Titanio
06-Jan-2008, 19:18
Not sure where to put this, but at their CES conference, Toshiba talked about using Cell for ultra high quality SD to HD upscaling, perhaps in TVs or..?
http://www.engadget.com/2008/01/06/live-from-the-toshiba-ces-press-conference/
nico1982
06-Jan-2008, 23:27
Not sure where to put this, but at their CES conference, Toshiba talked about using Cell for ultra high quality SD to HD upscaling, perhaps in TVs or..?
http://www.engadget.com/2008/01/06/live-from-the-toshiba-ces-press-conference/
Engadget is missing a few slides. It seems it is related to SD digital channels and DVDs displayed on HD sets (I think :P) http://www.watch.impress.co.jp/av/docs/20080107/ces04.htm?ref=rss (right bottom slide).
http://www.foxbusiness.com/markets/industries/industrials/article/toshiba-showcases-latest-advanced-technologies-ces-2008_424814_6.html
AV Notebook PC "Qosmio" with "SpursEngine"
"Qosmio" has led the way in bringing advanced entertainment capabilities to AV notebook PCs, and visitors to CES will see for themselves how a new Qosmio concept model points to future services and capabilities. The Qosmio integrates Toshiba's new "SpursEngine," a co-processor derived from the high-performance Cell Broadband Engine(TM) (Cell-B.E.) processor and Toshiba's advanced image processing technology, and brings awesome levels of processing power to consumer electronics.
Attention-grabbing demonstrations at CES will include the following:
-- Real-time face morphing
-- Real-time transformations of hair styles and makeup that
instantaneously recognize and process changes in position, angle, and
facial expression captured by an integrated camera, and render them as
computer graphics
-- Hand gesture
-- Remote-controller-free, hand gesture control of video, where simply
moving the hand can control, for example, playback or pause
-- Video indexing
-- Once a face appears on screen, it is recognized and stored in an easily
searched index that allows viewers to find and playback video segments
featuring a specific person.
-- Super-resolution
Image super-resolution creates high definition (HD) video from standard
definition (SD) video, for instance previously recorded video libraries, by
enhancing pixel resolution.
-- High-speed video editing and transcoding
-- SpursEngine's high-speed transcoding from MPEG2 to H.264 achieves
faster editing and recording of HD video to HD DVD discs.
TV with Cell Broadband Engine(TM) (Cell-B.E.)
TVs empowered with the high performance Cell-B.E. multi-core processor offer new dimensions in visual entertainment. Technology demonstrations at CES will feature real-time image super-resolution that transforms SD images into HD, and multi-decoding, the simultaneous playback of multiple videos that can bring up to 48 moving images to the screen.
Apparently the Cell TV launches in Fall 2008 - sometime in 2009 (in the latest schedule that is).
Crossbar
08-Jan-2008, 00:12
Apparently the Cell TV launches in Fall 2008 - sometime in 2009 (in the latest schedule that is).
Perhaps they are waiting for the 45 nm Cell to go into mass-production.
Less heat-dissipation and power requirements as well as lower price/unit are likely reasons for that.
A Cell would do great in a laptop or micro-computer, as long as you don't want to run Windows on it. Linux is no problem. But that would make the volume much smaller than most hardware vendors like.
Vitaly Vidmirov
12-Jan-2008, 16:18
A Cell would do great in a laptop or micro-computer
I don't think so. CELL is hot and draws 80+ watts.
I kind of agree, but it would have taken really few changes to make it effective ;)
If STI had spent slightly more resources on the PPE design and offered a downclocked version (say 2 GHz) with fewer SPUs, it could have been perfectly fine, and maybe Apple would have chosen it ;)
Anyway, as we say in France, "with ifs, we could put Paris in a bottle" ;)
A Cell would do great in a laptop or micro-computer
I don't think so. CELL is hot and draws 80+ watts.
At 90nm it drew about 100W, 65nm cut a huge chunk off that, I estimate it's around 60W now, still a bit too high for a laptop. 45nm will cut power 45% so that'd put it into the laptop range.
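The estimate above chains two rough figures. A minimal sketch of the arithmetic, assuming the poster's ~60 W at 65 nm is right and that the claimed 45% cut applies to the whole chip (both are assumptions taken from the post, not measurements):

```python
def scaled_power(watts: float, reduction: float) -> float:
    """Return power after a fractional reduction (0.45 means a 45% cut)."""
    return watts * (1.0 - reduction)


# Assumed inputs taken from the post, not from any datasheet.
POWER_65NM_W = 60.0      # poster's estimate for 65 nm Cell
NODE_REDUCTION = 0.45    # claimed saving from moving to 45 nm

power_45nm = scaled_power(POWER_65NM_W, NODE_REDUCTION)
print(f"Estimated 45 nm power: {power_45nm:.1f} W")  # Estimated 45 nm power: 33.0 W
```

For comparison, Toshiba quotes 10 to 20 W for the four-SPE SpursEngine.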
Shifty Geezer
30-Jan-2008, 15:35
If they're adding this to a laptop, I presume the software is Windows based. How is code developed and executed on SpursEngine through Windows then and how does it integrate with the Windows applications? Does it share memory or is code left on local XDR for the Cell, basically running it as a system in a system that just reports results back through a driver interface?
randycat99
06-Feb-2008, 04:08
At 90nm it drew about 100W, 65nm cut a huge chunk off that, I estimate it's around 60W now, still a bit too high for a laptop. 45nm will cut power 45% so that'd put it into the laptop range.
Also, bear in mind those figures are with zero power management controls. I imagine a laptop solution would include a proper power management system for the chip, itself, which would bring down typical power consumption behavior drastically (as it does for nearly all processing intensive components found in a modern laptop).
Granted, power consumption could still be very high under full load, but that is still tolerable on a laptop as long as it is short term. It's the same as it would be for any classic high-power laptop cpu/gpu. For short tasks, the power management will enable it to "rise" to the occasion, with minimal impact to battery life or heat generation. For sustained intensive tasks, there is simply no escaping the reality that processing horsepower has an electrical/thermal cost, and the user will be fully aware of this when the fans come on and the battery meter goes to 1/2 hr remaining unless they plug it in to a dedicated AC source.
http://www.toshiba.co.jp/about/press/2008_04/pr0801.htm
Toshiba starts sample shipping of SpursEngine™ SE1000 high-performance stream processor
08 April, 2008
- Offering development environment to advance stream processing applications in the Full-HD era -
http://www.toshiba.co.jp/about/press/2008_04/imgdat/0801_1.gif
http://www.toshiba.co.jp/about/press/2008_04/imgdat/0801_2.gif
TOKYO--Toshiba Corporation today announced the start of sample shipping of the SpursEngine™ SE1000 (SpursEngine), a high-performance stream processor integrating four Synergistic Processing Element (SPE) cores derived from the "Cell Broadband Engine™" (Cell/B.E.™). Sample shipping started from today, and Toshiba expects sales of 6 million units within the first three years of the SpursEngine’s release.
SpursEngine is a co-processor that integrates a hardware codec for Full HD encoding and decoding of MPEG-2 and H.264 streams with four SPEs derived from Cell/B.E. These advanced processing elements offer high performance media streaming capabilities, with a clock frequency of 1.5GHz, while achieving a low power consumption range of 10W to 20W.
"We are very pleased to have started sample shipping of SpursEngine" said Yoshio Masubuchi, Director of Toshiba’s System LSI Division, Advanced SoC Development Center. "The design of this powerful co-processor is dedicated to bringing the advanced capabilities of the Cell/B.E.™ to consumer electronics, particularly video processing in digital consumer products. We are sure that SpursEngine will accelerate the market for full-HD applications."
Toshiba will support developers working on SpursEngine applications with a comprehensive reference kit that includes a reference board and essential middleware APIs. The reference board has a PCI-Express edge connector that can connect to an x1 lane slot in a PC. Toshiba will also provide an integrated development environment (SPE compiler, SPE debugger, and performance monitor) and sample applications that demonstrate how to use the provided middleware. With the reference kit, customers can quickly and easily construct an evaluation and development environment and accelerate product development.
Toshiba will further boost the performance and cut the power consumption of the SpursEngine, towards supporting further innovation in products offering new levels of functionality.
Co-operation between Toshiba and the SpursEngine™ SE1000 Partnerships
Toshiba is developing co-operative relationships with many partner companies in order to develop wide-scope video solutions that utilize SpursEngine. For example, we are partnering with Canada-headquartered Corel Corporation, and Taiwan-based CyberLink Corporation and Leadtek Research Inc. These companies produce popular video and image processing software and hardware such as graphics boards, and will together supply set manufacturers. By working together with these companies and creating a new value chain, many end users can enjoy a comfortable digital life using our board and software bundled with SpursEngine.
Outline of SpursEngine™ SE1000
Product Number: BXA32110XBGN
Sample Shipping: April, 2008
Processor: SPE, 4 cores
SPE: Fully compliant with the Cell/B.E.™ SPE Instruction Set; SIMD RISC / single- and double-precision floating-point arithmetic / DMAC / MMU
Memory Interface (XDR™ DRAM): 128MB (512Mbit x2), physical bandwidth of 12.8GB/s
Hardware Video Codec: Full HD capable MPEG-2 encoder and decoder; Full HD capable H.264 encoder and decoder
PCI-Express I/F: x1, x4
PCI-Express: Compliant with Base Specification Revision 1.1
Outline of SpursEngine™ Reference Kit
Hardware
Product Name: SpursEngine™ SE1000 Reference Kit
Model: BXK005000
Main Engine: SpursEngine™ SE1000
Processing Performance: 48 GFLOPS maximum (12 GFLOPS per SPE)
Element Core: SPE x4
Operating Frequency: 1.5GHz
Registers: 128-bit x 128 per SPE
Internal Memory: Local Storage, 256KB per SPE
Host Interface: PCI-Express (Endpoint function; 1-lane link support; compliant with Base Specification Revision 1.1)
Memory: XDR™ DRAM, physical bandwidth 12.8GB/s, memory size 128MB
Software
Basic software, libraries and tool chain for host and SPE program development.
Sample applications for user programs.
*XDR™ DRAM is a trademark of Rambus Inc. in the United States and other countries.
*SpursEngine™ and the logo are trademarks of Toshiba Corporation.
*Cell Broadband Engine™ and Cell/B.E.™ are trademarks of Sony Computer Entertainment Inc.
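The 48 GFLOPS figure in the spec sheet follows from the clock speed. A short sketch, assuming each SPE issues one 4-wide single-precision fused multiply-add per cycle, counted as two floating-point operations per lane (the usual way Cell SPE peak figures are derived; the spec sheet itself does not spell this out):

```python
SIMD_LANES = 4        # a 128-bit register holds four 32-bit floats
FLOPS_PER_FMA = 2     # a fused multiply-add counts as two operations
CLOCK_GHZ = 1.5       # SE1000 operating frequency from the spec sheet
NUM_SPES = 4          # SE1000 has four SPEs

per_spe_gflops = SIMD_LANES * FLOPS_PER_FMA * CLOCK_GHZ
total_gflops = per_spe_gflops * NUM_SPES
print(per_spe_gflops, total_gflops)  # 12.0 48.0
```

The same formula at the 3.2 GHz clock of the PS3's Cell gives 25.6 GFLOPS per SPE, so the SE1000 trades clock speed (and SPE count) for its 10 to 20 W power budget.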
According to the Japanese press release, the sample price is 10,000 yen ($98.03).
http://www.toshiba.co.jp/about/press/2008_04/pr_j0801.htm
Shifty Geezer
08-Apr-2008, 12:29
6 million across 3 years isn't what I'd call a big market. The mention of Corel is encouraging though. If they produce a SPE accelerated graphics application, a port to PS3 should be easy, and the appearance of fast, high quality software could help the professional Cell 'workstation' market take off.
Well, nice to see things happening though. Honestly they should probably have launched a SpursEngine-type product back on 90nm bulk in order to build up the ecosystem sooner rather than later, and on the back of PS3's initial launch hype. Certainly back then it would have been competing with essentially the same 90nm embedded alternatives it's simply competing with now on the smaller node. Beyond the 2 million/year in expected sales of the add-in board, I imagine that Toshiba will be using the chip themselves in their upcoming Cell-based HD TVs and other A/V gear, so its production run will likely be a good bit higher.
Hi there people! You really have nice forum here. :)
SpursEngine has a hardware H.264 encoder for handling 1080p video, but does this feature mean the video stream is encoded to H.264 in real time, or is it only for accelerating desktop video encoding beyond what a normal desktop CPU is able to do?
I have been benchmarking my new budget CPU by encoding 720p video to H.264. The CPU is an Intel E2180 overclocked to 3204MHz, and while it is a fast processor for games etc., it is still very, very sluggish at encoding "merely" 720p video. I used this transcode script (http://forum.doom9.org/showthread.php?t=127998), which transcodes video to be compatible with the PS3, with the encoding part done by x264. I get around 4-8 fps encoding speed for 720p video, so I was wondering if SpursEngine really is fast enough to encode Full HD video to H.264 in real time!
Like Mr. Garrison once said:"There are no stupid questions! There's only stupid people!" :)
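To put those numbers in context, here is a rough sketch of how much faster the encoder would need to be for real-time output. The 24 and 30 fps targets are assumptions (typical film and video frame rates), and the 1080p comment assumes encode cost grows roughly with pixel count, which is only an approximation for x264:

```python
def required_speedup(current_fps: float, target_fps: float) -> float:
    """How many times faster an encoder must run to reach target_fps."""
    return target_fps / current_fps


MEASURED_FPS = (4.0, 8.0)   # the poster's 720p x264 range on the E2180

for target in (24.0, 30.0):
    worst = required_speedup(min(MEASURED_FPS), target)
    best = required_speedup(max(MEASURED_FPS), target)
    print(f"{target:.0f} fps target: {best:.2f}x to {worst:.2f}x faster needed at 720p")

# Full HD has 2.25x the pixels of 720p, so under the pixel-count assumption
# the gap at 1080p is correspondingly larger.
PIXEL_RATIO = (1920 * 1080) / (1280 * 720)
print(f"1080p/720p pixel ratio: {PIXEL_RATIO}")  # 1080p/720p pixel ratio: 2.25
```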
Shifty Geezer
13-Apr-2008, 11:19
On a related note, is there any encoding software for SPE's on Linux? I would have thought this'd be a prime candidate for an oft-used application that could do with a lot of speeding up. Has anyone tried this, and if so, how did performance pan out? How would it compare to the hardware h264 encoder/decoder in SpursEngine? The principal reason for asking is why include hardware h264 when Cell was shown to be very quick at it?! This hardware choice suggests custom hardware is far faster, and the SPE's are 'relegated' to image processing functions, rather than workhorse activities, limiting Cell's CE applications.
NetFront Living Connect is probably the most advanced, but I don't know if it's on the market already:
http://www.access-company.com/products/internet_appliances/livingconnect/index.html (Note the server side component)
http://forum.beyond3d.com/showthread.php?t=41456
http://techon.nikkeibp.co.jp/english/NEWS_EN/20070521/132856/ (ACCESS player on Linux news)
ACCESS Co., Ltd. has demonstrated the operation of its DLNA-compatible middleware "NetFront Living Connect" using the "PlayStation 3 (PS3)" at the "10th Embedded Systems Expo (ESEC)," which opened on May 16, 2007. Using "Yellow Dog Linux v5.0 for PlayStation 3," the company demonstrated its NetFront Living Connect for Linux.
This is the 3rd place winner (region 2) for IBM's Cell programming competition:
http://sourceforge.net/projects/cell-h264/
3rd Place H264 Real Time encoding on Cell/B.E. Yao Zou, Xun He, Xianmin Chen and Lei Zhu Shanghai Jiaotong University
I think someone is trying to port x264 (http://www.videolan.org/developers/x264.html) to PS3. You should be able to find the post on B3D.
On a related note, is there any encoding software for SPE's on Linux? I would have thought this'd be a prime candidate for an oft-used application that could do with a lot of speeding up. Has anyone tried this, and if so, how did performance pan out? How would it compare to the hardware h264 encoder/decoder in SpursEngine? The principal reason for asking is why include hardware h264 when Cell was shown to be very quick at it?! This hardware choice suggests custom hardware is far faster, and the SPE's are 'relegated' to image processing functions, rather than workhorse activities, limiting Cell's CE applications.
I don't think it's a matter of faster so much as cheaper on the hardware codec front; certainly the SPEs would be up to the task, and the comparison chart between the Cell and SpursEngine shows the breakdown. The SPEs are still doing all of the heavy-lifting here in terms of processing, just as on an alternate architecture/embedded solution there would need to be a processing consideration above and beyond the codec hardware itself.
I think at the end of the day Spurs will prove to be a very robust solution in terms of its results compared to contemporaries, but its market success of course will depend on a number of factors.
Laptop's coming this year. Cell HDTV fall next year.
http://www.reghardware.co.uk/2008/05/09/toshiba_cell_strategy/
In April, Toshiba said it's partnering with the likes of CyberLink, Leadtek and Corel - all makers of widely used PC-oriented video playback apps - to add support for the SE1000 to their software.
That code will presumably go into the upcoming SpursEngine-equipped Qosmio, to allow it to deliver what Toshiba calls "Super-resolution" imagery. This is essentially standard-definition content upscaled to 1080p and beyond, but Toshiba believes the SE1000 will allow it to make a far better job of this existing task than its rivals can using standard interpolation algorithms.
This is in addition to Toshiba's upcoming TVs fitted with Cell processors - a line it demo'd at the Consumer Electronics Show in Las Vegas back in January. Cell, it said, will allow its tellies to do fancy things like 14-in-1 picture-in-picture images.
The catch: we'll have to wait until the autumn of 2009 for these.
Karoshi
10-May-2008, 13:43
Super-resolution in an image-processing context has a particular meaning: getting better resolution from multiple shots of the same scene (think satellites). And it could well be used to try to improve resolution for upscaled, slow-moving content, or to interpolate a slow-moving background from a sequence of I-frames.
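A toy illustration of the multi-frame idea, sometimes called shift-and-add: if several low-resolution captures of the same scene are offset by known sub-pixel amounts, their samples can be interleaved back onto a finer grid. This is a 1-D sketch that assumes the offsets are known exactly and there is no blur or noise; real systems must estimate motion and deblur, and nothing here reflects how Toshiba's "Super-resolution" is actually implemented:

```python
def capture(scene, factor, offset):
    """Simulate a low-res capture: keep every `factor`-th sample starting at `offset`."""
    return scene[offset::factor]


def shift_and_add(frames, factor):
    """Place each frame's samples back at offset + i * factor on the fine grid.

    Assumes frames[k] was captured at offset k and that all `factor` offsets are present.
    """
    fine = [0] * sum(len(f) for f in frames)
    for offset, frame in enumerate(frames):
        for i, value in enumerate(frame):
            fine[offset + i * factor] = value
    return fine


scene = [1, 2, 3, 4, 5, 6, 7, 8]
frames = [capture(scene, 2, offset) for offset in range(2)]  # [1, 3, 5, 7] and [2, 4, 6, 8]

# Single-frame upscaling can only repeat (or interpolate) what one frame saw.
naive = [v for v in frames[0] for _ in range(2)]
print(naive)                      # [1, 1, 3, 3, 5, 5, 7, 7]
print(shift_and_add(frames, 2))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The point of the toy is the contrast between the two outputs: the second frame carries detail that no amount of interpolation on the first frame alone can recover.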
Betanumerical
10-May-2008, 15:43
On a related note, is there any encoding software for SPE's on Linux? I would have thought this'd be a prime candidate for an oft-used application that could do with a lot of speeding up. Has anyone tried this, and if so, how did performance pan out? How would it compare to the hardware h264 encoder/decoder in SpursEngine? The principal reason for asking is why include hardware h264 when Cell was shown to be very quick at it?! This hardware choice suggests custom hardware is far faster, and the SPE's are 'relegated' to image processing functions, rather than workhorse activities, limiting Cell's CE applications.
IBM ran some competition for coding on the Cell (it was a while back, iirc) and one of the entries was a port of H264.
Sourceforge link.
http://sourceforge.net/projects/cell-h264/
PDF on project.
http://www-304.ibm.com/jct09002c/university/students/contests/cell/r2w3proposal.pdf
Results.
Directly compiling the C code of X264 on the SPU is not feasible. The first ported version, with double-buffered DMA support but without SIMD, can encode QCIF at 54.8 fps using the default X264 parameters (Baseline), while after utilizing SIMD on only the loop filter (~10% of encoding time), performance increases to 62.9 fps with the same parameters.
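The "double buffer DMA" mentioned in those results is a common SPU technique: fetch the next chunk into one local-store buffer while the current chunk in the other buffer is being processed. On real hardware this is done with the Cell SDK's MFC DMA calls (for example `mfc_get` from `spu_mfcio.h`); the sketch below is only a simplified timing model with made-up unit costs, showing why overlapping transfers with computation helps:

```python
def single_buffer_time(n_chunks: int, dma_time: int, compute_time: int) -> int:
    """Each chunk is fetched, then processed; nothing overlaps."""
    return n_chunks * (dma_time + compute_time)


def double_buffer_time(n_chunks: int, dma_time: int, compute_time: int) -> int:
    """While chunk i is processed from one buffer, chunk i+1 streams into the other.

    The first fetch and the last compute cannot be hidden; every step in between
    costs whichever of the two (transfer or compute) is slower.
    """
    if n_chunks == 0:
        return 0
    return dma_time + (n_chunks - 1) * max(dma_time, compute_time) + compute_time


# Hypothetical costs in arbitrary time units: DMA of a chunk = 1, processing = 3.
print(single_buffer_time(100, 1, 3))  # 400
print(double_buffer_time(100, 1, 3))  # 301
```

When compute dominates, as here, double buffering hides almost all of the transfer time; when DMA dominates, the loop becomes bandwidth-bound instead.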
http://www.tgdaily.com/content/view/37845/135/
The video transcoding process takes about half as long on an SE1000 as on a 3 GHz Intel Core 2 Quad CPU. Keep in mind that this is a very specialized application, while the Core 2 Quad is a much more universal chip, but the simple performance potential is impressive nevertheless. Especially if you consider the fact that the accelerator consumes only 10 to 20 watts.
pjbliverpool
09-Jun-2008, 21:12
http://www.tgdaily.com/content/view/37845/135/
Intel should be embarrassed.
Their best CPU just had its ass handed to it by an old design sporting a mere fraction of its transistor count and power requirements.
So much for Core 2.
Shifty Geezer
09-Jun-2008, 21:33
That's why some of us have been very excited about Cell! Its design is a deviation from the traditional, giving it an edge in the new processing demands that original CPUs were never expected to work on. It won't handle some apps anything like as well as your x86 derivatives, but most of the time users don't need it to. For the workhorse data-munching applications, Cell is a sweet design. In this case, SpursEngine in a laptop would mean a very low-power CPU could be used for standard processing, with demanding processing offloaded to the SpursEngine, meaning fabulous performance and battery life.
The fundamental flaw in all this is products and applications that use SPURSEngine ;) A lovely accelerated laptop won't be appearing any time soon.
What puzzles me is... if it takes only 10-20 W, why not throw more SpursEngines at the problem? Why stop at one engine?
Shifty Geezer
10-Jun-2008, 08:45
Because the device is positioned as a low-cost, low-power solution, for CE devices. This is a demo of what the SPURSEngine could achieve as is in a PC. It's not an actual solution, nor a product trying to be the world's fastest video transcoder!
Maybe because... Toshiba is selling only laptops ;)
Bah... Mercury should get in touch with Corel to port the same thing onto their Cell accelerator board: http://www.mc.com/microsites/cell/productdetails.aspx?id=2590
It's very hard to sell Cell add-on boards to normal PC users. Unless there's enough support from applications, consumers are not going to run out and buy them. On the other hand, with only a small fraction of market share, application developers are not going to support it. So it's sort of a classical chicken and egg problem.
This is why AMD, Intel, and NVIDIA are going the GPGPU route. Many people are already buying GPUs, so the market share is large enough, and application developers are therefore more likely to provide support.
The intended market for SpursEngine is not the PC. It's CE devices, set-top boxes, and maybe even inside TVs.
Yes, SpursEngine will also be in Toshiba laptops, though.
Vitaly Vidmirov
15-Jun-2008, 12:53
http://blog.laptopmag.com/hands-on-with-the-qosmio-g55
That's an unexpected (and awesome) application for the SpursEngine when not performing actual encoding/decoding functions, and should help to differentiate the laptops in a big way. Surprisingly not as expensive as I thought they'd retail for either, which is a good sign in terms of helping them gain some traction.
I found it interesting that there's a consumer branding effort it seems for the SpursEngine, since the article refers to it as "Quad Core HD Processor."
Toshiba Launches AV notebook PCs that integrate TOSHIBA Quad Core HD Processor "SpursEngine™"
http://www.toshiba.co.jp/about/press/2008_06/pr2301.htm
Is there any way to get some samples of this board?
I can't seem to find anything at all besides some news reports, not even on the Toshiba websites.
I think you'd probably have to register as a developer, and then beyond that the price should be about $100 for the add-in board and the SDK. Now, how you register I have no idea, but sometimes when you email a general contact from the website, they'll be able to assist in routing you to the proper department/individual.
You might want to try the contacts at the end of this press release (http://www.reuters.com/article/pressRelease/idUS120247+08-Apr-2008+PRN20080408).
Bob Nelson of Tsantes Communication Group, +1-408-426-4905,
bnelson@tsantes.com, for Toshiba Corporation; or Deborah Chalmers of Toshiba
America Electronic Components, Inc., +1-408-526-2454,
deborah.chalmers@taec.toshiba.com
Ok, thanks to the both of you.
I'll try that and see if I can get a hold of some samples.
What do you intend to do with it? Let us know!
We're doing a project where we are evaluating Cell technology for other companies. So far we have only tested on PS3, but we just recently acquired some BladeCenter technology too.
However, most of the companies involved in the project are looking into embedded solutions. So the SpursEngine would be a solution for them if they ever plan to use Cell (so far they aren't really too keen on getting into it, mostly because of cost, so they are starting to look into GPU solutions or will just continue to use DSPs or FPGAs).
It would be a nice addition to the evaluation since it's a low cost low power target board, and that's exactly what they are looking for.
Pardon my ignorance. In your opinion, what are the typical use cases and problems you want to solve using a SpursEngine in a CE device?
Well, that's something we'll still have to think of.
Most companies involved need quite some image processing power.
They range from traffic control and chip inspection to medical imaging and visual sorting machines.
So we have quite a range of stuff to try out.
Leadtek is to exhibit the WinFast PxVC1100 PCI-Express card based on SpursEngine at CEATEC JAPAN 2008 on Sep 30 - Oct 4. It has 128MB XDR DRAM. Its functions by bundled Corel DVD MovieFactory and WinDVD include video authoring, super-resolution upconversion, transcoding and DVD/AVCHD disc authoring/playback.
http://www.leadtek.co.jp/news_release/ceatec2008.html
The release date for the WinFast PxVC1100 in Japan has been announced as Nov. 14, at 29,800 yen ($300).
http://pc.watch.impress.co.jp/docs/2008/1029/leadtek.htm
Also, here is a report from CEATEC 2008 (Sep 30), the event mentioned in the post above.
http://pc.watch.impress.co.jp/docs/2008/0930/ceatec03.htm
Some benches
http://pc.watch.impress.co.jp/docs/2008/1120/tawada157.htm
With all the crappy results from GPU transcoding, I am looking into this product. Has anyone found any first-hand impressions of it?
pjbliverpool
13-Jan-2009, 19:03
Hmm, I can't tell which is better, higher, lower, or if the benchmarks are a mixture of the two!
From the results it looks like only H.264 encodes show big improvements. Encoding to MPEG-2 is probably just too fast both on the accelerator and on the CPU, so the bottleneck is elsewhere.
For example, when doing DV-AVI to MPEG-2 SD, the SpursEngine runs at 84.07 fps and the 4 core CPU runs at 82.78 fps. When using the SpursEngine the CPU usage is indeed lower at 34% instead of 70% when using CPU. However, when doing HDV -> AVC-HD, SpursEngine runs at 32.77 fps while the quad core CPU runs at only 5.02 fps. If the quality is comparable then it's quite an acceleration.
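The speedups implied by those two quoted results can be computed directly; the frame rates below are the ones from the benchmark, and the speedup is simply their ratio:

```python
# (SpursEngine fps, quad-core CPU fps) as quoted from the benchmark above.
results = {
    "DV-AVI -> MPEG-2 SD": (84.07, 82.78),
    "HDV -> AVC-HD": (32.77, 5.02),
}

speedups = {job: spurs / cpu for job, (spurs, cpu) in results.items()}
for job, ratio in speedups.items():
    print(f"{job}: {ratio:.2f}x")
# DV-AVI -> MPEG-2 SD: 1.02x
# HDV -> AVC-HD: 6.53x
```

A ratio near 1.0 for the MPEG-2 job is consistent with the idea that something other than the encoder is the bottleneck there.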
Some benches
http://pc.watch.impress.co.jp/docs/2008/1120/tawada157.htm
With all the crappy results from GPU transcoding, I am looking into this product. Has anyone found any first-hand impressions of it?
I think it's better to push non-English URLs through an auto-translator to make them somewhat readable by most people here, so here's this one:
http://translate.google.com/translate?prev=&hl=en&ie=UTF-8&u=http%3A%2F%2Fpc.watch.impress.co.jp%2Fdocs%2F2008%2F1120%2Ftawada157.htm&sl=ja&tl=en&history_state0=
I find it rather odd that neither they, nor indeed any URL I've found so far regarding reviews of this chip/card, even mention basic things such as the standard profiles or levels used in testing, operation, etc.,
such basic information as is laid out on Wikipedia, for instance:
http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC#Levels
the main options for current pro video people and the devs at doom9, for instance, being
"High Profile (HiP): The primary profile for broadcast and disc storage applications, particularly for high-definition television applications (this is the profile adopted into HD DVD and Blu-ray Disc, for example). "
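and level 4.1: "Level 4.1: max 245,760 macroblocks/s; max frame size 8,192 macroblocks; max video bit rate 50 Mbit/s (Baseline, Extended, Main), 62.5 Mbit/s (High), 150 Mbit/s (High 10), 200 Mbit/s (High 4:2:2, High 4:4:4); examples, resolution@frame rate (max stored frames): 1280x720@68.3 (9), 1920x1080@30.1 (4), 2048x1024@30.0 (4)"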
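The example frame rates in that level table can be checked from the macroblock limits. A sketch assuming the standard H.264 convention of 16x16 macroblocks, with frame dimensions rounded up to a whole number of macroblocks:

```python
import math

MAX_MBPS = 245_760       # Level 4.1 max macroblocks per second (from the table)
MAX_FRAME_MBS = 8_192    # Level 4.1 max frame size in macroblocks (from the table)


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering a frame, rounding each dimension up."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_fps(width: int, height: int) -> float:
    """Highest frame rate Level 4.1 allows for this resolution."""
    mbs = macroblocks(width, height)
    if mbs > MAX_FRAME_MBS:
        raise ValueError(f"{width}x{height} exceeds the Level 4.1 frame size limit")
    return MAX_MBPS / mbs


for w, h in [(1280, 720), (1920, 1080), (2048, 1024)]:
    print(f"{w}x{h}@{max_fps(w, h):.1f}")
# 1280x720@68.3
# 1920x1080@30.1
# 2048x1024@30.0
```

Note that 1080 is not a multiple of 16, so a 1080p frame is coded as 68 macroblock rows (1088 lines), which is why it allows 30.1 rather than exactly 30 fps.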
After reading several Cell threads here over the years, we basically know the expected potential of this chip and its purpose: multiple real-time PIP, upscaling, encoding, decoding, etc.
So the only real information I can't seem to find is its general availability and price. Right now it seems people are expecting it to be around $300 US,
but I find that a strangely high price, as it was billed as a cut-down Cell that had the PPC core ripped out of it and replaced with something else cheaper (and that no one really cared about porting Linux to) to bring its costs well below the real PPC Cell, etc. And wasn't its initial purpose to be cheap ($25/$35 US? per 10K tray) and go inside HDTVs?
For that $300 US price, surely you would expect, at the very least on this most basic PCI-E x1 PCB, a real PPC-based Cell to at least the PS3 spec, if not the better Blade version?
The real question that needs an answer, then, given they are now passing it off as the ultimate solution (people/OEMs don't seem interested in putting the likes of the cheap Qpixel QL305 ASIC, a real-time H.264/MPEG-4 AVC High Definition (Level 4.1) encoder/decoder using 275 mW of power, on a cheap USB2 stick and selling it ASAP into the mass market: http://www.reuters.com/article/pressRelease/idUS10020+02-Jun-2008+BW20080602 ), and given that it is, to be fair, currently THE only mass-market end-user HD AVC/H.264 High@L4.1+ encoder/decoder/transcoder co-processor, is this: is there an open SDK with working code examples to use directly on this PxVC1100 card/board?
And if so, where is this focused, open API/SDK? Does anyone plan to write, or has anyone already written, FFmpeg/x264 SpursEngine patches to use this new WinFast PxVC1100 card? Have any patches already been reviewed and posted to the FFmpeg/x264 message boards for inclusion?
Where can we get them, and the documentation to use it to the fullest programmatically, or is this purely a closed, Windows-only, payware transcoding shop that non-Windows devs/people can't fully utilize, if at all?
Leadtek Japan says they're going to release the SDK for WinFast PxVC1100 around the end of this month. It contains Toshiba libraries except for hand gesture recognition and upconversion.
http://www.leadtek.co.jp/news_release/sdk_pxvc1100.html
Also, CRI Middleware (Japanese game middleware developer) is offering CRI SpursCoder which is a free H.264 encoder for SpursEngine.
http://criware.jp/spurscoder/
thanks for that "one".
http://translate.google.com/translate?prev=hp&hl=en&u=http://criware.jp/spurscoder/&sl=ja&tl=en
http://translate.google.com/translate?prev=hp&hl=en&u=http://www.leadtek.co.jp/news_release/sdk_pxvc1100.html&sl=ja&tl=en
I can't find a good translator for the text inside spurscoder.zip.
Does anyone care to do a better job than Google on CriSpursCoder.txt? It looks like it gives you the CLI options you can use, and it would shed some light on its abilities even before any English speakers get to buy and review one of these.
Edit: tre31 did a translation:
http://forum.doom9.org/showthread.php?p=1239115#post1239115
I also read some info and a link on Doom9, and http://www.avsforum.com/avs-vb/showthread.php?p=15252430#post15252430 is full of info.
Is the Leadtek card available outside of Japan?
We'd like to get a few for playing around with, but availability seems nil here.
We already mailed with Leadtek, but they don't have a distributor in Belgium, and I don't see it anywhere available in surrounding countries.
Also the US doesn't seem to sell it, or I just don't know any of the retailers who'd sell stuff like that.
Also, does Amazon Japan deliver in Europe? Because that's the only place I found it on.
It will be available outside of Japan soon.
http://www.leadtek.com/spursengine/press_2.aspx?id=1
The SDK is available now.
http://www.leadtek.com/spursengine/
I got the SDK, but it isn't really useful without the card.
I've also seen the press release, but it doesn't really indicate when exactly it'll reach the west.
Seeing that it is completely sold out in Japan, it could still take a while before they release it anywhere else if they want to keep up with demand over there.
Leadtek "released" it a while back in Taiwan (I think about half a month after Japan), but I don't know of anyone actually selling these here. I'm also interested in this if the price is right, but judging from the Japanese price it seems to be pretty expensive, about US$300.
We got a reply from Leadtek, and they are willing to send it to us.
Price is $290, shipment included.
Brad Grenz
21-Feb-2009, 04:50
Coolio!
Crossbar
20-Apr-2009, 14:03
http://www.pcworld.com/article/163404/toshibas_new_laptops_sharpen_up_internet_video.html?tk=rss_news
Toshiba is putting its quad-core SpursEngine chip to use in several new laptops to improve the quality of Internet video images.
The company's new Qosmio multimedia laptops, which will appear in Japan on Friday before becoming available worldwide, will use the graphics processing chip to clean up video from sites such as YouTube, the company said Tuesday.
The function will work when playing video fullscreen -- not when it's played in a window on a Web site -- and only when using Internet Explorer. Toshiba couldn't immediately explain why it won't work with other Web browsers.
The top-of-the-range G50 includes an 18.4-inch widescreen full high-def LCD screen, 2.66GHz Core2 Duo processor, a 640GB hard disk and dual digital TV tuners. It will go on sale from Friday in Japan and costs around ¥340,000 (US$3,420). Mid- and low-end Qosmio machines will also be offered for ¥260,000 and ¥210,000 respectively.
The computers will also go on sale outside Japan although international launch dates are yet to be fixed.
pjbliverpool
20-Apr-2009, 17:36
Surely it would be more efficient (in terms of silicon) to implement this type of feature on the GPU? Why add a new card for something a cheap GPU could handle equally well or better?
Well maybe you should read above to judge what 'equally well or better' might be (post #85 on). And I think when we extrapolate out in that direction, suddenly die budgets aren't nearly as black and white.
pjbliverpool
21-Apr-2009, 00:54
Aren't those benchmarks just comparing against a CPU, though? Obviously some effort has been put into writing an app that works on Spurs to improve Internet video. Why couldn't a similar app be written in CUDA (or whatever other language GPUs can use) to do the same thing?
No doubt Spurs is much smaller in terms of transistors than a mid-range R7xx/G92b, but does it cost less? And even then, if the GPUs are already present in the system, why not just use them instead of adding yet another co-processor?
It's probably an experiment to test market and help push their SpursEngine (since high end laptops may have enough margin to play). Toshiba has been talking about intelligent TV for a few years now (e.g., TV with video conferencing and other media apps). I don't know if they will add a CPU to their TVs someday; but if they do, all these CPU + SpursEngine exercises will help them get there one day. The SpursEngine is already used in their TVs for upscaling today.
Shifty Geezer
21-Apr-2009, 16:13
I dare say the economics are there. Toshiba haven't thrown Cell into all their devices, instead sticking to what makes sense. If the alternatives were as effective, why would Toshiba have a change of heart and put in a mostly redundant processor? We know SPUs are more effective than GPUs at some tasks; GPUs just aren't flexible enough to do everything yet. E.g., video transcoding has not seen the benefits from GPUs that we might have hoped for. I can well believe that SpursEngine offers the most efficient (cost- and power-consumption-wise) general-purpose high-performance accelerator. I don't see anything in Toshiba's actions to suggest a lack of caution or a willingness to use redundant silicon where it isn't needed, hence no Cell TVs before they had a purpose for them.
rpg.314
21-Apr-2009, 19:05
Well, wouldn't Cell be good enough for these (non-laptop) purposes? I mean, why bother designing a niche chip when Cell can do it well enough? And given Cell's volumes, it would be cheap enough to put in high-end TVs, at least by now.
I can well believe that SPURSengine offers the most efficient, cost and power-consumption-wise, general purpose high-performance accelerator.
Maybe, but a video-specific one.
Well, wouldn't Cell be good enough for these (non-laptop) purposes? I mean, why bother designing a niche chip when Cell can do it well enough? And given Cell's volumes, it would be cheap enough to put in high-end TVs, at least by now.
If Toshiba can make it even cheaper, they will do so. Hardware guys usually nickel-and-dime their BOM cost; they will save millions if the volume is big, as it is for CE use. The SpursEngine is about 30% smaller (compared to regular SPEs), runs at a lower frequency and consumes 10-20W. OTOH, the PPU is not a fast CPU if they don't use the vector engine. Maybe Toshiba can pick a more suitable CPU to power SpursEngine for their app (cost/performance-wise)?
Maybe, but a video-specific one.
Not necessarily. The algorithm doesn't see the video (it's just data!). Anything that suits the performance characteristics and profile should fly.
Karoshi
21-Apr-2009, 22:05
When do the OpenCL drivers come out?
I just found this:
http://sites.google.com/site/openclps3/blog
rpg.314
22-Apr-2009, 10:08
Not necessarily. The algorithm doesn't see the video (it's just data!). Anything that suits the performance characteristics and profile should fly.
From the SPE POV, yes. It's just that I had the MPEG-2 and H.264 decode/encode cores on SpursEngine in mind when I said it.
rpg.314
22-Apr-2009, 10:12
This is cool, but prolly superfluous, as I am 100% sure that IBM will release an OpenCL implementation for Cell this year.
When/if IBM releases an OpenCL implementation, they will need the open source community developers to help support it. People like Robbie McMahon, the author of the above OpenCL for PS3, could be a key contributor since he would know the fundamentals inside out.
According to his blog, Robbie also did some work in MPICH for PS3 (probably for Argonne National Lab or Loyola in Chicago). The PS3 clusters in Dartmouth and North Carolina State U. use MPICH (and OpenMPI v2.5) too. We should be able to see PS3 OpenCL + OpenMPI clusters soon (although admittedly, MPI already abstracts the hardware away; so OpenCL doesn't buy much there -- except to port OpenCL code from elsewhere)
Aren't those benchmarks just comparing against a CPU, though? Obviously some effort has been put into writing an app that works on Spurs to improve Internet video. Why couldn't a similar app be written in CUDA (or whatever other language GPUs can use) to do the same thing?
No doubt Spurs is much smaller in terms of transistors than a mid-range R7xx/G92b, but does it cost less? And even then, if the GPUs are already present in the system, why not just use them instead of adding yet another co-processor?
Yes, those are against CPU. My bad for forgetting that very obvious fact! :) I was blending it mentally with some thread in the 3D section that called out some of the GPU-based IQ algorithms as not being 'all that' essentially.
As for SpursEngine itself, while it clearly depends on the individual (laptop) model, strong wattage, die/cost (it is also an internally developed chip, which contributes), and profile factors might all play to Spurs in Toshiba's decision making. And though I don't think improving IQ in Internet video honestly requires extensive benching, we'll just assume that Toshiba, out of the gate, has a better implementation ready to go on Spurs after years of R&D in that arena than it could get going on CUDA with similar results in the near term.
That said, if it were a laptop with a high-end GPU onboard by default (and that GPU had sufficient power-throttling profiles/states), then yes, I could definitely see where the SpursEngine might be redundant vs. a GPU-based IQ solution. But there are the aforementioned quality and thermal considerations attached to this; I haven't been following the mobile GPU scene recently, so I'm not sure where wattage lies at the moment for the upper end.
Well, we can only assume that whatever the costs along the multitude of axes Toshiba must consider, the SpursEngine is the viable choice for the effort. Else they wouldn't be doing it. What's left to us is just guesswork, unfortunately. :)
Ok, so we got our hands on a couple of Leadtek cards.
Got around to trying some of the samples.
The decoding and encoding seem to work fairly well, although both are limited on input (they only take the standard sample file by default).
Haven't really benchmarked it, just trying some stuff.
Documentation isn't as good as I first thought. Especially not for the face recognition library.
Isn't there anyone else working on Spurs?
I've spent hours searching the net, and I only found about one topic on some forum and maybe 2 or 3 Japanese sites which have something on it regarding code.
I don't have one, but what face recognition library are they using? Something open source?
Nope, something Toshiba made themselves I believe.
They use something they call the CANDI api, which stands for Codec AND Indexing.
Both the codec and the face recognition use this same api.
Some details on CANDI and CRI Middleware:
http://hdpro.jp/interview/index_080718en.html
It is Sofdec for SpursEngine that CRI Middleware is going to develop (Figure 2). This middleware is a wrapper for middleware called CANDI, provided by Toshiba, that makes it easy to develop SpursEngine applications. At present, CANDI has some limitations, such as fixed resolutions, and it requires conversions to and from Toshiba's proprietary data format during decoding and encoding. Sofdec will make it possible to set arbitrary image resolutions and bit rates in order to easily create high-quality videos. In addition, it will accept H.264 files directly as input.
For CRI Middleware, Sofdec for SpursEngine is merely the first stage of development. The company plans a second stage, too, which, according to Mr. Matsushita, will be software that allows SpursEngine to use its full capability. To do so, CRI Middleware is aiming at applications in which SpursEngine's codec logic and SPEs all work together.
Yeah, I've read those.
The thing is, they all seem to do something with the codec part, but not with the face recognition.
I downloaded the SpursCoder, but it's just an executable and a library, no code though (at least for the free personal version). And the documentation is in Japanese, not my strongest language. :razz:
Did you check with the guys who did face recognition on the full Cell? Maybe their work can be ported to SpursEngine (albeit at approximately half the performance).
I'm not really looking for face detection on Cell, I'm looking for some information on how the API works. The middleware should take care of all the SPE stuff, I'm just trying to write a host application.
The documentation doesn't help me all that much (it may be just me, of course).
I did get some steps further though, by trial and error reasoning.
Oh god, I hate those type of projects. :(
All the best !
Yeah, thanks.
The included demos all just work, though. So I have my hopes up a bit.
Need to get a nice demo set up by next Tuesday.
http://www.leadtek.co.jp/multimedia/winfast_hpvc1100_txp_1.html
Shifty Geezer
04-Sep-2009, 22:55
I got all excited when I saw the Super Resolution example image, but clicking it just reveals a marketing illusion where they blurred the true image to create the 'low resolution' image. Suddenly I don't feel particularly interested in learning more about this platform from that link.
randycat99
05-Sep-2009, 06:34
Is it really blurred or just the natural effect that occurs when you scale-up a small piece of an image to a much larger size?
Brad Grenz
05-Sep-2009, 06:45
Naw, in that link it seems pretty clear they used a full screen blur effect to create the "before" picture.
randycat99
05-Sep-2009, 06:49
I guess I misunderstood- were you referring to the original being pre-blurred or the blow-up piece from the original that looks blurred, Shifty?
I thought you were referring to the latter, but maybe you guys are noting the former?
Shifty Geezer
05-Sep-2009, 09:39
This image (http://www.leadtek.co.jp/multimedia/pxvc1100_6.html): the magnified sections in the ring show a native-resolution image at the bottom and a blurred copy above, something like a Gaussian blur of radius 3.5-4. They've also applied a sharpness filter to the 'Super-Resolution' photo. It's entirely a fabrication, with no example of what the real engine achieves.
Looking further up the article, though, they do show some more technical shots that are suggestive of an effective upscaling system. It would just be nice to get some real examples. If it's that good, I want to see it!
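To make the distinction argued over here concrete (an honest blow-up of a small crop vs. a Gaussian-blurred copy), here is a toy one-dimensional sketch in Python. It is an illustration under assumptions, not anything from Toshiba or Leadtek: it treats the "radius 3.5-4" estimate as a Gaussian sigma of 3.75 (image editors define "radius" differently, so this is approximate), it assumes the magnified crop was enlarged by pixel replication (nearest neighbour), and all function names are hypothetical. Under those assumptions, a genuinely low-resolution source shows flat runs of identical pixels as wide as the scale factor, whereas a Gaussian blur leaves smoothly varying values with no such plateaus.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel, truncated at +/- 3 sigma."""
    half = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur_row(row, sigma):
    """Convolve one row of pixel values with the kernel, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    half = len(kernel) // 2
    n = len(row)
    return [
        sum(w * row[min(max(i + k - half, 0), n - 1)] for k, w in enumerate(kernel))
        for i in range(n)
    ]


def shrink_then_enlarge(row, factor):
    """Simulate a genuinely low-res source: average blocks of `factor` pixels,
    then blow it back up by pixel replication (nearest neighbour)."""
    small = [sum(row[i:i + factor]) / len(row[i:i + factor]) for i in range(0, len(row), factor)]
    return [v for v in small for _ in range(factor)][: len(row)]


def longest_flat_run(row, tol=1e-9):
    """Length of the longest run of (near-)identical neighbouring pixels."""
    best = run = 1
    for a, b in zip(row, row[1:]):
        run = run + 1 if abs(a - b) <= tol else 1
        best = max(best, run)
    return best


# Toy 64-pixel row with fine detail (values roughly in 0..255).
row = [128 + 100 * math.sin(0.7 * i) for i in range(64)]
blurred = gaussian_blur_row(row, sigma=3.75)  # assumption: "radius 3.5-4" read as sigma
enlarged = shrink_then_enlarge(row, factor=4)

print("longest plateau after blur:", longest_flat_run(blurred))
print("longest plateau after shrink + enlarge:", longest_flat_run(enlarged))
```

A viewer that enlarges with bilinear or bicubic interpolation would smooth those plateaus away, so this kind of check only applies to nearest-neighbour enlargements.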
rpg.314
05-Sep-2009, 10:29
Definitely a fake. You cannot sharpen the features the way it has been sharpened here.
Don't know how I missed this one... or did I? ;) Too many browser pages open lately...
Has anyone with a card or a laptop with integrated SpursEngine tried this free trial yet for H@L3.1 through H@L4.1 AVC encoding, and put the resulting clips up somewhere to compare against the AVC encoding quality of the current x264 r1251 (http://x264.nl/)? That's assuming, of course, that this SpursEngine plug-in can even do generic High Profile CABAC encoding...
BTW, if you don't want to mess about with AviSynth scripts to feed x264, you might just use roozhou's current Direct264 r1251 (http://forum.doom9.org/showthread.php?t=141441), a special x264 CLI build that supports input from DirectShow.
http://dvformat.digitalmedianet.com/articles/viewarticle.jsp?id=810790
07/30/09
...
Pegasys Adds SpursEngine Functionality to Its TMPGEnc MovieStyle Software
SpursEngine Plug-In Upscales Standard Resolution Files or Quickly Encodes MPEG-2 & H.264 files to View Virtually Anywhere
(July 30, 2009) DMN Newswire--2009-7-30--Pegasys, Inc. (http://tmpgenc.pegasys-inc.com) the company that makes digital video easy, has updated its TMPGEnc MovieStyle software to include SpursEngine functionality via the TMPGEnc Movie Plug-in SpursEngine (sold separately) and AVCHD file input support.
The Movie Plug-in enables high-speed, hardware H.264 and MPEG-2 encoding and standard video upscaling when used with devices embedded with the SpursEngine Media Streaming Processor from Toshiba®. AVCHD file input support lets users import footage from an AVCHD camcorder and convert it for their favorite digital device.
"We've already added SpursEngine functionality to our flagship video converter, TMPGEnc 4.0 XPress, but that is a very technical product," commented Tak Ebine, Pegasys CEO. "By adding SpursEngine..."
http://tmpgenc.pegasys-inc.com/en/product/te4xp_spurs.html
Again, it seems odd to me that, if the chip makers really want SpursEngine to take off, they have not seen fit to talk to the FFmpeg, Mcoder, and x264 projects and add patches to their codebases...
The WinFast PxVC1100 Video Transcoding Card: Worth The Price?
http://www.tomshardware.com/reviews/leadtek-winfast-pxvc1100,2523.html
Thanks for the link one - surprisingly in depth, and the SpursEngine acquits itself quite well indeed. It would have been nice to have a GPU accelerated scenario to compete against as well, but for the specific purpose the card serves it was a good write-up. I'm actually surprised it turned up at Tom's Hardware as it would seem to me more niche than they usually go.
Shifty Geezer
31-Jan-2010, 09:50
Yes, lack of a CUDA comparison makes it a floating review without a proper comparison base.
Is there still no word on the quality of Toshiba's SuperResolution+++ tech? Surely, if it's good, they could license it out to makers of upscaling converters, which could increase the appeal of SpursEngine cards.
NeoTechni
06-Mar-2010, 21:50
This image (http://www.leadtek.co.jp/multimedia/pxvc1100_6.html): the magnified sections in the ring show a native-resolution image at the bottom and a blurred copy above, something like a Gaussian blur of radius 3.5-4. They've also applied a sharpness filter to the 'Super-Resolution' photo. It's entirely a fabrication, with no example of what the real engine achieves.
Zoom in and enhance!
WinFast HPVC1111 Four Wheel Drive SpursEngine 4x4
ftp://ftp.leadtek.com/platform_solution/WinFast%20HPVC1111_Spec02222010.pdf
http://akiba-pc.watch.impress.co.jp/hotline/20100417/etc_leadtek.html
http://ascii.jp/elem/000/000/515/515190/
http://akiba.kakaku.com/pc/1004/16/231500.php
http://ascii.jp/elem/000/000/515/515210/lt1_c_640x480.jpg
Shifty Geezer
19-Apr-2010, 11:38
16 SPUs at 1.8GHz? That's a pretty honking board (http://akiba-pc.watch.impress.co.jp/hotline/20100417/image/wtz1.html)! Those fans are probably noisy too. I'd rather have a large fan over multiple heatsinks.
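For a rough sense of scale behind "16 SPUs at 1.8GHz", here is a back-of-the-envelope peak-throughput calculation in Python. It assumes each SpursEngine SPE keeps the Cell SPE's 128-bit SIMD unit and can issue one 4-wide single-precision fused multiply-add per cycle (8 flops per cycle), the figure usually quoted for Cell; the 1.8 GHz clock is taken from the post above and the 1.5 GHz figure from Toshiba's prototype announcement. These are theoretical SPE-only peaks; they say nothing about the separate MPEG-2/H.264 codec hardware or real-world throughput.

```python
# Assumption: one 4-wide single-precision fused multiply-add per SPE per cycle,
# i.e. 4 lanes x 2 flops = 8 flops per cycle (the commonly quoted Cell SPE peak).
FLOPS_PER_CYCLE_PER_SPE = 4 * 2


def peak_sp_gflops(num_spes, clock_ghz):
    """Theoretical single-precision peak in GFLOPS; real workloads get less."""
    return num_spes * FLOPS_PER_CYCLE_PER_SPE * clock_ghz


configs = {
    "SpursEngine prototype (4 SPEs @ 1.5 GHz)": (4, 1.5),
    "WinFast HPVC1111 as described (16 SPEs @ 1.8 GHz)": (16, 1.8),
    "PS3 Cell, 6 game-usable SPEs @ 3.2 GHz": (6, 3.2),
}

for name, (spes, ghz) in configs.items():
    print(f"{name}: {peak_sp_gflops(spes, ghz):.1f} GFLOPS")
```

Under these assumptions the three configurations come out to 48.0, 230.4 and 153.6 GFLOPS respectively.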
vBulletin® v3.8.6, Copyright ©2000-2013, Jelsoft Enterprises Ltd.