New article on Cell (good!)

Shifty Geezer said:
As for the future of super chips, it seems to me that whoever invents main RAM that can be accessed in 1 cycle direct to the CPU will pave the way forward. Memory accessing still remains the bottleneck more than anything else. Cell's performance benefit is attained by working round this, which isn't always possible.

Given a clock frequency of 3.2GHz, the cycle time is ~0.3ns. Light travels ~10 centimeters in that time; an electric impulse can travel ~2-4cm in the same time.

So yes, whoever creates this 1-cycle main memory will make shitloads of money.

Not from RAM, but from the time machines, hyperdrives and everything else that requires breaking the fundamental rules of the universe to work.
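
Back-of-the-envelope, if you want to check the numbers (a quick C sketch; the 3.2GHz clock is from the post, the ~0.3c on-chip signal speed is an assumed ballpark):

```c
/* Cycle time at 3.2GHz, and how far light / an electric signal get per cycle. */
#include <stdio.h>

int main(void)
{
    double freq_hz   = 3.2e9;                  /* clock from the post        */
    double cycle_s   = 1.0 / freq_hz;          /* ~0.3125 ns                 */
    double c         = 299792458.0;            /* speed of light, m/s        */
    double light_cm  = c * cycle_s * 100.0;    /* ~9.4 cm per cycle          */
    double signal_cm = 0.3 * light_cm;         /* assumed ~0.3c on a wire    */

    printf("cycle time: %.4f ns\n", cycle_s * 1e9);
    printf("light: %.1f cm/cycle, signal (~0.3c): %.1f cm/cycle\n",
           light_cm, signal_cm);
    return 0;
}
```

Which lands right on the ~10cm and ~2-4cm figures above.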

Cheers
 
Titanio said:
Hehe, good stuff. Kind of surprised they have four projects on the boil!

On another note about Cell and new clients, there are a couple of reports out there saying that the US Department of Homeland Security is using Mercury's Cell systems now, e.g.

http://www.technewsworld.com/story/48274.html

Although there are only a couple, so I'm not sure how confirmed that is...


The way I read the report is, "Cell is now being used by Mercury Systems... which sells computers to the Department of Homeland Security."

It's just kind of establishing what some of the customers for these systems might look like in the future. But I don't think Homeland Security is on the bandwagon just yet.

Awesome find with the Forbes article, btw, Titanio.
 
Gubbi said:
Given a clock frequency of 3.2GHz, the cycle time is ~0.3ns. Light travels ~10 centimeters in that time; an electric impulse can travel ~2-4cm in the same time.

So yes, whoever creates this 1-cycle main memory will make shitloads of money.

Not from RAM, but from the time machines, hyperdrives and everything else that requires breaking the fundamental rules of the universe to work.
:p Not necessarily though. Perhaps an optical transmission system no further than 10cm from the CPU? Or an atomic storage system using charged atoms that keeps the GB of RAM local to the CPU? Faster processors are all very well, but without faster RAM access they'll go nowhere. Try sticking a Cell on a 16MHz SIMM from 1990 and see how fast it runs a complex multi-object physics sim! Developing ever-faster multicore processors alone isn't going to deliver a faster processing future. As CPU performance goes hand in hand with RAM performance, I'm surprised the two components are treated as separate entities. I'd have thought it'd be better to develop both hand in hand as an overall system.
 
Asher said:
The Raytheon inclusion seems odd to me. I've worked with them extensively over the past few years, and all of my work with them was on PPC440s and POWER4/5s. Their code uses 64-bit floats extensively, so I don't see how Cell is of much use to them right now in its current state.

In the defense industry you don't buy hardware to make things work, you buy hardware because you have a budget that needs to be used up and an image to maintain. I've never worked with Raytheon, but I can almost guarantee you that ultimately this decision was made as a political move rather than a technical one. The people in the trenches don't mind, though, because it means they get a new toy to play with.

Who knows, maybe they'll even get something working on it.

Nite_Hawk
 
DemoCoder said:
I like CELL, but CELL is not the future IMHO; it is half of the future. I think the future is the combination of throughput-style designs like the Niagara chip and the "SPE farm" approach of CELL; that is, lots of additional TLP combined with a large pool of functional units. If you're going to go the route of dropping OoOE and ILP scalability, you need TLP to make up for stalls.

That will hold us over until we get RSFQ and nanorod-based designs. :)

I think that OoOE and such optimizations for higher single-thread performance will be back in. They were left out of the Cell and X360 CPUs so these chips could be done today.

Niagara is different stuff; it's meant to be great for app servers with hundreds of users, or other situations with heavy thread counts.

Soon we'll see quad-core Opterons: the best single-thread performance possible, with quite a lot of cores already. Imagine a Cell-like chip with two to four Athlon 64 cores and 16 DSP thingies.
 
Shifty Geezer said:
:p Not necessarily though. Perhaps an optical transmission system no further than 10cm from the CPU? Or an atomic storage system using charged atoms that keeps the GB of RAM local to the CPU? Faster processors are all very well, but without faster RAM access they'll go nowhere. Try sticking a Cell on a 16MHz SIMM from 1990 and see how fast it runs a complex multi-object physics sim! Developing ever-faster multicore processors alone isn't going to deliver a faster processing future. As CPU performance goes hand in hand with RAM performance, I'm surprised the two components are treated as separate entities. I'd have thought it'd be better to develop both hand in hand as an overall system.

Yes, people have known this since the 'dawn of time'. Hence we have a lovely thing called a memory hierarchy: small caches, then bigger caches, then main memory, then backing stores. You cannot eliminate this problem without sticking it all on the core, and that just costs too much to be practical.
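
A minimal C sketch of that hierarchy in action (the 16KB/256KB/16MB working-set sizes are illustrative assumptions, not tuned to any particular chip): a dependent pointer chase whose cost per load jumps each time the working set falls out of a cache level.

```c
/* Pointer-chasing microbenchmark: dependent loads over working sets
 * sized to sit in L1, in L2, and in main memory respectively. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    size_t sizes[] = { 16 * 1024, 256 * 1024, 16 * 1024 * 1024 };
    const long iters = 10000000L;

    for (int s = 0; s < 3; s++) {
        size_t n = sizes[s] / sizeof(size_t);
        size_t *chain = malloc(n * sizeof(size_t));

        /* Sattolo's algorithm builds a random single-cycle permutation,
         * so every load depends on the previous one and the hardware
         * prefetcher can't hide the latency. */
        for (size_t i = 0; i < n; i++) chain[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = rand() % i;
            size_t tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
        }

        size_t idx = 0;
        clock_t t0 = clock();
        for (long k = 0; k < iters; k++) idx = chain[idx];
        double ns = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / iters;

        /* idx is printed so the compiler can't discard the chase loop. */
        printf("%8zu KB: ~%.1f ns per load (idx=%zu)\n",
               sizes[s] / 1024, ns, idx);
        free(chain);
    }
    return 0;
}
```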

Faster processors can and will be made, but they will sidestep the issue, most likely by moving away from the Von Neumann architecture we have seen for the past 60 years (it was foreseen back then that we would have this problem). Cell is a step away from this with its concept of local stores, but without a proper language to utilise them effectively and transparently, people will simply stick to the idea of a hierarchical memory structure.
We are now running out of hardware-based solutions to the common problems.
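
To make the local-store point concrete, this is roughly the SPU-side pattern, written against the Cell SDK's spu_mfcio.h (a minimal sketch: the buffer name, chunk size, and the doubling "computation" are made up for illustration):

```c
/* An SPU cannot load from main memory directly: you explicitly DMA data
 * into the 256KB local store, compute there, and DMA the results back. */
#include <spu_mfcio.h>

#define N 4096
static volatile float buf[N] __attribute__((aligned(128)));

void process_chunk(unsigned long long ea)       /* ea = main-memory address */
{
    const unsigned int tag = 0;

    mfc_get(buf, ea, sizeof(buf), tag, 0, 0);   /* pull chunk into local store */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();                  /* block until the DMA is done */

    for (int i = 0; i < N; i++)                 /* compute out of local store  */
        buf[i] *= 2.0f;

    mfc_put(buf, ea, sizeof(buf), tag, 0, 0);   /* push the results back out   */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}
```

In practice you'd double-buffer (DMA the next chunk while computing on the current one), which is exactly the bookkeeping a proper language would hide.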

It is no longer possible (or is fast becoming so) to simply crank up chip speed, due to all the attendant issues, so we are having to resort to parallelism. ILP has allowed us to get away with the quirks for a few years, but it is running out of steam; we are entering an era where thread-level parallelism is required (Cell strongly highlights this with the SPU concept) and multi-core processors are the way forward (gradually moving the concurrency into the hardware itself). What we still lack, though, are the tools to effectively use these features; very few compilers can do SIMD optimisations, extract parallelism, or exploit the memory hierarchy in any intelligent way.
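
To illustrate the compiler gap: even a trivially SIMD-able loop like the saxpy sketch below only gets auto-vectorized when the compiler can prove the arrays don't overlap (hence the C99 restrict qualifiers), e.g. by GCC at -O3 or with -ftree-vectorize; add pointer aliasing or irregular control flow and the auto-vectorizers of this era give up.

```c
/* A trivially vectorizable loop: each iteration is independent, so it maps
 * straight onto 4-wide float SIMD (SSE, AltiVec, or an SPU's registers). */
void saxpy(float a, const float *restrict x, float *restrict y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* restrict => no aliasing => safe to SIMD */
}
```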

A lot of functional languages such as SML allow this to be done easily but the compilers used quickly turn it straight into the sequential-code-we've-come-to-expect. Research in this area is beginning (see some of the MS research stuff about transactional memories and C#, and C#Omega) and hopefully it will be fruitful as eventually (not in the near-future perhaps but the distant-future) we will hit hardware limits with the current ideologies.
 
Kryton said:
Yes, people have known this since the 'dawn of time'. Hence we have a lovely thing called a memory hierarchy: small caches, then bigger caches, then main memory, then backing stores. You cannot eliminate this problem without sticking it all on the core, and that just costs too much to be practical.

Faster processors can and will be made, but they will sidestep the issue, most likely by moving away from the Von Neumann architecture we have seen for the past 60 years (it was foreseen back then that we would have this problem). Cell is a step away from this with its concept of local stores, but without a proper language to utilise them effectively and transparently, people will simply stick to the idea of a hierarchical memory structure.

What defines a Von Neumann architecture is that program code is stored in main memory and hence itself is data. This hasn't changed with CELL.

Kryton said:
It is no longer possible (or is fast becoming so) to simply crank up chip speed, due to all the attendant issues, so we are having to resort to parallelism. ILP has allowed us to get away with the quirks for a few years, but it is running out of steam; we are entering an era where thread-level parallelism is required (Cell strongly highlights this with the SPU concept) and multi-core processors are the way forward (gradually moving the concurrency into the hardware itself). What we still lack, though, are the tools to effectively use these features; very few compilers can do SIMD optimisations, extract parallelism, or exploit the memory hierarchy in any intelligent way.

CELL is particularly bad at exploiting thread-level parallelism past the number of SPUs, because the SPUs are basically impossible to virtualize and hence hard to context-switch when stalled on memory. Sun's Niagara, on the other hand, is a good example of a CPU tweaked for thread parallelism (at the cost of single-thread performance).

Cheers
Gubbi
 
No matter what the future in CPU design is, you can bet Sony and IBM will be there first. Sony and IBM, with the EE and now Cell, are light years in front of everyone else.
 
!eVo!-X Ant UK said:
No matter what the future in CPU design is, you can bet Sony and IBM will be there first. Sony and IBM, with the EE and now Cell, are light years in front of everyone else.

More like light seconds. Which is still quite a long distance.

(I only wrote that 'cause it sounded cool, not because I think they're actually ahead of anyone)
 
Wouldn't STI try to stay ahead, through software maturity and community support, by the time other CPUs come out? Presumably there will be more variants of Cell by that time (multiple PPEs, double precision, ...).

This is especially true if they are aggressive in embedding Cell into appliances and PCs, implementing all sorts of standard libraries like OpenGL ES on top of Cell, plus open-sourcing Cell SDKs and products such as the LocationFree SDK. Should the living-room scene or other niche areas explode into high-growth markets, it may be too late for the other players to catch up in rich-media hardware processing.
 
Asher said:
The Raytheon inclusion seems odd to me. I've worked with them extensively over the past few years, and all of my work with them were on PPC440s and Power4/5s. Their code uses 64-bit floats extensively, I don't see how Cell is of much use to them right now in its current state.

I would imagine using them in whatever next-generation satellite/"communications system" the NSA can dream up: some sort of Cell-based Echelon system where the NSA would have the ability to "monitor" communications from other Cell-based systems. Whenever I think of Raytheon I think NSA, and vice versa. "No domestic charter", lol, where have I heard those words before... I digress, sorry.

I like the Raytheon boys (and ladies); pretty smart people, not necessarily NSA-smart, but still good at their jobs.

I would guarantee that there is a defense contractor's lobbyist pitching this as the next "must-have" thing. I'm not sure who will pitch it to the military, but I would imagine Special Operations/Special Forces would get first crack at the super cool toys, then someone will likely move on to the individual branches and start collecting the billion-dollar checks.
 
Blazkowicz_ said:
I think that OoOE and such optimizations for higher single-thread performance will be back in. They were left out of the Cell and X360 CPUs so these chips could be done today.

Niagara is different stuff; it's meant to be great for app servers with hundreds of users, or other situations with heavy thread counts.

Soon we'll see quad-core Opterons: the best single-thread performance possible, with quite a lot of cores already. Imagine a Cell-like chip with two to four Athlon 64 cores and 16 DSP thingies.

If you read here:

Our sources have indicated that the POWER6 will be a deeply pipelined 4-issue CPU, with OOO capabilities that are more along the lines of the 604e rather than the POWER5 or Pentium Pro. Most likely, the POWER6 will be a dual core device, although there is a very slight chance it may be a 4-way CMP.

Could be a design shift away from complex OoOE in the near future, at least from IBM.
 
I wish they'd just make up their minds! Either go with OoO or in-order, but stick with it and allow the tools and developers to learn to work with it! If you keep changing the way things work, you introduce a relearning stage with each CPU.
 
Yep, there is a need for a processor with the best of everything: OoOE, branch prediction, good for general-purpose computing, good for media processing, good for 3D rendering.

Why does Intel's Platform 2015 sound like the best thing in development? Anything better?
 
Edit:
Can somebody explain to me whether it could make sense to add a huge external L3 cache made of very fast eDRAM, and sell processors as more than one chip (like the Xbox GPU + eDRAM), i.e. CPU + L3 memory chips? I don't know how much eDRAM (in MB), for example, could fit in the die size of a P4, or whether it would be useful.
Edit:
Stupid me: the EIB in Cell moves somewhere around 200GB/s on-chip with, I think, good latency, while Xenos and its eDRAM connect via a 32GB/s bus with certainly much worse latency (vs. on-chip latency).
So I found the answer myself: even if memory gets better, the buses and connections between chips are the limiting factor, given that we're nowhere near putting a whole PC (not your grandma's one, lol) on one chip.
Time to go to bed :(
Edit 2:
To do better: have you all read the AnandTech article about Sun's new UltraSPARC T1? It's an interesting example of a multicore implementation aimed at a very specific goal (data servers).
 
Intel's Platform 2015 sounds very much like CELL. So Sony/Toshiba/IBM have a 10-year advantage over Intel. Intel's Platform 2015 is an endorsement of CELL.
 
Edge said:
Intel's Platform 2015 sounds very much like CELL. So Sony/Toshiba/IBM have a 10-year advantage over Intel. Intel's Platform 2015 is an endorsement of CELL.

I would not say STI has a 10-year advantage over Intel. Platform 2015, at its fullest, is way, way, way more complex and advanced than Cell.

If a full-strength Platform 2015 arrived all at once in 2015 as an actual CPU, it would almost be time for the successor to the Cell architecture to arrive, if we believe IBM when it says the Cell architecture should last 10 years. By 2015, the PS4 would be halfway through its life cycle using a next-gen Cell CPU, and the successor to the Cell architecture would be known of, even if not out.
 