AzBat said:
I understand that Sony Cell and IBM Cell are not the same, but you still haven't given me any comparisons of how they differ. Just remember I didn't understand any of the technical mumbo-jumbo you just posted. I just want to be able to figure out what FLOPS, cores and APUs the Sony version will have and how it relates to these slides. Is it as DM says, that Sony won't release a 2-TeraFLOP-capable part for PS3, or is Sony's PS3 more capable than even IBM's Blue Gene? I'm totally confused.
Hey Tom,
The concept of a Cellular Architecture, as I've come to know it, is pretty simple to understand. It's the embodiment of three major architectural ideologies aimed at achieving high performance in the areas where typical ICs get trapped by the Von Neumann bottleneck*. These three are:
The "pure" concept is this: you say F-you to the highly superscalar design with hundreds of instructions, long pipelines, and a large percentage of die area spent on prediction logic and such. Instead, you choose a paradigm that takes a simple pipeline with a reductionist instruction set and significant embedded RAM, and you create a fundamental construct, a Cell. This is the Simple part.
Next, you hit the copy and paste button a given number of times that should be bounded by 10^N. This is the vastly Parallel part.
Finally, you interconnect said Cells based on locality, thus restricting each one to direct communication with a given number of neighboring Cells within three spatial dimensions. By doing so, you eliminate any central control point from the architecture, and it's basically a vast finite state machine. This is the Local part.
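To make the three steps a bit more concrete, here's a toy Python sketch of the "pure" ideal. It's purely my own illustration, not anyone's actual design: every Cell is an identical construct holding one integer (Simple), that construct is copied across an n x n x n grid (Parallel), and each Cell only ever reads its face-adjacent neighbors (Local). The update rule (max of self and neighbors) and the helper names `neighbors` and `step` are hypothetical, picked so you can watch information spread one hop per step.

```python
from itertools import product

# Toy model of the "pure" cellular ideal (illustrative assumption only):
#   Simple   - every Cell is the same tiny construct: one integer of state
#   Parallel - the construct is replicated across an n x n x n grid
#   Local    - each Cell reads only its face-adjacent neighbors (up to 6 in 3D)

OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
           (0, -1, 0), (0, 0, 1), (0, 0, -1))

def neighbors(pos, n):
    """Return the in-bounds face-adjacent coordinates of pos in an n^3 grid."""
    x, y, z = pos
    result = []
    for dx, dy, dz in OFFSETS:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < n and 0 <= ny < n and 0 <= nz < n:
            result.append((nx, ny, nz))
    return result

def step(grid, n):
    """One synchronous update in which every Cell uses only local information.

    The rule (new state = max of self and neighbors) is an arbitrary choice
    that makes a single event spread exactly one hop per step.
    """
    return {pos: max([grid[pos]] + [grid[q] for q in neighbors(pos, n)])
            for pos in grid}

n = 4
grid = {pos: 0 for pos in product(range(n), repeat=3)}
grid[(0, 0, 0)] = 1          # a single "event" in one corner Cell
for _ in range(3):
    grid = step(grid, n)
print(sum(grid.values()))    # 20 Cells reached after 3 hops
```

Note the price of the Local part showing up: one event in a corner needs several steps to reach distant Cells, because no Cell can talk past its neighbors.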
The ideology behind this "pure" Cellular ideal is pretty sound for general supercomputing needs. I like it because it's a pseudo-analogue to a closed neural system, which, well, never mind. It's also been shown in studies, as well as in comments by people here, that x86 is a tremendous waste of logic and area when all is said and done. Cellular computing got around this through the Simple part, where it traded roughly 20% of absolute performance for roughly 50% of the die space back.
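Here's the back-of-the-envelope version of that trade-off. It assumes "~20% of performance for ~50% of die space back" means a Simple Cell gives 0.8x the performance of a big core in 0.5x the area, and that the workload scales perfectly across copies, which is a generous assumption. The function names are hypothetical.

```python
# Assumption: baseline complex core = 1.0 performance in 1.0 area.
# Simple Cell = 0.8 performance in 0.5 area (one reading of the ~20%/~50% figures).

def perf_per_area(rel_perf, rel_area):
    """Performance per unit of die area, relative to the baseline core."""
    return rel_perf / rel_area

def fill_budget(rel_perf, rel_area, budget=1.0):
    """Number of copies that fit in a fixed area budget, and their total performance.

    Assumes perfectly parallel work, which real code rarely achieves.
    """
    copies = int(budget // rel_area)
    return copies, copies * rel_perf

print(perf_per_area(0.8, 0.5))   # 1.6
print(fill_budget(0.8, 0.5))     # (2, 1.6)
```

So the same area yields roughly 1.6x the throughput, which is the whole argument for going Simple, provided the software can actually keep all those copies busy.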
---
Now, STI's Cell vis-à-vis the Suzuoki patent is a different beast altogether. It's definitely been designed for a post-modern broadband environment, where they've done away with the Locality step altogether, if Suzuoki's patent is accurate (which I tend to believe, as he patented the EE in 1997). That's the first thing that's apparent, and it's what will open the door to computation and data sharing with other STI Cell devices.
Then, when you look at the actual microarchitecture, they've [STI] basically done away with the whole Simple step when it comes to a global view. They've obviously designed it with 3D, AI, or World Sim type applications in mind, as they've loaded the bulk of the processor's area up with dedicated logic blocks [e.g. APUs], which have most likely been area-optimized, perhaps utilizing this.
Thus, we can expect the "pure" Cell ideology of area efficiency to carry over to the individual constructs, as seen in the APUs, or the PE cores, which I'd hazard a guess are stripped-down PPC or MIPS cores. This relationship would seem to be born out of necessity: as you can see from the Blue Gene chart DM is championing, you won't get a plurality of general-purpose MIPS or PPC pipelines to reach 1 TFLOP; you need to revert to dense/coarse-grained constructs of dedicated logic, ergo the APU. So there is a balance between striving for a single, "pure" cellular construct and getting the computational power you desire out of a given area, A (perhaps 250 < A < 320), that is still manufacturable.
Also, the eDRAM would seem to remain, as it's an integral part of the entire ideology. But I don't need to get into this with people like Panajev, so that's enough for now.
And then, finally, there is the Parallel step, where STI Cell isn't really 1-to-1 comparable. Consider this: while each IC has a coarser granularity and thus won't be bound by the 10^N limit per IC/per system, it has also broken free of the locality requirement. Thus, it has a theoretical upper bound equal to the aggregate of every STI Cell device in existence. In practice, this will be much more limited, but for IBM servers, Computing-as-Utility, or within a Sony/Toshiba living room, this could potentially kick in nicely.
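Here's a rough sketch of the ceiling versus the practical case. The device names, peak numbers, reachability flags, and the 50% efficiency factor are all made-up placeholders for illustration, not STI specs; the `Device` class and both functions are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model: with no locality requirement, any reachable STI Cell
# device could contribute work, so the theoretical ceiling is the sum of
# every device's peak. All names and figures here are placeholders.

@dataclass
class Device:
    name: str
    peak_gflops: float  # placeholder peak, not a real spec
    reachable: bool     # can it take shared work right now?

def theoretical_ceiling(devices):
    """Upper bound: every device, running at full peak."""
    return sum(d.peak_gflops for d in devices)

def practical_estimate(devices, efficiency):
    """Only reachable devices count, each delivering a fraction of its peak."""
    return sum(d.peak_gflops * efficiency for d in devices if d.reachable)

living_room = [
    Device("console", 1000.0, True),
    Device("tv", 250.0, True),
    Device("server", 1000.0, False),  # e.g. switched off or busy elsewhere
]

print(theoretical_ceiling(living_room))      # 2250.0
print(practical_estimate(living_room, 0.5))  # 625.0
```

The gap between the two numbers (network latency, scheduling, whose box is actually available) is what decides whether this "kicks in nicely" or not.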
Thus, in conclusion, I think it's become apparent that while STI Cell shares many fundamentals of the Cellular Architecture ideal, it's a set-piece, custom IC that deviates from it at several major junctions. In fact, personally, I would question whether it's a Cellular Architecture at all when viewed as a singular IC; perhaps Mfa's aptly termed "Memory Anemic Supercomputing array" (or something to that effect) is more correct. But what would I know... I have little formal training in architectures, and WTF do we in the Neurosciences know?!?
I'm probably forgetting stuff; I always do. Yet I need to get going, so if you see a problem, PM me or correct it outright. Unless your name is Chap or DMGA, in which case I really don't want to hear it.
* This being the [increasing] gap between RAM access times, which are high multiples of the clock cycle, and the processor clock itself, which leads to stalls and such. This necessitates logic to predict and mask this cost, thereby leading you further down the rabbit hole of hemorrhaging logic on non-computational areas with decreasing benefit.
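To put numbers on the footnote, here's the textbook stall model: effective CPI = base CPI + (memory references per instruction) times (miss rate) times (miss penalty in cycles). The inputs are illustrative guesses, not measurements of any real chip, and the 20-cycle on-die eDRAM penalty is my own assumption. The point is only how the penalty term swamps the base CPI as RAM latency grows in cycles, and how keeping data close shrinks it.

```python
# Textbook stall model (illustrative inputs only, not any real chip):
#   effective CPI = base CPI + refs_per_instr * miss_rate * miss_penalty_cycles

def effective_cpi(base_cpi, refs_per_instr, miss_rate, miss_penalty_cycles):
    """Average cycles per instruction once memory stalls are included."""
    return base_cpi + refs_per_instr * miss_rate * miss_penalty_cycles

def stall_fraction(base_cpi, refs_per_instr, miss_rate, miss_penalty_cycles):
    """Share of all cycles spent waiting on memory instead of computing."""
    cpi = effective_cpi(base_cpi, refs_per_instr, miss_rate, miss_penalty_cycles)
    return (cpi - base_cpi) / cpi

far_dram = effective_cpi(1.0, 0.3, 0.05, 200)   # off-chip RAM, long penalty
near_edram = effective_cpi(1.0, 0.3, 0.05, 20)  # assumed on-die eDRAM penalty

print(f"{far_dram:.2f} {near_edram:.2f}")                 # 4.00 1.30
print(f"{stall_fraction(1.0, 0.3, 0.05, 200):.2f}")       # 0.75
```

With the far-away RAM, three quarters of the cycles are spent waiting, which is exactly the cost that all that prediction logic exists to hide.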