Real Cell comments Inbox
Brad Sharp
<@yahoo.com> to staff
"Finally, IBM won't release performance benchmarks, but they do claim a 10X speedup over a PC in the same power envelope. Take this claim with a large grain of salt, however, because there's no context to it (i.e. on what type of application, vs. what kind of PC, etc. etc.)."
BWAHAHAAA!!!!!
"The way I figure it is that the PPC core, the PPE, is going to be a stripped down version of the POWER series of proccessors"
BWAHAHAAA. From what I've learned, again none of it from your forum of technical MORONS, the Cell is a "gatekeeper" PowerPC CPU coupled with 8 vector-processing-like units. Well, what this comment points out is that that gatekeeper CPU is going to be more like a shitty G4. Not even as good as a regular G4!! It won't even be G5 level, which itself gets smoked by Intel CPUs!!!! Granted, this is only the "main" CPU.
"So its just an in order chip with a bunch of parallel vector units?
Sounds like an amazing chip for DSP work, and scientific calculations, but kind of a let down for the desktop."
"Unless Hannibal's information was wrong, the controller core definitelly isn't anything special. P4 level clockspeed doesn't make up for the fact that it's a 2-way in order design. Keep in mind that this processor has to do the general purpose computing (where extra execution units, out of order execution, and a good branch predictor are important). Depending on how fast their logic is at 90nm and how simple the design is, the pipeline could be quite long also.
So far I think jason_watkins has done the best analysis. I don't really have anything else to add. In the end, I expect the Cell to be an excellent media processor. But it's not going to put Intel and AMD out of business anytime soon."
"Where exactly does all the supposed speed of the Cell come from? I assume the touted speed of 256Gflops is with all the SPEs doing 2 instructions continuously... how real world is this?"
"Also, it was cited somewhere (in the BF, I think...) that one of the G5's weaknesses is not having the best integer execution speed (while having great FP speed) and this was the reason it didn't run general use code too well, because most programs rely on integer math more than FP. But, it seems the Cell's SPEs are geared more for FP. Does this mean it will suffer when not doing FP friendly tasks?
Too early for me to tell. By their very nature, though, SIMD units solve a certain class of problems well. They are not general purpose devices. Now, it does seem as though these new units have more capacity to sustain themselves than the run-of-the-mill SIMD facility. The allusion to them being microcomputers should stand out in everyone's mind. How those microcomputers support general purpose code is another issue altogether."
"My main concern is that IBM might be pulling a PS/2. The PS/2 has a powerful distributed architecture capable of impressive performance, but alas it is also baroque and Sony did not supply an array of finished tools to harness all that power. The result is that a lot of games just poked along graphics-wise, coming nowhere near the potential of the hardware."
"Otherwise it seems to me (and like others pointing out) that moving the burden to programmers for this low level stuff like was done in the playstation 2, makes porting code and writing games problematic, and that's something you don't need. x86 tends to get used in embedded platforms because it's easy to program for, from what I understand."
"Am I the only one confused about how they're going to explain to "high end graphics workstation" buyers that a $4000 box has the same chip in it that their kid's $300 PS3 has?
huh?
Really cool stuff, but I'm afraid it looks so weird and specialized that it may be impossible to write decent code for. I remember when the "Emotion Engine" in the PS2 was so advanced that they needed permission from the EU to sell it, but GTA still looks like crap."
"Probably along the lines that Apple does business now. With Apple you get OS X on a 3000+ dollar box, were as you don't currently with the 400+ dollar Walmart PC that is nearly as powerfull.
"
"So IBM/Sony have taken the programmable vertex/pixel shaders out of your modern GPU and hung them, with a little bit of SRAM, off a PPC core... or am I missing something? I'm not sure this will live up to the hype, but I am looking forward to the next installment tomorrow..."
"To me, the big questions about how this would adapt to being in an Apple workstation depends on two things: does the PowerPC portion of the core provide decent performance for general use, and will IBM put the compiler tech that allows Cells to be used efficiently into GCC. If the answer to either of those is no, then this is going to be relegated to being a customized co-processor at best. My guess is that, no matter what's being said at this stage of the game, we won't know the answers on these for a year or so."
"Hi.
Why is no one talking about memory access? Doesn't it make the problem worse to have the PPC core trying to feed 8 SPEs and itself? The only scenario where the SPEs are good is computationally intensive work that is not data intensive. Even the simplest GPUs today have access to at least 32 MB..."
"SPEs are more advanced than DSPs and other co-processor designs. They are full processors, albeit much simpler that the curretn GP processors. They are intended to be autonomous, execute their own code under their own supervision, no hand holding by the PowerPC master. Instead, there is occasional interaction where the PowerPG hands code chunks (APUlets) to SPEs and received results back from them.
In terms of application code, the graphcis processing is cited as the main user of this model; after all it is designed for PS3. However, keep in mind that SPEs are still general purpose, only optimized for number crunching.
The reason why they will benefit all kinds of computers is because all sorts of machines are doing more and more number crunching. For instance, desktop OSes are using more advanced graphics for their UI (Windows Avalon, Macintosh OSX,...) even for the standard 2D interfaces. Just maintaing the UI can consume great amount of CPU's ti! me. Web servers do a lot of text processing, which in effect is integer arithmetic. And so on.
Moreover, typical apps on desktops are changing. There is more and more audio/visual stuff, like music players, instant messaging, tomorrow even video phones,... running at the same time. If we offload all the number crunching onto SPEs then you free up the main processor (PowerPC in this case) and all of a sudden you don't need a 10 Ghz CPU!!! In fact, a simple 2 Ghz PowerPC chip with the help of SPEs could blow away the latest and fastest Intel/AMD offerings. That is the idea.
The problem is like I said, that current application will not automatically take advantage of SPEs. Remember that you C?/Java program starts with main() - 1 thread. They spawn additional threads, of course. The problem is that all threads use the single - main memory!!!! Instead of threads we'll have to program "APUlets". They are similar in many ways except:
1) APUlets has its own memory that is ! separate from main
2) APUlets have to the concept of a "master" with whom they interact (mainly send results back to them)
As an example - of course, pure guessing on my part:
public class CRCApulet extends com.ibm.Apulet
{
    public void run()
    {
        // Pull the input out of the APUlet's private local memory (the "context").
        byte[] inputData = (byte[]) getContext().getInput("data");

        // Perform the CRC computation locally on the SPE.
        java.util.zip.CRC32 crc = new java.util.zip.CRC32();
        crc.update(inputData);
        long crcResult = crc.getValue();

        // Hand the 8-byte result back to the PowerPC master.
        getMaster().sendResult(java.nio.ByteBuffer.allocate(8).putLong(crcResult).array());
    }
}
So you have an app running on the PowerPC master, and it creates this little APUlet and gives it data (say, file contents) in the form of a byte array. The APUlet executes and sends back the result. The "context" is a facade for the local memory and the "master" for the PowerPC controller.
So the app could offload the CRC calculation onto the SPEs and thus complete its processing faster!
This doesn't mean that all existing software has to be redone! Only the computationally intensive processing will need to be moved to APUlets. The remaining code is fine."
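To make the division of labor concrete, here is what the master side of that same guessed-at API might look like. Every method name below (setInput, start, waitForResult) is as hypothetical as the APUlet above:
public class CRCDemo
{
    public static void main(String[] args) throws Exception
    {
        // The app running on the PowerPC master reads some bulk data...
        byte[] fileContents = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("big.dat"));

        // ...creates the APUlet and copies the input into its local memory...
        CRCApulet apulet = new CRCApulet();
        apulet.getContext().setInput("data", fileContents);

        // ...then dispatches it to a free SPE and picks up the result later,
        // leaving the PowerPC free for other work in the meantime.
        apulet.start();
        byte[] crcBytes = apulet.waitForResult();
    }
}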
"'m also curious what the Nvidia GPU will be responsible for, like some others~
Why the confusion? The Nvidia GPU will render the graphics.
Of course you're really asking "Why use a GPU when you have all this new processing power available?"
Probably multiple reasons:
Most important reason is probably risk management: By using the Nvidia GPU, Sony does not have to port a rendering pipeline to an entirely new CPU architecture. It can't be trivial to rewrite OpenGL, or whatever Sony uses, to run on Cell. Sony gets an existing rendering implementation that they know works.
Second most important reason: developer familiarity with using Nvidia GPUs. Why give developers one more hurdle in figuring out the platform?
This may also be a factor: Sony didn't spend all that time/effort/money on Cell just to get rid of the GPU. They want all that power to be available to developers to do other cool stuff with."
"That sure makes all of the shouting of "4GHz!" a lot more sensible. It tells me that the SPEs had better be put to good use though. Doesn't seem like a 2 way inorder design will be breaking into the supercomputing list anytime soon. Call me weird, but I'm almost more interested in this than I am in the SPEs right now. Until we see the SDKs IBM/Sony/whoever can provide to use them, it's certainly going to be difficult to judge performance."
"That's what i was trying to point. Great raw processing power, but how do you feed it with data? I think this approach will make bigger the gap between processor and memory."
"
Let me try. This is largely supposition and inference; I have not read Hannibal's sources. But I think I'm right.
Do not think about the SPE as a coprocessor. Those execute individual instructions. Do not think of the SPE as a processor in parallel with the PPE. The PPE is in front of the SPE, not in parallel with it. Do not think of the SPE as a processor slave of the PPE. While this is not necessarily inaccurate, it is not the best way to understand the SPE.
Think of the SPE as a whole computer in its own right. The SPE is the center of its own universe. It executes programs, and is particularly good at executing certain classes of number-crunching programs. It has its own private memory (the 256KB LS memory). It has an I/O controller (the DMAC).
I think it is important to realize that the DMAC is not to be thought of as a memory controller. It is an I/O controller.
In the early 80's we had I/O controllers to access hard drives and serial ports. These were high-latency devices that we needed to access in a non-blocking fashion, so that we could do other things with the processor while the I/O request completed. Memory, in contrast, was treated as fast. We waited for memory, because it was fast, and the waits were not long in terms of lost cycles.
Today, memory latency is huge, just like I/O latency was huge in the 80's. Memory access is no longer cheap. It is very expensive. We have maintained the illusion of fast memory using caches and other tricks, but the fact is, memory is slow.
The genius of the SPE, and the Cell in general, is that memory is fast again. Each SPE has a small, fast memory. It also has an I/O controller that treats main system memory like we used to treat hard disks.
The ISA will probably muddy much of this picture by treating memory in a conventional manner, wrt addressing. But it is my suspicion that apulets written with the above attitude in mind will make the best use of the system.
There is a reason they chose to present the SPE before the PPE. It is the core of the system. The PPE is a glorified bureaucrat. It will handle the high-level logic for the PS3, but the SPEs will handle the computation. I suspect this will include unit and enemy AI, graphics, sound, media encode/decode, vector math, and everything else that the next generation of Playstation game needs to think about. The PPE, in contrast, will spend its time dealing with network and mass storage I/O, process and apulet scheduling, user input, and other administrative tasks."
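The natural consequence of that "memory as I/O" framing is the old trick of overlapping I/O with computation: start a non-blocking fetch of the next chunk into the local store, crunch the current chunk while the transfer runs, and only block when the data is actually needed. A sketch in the same hypothetical APUlet API as the CRC example earlier (Transfer, fetchAsync, await, CHUNK_SIZE, totalSize, and process are all invented names):
public void run()
{
    // Kick off a non-blocking transfer of the first chunk from main
    // memory into the SPE's local store.
    Transfer pending = getContext().fetchAsync("data", 0, CHUNK_SIZE);
    for (int offset = CHUNK_SIZE; offset < totalSize; offset += CHUNK_SIZE)
    {
        byte[] current = pending.await();                               // block only if the transfer isn't done yet
        pending = getContext().fetchAsync("data", offset, CHUNK_SIZE);  // start fetching the next chunk
        process(current);                                               // compute while that transfer runs
    }
    process(pending.await());  // last chunk
}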
"nteresting post, thanks. Can you expand on these bits any more? Specifics about how the SPEs are general purpose? I was under the impression they were like beefed up vector units from the PS2 CPU."
"They're general purpose in the sense that they can execute their own thread, handle branch instructions, etc. But they're obviously not tuned to be fast for anything other than keeping the vector ALU's busy in FPU heavy code (ie games). It's probibly more fair to compare them to the shader cores in GPU's rather than call them general purpose cpu's in the sense of an SMP system"
"There's a lot of bullcrap being posted in this thread. This chip isn't meant for your word processor. It isn't meant for your laptop running OSX and some magic emulation software so that suddenly your iLife applications run at warp speed. It's targeted at exactly one application domain: media processing. And for that domain, it looks to be quite well designed IMHO.
I'd also mention that there have been attached processor cards for x86 for quite a while. They rarely get used outside of very niche embedded applications. The most general application I'm aware of was a "MIPS on a Card" rig that came out in the Pentium Classic era. You could run Lightwave 3D and a few specially written Photoshop filters on it. It, of course, flopped. I don't think Intel has anything to fear from Cell in the short term. And in the long term, nothing is stopping them from coming out with an x86 system chipset or CPU that has a bundle of attached vector processors with hardware memory sync management as well."
"I think it's fairly obvious that the SPE will not be handling graphics. The SPE units combined can only push about 256 GFLOPS max. Even is this theoretical amount could be attained, that only makes it about 50 GFLOPS more than a current top-of-the-line NVIDIA card. Since it's not set to debut for another 2 years inside the PS3, it would have a less-than-stellar debut, being equivalent to what would then be a $150 video card."
"I don't think any of us really expected a VMX/Altivec unit in the controlling processor. It still remains to be seen just how verstile the controlling core is. Before people go ga ga over it getting into a Mac, think of 2 things: first, applications would have to be rewritten to use the SPE's. Second, the space occupied by the SPE's could instead be used for more cache and a 2nd general purpose core, which should yield better performance on desktop applications anyhow.
Edit:
Hannibal just posted in the 2nd article that the controlling core is an in-order design. Unless the G4 is a much crappier processor than I imagine, that means the Cell's controlling core is nothing really worth getting excited about from a Mac point of view."
"http://pc.watch.impress.co.jp/docs/2005/0208/kaigaip046.jpg
tells us of a new *dual* controller with a rate of 25.6 GBps @ 3.2 Gbps (?!).. they probably wrote this wrong.. (!).. it's probably ~3.2 GBytes/sec -> ~= 25 Gbits/sec.. hmmm..
and since it's dual it is 6.4 GBytes/sec..
hmm.. isn't this EXACTLY the I/O controller used on the G5?.. funny thing...
This message started as a quest to find out how the great thirst of the Cell would be satisfied.
I now understand that it won't be satisfied (!). So where is the gain with this new architecture? I assume that since the AltiVec on the G5 is almost useless, 8 simpler SIMD cores might have a chance to come forward again with this new implementation."
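For the record, the two figures on that slide are probably consistent rather than a typo, if you read the 3.2 Gbps as a per-pin data rate and the 25.6 GB/s as the aggregate across a 64-bit dual-channel XDR interface (my reading, not something the slide states):
3.2 Gbit/s per pin × 64 pins = 204.8 Gbit/s = 25.6 GBytes/sec
In other words, the dual controller would total 25.6 GBytes/sec, not 6.4.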
"Strengths of the Cell includes:
1. Multiple SIMD/CPU/Core units - resulting in massive parallelism on a chip.
2. Extremely high bandwidth - 100 Gigabytes a second - compared to current desktops like the PowerMac G5 at 16 gigabytes a second.
Weaknesses:
1. Integer processing speed will only be similar to Intel CPUs unless you can do parallel SIMD work - e.g. with multimedia.
2. Only one full core - the PowerPC core.
3. The need to explicitly optimize code for the Cell, rather than just having the compiler do it - this is the problem with the Itanic processor.
It seems today that the most powerful CPUs are the GPUs by ATI and nVidia. These work similarly by having multiple parallel units. Apple with Mac OS X seems to be on the right track for doing multimedia by taking advantage of GPUs for offloading work from the CPU. I wonder if the PowerMac PowerPC CPU plus ATI/nVidia GPU may actually accomplish - for a personal computer - what the Cell is trying to do - with the advantage of having multiple PowerPC cores rather than one as the Cell does, in the future dual-core models.
The fact that the PowerPC core of the Cell can run up to 4 GHz - in the lab - with a much shorter pipeline than the G5 - makes me question why IBM can't rev up the G5 past 3 GHz. To me, IBM's inability to do so bodes poorly for being able to produce actual 4 GHz Cells - just as Intel reached its limits in processor speed."