Real Cell comments Inbox
Brad Sharp
<@yahoo.com> to staff
"Finally, IBM won't release performance benchmarks, but they do claim a 10X speedup over a PC in the same power envelope. Take this claim with a large grain of salt, however, because there's no context to it (i.e. on what type of application, vs. what kind of PC, etc. etc.)."
BWAHAHAAA!!!!!
"The way I figure it is that the PPC core, the PPE, is going to be a stripped down version of the POWER series of proccessors"
BWAHAHAAA. From what I've learned, again none of it from your forum of technical MORONS, the Cell is a "gatekeeper" PowerPC CPU coupled with 8 vector-processing-like units. Well, what this comment points out is that that gatekeeper CPU is going to be more like a shitty G4. Not even as good as a regular G4!! It won't even be G5 level, which itself gets smoked by Intel CPUs!!!! Granted, this is only the "main" CPU.
"So its just an in order chip with a bunch of parallel vector units?
Sounds like an amazing chip for DSP work, and scientific calculations, but kind of a let down for the desktop."
"Unless Hannibal's information was wrong, the controller core definitelly isn't anything special. P4 level clockspeed doesn't make up for the fact that it's a 2-way in order design. Keep in mind that this processor has to do the general purpose computing (where extra execution units, out of order execution, and a good branch predictor are important). Depending on how fast their logic is at 90nm and how simple the design is, the pipeline could be quite long also.
So far I think jason_watkins has done the best analysis. I don't really have anything else to add. In the end, I expect the Cell to be an excellent media processor. But it's not going to put Intel and AMD out of business anytime soon."
"Where exactly does all the supposed speed of the Cell come from? I assume the touted speed of 256Gflops is with all the SPEs doing 2 instructions continuously... how real world is this?"
"Also, it was cited somewhere (in the BF, I think...) that one of the G5's weaknesses is not having the best integer execution speed (while having great FP speed) and this was the reason it didn't run general use code too well, because most programs rely on integer math more than FP. But, it seems the Cell's SPEs are geared more for FP. Does this mean it will suffer when not doing FP friendly tasks?
Too early for me to tell. By their very nature, though, SIMD units solve a certain class of problems well. They are not general purpose devices. Now, it does seem as though these new units have more capacity to sustain themselves than the run-of-the-mill SIMD facility. The allusion to them being microcomputers should stand out in everyone's mind. How those microcomputers support general purpose code is another issue altogether."
"My main concern is that IBM might be pulling a PS/2. The PS/2 has a powerful distributed architecture capable of impressive performance, but alas it is also baroque and Sony did not supply an array of finished tools to harness all that power. The result is that a lot of games just poked along graphics-wise, coming nowhere near the potential of the hardware."
"Otherwise it seems to me (and like others pointing out) that moving the burden to programmers for this low level stuff like was done in the playstation 2, makes porting code and writing games problematic, and that's something you don't need. x86 tends to get used in embedded platforms because it's easy to program for, from what I understand."
"Am I the only one confused about how they're going to explain to "high end graphics workstation" buyers that a $4000 box has the same chip in it that their kid's $300 PS3 has?
huh?
Really cool stuff, but I'm afraid it looks so weird and specialized that it may be impossible to write decent code for. I remember when the "Emotion Engine" in the PS2 was so advanced that they needed permission from the EU to sell it, but GTA still looks like crap."
"Probably along the lines that Apple does business now. With Apple you get OS X on a 3000+ dollar box, were as you don't currently with the 400+ dollar Walmart PC that is nearly as powerfull.
"
"So IBM/Sony have taken the programmable vertex/pixel shaders out of your modern GPU and hung them, with a little bit of SRAM, off a PPC core... or am I missing something? I'm not sure this will live up to the hype, but I am looking forward to the next installment tomorrow..."
"To me, the big questions about how this would adapt to being in an Apple workstation depends on two things: does the PowerPC portion of the core provide decent performance for general use, and will IBM put the compiler tech that allows Cells to be used efficiently into GCC. If the answer to either of those is no, then this is going to be relegated to being a customized co-processor at best. My guess is that, no matter what's being said at this stage of the game, we won't know the answers on these for a year or so."
"Hi.
Why is no one talking about memory access? Doesn't it make the problem worse to have the PPC core trying to feed 8 SPEs and itself? The only scenario where the SPEs are good is computationally intensive work that is not data intensive. Even the simplest GPUs today have access to at least 32 MB..."
"SPEs are more advanced than DSPs and other co-processor designs. They are full processors, albeit much simpler that the curretn GP processors. They are intended to be autonomous, execute their own code under their own supervision, no hand holding by the PowerPC master. Instead, there is occasional interaction where the PowerPG hands code chunks (APUlets) to SPEs and received results back from them.
In terms of application code, the graphcis processing is cited as the main user of this model; after all it is designed for PS3. However, keep in mind that SPEs are still general purpose, only optimized for number crunching.
The reason why they will benefit all kinds of computers is because all sorts of machines are doing more and more number crunching. For instance, desktop OSes are using more advanced graphics for their UI (Windows Avalon, Macintosh OSX,...) even for the standard 2D interfaces. Just maintaing the UI can consume great amount of CPU's ti! me. Web servers do a lot of text processing, which in effect is integer arithmetic. And so on.
Moreover, typical apps on desktops are changing. There is more and more audio/visual stuff, like music players, instant messaging, tomorrow even video phones,... running at the same time. If we offload all the number crunching onto SPEs then you free up the main processor (PowerPC in this case) and all of a sudden you don't need a 10 Ghz CPU!!! In fact, a simple 2 Ghz PowerPC chip with the help of SPEs could blow away the latest and fastest Intel/AMD offerings. That is the idea.
The problem is like I said, that current application will not automatically take advantage of SPEs. Remember that you C?/Java program starts with main() - 1 thread. They spawn additional threads, of course. The problem is that all threads use the single - main memory!!!! Instead of threads we'll have to program "APUlets". They are similar in many ways except:
1) APUlets has its own memory that is ! separate from main
2) APUlets have to the concept of a "master" with whom they interact (mainly send results back to them)
As an example - of course, pure guessing on my part:
public class CRCApulet extends com.ibm.Apulet
{
    public void run()
    {
        // Pull the input out of the APUlet's private local memory (the "context").
        byte[] inputData = (byte[]) getContext().getInput("data");

        // Perform the CRC computation locally on the SPE.
        java.util.zip.CRC32 crc = new java.util.zip.CRC32();
        crc.update(inputData);
        long crcResult = crc.getValue();

        // Hand the 8-byte result back to the PowerPC master.
        getMaster().sendResult(java.nio.ByteBuffer.allocate(8).putLong(crcResult).array());
    }
}
So you have an app running on the PowerPC master, and it creates this little APUlet and gives it data (say, file contents) in the form of a byte array. The APUlet executes and sends back the result. The "context" is a facade for the local memory and the "master" for the PowerPC controller.
So the app could offload the CRC calculation onto the SPEs and thus complete its processing faster!
This doesn't mean that all existing software has to be redone! Only the computationally intensive processing will need to be moved to APUlets. The remaining code is fine."
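To make the division of labor concrete, here is what the master side of that same guessed-at API might look like. Every method name below (setInput, start, waitForResult) is as hypothetical as the APUlet above:
public class CRCDemo
{
    public static void main(String[] args) throws Exception
    {
        // The app running on the PowerPC master reads some bulk data...
        byte[] fileContents = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("big.dat"));

        // ...creates the APUlet and copies the input into its local memory...
        CRCApulet apulet = new CRCApulet();
        apulet.getContext().setInput("data", fileContents);

        // ...then dispatches it to a free SPE and picks up the result later,
        // leaving the PowerPC free for other work in the meantime.
        apulet.start();
        byte[] crcBytes = apulet.waitForResult();
    }
}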
"'m also curious what the Nvidia GPU will be responsible for, like some others~
Why the confusion? The Nvidia GPU will render the graphics.
Of course you're really asking "Why use a GPU when you have all this new processing power available?"
Probably multiple reasons:
Most important reason is probably risk management: By using the Nvidia GPU, Sony does not have to port a rendering pipeline to an entirely new CPU architecture. It can't be trivial to rewrite OpenGL, or whatever Sony uses, to run on Cell. Sony gets an existing rendering implementation that they know works.
Second most important reason: developer familiarity with using Nvidia GPUs. Why give developers one more hurdle in figuring out the platform?
This may also be a factor: Sony didn't spend all that time/effort/money on Cell just to get rid of the GPU. They want all that power to be available to developers to do other cool stuff with."
"That sure makes all of the shouting of "4GHz!" a lot more sensible. It tells me that the SPEs had better be put to good use though. Doesn't seem like a 2 way inorder design will be breaking into the supercomputing list anytime soon. Call me weird, but I'm almost more interested in this than I am in the SPEs right now. Until we see the SDKs IBM/Sony/whoever can provide to use them, it's certainly going to be difficult to judge performance."
"That's what i was trying to point. Great raw processing power, but how do you feed it with data? I think this approach will make bigger the gap between processor and memory."
"
Let me try. This is largely supposition and inference; I have not read Hannibal's sources. But I think I'm right.
Do not think about the SPE as a coprocessor. Those execute individual instructions. Do not think of the SPE as a processor in parallel with the PPE. The PPE is in front of the SPE, not in parallel with it. Do not think of the SPE as a processor slave of the PPE. While this is not necessarily inaccurate, it is not the best way to understand the SPE.
Think of the SPE as a whole computer in its own right. The SPE is the center of its own universe. It executes programs, and is particularly good at executing certain classes of number-crunching programs. It has its own private memory (the 256KB LS memory). It has an I/O controller (the DMAC).
I think it is important to realize that the DMAC is not to be thought of as a memory controller. It is an I/O controller.
In the early 80's we had I/O controllers to access hard drives and serial ports. These were high-latency devices that we needed to access in a non-blocking fashion, so that we could do other things with the processor while the I/O request completed. Memory, in contrast, was treated as fast. We waited for memory, because it was fast, and the waits were not long in terms of lost cycles.
Today, memory latency is huge, just like I/O latency was huge in the 80's. Memory access is no longer cheap. It is very expensive. We have maintained the illusion of fast memory using caches and other tricks, but the fact is, memory is slow.
The genius of the SPE, and the Cell in general, is that memory is fast again. Each SPE has a small, fast memory. It also has an I/O controller that treats main system memory like we used to treat hard disks.
The ISA will probably muddy much of this picture by treating memory in a conventional manner, wrt addressing. But it is my suspicion that apulets written with the above attitude in mind will make the best use of the system.
There is a reason they chose to present the SPE before the PPE. It is the core of the system. The PPE is a glorified bureaucrat. It will handle the high-level logic for the PS3, but the SPEs will handle the computation. I suspect this will include unit and enemy AI, graphics, sound, media encode/decode, vector math, and everything else that the next generation of Playstation game needs to think about. The PPE, in contrast, will spend its time dealing with network and mass storage I/O, process and apulet scheduling, user input, and other administrative tasks."
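The natural consequence of that "memory as I/O" framing is the old trick of overlapping I/O with computation: start a non-blocking fetch of the next chunk into the local store, crunch the current chunk while the transfer runs, and only block when the data is actually needed. A sketch in the same hypothetical APUlet API as the CRC example earlier (Transfer, fetchAsync, await, CHUNK_SIZE, totalSize, and process are all invented names):
public void run()
{
    // Kick off a non-blocking transfer of the first chunk from main
    // memory into the SPE's local store.
    Transfer pending = getContext().fetchAsync("data", 0, CHUNK_SIZE);
    for (int offset = CHUNK_SIZE; offset < totalSize; offset += CHUNK_SIZE)
    {
        byte[] current = pending.await();                               // block only if the transfer isn't done yet
        pending = getContext().fetchAsync("data", offset, CHUNK_SIZE);  // start fetching the next chunk
        process(current);                                               // compute while that transfer runs
    }
    process(pending.await());  // last chunk
}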
"nteresting post, thanks. Can you expand on these bits any more? Specifics about how the SPEs are general purpose? I was under the impression they were like beefed up vector units from the PS2 CPU."
"They're general purpose in the sense that they can execute their own thread, handle branch instructions, etc. But they're obviously not tuned to be fast for anything other than keeping the vector ALU's busy in FPU heavy code (ie games). It's probibly more fair to compare them to the shader cores in GPU's rather than call them general purpose cpu's in the sense of an SMP system"
"There's a lot of bullcrap being posted in this thread. This chip isn't meant for your word processor. It isn't meant for your laptop running OSX and some magic emulation software so that suddenly your iLife applications run at warp speed. It's targeted at exactly one application domain: media processing. And for that domain, it looks to be quite well designed IMHO.
I'd also mention that there have been attached processor cards for x86 for quite a while. They rarely get used outside of very niche embedded applications. The most general application I'm aware of was a "MIPS on a Card" rig that came out in the Pentium Classic era. You could run Lightwave 3D and a few specially written Photoshop filters on it. It, of course, flopped. I don't think Intel has anything to fear from Cell in the short term. And in the long term, nothing is stopping them from coming out with an x86 system chipset or CPU that has a bundle of attached vector processors with hardware memory sync management as well."
"I think it's fairly obvious that the SPE will not be handling graphics. The SPE units combined can only push about 256 GFLOPS max. Even is this theoretical amount could be attained, that only makes it about 50 GFLOPS more than a current top-of-the-line NVIDIA card. Since it's not set to debut for another 2 years inside the PS3, it would have a less-than-stellar debut, being equivalent to what would then be a $150 video card."
"I don't think any of us really expected a VMX/Altivec unit in the controlling processor. It still remains to be seen just how verstile the controlling core is. Before people go ga ga over it getting into a Mac, think of 2 things: first, applications would have to be rewritten to use the SPE's. Second, the space occupied by the SPE's could instead be used for more cache and a 2nd general purpose core, which should yield better performance on desktop applications anyhow.
Edit:
Hannibal just posted in the 2nd article that the controlling core is an in-order design. Unless the G4 is a much crappier processor than I imagine, that means the Cell's controlling core is nothing really worth getting excited about from a Mac point of view."
"http://pc.watch.impress.co.jp/docs/2005/0208/kaigaip046.jpg
tells us of a new *dual* controller with a rate of 25.6 GBps @ 3.2 Gbps (?!).. they probably wrote this wrong.. (!).. it's probably ~3.2 GBytes/sec -> ~= 25 Gbits/sec.. hmmm..
and since it's dual it is 6.4 GBytes/sec..
hmm.. isn't this EXACTLY the I/O controller used on the G5?.. funny thing...
This message started as a quest to find out how the great thirst of the Cell would be satisfied.
I now understand that it won't be satisfied (!). So where is the gain with this new architecture? I assume that since the AltiVec on the G5 is almost useless, 8 simpler SIMD cores might have a chance to come forward again with this new implementation."
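For the record, the two figures on that slide are probably consistent rather than a typo, if you read the 3.2 Gbps as a per-pin data rate and the 25.6 GB/s as the aggregate across a 64-bit dual-channel XDR interface (my reading, not something the slide states):
3.2 Gbit/s per pin × 64 pins = 204.8 Gbit/s = 25.6 GBytes/sec
In other words, the dual controller would total 25.6 GBytes/sec, not 6.4.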
"Strengths of the Cell includes:
1. Multiple SIMD/CPU/Core units - resulting in massive parallelism on a chip.
2. Extremely high bandwidth - 100 Gigabytes a second - compared to current desktops like the PowerMac G5 at 16 gigabytes a second.
Weaknesses:
1. Integer processing speed will only be similar to Intel CPUs unless you can do parallel SIMD work - e.g. with multimedia.
2. Only one full core - the PowerPC core.
3. The need to explicitly optimize code for the Cell, rather than just having the compiler do it - this is the problem with the Itanic processor.
It seems today that the most powerful CPUs are the GPUs by ATI and nVidia. These work similarly by having multiple parallel units. Apple with Mac OS X seems to be on the right track for doing multimedia by taking advantage of GPUs for offloading work from the CPU. I wonder if the PowerMac PowerPC CPU plus ATI/nVidia GPU may actually accomplish - for a personal computer - what the Cell is trying to do - with the advantage of having multiple PowerPC cores rather than one as the Cell does, in the future dual-core models.
The fact that the PowerPC core of the Cell can run up to 4 GHz - in the lab - with a much shorter pipeline than the G5 - makes me question why IBM can't rev up the G5 past 3 GHz. To me, IBM's inability to do so bodes poorly for being able to produce actual 4 GHz Cells - just as Intel reached its limits in processor speed."