Details trickle out on CELL processor...

V3 said:
I think the ISSCC is still in progress, expect some official press releases after...

Really?

On Dec. 1, you can get the ISSCC 2005 advance program from here. The press gets the program earlier. The actual conference is in Feb. 2005, so don't expect much before then.
 
ISSCC 2005 runs from 6-10 February. Since the papers will be released soon, I'll just lazily wait for others to summarize the details for me. :devilish:
 
My bad...ISSCC is not in progress! :) It must've been a phantom/virtual one in my head! :p

one said:

http://www.nytimes.com/2004/11/29/t...39fd80203e&ei=5006&partner=ALTAVISTA1

The above link seems to work without registration...

The workstations with Cell inside will allow video game developers and special effects producers to create products in a fraction of the time it takes now, the companies said. That could reduce the cost of making movies and video games, a potential boon for movie studios.

I wanna see the software dammit!
 
The 5 CELL-related papers @ ISSCC 2005 -

10.2 The Design and Implementation of a First-Generation CELL Processor

This is the main overview paper.

7.4 A Streaming Processing Unit for a CELL Processor

26.7 A 4.8GHz Fully Pipelined Embedded SRAM in the Streaming Processor of a CELL Processor

These are for streaming processors.

20.3 A Double-Precision Multiplier for a First-Generation CELL Processor

This is by IBM,

28.9 Clocking and Circuit Design for a Parallel I/O on a First-Generation CELL Processor

and this is by Rambus, Inc. & Stanford University.

Note that all papers are supposed to describe the first-generation CELL processor, which was already in production early this year. ISSCC doesn't accept armchair theories, only actual samples.
 
one said:
Note that all papers are supposed to describe the first-generation CELL processor, which was already in production early this year. ISSCC doesn't accept armchair theories, only actual samples.

Cool, so I'd assume we can say with reasonable probability that the first-generation parts are based on the 10S 90nm sSOI process, which would bode well indeed, given the speeds attained, if they scale as expected to 65nm. Oh, and thanks for the round-up, One.

EDIT: Nevermind, it states as much on the first page. Sorry for the useless post.
 
Also, I wouldn't put much faith in Mr. Zimmon's comments; he's doing the same bandwidth/FLOP math that Deadmeat used a year or so ago. External bandwidth isn't an indication of calculation ability (à la contemporary GPUs).
Isn't this on the basis of constantly using data that isn't cached? It's probably worst case; for data that can be cached on chip the number would be higher. I guess the performance will hinge on how much on-chip memory the Cells have. I'm suspecting it won't be huge, so for large data sets, which is what a lot of next-gen games will be using, the memory bandwidth may hinder a lot of the Cell units' potential performance. I guess we'll see if the 128K is enough. Or it could all just be another case of Sony concentrating on theoretical numbers instead of balancing the chip for maximum real-world performance.
 
Isn't this on the basis of constantly using data that isn't cached?
No, it's based on doing exactly one floating-point operation for every data read/write, which is much more extreme than simply not hitting the cache often. Even a CPU with no local memory or cache should easily beat that number with a reasonably sized register set.

Case in point - applying this logic to current Xenon specs, you top the machine out at 6.2GFlops...
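For illustration, here's the bandwidth/FLOP logic as a toy calculation. The ~25GB/s external bandwidth and 4-byte single-precision operands are assumptions made for the sketch, not official Xenon specs:

```python
# Toy version of the "one FLOP per external read/write" argument.
# All numbers here are illustrative assumptions.

def peak_gflops_bandwidth_bound(ext_bandwidth_gb_s, bytes_per_operand=4):
    """GFLOPS ceiling if every FLOP must fetch one operand from external memory."""
    return ext_bandwidth_gb_s / bytes_per_operand

def with_on_chip_reuse(ext_bandwidth_gb_s, reuse_factor, bytes_per_operand=4):
    """Ceiling once each fetched operand is reused `reuse_factor` times
    from registers / local store before a new external access is needed."""
    return peak_gflops_bandwidth_bound(ext_bandwidth_gb_s, bytes_per_operand) * reuse_factor

print(peak_gflops_bandwidth_bound(25.0))  # -> 6.25 (GFLOPS, near the 6.2 figure above)
print(with_on_chip_reuse(25.0, 20))       # -> 125.0 (GFLOPS with 20x on-chip reuse)
```

The point of the second function is the one made above: any register set or local store that lets an operand be reused multiplies the ceiling, which is why the naive bandwidth/FLOP division is a worst case.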
 
The memory hierarchy is configured differently on a Stream processor. Multimedia applications don't run well on standard cache configurations like an x86 CPU.

Imagine overcomes the bandwidth bottlenecks of global register files and memory systems by using a three-level bandwidth hierarchy organized to support stream operations. Streams are transferred between memory and a stream register file (SRF) by a four-bank streaming memory system (2GB/s) that reorders references to improve bandwidth. Once a stream is loaded from memory, it is typically circulated between the SRF and the arithmetic clusters several times before returning the result to memory, exploiting the 32GB/s bandwidth of the SRF. Finally, during a computation kernel, intermediate results are forwarded directly between local register files associated with the arithmetic units without need to return to the global register file, using the 544GB/s local register bandwidth. On representative benchmark programs, exploiting the locality inherent in stream applications in this manner reduces bandwidth demands on global register ports by a factor of 20 compared to a typical scalar architecture.

Imagine overcomes the performance limiting effects of conditional operations by sorting streams according to a conditional variable rather than through conditional control flow. These conditional stream operations divide data into homogenous sets that can then be processed without the overhead of conditional control instructions. Compared to conventional approaches of branch prediction or predication, conditional stream operations enable very high levels of instruction and data parallelism to be exploited without incurring a large penalty on every unpredictable conditional operation.

http://cva.stanford.edu/imagine/project/im_overview.html
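The conditional-stream idea from the quote above can be sketched in a few lines. The function names are made up for illustration; this is not the Imagine ISA:

```python
# Sketch of "conditional streams": instead of branching per element,
# route the stream into homogeneous sets by the condition, then run a
# branch-free kernel over each set.

def conditional_stream(stream, predicate, kernel_true, kernel_false):
    # Pass 1: sort elements into homogeneous sets by the condition.
    true_set = [x for x in stream if predicate(x)]
    false_set = [x for x in stream if not predicate(x)]
    # Pass 2: each set is homogeneous, so the kernels contain no
    # data-dependent branches and can be scheduled wide (SIMD/VLIW).
    return [kernel_true(x) for x in true_set] + \
           [kernel_false(x) for x in false_set]

# e.g. square non-negatives, clamp negatives to zero
out = conditional_stream([3, -1, 2, -5],
                         lambda x: x >= 0,
                         lambda x: x * x,
                         lambda x: 0)
print(out)  # -> [9, 4, 0, 0]
```

Note that the output order follows the sorted sets, not the input order; that reordering is the price paid for eliminating unpredictable branches from the inner kernels.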
 
Jaws said:
Each processing element comprises a Power-architecture 64-bit RISC CPU,

This seems to have been overlooked: Power and NOT PowerPC for the PUs? Hmmm...

The 64-bit Power core running in CELL will probably be a simple scalar CPU based on the PowerPC family. And the same 64-bit Power core will be in Xbox 2 and Revolution, but configured differently. That's my best guess.
 
Now we still don't know if Sony's going to put 1 cell, 2 cells, or 4 cells in the PS3.

Comparing apples to pears:
Code:
Item            1 Cell          Xbox 2
CPUs            1 @ 4.8GHz      3 @ 3.5+GHz
ALUs            8 @ 4.8GHz     48 @ 500+MHz
On-CPU RAM      128K*8 = 1MB    1MB

One Cell seems very roughly in the same ballpark as an Xbox 2. (Especially if you think the GHz numbers end up converging, which is likely because both companies are buying their CPU technology from the same vendor.)

Sony seems to have a tough choice to make: launch at the same time as Xbox 2, but roughly at performance parity, or ride Moore's Law for one or two extra generations to get a more powerful product.

It looks like they're planning on waiting one year, which is 2/3rds of a Moore's Law cycle. That would tend to indicate that the PS3 will be a 2 Cell system. (And would tend to indicate that PS3 will be 2 to 3 times more powerful than Xbox 2.)
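The "2/3rds of a Moore's Law cycle" arithmetic can be made explicit. This assumes the classic 18-month doubling period; the numbers are illustrative, not a prediction:

```python
# Transistor-budget growth under Moore's Law, assuming the
# classic 18-month doubling period.

DOUBLING_PERIOD_MONTHS = 18

def density_growth(months):
    """Transistor-budget multiplier after `months` of Moore's Law."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)

print(round(density_growth(12), 2))  # -> 1.59  (one year = 2^(2/3), ~1.6x budget)
print(density_growth(18))            # -> 2.0   (a full cycle doubles the budget)
```

So a one-year wait buys roughly a 1.6x transistor budget on its own; getting all the way to a 2-Cell (2x) or larger design also leans on the 90nm-to-65nm shrink discussed earlier in the thread.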
 
FatherJohn said:
Sony seems to have a tough choice to make: launch at the same time as Xbox 2, but roughly at performance parity, or ride Moore's Law for one or two extra generations to get a more powerful product.

Why would they wait? Most importantly, we don't know the area of one of these 90nm-fabbed PEs yet, so we can't make a prediction for a yieldable 65nm IC. And, secondly, they have already been sampling 65nm, 2nd-generation ICs; so whatever will end up in the PS3 likely exists already. The ones being discussed here have been around for a while. And I also have my reasons to think the GHz won't converge, namely that I believe there are architectural features that make Cell inherently faster. And word on the street recently is that synthesis and placement is basically fucked for 65nm at IBM; as I told nAo, I see this as hurting the XCPU project more than Cell.

PS. You just compared a single Cell Processor to the entire X2 system and called that parity. Do you think the PS3 won't have a "GPU"? Think about it.
 
FatherJohn said:
Now we still don't know if Sony's going to put 1 cell, 2 cells, or 4 cells in the PS3.

Streaming processors will consume very little wattage even at high frequency 8) 4 PEs in a 65nm Broadband Engine is very likely, I guess.
 
Yeah, I don't think PS3 will have a full GPU. There's certainly no reason to have any vertex processors. And depending on how the second cell's APUs are configured, they may be able to use them as some sort of renderer. (The Stanford Imagine group tried an experiment where they configured their stream processor as a Reyes-style renderer. It worked, but it was abysmally slow -- 20 times slower than a contemporary Z-buffer-based renderer, while using 3x the transistors. The dirty secret of stream-based processors is that they are very hard to get useful work out of.)

Sony's a fine company, but they can't magically put more transistors into their products for the same price than other people can. So they are limited in what they can do. They have to make engineering trade-offs, the same as their competitors.

They've chosen to invest heavily in APUs, and there isn't any practical use for APUs in a game machine other than rendering. So I assume that's what they're there for.

(People can wave their hands and say that the extra compute power is for physics or AI, or brand new uses that nobody's thought of yet. But I think that's pure wishful thinking. It's a waste of money to put in so much extra power unless most games can take advantage of it.)
 
The dirty secret of stream-based processors is that they are very hard to get useful work out of.

Are you forgetting that the programmable parts in GPUs are also based on stream processors?
 