Cell

bbot said:
This link was posted in the console forum. What is your opinion of "cell"? Will Intel/Nvidia be able to compete successfully against it?


http://news.com.com/2100-1001-948493.html?tag=fd_lede

Looks like a one-chip solution. Nvidia, Intel, and every other chip and graphics design company is in jeopardy if it is effective. The comment about Moore's law being too slow is ... shocking. I suspect that if this technology really will be in the PlayStation 3, MS may become extremely interested as well. Further, if it is the solution for the next-generation PlayStation it will be amazingly powerful, as I think Sony very much wants that market for itself. Interesting post. Technically it falls under 3D tech so it shouldn't be moved, but it might ;)
 
I don't quite see Sony making an x86-compatible PC with it soon, so I doubt it's a threat to the PC market in the short term ... if Intel perceives it as a threat, it has plenty of time to react. If the PS3's image quality is overwhelmingly better and they provide a monitor output, it could hurt NVIDIA a bit by pulling customers away from PCs, but I doubt it will make much impact.

Sony would have to get into the PC market to threaten them, and it can't do that with the PS3 ... since consoles are fundamentally closed platforms, and not too many people want that on their desktops.
 
This sort of chip will be the future, because as chips become more and more powerful, the busses will become the limiting factors.

But I don't feel we've yet reached that point on the consumer end. The new bus technologies that have yet to be implemented should keep SOCs out of the high-end consumer level for some time.

In the meantime, chips like these should see significant success in the low-end market, but will pretty much always be outperformed by multiple dedicated solutions.

For example, a SOC design might be used in the server market if the CPU has all of the I/O functions and a simple video processor integrated. In this way, the large, complex CPU could sacrifice just a little bit of die space for a significant cost savings.
 
I first heard of CELL years ago in the IBM Systems Journal, when they were talking about putting 32 CPU cores per chip for Blue Gene, the first petaflop computer. What I don't understand is why the PS3 is bulking up on so much CPU power. CELL would be great for AI, physics, and other simulation problems, but are they thinking of using Cell cores to do rasterization? I suppose 4-8 Cells would make a great replacement for the EE, but I think it would be a waste if they don't significantly improve the Graphics Synthesizer.
 
Democoder... why not have a smaller CELL as the rasterizer ;)

A custom 16-core CELL chip as the CPU and a smaller CELL-based chip as the rasterizer/GPU...

Add an HLSL on top, cook in the oven for ~2+ years, and get ready to serve in the million-portions range :D
 
I'd say the rasterizer hardware will probably stay as dedicated rasterizing hardware. The Cell strategy could be extended to fulfill the duties of the current vector units, as ultimate programmable vector and pixel shaders? (just my guess, as a casual spectator) ...But I guess that *is* rasterization, isn't it? I'm not sure how it is all supposed to fit together then. Maybe at the rate they are going (perhaps not a 16-core PS3 chip, but something following it...) the Cell will pull off realtime raytracing in a game?
 
That was before an architecture had even been decided on, too :) (hell, we don't even know if it's been nailed down right now)
 
I remember the first announcements on the Emotion Engine having a similar degree of 'this is a revolutionary chip' hysteria.

Four years down the line, the PS2 looks a bit outdated already - certainly compared to the PC and the Xbox and IMHO it ain't that great next to the Gamecube.

Sony shot themselves in the foot (only a minor wound, though) with PS2 by making it so darn complex to program. If you have to manage 16 cores on one chip and make them all work together in your code as well - ugly, very ugly. Hopefully this time they will have the software to make it easier for the developers.
 
From the article in the first post, the claimed performance target is ~1 teraflop.

To achieve this, we need:

One Cell chip with 16 cores, clock rate 2.5 GHz.

Each core has four parallel vector processing units. A single unit can execute four "multiply-add" instructions in parallel in one cycle (theoretical peak). This is possible because when multiplying a four-element vector by a 4x4 matrix, each row-times-vector dot product is independent of the others.

This way we get: (4 muls + 3 adds) × 4 units × 16 cores × 2.5 GHz ≈ 1.12 teraflops. So with this configuration a single "Cell" chip could transform about 40 billion vertices/sec, and with triangles that would be ~13.3 billion triangles transformed/sec :LOL:
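
As a quick sanity check of those figures, here is a rough back-of-the-envelope script (the 28 flops per vertex is the cost of a plain 4x4 matrix times vec4 transform, 16 muls + 12 adds, which is the accounting implied by the 40 billion figure above):

Code:
# All inputs are the assumptions from this post.
cores = 16
vector_units_per_core = 4
flops_per_unit_per_cycle = 4 + 3          # 4 muls + 3 adds per cycle
clock_hz = 2.5e9

peak_flops = cores * vector_units_per_core * flops_per_unit_per_cycle * clock_hz
print(peak_flops / 1e12)                  # ~1.12 teraflops

flops_per_vertex = 16 + 12                # 4x4 matrix * vec4: 16 muls + 12 adds
vertices_per_sec = peak_flops / flops_per_vertex
print(vertices_per_sec / 1e9)             # ~40 billion vertices/sec
print(vertices_per_sec / 3 / 1e9)         # ~13.3 billion triangles/sec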
 
I don't think the FPUs will implement discrete add and mul units. It's more likely to be MAC or MADD, so it'll take 4 SIMD MACs/MADDs to transform a vertex = 32 floating-point ops (4 adds wasted).

So transform throughput will be 10^12 / 32 ≈ 31 Gvertices/s.
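
The same back-of-the-envelope arithmetic with that MADD-only accounting (the ~1 Tflop peak is the article's claimed target):

Code:
peak_flops = 1e12                            # claimed ~1 Tflop peak
flops_per_vertex = 4 * (4 + 4)               # 4 SIMD MADDs, each 4 muls + 4 adds
print(peak_flops / flops_per_vertex / 1e9)   # ~31 Gvertices/s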

I predict that they will be nowhere near this.

Cheers
Gubbi
 
OK, I read about Blue Gene somewhere before, and I heard that these cores each have access to a portion of eDRAM.
 
eSa said:
This way we get: (4 muls + 3 adds) × 4 units × 16 cores × 2.5 GHz ≈ 1.12 teraflops. So with this configuration a single "Cell" chip could transform about 40 billion vertices/sec, and with triangles that would be ~13.3 billion triangles transformed/sec :LOL:

With strips, fans, and a post-transform cache, the ratio of vertices to triangles can be 1:1 or better. So, 40 billion vertices/sec is very roughly equivalent to 40 billion triangles/sec.

And, of course, lighting and other things would decrease the real-world values...
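
To see why the ratio approaches 1:1: a triangle strip of N triangles needs only N + 2 vertices (and a post-transform cache gets indexed meshes close to the same ratio). A quick illustration, with made-up strip lengths:

Code:
# Triangle strip: every vertex after the first two completes one new triangle.
for n_triangles in (1, 10, 100, 1000):
    n_vertices = n_triangles + 2
    print(n_triangles, n_vertices, n_vertices / n_triangles)   # ratio -> 1.0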
 
Excuse me for my ignorant question, but how can a "cell" made out of many general-purpose chips ever hope to compete against a GPU, which has highly specialized texture units, pixel shaders, etc?

Worse yet, is it going to be so hard to program code for the "cell" that it'll take 2 years for developers to exploit the full potential of the hardware? (at which point, the Xbox2 will come in and steal their graphics crown)
 
It's pretty simple, really.

The "Cell" design would have include one or more cells that would be exclusively for graphics processing.

But, since graphics chip companies show no signs of failing to push transistor counts to the max on the available die processes in the near future, you simply cannot have as powerful a chip if some of that die goes to things other than graphics processing.

Only after the busses between the chips becomes the limiting factor will cell designs be good.

In the meantime, cell designs with nothing but multiple CPUs (and not integrated graphics...unless you're talking about integrated low-grade graphics) should be excellent for designing massive multiprocessing systems.
 
BoddoZerg said:
Excuse me for my ignorant question, but how can a "cell" made out of many general-purpose chips ever hope to compete against a GPU, which has highly specialized texture units, pixel shaders, etc?
With enough transistors/cells anything is possible. As long as it's fiscally feasible to manufacture you can keep adding cores until you've reached the required performance.
Worse yet, is it going to be so hard to program code for the "cell" that it'll take 2 years for developers to exploit the full potential of the hardware? (at which point, the Xbox2 will come in and steal their graphics crown)
Difficult development can be aided with quality middleware tools.
 
(from Entropy's link):
SMPs as cells in a cellular architecture
In making his case for the Connection Machine, Hillis [15] argued that each processing element in the machine had to be small and simple. He claimed that it would be unreasonable to assume both that “there are plenty of processors” and that “there is plenty of memory per processor.” That may indeed have been the case in 1985 when his thesis was published. However, gigascale technology has the potential to change this assumption in a dramatic way. Before 2011 it should be possible to build a very powerful cellular system of thousands of processors using a package about 20 cm on a side (Figure 7), each containing 1000 superscalar processors with 256 GB of DRAM and delivering a peak of 20 Tflops per package. Each of 64 chips in the package is a 16-way SMP like that shown in Figure 5.

64 16-way SMP chips = 20 TFlops... that gives you 312.5 GFlops per chip,
or ~19.5 GFlops per core - in 2011...

Later in the article:

Once technology allows placement of as many as 16 processors on a single chip, the focus of on-chip organization can shift from providing more computation power to bringing memory right alongside the processors. The amount of computation that could be delivered by a chip that has more than 4 GB of DRAM with 16 attached processors at 10 GHz would satisfy the requirements of most multimedia and game applications, and yet would be versatile enough to be a node in a commercial server cluster. Cluster computing is young, but is already beginning to make an impact in the server world. Its usefulness stems from the fact that it presents a scalable distributed computing view at a high level, but still allows a familiar shared-memory view at lower levels. The redundancy in such systems also makes them more tolerant to defects and failures, an important issue in large gigascale chips.

So if we assume that he is talking about a 10 GHz chip, that would be about 2 flops a cycle, which is reasonable for a general-purpose computing element.

However, today at 1 GHz that would give you a single chip with ~30 GFlops of compute power. Using 4-way SIMD FP, maybe you could get to ~120 GFlops peak on a chip.
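
For reference, the arithmetic chain behind those numbers (the 20 Tflops, 64 chips, and 16 cores per chip come from the quoted article; the 1 GHz clock and 4-way SIMD scaling are just my assumptions above):

Code:
package_flops = 20e12                            # quoted peak per package
chips = 64
cores_per_chip = 16

per_chip = package_flops / chips                 # 312.5 GFlops per chip
per_core = per_chip / cores_per_chip             # ~19.5 GFlops per core
flops_per_cycle = per_core / 10e9                # ~2 flops/cycle at 10 GHz
print(per_chip / 1e9, per_core / 1e9, flops_per_cycle)

# Scaled down to a ~1 GHz clock, without and with 4-way SIMD FP.
per_chip_1ghz = cores_per_chip * flops_per_cycle * 1e9
print(per_chip_1ghz / 1e9)                       # ~31 GFlops ("~30" above)
print(per_chip_1ghz * 4 / 1e9)                   # ~125 GFlops ("~120" above)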
 