The question is: IS CELL right for a game console?

Should CELL be used in a GAME CONSOLE?

  • Yes, because it's the future technology for consoles

  • Other thoughts (post below)


  • Total voters: 207

Josh378

Newcomer
OK, us IGN folk are arguing our heads off about whether CELL is good for gaming... some of us are saying yes, the others are saying "no".

Some of us think that multi-CPU designs are the future of gaming, and others think CELL should just stick to servers and CGI stations.

I think this is a good discussion to bring here, since there are far more experienced engineers and developers here with ideas and thoughts on the subject, and it might spark an interesting discussion.

Do you think that CELL should be used in a game console?

Your thoughts...

-Josh378
 
Perhaps I should ask the question back: why should a multi-core processor be limited to servers and CGI stations? And what would those servers and CGI stations be doing?
 
Well, one guy at IGN gave a pretty good argument about multi-processing CPUs... I won't mention his name, but here's his argument:

Until the cost of accessing data is zero, or very close to it, parallel processing will ALWAYS suffer with certain types of data. It is not a matter of problems with coding, it isn't a matter of "thinking outside the box", it is simply a matter of dependencies.

If some piece of data is dependent on another, it cannot be processed until the data it depends on has been produced. Period. And since the cost of information is not zero, one processor CANNOT "know" what another processor is "thinking" until that processor is done.
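
As a minimal sketch of that dependency point (the functions f and g here are made up purely for illustration): even if the first stage is launched on another core, the second stage cannot start until the first has produced its value.

```cpp
#include <future>

// Hypothetical stages: g() depends on the result of f().
int f(int x) { return x * 2; }   // first stage
int g(int x) { return x + 1; }   // second stage, needs f's output

int main() {
    // Even launching f() on another core doesn't help g():
    std::future<int> stage1 = std::async(std::launch::async, f, 10);
    int intermediate = stage1.get();   // hard synchronisation point
    int result = g(intermediate);      // cannot start any earlier
    return result == 21 ? 0 : 1;
}
```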

FLOPS are a measure of floating-point math. This kind of math is common in all applications, but even more so in rendering graphics, synthesizing music, physics, and encryption.

Since graphics are rendered by changing the colors of individual pixels, and there are thousands to millions of pixels, it is a highly parallel operation where the outcome of one pixel is not dependent on the outcome of another pixel, so each individual pixel can be an individual process.
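
A minimal sketch of that per-pixel independence (the shading rule is a toy one, invented for illustration): the loop body reads and writes only its own pixel, so the iterations could be handed to as many processors as you like with no communication between them.

```cpp
#include <cstdint>
#include <vector>

// Each pixel's colour depends only on its own coordinates, so every
// iteration of this loop could run on a different processor.
void shade(std::vector<uint32_t>& framebuffer, int width, int height) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            framebuffer[y * width + x] = (x ^ y) & 0xFF;  // toy shading rule
}

int main() {
    std::vector<uint32_t> fb(640 * 480);
    shade(fb, 640, 480);
    return 0;
}
```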

Physics too, at its most basic level, is a highly parallel operation. For instance (not possible today on a large scale), individual atoms could be individual processes; their actions are influenced by outside sources, so the overarching process could farm out little chunks, get back the data, see how each responded, and so on. At a larger scale, physics becomes very dependent. It would be the responsibility of the programmer to code it efficiently.

However, a game engine is not very multi-parallel at all. While many individual parts can be "farmed" out, the main thread will need to be run linearly.

Ultimately the question arises: is an individual entity in the CELL system powerful enough to run a game engine and match or EXCEED the competitors at this task?

Imagine a system where there are thousands of individual entities thinking simultaneously, but only one of them is in control of the entire thing. Now imagine one hundred of these saying "OK, I'm done with the graphics part," another hundred saying "OK, I'm done with the physics part," and another hundred saying "music's done," etc. But imagine that the one entity keeping the whole thing in check is not powerful enough: he's gonna say "whoa, whoa, slow down there, stop talking all at once," and BAM, there's the bottleneck.

Now, mind you, I'm not saying that the PS3 sucks, but what I am saying is that it is IMPOSSIBLE to make a multi-parallel processing system that doesn't suffer from these kinds of bottlenecks given a set of data that is highly dependent. It doesn't matter how much money you throw at it, it doesn't matter how smart your engineers are; what DOES matter is the data set, and how well that data set matches your processing abilities and layout.

Deep Blue is an example of what Cell is. Deep Blue has plug-and-play processor upgrading abilities just like Cell, and Cell is likely the bastard child of Deep Blue in many ways. However, Deep Blue is not the uber-computer for every application. Near-peak performance is ONLY hit with a VERY limited set of applications.

Now, the peak performance of Cell is VERY high. Extremely impressive, and very exciting, but it will NOT hit peak performance MOST of the time. On average, a multiprocessor system operates at 40% efficiency. This is because only specific chunks of data benefit from parallel processing.
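
For what it's worth, the limit being described is essentially Amdahl's law (my label, not the quoted poster's): if a fraction s of the work is inherently serial, then no matter how many processors N you throw at it, the speedup is bounded.

```latex
\text{Speedup}(N) \;=\; \frac{1}{\,s + \dfrac{1-s}{N}\,} \;\le\; \frac{1}{s}
% Example: if 20% of the work is serial (s = 0.2), even an unlimited
% number of SPEs can give at most a 5x speedup over one processor.
```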

See, this is why I'm so confused about CELL... because a parallel system is natively VERY good at handling a task like graphics, but now they are using nVidia to handle this? Like others have speculated, I could see nVidia being on board to help with the APIs, but IF nVidia is providing the GPU, then it stands to reason that CELL is NOT as powerful as we are led to believe: since the single game task that benefits the most from parallelism is being offloaded, CELL cannot be the end-all, be-all chip that the claims make it out to be.

Yes, current programming paradigms have revolved around linearity, and yes, CELL will require many processes to be rethought, but the fact remains that TIME is linear, and anything that happens in the NEXT time frame is dependent on what happened in THIS time frame. Fundamentally, the basis of everything in this universe is linear; there will be tasks that CANNOT be split up because of dependencies, and anything that involves a lot of dependencies will be the bottleneck in a parallel processing system. This is as fundamental as the four natural laws.

However, IBM is trying to convince the world that they have somehow "beaten" this problem, and that CELL will do EVERYTHING optimally. But hey, they are the ones that have to convince millions of programmers to write software for the thing. It's a sales pitch.. just like any other.


I can see his point... but I would like to see what plans IBM has for this.

-Josh378
 
Well, IBM has already convinced thousands to develop for it. After all, it is going into the PS3.

I don't see how it will be any worse for consoles than the current crop of CPUs going into them. The Emotion Engine really isn't the most straightforward CPU out there. Can't really fault the PowerPC chip in the GameCube; I think it is really nice. A Pentium III seems to be good enough for the Xbox, but it also isn't meant for gaming.

Let's just wait and see.
 
Who cares? As long as it can add, multiply, reciprocal, and shift some bits around, then any processor is good enough...
 
1) A single-processor system will never reach the performance of a multi-CPU system. Even if a dual core is only 40% faster than a single core, there's no way to get a maxed-out single core to the same speed as a dual core.

Even if Cell isn't 100% efficient, it will still be faster overall than a single-processor design.

2) Games consoles have always been multi-processor. You have a graphics processor, sound processor, IO processor, and CPU. You always have things waiting on others that you have to balance out. Squeezing all that multiprocessor functionality onto a single chip can only be a good thing, for cost and performance!

3) Conventional games may be very linear, but restructuring, learning a new methodology, should overcome a lot of these troubles. As you say, parallel systems like Deep Blue might suffer terrible bottlenecks if trying to run a conventional linear game, but has anyone attempted to write Doom 3 for Deep Blue specifically to play to its advantages?! As another thread says, writing for PS2, XB and GC, you have to balance out what you do for different architectures. It's just a case of balancing out game code to run in parallel. I see this as possible.

There'll be an AI apulet, a physics apulet, a sound apulet, each independent, with the main game loop reading the current state of AI, physics, etc. If the main loop gets delayed, the AI and physics keep chugging along and the loop picks up where it can.
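
As a rough sketch of that arrangement (the names PhysicsState, physicsTask and gameLoop are invented for illustration, not anything Sony has shown): the physics task keeps publishing finished snapshots on its own core, and the main loop grabs the most recently published one rather than waiting for the current frame of physics to finish.

```cpp
#include <mutex>
#include <thread>

struct PhysicsState { float positions[256]; };   // illustrative payload

PhysicsState shared{};            // most recently published snapshot
std::mutex stateMutex;            // guards the snapshot

// Physics "apulet": keeps chugging along on its own core.
void physicsTask(int frames) {
    PhysicsState local{};
    for (int f = 0; f < frames; ++f) {
        local.positions[0] = static_cast<float>(f);    // ...integrate locally...
        std::lock_guard<std::mutex> lock(stateMutex);  // publish the finished frame
        shared = local;
    }
}

// Main game loop: copies whatever snapshot is current and carries on.
void gameLoop(int frames) {
    for (int f = 0; f < frames; ++f) {
        PhysicsState current;
        {
            std::lock_guard<std::mutex> lock(stateMutex);
            current = shared;                          // grab the newest state
        }
        (void)current;   // ...render / run game logic against current...
    }
}

int main() {
    std::thread physics(physicsTask, 1000);
    gameLoop(1000);
    physics.join();
    return 0;
}
```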

For me the big question is what functionality the SPUs provide. Thus far they've sounded to me like FPUs, which is very limiting, almost as though you've got one CPU and 8 FPUs. This implies Cell will have less overall CPU grunt than a dual-core PPC like XB2's in non-math-crunching exercises. I'd like to know what an SPU is capable of - can it run a 3D platformer on its own, or is it limited to churning out vectors and nothing else?
 
Josh378 said:
If some piece of data is dependent on another, it cannot be processed until the data it depends on has been produced. Period. And since the cost of information is not zero, one processor CANNOT "know" what another processor is "thinking" until that processor is done.

Of course, but this doesn't mean processors need to stop working or will go underutilised. This is why we have such things as Coffman's algorithm etc.

Josh378 said:
Imagine a system where there are thousands of individual entities thinking simultaneously, but only one of them is in control of the entire thing. Now imagine one hundred of these saying "OK, I'm done with the graphics part," another hundred saying "OK, I'm done with the physics part," and another hundred saying "music's done," etc. But imagine that the one entity keeping the whole thing in check is not powerful enough: he's gonna say "whoa, whoa, slow down there, stop talking all at once," and BAM, there's the bottleneck.

In a good parallel program, you design such that tasks finish at the same time. One particular part of a game doesn't have to be assigned to just one core. If the slow task can be reasonably split (this can be tricky, but not in all cases), and the resulting parts are mutually exclusive, you can assign the parts to multiple cores so that the task finishes when the others do. Anyway, it's not that big a deal if the "slow" task is only relatively slow; as far as the user is concerned, things are still fast. Sure, the other cores may go underutilised while that task is finishing up, but as I said above, you'd be looking to parallelise that task if it's really holding things up, since you want to keep the cores running as much as possible.
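
A minimal sketch of splitting a slow task so it finishes alongside the others, assuming its halves really are independent (the summing task here is a stand-in, not real game code):

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical "slow" task: summing a large array. The two halves are
// mutually independent, so one half can be handed to another core and
// the whole task finishes roughly when the other subsystems do.
long long slowTaskSplit(const std::vector<int>& data) {
    auto mid = data.begin() + data.size() / 2;
    auto lowerHalf = std::async(std::launch::async, [&] {
        return std::accumulate(data.begin(), mid, 0LL);       // another core
    });
    long long upperHalf = std::accumulate(mid, data.end(), 0LL);  // this core
    return lowerHalf.get() + upperHalf;
}

int main() {
    std::vector<int> data(1'000'000, 1);
    return slowTaskSplit(data) == 1'000'000 ? 0 : 1;
}
```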

Josh378 said:
Now, the peak performance of Cell is VERY high. Extremely impressive, and very exciting, but it will NOT hit peak performance MOST of the time. On average, a multiprocessor system operates at 40% efficiency.

This isn't true. At least, it depends on your application; it depends on your development expertise and experience. You can achieve almost linear speedups, it just... depends. It's not something that can be quantified in one number, though. Your mileage will vary.
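
To put rough numbers on "it just depends", reusing the serial-fraction bound sketched earlier in the thread (illustrative figures, not measurements):

```latex
s = 0.05,\ N = 8:\quad \frac{1}{0.05 + 0.95/8} \approx 5.9 \ \ (\approx 74\%\ \text{efficiency})
\qquad
s = 0.50,\ N = 8:\quad \frac{1}{0.50 + 0.50/8} \approx 1.8 \ \ (\approx 22\%\ \text{efficiency})
```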

Josh378 said:
See, this is why I'm so confused about CELL... because a parallel system is natively VERY good at handling a task like graphics, but now they are using nVidia to handle this? Like others have speculated, I could see nVidia being on board to help with the APIs, but IF nVidia is providing the GPU, then it stands to reason that CELL is NOT as powerful as we are led to believe: since the single game task that benefits the most from parallelism is being offloaded, CELL cannot be the end-all, be-all chip that the claims make it out to be.

The latest GPUs have almost as much floating-point power as an 8-APU Cell PE. A next-gen NVidia GPU will be a good match for Cell from a flops perspective, but it will be dedicated to that specific task, and thus that power will be more fully utilised than it probably would be in a Cell-based Sony GPU (NVidia are the experts here).

He should get used to this idea. All three next-gen systems are likely to go parallel, just perhaps to differing degrees. Computing in general is going parallel. Sure, it may require a shift in thinking with regard to how software is programmed and structured, but it's worthwhile for the performance you will gain. If you refuse to start thinking in concurrent terms, you'll fail to exploit increasing computational power going forward, since it'll mostly come from parallelism. And sure, it may be hard starting off for some, as it really does require you to think differently, but with time and experience, everyone will be thinking in parallel and concurrent terms. There is a nice Dr. Dobb's article on the whole shift towards concurrent programming out there if you're interested.
 
^^^ I would love that article, please (we're debating with an IBM engineer about CELL in conventional gaming; perhaps I can bring up some interesting points... of course, all credit will be given...)

-Josh378
 
^^^ Yep... luckily, the thread has been kept in check (moderators are constantly watching it) and flamewars are kept to a minimum... might be interesting if you added your input... I'm no IT/engineering expert (3D applications ARE my strength), but I do know enough of what's being discussed...

-Josh378
 
Shifty Geezer said:
1) A single-processor system will never reach the performance of a multi-CPU system. Even if a dual core is only 40% faster than a single core, there's no way to get a maxed-out single core to the same speed as a dual core.

Even if Cell isn't 100% efficient, it will still be faster overall than a single-processor design.

archie4oz said:
Who cares? As long as it can add, multiply, reciprocal, and shift some bits around, then any processor is good enough...

That, more or less, answers the question asked in this thread.

Sure, one can bring some more details, analysis, thoughts and facts to back this up, but that would just be repeating things that have already been discussed, something like ten times before, on these very same boards.

There are hundreds of pages of replies about Cell on this board, though, so a "search" could give you a headache... So...
 
Shifty Geezer said:
1) A single-processor system will never reach the performance of a multi-CPU system. Even if a dual core is only 40% faster than a single core, there's no way to get a maxed-out single core to the same speed as a dual core.

Even if Cell isn't 100% efficient, it will still be faster overall than a single-processor design.
I think this is a massive oversimplification of a generally highly complex problem, and as such I expect it is Not True.

I believe that there will be many tasks which a single core system, given that it can have the same transistor budget to play with as a given dual or multi-core processor, will run faster than the multi-core system.

Many tasks are inherently not parallelisable - hence the multiple cores do not help, and the transistors spent on the multiple cores go to waste.

Even in the case where a task is theoretically parallelisable you may run into situations where inter-process communication and data transfer are your bottleneck, rather than execution resources, in which case this may throttle the multiple cores back significantly. There might well even be cases where in restructuring a task to become parallel you increase the overall amount of data transfer required relative to running it serially. In this case the balance may also tip back towards the serial, single core. If memory and data were your bottlenecks then clearly you would have been better off spending transistors on the memory subsystem and caches, rather than more execution units.

Parallel processing is not a magic bullet.
 
Cell can be right, because it's an architecture.

Personally, I would prefer two 1x4 Cells over one 1x8 Cell, but either will likely be fine.
 
andypski said:
I think this is a massive oversimplification of a generally highly complex problem, and as such I expect it is Not True.

I believe that there will be many tasks which a single core system, given that it can have the same transistor budget to play with as a given dual or multi-core processor, will run faster than the multi-core system.

That'd actually be true if one could prove that a "complex" single core (read: a core that compares favorably to modern multi-cores in terms of raw power) can be designed in today's market. Intel told us they could; in the end, they can't...

Also, I assume that Shifty Geezer is talking in the context of a console CPU, where parallelism makes a lot of sense.
 
just some thoughts...straighten me out :D

I'm not sure I'm understanding part of his argument.

An engine needs to be 'linear', so I will assume this means that an engine must be one thread.

This is why it is asked whether one SPE can handle an engine faster than the competition can next round. Although I would think the answer is yes, I don't know if this analogy actually applies just yet.

The engine could still be one 'task' with many threads within it, no? So why is it not plausible that an SPE could handle any one of those threads?

Of course... dependencies, but this problem is not solved any better with single-core CPUs either. (I am being simplistic... a bit out of self-preservation, admittedly.) If computing z requires x, the wait for x to be computed is not removed from the scenario by using a single-core approach any more than a multi-core one. The difference, though, is the level of forward progress you could actually make towards the final goal.

With a single-core approach, no forward progress anywhere else can be made until x is computed, and furthermore, if x is being computed in another thread, a context switch must occur to that thread, x must be computed, and the thread that needs x has to get off the ready list and onto the CPU again to move forward in its code. In a multi-core situation, you could have the thread that generates the x value running on another core simultaneously. If it generates x quickly enough, the other thread would have no reason to halt its execution when x is asked for. If x hasn't been computed yet, there has still been simultaneous progress towards that end in the same amount of time, so x will be ready sooner rather than later, unlike with a single core, which lacks this capability. You could even have the x-generating thread signal the OS to put the sleeping thread that needed the value onto a core now, if that thread's execution was critically important to the entire task. Of course, you could do this with a single-core approach too, but without the benefit of forward progress on multiple fronts simultaneously it would be slower.
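
A hedged sketch of the multi-core case being described (computeX and doOtherWork are invented stand-ins): the x value is generated on another core while this core keeps making forward progress, and the consumer only stalls if x still isn't ready at the moment it is finally needed.

```cpp
#include <future>

// Hypothetical pieces of the scenario above: computeX() is the thread that
// generates the x value, doOtherWork() is progress that doesn't depend on x.
int computeX() { return 42; }
int doOtherWork() { return 7; }

int main() {
    // x is being generated on another core while this core keeps moving.
    std::future<int> x = std::async(std::launch::async, computeX);
    int other = doOtherWork();      // simultaneous forward progress
    int z = x.get() + other;        // only stalls here if x still isn't ready
    return z == 49 ? 0 : 1;
}
```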

I am being simplistic and could be wrong, but it seems thinking like this could make multi-core beneficial over single-core even in the face of the dependency problem. I'm not saying that big chunks of an engine could be handled in a multi-threaded fashion like this, with clear-cut threads for physics, AI, graphics, etc., because these things have layers of inter-dependency. I'm saying that you could use a primary thread of control in the task that handles everything and is essentially the 'major systems' in the body of the engine, and have it use other threads only as organs within those larger systems. The entire task would comprise the complete body of an engine.

I don't think physics can be encapsulated and run completely independently on a single core, due to the dependency problems with AI entities that can interact with physical objects, etc. However, portions of the physics system or AI system or whatever could be handled on a core to what I will assert (I should say believe, no?) is a satisfactory degree, and being able to run these subsystems simultaneously at a fast rate would be a valuable asset. I don't have nearly enough knowledge or expertise to lay out exactly where that line is, but I think it exists and could be taken advantage of.

I guess my point is that you shouldn't be forced to look at it in terms of putting an entire engine on one SPE and comparing performance to that of a PowerPC core in another system. I think it's better to ask whether the SPEs can handle the chunks of an engine given to them to process better than the competition's cores can, making the whole engine's execution faster.

Perhaps I'm wrong, but I don't think single-core architectures avoid dependency issues anyway. Something someone raised to me at IGN is that the Xbox2's CPU is to use 2 or 3 PPC cores and that each of these cores is to be multi-threaded (2-4 threads per core). I have some interesting questions about this possibility.

Would these cores also be able to perform at 4GHz?

Could the Xbox2's CPU get more done with its 4-12 threads than Cell could with its 9 threads per unit time, given similar tasks? This may require some discussion of the Xbox2's leaked design. But with its cores sharing caches vs. Cell's cores both sharing caches AND having local storage, is this plausible? Are all the cores equal, or are some for vertex processing, i.e. geometry, in the Xbox2's CPU? At a glance it appears Cell has shallow pipelines that allow it to function at such high speeds... wouldn't a PPC core that could handle up to 4 threads simultaneously, be they rendering or general-purpose tasks, be deep in comparison and thus slower?

Don't fillet me. I'm just trying to understand things better, like everyone else. :D
 
andypski said:
Shifty Geezer said:
1) A single-processor system will never reach the performance of a multi-CPU system. Even if a dual core is only 40% faster than a single core, there's no way to get a maxed-out single core to the same speed as a dual core.

Even if Cell isn't 100% efficient, it will still be faster overall than a single-processor design.
I think this is a massive oversimplification of a generally highly complex problem, and as such I expect it is Not True.

I believe that there will be many tasks which a single core system, given that it can have the same transistor budget to play with as a given dual or multi-core processor, will run faster than the multi-core system.
Could a single processor actually make use of that many transistors?! :oops: Where would the budget go? Loads of pipelines (parallelism)?

If a single processor solution with the same transistor count can achieve the same performance as parallel processing, why hasn't it been done? Though parallelism isn't the Golden Fleece solution, AFAIK a single core processor suffers design limitations that prevent it from making use of massive transistor counts. As we learn to develop larger, more complicated chips, making use of those transistors will require parallelism.

Many tasks are inherently not parallelisable - hence the multiple cores do not help, and the transistors spent on the multiple cores go to waste.
As I say though, a process may not be comfortably divisible between processors, but concurrent processes dealing with different areas are possible. The Cell diagrams have shown APUs/CELLs waiting for other APUs/CELLs to finish their task, so dependencies are an anticipated factor. But optimization could squeeze a lot from the system.
 
Vysez said:
andypski said:
I think this is a massive oversimplification of a generally highly complex problem, and as such I expect it is Not True.

I believe that there will be many tasks which a single core system, given that it can have the same transistor budget to play with as a given dual or multi-core processor, will run faster than the multi-core system.

That'd actually be true if one could prove that a "complex" single core (read: a core that compares favorably to modern multi-cores in terms of raw power) can be designed in today's market. Intel told us they could; in the end, they can't...

I believe that the statements that I made are true, in the sense that they were not specified in absolute terms. If you take an inherently serial task, and run it on a single core that is dedicated to running serial tasks quickly, and then you have a dual core with the same number of transistors running that same serial task then, other aspects of design being equal, I would expect the single core to win. I don't think that's a leap of faith that is too big to make - if it's a serial task, what are all those transistors on the second core doing?

Simply put, (utilisation lower) = (performance lower). A gross oversimplification, I know, but that's the point I'm trying to get across.

Problems with pushing clock speeds and higher transistor densities are a separate issue. If Intel choose to use their transistor budget on multiple cores to take advantage of their abilities in parallel tasks (of which many _do_ exist which can be accelerated this way) then that's fine. But with the same transistor budget I see no reason why they couldn't design a single core that would still outperform their dual core on serial tasks.

Besides, the market being addressed is different. See below.

Also, I assume that Shifty Geezer is talking in the context of a console CPU, where parallelism makes a lot of sense.
Why? Are consoles magic?

I would have said that in terms of multiple-core processors, where each core is an equal citizen, the PC market actually makes a lot of sense. In the PC space you have complex operating systems running multiple execution threads for entirely separate processes that may never need to synchronise or communicate. Bingo, big speedup in these cases by having multiple cores.

This is certainly a market in which CPU manufacturers are likely to target multiple cores, and with good reason, as given in the paragraph above. The benefits of multiple independent CPUs in such an environment are pretty well demonstrated over single cores - I run MPEG decode on one CPU to watch a DVD while browsing the internet on the other. These two tasks never need to synchronise or communicate, so I may be able to get good utilisation of my parallel resources with minimal effort.

Now Cell seems not to be quite the same thing - it's a main processor/dispatcher with a set of sub-processors designed for parallel work. It's not like a dual-core system, which might have multiple identical main processors running multiple independent tasks as described above; the target of Cell seems to be to accelerate parallel execution within a single task, such as graphics, where you might be able to run vertex shading on multiple vertices at once on different execution resources, for instance.

The problem is that if you're accelerating a single task then you probably _do_ need to synchronise, communicate and share data at some point. It's not a slam-dunk like multiple cores in a multi-tasking multi-threading OS. In some cases it may go massively faster, in some cases the parallel execution units may be woefully under-utilised, and you would have been better off with one blindingly fast execution unit.

Anyway, my purpose is not to argue that either multiple cores or single cores are 'superior' in some way. I was just pointing out that I don't believe the original statement...

"A single processor system will never reach the performance of a multi-CPU system."

...is one that is fundamentally true in any way.
 
The thing I've wondered about Cell is not how powerful it is, it's how difficult it is to develop for. I've been hearing all this stuff about how development times are getting longer each year, and development costs are rising, and it sounds like Cell processing is offloading a bit more work onto the software developers. To me, that doesn't sound like the right thing to do. I can easily be proven wrong, I'm sure, because I'm no master of CPU architectures. It just seems that if game companies are having a hard time harnessing the power of the Xbox, PS2 or PC and getting their games delivered on time, how will developers ever get those things accomplished on a system with many times the power and an even more complicated development? Is this something that can be solved with development tools offered by Sony/IBM?
 
Shifty Geezer said:
Could a single processor actually make use of that many transistors?! :oops: Where would the budget go? Loads of pipelines (parallelism)?
Quite easily for many tasks - memory caches are one obvious starting point, and these can speed up manipulation of large data sets massively.

Exploitation of implicit instruction-stream parallelism is another - fine-grained parallelism (per instruction), rather than coarse-grained (per task).
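
A minimal illustration of the distinction (invented function, and assuming a and b point at separate arrays): the two updates inside the loop have no dependency on each other, so a wide superscalar core can overlap them on its own; that's parallelism per instruction, not the task-level kind discussed above.

```cpp
// The two statements in the loop body are independent of each other, so a
// wide superscalar core can issue them in the same cycle without the
// programmer splitting anything into separate tasks or threads.
void scaleBoth(float* a, float* b, int n, float k) {
    for (int i = 0; i < n; ++i) {
        a[i] *= k;   // independent of the line below
        b[i] *= k;   // can execute alongside the line above
    }
}

int main() {
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
    scaleBoth(a, b, 4, 2.0f);
    return 0;
}
```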

If a single processor solution with the same transistor count can achieve the same performance as parallel processing, why hasn't it been done?
It has been for many years in many fields - why aren't all desktop processors massively parallel, if your statement is true? Why don't all desktop processors have a Cray-style massively parallel design, with vector processing units all over the place?

The answer is that for many years, running one task fast has been the important factor, and for many tasks in computing vector processing is of no practical use at all. The transistors used for all the vector units would hence have been a total waste of space and money for most people day to day.
 