Console CPUs?

ERP said:
Alejux said:
xbdestroya said:
Well I'm not talking in terms of flops per se, because in that arena I'm not looking at the PE in and of itself as a significant factor. However, the PEs are able to do some things that the SPEs, as specialized as they are, are really not equipped to handle. In this sense I'm looking at the abilities of these consoles beyond gaming, and perhaps in the games themselves. I should add, I'm not a dev or anything, so maybe I underestimate the number of tasks the SPEs can be utilized for. I just feel that though diminished in flops performance, there is a consolation-prize sort of advantage held by the multiple cores of the Xbox 2 by virtue of their (slightly) greater versatility.

I'm curious. What kind of computing CAN'T an SPE do? I mean, I know that it really shines in math-intensive applications, but otherwise, all processors basically do is very simple math and moving information from one place to another. So I'm curious what the real limitations of SPEs are in normal day-to-day applications, which are 99% of the time extremely non-CPU-intensive, with I/O as their only bottleneck.

The limitation is simple enough: an SPE can only run code and read data directly from its own local memory, so if you exceed 256 KB you have to break the task down, and if you can't do that then it won't run.

A trivial example might be an interpreted/GC'd language, most of which require a much larger footprint than 256 KB to run effectively.

If you're trying to hide DMA latency, the actual usable amount of memory is less than that (typically half for most simple solutions).
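The "typically half" figure comes from double-buffering: one half of the local store is being processed while the DMA engine fills the other half. A minimal sketch of the pattern, with a plain `memcpy` standing in for the SPE's asynchronous DMA (all names are illustrative, not Cell SDK calls, and the buffer size is shrunk for readability):

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per buffer; tiny for illustration -- a real local store is 256 KB */

/* Stand-in for an asynchronous DMA fetch. On a real SPE this would be
   kicked off and would complete in the background while compute continues. */
static void dma_fetch(float *local, const float *main_mem, size_t n)
{
    memcpy(local, main_mem, n * sizeof *local);
}

/* Stream `count` floats through two CHUNK-sized local buffers: while
   buffer `cur` is summed ("processed"), the next chunk is fetched into
   the other buffer. Only half the local store holds live data at once. */
float sum_streamed(const float *main_mem, size_t count)
{
    float local[2][CHUNK];
    float total = 0.0f;
    int cur = 0;
    size_t done = 0;
    size_t n = count < CHUNK ? count : CHUNK;

    dma_fetch(local[cur], main_mem, n);            /* prime the first buffer */
    while (done < count) {
        size_t left = count - done - n;
        size_t next_n = left < CHUNK ? left : CHUNK;
        if (next_n > 0)                            /* prefetch the next chunk */
            dma_fetch(local[cur ^ 1], main_mem + done + n, next_n);
        for (size_t i = 0; i < n; i++)             /* "compute" on the current chunk */
            total += local[cur][i];
        done += n;
        n = next_n;
        cur ^= 1;                                  /* swap buffers */
    }
    return total;
}
```

With real asynchronous DMA the fetch and the compute loop overlap in time; the structure is the same, which is why a simple double-buffered SPE task only ever sees half of its 256 KB as usable working space.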


I heard the SPEs will use a form of code overlay system to accept code larger than the local memory capacity (256 KB). It's in the patent, but I'm not sure how it will work.

But even if you discount that, the fact is that SPEs and CELL are all part of a completely new form of programming from what people are used to. Parallel programming is the future; there's no running away from it. So it's not that SPEs are limited, but more that they require a different programming model. A lot of people will have to start really rethinking their way of coding, if they want to evolve from now on.
 
OK, let's take it from this direction: trying to kind of reverse-engineer why Sony might make one decision vs. another in terms of SPEs and PEs.

What I want to know now is, is there anything the Power-derived cores in Xbox 2 would be BETTER at than the PS3's SPEs? Now, the obvious answer is yes. But I'm wondering what these things are, and how frequently they might come up. Is one PE enough to compensate for the possible frequency of these occurrences?
 
Alejux said:
ERP said:
Alejux said:
xbdestroya said:
Well I'm not talking in terms of flops per se, because in that arena I'm not looking at the PE in and of itself as a significant factor. However, the PEs are able to do some things that the SPEs, as specialized as they are, are really not equipped to handle. In this sense I'm looking at the abilities of these consoles beyond gaming, and perhaps in the games themselves. I should add, I'm not a dev or anything, so maybe I underestimate the number of tasks the SPEs can be utilized for. I just feel that though diminished in flops performance, there is a consolation-prize sort of advantage held by the multiple cores of the Xbox 2 by virtue of their (slightly) greater versatility.

I'm curious. What kind of computing CAN'T an SPE do? I mean, I know that it really shines in math-intensive applications, but otherwise, all processors basically do is very simple math and moving information from one place to another. So I'm curious what the real limitations of SPEs are in normal day-to-day applications, which are 99% of the time extremely non-CPU-intensive, with I/O as their only bottleneck.

The limitation is simple enough: an SPE can only run code and read data directly from its own local memory, so if you exceed 256 KB you have to break the task down, and if you can't do that then it won't run.

A trivial example might be an interpreted/GC'd language, most of which require a much larger footprint than 256 KB to run effectively.

If you're trying to hide DMA latency, the actual usable amount of memory is less than that (typically half for most simple solutions).


I heard the SPEs will use a form of code overlay system to accept code larger than the local memory capacity (256 KB). It's in the patent, but I'm not sure how it will work.

But even if you discount that, the fact is that SPEs and CELL are all part of a completely new form of programming from what people are used to. Parallel programming is the future; there's no running away from it. So it's not that SPEs are limited, but more that they require a different programming model. A lot of people will have to start really rethinking their way of coding, if they want to evolve from now on.


OK, I'll agree up to a point...

However, some tasks simply require random access to large data structures. You can implement these at a cost on an SPE-type architecture, but they won't be very efficient.

And given that some game teams already number in the hundreds of creators, and for the most part there is adequate power in the Power portion of the CPU (which has none of these restrictions), how quickly do you think that change will come?

I know what I expect, and that's a lot of single-threaded games which use the SPEs for graphics and little else. Someone will use all that power at some point, for something beyond making things pretty, but I don't believe that day is particularly close to launch.

Parallel architectures are coming; I'm not sure I know what form the most prolific ones will end up taking. They might be like Cell, or might be more like identical multiprocessor cores with local shared memories and the ability to connect to similar devices through a fast interconnect.

IMO Cell gives up a lot for its flop rating; the (rhetorical) question is, is that the right trade-off?

Parallel software requires a significantly different mindset, but the architecture itself can make it easier or harder to make the transition.
 
Alejux said:
But even if you discount that, the fact is that SPEs and CELL are all part of a completely new form of programming from what people are used to.

well, not really. different people are used to different things. for example, transputers have been known for, erm, a long time. and the transputer model does not try to pretend memory latencies do not exist - actually it takes them very seriously. so does the cell architecture - data-set access latencies are a serious factor in the computational model, so cell puts them into the equation (enter streaming), as compared to the classical cache-based cpus, where data access latencies are taken on pure faith.. or in the best case are a subject of obscure pagan rituals.

A lot of people will have to start really rethinking their way of coding, if they want to evolve from now on.

trust me, everybody who has been seriously into the computational field has been doing so way before sony decided to go cell. cell is not revolutionary -- aamof it arrives with a great delay. if you base your views on the wintel platform - don't, it's like judging the state of genetics by the quality of the cabbage at the grocery store.
 
darkblu said:
A lot of people will have to start really rethinking their way of coding, if they want to evolve from now on.

trust me, everybody who has been seriously into the computational field has been doing so way before sony decided to go cell. cell is not revolutionary -- aamof it arrives with a great delay. if you base your views on the wintel platform - don't, it's like judging the state of genetics by the quality of the cabbage at the grocery store.

Of course I'm basing it on wintel development! I'm aware that parallel programming is not something new, but until now, it was only used by a minority of programmers. Now comes a time, when the majority will have to adopt it.

The problem is, I really don't think that the majority of programmers are equipped to deal with parallel programming and all the problems that come with it.

Maybe I'm crazy, but what I would really like to see is some clean and revolutionary new programming model, oriented towards massive parallelization. Sort of like a new step: [structured programming] => [OOP] => [?] .
 
While we're talking about that, can anyone tell me what the general-purpose code performance of GPUs should be if, like Xe is supposed to do, they can write to memory :?: , and how that compares to CELL... :?:


PS: I know from things like
Scout has achieved improved computational rates that are roughly 20 times faster than a 3-GHz Intel Xeon EM64T processor without the use of streaming SIMD extensions, and approximately four times faster than SIMD-enabled, fully optimized code
that they can do great things, but I would like to hear your views.

http://www.eet.com/in_focus/silicon_engineering/showArticle.jhtml?articleID=55300898

full article

http://www.eet.com/showArticle.jhtml?articleID=55300904
 
Alejux said:
darkblu said:
A lot of people will have to start really rethinking their way of coding, if they want to evolve from now on.

trust me, everybody who has been seriously into the computational field has been doing so way before sony decided to go cell. cell is not revolutionary -- aamof it arrives with a great delay. if you base your views on the wintel platform - don't, it's like judging the state of genetics by the quality of the cabbage at the grocery store.

...
Maybe I'm crazy, but what I would really like to see is some clean and revolutionary new programming model, oriented towards massive parallelization. Sort of like a new step: [structured programming] => [OOP] => [?] .

A golden oldie, i.e. [Functional programming], e.g. Scheme, LISP, etc.?

Well, the CELL patents mention a 'new programming model'...hmm...

And in console land, devs love their assembly programming and laugh in the face of OOP! :devilish:
 
Just a quick question: anyone else hearing that there won't be a dedicated sound chip and that the Cell chip will handle the sound?
 
jvd said:
Just a quick question: anyone else hearing that there won't be a dedicated sound chip and that the Cell chip will handle the sound?

I've heard that - not too surprising. In fact, Cell is supposed to be awesome at sound processing.

Rumor is in fact that NVidia may be looking to harness GPU power with their next go at PC audio solutions.
 
jvd said:
Just a quick question: anyone else hearing that there won't be a dedicated sound chip and that the Cell chip will handle the sound?
Aren't you confusing it with the leaked Xenon document?
 
Just because I want them to be different, I expect the new Nintendo system to have only a single Power-based CPU (slower than what MS and Sony are using) with a ton of cache, along with a tile-based renderer (yeah right) graphics chip. Actually, I'd more expect something based off of/similar to Flipper: a GPU with lots of hardwired functions but less raw performance. The chip will perform in the same league as what's in the Xbox 2 and PS3, but think of it as the midrange $200 version of a $500 video card.

Xbox 2, I expect one dual-core CPU below 3 GHz, and something very close to ATI's next-gen part, but modified in some ways. I would think pixel/vertex units pumped up, while fillrate is cut. That, or a dual-core graphics chip.

PS3, I expect a fillrate/pixel-shading monster of a GPU, with a CPU that's really good for vertex shading but otherwise is inferior to what the Xbox 2 has.

I expect Xbox 2's more PC-designed CPUs (i.e., they like branching code) to be the performance winners this gen.
 
ERP said:
I know what I expect, and that's a lot of single-threaded games which use the SPEs for graphics and little else. Someone will use all that power at some point, for something beyond making things pretty, but I don't believe that day is particularly close to launch.
Well I figure as much too, but just a question to throw in the mix...
what happens in the hypothetical scenario where the PS3 GPU has its own vertex shading? Would people still stick to stuffing more graphics work onto the SPEs, or...
 
one said:
jvd said:
Just a quick question: anyone else hearing that there won't be a dedicated sound chip and that the Cell chip will handle the sound?
Aren't you confusing it with the leaked Xenon document?

No, a person told me in a PM. Dunno if it's true, so I asked.
 
Alejux said:
<snip>... the fact is that SPEs and CELL are all part of a completely new form of programming from what people are used to. Parallel programming is the future; there's no running away from it. So it's not that SPEs are limited, but more that they require a different programming model. A lot of people will have to start really rethinking their way of coding, if they want to evolve from now on.

Well, all processors are parallel processors if you have more than one in a system. The truth is that an SPE is a packet processor: it accepts a chunk of data, processes it and spits it out. While it's processing one chunk of data, it can fetch the next one at the same time.

The question is: how useful is that? The answer is: it depends.

Fairly useful for dense vector and matrix operations and other workloads you can divide into equal-sized chunks, like video encoding/decoding, vertex shading and image processing.

And pretty useless for anything else. Workloads that require fine grain memory access, workloads with dependent memory accesses and workloads with dynamic data structures will run like a pig on SPEs.

The only thing about the SPEs that indicates they are meant for parallel systems is the fact that they are simple and dumb, hence small (for the computational oomph that they pack), and hence you can pack a lot of them on a die.
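The "equal-sized chunks" point above is the whole trick: workloads like video frames or vertex arrays parallelize across SPEs precisely because they carve cleanly into uniform packets. A sketch of just the carving step (a hypothetical helper, not any real SDK API):

```c
#include <stddef.h>

/* One packet of work: a contiguous slice of a larger array,
   small enough to DMA into a single SPE's local store. */
typedef struct { size_t start, count; } packet_t;

/* Split n items into at most n_spes equal-sized packets (the last one
   takes the remainder). Returns the number of packets written to out[]. */
size_t split_work(size_t n, size_t n_spes, packet_t out[])
{
    if (n == 0 || n_spes == 0)
        return 0;
    size_t per = (n + n_spes - 1) / n_spes;   /* ceil(n / n_spes) */
    size_t count = 0, start = 0;
    while (start < n) {
        size_t take = (n - start) < per ? (n - start) : per;
        out[count].start = start;
        out[count].count = take;
        start += take;
        count++;
    }
    return count;
}
```

Each packet is independent, so every SPE can fetch, process and write back its own slice with no communication beyond the final join, which is exactly the shape of workload these cores reward.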

Cheers
Gubbi
 
Gubbi said:
And pretty useless for anything else. Workloads that require fine grain memory access, workloads with dependent memory accesses and workloads with dynamic data structures will run like a pig on SPEs.
Gubbi
Sorry, but I disagree. What you say is true, but IMHO a lot of the stuff that would run like a pig on an SPE might be transformed into something that would run fine. Something like dependent memory accesses can be grouped and phased, scattered memory accesses can sometimes be streamlined, and so on..
Don't get me wrong, SPEs will never be as efficient as general-purpose processors, but tasks can be customized and tailored to run nicely even on those small and dumb processors :)
 
Gubbi said:
And pretty useless for anything else. Workloads that require fine grain memory access, workloads with dependent memory accesses and workloads with dynamic data structures will run like a pig on SPEs.
That may not necessarily be true - if PS3's Cell had the implementation of caching DMA transfers (from one of those patents) through, say, the L2, things wouldn't necessarily look quite as bad for random accesses either.

IIRC none of the ISSCC Cell articles had info about these kinds of details, but I believe there was at least a mention of SPEs talking with the L2 cache.

Speaking of which, with memory 400-500cycles away on these new consoles, wouldn't anything that goes out of L2 a lot hurt the in-order PPC cores practically just as much as SPEs?
 
nAo said:
Gubbi said:
And pretty useless for anything else. Workloads that require fine grain memory access, workloads with dependent memory accesses and workloads with dynamic data structures will run like a pig on SPEs.
Gubbi
Sorry, but I disagree. What you say is true, but IMHO a lot of the stuff that would run like a pig on an SPE might be transformed into something that would run fine. Something like dependent memory accesses can be grouped and phased.
Care to elaborate? Can it be generalized? My point is that if you use one level of indirection you lose spatial locality, which the SPEs need badly.

nAo said:
scattered memory accesses can sometimes be streamlined, and so on..
That is already being done in many cases; modern CPUs load and store data in cachelines, after all, so a big boost can be gained by arranging things to be spatially close.

You just need to do that on a whole new level with SPEs.
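One concrete way of "arranging things to be spatially close" is to sort an index list before gathering through it, so the walk over memory becomes monotonic and consecutive reads share cachelines (or, on an SPE, can be coalesced into fewer, larger DMA transfers). A toy sketch, with illustrative names:

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison function for qsort over size_t indices. */
static int cmp_idx(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Instead of gathering data[idx[i]] in whatever order the indices
   arrive, sort the index list first: the walk through `data` becomes
   monotonic, so neighbouring reads fall into the same cacheline (or
   can be batched into one large DMA transfer on an SPE). The sum is
   the same either way; only the access pattern changes. */
float gather_sum_sorted(const float *data, size_t *idx, size_t n)
{
    qsort(idx, n, sizeof *idx, cmp_idx);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}
```

This only pays off when the result doesn't depend on visitation order (or when you carry the original position along with each index), which is part of why it can't be applied blindly.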

I agree that we disagree though :)

Cheers
Gubbi
 
Fafalada said:
Gubbi said:
And pretty useless for anything else. Workloads that require fine grain memory access, workloads with dependent memory accesses and workloads with dynamic data structures will run like a pig on SPEs.
That may not necessarily be true - if PS3's Cell had the implementation of caching DMA transfers (from one of those patents) through, say, the L2, things wouldn't necessarily look quite as bad for random accesses either.

Reading data from the L2 would be automatic, since the L2 would snoop all memory transactions (and hence what the DMA engine does) and could serve data that it has cached. How the SPE's (the DMA engine) would place data in the L2 I have no idea.

Fafalada said:
Speaking of which, with memory 400-500cycles away on these new consoles, wouldn't anything that goes out of L2 a lot hurt the in-order PPC cores practically just as much as SPEs?

Yes, and out-of-order CPUs for that matter as well (i.e. the P4's OOO capabilities only just cover the latency from its L2/L3 cache).

Cache is essential to the PPE. The fundamental difference between the SPE and the PPE is that you have to explicitly control data flow on the SPE, whereas the PPE's demand-loaded caches will exploit any temporal locality there might be for you.

That, and the overhead of setting up a DMA transfer and getting it executed, compared to a ld r0,r1;

Cheers
Gubbi
 
Fafalada said:
Speaking of which, with memory 400-500cycles away on these new consoles, wouldn't anything that goes out of L2 a lot hurt the in-order PPC cores practically just as much as SPEs?

Yep, these days you might as well consider that L2 cache is all the memory they can see.

SPEs force you to code well (by manually DMAing into local RAM), whereas conventional processor architectures let you code really badly and then moan about where all the performance has gone.

I expect the '90s and the first half of the 2000s will be seen as the 'golden age' of programming. It was easy, the hardware did all the work for you (OOOE, branch prediction, ILP architectures, etc.) and you could just make your code 'pretty' (abstraction and software engineering seen as more important than speed).

That era is over. Across the board (PC and console) we now place a premium on programmers who understand how to code in multi-processor, long-memory-latency and ugly (swizzled, no pointers, etc.) environments.
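A "no pointers" environment in practice often means index-based data structures: nodes reference each other by index into a flat pool, so the whole pool can be DMA'd (or memcpy'd) to a different address and the links stay valid with no pointer swizzling at all. A minimal sketch (types and names are illustrative):

```c
#include <stdint.h>
#include <string.h>

#define NIL 0xFFFF  /* sentinel index meaning "no next node" */

/* A pointer-free linked list node: `next` is an index into a flat
   pool rather than an address, so the structure is fully relocatable
   -- the pool can be DMA'd into an SPE's local store as-is. */
typedef struct {
    float    value;
    uint16_t next;   /* index of next node in the pool, or NIL */
} node_t;

/* Walk the list starting at `head`, summing values. Works identically
   on the original pool and on any byte-for-byte copy of it. */
float list_sum(const node_t *pool, uint16_t head)
{
    float sum = 0.0f;
    for (uint16_t i = head; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}
```

The trade-off is an extra indexing step on every traversal, but in exchange the structure can live anywhere: main memory, local store, or a save file.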

Of course the bigger question is how programming becoming harder affects the game (and other software) industry. At the moment you might have a team of 20 programmers but only 5 low-level programmers; do we have to re-train 3/4 of the team, or do we simply accept that most of the code isn't running efficiently? This generation probably isn't the crunch one; the main cores will be fast enough that most game teams will continue fairly normally (though even the highest-level programmers are going to have to consider the L2 cache). But what about next-next gen, when we start seeing 30-40 processors and main memory latency in the thousands of cycles...
 
Gubbi said:
Care to elaborate? Can it be generalized? My point is that if you use one level of indirection you lose spatial locality, which the SPEs need badly.
Obviously it can't be generalized, otherwise I'd be famous by now, LOL ;)
In a game, a programmer often knows in advance how the application is going to hit memory (how many times, with a specific pattern, etc..),
and data can be re-organized to exploit data locality.
SPEs, according to some patents, have a simple automatic prefetch mechanism. If you store your data carefully, the first indirection can be slow,
but the following indirections can be much less costly, and if you can schedule enough work to do before using that first indirection, you're pretty much done.
Sometimes you'd want to swap nested loops to maximize the use of local data.
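That loop swap can be shown with toy data, where a single float stands in for a whole collision mesh and a `fetches` counter stands in for DMA traffic (everything here is illustrative):

```c
#include <stddef.h>

typedef struct { int fetches; } stats_t;  /* counts simulated "DMA" loads */

/* Naive order: for each query, loop over every mesh -- so each mesh
   gets re-fetched into local store once per query. */
float run_naive(const float *meshes, size_t n_meshes,
                const float *queries, size_t n_queries, stats_t *st)
{
    float acc = 0.0f;
    for (size_t q = 0; q < n_queries; q++)
        for (size_t m = 0; m < n_meshes; m++) {
            st->fetches++;                   /* mesh m "DMA'd" in again */
            acc += meshes[m] * queries[q];   /* stand-in for a query test */
        }
    return acc;
}

/* Interchanged order: each mesh is fetched once, then every query
   runs against it while it sits in local store. Same result, far
   fewer transfers. */
float run_interchanged(const float *meshes, size_t n_meshes,
                       const float *queries, size_t n_queries, stats_t *st)
{
    float acc = 0.0f;
    for (size_t m = 0; m < n_meshes; m++) {
        st->fetches++;                       /* mesh m "DMA'd" in once */
        for (size_t q = 0; q < n_queries; q++)
            acc += meshes[m] * queries[q];
    }
    return acc;
}
```

The fetch count drops from n_meshes × n_queries to n_meshes; the interchange is only valid here because the per-pair work is independent of iteration order.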
As an example, imagine a simple collision system that accepts some kind of query (is this ray hitting something? when? where? how?).
Usually you make the query to such a collision subsystem, and once the system comes back with the answers you need, you can go on and make some decision (like destroying an object or making other queries).
This kind of stuff will run like a pig (I like to say that ;) ) on an SPE, but what if you change the way you see a collision subsystem?
You could make a collision engine that collects all the queries for a given frame/pass. Each query is pre-spatially sorted and then processed, re-using a great deal of data (like collision meshes..) across different queries. In a subsequent pass all the results are retired and processed, and this process can be iterated multiple times per frame. This would run much faster on an SPE.
I believe this is something very similar to what ATI does with their current dependent texture mapping implementation (it was very clear from PS 1.4).
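The deferred, sorted query pass described above might look like the sketch below, where one float per spatial cell stands in for real collision data and all the names are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_QUERIES 256

/* One queued ray query. Instead of being answered immediately, it is
   collected for the frame and answered during a batch pass. */
typedef struct {
    int   cell;   /* spatial cell the ray starts in */
    int   id;     /* caller's handle, used to retire the result later */
    float t;      /* result: "hit distance", filled in by the pass */
} ray_query_t;

typedef struct {
    ray_query_t q[MAX_QUERIES];
    size_t      count;
} query_batch_t;

/* Sort queries by spatial cell so queries touching the same region
   sit next to each other and reuse the same locally-loaded data. */
static int cmp_cell(const void *a, const void *b)
{
    const ray_query_t *x = a, *y = b;
    return (x->cell > y->cell) - (x->cell < y->cell);
}

/* Queue a query for this frame. Returns 0 on success, -1 if full. */
int submit_query(query_batch_t *b, int cell, int id)
{
    if (b->count >= MAX_QUERIES)
        return -1;
    b->q[b->count].cell = cell;
    b->q[b->count].id   = id;
    b->count++;
    return 0;
}

/* Answer the whole batch in one pass: after sorting, the per-cell
   collision data (here just one float per cell) is effectively loaded
   once per group of queries rather than once per query. */
void process_batch(query_batch_t *b, const float *cell_data)
{
    qsort(b->q, b->count, sizeof b->q[0], cmp_cell);
    for (size_t i = 0; i < b->count; i++)
        b->q[i].t = cell_data[b->q[i].cell];  /* stand-in for a real ray test */
}
```

Game code then retires results by `id` in a second pass, which is exactly the restructuring Marco describes: decisions are deferred until the batch comes back, in exchange for streaming-friendly access.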

Gubbi said:
You just need to do that on a whole new level with SPEs.
Yeah, that's true. Nothing that I'm afraid of ;)

ciao,
Marco
 