Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

aaronspink · Nov 22, 2005

Titanio said:
On SPUs accessing other LS - can anyone confirm the PPEs role here, if any? Can't one SPU put something on the EIB, and another pick it up?

All data into and out of the SPU is done via DMA engines. To effect data movement from one SPU to another SPU, it is required that the LS portion of SPU2 is memory mapped into an effective address in the system address space. Then a DMA descriptor is constructed that will DMA from the SPU1 LS into this effective address.

It is the same method that would be used to move something from the SPU1 LS to main memory. The difference is that the data will effectively bounce to SPU2's Local store.

I may have the direction reversed, though: I can't remember whether the SPU-to-SPU DMA case supports push, pull, or both via the DMA mapping, and quite frankly, I'm too lazy to look it up again.

There is no direct method of data movement from one SPU's LS to another SPU's LS.
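To make the mapping idea concrete, here is a toy model of the scheme described above. It is only a sketch: the class names, the effective addresses, and the `dma_put` helper are made up for illustration and are not the real Cell SDK interface. Only the push direction is modelled, since which directions are supported is left open above.

```python
from dataclasses import dataclass

LS_SIZE = 256 * 1024  # an SPU local store is 256 KB


@dataclass
class Region:
    """A span of the effective address space backed by a byte buffer."""
    base: int
    backing: bytearray
    name: str


class EffectiveAddressSpace:
    """Toy system address map: main memory plus memory-mapped local stores."""

    def __init__(self):
        self.regions = []

    def map(self, base, backing, name):
        self.regions.append(Region(base, backing, name))

    def resolve(self, ea, size):
        # Find the region that fully contains [ea, ea + size).
        for r in self.regions:
            if r.base <= ea and ea + size <= r.base + len(r.backing):
                return r, ea - r.base
        raise ValueError(f"unmapped effective address {ea:#x}")


class SPU:
    def __init__(self, name, eas):
        self.name = name
        self.ls = bytearray(LS_SIZE)
        self.eas = eas

    def dma_put(self, ls_offset, ea, size):
        """Copy from this SPU's local store to an effective address."""
        if ls_offset < 0 or ls_offset + size > len(self.ls):
            raise ValueError("transfer falls outside the local store")
        region, off = self.eas.resolve(ea, size)
        region.backing[off:off + size] = self.ls[ls_offset:ls_offset + size]
        return region.name


# Hypothetical address layout, chosen only for the example.
eas = EffectiveAddressSpace()
main_memory = bytearray(1024)
eas.map(0x0000_0000, main_memory, "main memory")
spu1 = SPU("SPU1", eas)
spu2 = SPU("SPU2", eas)
eas.map(0x1000_0000, spu2.ls, "SPU2 LS")  # SPU2's LS is now addressable

spu1.ls[0:5] = b"hello"
to_memory = spu1.dma_put(0, 0x0000_0100, 5)  # lands in main memory
to_spu2 = spu1.dma_put(0, 0x1000_0040, 5)    # same call, lands in SPU2's LS
```

The point of the sketch is that SPU1 issues the same kind of transfer in both cases; only the effective address decides whether the bytes end up in main memory or in SPU2's local store.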

Aaron Spink
speaking for myself inc.

aaronspink · Nov 22, 2005

SynapticSignal said:
I'm not a CPU expert,
but isn't the fact that the SPEs lack branching entirely, are in-order, and don't have a cache related to efficiency and real-world speed?

SPUs DO have branching. What they lack is any kind of branch prediction logic.
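One common way to cope with a core that has no branch prediction is to compute both outcomes and pick one with a bit mask instead of jumping. The sketch below shows that idea on 32-bit unsigned values; the function names are invented for the example, and Python is used only to show the logic, not the performance.

```python
MASK32 = 0xFFFFFFFF  # work on 32-bit unsigned values


def select_branchy(cond, a, b):
    """Conventional version: the result depends on a conditional jump."""
    if cond:
        return a
    return b


def select_branchless(cond, a, b):
    """Mask version: all ones when cond is true, all zeros otherwise."""
    mask = -int(bool(cond)) & MASK32
    return (a & mask) | (b & ~mask & MASK32)


def clamp_min_branchless(x, floor):
    """Return max(x, floor) without a conditional jump on the data."""
    return select_branchless(x < floor, floor, x)
```

On real hardware the comparison itself would produce the mask, so the whole select becomes straight-line code with nothing to mispredict.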

So why does Gabell have regrets, and why does Carmack talk of Cell programming as a "pain in the ass"?

Because it is a pain in the ass just like any other special purpose processor. XeCPU has some of the same issues but just not to the same degree.

Both are in-order processors with all the scheduling issues that implies. Both have poor single-thread performance. Both will require a significant amount of re-architecting and re-designing of actual code, programming models, and algorithms to get working at high speed.
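As one small illustration of the kind of restructuring meant here: an in-order core stalls when each instruction needs the result of the one before it, so a typical rewrite splits one long dependency chain into several independent ones. The sketch below shows only the shape of that transformation; the function names are made up, and Python will not show the speedup an in-order CPU would see.

```python
def sum_single_chain(values):
    """Every add depends on the previous one: a single serial chain."""
    total = 0
    for v in values:
        total += v
    return total


def sum_four_chains(values):
    """Four independent accumulators; the adds in one step do not depend on each other."""
    acc0 = acc1 = acc2 = acc3 = 0
    n = len(values) - len(values) % 4  # largest multiple of 4 that fits
    for i in range(0, n, 4):
        acc0 += values[i]
        acc1 += values[i + 1]
        acc2 += values[i + 2]
        acc3 += values[i + 3]
    tail = sum(values[n:])  # leftover elements, at most three
    return (acc0 + acc1) + (acc2 + acc3) + tail
```

With integers both versions give the same answer; with floating point the different summation order can change the rounding, which is one reason a compiler may not make this change on its own.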

Aaron Spink
speaking for myself inc.

mckmas8808 · Nov 22, 2005

Nice way to make the CELL processor look like an everyday, just-okay kind of processor with nothing that special about it.

Shifty Geezer · Nov 22, 2005

I think aaronspink's made some worthwhile points. He's obviously less optimistic than I am. Unfortunately he wrote so much I can't afford the time needed to reply pointwise.

SubD · Nov 22, 2005

aaronspink said:
Um, from all the documentation available, this isn't true. The SPUs do have limited autonomy but must be configured and set up via the PPE in order to run any code. The SPUs are not Turing complete.

Aaron Spink
speaking for myself inc.

Yes, just like x86 chips aren't Turing complete because they need to have a BIOS configure and set them up to run any code.

ShootMyMonkey · Nov 22, 2005

You could call the SPE a hardware engineer's revenge on programmers for being so lazy all these years and soaking up everything that Moore's law has so far provided...

I can't recall the last time I heard of hardware engineers giving a crap about programmers' lives in the first place. In fact, I'd even think the marketing guys care more. Maybe I'm just too cynical...

IME "modern" compilers do a totally crappy job at hiding instruction latency on both X360 and PS3.

Not having seen anything on the PS3, I can at least say about the 360 compiler that it reeks of not having been updated much at all since the last time Microsoft did a PowerPC compiler. Pretty much every new VC feature that has come about since then doesn't exist (even though many of them never worked that well in the first place *cough*edit&continue*cough*).

I can't help but think when people say general purpose code they are talking about legacy previously written code.

To me, "general purpose" code is going to be the stuff that relates to moving data around (as opposed to computing something with it). Juggling pointers, constructing messages, callbacks, copying buffers, allocating/deallocating, transferring data between devices, etc. The CPU is not going to be the lone determinant of performance here, but the fact of the matter is it will definitely not be great on next-gen consoles.

Yes, just like x86 chips aren't Turing complete because they need to have a BIOS configure and set them up to run any code.

Ummm... you do realize he's talking about more than that, right? The SPEs are at least claimed to be Turing complete, but that's different from saying their role puts them in a position to exercise that. Going by the ISA alone, I wouldn't be averse to thinking that they might be Turing complete by themselves, packaged as a stand-alone chip. However, within the CELL, they're kind of cut off, which strips them of that completeness distinction.

Moreover, that's pretty much as it should be. I can say brainf*ck as a language is Turing complete, but that doesn't mean it's fully ready and able for Win32 apps. Same thing with the SPEs -- they might be capable of anything, but that doesn't mean that they're even halfway decent at anything other than streaming vector computations.

aaaaa00 · Nov 22, 2005

ShootMyMonkey said:
Not having seen anything on the PS3, I can at least say about the 360 compiler that it reeks of not having been updated much at all since the last time Microsoft did a PowerPC compiler. Pretty much every new VC feature that has come about since then doesn't exist (even though many of them never worked that well in the first place *cough*edit&continue*cough*).

I think they mostly concentrated on the code generation back-end and the optimizer for the 360 PowerPC compiler.

Productivity features are nice to have, but I'm pretty sure they would want a good optimizer first.

one · Nov 22, 2005

http://www.reed-electronics.com/electronicnews/article/CA6282722

Jim Kahle, IBM fellow and chief technologist for the cell processor, said the likely uses of the chip will be in aerospace and defense, life sciences -- particularly bioinformatics, which uses multiple processors for searching through large volumes of data -- and in security.

"Our twist is that we brought in technology from supercomputers," said Kahle. "That includes real-time computing and real-time operating systems and Linux."

In the past, IBM has subsidized software development to build critical mass where there was none, most notably with its OS/2 operating system. Kahle said the current approach is to collaborate with developers and give guidance where necessary.

How that equates into market acceptance remains to be seen, however. Kevin Krewell, editor in chief of the Microprocessor Report, said the software developers kit is enormously complicated. "This is a do-it-yourself platform for hard-core code writers," he said. "It takes more than a day to download. It's still not ready for prime time as an integrated tool environment."

Until that happens, the processor will be rolled out for specific purposes, and parts of it will be used in other products such as Microsoft's Xbox 360, due to hit the market later this month. "What they did was take part of the design for the cell processor and create something specifically for Microsoft," said Krewell.

mckmas8808 · Nov 22, 2005

one said:
http://www.reed-electronics.com/electronicnews/article/CA6282722

Hmmm... take part of the CELL processor eh?

Titanio · Nov 22, 2005

mckmas8808 said:
Hmmm... take part of the CELL processor eh?

The PPE and Xenon cores are supposed to be almost identical, we've known this for some time.

mckmas8808 · Nov 22, 2005

Titanio said:
The PPE and Xenon cores are supposed to be almost identical, we've known this for some time.

Yeah, I know, just having a little fun.

MrWibble · Nov 22, 2005

aaronspink said:
All data into and out of the SPU is done via DMA engines. To effect data movement from one SPU to another SPU, it is required that the LS portion of SPU2 is memory mapped into an effective address in the system address space. Then a DMA descriptor is constructed that will DMA from the SPU1 LS into this effective address.

It is the same method that would be used to move something from the SPU1 LS to main memory. The difference is that the data will effectively bounce to SPU2's Local store.

I may have the direction reversed, though: I can't remember whether the SPU-to-SPU DMA case supports push, pull, or both via the DMA mapping, and quite frankly, I'm too lazy to look it up again.

There is no direct method of data movement from one SPU's LS to another SPU's LS.

The question, I believe, was whether the SPUs are independent of the PPE or not. Yes, they have to get data to and from RAM or other SPUs using DMA. However, DMA controllers are built into every SPU. So in what sense does the PPE have to be involved?

I'd consider the ability to DMA data from place to place as fairly "direct".

ERP · Nov 22, 2005

Titanio said:
The PPE and Xenon cores are supposed to be almost identical, we've known this for some time.

FWIW - They have a lot of similarities, but they are far from identical.

DeanoC · Nov 22, 2005

ERP said:
FWIW - They have a lot of similarities, but they are far from identical.

Indeed, Cell has been undergoing revisions while the XeCPU core was being mass-produced for X360. So this urban myth that they are identical is wrong. Maybe they once shared a common ancestor, but then so do dogs and humans...

Panajev2001a · Nov 22, 2005

DeanoC said:
Indeed, Cell has been undergoing revisions while the XeCPU core was being mass-produced for X360. So this urban myth that they are identical is wrong. Maybe they once shared a common ancestor, but then so do dogs and humans...

The question is: aside from VMX-128 versus regular VMX, is the fetch/decode/dual-issue (dual nested queues) part of the core still SO very similar between, say, the DD2 PPE and XeCPU's cores, as far as the functional-block-level and pipeline descriptions of both cores tell us (through IBM's papers and presentations about each of the two types of cores)? Or is the DD2 PPE that different even there, so that we have to go back to the largely unknown DD1 PPE to see more similarities?

Shifty Geezer · Nov 22, 2005

Edit: Removed Inquirer jibe.

DeanoC · Nov 22, 2005

Shifty: Cut that crap out... It's funny for a second, until it starts bouncing around the net and a comment that I didn't intend like that gets me in trouble.

For the record: I meant no disrespect for either processor by the evolution analogy. I just can't spell chimpanzee so dog seemed easier...

Shifty Geezer · Nov 22, 2005

Sorry, no harm meant. Of course we all appreciate you didn't mean it like that at all. I'll remove it.

ERP · Nov 22, 2005

Panajev2001a said:
The question is: aside from VMX-128 versus regular VMX, is the fetch/decode/dual-issue (dual nested queues) part of the core still SO very similar between, say, the DD2 PPE and XeCPU's cores, as far as the functional-block-level and pipeline descriptions of both cores tell us (through IBM's papers and presentations about each of the two types of cores)? Or is the DD2 PPE that different even there, so that we have to go back to the largely unknown DD1 PPE to see more similarities?

I've benchmarked both, and in many tasks there is a significant per-core, clock-for-clock performance difference that I do not believe can be explained by the compiler difference.

AFAIK DD2 is closer to the Xenon cores than DD1.

nAo · Nov 22, 2005

ERP said:
I've benchmarked both, and in many tasks there is a significant per-core, clock-for-clock performance difference that I do not believe can be explained by the compiler difference.

AFAIK DD2 is closer to the Xenon cores than DD1.

Time has passed, seasons change... have you benchmarked the supernoisy thing recently?
