Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Titanio said:
On SPUs accessing other LS - can anyone confirm the PPE's role here, if any? Can't one SPU put something on the EIB, and another pick it up?

All data into and out of the SPU is moved via DMA engines. To effect data movement from one SPU to another SPU, the LS portion of SPU2 must be memory mapped to an effective address in the system address space. Then a DMA descriptor is constructed that will DMA from the SPU1 LS into this effective address.

It is the same method that would be used to move something from the SPU1 LS to main memory. The difference is that the data will effectively bounce to SPU2's local store.

Though I may have it reversed: I can't remember whether, in the SPU-to-SPU DMA case, push, pull, or both are supported via the DMA mapping, and quite frankly, I'm too lazy to look it up again.

There is no direct method of data movement from one SPU's LS to another SPU's LS.
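As a rough illustration of the mechanism described above, here is a toy Python model, not real Cell code. Every name in it (`AddressSpace`, `dma_put`, `SPU2_LS_BASE`, the chosen addresses) is hypothetical, and it arbitrarily models the push direction, since which directions are supported is left open above. The only point it makes is that the same DMA operation lands in main memory or in SPU2's local store depending purely on where the effective address is mapped.

```python
# Toy model only: none of these names correspond to a real Cell SDK API.

LS_SIZE = 256 * 1024            # each SPU local store is 256 KB
MAIN_MEMORY_SIZE = 1024 * 1024  # arbitrary size chosen for this sketch
SPU2_LS_BASE = 0x4000_0000      # assumed effective address of SPU2's mapped LS


class AddressSpace:
    """Maps effective-address ranges onto backing byte buffers."""

    def __init__(self):
        self.regions = []  # list of (base_address, buffer) pairs

    def map(self, base, buffer):
        self.regions.append((base, buffer))

    def resolve(self, ea, length):
        """Return (buffer, offset) for an effective address range."""
        for base, buf in self.regions:
            if base <= ea and ea + length <= base + len(buf):
                return buf, ea - base
        raise ValueError(f"effective address {ea:#x} is not mapped")


def dma_put(space, src_ls, src_offset, dest_ea, length):
    """Copy `length` bytes from a local store to an effective address."""
    buf, offset = space.resolve(dest_ea, length)
    buf[offset:offset + length] = src_ls[src_offset:src_offset + length]


main_memory = bytearray(MAIN_MEMORY_SIZE)
spu1_ls = bytearray(LS_SIZE)
spu2_ls = bytearray(LS_SIZE)

space = AddressSpace()
space.map(0x0, main_memory)       # ordinary system memory
space.map(SPU2_LS_BASE, spu2_ls)  # SPU2's LS mapped into the address space

spu1_ls[0:4] = b"data"
dma_put(space, spu1_ls, 0, 0x1000, 4)               # SPU1 LS -> main memory
dma_put(space, spu1_ls, 0, SPU2_LS_BASE + 0x80, 4)  # SPU1 LS -> SPU2 LS
```

The two `dma_put` calls are identical apart from the destination address, which is the sense in which the transfer to SPU2 is "the same method" as a transfer to main memory.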

Aaron Spink
speaking for myself inc.
 
SynapticSignal said:
I'm not a CPU expert :oops:
but isn't the fact that the SPEs lack branching altogether, are in-order, and have no cache related to efficiency and real-world speed?

SPUs DO have branching. What they lack is any kind of branch prediction logic.
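One common response to missing branch prediction, on any in-order core, is to replace an unpredictable branch with a compare that produces a mask followed by a bitwise select. The Python sketch below only illustrates that idea on 32-bit unsigned values; the function names are made up for this example, it is not SPU code, and nothing here is taken from the Cell documentation.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned registers


def greater_than_mask(a, b):
    """Return all ones (0xFFFFFFFF) if a > b, otherwise zero."""
    return -int(a > b) & MASK32


def select(if_false, if_true, mask):
    """Pick bits from if_true where mask is 1, from if_false where it is 0."""
    return ((if_false & ~mask) | (if_true & mask)) & MASK32


def branchless_max(a, b):
    """Maximum of two 32-bit unsigned values without an if/else."""
    return select(b, a, greater_than_mask(a, b))
```

For example, `branchless_max(3, 7)` evaluates the comparison to a zero mask and so selects `7` without taking any data-dependent branch.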

So why does Gabell have regrets, and why does Carmack talk of Cell programming as a "pain in the ass"?

Because it is a pain in the ass just like any other special purpose processor. XeCPU has some of the same issues but just not to the same degree.

Both are in-order processors with all the scheduling issues that implies. Both have poor single-thread performance. Both will require a significant amount of re-architecting and re-designing of actual code, programming models, and algorithms to get working at high speed.

Aaron Spink
speaking for myself inc.
 
Nice way to make the CELL processor look like an everyday, okay kind of processor with nothing that special about it.
 
I think aaronspink's made some worthwhile points. He's obviously less optimistic than I am. Unfortunately he wrote so much I can't afford the time needed to reply pointwise :D
 
aaronspink said:
Um, from all the documentation available, this isn't true. The SPUs do have limited autonomy but must be configured and set up via the PPE in order to run any code. The SPUs are not Turing complete.

Aaron Spink
speaking for myself inc.

Yes, just like x86 chips aren't Turing complete because they need to have a BIOS configure and set them up to run any code.

:rolleyes:
 
You could call the SPE a hardware engineer's revenge on programmers for being so lazy all these years and soaking up everything that Moore's law has so far provided...
I can't recall the last time I heard of hardware engineers giving a crap about programmers' lives in the first place. In fact, I'd even think the marketing guys care more. Maybe I'm just too cynical...

IME "modern" compilers do a totally crappy job at hiding instruction latency on both X360 and PS3.
Not having seen anything on the PS3, I can at least say about the 360 compiler that it reeks of not having been updated much at all since the last time Microsoft did a PowerPC compiler. Pretty much every new VC feature that has come about since then doesn't exist (even though many of them never worked that well in the first place *cough*edit&continue*cough*).

I can't help but think when people say general purpose code they are talking about legacy previously written code.
To me, "general purpose" code is going to be the stuff that relates to moving data around (as opposed to computing something with it). Juggling pointers, constructing messages, callbacks, copying buffers, allocating/deallocating, transferring data between devices, etc. The CPU is not going to be the lone determinant of performance here, but the fact of the matter is it will definitely not be great on next-gen consoles.

Yes, just like x86 chips aren't Turing complete because they need to have a BIOS configure and set them up to run any code.
Ummm... you do realize he's talking about more than that, right? The SPEs are at least claimed to be Turing complete, but that's different from saying their role puts them in a position to exercise that. Judging from the ISA alone, I wouldn't be averse to thinking that they might be Turing-complete by themselves when packaged as a stand-alone chip. However, within the CELL, they're kind of cut off, which strips them of that completeness distinction.

Moreover, that's pretty much as it should be. I can say brainf*ck as a language is Turing complete, but that doesn't mean it's fully ready and able for Win32 apps. Same thing with the SPEs -- they might be capable of anything, but that doesn't mean that they're even halfway decent at anything other than streaming vector computations.
 
ShootMyMonkey said:
Not having seen anything on the PS3, I can at least say about the 360 compiler that it reeks of not having been updated much at all since the last time Microsoft did a PowerPC compiler. Pretty much every new VC feature that has come about since then doesn't exist (even though many of them never worked that well in the first place *cough*edit&continue*cough*).

I think they mostly concentrated on the code generation back-end and the optimizer for the 360 PowerPC compiler. :) Productivity features are nice to have, but I'm pretty sure they would want a good optimizer first.
 
http://www.reed-electronics.com/electronicnews/article/CA6282722
Jim Kahle, IBM fellow and chief technologist for the cell processor, said the likely uses of the chip will be in aerospace and defense, life sciences -- particularly bioinformatics, which uses multiple processors for searching through large volumes of data -- and in security.

“Our twist is that we brought in technology from supercomputers,” said Kahle. “That includes real-time computing and real-time operating systems and Linux.”

In the past, IBM has subsidized software development to build critical mass where there was none, most notably with its OS/2 operating system. Kahle said the current approach is to collaborate with developers and give guidance where necessary.

How that equates into market acceptance remains to be seen, however. Kevin Krewell, editor in chief of the Microprocessor Report, said the software developers kit is enormously complicated. “This is a do-it-yourself platform for hard-core code writers,” he said. “It takes more than a day to download. It’s still not ready for prime time as an integrated tool environment.”

Until that happens, the processor will be rolled out for specific purposes, and parts of it will be used in other products such as Microsoft’s Xbox 360, due to hit the market later this month. “What they did was take part of the design for the cell processor and create something specifically for Microsoft,” said Krewell.
 
aaronspink said:
All data into and out of the SPU is moved via DMA engines. To effect data movement from one SPU to another SPU, the LS portion of SPU2 must be memory mapped to an effective address in the system address space. Then a DMA descriptor is constructed that will DMA from the SPU1 LS into this effective address.

It is the same method that would be used to move something from the SPU1 LS to main memory. The difference is that the data will effectively bounce to SPU2's local store.

Though I may have it reversed: I can't remember whether, in the SPU-to-SPU DMA case, push, pull, or both are supported via the DMA mapping, and quite frankly, I'm too lazy to look it up again.

There is no direct method of data movement from one SPU's LS to another SPU's LS.

The question, I believe, was whether the SPUs are independent of the PPE or not. Yes, they have to get data to and from RAM or other SPUs using DMA - however, DMA controllers are built into every SPU. So in what sense does the PPE have to be involved?

I'd consider the ability to DMA data from place to place as fairly "direct".
 
Titanio said:
The PPE and Xenon cores are supposed to be almost identical, we've known this for some time.

FWIW - They have a lot of similarities, but they are far from identical.
 
ERP said:
FWIW - They have a lot of similarities, but they are far from identical.
Indeed, Cell has been undergoing revisions while the XeCPU core was being mass produced for X360. So this urban myth that they are identical is wrong; maybe once they shared a common ancestor, but then so do dogs and humans...
 
DeanoC said:
Indeed, Cell has been undergoing revisions while the XeCPU core was being mass produced for X360. So this urban myth that they are identical is wrong; maybe once they shared a common ancestor, but then so do dogs and humans...

The question is: aside from VMX-128 versus regular VMX, is the fetch/decode/dual-issue (dual nested queues) part of the core still SO very similar between, say, the DD2 PPE and XeCPU's cores (as far as the functional-block and pipeline descriptions in IBM's papers and presentations about each of the two types of cores tell us)? Or is the DD2 PPE different even there, so that we have to go back to the largely unknown DD1 PPE to see more similarities?
 
Shifty: Cut that crap out... Funny for a second, until it starts bouncing round the net and a comment that I didn't intend like that gets me in trouble.

For the record: I meant no disrespect for either processor by the evolution analogy. I just can't spell chimpanzee so dog seemed easier...
 
Panajev2001a said:
The question is: aside from VMX-128 versus regular VMX, is the fetch/decode/dual-issue (dual nested queues) part of the core still SO very similar between, say, the DD2 PPE and XeCPU's cores (as far as the functional-block and pipeline descriptions in IBM's papers and presentations about each of the two types of cores tell us)? Or is the DD2 PPE different even there, so that we have to go back to the largely unknown DD1 PPE to see more similarities?

I've benchmarked both, and in many tasks there is a significant per-core, clock-for-clock performance difference that I do not believe can be explained by the compiler difference.

AFAIK DD2 is closer to the Xenon cores than DD1.
 
ERP said:
I've benchmarked both, and in many tasks there is a significant per-core, clock-for-clock performance difference that I do not believe can be explained by the compiler difference.

AFAIK DD2 is closer to the Xenon cores than DD1.
Time has passed, seasons change... have you recently benchmarked the supernoisy thing? :)
 