Black Dragon37 said:
Gubbi said: Of course they'll say that, they built the shite.
They also built Xenon. That's shite too?
Nemo80 said: The big difference, however, is that the SPE model is not an SMP/T one at all. It can be thought of as something like a master-slave relationship, where the master (PPE) delivers tasks to the individual SPEs, much like simply calling a subroutine.
aaaaa00 said: It's trivial to construct any form of multithreaded relationship on an SMP machine, because SMP is the most general and flexible form of multithreading there is.
If you want, you can easily build a "master-slave" design pattern on an SMP by designating a main thread and constructing a job queue for each slave thread you allocate.
The synchronization required for doing this on an SMP is no more and no less than in the SPE Model -- you need some sort of lock to protect each slave thread's job queue from corruption by access from multiple concurrent threads.
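A minimal sketch of the master-slave pattern aaaaa00 describes, assuming plain POSIX threads; the job_queue type, the round-robin dispatch, and the negative-job shutdown convention are all illustrative, not anyone's actual engine code:

    /* Master-slave on an SMP: one lock-protected job queue per slave thread. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_SLAVES 3
    #define QUEUE_CAP  16

    typedef struct {
        int jobs[QUEUE_CAP];           /* job payloads (just ints here)     */
        int head, tail, count;
        pthread_mutex_t lock;          /* the lock protecting this queue    */
        pthread_cond_t  nonempty;
    } job_queue;

    static job_queue queues[NUM_SLAVES];

    static void queue_push(job_queue *q, int job)   /* no full check; toy */
    {
        pthread_mutex_lock(&q->lock);
        q->jobs[q->tail] = job;
        q->tail = (q->tail + 1) % QUEUE_CAP;
        q->count++;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->lock);
    }

    static int queue_pop(job_queue *q)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)
            pthread_cond_wait(&q->nonempty, &q->lock);
        int job = q->jobs[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        return job;
    }

    static void *slave_main(void *arg)
    {
        job_queue *q = arg;
        for (;;) {
            int job = queue_pop(q);    /* block until the master hands us work */
            if (job < 0) break;        /* negative job = shut down             */
            printf("processed job %d\n", job);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t slaves[NUM_SLAVES];
        for (int i = 0; i < NUM_SLAVES; i++) {
            pthread_mutex_init(&queues[i].lock, NULL);
            pthread_cond_init(&queues[i].nonempty, NULL);
            pthread_create(&slaves[i], NULL, slave_main, &queues[i]);
        }
        for (int j = 0; j < 12; j++)           /* master delivers tasks round-robin */
            queue_push(&queues[j % NUM_SLAVES], j);
        for (int i = 0; i < NUM_SLAVES; i++)   /* tell everyone to quit, then join  */
            queue_push(&queues[i], -1);
        for (int i = 0; i < NUM_SLAVES; i++)
            pthread_join(slaves[i], NULL);
        return 0;
    }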
ihamoitc2005 said:
Gubbi said: It's not that you explicitly have to set up a DMA that is the problem, it's that local stores aren't kept coherent. The lack of memory coherence *is* a bitch. The nuisance of the heterogeneous ISA is minor compared to that.
You didn't read the link.
It starts with ...
The Cell Broadband Engine is a single-chip multiprocessor with nine processors operating on a shared, coherent memory.
and
While each SPE is an independent processor running its own application programs, a shared, coherent memory and a rich set of DMA commands provide for seamless and efficient communications between all Cell processing elements.
That the SPEs are "more adept at compute-intensive tasks and slower at task switching" is a gross understatement. SPEs having more than 256KB of context, compared to ~1KB for a regular PPE context, means that task switching is all but impractical.
ihamoitc2005 said: Also, referring to the heterogeneous ISA ...
http://domino.research.ibm.com/comm...?Open&printable
Memory access is performed via a DMA-based interface using copy-in/copy-out semantics, and data transfers can be initiated by either the IBM Power™ processor or an SPU. The DMA-based interface uses the Power Architecture™ page protection model, giving a consistent interface to the system storage map for all processor structures despite its heterogeneous instruction set architecture structure.
Sounds pretty straightforward, no?
inefficient said: I believe you are wrong. You're thinking in classic SMP/T terms, and like Nemo80 hinted, the correct way to look at SPE programming is not like this. The key advantages the Cell has here are the DMA memory access model and the fact that each SPE has a local store. In the Cell programming model you would set up a DMA on the SPE and then let it execute/read/write in its own private area.
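A minimal sketch of that copy-in/compute/copy-out pattern on the SPE side, assuming the Cell SDK's spu_mfcio.h intrinsics; the 4KB chunk size and the XOR "work" are made up for illustration:

    /* SPE-side DMA: pull a chunk into local store, work on it privately,
     * push the result back. The effective address is assumed to arrive
     * as the program's argp parameter. */
    #include <spu_mfcio.h>

    #define CHUNK 4096
    static char local_buf[CHUNK] __attribute__((aligned(128)));

    int main(unsigned long long speid, unsigned long long argp,
             unsigned long long envp)
    {
        unsigned int tag = 1;

        /* copy-in: DMA the chunk from main memory into our local store */
        mfc_get(local_buf, argp, CHUNK, tag, 0, 0);
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();      /* block until the transfer completes */

        for (int i = 0; i < CHUNK; i++) /* work entirely out of local store */
            local_buf[i] ^= 0xFF;

        /* copy-out: DMA the result back to main memory */
        mfc_put(local_buf, argp, CHUNK, tag, 0, 0);
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();
        return 0;
    }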
expletive said: Regardless of the actual hardware benefits, a lot of developers who prefer the 360 have commented on the overall dev environment and the tools they can use: debuggers, performance tools, etc. Plus they are all tools that developers who develop for the PC are familiar with already, and those who haven't claim they are easy to use. (And after seeing the PC version of the 360 controller, it's obvious this is a HUGE part of MS' mid- to long-term strategy: one development budget, two platforms.)
That said, parallelism with 3 identical cores and 6 identical threads should be a bit easier than a PPE and SPE design where each has different needs and potentially different roles, shouldn't it? (I have to credit that thought to Carmack though, as he stated it in his QuakeCon address.)
What we have not seen, however, is whether the Cell will provide an advantage in the closed-box system known as the PS3, and I think that's what is really on trial in this thread.
J
aaaaa00 said:It's generally easier to write fast code when it's easier to write correct code, since fast but incorrect code is not typically very useful.
Correct multithreaded code is much easier to write when you have N identical CPUs all sharing identical access to the same main memory, with a well-ordered memory model and cache coherency guaranteed by the hardware. (Which is pretty much x86 SMP in a nutshell in fact.)
Such an architecture is fairly well understood today, and any college concurrent-programming textbook will teach you the basics of synchronization objects and give you parallel algorithms that work correctly and reasonably well on an SMP.
Each step you take away from such an architecture introduces complications just to ensure code correctness, never mind performance.
The point Carmack is making is that the Xbox 360 is already pretty much the best-case scenario for multithreaded architectures -- but even there, ensuring code correctness is going to be hard before you even start thinking about making the performance better.
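A toy illustration of why correctness comes first, assuming POSIX threads (the counter and iteration count are arbitrary): on a coherent SMP the hardware keeps memory consistent between cores, but atomicity is still the programmer's problem.

    /* Two threads incrementing a shared counter. Without the mutex the
     * threads race and increments get lost; with it the result is exact. */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);   /* remove these two lines and the  */
            counter++;                   /* final count becomes unpredictable */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld (expected 2000000)\n", counter);
        return 0;
    }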
Gubbi said:The fact that you have to explicitly move data around with DMAs is not an advantage.
darkblu said:regardless of how true his statement is in itself, the question is: what do _you_ read into his statement.
the first paragraph of Carmack's statement basically says: 'it is very easy to spawn a thread and get it running on the 360 - just as easy as it is on your grandma's smp pc'
to which everybody can only nod in agreement, as there's nothing to misunderstand here and that message gets clearly and correctly propagated. now, getting a thread up and running and actually getting efficient parallelism are two entirely different things, as anybody who has ever tackled a single parallelism problem could tell you. so let's see what Carmack says further in his second paragraph.. he says exactly this - 'regardless of how easy it is to tinker with threads (in your grandma's smp way), this still grants you nothing in terms of effective parallelism'.
ok, now that we've cleared up the matter with Carmack's statement, we can return to the original topic - how much easier it is to achieve _efficient_parallelism_ on the 360 over the Cell. and now it's your turn to step in and actually build your argument.
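For reference, the "easy to spawn a thread" half of the argument: a hedged sketch of what this looks like on the 360, assuming Win32-style XDK calls. XSetThreadProcessor is the XDK call for pinning a thread to one of the six hardware threads, but the exact usage here is an assumption, not code from any shipped title:

    /* Spawning a pinned worker thread on the 360 looks just like Win32. */
    #include <windows.h>

    DWORD WINAPI WorkerMain(LPVOID param)
    {
        /* ... do work, sharing coherent memory with the main thread ... */
        return 0;
    }

    void SpawnWorker(void)
    {
        HANDLE h = CreateThread(NULL, 0, WorkerMain, NULL,
                                CREATE_SUSPENDED, NULL);
        XSetThreadProcessor(h, 2);  /* e.g. hardware thread 2 = core 1, thread 0 */
        ResumeThread(h);
    }

Getting that thread to do something efficiently in parallel is, as darkblu says, the entirely separate and harder problem.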
Shifty Geezer said: I've got to say I like the idea of the SPE's forced memory management. I work on high-level PC code and there are often occasions when I WANT to know what's passing through the cache and where my data is relative to the processing logic. But then at uni, out of all the languages and programming models, the one I liked most was assembler. I preferred to know exactly what the hardware was doing and to think like a CPU to make the most of it.
Panajev2001a said:
Gubbi said: The fact that you have to explicitly move data around with DMAs is not an advantage.
Says you and others... but not everyone dislikes it.
Gubbi said: DMA transfers are kept coherent, but stores to the SPE's local store are not. This means that the entire local store is part of the SPE's context and has to be saved on a context switch.
I can't see who was talking about context switches on an SPE. Anyone wanting to run two or more concurrent threads on an SPE and switch between them needs their head examining! You set it a task, let it finish, and then move on to another task. When would you not want to work that way on an SPE?
Gubbi said: The fact that you have to explicitly move data around with DMAs is not an advantage. Repeat: NOT... NOT... N.O.T. an advantage. It was done to remove the complexity of keeping 9 cores coherent.
How about the prospect of a software framework for Cell that can automatically manage/optimize dataflow in this deterministic environment, instead of hand optimization by a programmer?
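One concrete form such managed dataflow can take is double buffering, a standard Cell idiom: while the SPU computes on one buffer, the MFC streams the next chunk in, hiding the transfer latency. A minimal sketch, again assuming the spu_mfcio.h interface; sizes and the process() body are illustrative, and the copy-out half is omitted for brevity:

    /* Double-buffered streaming on an SPE: prefetch chunk n+1 while
     * processing chunk n, using one DMA tag per buffer. */
    #include <spu_mfcio.h>

    #define CHUNK   4096
    #define NCHUNKS 64

    static char buf[2][CHUNK] __attribute__((aligned(128)));

    static void process(char *p)
    {
        for (int i = 0; i < CHUNK; i++) p[i] ^= 0xFF;
    }

    void stream(unsigned long long ea)
    {
        int cur = 0;
        mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);           /* prime buffer 0 */
        for (int n = 1; n < NCHUNKS; n++) {
            int next = cur ^ 1;
            mfc_get(buf[next], ea + (unsigned long long)n * CHUNK,
                    CHUNK, next, 0, 0);                    /* prefetch next chunk */
            mfc_write_tag_mask(1 << cur);
            mfc_read_tag_status_all();                     /* wait for current     */
            process(buf[cur]);
            cur = next;
        }
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();
        process(buf[cur]);                                 /* last chunk */
    }

Because the access pattern is fully deterministic, a framework could in principle generate exactly this kind of schedule from a dataflow description, which is the prospect being raised above.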
Shifty Geezer said: I've got to say I like the idea of the SPE's forced memory management. I work on high-level PC code and there are often occasions when I WANT to know what's passing through the cache and where my data is relative to the processing logic. But then at uni, out of all the languages and programming models, the one I liked most was assembler. I preferred to know exactly what the hardware was doing and to think like a CPU to make the most of it.
For small, trivial things, sure, assembler is great, and understanding exactly how the machine operates at a low level is a good thing. But aren't we so far beyond the simple, trivial cases that this is just not feasible, except in highly focused, performance-profiled situations?
expletive said:b. from an 'ease of use' standpoint, the design in the 360 is the best possible case for a developer to coax performance benefits out of multithreading
c. even in the best possible case, it's very difficult to realize real-world benefits
Shifty Geezer said: Dunno. Large programs are broken into smaller procedures or code segments that make up your engine, and these get pieced together to make the whole program. 256KB of LS for data and code means your program isn't going to be totally massive, and I would guess much smaller than 256KB. Heck, 200KB of assembler isn't a pretty thought! You can achieve a lot in 32KB (whole 8-bit games, even. Imagine how fast the original Elite could run when written for an SPE!) and I'd expect a process could be broken into manageable and efficient chunks. Seems more a matter of good design being needed rather than mystical programming powers. And note SPEs don't need assembler, so the point's moot anyway. Unless you're still developing for PS2!
ihamoitc2005 said: People wanting multi-tasking on one SPE don't understand how to use it. It's not even needed. It processes one program at a time, in queue order, like 7 grocery store cashiers, but where customers can move from a long line to a short line as needed.
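A hedged PPE-side sketch of that "seven cashiers" model, assuming the libspe2 interface: each SPE context runs one program to completion, with an ordinary pthread per SPE acting as its dispatcher. spu_job is a hypothetical embedded SPU binary handle, and the per-SPE job arguments are placeholders:

    /* Run-to-completion dispatch: one context per SPE, no SPE-side
     * multitasking, each job runs start to finish. */
    #include <libspe2.h>
    #include <pthread.h>

    extern spe_program_handle_t spu_job;   /* assumption: embedded SPU binary */

    static void *run_one_spe(void *arg)
    {
        unsigned int entry = SPE_DEFAULT_ENTRY;
        spe_context_ptr_t ctx = spe_context_create(0, NULL);
        spe_program_load(ctx, &spu_job);
        spe_context_run(ctx, &entry, 0, arg /* job pointer */, NULL, NULL);
        spe_context_destroy(ctx);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[7];
        static int jobs[7] = {0, 1, 2, 3, 4, 5, 6};
        for (int i = 0; i < 7; i++)        /* seven "cashiers", one per SPE */
            pthread_create(&t[i], NULL, run_one_spe, &jobs[i]);
        for (int i = 0; i < 7; i++)
            pthread_join(t[i], NULL);
        return 0;
    }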