The role of the PPE?

Npl said:
The first processors had a few thousand transistors, and they are still capable of doing any programmable task given enough memory (and a reasonably big address space), with speed obviously being a problem.

Neither of the terms you throw around is a *functional* requirement for a CPU; they are performance optimizations.

Yes, like the C=64's CPU, it can do any programmable task; oops, yes, there's a speed problem.

When I distinguished "theoretical" from "real world", I apparently wasted my time.

GPUs can't do the latter two points, at least for now, and thus don't qualify as full CPUs.


You are wrong.

Xenos can do them.
Xenos IS the memory controller of the 360.
 
Titanio said:
Untrue. Why do I have to take a given task and run it across multiple SPEs? I may well choose to have a task run as a serial, non-parallel program on one SPE (in parallel with any number of other independent tasks, which themselves may or may not be parallelised).
It's still running on a 2nd core. The issues of synchronizing the data across multiple cores still exist.
 
RedBlackDevil said:
Yes, like the C=64's CPU, it can do any programmable task; oops, yes, there's a speed problem.

When I distinguished "theoretical" from "real world", I apparently wasted my time.
OK, it sounded quite different when I read it ;)
RedBlackDevil said:
You are wrong.

Xenos can do them.
Xenos IS the memory controller of the 360.
It's rather a GPU and a separate memory controller on one die. The pixel/vertex shaders are still more regular than you want to believe. MEMEXPORT isn't changing much in that respect; you are still setting up the programs to be run via the CPU, and the shaders CAN'T change that themselves, whereas an SPE can.
 
RedBlackDevil said:
Meh, the SPE is geared for single-precision SIMD computation. Most of its arithmetic instructions operate on 128-bit vectors of four 32-bit elements.

A GPU like Xenos, for example, can also read any type of data; what's your point?
That the SPE is geared towards a particular strength isn't in question. The point is you said the SPE was incapable of doing the things a CPU can, which is a flat-out load of prime bunkum nonsense.
Does it have all the logic needed to perform any programmable task?
In only 7 million transistors?
Pentium Pro : 5.5 million transistors
Intel 80386DX : 275,000 transistors
Motorola 68000 : 68,000 transistors
Intel 8086 : 29,000 transistors
MOS 6502 : 9,000 transistors
Zilog Z80 : 8,500 transistors

What have computers using the Pentium Pro and earlier CPUs actually been running on, if not general-purpose processors? By your reckoning, anything at 7 million transistors or fewer lacks the necessary logic to be a general-purpose processor. Was the BBC Micro running on a GPU? Was the Sinclair Spectrum powered by a DSP?
No, it lacks the control logic of an instruction window. It doesn't do register renaming or instruction reordering, so it needs neither a rename register file nor a reorder buffer; it lacks branch prediction and code-scheduling logic; and it lacks cache, like a little Celeron.
None of which are essential for being able to write any program. See the 8086 above!
Local addressable memory is not cache.
Cache isn't logic, so you can't say the SPE lacks the logic and then point to cache. Cache isn't essential for running general-purpose tasks; it's an advantage, but not essential. See the 8086 above! The SPE has all the logic it needs to perform any process asked of it, hence it is capable of general-purpose code.
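
To put the "geared for SIMD" point in perspective, here's roughly what the SPE's bread-and-butter code looks like. A minimal sketch, assuming the Cell SDK's SPU C intrinsics (spu_intrinsics.h); the loop itself is illustrative, not from any shipping code:

[code]
/* y[i] = a*x[i] + y[i], four single-precision lanes per instruction. */
#include <spu_intrinsics.h>

void saxpy(float a, vec_float4 *x, vec_float4 *y, int nvec)
{
    vec_float4 va = spu_splats(a);        /* replicate scalar into all 4 lanes */
    for (int i = 0; i < nvec; i++)
        y[i] = spu_madd(va, x[i], y[i]);  /* fused multiply-add, per lane */
}
[/code]

Nothing about that stops the same compiler emitting plain scalar control-flow code too; the argument in this thread is about speed, not capability.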
 
Jawed said:
"Frogger":

http://www.beyond3d.com/articles/shadercomp/results/

by Eyal Teler (fourth on the page). Runs fine on R300.

Jawed
Cool ;). It's not Manic Miner though :p

How would you class the IO, though? Does the GPU still need CPU intervention to provide it with keyboard input? If so, the SPE has the difference of having direct access to functionality without needing other processors to format and feed it data, as it were. If not, SM2.0 is more capable than I thought!
 
Shifty Geezer said:
That the SPE is geared towards a particular strength isn't in question. The point is you said the SPE was incapable of doing the things a CPU can, which is a flat-out load of prime bunkum nonsense.

Pentium Pro : 5.5 million transistors
Intel 80386DX : 275,000 transistors
Motorola 68000 : 68,000 transistors
Intel 8086 : 29,000 transistors
MOS 6502 : 9,000 transistors
Zilog Z80 : 8,500 transistors

Great comparison; can you find some worse cases, please?

The whole Cell is 235 million transistors; the 7 SPEs are 49 million together. That is a better comparison.

What, they don't need the 190 million excess transistors, just because an SPE is a complete CPU? [/ironic]


What have computers using the Pentium Pro and earlier CPUs actually been running on, if not general-purpose processors? By your reckoning, anything at 7 million transistors or fewer lacks the necessary logic to be a general-purpose processor. Was the BBC Micro running on a GPU? Was the Sinclair Spectrum powered by a DSP?
None of which are essential for being able to write any program. See the 8086 above!
Cache isn't logic, so you can't say the SPE lacks the logic and then point to cache. Cache isn't essential for running general-purpose tasks; it's an advantage, but not essential. See the 8086 above! The SPE has all the logic it needs to perform any process asked of it, hence it is capable of general-purpose code.

Yes, the SPE can be used for actual general game code like an 8086; great comparison, but this time I agree. What do I have to say to make you understand the difference between theory and "real world"?

Really, I'm repeating the same thing over and over.
 
Jawed said:
"Frogger":

Originally Posted by Shifty Geezer
RedBlackDevil: You certainly could create a game on an SPE. It wouldn't be as efficient as the code outlined above, but it is possible, unlike on a GPU. For example, someone could port 'Manic Miner' to an SPE and have it run. You won't get a current GPU running 'Manic Miner'.


http://www.beyond3d.com/articles/shadercomp/results/

by Eyal Teler (fourth on the page). Runs fine on R300.

Jawed

Fixed
 
scooby_dooby said:
It's still running on a 2nd core. The issues of synchronizing the data across multiple cores still exist.

No, you're running it on one core. There's no synchronisation at play here if it's a wholly independent task, as I qualified. If you're sharing data with other processes on other SPEs, or whatever, then yes, you have synchronisation to deal with. But that's really not a big deal at all, sharing access to data. That's a nuts-and-bolts issue; the much bigger challenge will be parallelising your algorithms in the first place if you want to leverage more than one core (or in some cases, at least, that'll be the challenge, beyond those tasks which lend themselves more easily to parallelism).

All that said, there are some tasks you may be able to parallelise without having to worry about shared memory access at all. If your data is parallelisable, you can give one chunk to one process on an SPE, another to another process on a different SPE, and let each process its respective chunk in parallel without treading on the other's toes. Even parallelism itself does not necessarily imply synchronisation.

In summary though, if you cannot parallelise a task, that is no impediment to using an SPE for it. You can run that serial task on a single SPE, in parallel with others (whether it needs to be watchful of shared data access or not).
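
To make the "independent chunks" idea concrete, here's a generic sketch (plain C and pthreads, purely illustrative, nothing Cell-specific): each worker owns its own slice of the data, so nothing needs locking while the chunks are processed.

[code]
#include <pthread.h>

#define NWORKERS 4

struct chunk { float *data; int n; };

static void *process(void *arg)
{
    struct chunk *c = arg;
    for (int i = 0; i < c->n; i++)
        c->data[i] *= 2.0f;              /* touches only this worker's slice */
    return 0;
}

void parallel_double(float *data, int n)
{
    pthread_t tid[NWORKERS];
    struct chunk c[NWORKERS];
    int per = n / NWORKERS;              /* assume n divides evenly, for brevity */

    for (int w = 0; w < NWORKERS; w++) {
        c[w] = (struct chunk){ data + w * per, per };
        pthread_create(&tid[w], 0, process, &c[w]);
    }
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], 0);         /* the only "sync" is waiting for completion */
}
[/code]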
 
The reason I asked is that Deano mentioned in his blog that he predicts that eventually, "3rd Gen: SPU completely dominant with PPU now more of a game coprocessor". With this in mind, I assumed that it was really the learning curve involved that necessitated the PPE. So the question is: why not have a single modified SPU, with branch prediction etc., in place of the PPE? Does it do anything inherently better than the (modified) SPUs?
 
RedBlackDevil said:
Fixed what? Seems to me you just quoted two people together, and I'd already read and responded to Jawed's comment without you needing to repeat it. Perhaps you can explain how the GPU handles player IO and how SPEs are exactly the same?

Now you're losing track of the argument, so I'll remind you of what you started with:
SPEs are SP-FP units.
The PPE is a general-purpose central processing unit.

The SPE can't run general game code.
That's all. Now you understand that without the PPE it's like having no CPU, just 7 co-processors:
useless.
What you have said is that an SPE can't run general-purpose code. That's false. You've given reasons why the SPE can't run general code, none of which have been valid. You've given descriptions of the SPE architecture that aren't true. You've said a 7 million transistor processor hasn't enough logic to be a general processor, despite CPUs of old managing with far, far fewer transistors. So far your reasoning has been faulty.

Now, if you want to change your phrasing and say "an SPE can't run general-purpose code quickly enough to match a conventional CPU", just go ahead and say so. That's a very different thing from "is only a single-precision floating-point unit", as you first argued. It would also show a degree of intelligence if, when corrected on points like "SPEs being unable to access memory without going through the PPE", you acknowledged the correction.
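
On that last point, since it keeps coming up: the SPE's Memory Flow Controller lets the SPE itself pull data in from main memory, with no PPE involvement per transfer. A minimal SPU-side sketch, assuming the Cell SDK's MFC calls from spu_mfcio.h:

[code]
#include <spu_mfcio.h>

#define TAG 0

static char buf[16384] __attribute__((aligned(128)));  /* local-store buffer */

/* ea = effective (main-memory) address; size must respect the MFC's
   size/alignment rules (multiples of 16 bytes, up to 16KB per request). */
void fetch(unsigned long long ea, unsigned int size)
{
    mfc_get(buf, ea, size, TAG, 0, 0);   /* SPE enqueues its own DMA: memory -> local store */
    mfc_write_tag_mask(1 << TAG);
    mfc_read_tag_status_all();           /* block until the transfer completes */
}
[/code]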
 
Titanio said:
No, you're running it on one core. There's no synchronisation at play here if it's a wholly independent task, as I qualified.

You introduced that qualifier, but it wasn't what we were talking about. I said any code that can't be synchronized across multiple cores will have to be run on the PPE; it would seem to me this would be predominantly inter-dependent tasks.

It would also depend on the skills of the developer, their experience and timelines.
 
Why would it have to be run on the PPE and not an SPE, though? Let's say audio can't be synchronized across multiple cores. Why would that have to be run on the PPE?
 
scooby_dooby said:
You introduced that qualifier, but it wasn't what we were talking about. I said any code that can't be synchronized across multiple cores will have to be run on the PPE

Why? Why can't you take that serial task and run it on an SPE?

You're also hinging your argument on the notion that there may be an impossibility in controlling access to shared data, if I assume correctly that this is what you mean by synchronisation. I'm not sure that should ever be impossible. I mean, one can think of "issues" with sharing data, for example in a worst case where you have a lot of processes all intensively accessing a small amount of data, but such challenges aren't related to the simple mechanism of controlling shared access.
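
For what it's worth, that "simple mechanism" in its most generic form (plain C and pthreads, purely illustrative, nothing Cell-specific):

[code]
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

void add_points(long points)
{
    pthread_mutex_lock(&lock);    /* one core in the critical section at a time */
    shared_counter += points;
    pthread_mutex_unlock(&lock);
}
[/code]

The mechanics are trivial; the hard part, as I said, is deciding how to carve the algorithm up in the first place.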
 
AFAICS from what the Cell developers themselves had to say, the PPE is there because of the tight schedule, easier software migration, and exploiting what IBM already had in SMP and virtualization support.

http://www.research.ibm.com/journal/rd/494/kahle.html
Support for introduction in 2005

The objective of the partnership was to develop this new processor with increased performance, responsiveness, and security, and to be able to introduce it in 2005. Thus, only four years were available to meet the challenges outlined above. A concept was needed that would allow us to deliver impressive processor performance, responsiveness to the user and network, and the flexibility to ensure a broad reach, and to do this without making a complete break with the past. Indications were that a completely new architecture can easily require ten years to develop, especially if one includes the time required for software development. Hence, the Power Architecture* was used as the basis for Cell.
Power Architecture compatibility

The Broadband Processor Architecture maintains full compatibility with 64-bit Power Architecture [4]. The implementation on the Cell processor has aimed to include all recent innovations of Power technology such as virtualization support and support for large page sizes. By building on Power and by focusing the innovation on those aspects of the design that brought new advantages, it became feasible to complete a complex new design on a tight schedule. In addition, compatibility with the Power Architecture provides a base for porting existing software (including the operating system) to Cell. Although additional work is required to unlock the performance potential of the Cell processor, existing Power applications can be run on the Cell processor without modification.
The fourth objective (schedule) was met jointly by constructing the Cell processor by using the Power Architecture as its core. Thus, IBM experience in designing and verifying symmetric multiprocessors could be leveraged. Even though the SPEs operate on local memory, the DMA operations are coherent in the system, and Cell is a ten-way SMP from a coherence perspective. Also, by building on Power technology, existing operating systems and applications can run without modification, and the extra effort the programmer makes is needed only to unleash the power of the SPEs. Ease of programming was also the primary motivation for including the Power Architecture SIMD extensions on the PPE. This allows for a staged approach, where code is developed and then SIMD-vectorized in a familiar environment, before performance is enhanced by using the synergistic processors.
 
Titanio said:
Why? Why can't you take that serial task and run it on an SPE?

You're also hinging your argument on the notion that there may be an impossibility in controlling access to shared data, if I assume correctly that this is what you mean by synchronisation. I'm not sure that should ever be impossible. I mean, one can think of "issues" with sharing data, for example in a worst case where you have a lot of processes all intensively accessing a small amount of data, but such challenges aren't related to the simple mechanism of controlling shared access.
Hinging my argument? It's well known that developers have a long way to go in even effectively using multicore architectures at all. The game engines that currently exist are not built for multi-core, and if you listen to some people, like the developers AnandTech spoke to, it will be 3 or 4 years before they even get a handle on a simple symmetric 2-3 core architecture like Xenon. Granted, that may be an extreme outlook, but there are obviously basic problems in splitting a game engine across multiple cores, and these problems must still exist even if you are only using one SPE, as it's still an additional core; the challenges in spreading code across more than one core do not just vanish, so you're still going to run into these same base issues.

I'm only assuming that the most difficult problem in splitting the game engine is keeping everything synchronized, but maybe one of the resident developers can add some more detail on some of the bigger problems in spreading a game engine over 2 or more symmetrical or asymmetrical cores while trying to extract maximum efficiency from all the cores...
 
scooby_dooby said:
Hinging my argument? It's well known that developers have a long way to go in even effectively using multicore architectures at all. The game engines that currently exist are not built for multi-core, and if you listen to some people, like the developers AnandTech spoke to, it will be 3 or 4 years before they even get a handle on a simple symmetric 2-3 core architecture like Xenon. Granted, that may be an extreme outlook, but there are obviously basic problems in splitting a game engine across multiple cores, and these problems must still exist even if you are only using one SPE, as it's still an additional core; the challenges in spreading code across more than one core do not just vanish, so you're still going to run into these same base issues.

I'm only assuming that the most difficult problem in splitting the data is keeping everything synchronized, but maybe one of the resident developers can add some more detail on some of the bigger problems in spreading a game engine over 2 or more symmetrical or asymmetrical cores while trying to extract maximum efficiency from all the cores...

Parallelising tasks is a much larger problem than nuts-and-bolts synchronisation. When I think of the latter, I think of shared memory access and the like. You're now discussing much more complex and higher-level issues, and I have said before that you'll run into more challenges with this (the algorithmics, basically) than the mechanics of mutual exclusion or whatever.

However, NONE of this determines what code can be run where, in and of itself. As I read it, your argument -

"these problems must still exist even if you are only using one spe, as it's still an additional core and the challenges in spreading code across more than one core do not just vanish"

- is that if you cannot parallelise something and spread it over multiple SPEs, then you can't use an SPE for it? That's simply untrue.

Using a single SPE for a task does not entail parallelising it over two cores, at all. Those issues, the parallelising of the algorithm, do just "vanish", since it's just one core that the code is executing on. Where's the other core?
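
To be concrete, here's a minimal PPU-side sketch of exactly that (assuming libspe2 and a hypothetical embedded SPE program handle named my_task): one serial task, one SPE, no other core involved.

[code]
#include <libspe2.h>

extern spe_program_handle_t my_task;   /* hypothetical SPE-side program image */

int run_serial_task_on_one_spe(void)
{
    spe_context_ptr_t ctx = spe_context_create(0, NULL);
    unsigned int entry = SPE_DEFAULT_ENTRY;

    spe_program_load(ctx, &my_task);
    spe_context_run(ctx, &entry, 0, NULL, NULL, NULL);  /* blocks until the task finishes */
    return spe_context_destroy(ctx);
}
[/code]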
 
RedBlackDevil said:
Great comparison; can you find some worse cases, please?

The whole Cell is 235 million transistors; the 7 SPEs are 49 million together. That is a better comparison.

What, they don't need the 190 million excess transistors, just because an SPE is a complete CPU? [/ironic]

An SPE isn't 7 million... it's ~21 million. It isn't really an SPE without its SRAM, etc. The SPEs take up around ~170M transistors of the Cell; the FlexIO, PPE, EIB, etc. take up the rest.

Yes, the SPE can be used for actual general game code like an 8086; great comparison, but this time I agree. What do I have to say to make you understand the difference between theory and "real world"?

Really, I'm repeating the same thing over and over.

Let's go off on this wild tangent and assume that an SPE is like an 8086 in its ability to run "general" game code (per transistor: assume 29K 8086 transistors == 7M (or even 21M!) SPE transistors); it still doesn't change the fact that an SPE is going to run at a clock speed roughly 300-500 times faster (3.2 GHz against the 8086's 5-10 MHz).
 
scooby_dooby said:
Hinging my argument? It's well known that developers have a long way to go in even effectively using multicore architectures at all. The game engines that currently exist are not built for multi-core, and if you listen to some people, like the developers AnandTech spoke to, it will be 3 or 4 years before they even get a handle on a simple symmetric 2-3 core architecture like Xenon. Granted, that may be an extreme outlook, but there are obviously basic problems in splitting a game engine across multiple cores, and these problems must still exist even if you are only using one SPE, as it's still an additional core; the challenges in spreading code across more than one core do not just vanish, so you're still going to run into these same base issues.
I assume there are many PS2 developers who are used to developing games and engines on a CPU with 3 asymmetric processing units.
 