PlayStation III Architecture

Dammit, you beat me! And your explanation has a higher code density than mine - you do in 3 lines what I did in 50. Shit, I caught what Lazy8 has.

It must be contagious :)

I better take some Vitamin C after I finish reading your reply :)

Btw, if you'd like to discuss something... the "absolute timer" - you saw that in the patent, right?

What it is supposed to do, according to the patent, is keep the execution time of each apulet/software Cell the same even when you are running the software Cell on a faster APU, to solve synchronization issues when the execution of a particular apulet is used to trigger other events...

[0143] In the future, the speed of processing by the APUs will become faster. The time budget established by the absolute timer, however, will remain the same. For example, as shown in FIG. 28, an APU in the future will execute a task in a shorter period and, therefore, will have a longer standby period. Busy period 2808, therefore, is shorter than busy period 2802, and standby period 2810 is longer than standby period 2806. However, since programs are written for processing on the basis of the same time budget established by the absolute timer, coordination of the results of processing among the APUs is maintained. As a result, faster APUs can process programs written for slower APUs without causing conflicts in the times at which the results of this processing are expected.

While this is an interesting concept, what I am wondering is: what happens if I run a program, written for an APU with speed X, on an APU of speed ( Y )... ?

X > Y

Which is the inverse of the situation the patent illustrates...
 
And your explanation has a higher code density than mine - you do in 3 lines what I did in 50.


What can I say... I have a Thumb mode ;)


Still this was a nice explanation... son of Lazy8s :p ( j/k )

Second: Cell is elegant in that, as just one computational core and its associated ISA, it spans the entire spectrum of computing needs. While, obviously, not everything will be based off of this architecture - much can be, and economically so.

This is important because I believe, as Kutaragi has stated and Kaku has written about, that the future will be one of pervasive computing, where almost every item we use will in some way be networked together and contain low-cost processing elements. It seems alien today, but all truly forward-looking statements do. The problem, as Kutaragi talked about in that one interview, is that the internet of today is more like a bunch of stand-alone islands with their own ISA, OS, processing elements, etc. Cell, at least for Sony and anyone who adopts this, will eliminate those burdens - without software that's costly in its incompatibility problems and/or performance sacrifices.
 
Panajev2001a said:
While this is an interesting concept, what I am wondering is: what happens if I run a program, written for an APU with speed X, on an APU of speed ( Y )... ?

X > Y

Which is the inverse of the situation the patent illustrates...

Yeah, I gotcha. I didn't read into that part at all, just based on what you've stated. Off the top of my head - perhaps the absolute timer... is absolute. Thus, you code to a set master clock, and then even if you're designing around a faster APU and its performance gains, it would still retain synchronization/compatibility on some level with said 'Y' speed APUs. Then, next generation (not strictly of consoles) you raise the absolute timer. Or totally lose backwards processing ability, no?
 
Cell is elegant in that, as just one computational core and its associated ISA, it spans the entire spectrum of computing needs.

Umm, sure. If you could convince people to throw away all their working code and rewrite everything. And if people didn't care about price/performance. Grid computing is only useful for a very narrow selection of problems.
 
simplicity said:
Umm, maybe so, but you could do exactly the same thing using many other commercial cores, such as MIPS, SPARC, PowerPC or x86.

Each of these doesn't have: (a) the transistor/die to performance ratio of Cell (I've done the math, do it yourself if you don't believe me); (b) the same ability to scale in performance while maintaining architectural compatibility; (c) the cost effectiveness; (d) a determined group of backers.

No-one is saying that another party couldn't do this - that only Sony has the ability. But nobody IS doing this. Microsoft has the 'legacy' PC market that they need to attend to, as well as integrate at some point into the future living room. They also have a dependency upon x86 or a similar architecture due to Intel and AMD, who are in turn influenced in their designs by the PC world that MS helps shape.

I just don't see the Cellness as being useful in the consumer electronics world.

And if there's some other world where it's more useful, I don't see it being uniquely powerful.

I don't see you as a forward-looking person who can logically and dynamically think based on the world around him. This is going to happen regardless - this is just the closest and most feasible endeavor AFAIK.

Perhaps you get off on tinkering with your electronics and their problems, but I can't wait for the day that I just pick up a new PDA and it seamlessly works with the other electronics in my house - allowing me to do what I want - without ever thinking about it. Only then will computing have reached a level of maturity. And this, my friend, is what will propel household-level computing use forward.
 
simplicity said:
Umm, sure. If you could convince people to throw away all their working code and rewrite everything.

Whoa, <brain has officially shut down> Nobody is saying that the current PC industry or electronics industry will die overnight. You mind-boggle me. Showing the same sort of irrationality as Lazy8 - at times like this, I miss Ben even more.

See, what will happen is Sony and Toshiba will release Cell and use it themselves. They will put it, in some form, in their products starting in 2005. See, Sony isn't dumb. Back in 2000/2001 they had this big pow-wow (is that spelled right?) between the Sony Groups, which usually only happens when the company is going to fundamentally change, and this it did. They thought about the future, unlike you, and saw that broadband delivery and networked products are either going to kill them if they stay idle, or be their future if they regroup. So, they chose to change - they re-aligned the Sony Groups around SCE and, basically, Cell. Everything they [Sony] now plan or do is in some way a step towards the broadband future - did you see how they just fired the Sony Music chair? Because he wasn't working with the other groups and making progress on electronic delivery; don't you read the Journal?

Beyond this, look who's involved - Toshiba on the hardware. Matsushita/Panasonic on the open-sourced OS for electronic devices based off Linux (who called that?). And if you want more, basically everyone's onboard with Blu-Ray.

So, what's going to happen is this - Sony will start releasing products with Cell and a way to communicate: wireless, broadband, et al. People will buy them. People will want other electronic products. People and salesmen say, "Wow, if I buy the Sony product, it'll work with my other one and allow me to do this... or that... easily." People buy more Sony products. Sony makes more money. Other companies want to make more money. Other companies see the Sony advantage (eg. Trinitron). Other companies license Cell from Sony. Sony makes more money.

And if people didn't care about price/performance. Grid computing is only useful for a very narrow selection of problems.

Whoa again. How do you know Cell won't be one of the best price/performance solutions? It's cellular, thus it's scalable from 1 core (ie. cost effective) to multiple cores. It has its RAM onboard - no external memory needed in low-cost devices. It's on a 65-nm, SOI process - it will probably yield well, be small in its simpler configurations, and generate a low thermal output.

Did you not read anything I wrote about GRID-Computing? I give up.... Someone else can deal with you; some people just don't want to learn.
 
Or, maybe they'll ship Cell in the PS3, where it will be moderately successful, and build a couple of GS-cube-type research projects that never ship commercially. Just like what happened with their last architecture.

I am a forward thinking person with an open mind -- I regularly get sucked in by new fads in CPU architecture -- heck, when I was a kid I thought the Intel IAPX 432 was going to kick ass. (Is anyone else here old enough to remember the pre-RISC days when capability architectures were going to take over the world?)

But after seeing so many CPU architecture fads come and go (remember Lisp Machines, Thinking Machines, Trilogy, classic RISC, Systolic arrays, Pixel Planes, Talisman, PicoJava, MaJIC, JINI, TransMeta, Itanium, Nuon, Tile-based-rendering?), I started looking beyond the hype, and started to apply simple tests to new ideas, such as, "what is the customer benefit?", "if it's such a great idea, can we do it with existing products?". And "how will the established players in the marketplace react to this?"

Grid computing just doesn't pass these simple tests, so I remain skeptical of its potential for success.
 
So, what's going to happen is this - Sony will start releasing products with Cell and a way to communicate: wireless, broadband, et al. People will buy them. People will want other electronic products. People and salesmen say, "Wow, if I buy the Sony product, it'll work with my other one and allow me to do this... or that... easily." People buy more Sony products. Sony makes more money. Other companies want to make more money. Other companies see the Sony advantage (eg. Trinitron). Other companies license Cell from Sony. Sony makes more money.

and if we add Toshiba's products, which will get Cell too, we have an even broader range of products that would inter-operate together...

Think if Sony designed/pushed for Cell in cable boxes... you could set, with your TV remote, a channel and a program you want to record, and the cable box at the due time would find the storage it needs in an attached Cell-compatible device and record it (you do not need to know where... it could be a HD on a small Cell box in a whole other building... the cable box could record the stream in a streamable MPEG4 format, and you could play it back whenever you want, streaming it from this other location: this other location could be your PS3, and you could set it up in such a way that the PS3 would write the MPEG4 stream onto a Blu-Ray disc you left inside [piracy concerns aside for the purpose of this discussion])... very TiVo-like, but you would not even need to program your TiVo device... only hit the record button on the Interactive TV guide...


What Cell does IMHO to push pervasive computing is take away a good chunk of the HW incompatibility issue that prevents devices from inter-operating quickly amongst themselves (you can write complex interfaces, but that slows down the whole process)...

Whoa again. How do you know Cell won't be one of the best price/performance solutions? It's cellular, thus it's scalable from 1 core (ie. cost effective) to multiple cores. It has its RAM onboard - no external memory needed in low-cost devices. It's on a 65-nm, SOI process - it will probably yield well, be small in its simpler configurations, and generate a low thermal output.

Having on-board e-DRAM and SRAM ( LS + RAM ) is quite a NEAT advantage IMHO...

No need for external caches and no need for external main RAM (in low-cost devices) means that the motherboard will be much less complex: it won't need an expensive Northbridge, memory slots, or a memory bus - basically just a MAC chip, I/O HW and external input/output connectors (USB, Firewire, video out, etc...), and there you have a whole PDA with network capabilities and only two chips... you could even put the MAC chip and the I/O ASIC on the same die as the Cell processor... a nice SoC :)...

And scalability ( 1 or more cores ) is IMHO the tip of the iceberg...

Number of execution units in each APU: can vary according to performance requirements...

Number of APUs in each PE: can vary according to performance requirements and inter-operation requirements. (I suspect that there are going to be basic standard specs, as far as APUs per PE, that will be followed when designing code that runs on a PS3 and a PDA, for example... What could happen is that the apulet/software Cell is sent back because its header required more APUs than the PDA had. Maybe the PS3 [or any other Cell-based device of similar complexity] could reformat the software Cells sent to the PDA after inquiring [or having done so beforehand] how many APUs are available, formatting new software Cells accordingly if possible... A "simple" way to do it, which would not work all the time, would be to have two versions of the programs used to make two Cell devices inter-operate, where the second version assumes the minimum of 1 APU and does the work more serially than in parallel... What do you think, Vince? I might be seriously overlooking something, as this is something I quickly thought up on the spot.)

Number of PEs in a single chip: can vary according to performance requirements...

Software Cells can run anywhere/any PE on a network ( meeting the number of APUs requirement specified in the header ) regardless of number of PEs, clock frequency or number of execution units in the APUs...

That gives quite a lot of flexibility to the HW designers to make a Cell-based product with specs fit around the purpose of the HW they are working on, and have it inter-operate quite painlessly with other Cell devices...
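The "required number of APUs" negotiation described above - a software cell bouncing back to the sender, or a device falling back to a more serial 1-APU version - can be sketched roughly like this. To be clear, the header layout, field names and dispatch policy here are all my invention for illustration; the patent only says the header carries the required number of APUs.

```python
# Hypothetical sketch of a software cell's header and a receiving device's
# dispatch decision. None of these names come from the patent.

from dataclasses import dataclass

@dataclass
class SoftwareCell:
    cell_id: str
    required_apus: int       # header field: APUs the parallel version needs
    serial_fallback: bool    # is there a 1-APU, more serial version too?

def dispatch(cell: SoftwareCell, available_apus: int) -> str:
    """Decide what a receiving Cell device does with an incoming cell."""
    if cell.required_apus <= available_apus:
        return "run"                 # enough free APUs: run as-is
    if cell.serial_fallback and available_apus >= 1:
        return "run-serial"          # fall back to the 1-APU version
    return "bounce"                  # send the cell back to the sender

# A console-class device (8 APUs assumed) vs. a tiny PDA (1 APU assumed):
cell = SoftwareCell("demo", required_apus=4, serial_fallback=True)
assert dispatch(cell, available_apus=8) == "run"
assert dispatch(cell, available_apus=1) == "run-serial"
assert dispatch(SoftwareCell("x", 4, False), 1) == "bounce"
```

The two-version idea from the post maps onto the `serial_fallback` flag: a device that can't satisfy the parallel requirement still accepts the cell if a serial variant exists.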
 
two more things to say ( yeah two more ;) ):

first, about my example of saving a video stream onto a Blu-Ray disc while you are away or watching TV or something... what about DRM, you will say?

Well, that can be dealt with: the stream might come with some code and bits here and there set so that the stream plays only on the Cell devices you registered (this would imply that after you buy your Cell product you connect it online [even on a 56k connection] and it gets registered under your User_id, generating a key that would be written into those streams you save to a Blu-Ray disc), and it would not be playable after a certain date has expired - or it would play forever, if you "electronically bought" the video...

Playing that Blu-Ray disc on another user's device would not be allowed unless your subscription plan allowed it (this can be encoded in the audio/video stream itself)...
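Purely as a sketch of the registration/expiry idea above - the keying scheme, the hashing and every name here are invented for illustration; nothing like this appears in the patent:

```python
# Speculative sketch: a saved stream carries a key tied to the buyer's
# account; playback is refused on unregistered accounts or after expiry.

import hashlib
from datetime import date
from typing import Optional

def stream_key(user_id: str, content_id: str) -> str:
    # Key written into the saved stream at recording time (invented scheme).
    return hashlib.sha256(f"{user_id}:{content_id}".encode()).hexdigest()

def may_play(user_id: str, content_id: str, key: str,
             today: date, expires: Optional[date]) -> bool:
    if key != stream_key(user_id, content_id):
        return False            # disc was made under another account
    if expires is not None and today > expires:
        return False            # rental/subscription window has lapsed
    return True                 # bought outright, or still within the window

k = stream_key("panajev", "movie42")
assert may_play("panajev", "movie42", k, date(2005, 1, 1), None)
assert not may_play("vince", "movie42", k, date(2005, 1, 1), None)
assert not may_play("panajev", "movie42", k, date(2006, 1, 1), date(2005, 6, 1))
```

A real scheme would need proper cryptography (signatures, tamper-resistant storage), but the account-binding plus expiry-date logic is the gist of what the post describes.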



Vince,

about the absolute timer... let me post all the relevant sections in the patent (I love tinkering with this synchronization issue :) ):

[0139] The processors of system 101 also employ an absolute timer. The absolute timer provides a clock signal to the APUs and other elements of a PE which is both independent of, and faster than, the clock signal driving these elements. The use of this absolute timer is illustrated in FIG. 28.

[0140] As shown in this figure, the absolute timer establishes a time budget for the performance of tasks by the APUs. This time budget provides a time for completing these tasks which is longer than that necessary for the APUs' processing of the tasks. As a result, for each task, there is, within the time budget, a busy period and a standby period. All apulets are written for processing on the basis of this time budget regardless of the APUs' actual processing time or speed.

[0141] For example, for a particular APU of a PE, a particular task may be performed during busy period 2802 of time budget 2804. Since busy period 2802 is less than time budget 2804, a standby period 2806 occurs during the time budget. During this standby period, the APU goes into a sleep mode during which less power is consumed by the APU.

[0142] The results of processing a task are not expected by other APUs [edit: in the same PE I expect], or other elements of a PE, until a time budget 2804 expires. Using the time budget established by the absolute timer, therefore, the results of the APUs' processing always are coordinated regardless of the APUs' actual processing speeds.

[0143] In the future, the speed of processing by the APUs will become faster. The time budget established by the absolute timer, however, will remain the same. For example, as shown in FIG. 28, an APU in the future will execute a task in a shorter period and, therefore, will have a longer standby period. Busy period 2808, therefore, is shorter than busy period 2802, and standby period 2810 is longer than standby period 2806. However, since programs are written for processing on the basis of the same time budget established by the absolute timer, coordination of the results of processing among the APUs is maintained. As a result, faster APUs can process programs written for slower APUs without causing conflicts in the times at which the results of this processing are expected.

[0144] In lieu of an absolute timer to establish coordination among the APUs, the PU, or one or more designated APUs, can analyze the particular instructions or microcode being executed by an APU in processing an apulet for problems in the coordination of the APUs' parallel processing created by enhanced or different operating speeds. "No operation" ("NOOP") instructions can be inserted into the instructions and executed by some of the APUs to maintain the proper sequential completion of processing by the APUs expected by the apulet. By inserting these NOOPs into the instructions, the correct timing for the APUs' execution of all instructions can be maintained.

So we have two things here ( I will post this message first and I will expand on these two things in a later post, so that people can read this at least ):

a.) an absolute timer to coordinate execution of code across different APUs (Software Cells, as you can see in the headers of the software Cells themselves, are allowed to require a certain number of APUs to be used for the APU program(s)/code). The "Time Slice" is long enough for all the APUs needed for that specific task to finish it.

b.) the PU, or one or more designated APUs ( :oops: , well, quite a shock almost... ), can parse the APU program and insert NOOPs if the code was written for a slower APU (you would increase the standby period by inserting the NOOPs when you are executing code generated for a slower APU on a faster APU...)...
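The NOOP insertion in [0144] boils down to simple arithmetic: the faster APU must burn extra cycles so the wall-clock completion time stays what the slower APU's code expects. A toy sketch - cycle counts are abstract, real microcode analysis is of course far more involved:

```python
# Toy illustration of the NOOP padding described in [0144].

def pad_with_noops(program_cycles: int, old_clock_hz: int, new_clock_hz: int) -> int:
    """NOOPs to append so a program written for a slower APU takes the
    same wall-clock time on a faster APU (new_clock_hz >= old_clock_hz)."""
    assert new_clock_hz >= old_clock_hz
    # Total cycles the faster APU must spend to match the old wall time:
    target_cycles = program_cycles * new_clock_hz // old_clock_hz
    return target_cycles - program_cycles

# A 1000-cycle program written for a 1 GHz APU, run on a 2 GHz APU,
# needs 1000 NOOPs of padding to finish at the expected moment:
assert pad_with_noops(1000, 1_000_000_000, 2_000_000_000) == 1000
# A same-speed APU needs no padding:
assert pad_with_noops(1000, 1_000_000_000, 1_000_000_000) == 0
```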

The absolute timer is always faster than the clock speed of the Cell chip we have, and it always tries to give us a specific time budget (for all the APUs that perform a task, the time slice given to them is longer than the time they need to finish that task)...

I think that the solution to my problem is easier than I expected it to be... since the absolute timer orchestrates execution across APUs on the same chip, as long as we do not go with an asynchronous design in which we clock some APUs slower than others (if we clocked them faster than the speed the code running on them was generated for, it would not be a problem - put in some NOOPs - but I'd stay away from asynchronous design for a while), we should not run into trouble...

See, if we design a program for a 2 GHz Cell chip which uses 4 APUs in parallel and we try to run it on a 1 GHz Cell chip with 4 available APUs, there is not going to be a problem: you would see the "finished" result coming out later than if we ran the program on a 2 GHz machine (more thought needs to be put in here if you were thinking about synchronization issues between several different Cell-based boxes on a network link), but it would come out first of all, and it would come out CORRECT, if our plan was only to offload processing to another set of free APUs and wait until they reply that the data is available...

I mean, if the program was written for a 2 GHz machine with 2 GHz APUs, well, the program is now running on a 1 GHz machine with 1 GHz APUs... if it takes half a time unit for step 1 on APU 1 at 2 GHz, it will take half a time unit on the APU that runs at 1 GHz, as the absolute timer changes with clock speed...

( and you have some APUs that would wait for those messages... goodness, I am drooling over Cell already :lol: bad Panajev, bad ;)... I keep thinking about how versatile this architecture is [see the TCP/IP processing examples and the MPEG processing example in the patent] )...
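The fixed-time-budget picture from FIG. 28 is easy to put into numbers. Here is a rough sketch with made-up figures, just to show the busy period shrinking and the standby period growing as the clock rises, while the budget itself stays constant:

```python
# Rough arithmetic for FIG. 28's time budget. All numbers are invented.

def busy_and_standby(task_cycles: int, clock_hz: float, budget_s: float):
    """Split a fixed time budget into busy and standby periods."""
    busy = task_cycles / clock_hz
    assert busy <= budget_s, "the task must fit inside the time budget"
    return busy, budget_s - busy

budget = 1e-3                                       # 1 ms, fixed by the timer
b1, s1 = busy_and_standby(500_000, 1e9, budget)     # today's 1 GHz APU
b2, s2 = busy_and_standby(500_000, 2e9, budget)     # a future 2 GHz APU

assert b2 < b1 and s2 > s1                  # shorter busy, longer standby
assert abs((b1 + s1) - (b2 + s2)) < 1e-12   # the budget itself is unchanged
```

Other APUs wait for the budget boundary, not for the busy period, which is why results stay coordinated regardless of clock speed.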


Edit to the edit: I was going ballistic... too far...
 
Ok... some more :)


I was reading something that almost scared me off, as it seemingly shot down all the expectations I was building about Cell... I said seemingly... ;)

[0060] To take further advantage of the processing speeds and efficiencies facilitated by system 101, the data and applications processed by this system are packaged into uniquely identified, uniformly formatted software cells 102. Each software cell 102 contains, or can contain, both applications and data. Each software cell also contains an ID to globally identify the cell throughout network 104 and system 101. This uniformity of structure for the software cells, and the software cells' unique identification throughout the network, facilitates the processing of applications and data on any computer or computing device of the network. For example, a client 106 may formulate a software cell 102 but, because of the limited processing capabilities of client 106, transmit this software cell to a server 108 for processing. Software cells can migrate, therefore, throughout network 104 for processing on the basis of the availability of processing resources on the network.

If you read the last part I highlighted, you would think that yes, the ISA is constant across all APUs, but changing the number of FP Units would cause the software Cell to migrate to another APU with enough FP Units...

But reading carefully this is not the case...

The patent says "a client 106 may formulate a software cell 102 but, because of the limited processing capabilities of client 106, transmit this software cell to a server 108 for processing."

but it also says this "Software cells can migrate, therefore, throughout network 104 for processing on the basis of the availability of processing resources on the network."...

What this tells me is that the previous comment is related to the software Cells' header, in which there is a field that says "number of APUs" and one about "sandbox size" (btw, isn't it neat that each APU can only access one sandbox because of the APU ID key, but the PU can set the APU ID mask so that an APU can access more than a single sandbox? The PU, btw, runs only trusted programs)... What I expect is, for example, that a software Cell needs 4 APUs and there are not 4 APUs, or they are busy... in that case we do see the software Cell migrate...
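The key/mask idea mentioned above - an APU's key matching a sandbox's key, with the PU widening access by setting mask bits - can be sketched as a masked comparison. Bit widths and exact semantics here are guesses for illustration, not the patent's definition:

```python
# Sketch of sandbox access control via key and mask. Masked-off bits are
# ignored in the comparison, so setting mask bits lets one APU key match
# several sandbox keys.

def may_access(apu_key: int, key_mask: int, sandbox_key: int) -> bool:
    return (apu_key & ~key_mask) == (sandbox_key & ~key_mask)

assert may_access(0b1010, 0b0000, 0b1010)        # exact match, no mask bits
assert not may_access(0b1010, 0b0000, 0b1011)    # mismatch, no mask bits
# PU sets the low mask bit: the APU now reaches both 0b1010 and 0b1011
assert may_access(0b1010, 0b0001, 0b1011)
```

The nice property is that the untrusted APU never chooses its own reach; only the PU (running trusted code) writes the mask.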

The patent also says this:

Because each member of system 101 performs processing using one or more (or some fraction) of the same computing module, the particular computer or computing device performing the actual processing of data and applications is unimportant.


We all knew this... the same ISA across all Cell processors and a uniform (yet modular and scalable) structure guarantees that we can perform the required computation anywhere on the network... as long as somewhere on the network there is at least one chip that has the required resources (number of APUs, etc...)...
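Migration "on the basis of the availability of processing resources" ([0060]) could then be as simple as scanning the network for a host whose free APU count meets the header's requirement. A minimal sketch - the host names and the first-match policy are invented:

```python
# Minimal sketch of picking a migration target for a software cell.
from typing import Dict, Optional

def pick_host(required_apus: int, hosts: Dict[str, int]) -> Optional[str]:
    """hosts maps a device name to its free APU count.
    Return the first device that satisfies the requirement, or None."""
    for name, free_apus in hosts.items():
        if free_apus >= required_apus:
            return name
    return None

network = {"pda": 1, "settop": 2, "ps3": 8}
assert pick_host(4, network) == "ps3"     # only the PS3 has 4 free APUs
assert pick_host(1, network) == "pda"     # any host will do; first match wins
assert pick_host(16, network) is None     # no host qualifies: cell bounces
```

Because the ISA is uniform, the chosen host is interchangeable with any other; only the resource count in the header matters.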

What does "uniform structure" or "same computing model" mean... under what conditions would a software Cell have to migrate?

Again from the patent:

These processors also preferably all have the same ISA and perform processing in accordance with the same instruction set. The number of modules included within any particular processor depends upon the processing power required by that processor.

This clearly separates the number of processing modules (a processing power concern), like PEs, etc., from the ISA: the ISA stays the same even if we change the number of processing modules according to performance needs...

Then we think again of the "Required Number of APUs" field in the software cells' header... that is the only parameter that would (if I am reading everything correctly) basically EXPOSE this "different computing model", this "lack of available resources" that forces the software cell to migrate from one PU/Cell system to another...

I do not think that a change in FP Units would cause this to happen, as it would mean that the number of FP Units would be FIXED into the ISA, which does not seem to be the case - it would have been hinted at in a much heavier manner; instead, there seems to be evidence of the contrary...



The whole idea, as in the quote I posted earlier, is based on UNIFORMITY of the computing model and the same ISA across ALL the processors... Such an important concept that I do not think they did not have it in mind when writing this:

[0068] FIG. 4 illustrates the structure of an APU. APU 402 includes local memory 406, registers 410, four floating point units 412 and four integer units 414. Again, however, depending upon the processing power required, a greater or lesser number of floating points units 512 and integer units 414 can be employed.

No mention of changes to the ISA, no mention of incompatibility - only "processing power" concerns, which might require putting more or fewer than 4 FP Units and 4 Integer Units...


Thanks for reading guys.... I am spent now... for a while at least ;)
 
Hi! I am new to this forum; I may not be up to date on tech stuff, but I love this site and thread! I have a question. I know that it is way too early, but what is the forum's guess (given what is known about the patent) on how many polygons per second the PS3 will be able to do in real game settings? Will gamers finally be able to see "Toy Story-like" graphics? Will we be seeing the first games rendering 1 billion poly/sec? (that was three questions, wasn't it :?: :) Did I say this is a great forum??? :LOL:
 
I don't expect PS3 games to be "Toy Story-like" in graphics... consoles won't be powerful enough, with the right imaging features, by 2005-2006.
However, think of lower-end CG, like what you saw in the early 90s on television (commercials, TV series) - or better yet, the CG you see in PS2 games. We'll probably have games like that on PS3 :)
 
I don't expect PS3 games to be "Toy Story-like" in graphics... consoles won't be powerful enough with the right imaging features by 2005-2006 ...or better yet, the CG you see in PS2 games. We'll probably have games like that on PS3


:oops:

You mean like FFX? ;)
 
I don't expect PS3 games to be "Toy Story like" in graphics... consoles won't be powerful enough with the right imaging features by 2005-2006

Hmm... I don't know. While Archie's or Faf's presence would be much appreciated (although I realize why they shy away), I think you don't realise how powerful even today's hardware truly is.

For example, Doom III is the undisputed benchmark of next-generation games - yet it's designed around the 'features' present in the NV1x line of cards dating from late ~2000. Carmack himself has stated that the introduction of the DX9 generation has made his work irrelevant from a "cutting-edge" point of view.

Now, going from there - the future will lie not only in raw specs (as you asked about), but in the architecture's flexibility and the ability of a developer to utilize this through HLSLs such as in OGL2.0 or DX9+ or a spin-off of Stanford's (among any number that exist). Being intelligent is often preferable to being 'brute-force' powerful - a perfect example of this is shaders. Gary Tarolli of nVidia once gave a great example: if you're making a realistic fabric, you can either model each fiber out of polygons (unrealistic), or write a shader which will give the same visual effect.

So, I'm rambling, but in conclusion: you haven't seen anything yet - and I'm not only talking about PS3 or Xbox Next, but even today's CineFX.
 
The challenge is going to be developing a cohesive development environment that programmers can migrate to with relative ease (i.e. one that can interface easily and effectively with existing software), and coupling it with tools that are sophisticated and powerful.
 
While real-time graphics will soon look satisfyingly close to Toy Story quality, they'll still be a ways off from the actual levels of sampling and the intensive algorithms used on the offline renderers for Toy Story. Systems of the next generation certainly won't have those kinds of resources, but they'll make enhancements in other ways to provide equally stunning effects and results when combined with good art direction/application.
 
(although I realize why they shy away)

Stop looking over here, I am only trying to understand things here :p :p :p

Seriously, I would like it if they were here too: it would help me keep my imagination strictly at the "educated guesses" level and give me more perspective...


Well, if they - or you, Vince - want to help me shoot down those huge long posts I have made on this page and save the intelligent ideas I had (hopefully there are some :D ), please do so... I enjoy polite discussion :)
 