PlayStation III Architecture

*patiently waits for Panajev to get to his set of paragraphs for answers*
:D

*throws out another question* :)

what again are the constants of this architecture? I know you said the number of functional units (fp and integer) can change as needed, but are say, the number of APUs per PE always going to be 8? etc...
(not counting Visualizer which may or may not be a special case...or....not completely a Cell/PE)

I didn't say that... the patent did :D

The number of APUs per PE and the number of PEs can vary too, this architecture is modular... IMHO the patent makes it quite evident that thanks to this "absolute timer", the constant ISA across ALL kinds of APUs and other tricks, you're not depending on the number of execution units or frequency, and a CELL program ( not using the Visualizer, as some devices might not even have it ) should run on all APUs in a PDA, in a TV or a Microwave oven...
 
Thanks for the great post Panajev, I've been reading it, and your other ones again. all extremely interesting.... this whole subject.



at least one of my speculations seems to be coming true: that the GS3 (Visualizer) has its own Cell bolted on, or integrated into, the rasterizing portion. thus the Broadband Engine/4-Processing Element configuration should be mostly unburdened from having to provide geometry & lighting computations (i know i'm thinking in the old T&L terms from 3-5 years ago)


Edit:

I didn't say that... the patent did

oops!, sorry, that's right, the patent did, not you :)

The number of APUs per PE and the number of PEs can vary too, this architecture is modular... IMHO the patent makes it quite evident that thanks to this "absolute timer", the constant ISA across ALL kinds of APUs and other tricks, you're not depending on the number of execution units or frequency, and a CELL program ( not using the Visualizer, as some devices might not even have it ) should run on all APUs in a PDA, in a TV or a Microwave oven...

Ahh ok, that's extremely interesting....that the number of APUs, number of Processing Elements (and FP/Integer units per APU) are all modular. makes me think even more of the P10.

I still do not know what I.S.A. stands for or what it is.... is that the PE? (sorry!)

heh, i brought out my LEGOs to help visualize how this whole architecture works ...tremendous fun, i highly recommend trying it for anyone that has Legos... :D
 
I think the Broadband Engine would still be providing T&L: 1 TFLOPS only for physics ;) uhm, it doesn't sound right to me...

those 256 GFLOPS can be used up pretty fast when you think about long and complex pixel programs and triangle setup, which should be done by the Visualizer, as well as other tricks performed on the GPU...


look at this...

http://makeashorterlink.com/?B27621F03
 
Agreed, I hope developers like Fafalada will like this Visualizer... they wanted 100% full programmability and it seems they got it...
You bet we will. :p Funny thing is that if this comes to be the real thing, it's remarkably close to some of my sillier speculations I talked about when Cell was first mentioned.
 
I think the rendering engine is the Visualizer...

well.... yeah, of course it is :D the Visualizer is the GS3 or in place of the GS3. the Visualizer is the rasterizer PLUS at least SOME of the T&L/vertex processing. Like the whole GS plus one of the VUs on EE. IMHO.
 
ISA = Instruction Set Architecture... the last layer of abstraction that you have between code and the underlying micro-processor HW... basically the set of operations the processor understands, i.e. what is exposed about the underlying HW to the programmer ( very quick definition )
 
You bet we will. :p Funny thing is that if this comes to be the real thing, it's remarkably close to some of my sillier speculations I talked about when Cell was first mentioned.


Please tell me more about what you like about this architecture and what not... I do like it plenty... FINALLY CARMACK could code for a Sony console :) This seems to be calling for OpenGL 2.0 ( or Stanford Shading Language ) ;) so badly ;)

What I see changing is, like the patent says, the number of execution units, PEs and APUs... I do not see the GPU going to fixed functions like the current GS has...

I think Sony WANTS the Broadband Engine and the Visualizer GPU to be very flexible and fully programmable... please PS3 devs do not start saying that the libs Sony will provide you are too high level ok :LOL:
 
You bet we will. Funny thing is that if this comes to be the real thing, it's remarkably close to some of my sillier speculations I talked about when Cell was first mentioned.

I actually even remember that statement... ;)

Please tell me more about what you like about this architecture and what not... I do like it plenty... FINALLY CARMACK could code for a Sony console This seems to be calling for OpenGL 2.0 ( or Stanford Shading Language ) so badly

Well for one it's simple with a metric ton of execution resources which is cool (and what I expected). It's certainly another excuse to explore purely functional programming :p again... I'm also pretty sure (assuming reasonable execution utilization) that it would tear through decision trees and eat A* for lunch... I'm sure devs will take a look at genetic algorithms and neural nets (again!)... I wonder if we'll finally get to see something from all that robotics and distributed meta-object oriented OS and software research by CSL.

One thing I am curious about though is how soon will you see this architecture sneaking in (if at all) on other devices from Sony and Toshiba.
 
If this turns out to be the final hardware... and it can run Linux and OpenGL... Then I'll use this machine for 3D modeling :) Damn, even ILM would throw out their PCs and render farms :))
 
archie,

would you be surprised to see this CELL architecture ( not the Broadband Engine and the CELL based 256 GFLOPS rasterizer, but the PEs ) being used in PDAs, TVs and Stereos? There are several parts of the patent that really make me think this architecture was designed to scale across a whole variety of products...


[0011] In accordance with the present invention, all members of a computer network, i.e., all computers and computing devices of the network, are constructed from a common computing module. This common computing module has a consistent structure and preferably employs the same ISA. The members of the network can be, e.g., clients, servers, PCs, mobile computers, game machines, PDAs, set top boxes, appliances, digital televisions and other devices using computer processors. The consistent modular structure enables efficient, high speed processing of applications and data by the network's members and the rapid transmission of applications and data over the network. This structure also simplifies the building of members of the network of various sizes and processing power and the preparation of applications for processing by these members.

[0012] In another aspect, the present invention provides a new programming model for transmitting data and applications over a network and for processing data and applications among the network's members. This programming model employs a software cell transmitted over the network for processing by any of the network's members. Each software cell has the same structure and can contain both applications and data. As a result of the high speed processing and transmission speed provided by the modular computer architecture, these cells can be rapidly processed. The code for the applications preferably is based upon the same common instruction set and ISA. Each software cell preferably contains a global identification (global ID) and information describing the amount of computing resources required for the cell's processing. Since all computing resources have the same basic structure and employ the same ISA, the particular resource performing this processing can be located anywhere on the network and dynamically assigned.
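To make [0012] a bit more concrete, here is a rough Python sketch of how a software cell carrying a global ID and a resource requirement might get assigned to whichever network member has room. The patent only describes the idea; the names `SoftwareCell`, `Member`, `assign` and the "count free APUs" policy are my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCell:
    """Hypothetical model of a software cell: ID, resource needs, contents."""
    global_id: int
    apus_required: int      # assumed unit for "computing resources required"
    payload: bytes = b""    # applications and/or data


@dataclass
class Member:
    """Hypothetical network member built from the common computing module."""
    name: str
    free_apus: int


def assign(cell, members):
    """Give the cell to the first member with enough free APUs.

    Returns the chosen member (with its APUs reserved) or None.
    """
    for member in members:
        if member.free_apus >= cell.apus_required:
            member.free_apus -= cell.apus_required
            return member
    return None


# Same ISA everywhere, so any member is a candidate; only capacity differs.
network = [Member("pda", 1), Member("tv", 4), Member("console", 32)]
cell = SoftwareCell(global_id=1, apus_required=8)
chosen = assign(cell, network)
print(chosen.name if chosen else "no member can take this cell")
```

Since every member speaks the same ISA, the dispatcher never has to ask what kind of device it is talking to, only how much capacity is free.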

[0020] In another aspect, the present invention provides an absolute timer for the processing of tasks. This absolute timer is independent of the frequency of the clocks employed by the APUs for the processing of applications and data. Applications are written based upon the time period for tasks defined by the absolute timer. If the frequency of the clocks employed by the APUs increases because of, e.g., enhancements to the APUs, the time period for a given task as defined by the absolute timer remains the same. This scheme enables the implementation of enhanced processing times by newer versions of the APUs without disabling these newer APUs from processing older applications written for the slower processing times of older APUs.
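A toy way to picture the absolute timer in [0020]: the task owns a fixed time budget, and a faster APU just spends more of that budget idle. This is only a sketch under my own assumptions (the function `run_task`, the cycle counts and the clock figures are made up for illustration, they are not in the patent).

```python
def run_task(work_cycles, clock_hz, budget_s):
    """Model one task under an absolute time budget.

    Returns (completion_time_s, idle_time_s). The result is treated as
    available at the end of the budget, however fast the APU is.
    """
    busy_s = work_cycles / clock_hz
    if busy_s > budget_s:
        raise ValueError("APU too slow for this time budget")
    idle_s = budget_s - busy_s
    return budget_s, idle_s


# Same task, same budget, two hypothetical APU generations.
old_done, old_idle = run_task(work_cycles=4_000_000, clock_hz=4e9, budget_s=0.002)
new_done, new_idle = run_task(work_cycles=4_000_000, clock_hz=8e9, budget_s=0.002)

print(old_done == new_done)   # the program sees the same timing
print(new_idle > old_idle)    # the faster APU simply waits longer
```

The point is that code written against the budget never observes the clock frequency, which is what lets older applications keep working on newer APUs.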

[0021] The present invention also provides an alternative scheme to permit newer APUs having faster processing speeds to process older applications written for the slower processing speeds of older APUs. In this alternative scheme, the particular instructions or microcode employed by the APUs in processing these older applications are analyzed during processing for problems in the coordination of the APUs' parallel processing created by the enhanced speeds. "No operation" ("NOOP") instructions are inserted into the instructions executed by some of these APUs to maintain the sequential completion of processing by the APUs expected by the program. By inserting these NOOPs into these instructions, the correct timing for the APUs' execution of all instructions are maintained.
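The NOOP scheme in [0021] can be sketched very crudely like this. I'm assuming a simplified model where every instruction costs a fixed number of time slots and a NOOP costs one slot on the newer APU; the patent describes analysis during processing, which is far more involved than this static padding. `pad_with_noops` and the instruction names are hypothetical.

```python
def pad_with_noops(instructions, old_cost, new_cost):
    """Pad a stream so each instruction still completes in old_cost slots.

    old_cost: slots per instruction on the older APU
    new_cost: slots per instruction on the newer, faster APU
    """
    if new_cost > old_cost:
        raise ValueError("new APU is expected to be at least as fast")
    padded = []
    for instruction in instructions:
        padded.append(instruction)
        # Fill the time saved by the faster APU with do-nothing slots.
        padded.extend(["NOOP"] * (old_cost - new_cost))
    return padded


stream = ["LOAD", "ADD", "STORE"]
padded = pad_with_noops(stream, old_cost=2, new_cost=1)
print(padded)
```

With these numbers the padded stream takes six slots on the new APU, the same as three two-slot instructions on the old one, so other APUs waiting on this one see the completion order they were written to expect.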

[0068] FIG. 4 illustrates the structure of an APU. APU 402 includes local memory 406, registers 410, four floating point units 412 and four integer units 414. Again, however, depending upon the processing power required, a greater or lesser number of floating point units 412 and integer units 414 can be employed. In a preferred embodiment, local memory 406 contains 128 kilobytes of storage, and the capacity of registers 410 is 128 × 128 bits. Floating point units 412 preferably operate at a speed of 32 billion floating point operations per second (32 GFLOPS), and integer units 414 preferably operate at a speed of 32 billion operations per second (32 GOPS).
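For anyone wanting to check the numbers being thrown around in this thread, here is the arithmetic as a small Python sketch. The 32 GFLOPS per APU, 128 KB local memory and 128 x 128 bit register file come from [0068]; the 8 APUs per PE and 4 PEs per Broadband Engine are the configuration discussed in this thread, and per the patent both can vary. These are peak figures only.

```python
GFLOPS_PER_APU = 32          # from [0068], preferred embodiment
LOCAL_MEMORY_KB = 128        # from [0068]
REGISTER_COUNT = 128         # 128 registers ...
REGISTER_BITS = 128          # ... of 128 bits each

APUS_PER_PE = 8              # thread's assumption, not fixed by the patent
PES_PER_BROADBAND_ENGINE = 4 # thread's assumption, not fixed by the patent


def peak_gflops(pes, apus_per_pe, gflops_per_apu=GFLOPS_PER_APU):
    """Peak floating point throughput for a given modular configuration."""
    return pes * apus_per_pe * gflops_per_apu


pe_peak = peak_gflops(1, APUS_PER_PE)
be_peak = peak_gflops(PES_PER_BROADBAND_ENGINE, APUS_PER_PE)
register_file_bytes = REGISTER_COUNT * REGISTER_BITS // 8

print(pe_peak)              # GFLOPS for one PE
print(be_peak)              # GFLOPS for a 4-PE Broadband Engine
print(register_file_bytes)  # bytes of register file per APU
```

That gives 256 GFLOPS per 8-APU PE and 1024 GFLOPS (about 1 TFLOPS) for four PEs, which matches the figures quoted earlier in the thread. A PDA-class part would just plug smaller numbers into the same function.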
 
Just imagine 'Visualcube', the PS3 equivalent of GSCube, with 16 or 64 sets of PS3 processors (Broadband Engines & Visualizers) with near-1TB memory. :) :oops: :oops:
 
well I could see render-farms made with this kind of architecture...

The way it works, Sony might RENT out huge renderfarms of CELL based servers: you would send your shaders and data in and get back the rendered result...
 
if your workstation was CELL based then you could have a modelling/rendering package that holds the IP address of Sony's or whoever's big CELL Renderfarm and sends the job there when you press the "Render" button
 
a quick break from reality, someone must have had a little fun dreaming this up:

Sony Playstation 3 Mezzanine (Latin for Steel/Metal) Processor Data:

Microprocessor: IBM Grid 256-bit at 550 Gigahertz
Primary Cache: 2-way set associative,32KB instruction/32KB data cache
Secondary Cache: 8MB DDR,Full Speed SDRAM
Main RAM: 1GB system SDRAM
Texture Memory: 104Mb/VRAM

Graphics SubSystem:
Pipes: 16 Graphics pipelines, up to 8 channels per pipe
Raster Managers: 4 Raster Managers per pipe
- 864M pixels/sec fill rate
- Pixel-accurate synchronization (Genlock) and swap sync
- 8.3M pixel display and 8 display channels
- Full Scene Anti Aliasing, 8 subsample/pixel, 6.1B samples/second
- 48-bit RGBA Color
- 256MB Texture Memory with texture lookup tables for interactive volume visualization
- Hardware clipmapping and real time high resolution texture paging
- Subsample round points

IBM Grid 256-bit Processor
- High clock rate accelerates every system function (550 Ghz)
- Four-way superscalar architecture, dynamic out-of-order instruction issue, and speculative execution maximize utilization of processing
- Large unblocking cache keeps essential data in fast memory

Playstation 3(Grid/EE4/GS4) Graphics Architecture
Each PS3 graphics subsystem,or pipe, is composed of a Geometry Engine, four Raster Managers, and a display generator.
- Geometry Engine: Four high performance Geometry Engine processors perform lighting calculations and geometric transformations such as translations, rotation, and scaling. Geometry Engine processor also execute image-processing functions such as convolution and histogram equalization-a more effective approach then that of CPUs.
- Raster Managers: Raster Managers scan-convert data from Geometry Engine processors into digital images. Raster managers perform pixel operations, including Z buffer testing, color and transparency blending, and texture mapping- and they do so with multisample anti-aliasing at real-time rates.
- Display Generator: The Display Generator converts digital data from the Raster managers into analog or digital video signals for a maximum display of more then 8 million pixels per PS3 graphics pipeline or more then 130 million pixels per system.
- 1.76TB per second Bus Bandwidth/ultra low latency memory (raw memory)

I thought it was kinda odd they sent it here, but I figured it was a media move by the PR to tell me to keep my tongue in check with the other divisions, has to not leak any real definitive specs at the Expo next weekend. Sometimes being the president of Nfactor Studios is hard work, but I enjoy the company here so I'll let this snippet float in here till I get some more hard information for myself, hell next I'll get order claims for dev units soon for the PS3 Mezzanine. Well till then, have fun with the possibilties of this new info. I figured it would shed a focusable attension on this new technology.

bwhahahahaha! looks like he took some real aspects of SGI's Onyx/Infinite Reality series and merged it with all kinds of silly things :LOL: :LOL: :LOL: :LOL:
a real Sony fan :)
 
Megadrive:

That fake specs list is like TERRIBLY old. It's so old it's not even funny, it must be over a year old by now.

It was AGES since I saw it the first time, from where the heck did you drag up that old crap? Wherever it was, kindly toss it back, it's NOT news! :)

I very much doubt it was made by a sony fan, probably just some random internet troll or something who wanted to create a bit of spotlight for himself.


*G*
 