PlayStation III Architecture

Of course the real work is going to be in the tool chain ;)

Turning a sequential piece of C code into packets that can be effectively executed on these APUs is going to be a challenge for the compiler writers.
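To make that concrete, here is a toy C sketch of what such a compiler would have to do ( my own illustration, nothing from any real CELL toolchain; apu_packet, NUM_APUS and kernel are invented names ): prove the loop iterations independent, then slice them into self-contained packets, one per APU:

Code:
#include <stdio.h>

#define N        1024
#define NUM_APUS 8          /* assumed: 8 APUs per Processing Element  */

typedef struct {
    float *data;            /* the data the packet carries with it     */
    int    first, count;    /* the slice of the loop this packet owns  */
} apu_packet;

/* The loop body, compiled once as native APU code (no interpretation). */
static void kernel(apu_packet *p)
{
    for (int i = p->first; i < p->first + p->count; i++)
        p->data[i] = p->data[i] * 2.0f + 1.0f;
}

int main(void)
{
    static float a[N];
    apu_packet packets[NUM_APUS];

    /* The compiler's job: prove the iterations independent, slice them. */
    for (int u = 0; u < NUM_APUS; u++) {
        packets[u].data  = a;
        packets[u].first = u * (N / NUM_APUS);
        packets[u].count = N / NUM_APUS;
    }

    /* Stand-in for dispatch: each packet would really run on its own APU. */
    for (int u = 0; u < NUM_APUS; u++)
        kernel(&packets[u]);

    printf("a[0] = %.1f, a[%d] = %.1f\n", a[0], N - 1, a[N - 1]);
    return 0;
}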

Gubbi, if Intel made x86 a performance king and Itanium 2 a nice beast in Integer and especially FP processing, maybe Sony + IBM + Toshiba can do something nice with CELL too ;)
 
I am reading up on PVM since you mentioned it, Maf... in this case part will be handled in software and part in HW... this is something thought about when designing the architecture, not an afterthought...
 
Panajev2001a said:
archie, they're TIFFs... install a QuickTime plugin or AlterNaTIFF, a TIFF file viewer...

Yeah, I figured that out after I switched to my Mac and realized QT on my PC had TIFF viewing turned off... :oops:
 
That changes very little about the concept, if anything.

Personally I think low-level virtual machines are the wrong approach to computing in heterogeneous network environments, whether implemented in software or hardware. Inside a single computer with a very limited set of architectures hiding below the single instruction set it is fine though, since the scheduler can be designed with intimate knowledge of all the different configurations it can direct instructions to.
 
Thinking about it, I do not see it as totally Virtual Machine based...

the ISA is the same as it is between a Pentium III and a Pentium 4 for the most part... we could build a Pentium 4 with only one fast ALU; that would not make it stop executing x86 code or Pentium 4 optimized code... this is how I see the difference in execution units between different implementations of CELL for PDAs, consoles, TVs, etc...

The other trick about the absolute timer is not something that we should weep about either...

The rest is intelligent packet routing done by the PUs ( RISC CPUs ) and other embedded routing logic and packet passing... it is not like a VM... we are not interpreting another language on the fly... It might seem stupid, but I fail to see how this approach resembles using Virtual Machines and why it is bad in your opinion...
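Roughly like this toy C sketch ( all names invented, purely illustrative ): the PU never decodes the packet body, it just finds an idle APU and hands over code that is already native:

Code:
#include <stdio.h>

#define NUM_APUS 8

typedef struct {
    void (*native_code)(int);   /* pre-compiled APU code, not bytecode */
    int   operand;
} packet;

static int apu_busy[NUM_APUS];  /* 0 = idle, 1 = busy (toy model)      */

static void demo_job(int x) { printf("APU job ran with operand %d\n", x); }

/* The PU's routing step: no interpretation, only forwarding. */
static int route(packet *p)
{
    for (int u = 0; u < NUM_APUS; u++) {
        if (!apu_busy[u]) {
            apu_busy[u] = 1;
            p->native_code(p->operand);  /* would execute on APU u */
            apu_busy[u] = 0;
            return u;
        }
    }
    return -1;                  /* all busy: caller would queue and retry */
}

int main(void)
{
    packet p = { demo_job, 42 };
    printf("packet routed to APU %d\n", route(&p));
    return 0;
}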
 
Panajev2001a said:
the ISA is the same as it is between a Pentium III and a Pentium 4 for the most part...

High-end x86 processors are a good example: the x86 instruction set presents nothing more than a clunky virtualization of their actual architecture.

we could build a Pentium 4 with only one fast ALU; that would not make it stop executing x86 code or Pentium 4 optimized code...

It would make the code no longer Pentium 4 optimized, at least not for that specific Pentium 4... that is the problem with low-level VMs. There are always assumptions about the underlying architecture made during compilation, especially with parallelizing compilers; upset those assumptions too much and you are better off recompiling the code.

Of course that would not be so much of a problem for the PS3 if it used this approach. If it had separate configurations for the APUs at all, it would only be a very limited set of them. The compiler, programmer and scheduler could take that into account.
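A quick hypothetical C illustration of that last point ( detect_apu_count and the kernel variants are invented for the example ): with only a few known configurations, you ship one tuned variant per configuration and pick at load time, instead of recompiling:

Code:
#include <stdio.h>

typedef void (*kernel_fn)(void);

/* Two builds of the same kernel, each with its own scheduling assumptions. */
static void kernel_4apu(void) { puts("running the variant tuned for 4 APUs"); }
static void kernel_8apu(void) { puts("running the variant tuned for 8 APUs"); }

/* Pretend query of this machine's APU count (firmware/OS in reality). */
static int detect_apu_count(void) { return 8; }

int main(void)
{
    kernel_fn k = (detect_apu_count() >= 8) ? kernel_8apu : kernel_4apu;
    k();   /* assumptions baked in at compile time, chosen at load time */
    return 0;
}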
 
the thing is that there would be little code running across different CELL-based machines in which the configuration of PEs and APUs is different... each machine would run code which is optimized for it, for the most part...

btw... look at this...

[ image: ps3.jpg ]

how come we see 4 separate CRTCs? Does it work like the GScube, merging the output of each big pixel pipeline? ( we see only one Pixel Engine per pipeline; I think we could expect 4 customized PEs [with Pixel Engine] in the rasterizer ASIC... if it went to 2 GHz that would give us 8 GPixels/s; of course the Shader performance would be quite fast... over 4 GPixels/s, shader performance becomes the limiting factor compared to screen-filling speed IMO )...
 
Glonk said:
Whoever said the Xbox could do "anything" obviously didn't know what they were talking about. :)

I guess that would include every young Xbox owner who hasn't the slightest grasp of hardware design or game development, or thinks they do, but loves to gloat about their Xbox purchase (not saying anyone here, specifically). :) They are certainly quite vocal about it on virtually any other forum you could visit. Make one topic about something neat on the PS2 or a PS2 game, and invariably some Xbox guy has to "mark territory" and follow with, "Oh, Xbox can do that. Did I mention how great DC is?"
 
how come we see 4 separate CRTCs?


/mode +facetious
Hehe, one for each image cache of course!
/mode -facetious

It's not too surprising if you're going to render directly out of internal buffers... Of course they could just be using CRTC as a placeholder label for a CRTC read/write circuit. Remember even the GS has 2 read circuits and a write circuit...
 
Can you expand some more on your comment, archie, please?

why would we need four separate CRTCs, if these are really full CRTCs, even if we were rendering from four image buffers ( directly, as you said )?


This reminds me of the GScube a little where each GS had its own CRTC and then you would merge the output from each GS...

As far as performance is concerned... using e-DRAM I do not see the clock speed of this Visualizer getting near 4 GHz...

around 2 GHz is where I might see it going... at that speed we would have 8 GPixels/s, and thinking about the same APUs we use for the Broadband Engine... 4 FP Units would mean...

4 Units * 2 FP ops ( FMAC ) * 4 APUs * 2 GHz = 64 GFLOPS per Pixel Engine...

I do not see PS3's rasterizer having 1 single pixel pipeline...

according to that picture we have 4... thus we'd have 256 GFLOPS and 256 GOPS ( 4 Integer Units as well in each APU ) for the rasterizer...
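( quick sanity check of that arithmetic, a throwaway C snippet with the speculated parameters plugged in; none of these numbers are confirmed specs: )

Code:
#include <stdio.h>

int main(void)
{
    double clock_ghz  = 2.0;   /* speculated Visualizer clock       */
    int pixel_engines = 4;     /* per the ps3.jpg diagram           */
    int apus_per_pe   = 4;     /* assumed APUs feeding each PE      */
    int fp_units      = 4;     /* FP units per APU                  */
    int ops_per_fmac  = 2;     /* one FMAC = multiply + add         */

    double gflops_pe    = fp_units * ops_per_fmac * apus_per_pe * clock_ghz;
    double gflops_total = gflops_pe * pixel_engines;
    double gpixels      = pixel_engines * clock_ghz;   /* 1 pixel/clock/PE */

    printf("%.0f GFLOPS per Pixel Engine\n", gflops_pe);      /* 64  */
    printf("%.0f GFLOPS for the rasterizer\n", gflops_total); /* 256 */
    printf("%.0f GPixels/s fill rate\n", gpixels);            /* 8   */
    return 0;
}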

would this be enough for a 2005 GPU ( T&L not included )? how much would be enough? ( would 64 GFLOPS be enough for the GS3 as far as pixel shaders are concerned, or not? if so, we might see the real PS3 having just one of these "special gfx" PEs instead of 4 like in this picture, leaving the latter for a gfx workstation, but who knows, the GScube experience might have given them ideas... )
 
Panajev2001a - correct me if this isn't true, but it seems that this version of the Broadband Engine/EE3 has 4 cores (with 8 vector units each) clocked at 4 GHz, as opposed to what was maybe thought to have 16 cores (thus 128 APUs) clocked at 1 GHz, no? might save cost that way.

I'm afraid I haven't read the whole thread, so please overlook that if you already mentioned this or if someone else did.

Also, with regard to the Visualizer/GS3, could the 4 Pixel Engines really be clusters of more pixel pipelines? IIRC, GSCube 16 had 4 clusters of 4 GSs, and I guess the GSCube 64 had 4 clusters of 16 GSs or something like that. but with the Visualizer, it's all on one chip.

Also, could it be that the actual PS3 will have ARRAYs of these Broadband Engines and Visualizers? with each BE/VS getting scaled up or down in the number of processing units (all the various processing units) according to need.
 
btw, let us not forget how much pixel fill rate the GSCube has. ver 1 has
16 GSs * 16 pipelines * 150 MHz = 38.4 GPixels/s (half while texturing)

GSCube ver 2 has 64 GSs * 16 pipelines * 150 MHz (i think it's the same clock speed) = 153.6 GPixels/s (half while texturing)

of course there are probably HUGE inefficiencies with that many chips, and even within the EE+GS itself.

PS3 isn't going to have anywhere near that filling speed, I don't think (unless by chance each PS3 used an array of many Visualizers and each Visualizer's 4 Pixel Engines had many smaller pipelines), which is all extremely doubtful, IMO. but PS3 should surpass the 4 billion vertices/sec of the GSCube 64, most likely :)
 
Panajev2001a - correct me if this isn't true, but it seems that this version of the Broadband Engine/EE3 has 4 cores (with 8 vector units each) clocked at 4 GHz, as opposed to what was maybe thought to have 16 cores (thus 128 APUs) clocked at 1 GHz, no? might save cost that way.

I'm afraid I haven't read the whole thread, so please overlook that if you already mentioned this or if someone else did.

Also, with regard to the Visualizer/GS3, could the 4 Pixel Engines really be clusters of more pixel pipelines? IIRC, GSCube 16 had 4 clusters of 4 GSs, and I guess the GSCube 64 had 4 clusters of 16 GSs or something like that. but with the Visualizer, it's all on one chip.

If they can push the frequency up and still have good yields, please do so... they already have excellent parallel processing performance, and increasing serial performance won't hurt; plus you can save transistors and avoid making the chip even wider, since you have a fixed performance target... Each Pixel Engine is just that, a Pixel Engine... I do not think it is clustered... each Pixel Engine has 1 RISC CPU and 8 APUs to process pixel programs for it, and the 4 Pixel Engines would have quite a nice fill-rate thanks to clock speed alone... one way it could go would be for the GS3 to again run at half the speed of the main chip, the Broadband Engine, so 2 GHz... this would yield a very respectable 8 GPixels/s. I'd be happy with 4 GPixels/s ( Visualizer clocked at 1 GHz ) supported by the MASSIVE bandwidth the DRAM of the Visualizer will have and all the Pixel Shading processing units available to each Pixel Engine...


Also, could it be that the actual PS3 will have ARRAYs of these Broadband Engines and Visualizers? with each BE/VS getting scaled up or down in the number of processing units (all the various processing units) according to need.

arrays of BEs and Visualizers... I don't think we will see that... what can vary is the number of PEs, the execution units in the APUs, and the clock speed, though...
 
1280x1024 * 60 fps * 4 samples ( 2x2 AA ) ~= 314 MPixels/s

With 4 GPixels/s we can fill each frame 12+ times

Shader execution time WILL be THE limiting FACTOR, and this is an area it seems they're working on quite nicely...

at 1 GHz this Visualizer with 4 special PEs would still reach a respectable 128 GFLOPS ( 256 GFLOPS if we up the speed to 2 GHz )...
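( and here is the frame-budget math from above as a throwaway C helper; mpixels_needed is made up, plug in other resolutions and framerates to taste: )

Code:
#include <stdio.h>

/* pixels written per second = width * height * fps * samples per pixel */
static double mpixels_needed(int w, int h, int fps, int samples)
{
    return (double)w * h * fps * samples / 1e6;
}

int main(void)
{
    double need = mpixels_needed(1280, 1024, 60, 4);  /* ~314.6 MPix/s */
    double have = 4000.0;                             /* 4 GPixels/s   */
    printf("need %.1f MPixels/s, fill each frame %.1f times\n",
           need, have / need);
    return 0;
}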
 
ahh thanks for clearing that up about the number of pixel pipes. 1 pixel engine = 1 pipe, then.

so if Sony wanted to equal 1000x PS2 performance, they could increase the number of Processing Elements and/or the number of FPUs per APU, as you mentioned, I suppose (I would hope this could apply to the Visualizer too!)

what are the chances that the Visualizer could run at the same 4 GHz that the BE runs? not very high, I suppose. so, if BE/EE3 is 4 GHz and Visualizer/GS3 is 2 GHz, and the number of units across the whole configuration stays the same, we have roughly 1 TFLOP plus 256 GFLOPS.

EDIT: that's about 1/4th or 1/5th of Sony's claimed PS3 performance (of 1000 times PS2) just in terms of floating point alone. of course, real-world, in-game, actual sustainable performance could already be 1000x that of PS2. And actually, it is not known what exactly Sony meant by 1000x PS2's performance. it could be floating point, or FP and Integer together, or in-game polygon count. or any number of things....

I say add more Processing Elements to the BE and more Pixel Engines to the Visualizer! haha, that's just me wanting more. :)


but actually, I think this is IT. it will stay more or less the same. we already have roughly a 200x jump beyond PS2, which is about the same as PSX to PS2. Sony should back off the hype now, and focus on ease of development and graphics features/quality (FSAA!) for the Visualizer.
 
First, I am not excluding that the Pixel Engine might be a package of pixel pipelines... the name just seems to imply otherwise, though... a Pixel Engine writes 1 pixel, I would think... and as my calculations showed, 4 GPixels/s is quite enough ( prolly it should be around 8 GPixels/s with the Visualizer running at 2 GHz ) considering the shift of focus onto Pixel programs ( and procedural textures ) and away from multi-texturing and multi-pass techniques...

if BE is 4 GHz and VS is 2 GHz, and the number of units across the whole configuration stays the same, we have roughly 1 TFLOP plus 256 GFLOPs about 1/4th or 1/5th of PS2's performance in terms of floating point.

1/4th of PS2's performance ?


1.256 TFLOPS => ~202x PS2's FP performance ( the EE peaks at about 6.2 GFLOPS; 1,256 / 6.2 ≈ 202 )... and you're not counting the Integer processing power...

Overall PS3's sustained performance should fly quite high... the 1,000x is not impossibly far... prolly they will not reach 1,000x the max theoretical specs of PS2, but they should get closer when we talk about sustained in-game numbers...


I'd be happy if overall performance of PS3 was around 300-400x PS2's performance... that'd be neat :D

PS1 pushed at best 360k polygons per second... PS2, we should say, peaks at around 36 usable MPolygons/s ( vertices )...

PSX to PS2 didn't bring more than a 100-200x increase after all... reaching even 250-350x would be a success...
 
holy crap, I need to edit! - what I meant of course was, this current config with about 1 TFLOP plus 256 GFLOPS is about 1/4th or 1/5th of "1000x PS2", not actually less powerful than PS2, lol... see what I mean? :oops:


1/4th of PS2's performance ?

1/4th of Sony's overall target of 1000x PS2 is what I meant, but also, it might already be closer than that, when we figure in things like integer performance, efficiency, and real-world, sustained performance, like you said.


1.256 TFLOPS => 202x PS2's FP performance... and you're not counting the Integer processing power...

yes, of course! that is what I meant.

Overall PS3's sustained performance should fly quite high... the 1,000x is not impossibly far... prolly they will not reach 1,000x the max theoretical specs of PS2, but they should get closer when we talk about sustained in-game numbers...

agreed!


I'd be happy if overall performance of PS3 was around 300-400x PS2's performance... that'd be neat

exactly what I was thinking. and it seems that with this configuration they are roughly 200-400x PS2 performance already.

PS1 pushed at best 360k polygons per second... PS2, we should say, peaks at around 36 usable MPolygons/s ( vertices )...

yep. and in real-world texture-mapped, Gouraud-shaded performance, PSX pushed about 180,000 polys/sec. PS2 somewhere under 20M polygons with texture mapping also (but let's not debate the actual number, as that has been done endlessly all over the internet)

PSX to PS2 didn't bring more than a 100-200x increase after all... reaching even 250-350x would be a success...

again, I agree with you there. and this config of Broadband Engine/EE3 and Visualizer/GS3 is already in the area of being the same increase over PS2 as PS2 was over PSX.
 
PS3 with the current specs jumps farther ahead of PS2 than 200x, especially if you bring efficiency into the equation... think how much integer performance was lost due to small caches, no L2, and slow main memory accesses...

plus, next gen what is important is MOVING data around the system... it is said that the processing speed of each unit of data will become less of an issue compared to moving data in and out of the system quickly...

The jump from PSX to PS2... since PSX didn't use FP Units, comparing FLOPS will not be useful... let's compare the max number of polygons theoretically renderable by the GPU...

1.5 MVertices/s ( PSX ) vs 75 MVertices/s

a 50x increase...

Pixel Fill rate... 66+ MPixels/s vs 2.4 GPixels/s

again, roughly a 36x increase...



IMHO, a system which is >300x faster than its predecessor is something amazing... perhaps we do not realize how much more power we are talking about... (edit: especially considering what the new architecture will allow, the increased programmability of the whole 3D pipeline... which could be nice indeed :D )


IMHO, PS3's leap from PS2 is a bit longer than the one from PSX to PS2... but what is more important is the FLEXIBILITY of the T&L and Rendering pipelines... finally a FULLY programmable architecture for COMPLEX Vertex and Pixel shader programs...
 
More power is nice and all, but I wonder how developers will have to cope with this madness? :LOL:
 
They will have to learn how to use the High-Level Shading Language Sony, IBM and Toshiba will provide, plus how to use the tools they will be given...


Sony will try to push developers toward middleware or their high-level libs in PS3's infancy...
 