PlayStation III Architecture

megadrive0088 said:
1/4th of Sony's overall target of 1000x PS2 is what I meant, but also, it might already be closer than that when we figure in things like integer performance, efficiency, and real-world, sustained performance, like you said.

When will this madness end?

Go ask a PS2 developer on this board, or Ben (heh), how much of an improvement in performance the Cell-based raster solution will have over the EE+GS executing a moderately long fragment shader.

Some people will never be happy, I swear. How many times did I, and others, say that bigger nomenclature alone is useless?

What you're saying, that they will have only reached "1/4th of what they wanted," is like saying nVidia has only progressed ~10X because the NV30 is only 10X the sheer speed of the early Riva 128s. That's ludicrous; look at the architecture and not just the numbers.

PS. I hear the graphics solution MS is looking for is quite a beast itself - this is going to be interesting indeed.
 
Vince... when I was counting efficiency I was thinking about long shaders...

I was not thinking about the increase in pixel fill rate ( as I think 4-8 GPixels/s would do just fine )... and 1+ TFLOPS would mean a 200x increase already, which IMHO is more than enough... I was thinking about the added flexibility this console will provide...

With 4 custom PEs ( 4 APUs each, plus a Pixel Engine and image cache ) the Visualizer would have quite a LOT of processing power, and most of all it will be more than flexible enough to execute long and complex shaders which, if adapted to multi-pass techniques on PS2-class HW or even Xbox, would be SIGNIFICANTLY slower... we agree on that...
 
200-300x the FP rating of PS2 might mean 1,000x the power of PS2... sure, its specs are not 1,000x those of PS2, but the final sustained performance while running complex vertex and pixel shaders will be of that magnitude ( or frighteningly close )...
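( For scale: assuming the commonly cited ~6.2 GFLOPS peak for the PS2's Emotion Engine -- an assumption on my part, not a figure from this thread -- 200-300x the FP rating works out to roughly 1.2-1.9 TFLOPS, right where the "1+ TFLOPS" figure above sits: )

```python
# Back-of-envelope check of the "200-300x the FP rating of PS2" claim.
# The ~6.2 GFLOPS Emotion Engine peak is the commonly cited figure and is
# an assumption here, not something stated in the thread.
ps2_ee_gflops = 6.2

for multiplier in (200, 300):
    print(f"{multiplier}x -> {ps2_ee_gflops * multiplier / 1000:.2f} TFLOPS")
# 200x -> 1.24 TFLOPS
# 300x -> 1.86 TFLOPS
```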


What have you heard about MS? Spill the beans!
 
:LOL: It is early 2003 and Sony is simply saying to MS,

"hey Bill! take a look at our core PS3 structure, see, this is roughly how it will work, how much processing power it will have, how we are going to use it over the network, how we will have it in the home entertainment setup.

You have like around 2 years to make yourself a better system! *Niak Niak Niak!* We are so smarto!"

:LOL: :LOL: :LOL:
 
chap said:
:LOL: It is early 2003 and Sony is simply saying to MS,

"hey Bill! take a look at our core PS3 structure, see, this is roughly how it will work, how much processing power it will have, how we are going to use it over the network, how we will have it in the home entertainment setup.

You have like around 2 years to make yourself a better system! *Niak Niak Niak!* We are so smarto!"

:LOL: :LOL: :LOL:

OMG!! Yuo is teh genious!! And Nintendo and Sony don't know that the XBox 2/Next will have a Prescott-based CPU and an NV4x-based GPU with the fastest DDR-II-based RAM in a UMA configuration!! OMG!! We're not just smart, but genious!!!

< add 53 smiley icons here >

Like there are any secrets in this industry when it comes to projected specs.
 
And Nintendo and Sony don't know that the XBox 2/Next will have a Prescott-based CPU and an NV4x-based GPU with the fastest DDR-II-based RAM in a UMA configuration!! OMG!! We're not just smart, but genious!!! < add 53 smiley icons here >

Like there are any secrets in this industry when it comes to projected specs.

How can you be sure of that? :oops: :?: :?:
After what Sony did with the EE+GS, I am less than confident that they will match up with Nvidia again. It might have raw brute power, but will Sony deliver next-generation image quality?
 
Hey, people say competition is good, after all. Maybe M$ will endeavor to do something unimaginable hardware-wise. Wouldn't that be great? ...Or they may just do an Xbox 1.1 and insist that it is worlds better. ...Or they may just slap together whatever PC components are common at the time and call it the Xbox 2.

The fact still remains that Sony has planned some attractive-sounding hardware for the future. We will all be better off for it if it really is true, whether or not M$ decides to respond in kind. Do you get it now? Sony is doing this to explore its own envelope, not just to outdo whatever M$ is doing. M$ is not part of the picture here, so your efforts to keep inserting it into this topic are pretty stupid. Start your own Xbox 2 topic if you are that excited about what M$ is brewing. Seriously. Read the title of this topic, and then ponder what M$ has to do with it.
 
I still can't help but think Sony is doing it because they know they can get insane theoretical numbers out of a Cell-type design, which gets the hype machine rolling, and once it starts rolling it picks up momentum and dominates... see: PS2...
 
Glonk said:
I still can't help but think Sony is doing it because they know they can get insane theoretical numbers out of a Cell-type design, which gets the hype machine rolling, and once it starts rolling it picks up momentum and dominates... see: PS2...

My thoughts exactly. BAM! Outta nowhere came Xbox and HAM! the PS2 hype. :D

Marc, why are you asking me to leave? I thought you and I had a good relationship? :cry:
 
200-300x the FP rating of PS2 might mean 1,000x the power of PS2... sure, its specs are not 1,000x those of PS2, but the final sustained performance while running complex vertex and pixel shaders will be of that magnitude ( or frighteningly close )...

Now this is something that I can understand. If PS3 simply has 200-300 times the raw FP performance of PS2, the PS3 could still be 1000x the PS2's power when it comes to actual performance/actual gaming, for many reasons, including: sustained performance (in part because of memory), efficiency (again, including memory), the ability of developers to use more of PS3, squeezing more out of it than they could with PS2, rendering methods (the complex vertex & pixel shaders you mentioned), and other things like texture & geometry compression, as well as a whole host of things that have not been thought of or are unknown. :)
 
APU 402 includes local memory 406, registers 410, four floating point units 412 and four integer units 414. Again, however, depending upon the processing power required, a greater or lesser number of floating point units 512 and integer units 414 can be employed. In a preferred embodiment, local memory 406 contains 128 kilobytes of storage, and the capacity of registers 410 is 128 × 128 bits. Floating point units 412 preferably operate at a speed of 32 billion floating point operations per second (32 GFLOPS), and integer units 414 preferably operate at a speed of 32 billion operations per second (32 GOPS).
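( The quoted 32 GFLOPS per APU is internally consistent with the 4 GHz clock discussed elsewhere in the thread; a minimal sanity check below, where the 4 GHz clock and the 2-ops-per-unit-per-cycle multiply-add factor are both assumptions, not statements from the patent excerpt: )

```python
# Sanity check of the patent's per-APU figure. The 4 GHz clock and the
# 2 ops/unit/cycle (fused multiply-add) factor are assumptions.
clock_hz = 4e9
fp_units_per_apu = 4
ops_per_unit_per_cycle = 2   # multiply + add counted as 2 FLOPs

print(clock_hz * fp_units_per_apu * ops_per_unit_per_cycle / 1e9)
# 32.0 -> matches the quoted 32 GFLOPS per APU; the same arithmetic with
# the four integer units gives the quoted 32 GOPS.
```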

Oh my! This is something I had not previously understood--that each APU has not only floating point units (four) but four integer units as well.
In my mind, the way I thought of it was that the EE3/Broadband Engine got its integer performance from the 4 PPC CPU cores, while the 32 APUs provided the FP power. Not the case. So it seems that normal APUs, or the ones in the Broadband Engine anyway, are balanced between FP and integer operations. And it's likely that the APUs in the Visualizer will be more geared toward FP--naturally, because it'll be doing calculations for polys/verts, lighting, and FP pixel operations.

So perhaps each APU in the Visualizer will have maybe 6 FP units and 2 integer units, or 8 FP units, or some other combination. BTW, I notice that there are only 4 APUs per pixel pipeline/Pixel Engine, whereas in the Broadband Engine/EE3 there are 8 APUs per Processing Element. Well, it only makes sense, as there is only so much room in the Visualizer, and there is the Image Cache to consider as well.

BTW, an observation (and a completely obvious one): the Broadband Engine doesn't seem to have to pass geometry & lighting data to the Visualizer like the EE had to do for the GS, since (obviously) the Vis. has its own processors for T&L or vertex/lighting operations. This should save a huge amount of bandwidth. Also, textures/geometry data/graphics data could go directly from main system memory to the Visualizer's eDRAM, be processed, then move to the image cache and on out to the screen, without being hamstrung by a bottleneck like EE<=>GS.

IIRC, the GS didn't have access to main memory; it had to go through the EE.
In GameCube, Flipper has its own bus to the 24 MB of main 1T-SRAM, and all I am saying is that the Visualizer seems to also have its own bus to main external memory (not talking about the local eDRAM or Image Cache, which are two separate things), so even if I incorrectly described how data will move through PS3, at least it seems the Visualizer will not have the bottlenecks of the GS.

Feel free to chop my statements to bits, since I almost certainly am not understanding things even close to decently :)
 
Vince said:
chap said:
:LOL: It is early 2003 and Sony is simply saying to MS,

"hey Bill! take a look at our core PS3 structure, see, this is roughly how it will work, how much processing power it will have, how we are going to use it over the network, how we will have it in the home entertainment setup.

You have like around 2 years to make yourself a better system! *Niak Niak Niak!* We are so smarto!"

:LOL: :LOL: :LOL:

OMG!! Yuo is teh genious!! And Nintendo and Sony don't know that the XBox 2/Next will have a Prescott-based CPU and an NV4x-based GPU with the fastest DDR-II-based RAM in a UMA configuration!! OMG!! We're not just smart, but genious!!!

< add 53 smiley icons here >

Like there are any secrets in this industry when it comes to projected specs.


I agree, everyone (Microsoft, Nintendo, Sony, and their technology partners) has a general idea of what the others are shooting for in terms of specs. I think odds favor Nv making the GPU once again, along with the media chip (DSP audio and networking). I don't see how in any way knowing what Sony is building affects Nv. As far as the CPU goes, I'm not so sure what MS will choose. And if Rambus has the best technology with Yellowstone, there is nothing to stop MS from getting a hardware license from Rambus. AFAIK Sony didn't sign some exclusive deal. Finding a partner to buy Rambus memory at a good price might stop this from happening. I'm pretty sure Crucial (Micron) is cold on Rambus technology.

I'm curious to know how the 3dfx culture might influence design proposals for the X-Box 2. They seemed to love the multi-chip approach. Maybe one proposal will be something like one SAGE chip and one rasterizer that uses Gigapixel technology. To me, much of the real battle is in the fabrication of the chips, and this is where Sony seems to have its ducks lined up.
 
Panajev2001a said:
How come we see 4 separate CRTCs? Does it work like the GScube, merging the output of each big pixel pipeline? ( We see only one Pixel Engine per pipeline; I think we could expect 4 customized PEs [with Pixel Engine] in the rasterizer ASIC... if it went to 2 GHz that would give us
8 GPixels/s. Of course the shader performance would be quite fast... over 4 GPixels/s, shader performance becomes the limiting factor compared to screen-filling speed IMO )...

My guess is that the pixel engine really only deals with texture lookups. The APUs then function as shaders.

The biggest problem with CELL is that it is built around a packetized streaming-data paradigm. This is fine for rasterization until you start using different kinds of advanced texture mapping, where you get random access patterns.

I could imagine this is what the CRTC, image cache and pixel engine do. The CRTC would control arbitration to the central storage (which has strict locking), and the texture data would then end up in the image cache. The pixel engine is then the unit that reads textures out of this cache (while applying bilinear, trilinear, anisotropic etc. filtering); these data would eventually be worked upon in the APUs' vector registers (128×128-bit).

Or it could of course be a full-blown rasterizer. But that would make Sony's approach much more conventional and ruin this good discussion. You also wouldn't need 4 APUs to feed a pixel engine then.
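( A toy sketch of the first interpretation -- CRTC-arbitrated reads into the image cache, the Pixel Engine filtering texels out of it, and the filtered result handed to an APU. Every name and the bilinear weighting here are illustrative assumptions, not anything the patent spells out: )

```python
import math

def fetch_texel(image_cache, central_storage, u, v):
    """Return a texel, touching central storage only on an image-cache miss."""
    key = (u, v)
    if key not in image_cache:              # miss -> CRTC-arbitrated DRAM read
        image_cache[key] = central_storage[key]
    return image_cache[key]

def pixel_engine_bilinear(image_cache, central_storage, u, v):
    """Bilinear filtering done by the Pixel Engine before handing off to an APU."""
    u0, v0 = math.floor(u), math.floor(v)
    fu, fv = u - u0, v - v0
    t00 = fetch_texel(image_cache, central_storage, u0,     v0)
    t10 = fetch_texel(image_cache, central_storage, u0 + 1, v0)
    t01 = fetch_texel(image_cache, central_storage, u0,     v0 + 1)
    t11 = fetch_texel(image_cache, central_storage, u0 + 1, v0 + 1)
    top = t00 * (1 - fu) + t10 * fu
    bottom = t01 * (1 - fu) + t11 * fu
    return top * (1 - fv) + bottom * fv     # value an APU shader then works on

# Tiny usage example with a fake 2x2 "central storage".
storage = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 0.0}
print(pixel_engine_bilinear({}, storage, 0.5, 0.5))   # 0.5
```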

Cheers
Gubbi
 
Random thoughts:

Doing some simple math reveals that in this version of the Broadband Engine alone, there are 256 units within the 32 APUs (128 integer, 128 FP) - and it seems a further 128 units (of unknown combination) in the 16 APUs of the Visualizer, assuming there are in fact half as many APUs compared to the BE.
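( Checking that tally against the patent's per-APU figures, with the 16-APU Visualizer treated as the assumption it is: )

```python
# Unit-count check, using the patent's 4 FP + 4 integer units per APU.
# The 16-APU Visualizer is the poster's assumption, not a patent figure.
fp_per_apu, int_per_apu = 4, 4

be_apus = 32
print(be_apus * fp_per_apu, be_apus * int_per_apu,
      be_apus * (fp_per_apu + int_per_apu))   # 128 128 256

vis_apus = 16
print(vis_apus * (fp_per_apu + int_per_apu))  # 128 units, mix unknown
```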

Panajev – IIRC, in another forum a while back (months ago), I think you mentioned the term Thread Unit (TU/TUs) -- that there would be 8 TUs with every CPU core and maybe 16-32 cores PER Cell processor in one possible version of Cell. OK, can I assume the thread units ARE the APUs? Or are the TUs the FP & integer units within the APUs?

I remember the number 8192; it sticks out in my mind. I think I remember you saying that there might be as many as 8192 thread units in a single device (maybe even a console), back when you speculated on there being many, many Cell processors in a single system -- perhaps several Cell processors making up a Cell chip, and a number of Cell chips in a system. ...You guys remember the talk of DIMMs of Cells, like memory?...

IIRC, you said a system could have maybe 32 Cell processors, which would contain up to a thousand simple ARM or PPC CPU cores (1024), or at least 512 if each Cell had only 16 cores, and thousands of thread units (4096-8192). What was that for again -- a server, workstation, consumer PC, or even a possible iteration of PS3?
Now I realize that sounds absolutely LUDICROUS to many here, but I’m positive you mentioned this in another forum, perhaps here as well (haven’t searched through all the threads at Beyond3D) - ( heh, I can see Panajev now, looking around going, who, me??? )

Of course, that was when Cell chips/processors were going to clock at between 500 MHz and 1 GHz. Now we are talking 4 GHz, so the number of processors, cores, and thread units is cut back significantly, but still, that was a totally HUGE number of processors and sub-processors.

That said, with the actual patent and diagram, assuming it is for PS3, I am still amazed at the number of processors that are actually present, despite the fantasy numbers in the speculation above. Kinda reminds me of the 3DLabs P10! :)
 
My guess is that the pixel engine really only deals with texture lookups. The APUs then function as shaders.

That the APUs are working as Pixel Shaders ( commanded by the 4 RISC PPC-like PUs ) is something I think we can agree on :) And what nice Pixel Shaders they are, I must say :D

The biggest problem with CELL is that it is built around a packetized streaming-data paradigm. This is fine for rasterization until you start using different kinds of advanced texture mapping, where you get random access patterns.

True, CELL was built around moving huge streams of data in and out of the processor very quickly, but that doesn't mean no emphasis is put on caching or local buffering: pressure is taken off main memory thanks to the e-DRAM, and pressure is also taken off the shared e-DRAM thanks to the local Image Cache, the SRAM-based cache ( ~L1... ) and the HUGE number of registers... each APU has one hundred twenty-eight 128-bit registers... this Visualizer has 16 of them ( APUs )...

16 APUs * 128 registers * 16 bytes = 32 KB of storage in REGISTERS alone... quite frightening in a way ;)
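( And extrapolating the same arithmetic to the 32-APU Broadband Engine -- my own extension, not from the patent excerpt -- doubles it again: )

```python
# Register-file storage: 128 registers x 16 bytes (128 bits) per APU.
reg_bytes_per_apu = 128 * 16               # 2 KB per APU

print(16 * reg_bytes_per_apu // 1024)      # 32 KB across the 16-APU Visualizer
print(32 * reg_bytes_per_apu // 1024)      # 64 KB across the 32-APU Broadband Engine
```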

The SRAM cache is present for each APU, IIRC... I know that with very random access patterns the cache gets thrashed quite a lot, but we can try to prefetch quite a bit of data in advance and keep it in local buffers... I think we will see more procedural texturing to reduce the impact of texture fetches from external memory or e-DRAM... as we only load texture programs.

The Visualizer would probably have around 32 MB of e-DRAM, and we could program a texture cache there ( the Image Cache could be for textures, but I cannot commit to that yet; it seems it is used for something more, but I am not sure what... )

I could imagine this is what the CRTC, image cache and pixel engine do. The CRTC would control arbitration to the central storage (which has strict locking), and the texture data would then end up in the image cache. The pixel engine is then the unit that reads textures out of this cache (while applying bilinear, trilinear, anisotropic etc. filtering); these data would eventually be worked upon in the APUs' vector registers (128×128-bit).

This again makes sense; the Pixel Engine would not be a Pixel Shader... it does not need to be, as we already have 4 very powerful APUs directed by the PU ( in each pixel pipeline ).

Or it could of course be a full-blown rasterizer. But that would make Sony's approach much more conventional and ruin this good discussion. You also wouldn't need 4 APUs to feed a pixel engine then.

Agreed :) I hope developers like Fafalada will like this Visualizer... they wanted 100% full programmability and it seems they got it...

Will Carmack finally be happy now? He seemed to like the P10 ;)

I wonder what kind of occlusion culling this Visualizer has... With that kind of flexibility and power we could do deferred T&L using HOS ( sorting the control points and tessellating, transforming and lighting only the visible patches... ), which would solve the problem up front... that would waste a bit of bandwidth and processing power, but it's not like the Broadband Engine lacks either of them :) And this would save the transistors dedicated to occlusion culling in the Visualizer...
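( A rough sketch of that deferred-HOS idea: cull at patch granularity first, then tessellate, transform and light only the survivors. The Patch layout, the distance-based visibility test and the stand-in tessellator below are all illustrative assumptions, not anything from the patent: )

```python
from dataclasses import dataclass

@dataclass
class Patch:
    center: tuple       # bounding-sphere center of the patch's control points
    radius: float

def is_visible(patch, camera_pos, max_dist=100.0):
    # Stand-in visibility test: a simple distance cull against the bounding sphere.
    dx, dy, dz = (p - c for p, c in zip(patch.center, camera_pos))
    return (dx * dx + dy * dy + dz * dz) ** 0.5 - patch.radius < max_dist

def tessellate(patch, level=4):
    # Stand-in tessellator: just emit level*level placeholder vertices.
    return [(patch.center, i) for i in range(level * level)]

def render(patches, camera_pos):
    visible = [p for p in patches if is_visible(p, camera_pos)]
    vertices = [v for p in visible for v in tessellate(p)]   # T&L only survivors
    return len(visible), len(vertices)

patches = [Patch((0, 0, 10), 1.0), Patch((0, 0, 500), 1.0)]
print(render(patches, (0, 0, 0)))   # (1, 16): the far patch is never tessellated
```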
 
*patiently waits for Panajev to get to his set of paragraphs for answers*
:D

*throws out another question* :)

What, again, are the constants of this architecture? I know you said the number of functional units (FP and integer) can change as needed, but is, say, the number of APUs per PE always going to be 8? Etc...
(not counting the Visualizer, which may or may not be a special case... or... not completely a Cell/PE)
 
Random thoughts:

Doing some simple math reveals that in this version of the Broadband Engine alone, there are 256 units within the 32 APUs (128 integer, 128 FP) - and a further 128 units in the 16 APUs of the Visualizer, assuming there are in fact half as many APUs compared to the BE.

Panajev – IIRC, in another forum a while back (months ago), I think you mentioned the term Thread Unit (TU/TUs) -- that there would be 8 TUs with every CPU core and maybe 16-32 cores PER Cell processor in one possible version of Cell. OK, can I assume the thread units ARE the APUs? Or are the TUs the FP & integer units within the APUs?
No, a thread unit is something that would be much simpler than an APU. If we evolved the CELL architecture as it was in Blue Gene, we could think of the Integer Unit as the TU... but in Blue Gene's CELL they didn't have a "host" PU to direct the Thread Units and the FPUs...

Also, they seem to have changed quite a few other things ( Sony, IBM and Toshiba didn't invest more than $400 million in R&D alone to simply take Blue Gene's architecture... )

I remember the number 8192; it sticks out in my mind. I think I remember you saying that there might be as many as 8192 thread units in a single device (maybe even a console), back when you speculated on there being many, many Cell processors in a single system -- perhaps several Cell processors making up a Cell chip, and a number of Cell chips in a system. ...You guys remember the talk of DIMMs of Cells, like memory?...
Yes, and it seems I was wrong :(

IIRC, you said a system could have maybe 32 Cell processors, which would contain up to a thousand simple ARM or PPC CPU cores (1024), or at least 512 if each Cell had only 16 cores, and thousands of thread units (4096-8192). What was that for again -- a server, workstation, consumer PC, or even a possible iteration of PS3?
Now I realize that sounds absolutely LUDICROUS to many here, but I’m positive you mentioned this in another forum, perhaps here as well (haven’t searched through all the threads at Beyond3D) - ( heh, I can see Panajev now, looking around going, who, me??? )

Twisting the knife won't help the situation... :LOL:

Yes, I did mention that... my big, bad mistake was thinking about taking Blue Gene's tech and trying to achieve 1 TFLOPS with it... And with 1 FPU per CELL and 8 TUs per CELL, it was a bit unfeasible without increasing the number of chips used...

Which gets greatly reduced... since my estimates, again, were assuming 500 MHz or at most 1 GHz... and only 1 FP unit per CELL...

1 GHz * 1 FPU / CELL * 2 FP ops/FPU * 32 CELLs/chip = 64 GFLOPS...

We would then need 16 chips...

32 CELLs in a chip, that would mean 256 TUs and 32 FPUs in a chip...

16 chips would mean: 512 FPUs and 4,096 TUs...

Well, let's think about 4 GHz now...

we would have needed 4 chips... this means 128 FPUs and 1,024 TUs

Then we count that we now have 4 TUs ( the integer units in the APU ) and 4 FPUs per CELL... and the numbers come nearer to the ones we have now...
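( Re-running that arithmetic; the 2 FLOPs/FPU/cycle multiply-add factor is the same one used in the 64 GFLOPS line above: )

```python
import math

def chips_needed(clock_ghz, target_tflops=1.0, fpus_per_cell=1, cells_per_chip=32):
    # GFLOPS per chip = clock * FPUs/CELL * 2 FLOPs/cycle * CELLs/chip
    gflops_per_chip = clock_ghz * fpus_per_cell * 2 * cells_per_chip
    return math.ceil(target_tflops * 1000 / gflops_per_chip)

for clock_ghz in (1.0, 4.0):
    chips = chips_needed(clock_ghz)
    print(f"{clock_ghz} GHz -> {chips} chips, "
          f"{chips * 32 * 1} FPUs, {chips * 32 * 8} TUs")
# 1.0 GHz -> 16 chips, 512 FPUs, 4096 TUs
# 4.0 GHz -> 4 chips, 128 FPUs, 1024 TUs
```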

I was uneasy imagining them changing the structure of the CELL this way, but if you look at the patent, they took good care of compatibility across all CELLs, letting you vary the number of execution units while keeping the ISA constant...

Of course, that was when Cell chips/processors were going to clock at between 500 MHz and 1 GHz. Now we are talking 4 GHz, so the number of processors, cores, and thread units is cut back significantly, but still, that was a totally HUGE number of processors and sub-processors.

Well, the clock frequency is now 4-8x the clock speed I was thinking about, and we have 4x the FP units in each APU...



That said, with the actual patent and diagram, assuming it is for PS3, I am still amazed at the number of processors that are actually present, despite the fantasy numbers in the speculation above. Kinda reminds me of the 3DLabs P10! :)
it does remind me of the P10 a bit too ;)
 