Real PS3 Architecture

http://www.rambus.com/rdf/rdf2002/pdf/rdf_consumer_track.pdf

Well,

page 23 of this PDF perhaps made me understand what you mean...


I see each connection having two data-busses ( I presume one is for writes and one is for reads ) and let me 0WN myself by noting that yes, there is no embedded clock going along with the data on the bus ( I was wrong on that )... we have the system clock arriving at the DRAM chips ( and the PLL that receives it and multiplies it ) and at the Memory Controller/ASIC ( and the PLL that multiplies the "clock" on that side too )...

Quite clever... IMHO
 
I've never seen Rambus justify the dual data-bus aspect of the design, but yes, I expect it is to cut down time lost to bus turnaround by (sometimes) using parallel reads and writes (both are bidirectional, so in principle it should be able to use both for reads or writes when the traffic is mostly unidirectional).

Yes, the clock is multiplied by 8 ... and used as is for the signalling on the address bus (which is not multiplexed with the data). When I said that Rambus is talking about rate I meant the range they expect Yellowstone performance to be in (up to 6.4 GHz), nothing more. Clock distribution does not interest me that much; whether a reference clock is doubled/quadrupled/whatever is a rather uninteresting part of the design to me. With DDR SDRAM the double clocking has a larger significance because it was a way of dealing with SDRAM legacy... but with octal data rates it is just getting silly, and making a big deal out of it is just PR.
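
To put numbers on that clocking ( a quick sketch in Python; the 400 MHz reference clock is my assumption, working back from the 3.2 GHz figure Rambus quotes ):

Code:
# Yellowstone signaling math ( assumption: 400 MHz reference clock ).
ref_clock_mhz = 400   # reference clock distributed to the DRAM and controller PLLs
odr_factor = 8        # octal data rate: 8 bits per wire per reference clock cycle
print(ref_clock_mhz * odr_factor)   # 3200 -> the "3.2 GHz" data rate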
 
I wanted to post this first ( regarding a possible e-DRAM set-up for the Broadband Engine and the discussion [polite :)] I had with vers and the relations between DMACs, Memory Block Controllers and Memory blocks )...


[edit: I resized the pictures and compressed them again to reduce the size...]

e-DRAM.PNG







Maf,

regarding Yellowstone I found this picture in the last PDF I posted the link for...

Yellowstone.PNG


I understand now what you mean... each DRAM has two busses connecting it to the controller... each is an 8-bit bus ( it seems )...

That picture seems to suggest that each 8-bit bus is bi-directional ( so we could use either to write to or read from the DRAM )...

So you need two DRAM macros per "channel" to have 32 bits per channel...

Using 2 channels we have our 64-bit configuration... Using two more DRAM macros per channel or adding another two channels ( 2x4 or 4x2 configurations ) should give us the 128-bit configuration...

IMHO the 64-bit configuration, which would provide a maximum of 25-50 GB/s of total bandwidth, would be feasible for a 2005 PS3, and I do not see the need for 100 GB/s main memory as we have e-DRAM for a reason: to spend fewer resources increasing off-chip memory bandwidth ( ok, one of the several reasons we have e-DRAM :) )...

25 GB/s would be with 3.2 GHz signaling and a 64-bit controller ( still acceptable... much, much faster than today's fastest RDRAM... )

50 GB/s would require either a 128-bit configuration and 3.2 GHz signaling or a 64-bit configuration like the one presented in that picture and 6.4 GHz signaling...
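
( A quick Python sketch of the arithmetic behind those figures; the 16 bits per DRAM macro is my reading of the picture: )

Code:
# Bandwidth of the configurations above ( assumption: two 8-bit buses per macro ).
def bandwidth_gb_s(signaling_ghz, bus_bits):
    return signaling_ghz * bus_bits / 8.0   # GB/s

print(bandwidth_gb_s(3.2, 64))    # 25.6 -> the "25 GB/s" 64-bit case
print(bandwidth_gb_s(3.2, 128))   # 51.2 -> 128 bits at 3.2 GHz
print(bandwidth_gb_s(6.4, 64))    # 51.2 -> 64 bits at 6.4 GHz signaling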





And this picture is related to what I was saying about the quite nice ( IMHO ) clocking of Yellowstone devices ( no embedded clock in the transmission )...


Yellowstone1.PNG
 
I think the External Memory would still be RAM and not the optical disc...

That's possible. But note that the external memory is linked to the I/O chip, not the BE directly.

They didn't license Yellowstone just to increase their IP portfolio

I didn't know Sony had licensed it already, but yeah, that would be true; the only product they need this kind of tech for is PS3.

Keeping everything in the e-DRAM could be feasible with sub-45 nm tech, as I do not see 256-512 MB of e-DRAM EVER fitting with 65 nm technology... 64 MB is a bit of a stretch, and using 4 BEs would make the PCB too complex...

Using 65nm tech they can put 256MB worth of eDRAM without much problem. Even 512MB is possible.

Still they might use Yellowstone for the e-DRAM and Redstone for chip-to-chip interconnect...

That's possible as well.

I found this in the patent which actually supports your theory...

Which bit supports which bit ?

The only reason I don't like external memory connected directly to BE, is that it wouldn't be too useful. Unless they change their whole memory controller to be able to create "sandbox" in that external memory as well.
 
The only reason I don't like external memory connected directly to BE, is that it wouldn't be too useful. Unless they change their whole memory controller to be able to create "sandbox" in that external memory as well.

The External Memory pool would not be connected to the BE, but to the I/O ASIC... of course that memory will still store data for both BE and Visualizer ( and for the I/O too )... I suspect the EE ( in the I/O ASIC ) will be used for more than I/O... a 6.2 GFLOPS, 600 MIPS CPU... naah, what am I thinking ? ;)

How does doubling up as a Sound DSP ( also doing some DD 5.1/6.1 decoding/encoding ) sound ?

How about dealing with all the transfers from I/O devices ( we still have that 10-channel DMAC ;) ) and implementing software sandboxes ?

This needs to be given some thought...

We want the Broadband Engine not to have to worry at all about the External Memory, we want the I/O ASIC to worry about it...

Sort of like Virtual Memory and main RAM, which basically uses software sandboxes ( protected memory for processes )... in our case External Memory would play the role the HDD plays in the Virtual Memory analogy ( this could actually prove the point I believe you made about the HDD: the External Memory could be the HDD [which should not prevent us from having Blu-Ray... the HDD will eventually fill up and you cannot store new data without erasing something else... you can buy as many Blu-Ray discs as you want, though] )...

No, the I/O ASIC would not remain memory-less, as for PS2 backward compatibility reasons we would probably end up adding the 32 MB of Direct RDRAM...

Backward Compatibility is an issue we have to wonder about when thinking about PS3...

Is it feasible to have both PS2 Direct RDRAM and Yellowstone RDRAM ?
Is it feasible to only have Yellowstone and have a secondary setting for the base clock ( the one that basically gets multiplied by 8, ODR ) reaching 100 MHz in PS2 backward compatibility mode ( PSX's CPU did have two speed settings ), using only one channel ( of the 64-bit configuration ) and achieving 800 MHz signaling ( ODR ) * 4 bytes = 3.2 GB/s ?

That would mean we are matching the EE's RDRAM bandwidth and meeting practically the same latency... RDRAM used 800 MHz signaling on PS2 and two 16-bit channels in parallel...
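
( Sanity-checking that equivalence: )

Code:
# PS2: two 16-bit Direct RDRAM channels at 800 MHz effective signaling.
ps2_gb_s = 0.8 * (2 * 16) / 8.0    # = 3.2 GB/s
# Compatibility mode: one 32-bit Yellowstone channel at 800 MHz ( ODR ).
ps3_compat_gb_s = 0.8 * 32 / 8.0   # = 3.2 GB/s
assert ps2_gb_s == ps3_compat_gb_s == 3.2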

Regarding the HDD and not using external RAM ( external memory ): I think we can say that the HDD would not be THAT much faster than the Blu-Ray disc, and that would create a HUGE performance drop whenever we have to work outside the e-DRAM... unless we have enough e-DRAM on chip...

I do not see a 3.6-4 GHz CPU with more than 64 MB of e-DRAM though... 256 MB is really pushing it too far...

Code:
                                                    1999   2002   2005   2008   2011
-------------------------------------------------------------------------------------
Technology                     nm                    180    130    100     70     50
Gate length                    nm                    140  85-90     65     45  30-32

Density      DRAM              Gbit/cm^2            0.27   0.71   1.63   4.03   9.94
             SRAM              Mtransistors/cm^2      35     95    234    577   1423
             High-perf. logic  Mtransistors/cm^2      24     65    142    350    863
             ASIC logic        Mtransistors/cm^2      20     54    133    328    811
             High-volume logic Mtransistors/cm^2       7     18     41    100    247

Local clock  High-performance  GHz                  1.25    2.1    3.5    6.0   10.0
frequency    ASIC              GHz                   0.5    0.7    0.9    1.2    1.5
             High-volume       GHz                   0.6    0.8    1.1    1.4    1.8

This comes from an early-mid 2002 paper published by IBM... as you can see from recent events, it seems that we will see 65 nm by 2005, as 90 nm is a reality NOW ( for Intel it is: Prescott is shipping relatively soon )...

IBM's preview for 70 nm is 4 Gbit of e-DRAM per cm^2...

4 Gbit / 8 = 0.5 GB = 512 MB... still, that is being a bit too generous, as they also preview 6 GHz for such a large-scale CPU ( ok, the e-DRAM would not be clocked that high, still... ) and it would take 1 square centimeter, which is not a small amount...
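
( The same arithmetic in Python, using the density row of the table above: )

Code:
# e-DRAM area from the IBM roadmap ( process node in nm -> Gbit/cm^2 ).
density = {100: 1.63, 70: 4.03}

def edram_area_cm2(mbytes, node_nm):
    return (mbytes * 8 / 1024.0) / density[node_nm]

print(edram_area_cm2(512, 70))   # ~0.99 cm^2 -> the "512 MB in 1 cm^2" figure
print(edram_area_cm2(64, 100))   # ~0.31 cm^2 for 64 MB at the 2005 node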




Going back to reality, 256 MB of e-DRAM on the BE is IMHO too much: it would increase the size of the processor a lot and make it even more expensive to mass produce...

Be happy with 64 MB, the Visualizer could be packing 32-64 MB of e-DRAM too as it would need its own e-DRAM to sustain its fill-rate, store the frame-buffers and textures.

I think we can afford to see 64 MB of e-DRAM in the BE and 32-64 MB of e-DRAM in the Visualizer ( prolly 32 MB... the Visualizer should be able to hold its own with good 3D and 2D texture compression )... 96 MB of e-DRAM with BE and Visualizer combined is not bad at all ( 64 + 64 = 128 MB of e-DRAM would be even better, but it might be too expensive )...



Btw, here is the Rambus press release that announces the licenses given to Sony and Toshiba for Redwood and Yellowstone...

Rambus Signs Technology License Agreements With Sony, Sony Computer Entertainment and Toshiba

Two new interfaces selected for logic-to-memory and logic-to-logic connectivity

Rambus Inc. (Nasdaq:RMBS), the leader in ultra high-speed interface technology, today announced new agreements with Sony Corporation (Sony), Sony Computer Entertainment Inc. (SCEI) and Toshiba Corporation (Toshiba) for the license and utilization of two new high-speed interfaces, codenamed "Yellowstone" and "Redwood." Offering unparalleled competitive advantages, these two interfaces are expected to be utilized for future broadband applications with "Cell."

The impact on Rambus' financials will be discussed during a conference call and webcast on January 6, 2003 at 1:30 p.m. Pacific Standard Time. The specific terms of the agreements are confidential.

Currently at 3.2GHz data rates, with a roadmap to higher performance, "Yellowstone" is much faster than the best available DDR memories. "Yellowstone" offers high performance in memory signaling while optimizing system cost through pin-count reduction and support for high volume PCBs and packages.

"Redwood," the ultra high-speed parallel interface between multiple chips, delivers a data rate about ten times faster than the latest processor busses. It maintains lower latency and lower power consumption than current solutions, while keeping high productivity and cost efficiency.

"The use of Direct Rambus technology in PlayStation®2 was essential for its performance," said Ken Kutaragi, president and chief executive officer of Sony Computer Entertainment Inc. "Rambus is and will be the key player in the ultra high-speed interface technology. This enables us to create a wide range of applications and platforms from high-end systems to digital consumer electronics products within Sony Group."

"We recognize Rambus as the premier provider of high speed interface technology. We have already decided to integrate Rambus' interface technology into our next-generation high-value added DRAM, and we have now extended our partnership to the logic interface. These technologies will support us in delivering effective solutions to next-generation systems that require high-speed processing of large graphics and audio data," said Takeshi Nakagawa, corporate senior vice president of Toshiba Corporation and president and chief executive officer of Toshiba Corporation Semiconductor Company.

"We have had long and mutually beneficial relationships with Sony, Sony Computer Entertainment and Toshiba," said Geoff Tate, chief executive officer at Rambus. "Rambus' objective is to produce innovative solutions that will benefit our semiconductor and system partners. We are pleased that our ultra high-speed logic-to-memory and logic-to-logic solutions are key technologies to produce a wide range of future systems."
 
I found this in the patent which actually supports your theory...

Quote:

[0064] PE 201 is closely associated with a dynamic random access memory (DRAM) 225 through a high bandwidth memory connection 227. DRAM 225 functions as the main memory for PE 201.


Still it says "for PE" and not for the system... I would expect some meory to be found in the I/O CPU at least to buffer from the optical disc ( it could be the 32 MB of Direct RAMBUS DRAM inherited from PS2 backward compatibility in the I/O ASIC )...

I was talking about External Memory maybe being an HDD... but as that bit supports your theory, other bits do not push me towards it...

... In the previous post ( my last one before this ) I ventured a bit into the land of guess the RAM configuration of PS3...
 
Backward Compatibility is an issue we have to wonder about when thinking about PS3...

Hmm, when you are going for outright performance, backward compatibility shouldn't be a priority.

How about dealing with all the transfers from I/O devices ( we still have that 10-channel DMAC ) and implementing software sandboxes ?

How would it work ? But that's an interesting idea.

IBM's preview for 70 nm is 4 Gbit of e-DRAM per cm^2...

4 Gbit / 8 = 0.5 GB = 512 MB... still, that is being a bit too generous, as they also preview 6 GHz for such a large-scale CPU ( ok, the e-DRAM would not be clocked that high, still... ) and it would take 1 square centimeter, which is not a small amount...

But it's possible. If the BE is the size of the EE or GS on the 0.25 um process, I think they can afford to spend 0.5 cm^2 on eDRAM.

It's better to put the money there instead of into external memory. In the long run this will come down in cost. External memory is less predictable.

This is of course if they are indeed going with a 65 nm process; it would not be possible on a 90 nm process. Or Sony is planning to include a massive amount of external memory ( >1 GB ).

But like you said, it's also possible that Yellowstone is the eDRAM. After all, it can be clocked quite high.

( prolly 32 MB... the Visualizer should be able to hold its own with good 3D and 2D texture compression )

Do you think it will have 3D and 2D TC hardware support ? I think Sony might skip this again.

Thanks for the Rambus press release. Is Yellowstone tech capable of being embedded as well ?

I was talking about External Memory maybe being an HDD... but as that bit supports your theory, other bits do not push me towards it...

... In the previous post ( my last one before this ) I ventured a bit into the land of guess the RAM configuration of PS3...

Ohh, external memory could be anything from RAM to ROM. As for Vers' diagram, where he put the external RAM directly on the BE: the patent doesn't suggest anything like that, but it's a possibility.

But unless you can put sandboxes into that external RAM, or the BE can at least address it, it doesn't really matter whether that memory is an HDD or just Blu-Ray.

At least with another BE, the extra memory can be used in a more efficient manner than this external Yellowstone.
 
Part of the success of the platform depends on successful backward compatibility: PS2 had excellent backward compatibility and many consumers ( I did ) appreciated it and will expect Sony to deliver again with PS3... as Sony has mentioned, btw ( backward compatibility is one of their concerns )...

Let's take the AGP example... if we have an AGP 4x motherboard, we can change the AGP mode from the BIOS to fall back to AGP 1x or AGP 2x...

I was thinking about the PS3 BIOS being able to change the Yellowstone base clock speed to 100 MHz: this would give us effective 800 MHz signaling, which is what we have on the current PS2 ( 400 MHz x 2 [DDR] )...

300 MHz for the EE embedded in the I/O ASIC and 800 MHz Yellowstone when operating in PS2 backward compatibility mode; otherwise the clock sent would be 400 MHz to achieve 3.2 GHz signaling ( the GeForce FX is able to downclock itself when it doesn't detect the fan at full operation; we are not doing that [maybe they are thinking about that too... if the fan doesn't work and the temperature rises, the PS3 could enter a low-power state telling you the system is over-heating, so you could get it repaired instead of buying a new one] )...


If we had a 3.2 GHz signaling rate for Yellowstone, the memory would be much faster than what the EE expects when running PS2 games... we want to keep the effective latency close to what the EE in the PS2 used to expect...

And with two states: PS2 or PS3 mode ( PSX could be emulated all in software... yeah BE and Visualizer are powerful enough to do so :) Sony has bought Connectix, they might as well use their technology )

The graphics enhancement would be on the GPU side... FSAA, anisotropic filtering, etc...

I think this would not cause as many problems as adding texture filtering to early PSX games does ( try it on Wipeout: alpha-blended textures get a bit messed up )... Sony did release some libraries for PSX that were intended to make games work correctly even with texture smoothing and fast CD loading enabled: they could be doing the same for PS2 titles...

The rasterization side can be done on the Visualizer ( like PSX rasterization was done on the GS )...


How about dealing with all the transfers from I/O devices ( we still have that 10 channels DMAC ) and implementing software sandboxes ?


How would it work ? But that's an interesting idea.

It would work like protected memory: the EE would run a secure program ( authored by the Cell OS [it would be a Kernel-space command only, user programs would not have access to this... a similar procedure to the command given by a Cell OS program that can change the mask ID of an APU, allowing it to access multiple sandboxes] ) that would regulate the access to the data present in the External Memory...

We could divide the external RAM into as many sandboxes as we have set up on the e-DRAM memory bank controllers for the BE and Visualizer, and regulate the access to these bigger sandboxes in a very similar way to what we do in HW in the memory bank controllers...

Let me take the patent for a moment...

[0113] The PU of a PE controls the sandboxes assigned to the APUs. Since the PU normally operates only trusted programs, such as an operating system, this scheme does not jeopardize security. In accordance with this scheme, the PU builds and maintains a key control table. This key control table is illustrated in FIG. 19. As shown in this figure, each entry in key control table 1902 contains an identification (ID) 1904 for an APU, an APU key 1906 for that APU and a key mask 1908.
[...]

When an APU requests the writing of data to, or the reading of data from, a particular storage location of the DRAM, the DMAC evaluates the APU key 1906 assigned to that APU in key control table 1902 against a memory access key associated with that storage location.

[0114] As shown in FIG. 20, a dedicated memory segment 2010 is assigned to each addressable storage location 2006 of a DRAM 2002. A memory access key 2012 for the storage location is stored in this dedicated memory segment. As discussed above, a further additional dedicated memory segment 2008, also associated with each addressable storage location 2006, stores synchronization information for writing data to, and reading data from, the storage-location.

[0115] In operation, an APU issues a DMA command to the DMAC. This command includes the address of a storage location 2006 of DRAM 2002. Before executing this command, the DMAC looks up the requesting APU's key 1906 in key control table 1902 using the APU's ID 1904. The DMAC then compares the APU key 1906 of the requesting APU to the memory access key 2012 stored in the dedicated memory segment 2010 associated with the storage location of the DRAM to which the APU seeks access. If the two keys do not match, the DMA command is not executed. On the other hand, if the two keys match, the DMA command proceeds and the requested memory access is executed.

This is how we make sure that an APU will not read into other memory sandboxes ( unless we play with the mask, but only a trusted program like the OS can initiate such a change in the mask )...

With External Memory we can store, in a certain small area of memory, the information regarding the rest of the memory, which gets divided into sandboxes ( which all have a fixed size that is larger than each sandbox in the e-DRAM ): we would keep in this section of memory ( which only the OS could access ) the key control tables, the keys and other information ( start and end of each sandbox in the External Memory )... we would mimic with the EE what the memory bank controllers do in HW on the BE and Visualizer... slower, but we are dealing with the 4th memory hierarchy level ( 1st == registers, 2nd == Local Storage, 3rd == e-DRAM )... and the speed will be MUCH faster than an HDD or Blu-Ray...
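
( To make the scheme concrete, here is a minimal Python sketch of the key check from paragraphs [0113]-[0115]; the table contents and my reading of how the key mask works are assumptions, not something the patent spells out: )

Code:
# Key control table maintained by the PU: APU ID -> ( APU key, key mask ).
key_control_table = {
    0: (0b1010, 0b0000),
    1: (0b1010, 0b0110),   # a non-zero mask lets APU 1 match several sandboxes
}
# Memory access key stored in the dedicated segment of each storage location.
memory_access_key = {0x1000: 0b1010, 0x2000: 0b1100}

def dmac_allows(apu_id, address):
    apu_key, mask = key_control_table[apu_id]
    # Assumption: bits set in the mask are ignored in the comparison.
    return (apu_key | mask) == (memory_access_key[address] | mask)

print(dmac_allows(0, 0x1000))   # True:  keys match, the DMA command proceeds
print(dmac_allows(0, 0x2000))   # False: keys differ, the DMA command is refused
print(dmac_allows(1, 0x2000))   # True:  the mask widens APU 1's access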

But it's possible. If the BE is the size of the EE or GS on the 0.25 um process, I think they can afford to spend 0.5 cm^2 on eDRAM.

It's better to put the money there instead of into external memory. In the long run this will come down in cost. External memory is less predictable.

BE and Visualizer contain a lot of transistors used for logic ( which we do not have exact numbers for, but it is not a small number [also add the space used by the Local Storage, which is SRAM, and all the APUs' registers] ), not only DRAM... and I do not want the logic portion of the chip to be crippled because of a bit more e-DRAM...

Still, we do not know the size of the BE... it might even be a bit bigger than either the GS or EE at 0.25 um, and their wafers might make it impossible to include the logic and 256 MB of e-DRAM, as you would have too few chips/wafer ( you have to think about that... you do not want shortages, you want smooth mass production )...

Who knows, maybe even with the new wafers they will use in 2005, having a chip the same size as or bigger than the GS at 0.25 um might not be optimal...


We have to think of ways to keep performance high, but also yields high... tough, but that is at least what should be aimed for...

Remember, at a later stage we could even embed the Yellowstone RDRAM into the I/O ASIC, as they did for the PSX GPU ( it basically went from off-chip VRAM to embedded VRAM ), as manufacturing processes allow it... maybe when they move to 45 nm... the day they reach 30 nm ( not near ) they could maybe try to put BE and Visualizer onto a single chip... like they are doing with 130 nm EEs and GSs

( prolly 32 MB... the Visualizer should be able to hold its own with good 3D and 2D texture compression )


Do you think it will have 3D and 2D TC hardware support ? I think Sony might skip this again.

I think the Pixel Engine in the Visualizer PEs should be supporting TC ( both 2D and 3D/Volumetric )...

If they do not, well at 1-2 GHz I see 4 APUs per PE and 4 PEs...

4 APUs/PE * 4 PEs * 8 FP ops/APU ( 4 parallel FP MADD ) * 1-2 GHz = 128-256 GFLOPS

Plus I expect Dependent Texture reads to be supported as well as loopback... The Visualizer should be quite programmable ( the APU could also manage DOT3 products )...

Even if the Pixel Processor was relatively simple ( supporting though advanced texture filtering [well, tri-linear and anisotropic filtering], dependent texture reads and single-pass multi-texturing [we might move away from polygons and textures towards procedural textures, but we cannot FORCE it from the start, it has to be given as an option you can follow thanks to the power and flexibility of the system] ), we still have all the APUs and PUs to work with Fragment Shaders and Triangle Set-up ( so we can make it flexible and stop calling it triangle set-up but something-else-set-up ;) )...

The BE should worry about Physics, T&L and Vertex programs ( and could help the Visualizer with pixel programs too if there is the need, after all we use the same kind of APUs to do the calculations... ) while the Visualizer worries about Triangle set-up, Fragment programs, texturing ( that should be handled by the Pixel Engine and the Image Cache ) and particle effects...

The 3D pipeline on such an architecture has a kind of flexibility that is mindblowing ( this will be a coin with two faces: you HAVE to give developers HLSL and nice High Level libraries [at least at the beginning, force them to learn the PS3 HW using them, like they did with PSX, leaving them free to explore as time progresses], else they will really go nuts ): BE and Visualizer can help each other with computing loads, as the Cell architecture was designed around software Cells being able to migrate from PE to PE to be executed... We could execute Fragment and Vertex Programs on either BE or Visualizer, and we could have a PE in the Visualizer run some physics calculations if we wish to ( maybe to vary at the pixel level how the light affects the hit object )...



Thanks for the Rambus press release. Is Yellowstone tech capable of being embedded as well ?

I do not know as of yet, but I hope so, as it could be a good way to bring PS3 manufacturing costs down...
 
Part of the success of the platform depends on successful backward compatibility: PS2 had excellent backward compatibility and many consumers ( I did ) appreciated it and will expect Sony to deliver again with PS3... as Sony has mentioned, btw ( backward compatibility is one of their concerns )...

They can go the software emulation route you know.

Let's take the AGP example... if we have an AGP 4x motherboard, we can change the AGP mode from the BIOS to fall back to AGP 1x or AGP 2x...

AGP is nearing its end.


I was thinking about the PS3 BIOS being able to change the Yellowstone base clock speed to 100 MHz: this would give us effective 800 MHz signaling, which is what we have on the current PS2 ( 400 MHz x 2 [DDR] )...

300 MHz for the EE embedded in the I/O ASIC and 800 MHz Yellowstone when operating in PS2 backward compatibility mode; otherwise the clock sent would be 400 MHz to achieve 3.2 GHz signaling ( the GeForce FX is able to downclock itself when it doesn't detect the fan at full operation; we are not doing that [maybe they are thinking about that too... if the fan doesn't work and the temperature rises, the PS3 could enter a low-power state telling you the system is over-heating, so you could get it repaired instead of buying a new one] )...

I am not sure what you said here, but it sounds interesting.


If we had a 3.2 GHz signaling rate for Yellowstone, the memory would be much faster than what the EE expects when running PS2 games... we want to keep the effective latency close to what the EE in the PS2 used to expect...

Yeah it would be out of synch.

And with two states: PS2 or PS3 mode ( PSX could be emulated all in software... yeah, BE and Visualizer are powerful enough to do so; Sony has bought Connectix, they might as well use their technology )

The graphics enhancement would be on the GPU side... FSAA, anisotropic filtering, etc...

PS2 can go the software route too.


It would work like protected memory: the EE would run a secure program ( authored by the Cell OS [it would be a Kernel-space command only, user programs would not have access to this... a similar procedure to the command given by a Cell OS program that can change the mask ID of an APU, allowing it to access multiple sandboxes] ) that would regulate the access to the data present in the External Memory...
But it would be the EE which accesses the data, right ?

We could divide the external RAM into as many sandboxes as we have set up on the e-DRAM memory bank controllers for the BE and Visualizer, and regulate the access to these bigger sandboxes in a very similar way to what we do in HW in the memory bank controllers...

But this way would be slow.

This is how we make sure that an APU will not read into other memory sandboxes ( unless we play with the mask, but only a trusted program like the OS can initiate such a change in the mask )...

With External Memory we can store, in a certain small area of memory, the information regarding the rest of the memory, which gets divided into sandboxes ( which all have a fixed size that is larger than each sandbox in the e-DRAM ): we would keep in this section of memory ( which only the OS could access ) the key control tables, the keys and other information ( start and end of each sandbox in the External Memory )... we would mimic with the EE what the memory bank controllers do in HW on the BE and Visualizer... slower, but we are dealing with the 4th memory hierarchy level ( 1st == registers, 2nd == Local Storage, 3rd == e-DRAM )... and the speed will be MUCH faster than an HDD or Blu-Ray...

So you are making the sandboxes in eDRAM a subset of larger sandboxes in the external memory ? The EE or Cell OS would have a hard time keeping things together. This is like a bad cache system.

BE and Visualizer contain a lot of transistors used for logic ( which we do not have exact numbers for, but it is not a small number [also add the space used by the Local Storage, which is SRAM, and all the APUs' registers] ), not only DRAM... and I do not want the logic portion of the chip to be crippled because of a bit more e-DRAM...

I don't think the logic transistor count would be that high on the BE or VS. Each of those APUs probably has about the same logic as the two EE VUs. Those things aren't Pentium 4 core class, that's for sure. The PU could be totally new, or a MIPS or PowerPC variant.

Using the chart you had,
64 MB of DRAM would take 0.125 cm^2
4 MB of SRAM would take 0.35 cm^2
communication would probably take from 0.5 to 1 cm^2.
leave about 1 cm^2 for the other logic and registers.

I think they can put 256MB without too much worries.
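
( For what it is worth, those figures seem to check out against the 70 nm / 2008 column of the chart, assuming 6T SRAM cells: )

Code:
# 70 nm column: DRAM 4.03 Gbit/cm^2, SRAM 577 Mtransistors/cm^2.
dram_cm2 = (64 * 8 / 1024.0) / 4.03              # 64 MB of e-DRAM
sram_cm2 = (4 * 2**20 * 8 * 6) / 1e6 / 577.0     # 4 MB of SRAM, 6 transistors/bit
print(round(dram_cm2, 3), round(sram_cm2, 2))    # 0.124 0.35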

Still, we do not know the size of the BE... it might even be a bit bigger than either the GS or EE at 0.25 um, and their wafers might make it impossible to include the logic and 256 MB of e-DRAM, as you would have too few chips/wafer ( you have to think about that... you do not want shortages, you want smooth mass production )...

Who knows, maybe even with the new wafers they will use in 2005, having a chip the same size as or bigger than the GS at 0.25 um might not be optimal...

Yeah depend on their fab I guess.


We have to think of ways to keep performance high, but also yields high... tough, but that is at least what should be aimed for...

Well, since it's modular, you can use some of the bad yields for something else. You know, much like how ATI or NV turn off pixel pipes in their GPUs and sell them off as something else.

Remember, at a later stage we could even embed the Yellowstone RDRAM into the I/O ASIC, as they did for the PSX GPU ( it basically went from off-chip VRAM to embedded VRAM ), as manufacturing processes allow it... maybe when they move to 45 nm... the day they reach 30 nm ( not near ) they could maybe try to put BE and Visualizer onto a single chip... like they are doing with 130 nm EEs and GSs

Now is better than later. ;)


I think the Pixel Engine in the Visualizer PEs should be supporting TC ( both 2D and 3D/Volumetric )...

If they do not, well at 1-2 GHz I see 4 APUs per PE and 4 PEs...

4 APUs/PE * 4 PEs * 8 FP ops/APU ( 4 parallel FP MADD ) * 1-2 GHz = 128-256 GFLOPS

Plus I expect Dependent Texture reads to be supported as well as loopback... The Visualizer should be quite programmable ( the APU could also manage DOT3 products )...

Even if the Pixel Processor was relatively simple ( supporting though advanced texture filtering [well, tri-linear and anisotropic filtering], dependent texture reads and single-pass multi-texturing [we might move away from polygons and textures towards procedural textures, but we cannot FORCE it from the start, it has to be given as an option you can follow thanks to the power and flexibility of the system] ), we still have all the APUs and PUs to work with Fragment Shaders and Triangle Set-up ( so we can make it flexible and stop calling it triangle set-up but something-else-set-up )...

The BE should worry about Physics, T&L and Vertex programs ( and could help the Visualizer with pixel programs too if there is the need, after all we use the same kind of APUs to do the calculations... ) while the Visualizer worries about Triangle set-up, Fragment programs, texturing ( that should be handled by the Pixel Engine and the Image Cache ) and particle effects...

Sounds good to me :)

The 3D pipeline on such an architecture has a kind of flexibility that is mindblowing ( this will be a coin with two faces: you HAVE to give developers HLSL and nice High Level libraries [at least at the beginning, force them to learn the PS3 HW using them, like they did with PSX, leaving them free to explore as time progresses], else they will really go nuts ): BE and Visualizer can help each other with computing loads, as the Cell architecture was designed around software Cells being able to migrate from PE to PE to be executed... We could execute Fragment and Vertex Programs on either BE or Visualizer, and we could have a PE in the Visualizer run some physics calculations if we wish to ( maybe to vary at the pixel level how the light affects the hit object )...

Yeah should be interesting. 1.5 years to go.



Quote:
Thanks for the Rambus press release. Is Yellowstone tech capable of being embedded as well ?


I do not know as of yet, but I hope so, as it could be a good way to bring PS3 manufacturing costs down...
 
V3,

I was thinking about the PS3 BIOS being able to change the Yellowstone base clock speed to 100 MHz: this would give us effective 800 MHz signaling, which is what we have on the current PS2 ( 400 MHz x 2 [DDR] )...

300 MHz for the EE embedded in the I/O ASIC and 800 MHz Yellowstone when operating in PS2 backward compatibility mode; otherwise the clock sent would be 400 MHz to achieve 3.2 GHz signaling ( the GeForce FX is able to downclock itself when it doesn't detect the fan at full operation; we are not doing that [maybe they are thinking about that too... if the fan doesn't work and the temperature rises, the PS3 could enter a low-power state telling you the system is over-heating, so you could get it repaired instead of buying a new one] )...


I am not sure what you said here, but it sounds interesting.


If we had a 3.2 GHz signaling rate for Yellowstone, the memory would be much faster than what the EE expects when running PS2 games... we want to keep the effective latency close to what the EE in the PS2 used to expect...


Yeah it would be out of synch.

Which is exactly my point ( I was recalling Pentium 4 thermal throttling, GeForce FX and AGP to show examples of clock speed reduction: that part is almost "simple", the tough part is always increasing the clock speed [as you might run into thermal problems, clock skew problems, etc...], not lowering it from its normal/stable clock rate )...

We know how Yellowstone works ( a bit ): the external clock is multiplied on the chip by a maximum factor of 4 and then both edges are used ( DDR mode ), achieving ODR signaling...

I want to make Yellowstone as fast as the Direct RDRAM you find in PS2 in both max bandwidth and cycle time... ( also, we would need to change the memory controller embedded in the EE, as the current one only supports Direct RDRAM, but I do not see this as a major issue )...

We do not want the full 64-bit datapath configuration as seen in that picture; we have two choices:

1) when running in PS2 backward compatibility mode only one of the channels is used ( each channel provides 32 bits of data each clock in ODR mode ).

2) when running in PS2 backward compatibility mode, only one RDRAM module per channel is used and we use both channels ( each channel would provide 16 bits of data in this operational mode and we have both channels enabled ).

What about clock speed ?

We have two choices there too:

1) You see the two clocks in the diagram arriving at the PLL on the memory controller side and at the PLL on the DRAM chip side, right ? Well, what we can do is, when running in PS2 backward compatibility mode, set those clocks down to 100 MHz... each PLL will pick up the 100 MHz clock and multiply it by 4, and both edges of each clock are sampled ( DDR ), resulting in an 800 MHz signaling rate, the same as the Direct RDRAM in the PS2.

2) we could have the PLL operate in a PS2 and PS3 mode... PS2 mode would pass the 400 MHz clock as it is ( no multiplication ), the DRAM chips and the memory controller would sample both edges of the clock ( DDR ) resulting in 800 MHz signaling, the same as the Direct RDRAM in PS2.

800 MHz signaling ( I prefer the first option regarding the clock speed setting: we would only need two settings for the local clock generators, normal speed [400 MHz] and PS2-compatibility speed [100 MHz]... down-clocking is not that hard... and this way we can avoid re-designing the PLL ) * 32 bits / 8 = 3.2 GB/s

:)
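
( The two clock options side by side, as a Python sketch; the function and names are mine: )

Code:
def signaling_mhz(base_clock_mhz, pll_multiplier):
    return base_clock_mhz * pll_multiplier * 2   # x2: both clock edges sampled (DDR)

print(signaling_mhz(400, 4))   # 3200 -> normal PS3 mode ( ODR )
print(signaling_mhz(100, 4))   # 800  -> option 1: lower base clock, PLL untouched
print(signaling_mhz(400, 1))   # 800  -> option 2: PLL passes the clock through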

And with two states: PS2 or PS3 mode ( PSX could be emulated all in software... yeah, BE and Visualizer are powerful enough to do so; Sony has bought Connectix, they might as well use their technology )

The graphics enhancement would be on the GPU side... FSAA, anisotropic filtering, etc...


PS2 can go the software route too.

On the GPU side it will surely go the software route ( for the most part ), but I see it as too much of a hassle to have to work like crazy to produce a 99.8 % accurate EE software emulator when you could spend that development time on something more important, like better tools, better documentation, PSX software emulation, better libraries...

The EE is not a slow CPU and emulating it in software will not be easy: we have no guarantee that the APUs are going to share the same ISA as the VUs ( it might be similar, but it will prolly be far from the same ) and we have no guarantee on the PUs either... so it would be software emulation all the way...

They didn't choose this route before with PS2 and since again they're in need of a good I/O CPU and Sound Processor, I think the EE would really fit the bill :) ( also if you shrink it down to 90 nm or lower it will be a very small chip )...

But it would be the EE which accesses the data, right ?

Yes, it would be the EE running the OS code that would access the external RAM; the BE and the Visualizer would be freed from that...
We could modify the EE's new memory controller even further ( as I was mentioning a few lines above ) to support the bigger sandboxes in HW and not in software ( the EE would not be too slow at managing that in software, but what I fear is how much of its power would be wasted due to the lack of cache in the EE: the job it would have to do would be even more random memory accesses, and although in 3.2 GHz signaling mode the memory ends up 4x faster than the Direct RDRAM in PS2, the EE would still stall accessing memory )...

This HW modification would be most effective in PS3 mode, while in PS2 compatibility mode the EE would access the external RAM like it did in the PS2...

This is how we make sure that an APU will not read into other memory sandboxes ( unless we play with the mask, but only a trusted program like the OS can initiate such a change in the mask )...

With External Memory we can store, in a certain small area of memory, the information regarding the rest of the memory, which gets divided into sandboxes ( which all have a fixed size that is larger than each sandbox in the e-DRAM ): we would keep in this section of memory ( which only the OS could access ) the key control tables, the keys and other information ( start and end of each sandbox in the External Memory )... we would mimic with the EE what the memory bank controllers do in HW on the BE and Visualizer... slower, but we are dealing with the 4th memory hierarchy level ( 1st == registers, 2nd == Local Storage, 3rd == e-DRAM )... and the speed will be MUCH faster than an HDD or Blu-Ray...


So you are making the sandboxes in eDRAM a subset of larger sandboxes in the external memory ? The EE or Cell OS would have a hard time keeping things together. This is like a bad cache system.

And we would also have to worry about the EE having enough power left to be as useful also as Sound DSP ( some crazy sound should be available ) and I/O controller ( it would not have more than those three jobs )...

There are other ways, people need to think about them...

Of course, we could spend the resources needed for a 100 % accurate software PS2 emulator on a mini-OS module dedicated to and optimized for the EE, one that would manage ( at least a good chunk of ) the External RAM as a super-set of the smaller sandboxes in the BE and Visualizer...

I don't think the logic transistor count would be that high on the BE or VS. Each of those APUs probably has about the same logic as the two EE VUs. Those things aren't Pentium 4 core class, that's for sure. The PU could be totally new, or a MIPS or PowerPC variant.

I think the APUs will be a bit bigger than the VUs in the EE ( they might have a bit more complex instruction processing logic than the VUs [fetch, decode, issue, execute, ...] and they do have 4 Integer Units too ), but more research should be done to determine this... there are things we do not know: what we know is that the APUs are very compact, the problem is understanding their point of reference ( I agree with your Pentium 4 core class comment, though )...

Using the chart you had,
64 MB of DRAM would take 0.125 cm^2
4 MB of SRAM would take 0.35 cm^2
communication would probably take from 0.5 to 1 cm^2.

Still, we do not know the details of the size of the units and the communication, nor where the chart is pessimistic and where optimistic...

And we do not know if their fabs support wafers big enough to produce enough BEs per wafer...

I think they could go with 64 MB for both BE and Visualizer... the designs would share many similarities then; you would only change a few things, like 4 APUs replaced by a Pixel Engine and Image Cache ( in the Visualizer ): it could make manufacturing easier...
 
On the GPU side it will surely go the software route ( for the most part ), but I see it as too much of a hassle to have to work like crazy to produce a 99.8 % accurate EE software emulator when you could spend that development time on something more important, like better tools, better documentation, PSX software emulation, better libraries...

The EE is not a slow CPU and emulating it in software will not be easy: we have no guarantee that the APUs are going to share the same ISA as the VUs ( it might be similar, but it will prolly be far from the same ) and we have no guarantee on the PUs either... so it would be software emulation all the way...

When you have complete documents, emulators aren't hard at all. While something like a P4 can't emulate the EE, the BE with all those APUs shouldn't have a problem, even if they run on a different ISA. Even if the ISA is different, the function of those APUs will be similar. I think even the Visualizer could do it alone.

They didn't choose this route before with PS2 and since again they're in need of a good I/O CPU and Sound Processor, I think the EE would really fit the bill ( also if you shrink it down to 90 nm or lower it will be a very small chip )...

Sure, the EE would do just nicely as well.

Yes, it would be the EE running the OS code that would access the external RAM; the BE and the Visualizer would be freed from that...
We could modify the EE's new memory controller even further ( as I was mentioning a few lines above ) to support the bigger sandboxes in HW and not in software ( the EE would not be too slow at managing that in software, but what I fear is how much of its power would be wasted due to the lack of cache in the EE: the job it would have to do would be even more random memory accesses, and although in 3.2 GHz signaling mode the memory ends up 4x faster than the Direct RDRAM in PS2, the EE would still stall accessing memory )...

This HW modification would be most effective in PS3 mode, while in PS2 compatibility mode the EE would access the external RAM like it did in the PS2...

I guess they can modify the EE to suit these new conditions.

And we would also have to worry about the EE having enough power left to be as useful also as Sound DSP ( some crazy sound should be available ) and I/O controller ( it would not have more than those three jobs )...

I think it's no problem to keep sound out of the EE. Sony will probably make a new sound processor anyway.


I think the APUs will be a bit bigger than the VUs in the EE ( they might have a bit more complex instruction processing logic than the VUs [fetch, decode, issue, execute, ...] and they do have 4 Integer Units too ), but more research should be done to determine this... there are things we do not know: what we know is that the APUs are very compact, the problem is understanding their point of reference ( I agree with your Pentium 4 core class comment, though )...

But they won't be much bigger than the two EE VUs combined. Due to the APUs' arrangement, the transistors would also be better spent than on those EE VUs, which are a bit of a mess.

Still, we do not know the details of the size of the units and the communication, nor where the chart is pessimistic and where optimistic...

If they want it clocked @ 4 GHz, the logic can't take up most of the die space. It'll be a meltdown.

And we do not know if their fabs support wafers big enough to produce enough BEs per wafer...

Well, Sony invested a lot in plants for producing the EE and GS. I assume they'll upgrade most of these for the new chips. Besides, Sony has the lead; they can take their time.

BTW PS2 was meant to be for 1997, with PS3 for 2003 and PS4 for 2011. This was Kutaragi's plan during the PSX era. I think they are somewhat behind on their schedule. So I guess they are taking their time.


I think they could go with 64 MB for both BE and Visualizer... the designs would share many similarities then; you would only change a few things, like 4 APUs replaced by a Pixel Engine and Image Cache ( in the Visualizer ): it could make manufacturing easier...

It's modular after all. They can even turn bad BEs or Visualizers into parts for use in mid-range TVs ;)
 
BTW PS2 was meant to be for 1997, with PS3 for 2003 and PS4 for 2011. This was Kutaragi's plan during the PSX era. I think they are somewhat behind on their schedule. So I guess they are taking their time.

By "taking their time" they're also not alienating the consumer.

Putting out PS2 2 years after PS1 might've been a bad, bad move. :)

The 6 years between PS2 and PS3 would've been just right, though.
 