Real PS3 Architecture

128 bytes × 1.8-0.9 GHz = 230.4-115.2 GB/s... ×4 for 4 CELLs =

= ~920-460 GB/s

or ×4 for 4 CELLs, ×2 for parallel READ/WRITE =

= ~1840-920 GB/s
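
Sanity-checking that math in a few lines of Python ( a quick sketch; the 1.8/0.9 GHz clocks and the 4-CELL configuration are this thread's speculation, not confirmed specs ):

```python
# 128 bytes ( one 1,024-bit block ) per cycle per CELL, at speculative clocks.
BYTES_PER_CYCLE = 128

for clock_ghz in (1.8, 0.9):
    per_cell = BYTES_PER_CYCLE * clock_ghz   # GB/s for one CELL
    four_cells = per_cell * 4                # GB/s for 4 CELLs
    duplex = four_cells * 2                  # GB/s with READ+WRITE in parallel
    print(f"{clock_ghz} GHz: {per_cell:.1f} GB/s -> "
          f"{four_cells:.1f} GB/s -> {duplex:.1f} GB/s")
```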
 
V3... I imagine the patent does not mention controllers specifically... should we assume it does not use 'em ?

Who knows ? It might use some new tech ;) But I was just curious when you said optical disc. Going just by the patent, without any speculation, the external memory could be an optical disc as well.

The patent makes no mention of the BE being capable of addressing external memory. It does, however, say the BE is able to address the eDRAM of another BE nearby. So the way things are, if there is external memory, it would be the memory of the I/O processor, as the BE wouldn't be able to see it. If that's not the case then Sony needs to redo the memory controller.

My take is not to use Yellowstone, but to put eDRAM into the I/O processor. They can embed the sound chip in there as well or keep it separate. You can put a lot of eDRAM in that I/O processor.

If I were Sony, I wouldn't rely too much on Rambus. Keeping everything on eDRAM will reduce cost in the long run.
 
this is a real IBM GRID processor

 
V3,

I think the external memory would still be RAM and not the optical disc...

They didn't license Yellowstone just to increase their IP portfolio ;)

Keeping everything in the e-DRAM could be feasible with sub-45 nm tech, as I do not see 256-512 MB of e-DRAM EVER fitting with 65 nm technology... 64 MB is a bit of a stretch, and using 4 BEs would make the PCB too complex...

Still, they might use Yellowstone for the e-DRAM and Redwood for the chip-to-chip interconnect...

I found this in the patent which actually supports your theory...

[0064] PE 201 is closely associated with a dynamic random access memory (DRAM) 225 through a high bandwidth memory connection 227. DRAM 225 functions as the main memory for PE 201.

Still, it says "for PE" and not for the system... I would expect some memory to be found in the I/O CPU, at least to buffer from the optical disc ( it could be the 32 MB of Direct RAMBUS DRAM in the I/O ASIC, inherited for PS2 backward compatibility )...

Still, the Hybrid UMA approach makes sense... and their meaning of "functioning as main memory" might be related to the fact that the DMACs do see the e-DRAM, but not the external memory... the e-DRAM is main RAM as far as the PEs are concerned...

The Visualizer would have some e-DRAM too, I'd think, and so will the I/O CPU... having a decent-sized RAM pool attached to the I/O ASIC as external memory could be interesting ( "external" compared with the internal/embedded DRAM that the BE does have on chip... ) as it would follow quite well the Hybrid UMA principle ( shared memory, but each accessing processor has a bit of local memory [Local Storage] to buffer data and work locally while the bus is not available )






Vers,

I think I owe you an apology... I re-read the patent ( again :) ) and found this... ( took the time to look at it carefully )...

[0081] FIG. 12A illustrates the control system and structure for the DRAM of a BE. A similar control system and structure is employed in processors having other sizes and containing more or less PEs. As shown in this figure, a cross-bar switch connects each DMAC 1210 of the four PEs comprising BE 1201 to eight bank controls 1206. Each bank control 1206 controls eight banks 1208 (only four are shown in the figure) of DRAM 1204. DRAM 1204, therefore, comprises a total of sixty-four banks. In a preferred embodiment, DRAM 1204 has a capacity of 64 megabytes, and each bank has a capacity of 1 megabyte. The smallest addressable unit within each bank, in this preferred embodiment, is a block of 1024 bits.

64 banks, 1 MB each...

Each DMAC of the four PEs connects to 8 bank controls through a crossbar switch ( only one request at a time from each bank :) )...

Each bank control controls 8 banks...

You transfer in 128-byte chunks ( 1,024 bits ) from each bank...

Each bank control is connected to the Switch ( Switching logic ) which, for each bank controller, allows one transaction...

So we can have a maximum of 8 transactions active at any given time...
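
As a sketch of how that structure could map addresses ( the patent gives the sizes but not the interleaving, so the block-interleaved layout below is purely my assumption ):

```python
# Hypothetical address mapping for the patent's eDRAM: 8 bank controls x
# 8 banks x 1 MB, smallest addressable unit 1,024 bits ( 128 bytes ).
# Interleaving consecutive blocks across bank controls is assumed here
# to maximize concurrency; the patent does not specify the layout.
BLOCK_BYTES = 1024 // 8   # 128-byte blocks
BANK_CONTROLS = 8         # one transaction each -> max 8 in flight
BANKS_PER_CONTROL = 8     # 8 x 8 = 64 banks total

def map_address(addr: int):
    block = addr // BLOCK_BYTES
    control = block % BANK_CONTROLS                      # spread across controls
    bank = (block // BANK_CONTROLS) % BANKS_PER_CONTROL
    return control, bank, addr % BLOCK_BYTES             # offset inside the block

# Eight consecutive blocks land on eight different bank controls:
for addr in range(0, 8 * BLOCK_BYTES, BLOCK_BYTES):
    print(hex(addr), map_address(addr))
```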

We have 4 PEs, each with its own DMAC, which means each PE should be able to have 2 memory operations active at a given moment in time... parallel READs and WRITEs for each DMAC could be possible, I'd say...

Each clock cycle, a maximum of 2,048 bits ( 1,024 bits × 2 ), or 256 bytes ( 128 bytes × 2 ), can arrive+leave ( READ+WRITE ) each PE.

It makes more sense to count the bandwidth for each PE first and then the BE's aggregate bandwidth...

Running at 1.8-0.9 GHz this means 460.8-230.4 GB/s between PE and DRAM...

and yes the total aggregate bandwidth for the BE is 4x higher as we have 4 PEs... ~1.8-0.9 TB/s


Still each PE would get 1/4th of that...
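
The same math in a few lines ( clocks are still this thread's guesses ):

```python
# 1,024-bit READ + 1,024-bit WRITE per PE per cycle, four PEs per BE.
BYTES_PER_PE_PER_CYCLE = 2 * 1024 // 8   # 256 bytes
PES_PER_BE = 4

for clock_ghz in (1.8, 0.9):
    per_pe = BYTES_PER_PE_PER_CYCLE * clock_ghz   # GB/s per PE
    per_be = per_pe * PES_PER_BE / 1000           # TB/s aggregate per BE
    print(f"{clock_ghz} GHz: {per_pe:.1f} GB/s per PE, ~{per_be:.2f} TB/s per BE")
```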

In addition, the 1,024-bit PE bus ( inside each PE... connecting the APUs and the PU ) can be implemented in two possible ways...

1) one 1,024-bit bus ( one request at a time, bi-directional, but not FULL-DUPLEX )...

2) Packet Switched Network ( another switch :) )... this last approach would make better use of the 460.8-230.4 GB/s the DRAM can provide to each PE... while using a normal 1,024-bit bus would have us alternate WRITEs and READs ( the DMAC could do both at the same time: we could still have a small FIFO on the DMAC )...

Approach number 1 would be cheaper, and some small fixes could be implemented to increase its efficiency... a FIFO on the DMAC ( or "next to it" ;) ) would allow us to queue READs and WRITEs and, since memory operations still have latency, we would see READ and WRITE requests accumulating in the FIFO... we could then speed things up by issuing paired WRITEs and READs from the FIFO :)
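
A toy model of that FIFO pairing idea ( the pairing policy is just my illustration, nothing from the patent ):

```python
from collections import deque

# READ and WRITE requests accumulate in the DMAC-side FIFO while earlier
# operations are in flight; each "cycle" we issue one of each in parallel
# instead of alternating them on the bus.
reads = deque(["R1", "R2", "R3"])
writes = deque(["W1", "W2", "W3"])

cycle = 0
while reads or writes:
    issued = []
    if reads:
        issued.append(reads.popleft())
    if writes:
        issued.append(writes.popleft())
    print(f"cycle {cycle}: issue {issued}")  # paired READ+WRITE when both queued
    cycle += 1
```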
 
Why? In 2003 we will see, at 90 nm, a 100+ MTransistor consumer processor ( Prescott ) launching at 3.06 GHz... two years and the move to 65 nm should allow us to meet that kind of clock-frequency and transistor requirement ( when they move to 45 nm then they can start making nice money on the HW sold... at 65 nm they will still have to take a loss on the HW sold, and it won't be insignificant, but it is necessary to push the platform )...

Else, we can just add one or two more PEs :)
 
Remember we are talking about sub-$300 here. It's a different story to sell a 3 GHz chip for $300, but a whole system ? That's a bit crazy in my mind. Look at the Xbox. It came out with a 700 MHz CPU when we were already at 2 GHz. What did Sony have, and what was out at the time ?
 
64 MB eDRAM = 500 million transistors
1 PU = 4 million transistors, 4 PUs = 15 million
1 APU = 3 million transistors, 32 APUs = 100 million
DMA + FIFO + bus + ... = 10-50 million

600-800 million transistors in the CPU core at 3600 MHz, incredible...
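
Summing that budget up ( all figures are the rough estimates above ):

```python
# Rough transistor budget, in millions ( the DMA/FIFO/bus entry uses the
# midpoint of the 10-50 guess above ).
budget = {
    "64 MB eDRAM": 500,
    "4 PUs": 15,
    "32 APUs": 100,
    "DMA + FIFO + bus + ...": 30,
}
print(f"~{sum(budget.values())} million transistors")  # within the 600-800M estimate
```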
 
Remember we are talking about sub-$300 here. It's a different story to sell a 3 GHz chip for $300, but a whole system ?
PSOne 30 MHz -> PS2 300 MHz
If the trend is followed, PS3 must be at least 3 GHz :p
 
Panajev2001a said:
Why? In 2003 we will see, at 90 nm, a 100+ MTransistor consumer processor ( Prescott ) launching at 3.06 GHz... two years and the move to 65 nm should allow us to meet that kind of clock-frequency and transistor requirement ( when they move to 45 nm then they can start making nice money on the HW sold... at 65 nm they will still have to take a loss on the HW sold, and it won't be insignificant, but it is necessary to push the platform )...

Else, we can just add one or two more PEs :)


Here is a quote from Mark Bohr, director of process architecture at Intel.

The 90-nanometer manufacturing process, though, isn't a panacea. Chips made on this process will be more subject to gate leakage, or random energy dissipation, a phenomenon that can reduce battery life and other problems, Bohr said.

Coming up with a system for manufacturing 65-nanometer chips in 2005 will be even more difficult. The gate oxide, for example, will have to go below five atomic layers, which measures only 1.2 nanometers to begin with. The 65-nanometer generation will also be the last made with conventional lithography techniques. In 2007, the industry will switch to Extreme Ultraviolet (EUV) lithography, developed by a consortium of national laboratories and private companies.

"Shrinking it (the gate oxide) to 65 nanometers is going to be pretty tough," he said.

http://news.com.com/2100-1001-949493.html

It may be possible for Sony/IBM to have 65 nm in 2005, but they will certainly be challenged to pull it off.
 
Brimstone... still, Intel projects 65 nm for 2005...

Sony and Toshiba in early 2003 already announced they completed their libraries for 65 nm... two years to implement 65 nm is not rushing it too much...



Jvd,

to Sony, IBM and Toshiba, at launch, PS3 is going to cost more than $300 ;)
 
Panajev2001a said:
Brimstone... still, Intel projects 65 nm for 2005...

Sony and Toshiba in early 2003 already announced they completed their libraries for 65 nm... two years to implement 65 nm is not rushing it too much...



Jvd,

to Sony, IBM and Toshiba, at launch, PS3 is going to cost more than $300 ;)



well I guess it will cost them A LOT more than £300 at launch and for a long time afterwards... that is, if all you people are speculating about is true... you never know, they might just pull out an underpowered system... personally I do not think so since (IIRC) they're pretty much the only ones who haven't downgraded their systems... I mean, unlike Nintendo's decision to downgrade the GPU clock speed (although they increased the CPU clock)... and also Microsoft downgrading the GPU clock for the NV2A... but I guess it could be possible that Sony have downgraded the system's specs without telling us :LOL:
 
AFAIK, Sony upgraded them... I think the first published spec had something with 5+ GFLOPs. After that, it went up to the 6.2 number it is today. I think ArsTechnica covered it at the time..
 
They presented the EE at 250 MHz at the IEEE conference in January 1999, and later on they pushed it up to 300 MHz...

PS3 will be sold quite a bit below cost initially, but what Sony is counting on is smoother manufacturing than at PS2 launch ( no bad shortages ), and when the plants are at full speed the cost will start to decrease as new manufacturing technologies become available and the PS3's HW gets ported to those...

PS2 was sold at a loss... not much later they were selling it for a good profit, they slashed the price to $199 and they still make a profit... and now they are going to put both EE and GS in a single chip ( which helps reduce PCB costs too )...
 
Panajev, I never said anything about a clock, I said rate. I did say how I interpreted what he said; I don't mind that you reply with "the one true meaning" of it... but I hope you forgive me if I don't take your opinion for gospel. A Yellowstone connection always has 2 data busses. So, depending on your point of view, saying a Yellowstone connection has a 32-bit bus width can be interpreted in two ways without too much of a stretch of the imagination, IMO. I'm a nice guy, so I just took the interpretation which is not in fairy-tale country, which still left him with the assumption that Yellowstone will perform at the top end of the performance projected by Rambus. Which was unduly optimistic given Rambus's track record.
 
I still don't see it hitting more than 3 GHz. Yes, there are P4s that clock that fast right now, but they were designed to, and they are actually very simplistic compared to the CELL chips. Also, the P4s have been tweaked to get that high. I just don't see it happening. I see 2.5-3 GHz chips in these things at the most. The GPU I can see hitting 1 GHz, though.
 
Gathering from what I picked up in the other thread (thanks Vince, Panajev2001a), I think it is very crucial that Sony gets the chip running at at least 3 GHz to reach 1 TFLOPS, is that correct?
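
A rough check, assuming the patent-style figures floating around this thread ( 32 APUs, and 8 FLOPs/cycle per APU from four FMADD units — both speculation, not confirmed specs ):

```python
APUS = 32
FLOPS_PER_CYCLE_PER_APU = 8  # assumed: 4 FP units x multiply-add

for clock_ghz in (3.0, 4.0):
    gflops = APUS * FLOPS_PER_CYCLE_PER_APU * clock_ghz
    print(f"{clock_ghz} GHz -> {gflops:.0f} GFLOPS")
# 3 GHz -> 768 GFLOPS; ~4 GHz ( or extra APUs/PEs ) would be needed for 1 TFLOPS.
```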
 
Panajev, I never said anything about a clock, I said rate. I did say how I interpreted what he said; I don't mind that you reply with "the one true meaning" of it... but I hope you forgive me if I don't take your opinion for gospel. A Yellowstone connection always has 2 data busses.

Me forgiving you ? hehe... do not act like that, you know your stuff very well, you have shown it in the past, and you prolly have a couple or more years of experience on your shoulders... I wasn't trying to act like I know it all and my word is gospel; sorry if it appeared like that...


I wasn't encouraging vers' speculation by assuming we can hit that rate...

I am interested in understanding more about a Yellowstone connection having 2 data busses... is that to allow parallel READs+WRITEs ?

I was just trying to understand Yellowstone based on the PR RAMBUS provided...

If you have more info and feel like "owning" me like there's no tomorrow on mistakes I am making regarding Yellowstone, please go ahead... as long as it is more than "I know it and that is as much as I am telling you"...

The way I understand the Yellowstone technology, at least going by their PR work ( I have no other sources; I'd like to learn more, so if you have some public links I can follow, I will... you corrected me saying that 12.8 GHz was the signaling rate, and I understand that ), is that they use an external clock speed of 400-800 MHz... the clock on chip is used to produce a signaling rate that is 4× the clock speed ( through the PLL ) and the result is multiplied by 2 ( DDR ), achieving ODR ( 8 bits/"clock" )...

When he said 102.4 GB/s I was a bit surprised, because for 2005 I expected 50 GB/s to be the top of what we can expect for PS3's external RAM ( Yellowstone based )... 32 GB/s would be 10× the current Direct RDRAM bandwidth, which would already be a good jump considering that we have e-DRAM on the BE and the Visualizer...

128 bits as the data-bus width was, IMHO, usable, as we use the GS's 2,560-bit width to calculate total bandwidth ( we assume parallel frame-buffer READs+WRITEs and texture fetches for every DRAM macro [64 bits for READ, 64 bits for WRITE and 32 bits for texture access, looking at the Pixel Engine side of things] )...

If I understand your thought correctly, 128 bits would be 2×64 bits ( each connection has two busses )... If we schedule WRITEs and READs well, we can execute them in parallel and achieve the bandwidth we would have if we considered a transfer of 16 bytes ( 128 bits ) each cycle... this would require 6.4 GHz on-chip signaling ( as far as I understand it )... since we know how they get that level of on-chip signaling, I derived the real clock speed for the Yellowstone connection...

6.4 GHz / 2 ( DDR ) = 3.2 GHz; 3.2 GHz / 4 ( PLL ) = 0.8 GHz = 800 MHz

And I found this to be a bit high... again, I'd be happy with 51.2 GB/s, and that in theory would bring the external routed clock to 400 MHz for the Yellowstone connection ( 3.2 GHz signaling on chip )...
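
Putting my understanding into numbers ( the ×4 PLL, ×2 DDR scheme is my reading of Rambus' PR, and the 128-bit width is this thread's assumption ):

```python
BUS_WIDTH_BITS = 128  # assumed, per the speculation above

for routed_clock_mhz in (400, 800):
    signaling_ghz = routed_clock_mhz * 4 * 2 / 1000  # PLL x4, then DDR x2 ( ODR )
    bandwidth = signaling_ghz * BUS_WIDTH_BITS / 8   # GB/s
    print(f"{routed_clock_mhz} MHz -> {signaling_ghz} GHz signaling "
          f"-> {bandwidth} GB/s on a 128-bit connection")
```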

Are you saying that the 400 MHz they call a system clock is again a signaling rate ( like they took a 200 MHz clock and multiplied it by 2 with a PLL ) ? Could you explain your point of view a bit more ? I promise I will listen and ponder your reply...

Thank you MfA for participating in the thread,

Panajev
 