PS4 to use Cell, NOT PS3?

V3, I was looking for over-the-top figures: a bigger die size was going to lead to fewer chips per wafer and lower yields ( more processors failing the validation process ), so I ran the calculations using a big die size ( even if it is not ridiculously big... just very big ;) ) and the rest came accordingly...

If you put in a ~250 mm^2 value for die size ( the original PlayStation 2 chips were around 229+ mm^2 IIRC, and that was using 250 nm technology... ) you obtain more chips per wafer... but I tried not to be overly positive, so the calculations would include the negative effects of lots of factors that might or might not come into play...
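
( For anyone who wants to play with those wafer numbers, here is a quick back-of-the-envelope sketch in Python. It uses the standard dies-per-wafer approximation and a simple Poisson defect-yield model; the wafer diameter and the defect density are my own assumptions, not known figures for Sony/Toshiba/IBM's process: )

Code:
 import math

 def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
     # Standard approximation: gross dies on a round wafer,
     # discounting the partial dies lost around the edge.
     r = wafer_diameter_mm / 2
     return int(math.pi * r ** 2 / die_area_mm2
                - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

 def poisson_yield(die_area_mm2, defects_per_cm2):
     # Simple Poisson yield model: Y = exp(-A * D0).
     return math.exp(-(die_area_mm2 / 100) * defects_per_cm2)

 for area in (250, 300, 350):                       # candidate die sizes, mm^2
     n = dies_per_wafer(area)
     y = poisson_yield(area, defects_per_cm2=0.5)   # D0 is a pure guess
     print(f"{area} mm^2: {n} dies/wafer, ~{n * y:.0f} good dies (yield {y:.0%})")

( Bigger die: fewer candidate dies per wafer AND a lower fraction of them passing validation, which is the double hit I was trying to capture. )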

The basic building block, the APU, is repeated over and over on the die, so it should be a bit easier to verify, debug and manufacture... it is also true that, transistor-wise, those APUs should not be too huge... I expect them to be 1.5-2x the PlayStation 2's VUs due to the added Integer Units, more complex FP units ( maybe for FDIV power ), wider buses and the 128 KB of Local Storage...

14 mm^2 for 64 MB of e-DRAM ? Even considering the 1,024-bit datapaths ? ( meaning there are limits on how close everything can be packed together without cross-talk and other issues... )

I am just curious... I am trying to get an idea of the Broadband Engine size...
 
That would be 12-13 mm squared, i.e. 144-169 mm^2.

With superwide buses at super high speed.

... and 128 FMACs @ 4 GHz.

Remove the HSF and use it instead of your GAS TORCH WELDING kit.

Cheers
Gubbi
 
Gubbi said:
That would be 12-13 mm squared, i.e. 144-169 mm^2.

With superwide buses at super high speed.

... and 128 FMACs @ 4 GHz.

Remove the HSF and use it instead of your GAS TORCH WELDING kit.

Cheers
Gubbi

Wow... will I be able to use it instead of my Itanium-based OVEN ?

Seriously... 144 mm^2 is not incredibly bad... adding another 56 mm^2 for SRAM ( Local Storage... 4 MB divided into 128 KB blocks, one for every APU ) and other stuff, we get 100-150 mm^2 ( out of the 300-350 mm^2 that I assumed to be the total processor's size ) for pure logic ( execution units and such... the VUs in the later revisions of the EE, except the very first 250 nm EE, were quite compact, and I do not expect either the APUs or the PUs to be bloated units )...

I expect the Broadband Engine and the Visualizer to be quite tight chips as using so many of these identical blocks ( APUs ) can be a benefit in that regard too...



BTW, nobody expects the e-DRAM to run at 4 GHz... I expect the local buses and the e-DRAM to operate at 1 GHz...

Also 1 FMAC is not this HUGELY gigantic thing... it is not that big...
 
LoL, the old die size unit confusion.

You've been MIA for a while, glad to see you still around, Gubbi.

nVidia also got bent over by TSMC, who didn't get their 130nm low-k dielectrics process working at an acceptable level. nVidia now must suffer the fate of higher thermal dissipation than designed for, pushing the tolerances and ultimately having a chip that yields for shit and can double as a space-heater, vacuum cleaner and/or sexual toy.

I'm not conVINCEd by this argument. ;) I think Nvidia's design is partly to blame.

PS. To someone like Marco, or if a Dave's in here... Do CPU makers call it a netlist handoff or GDSII tape-out as well? I really don't know much about CPU design; are their tools similar to the 3D IHVs'? Verilog, anyone?

FWIW, I was reading an RWT discussion about this a while back. Off the top of my head, the logic design is done using a lot of automated tools, but then this design is used as a basis for the actual CPU design, which is a lot of custom work ( read: limited automation ) to match the behaviour of the previous design. IIRC this was in the context of high-performance CPUs such as those by the Alpha team.

Unfortunately, the thread is very old ( a year, maybe ) and is basically impossible to find, because the RWT forums aren't very user-friendly in that regard.
 
Seriously... 144 mm^2 is not incredibly bad... adding another 56 mm^2 for SRAM ( Local Storage... 4 MB divided into 128 KB blocks, one for every APU ) and other stuff, we get 100-150 mm^2 ( out of the 300-350 mm^2 that I assumed to be the total processor's size ) for pure logic ( execution units and such... the VUs in the later revisions of the EE, except the very first 250 nm EE, were quite compact, and I do not expect either the APUs or the PUs to be bloated units )...

I don't think I can agree with this. I think the heat energy density on such a tiny chip would be VERY high, so the cooling mechanism would have to be VERY good at extracting heat. I wonder what kinda HS/fan combo will be required, just hope it isn't a BlowFX.
 
Gubbi...

From prof. Nair research paper on IBM's web site:

Code:
 Technology (nm)                      180    130    100    70     50
 Gate length (nm)                     140    85-90  65     45     30-32

 Density
   DRAM (Gb/cm^2)                     0.27   0.71   1.63   4.03   9.94
   SRAM (M transistors/cm^2)          35     95     234    577    1423
   High-performance logic (M/cm^2)    24     65     142    350    863
   ASIC logic (M/cm^2)                20     54     133    328    811
   High-volume logic (M/cm^2)         7      18     41     100    247

 Local clock frequency (GHz)
   High-performance                   1.25   2.1    3.5    6.0    10.0
   ASIC                               0.5    0.7    0.9    1.2    1.5
   High-volume                        0.6    0.8    1.1    1.4    1.8

( 64 MB * 8 ) / 1,024 = 0.5 Gbit; going by their roadmap ( which is outdated and has since been pushed forward by IBM's successful research on 90 nm and 65 nm ), a 70 nm-like process would give us 4.03 Gbit/cm^2 to work with.

4.03 is ~8 times what we need... so we would take ~1/8th of 1 cm^2:

1 cm^2 = 100 mm^2, and 100 mm^2 / 8 = 12.5 mm^2

Even if, due to the bus size and all, we had to use 1.63 Gbit/cm^2...

This would mean 100 mm^2 / 3.26 = ~30 mm^2 ( since 1.63 / 0.5 = 3.26 )
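
The same arithmetic as a tiny Python sketch, just to keep the assumptions explicit ( the two densities are straight from the Nair table above; the rest is unit conversion ):

Code:
 # e-DRAM area for 64 MB at the Nair roadmap densities
 bits_needed = 64 * 8 / 1024.0            # 64 MB = 0.5 Gbit

 for node_nm, gbit_per_cm2 in ((100, 1.63), (70, 4.03)):
     area_mm2 = bits_needed / gbit_per_cm2 * 100    # cm^2 -> mm^2
     print(f"{node_nm} nm class: ~{area_mm2:.1f} mm^2 for 64 MB")

 # -> 100 nm class: ~30.7 mm^2
 # -> 70 nm class:  ~12.4 mm^2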
 
Wait... you call 350 mm^2 a tiny chip ? ;)

Still, they will be using some nice cooling mechanism and I expect them to try to cure the heat dissipation issue as well with 45 nm and smaller technologies once they can shrink the die.
 
Also, Saem, do not expect the whole chip to be running at 4 GHz... all the 1,024-bit buses and the e-DRAM will IMHO run at 1 GHz ( 2 GHz maximum, but not very probable )...
 
Saem said:
Seriously... 144 mm^2 is not incredibly bad... adding another 56 mm^2 for SRAM ( Local Storage... 4 MB divided into 128 KB blocks, one for every APU ) and other stuff, we get 100-150 mm^2 ( out of the 300-350 mm^2 that I assumed to be the total processor's size ) for pure logic ( execution units and such... the VUs in the later revisions of the EE, except the very first 250 nm EE, were quite compact, and I do not expect either the APUs or the PUs to be bloated units )...

I don't think I can agree with this. I think the heat energy density on such a tiny chip would be VERY high, so the cooling mechanism would have to be VERY good at extracting heat. I wonder what kinda HS/fan combo will be required, just hope it isn't a BlowFX.

Sorry... last reply... ;)

Saem, I was not directly commenting on the heat issue ( thanks for raising the point though; it is an interesting issue and it should be discussed, feel free to elaborate more if you want ), but on the chip's size and the space taken by the e-DRAM, etc...
 
Everything about 65nm coming out in 2007 was accelerated when Sony and Toshiba came on board. More research money and more teamwork, to shave off a year and bring the process up faster.

Sony & SCE would not invest 1.6 billion into a product that will not be the successor to the item that brings in 40% of Sony's profits.

The only question is whether it will be 2005, or 2006.


Speng.
 
Thank you for the figures; I will try to come up with a very rough estimate of the Broadband Engine's size and transistor count... I was doing some calculations off the top of my head and was expecting to see 300 mm^2 for the die, but I think the number can and should be lower... well, until I run some more hand calculations I won't know ;)
 
Panajev2001a said:
4.03 is ~8 times what we need... so we would take ~1/8th of 1 cm^2:

1 cm^2 = 100 mm^2, and 100 mm^2 / 8 = 12.5 mm^2

Even if, due to the bus size and all, we had to use 1.63 Gbit/cm^2, this would mean 100 mm^2 / 3.26 = ~30 mm^2

Ahh, very good... I thought Gubbi/V3's estimate was just a bit off... :)

Everything about 65nm coming out in 2007 was accelerated when Sony and Toshiba came on board. More research money and more teamwork, to shave off a year and bring the process up faster.

Agreed. Toshiba has publicly stated 65nm by late 2004, Intel's roadmap follows with it in 2005, TSMC... who knows. Also, Toshiba has a great reputation when it comes to shrinking the EE/GS.
 
Ahh, very good... I thought Gubbi/V3's estimate was just a bit off...

Me off?? It was Gubbi who was off: he took an area and squared it; beats me why he did that. Anyway, it's an estimate.

Anyway, buses go into communication logic when estimating.
 
Thanks V3, I will take that into account when I look at the transistor count ( buses should go into communication, I agree )... I was only examining the die area used by 64 MB of e-DRAM, and if we have big fat pipes ( buses ) it is harder to shrink the e-DRAM than if we had tiny little buses... I am simplifying the argument, but the point should still get across without being invalid...
 
Very rough estimates ahead :D

Ok... so our budget is 350 mm^2

Ok... according to prof. Nair's research at IBM, with a 70 nm technology you should be able to embed DRAM at 4.03 Gbit/cm^2.

64 MB = 0.5 Gbit, which is ~1/8th of 4.03 Gbit.

So we would need 1/8th of 1 cm^2 and this means:

1 cm^2 = 100 mm^2, and 100 mm^2 / 8 = 12.5 mm^2

I originally also took the figures for 100 nm, but that is a bit too pessimistic considering their new process is 65 nm, not 70 nm ( and they seem pretty happy about the DRAM cell's size ); that margin should also cover the wide buses for the e-DRAM...

350 - 12.5 = 337.5 mm^2

Now... basically the Broadband Engine has 32 APUs, 4 PUs ( very tight and compact cores ), 4 DMACs and 4 MB of Local Storage ( LS, SRAM-based )...

Edit: each APU has 128 KB of Local Storage... I am just summing it all together to simplify the discussion...

Let's assume the DRAM cells are only 1/4th the size of the Local Storage's SRAM cells ( i.e. we assume something more than the simple 1-transistor + 1-capacitor concept )...

4 MB = 64 MB / 16... the total LS memory is 1/16th of the total e-DRAM... but the LS has memory cells 4x as big, since it is SRAM ( as we said before )...

so... ( 12.5 / 16 ) * 4 = 3.125 mm^2 for the whole Local Storage ( always considering 65 nm technology... for those who read one line here and there ;) )

We have 32 Local Storages of 128 KB each, so each Local Storage takes 3.125 / 32 = ~0.1 mm^2

Let's assume the 4 PUs + 4 DMACs take altogether 12 mm^2...

337.5 - 12 = 325.5 mm^2

We have 32 APUs...

This would leave:

(325.5 / 32 ) = ~10.1 mm^2 for each APU.

If we thought of the e-DRAM as taking 30 mm^2, as the Nair paper suggests for 100 nm technology, then we would have ~9.6 mm^2 for each APU, and the whole Local Storage would grow to ( 30 / 16 ) * 4 = 7.5 mm^2, so each 128 KB block would now take 7.5 / 32 = ~0.23 mm^2.

So, according to the "good scenario"...

10.1 - 0.1 = ~10.0 mm^2 for the 4 FP Units, the 4 Integer Units and the thirty-two 128-bit registers.

According to the "bad scenario"...

9.6 - 0.23 = ~9.4 mm^2 for the 4 FP Units, the 4 Integer Units and the thirty-two 128-bit registers.

VU0+VU1 in 250 nm take 70 mm^2, and we know VU1 is bigger than VU0 ( 2x the micro-memory, 1 more FMAC and 1 more FDIV )... so let's assume that VU1 measures around 40-44 mm^2.

Using 65 nm technology we should be able to shrink it to less than 10.35-11.44 mm^2, scaling linearly with the process ( considerably less in practice, assuming that redesigning the chip's layout in the shrinking process would allow better die-area optimizations... and that the SRAM cells in the Local Storage might be smaller than the SRAM cells used in the VU's micro-memories ), and that includes 32 KB of SRAM ( micro-memories ), thirty-two 128-bit registers and sixteen 16-bit GPRs...
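
To keep the whole chain of guesses in one place, here is the budget as a Python sketch ( every input is an assumption from this post: 350 mm^2 total, 32 APUs, 12 mm^2 for the 4 PUs + 4 DMACs, SRAM cells 4x the size of the e-DRAM cells... change any of them and the per-APU logic budget moves accordingly ):

Code:
 def apu_logic_budget(total_mm2=350, edram_mm2=12.5,
                      pu_dmac_mm2=12, n_apus=32,
                      sram_to_dram_cell_ratio=4):
     # 4 MB of Local Storage is 1/16th of the 64 MB e-DRAM,
     # but its SRAM cells are assumed 4x larger than e-DRAM cells.
     ls_total = edram_mm2 / 16 * sram_to_dram_cell_ratio
     ls_per_apu = ls_total / n_apus       # 32 blocks of 128 KB
     apu = (total_mm2 - edram_mm2 - pu_dmac_mm2) / n_apus
     return apu, ls_per_apu, apu - ls_per_apu

 for label, edram in (("good (70 nm density)", 12.5),
                      ("bad (100 nm density)", 30.0)):
     apu, ls, logic = apu_logic_budget(edram_mm2=edram)
     print(f"{label}: {apu:.2f} mm^2 per APU, "
           f"LS {ls:.2f} mm^2, logic {logic:.2f} mm^2")

 # good: ~10.17 mm^2 per APU, LS ~0.10 mm^2, logic ~10.07 mm^2
 # bad:  ~9.63 mm^2 per APU,  LS ~0.23 mm^2, logic ~9.39 mm^2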
 
Panajev,

Why don't you estimate using the chart you gave, throughout the estimation?

That way it will be more consistent.
 
Maybe they'll give MS the head start next-gen, just enough to crush them 6-8 months later on the hardware side + really decent software support? :)
 