V3 said:
To keep it on topic,
Here is my breakdown of the size of Cell on 65 nm
64 eDRAM + controller, 120 mm2
4 PUs, 20 mm2 each, total 80 mm2
36 APUs, 5 mm2 each, 180 mm2
Buses, 4 DMACs, other stuffs, 40 mm2
Total 420 mm2, that's my speculation anyway. Toshiba eDRAM and SRAM cells, though they are claimed to be the smallest in the world, might not be dense enough to be economical. But we shall see, I guess. If that's too costly, then we might not hear the end of DMGA gloating
So, 24-32 MB of e-DRAM + Controller + 4 MB of SRAM...
2) Embedded DRAM cell:
High-speed data processing requires a single-chip solution integrating a microprocessor and embedded large volume memory. Toshiba is the only semiconductor vendor able to offer commercial trench-capacitor DRAM technology for 90nm-generation DRAM-embedded System LSI. Toshiba and Sony have utilized 65nm process technology to fabricate an embedded DRAM with a cell size of 0.11um2, the world's smallest, which will allow DRAM with a capacity of more than 256Mbit to be integrated on a single chip.
3) Embedded SRAM cell:
SRAM is sometimes used as cache memory in SoC systems. The Hi-NA 193-nm lithography with alternating phase shift mask and the slimming process combined with the non-slimming trim mask process will achieve the world's smallest embedded SRAM cell in the 65nm generation, with an area of only 0.6um2.
4 MB of SRAM = 4 MB * 8 bits/byte * 0.6 um^2 = 4 * 1024 * 1024 * 8 bits * ( 0.6 * 10^-6 mm^2 ) = ~20.13 mm^2
Let's say 22 mm^2 to account for some inefficiencies, tags, LS interfaces ( ~2 mm^2 for that, a bit of an exaggeration: there are some MIPS cores that fit in less space ).
24-32 MB of e-DRAM = (24 to 32) MB * 8 bits/byte * 0.11 um^2 = (24 to 32) * 1024 * 1024 * 8 bits * ( 0.11 * 10^-6 mm^2 ) = 22.146-29.528 mm^2
Again, let's say 23-30 mm^2 without the memory bank controllers, the bank access switch, data and address busses, and extra tags for flags as outlined in the patent.
Let's say all that ( memory bank controllers, bank access switch, data + address busses, extra tags, etc... ) brings it to 33-40 mm^2. Still, assume those additions are not very well optimized density-wise and add another 10 mm^2: we get ~43-50 mm^2 of area utilization ( this is 1.7x to almost 2x the area used by the e-DRAM cells themselves ).
This means about 65-72 mm^2 for both SRAM and e-DRAM ( 22 mm^2 + 43-50 mm^2 ).
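To make the memory numbers above easy to re-check, here is a quick Python sketch. The cell sizes come from the quoted Toshiba/Sony text; the 22 mm^2 SRAM figure and the two +10 mm^2 e-DRAM allowances are just my guesses from this post, not measured values, and the function name is made up for illustration.

```python
import math

# Cell sizes from the quoted press release, in um^2 per bit.
SRAM_CELL_UM2 = 0.6
EDRAM_CELL_UM2 = 0.11
UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2


def raw_array_area_mm2(megabytes, cell_um2):
    """Area of the bare cell array only (no decoders, sense amps, tags)."""
    bits = megabytes * 1024 * 1024 * 8
    return bits * cell_um2 / UM2_PER_MM2


sram_raw = raw_array_area_mm2(4, SRAM_CELL_UM2)
edram_raw = {mb: raw_array_area_mm2(mb, EDRAM_CELL_UM2) for mb in (24, 32)}

# Allowances guessed in this post (assumptions, not data):
SRAM_PADDED_MM2 = 22          # raw array plus ~2 mm^2 for tags, LS interfaces
EDRAM_CONTROL_MM2 = 10        # bank controllers, access switch, busses, flags
EDRAM_LOW_DENSITY_MM2 = 10    # extra margin for poorly packed logic

edram_padded = {
    mb: math.ceil(area) + EDRAM_CONTROL_MM2 + EDRAM_LOW_DENSITY_MM2
    for mb, area in edram_raw.items()
}
memory_total = {mb: SRAM_PADDED_MM2 + area for mb, area in edram_padded.items()}

print(f"4 MB SRAM, cells only: {sram_raw:.2f} mm^2")
for mb in (24, 32):
    print(f"{mb} MB e-DRAM, cells only: {edram_raw[mb]:.2f} mm^2, "
          f"padded: {edram_padded[mb]} mm^2, "
          f"SRAM + e-DRAM: {memory_total[mb]} mm^2")
```

Changing the megabyte values or the allowances shows how sensitive the 65-72 mm^2 figure is to the padding rather than to the cells themselves.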
32 APUs at 5.5 mm^2 each would be 176 mm^2 ( I added 0.5 mm^2 to your estimate to better account for the nice Register file [I also thought about a 65 nm EE and how big something like VU0 would be, and adjusted for things like more Registers and stuff: from VU0 I can take out the 8 KB of SRAM based micro-memories as we are counting the SRAM somewhere else] ).
PUs being 20 mm^2 each ? That is quite big, man.
EE+GS@90 nm is 86 mm^2 and the EE part is probably around 40-42 mm^2.
This would mean that each 65 nm PU is approximately half the size of the 90 nm EE, which in turn means that a 65 nm PU is about as big as an EE shrunk to 65 nm.
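A rough way to check that comparison is to assume ideal scaling, where area shrinks with the square of the feature size. That is an assumption on my part: real shrinks usually do worse than this, and the helper name below is only for illustration.

```python
def ideal_shrink_area(area_mm2, from_nm, to_nm):
    """Scale an area by the square of the feature-size ratio (idealised)."""
    return area_mm2 * (to_nm / from_nm) ** 2


# EE part of the 90 nm EE+GS, estimated above at 40-42 mm^2.
for ee_90nm in (40, 42):
    ee_65nm = ideal_shrink_area(ee_90nm, 90, 65)
    print(f"{ee_90nm} mm^2 at 90 nm -> about {ee_65nm:.1f} mm^2 at 65 nm")
```

Under that assumption the EE lands at roughly 21-22 mm^2 at 65 nm, so a 20 mm^2 PU really would be about a whole EE.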
I'd say 6 mm^2 is big enough.
I do not expect anything more than compact cores with like 16 KB of Instruction Cache and 16 KB of Data Cache and a two-way super-scalar, in-order execution engine: MIPS cores in 130 nm, without caches, can be smaller than 2 mm^2.
The PUs should be relatively simple and still we would have 4 of them in parallel ( 4 processes at the same time ) running at around 2 GHz each.
Look at these ARM11 cores in 130 nm:
Performance Characteristics (0.13µ)    ARM1136J-S    ARM1136JF-S
Area (mm2)                             7.75          9.25
Frequency (MHz) *                      333-550       333-550
Dhrystone 2.1 MIPS/MHz                 1.2           1.2
Power Consumption (mW/MHz) **          0.75          0.75
0.13µ silicon foundry process. * Worst case: Vdd(nom)-10%, 125C, slow silicon. ** Typical: Vdd(nom), 25C, nominal silicon
Area includes 16k instruction cache and 16k data cache
I think that 10 mm^2 in 130 nm does not sound bad and those ARM11 cores are not slow and they include 32 KB of total L1 Cache.
4 PUs * 6 mm^2/PU = 24 mm^2.
So far we have,
1.) 24 + 72 + 176 = 272 mm^2. ( 32 MB of e-DRAM ).
2.) 24 + 65 + 176 = 265 mm^2. ( 24 MB of e-DRAM ).
Busses ( we already took some into account ), 4 DMACs, Redwood interface, etc... should take around 20 mm^2.
A total area of 285-292 mm^2: not impossible to realize for SCE: the 250 nm GS in the first PlayStation 2 consoles was 279 mm^2.
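Here is the whole budget as a small Python sketch, so the two breakdowns can be compared side by side. All inputs are the guesses from this thread ( mine and V3's ), and the function and parameter names are made up for illustration.

```python
def die_area_mm2(pu_count, pu_mm2, apu_count, apu_mm2, memory_mm2, misc_mm2):
    """Sum the per-block area guesses into one die-size estimate."""
    return pu_count * pu_mm2 + apu_count * apu_mm2 + memory_mm2 + misc_mm2


# My estimate: 4 PUs at 6 mm^2, 32 APUs at 5.5 mm^2, 20 mm^2 for the rest.
mine_24mb = die_area_mm2(4, 6, 32, 5.5, memory_mm2=65, misc_mm2=20)
mine_32mb = die_area_mm2(4, 6, 32, 5.5, memory_mm2=72, misc_mm2=20)

# V3's estimate: 4 PUs at 20 mm^2, 36 APUs at 5 mm^2, 120 mm^2 of eDRAM.
v3 = die_area_mm2(4, 20, 36, 5, memory_mm2=120, misc_mm2=40)

print(f"Mine, 24 MB e-DRAM: {mine_24mb:.0f} mm^2")
print(f"Mine, 32 MB e-DRAM: {mine_32mb:.0f} mm^2")
print(f"V3's breakdown:     {v3:.0f} mm^2")
```

Most of the gap between the two totals comes from the PU size and the memory block, not from the APUs.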
I have to say that I was not trying to be too optimistic regarding the e-DRAM and SRAM area utilization, as you can see, so they might reduce the area used, or reduce the e-DRAM to 16 MB and upgrade XDR to 51.2 GB/s. My calculations that put 24-32 MB of e-DRAM on the chip assumed 25.6 GB/s XDR ( 400 MHz base clock, 64 bits memory controller hence 128 data pins, etc... ); 51.2 GB/s would be obtained by either doubling the base clock or doubling the data pins to 256, which would mean a 128 bits memory controller.
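The XDR numbers can be sketched the same way. I am assuming XDR's octal data rate ( 8 bits per pin pair per base clock ) and differential signalling ( 2 pins per data bit ), which is how 64 bits becomes 128 data pins; the function names are just for illustration.

```python
def xdr_bandwidth_gb_s(base_clock_mhz, bus_width_bits, bits_per_clock=8):
    """Peak bandwidth in GB/s, assuming octal data rate signalling."""
    bits_per_second = base_clock_mhz * 1e6 * bits_per_clock * bus_width_bits
    return bits_per_second / 8 / 1e9


def xdr_data_pins(bus_width_bits):
    """Differential signalling: two pins for every data bit (assumption)."""
    return bus_width_bits * 2


configs = [
    ("baseline", 400, 64),
    ("double the clock", 800, 64),
    ("double the pins", 400, 128),
]
for label, clock_mhz, width in configs:
    print(f"{label}: {xdr_bandwidth_gb_s(clock_mhz, width):.1f} GB/s "
          f"on {xdr_data_pins(width)} data pins")
```

Both upgrade paths reach the same peak figure; the difference is whether the cost lands in signalling speed or in pin count.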
So if control, busses, Redwood interface, DMACs, take more space there is a good amount of head-room in the e-DRAM area usage.
Using 16 MB of e-DRAM ( still quite a lot if you add the 4 MB of SRAM: we would have 20 MB of total on-chip memory ) we would reduce the total area to about 277 mm^2 ( 24 + 22 + 35 + 176 + 20 ): 16 MB of e-DRAM would only take, following the same calculations I did above for 24-32 MB of e-DRAM, about 35 mm^2 with the bank controllers, busses, bank access switch, etc... taken into consideration.