Corrected version...
Very rough estimates ahead
Ok... so our budget is 350 mm^2
Ok... according to Prof. Nair's research at IBM, with 70 nm technology you should be able to embed 4.03 Gbits/cm^2 of e-DRAM.
64 MB = 0.5 Gbits, which is roughly 1/8th of 4.03 Gbits.
So we would need about 1/8th of 1 cm^2, which means:
( 1/8 ) cm^2 = ( 100 mm^2 ) / ( 8 ) = 12.5 mm^2
I originally also took the figures for 100 nm, but that is a bit too pessimistic considering their new process is 65 nm and not 70 nm ( and they seem pretty happy about the DRAM cell's size ); using the 70 nm figure for a 65 nm process should also leave some margin for the wide buses of the e-DRAM...
350 - 12.5 = 337.5 mm^2
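A minimal Python sketch of the e-DRAM step, just to make the arithmetic reproducible. It assumes binary units (1 Gbit = 1024 Mbit), which is how 64 MB comes out as 0.5 Gbits; the constant and function names are made up for illustration.

```python
# Rough e-DRAM area estimate, reproducing the post's arithmetic.
# Assumption: binary units (1 Gbit = 1024 Mbit), so 64 MB = 0.5 Gbit.

DIE_BUDGET_MM2 = 350.0              # total die budget from the post
EDRAM_DENSITY_GBIT_PER_CM2 = 4.03   # Nair's 70 nm e-DRAM density
MM2_PER_CM2 = 100.0


def megabytes_to_gbits(mb: float) -> float:
    """Megabytes -> binary gigabits (8 bits per byte, 1024 Mbit per Gbit)."""
    return mb * 8 / 1024


def edram_area_mm2(capacity_mb: float, density_gbit_per_cm2: float) -> float:
    """Die area in mm^2 needed to hold capacity_mb of e-DRAM."""
    area_cm2 = megabytes_to_gbits(capacity_mb) / density_gbit_per_cm2
    return area_cm2 * MM2_PER_CM2


exact = edram_area_mm2(64, EDRAM_DENSITY_GBIT_PER_CM2)
rounded = MM2_PER_CM2 / 8           # the post rounds 0.5 / 4.03 to 1/8
remaining = DIE_BUDGET_MM2 - rounded

print(f"exact e-DRAM area:  {exact:.2f} mm^2")
print(f"rounded (1/8 cm^2): {rounded:.2f} mm^2")
print(f"budget left:        {remaining:.1f} mm^2")
```

The exact figure is about 12.41 mm^2, so rounding up to 1/8 cm^2 only costs ~0.1 mm^2 and doesn't move the rest of the estimate.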
Now... basically the Broadband Engine has 32 APUs, 4 PUs ( very tight and compact cores ) + 4 DMACs and 4 MB of Local Storage ( LOS, SRAM-based )...
Edit: each APU has 128 KB of Local Storage... I am just adding them all up to simplify the discussion...
IBM's paper predicts around 577 MTransistors/cm^2 for SRAM at 70 nm... We can round that up to 600 MTransistors/cm^2, since they are using 65 nm technology.
4 MB ( total amount of SRAM-based Local Storage ) = 32 Mbits = 0.03125 Gbits = 192 MTransistors ( using 6 transistors per bit ).
( 600 / 192 ) = 3.125
So the whole Local Storage needs 1/3.125 of 1 cm^2:
( 1 / 3.125 ) cm^2 = ( 100 mm^2 ) / ( 3.125 ) = 32 mm^2
We have 32 Local Storages, each 128 KB, so we can assume that each Local Storage takes 32 / 32 = 1 mm^2
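The Local Storage numbers can be checked the same way. This sketch assumes 6 transistors per SRAM bit, binary units, and the rounded-up 600 MTransistors/cm^2 density; the helper names are hypothetical.

```python
# Local Storage (SRAM) area estimate, following the post's numbers.
# Assumptions: 6T SRAM cells, binary units, 600 MTransistors/cm^2 at 65 nm.

SRAM_DENSITY_MTRANS_PER_CM2 = 600.0
TRANSISTORS_PER_BIT = 6
NUM_APUS = 32
LS_PER_APU_KB = 128


def ls_total_mbits(num_apus: int, kb_per_apu: int) -> float:
    """Total Local Storage in binary megabits."""
    total_kb = num_apus * kb_per_apu    # 32 * 128 KB = 4096 KB = 4 MB
    return total_kb * 8 / 1024          # KB -> Mbit


def sram_area_mm2(mbits: float, density_mtrans_per_cm2: float) -> float:
    """Area in mm^2 for `mbits` of 6T SRAM at the given transistor density."""
    mtransistors = mbits * TRANSISTORS_PER_BIT
    return mtransistors / density_mtrans_per_cm2 * 100.0   # cm^2 -> mm^2


mbits = ls_total_mbits(NUM_APUS, LS_PER_APU_KB)
total = sram_area_mm2(mbits, SRAM_DENSITY_MTRANS_PER_CM2)
print(f"{mbits:.0f} Mbit -> {mbits * TRANSISTORS_PER_BIT:.0f} MTransistors")
print(f"total LS area: {total:.1f} mm^2, per APU: {total / NUM_APUS:.2f} mm^2")
```

With the unadjusted 577 MTransistors/cm^2 figure the same code gives roughly 33.3 mm^2 in total, i.e. about 1.04 mm^2 per Local Storage, so the 65 nm round-up barely matters here.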
Let's assume the 4 PUs + 4 DMACs together take 12 mm^2...
337.5 - 12 = 325.5 mm^2
We have 32 APUs...
This would leave:
(325.5 / 32 ) = ~10.1 mm^2 for each APU.
If we instead took the e-DRAM to be 30 mm^2, as the Nair paper suggested for 100 nm technology, we would have ( 350 - 30 - 12 ) / 32 = ~9.6 mm^2 for each APU.
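Both per-APU budgets come from the same formula with a different e-DRAM area, so here is a small sketch of it. The 12 mm^2 for the 4 PUs + 4 DMACs is the post's own guess, and the function name is illustrative only.

```python
# Per-APU area budget for the two e-DRAM scenarios in the post.
# Assumption: 4 PUs + 4 DMACs take 12 mm^2 together (the post's guess).

DIE_BUDGET_MM2 = 350.0
PU_DMAC_MM2 = 12.0
NUM_APUS = 32


def per_apu_budget_mm2(edram_mm2: float) -> float:
    """Area left for each APU (Local Storage included) after e-DRAM and PUs/DMACs."""
    return (DIE_BUDGET_MM2 - edram_mm2 - PU_DMAC_MM2) / NUM_APUS


scenarios = {
    "good (e-DRAM at 12.5 mm^2)": 12.5,
    "bad (e-DRAM at 30 mm^2)": 30.0,
}
for label, edram in scenarios.items():
    print(f"{label}: {per_apu_budget_mm2(edram):.3f} mm^2 per APU")
```

The unrounded good-scenario figure is ~10.17 mm^2; writing it as ~10.1 just leaves a little extra slack.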
So, according to the "good scenario"...
10.1 - 1 = 9.1 mm^2 for the 4 FP Units, the 4 Integer Units and the thirty-two 128-bit registers.
According to the "bad scenario"...
9.6 - 1 = 8.6 mm^2 for the 4 FP Units, the 4 Integer Units and the thirty-two 128-bit registers.
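A tiny sketch of the last step: subtract the ~1 mm^2 Local Storage from each per-APU budget to get the area left for the execution units and registers. It uses the rounded 10.1 and 9.6 mm^2 figures; the names are hypothetical.

```python
# Logic budget per APU = per-APU area minus its Local Storage.
# Assumption: ~1 mm^2 per 128 KB Local Storage, from the SRAM estimate.

LS_PER_APU_MM2 = 1.0

# Per-APU budgets as rounded in the post (mm^2, Local Storage included).
PER_APU_MM2 = {"good": 10.1, "bad": 9.6}


def logic_budget_mm2(per_apu_mm2: float, ls_mm2: float = LS_PER_APU_MM2) -> float:
    """Area left for the FP/integer units and the register file."""
    return per_apu_mm2 - ls_mm2


for name, apu in PER_APU_MM2.items():
    logic = logic_budget_mm2(apu)
    share = logic / apu * 100
    print(f"{name}: {logic:.1f} mm^2 for logic ({share:.0f}% of the APU)")
```

Either way roughly 90% of each APU's area is left for the FP/Integer Units and registers.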
VU0+VU1 in 250 nm take 70 mm^2 and we know VU1 is bigger than VU0 ( 4x the micro-memory, 1 more FMAC and one more FDIV )... so let's assume that VU1 measures around 40-44 mm^2.
Using 65 nm technology we should be able to shrink it to less than 10.4-11.44 mm^2 ( that is just 40-44 mm^2 scaled linearly by 65 / 250 = 0.26; since area ideally scales with the square of the feature size, the real figure should be considerably less, even more so if redesigning the chip's layout during the shrink allows better die area usage, and if the SRAM cells in the Local Storage turn out smaller than the SRAM cells used in the VU's micro-memories ), and that includes 32 KB of SRAM ( micro-memories ), thirty-two 128-bit registers and sixteen 16-bit GPRs...
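To see how the VU1 shrink compares with the APU logic budgets, this sketch scales the assumed 40-44 mm^2 two ways: linearly by the feature-size ratio (which reproduces the 11.44 mm^2 upper figure) and by its square (an ideal shrink). Both the 40-44 mm^2 VU1 area and the 9.1 / 8.6 mm^2 budgets are the post's estimates; the function names are made up. Note the VU1 figure still includes its 32 KB of micro-memory, so the comparison leans pessimistic.

```python
# Shrinking VU1's estimated 250 nm area to 65 nm, two ways.
# Assumptions: VU1 = 40-44 mm^2 at 250 nm; budgets are the post's estimates.

OLD_NODE_NM = 250.0
NEW_NODE_NM = 65.0
LOGIC_BUDGET_MM2 = {"good": 9.1, "bad": 8.6}


def shrink_linear(area_mm2: float) -> float:
    """Scale area by the feature-size ratio (very pessimistic)."""
    return area_mm2 * (NEW_NODE_NM / OLD_NODE_NM)


def shrink_ideal(area_mm2: float) -> float:
    """Scale area by the square of the feature-size ratio (ideal shrink)."""
    return area_mm2 * (NEW_NODE_NM / OLD_NODE_NM) ** 2


for vu1_250nm in (40.0, 44.0):
    lin = shrink_linear(vu1_250nm)
    ideal = shrink_ideal(vu1_250nm)
    print(f"VU1 at {vu1_250nm:.0f} mm^2: linear {lin:.2f} mm^2, ideal {ideal:.2f} mm^2")
    for scenario, budget in LOGIC_BUDGET_MM2.items():
        print(f"  {scenario} budget {budget} mm^2: "
              f"linear fits={lin <= budget}, ideal fits={ideal <= budget}")
```

Only the ideal shrink (~2.7-3.0 mm^2) fits inside either logic budget; a real shrink would most likely land somewhere between the two columns.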