Thermal issue : Will EE3 be able to survive the heat?

Panajev2001a,

Are you talking about the global capacitance of the IC or the individual capacitance of a transistor? I'm only envisioning that if you shrink the size of a transistor, the capacitance of that transistor will decrease in kind (as described by nondescript), not increase. What's the real story?
 
...

Emotion Engine die analysis (first-generation 250 nm part):

VU0/1 : 31%
R5900 : 29%
Everything else : 39%

This means that at 43 mm2 @ 90 nm, VU0/1 take up 13.33 mm2 of the PSX2OAC's real estate. While the sizes of VU0 and VU1 aren't identical, I will just split that figure in half for a rough estimate of 6.7 mm2 per VU.

Compared to the older VUs, the new CELL APU (I prefer the term VU2, but will use Sony terminology this time) will see a 2.5x increase in die size because of several enhancements, namely the new 128-bit integer unit (older VUs had a 16-bit one), a larger register file, inter-APU communications, and 128 KB of local memory (VU1 had 32 KB in total). The logic transistor count will double, while the SRAM transistor count will increase 4x.

APU die area = 2.5x VU1 = 16.675 mm2 @ 90 nm

Since each PE has 9 APUs (8 + 1 spare), the total die size of the APU block is 150.075 mm2 @ 90 nm. Throw in a PPC core of, say, 20 mm2 and support circuitry of 20 mm2, and you are looking at a total die size of around 190 mm2 per PE.

Of course, EE3 isn't going to be fabricated on 90 nm, so moving it to 65 nm halves its die size to 95 mm2. Put two of them on a die and you have a die area of 190 mm2 plus I/O area. Put four of them on (like the CELL server) and you are looking at a massive die of 380 mm2 without any eDRAM or I/O. These are very optimistic numbers, and the actual die could be larger.

Let me summarize.

PE die size = ~95 mm2 @ 65 nm
VS die size = ~95 mm2 @ 65 nm (the patent says they are interchangeable with PEs, so this implies a similar size).

EE3 die size = 2 PEs + networking > 200 mm2 @ 65 nm
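For anyone who wants to check the arithmetic, here is a small Python sketch of the estimate above. Every input (the 43 mm2 EE figure, the 31% VU share, the 2.5x APU scale factor, the 20 mm2 PPC and support guesses) is a speculative number from this post, not an official figure:

```python
# Back-of-the-envelope die-size estimate for a CELL PE, using the
# speculative figures from the post above (not official numbers).

ee_die_90nm = 43.0            # mm2, EE shrunk to 90 nm
vu_share = 0.31               # VU0+VU1 fraction of the original EE die

vu_pair_area = ee_die_90nm * vu_share      # ~13.33 mm2 for both VUs
vu_area = vu_pair_area / 2                 # ~6.67 mm2 per VU (rough split)

apu_scale = 2.5               # assumed growth over a VU (wider ALU, more SRAM)
apu_area = apu_scale * vu_area             # ~16.7 mm2 per APU @ 90 nm

apus_per_pe = 9               # 8 + 1 spare
ppc_core = 20.0               # mm2, guessed PPC core
support = 20.0                # mm2, guessed support circuitry

pe_area_90nm = apus_per_pe * apu_area + ppc_core + support
pe_area_65nm = pe_area_90nm / 2            # optimistic: full shrink halves area

print(f"PE @ 90 nm: {pe_area_90nm:.1f} mm2")     # ~190 mm2
print(f"PE @ 65 nm: {pe_area_65nm:.1f} mm2")     # ~95 mm2
print(f"2 PEs @ 65 nm: {2 * pe_area_65nm:.1f} mm2 + I/O")
```

The numbers land on ~190 mm2 per PE at 90 nm and ~95 mm2 at 65 nm, matching the summary.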
 
Also, even if the PS3 CPU managed 1 TFLOP or even 2-3 TFLOPs, in all likelihood it would still only sustain several hundred GFLOPs in real-world applications. So the real question is: how many hundreds of GFLOPs will the CPU be able to sustain in real-world use?


Dreamcast's SH-4 CPU peaked at 1.4 GFLOPs but could only sustain 900 MFLOPs, and of that, real-world in-game use was no doubt significantly less.
 
...

Is the 72-processor PS3 CPU dead? The one with 8 PPC/POWER cores plus 64 APUs (thus 8 PEs on one die)?
There never was one to begin with.

Also, even if PS3 CPU managed 1 TFLOP or even 2-3 TFLOPs,
Do not be concerned; the theoretical peak of EE3 will be well below 500 GFLOPS.

How many hundreds of GFLOPs will the CPU be able to sustain in real-world use?
It depends on the skill of the developer. Remember that the CELL development environment does not provide any kind of auto-parallelization technology; it is entirely up to the developers to break down their engines into a dozen microprocesses and pipe them together to make the most of EE3.
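To make the "break it down and pipe it" idea concrete, here is a generic hand-built pipeline in plain Python, nothing CELL-specific and not any actual SCEI tooling: the developer explicitly splits work into stages and connects them with queues, which is the kind of manual decomposition being described.

```python
# Toy illustration of hand-built pipelining: with no auto-parallelization,
# the developer must split work into stages and pipe them explicitly.
# Generic Python only - not anything from the actual CELL toolchain.
import threading
import queue

def stage(in_q, out_q, fn):
    """Pull items, transform them, push downstream; None = shutdown."""
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            break
        out_q.put(fn(item))

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
# Two chained stages, each on its own worker (stand-ins for APUs).
threading.Thread(target=stage, args=(q0, q1, lambda x: x * x)).start()
threading.Thread(target=stage, args=(q1, q2, lambda x: x + 1)).start()

for n in range(4):
    q0.put(n)
q0.put(None)                       # sentinel flows through the pipeline

results = []
while (r := q2.get()) is not None:
    results.append(r)
print(results)  # [1, 2, 5, 10]
```

The point of the toy: nothing here happens automatically, and getting a dozen such stages balanced is exactly where developer skill comes in.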
 
Let's study Xscale as a power requirement example.

600 MHz : 0.5 watts
1000 MHz : 1.5 watts

The power usage tripled, even though the clock rate increased by only 66%. There is an almost quadratic relationship between clock rate and power consumption. If you clock EE3 at 3 GHz, then it will burn 80~130 watts just like any other multi-gigahertz CPU, and worse yet, it has to sustain that clock rate under heavy processing load for a prolonged period (GT6). Can the EE3 really take that much abuse? I am not sure. Eventually, SCEI might need to permanently cap the EE3 clock rate to limit power consumption to 50 watts or less just to prevent a meltdown.

Hold on a second -- you're not looking at the voltage increase. An increase in frequency has a linear effect on the wattage; voltage has a quadratic effect.
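The first-order model behind this is the standard CMOS dynamic-power relation P ≈ C·V²·f: frequency enters linearly and voltage quadratically, so a frequency bump that also needs a voltage bump grows faster than linearly. A quick sketch, where the 1.35x voltage step is purely illustrative (not a real XScale spec):

```python
# First-order CMOS dynamic power: P ~ C * V^2 * f.
# The voltage figure is illustrative only - not an actual XScale spec.

def dyn_power(c, v, f):
    return c * v * v * f

C = 1.0                                  # arbitrary switched-capacitance unit
p_base = dyn_power(C, 1.0, 600e6)        # 600 MHz at nominal voltage
p_freq = dyn_power(C, 1.0, 1000e6)       # raise frequency alone: 1.67x power
p_both = dyn_power(C, 1.35, 1000e6)      # raise voltage too: ~3x power

print(f"frequency alone: {p_freq / p_base:.2f}x")       # 1.67x
print(f"frequency + voltage: {p_both / p_base:.2f}x")   # 3.04x
```

With an assumed ~1.35x voltage increase, the model reproduces the roughly 3x power jump the XScale numbers show for a 66% clock increase.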

Also, who says that leakage with STI's 45-65 nm processes will be as bad as it was for several 90 nm processes? They are developing the new manufacturing processes with CELL in mind, after all, and they are investing quite a lot of money in this area as well.

Eh? There is only so much they can do. And thus far I've only heard of Intel doing well -- creating new transistor structures -- to deal with leakage.

Deadmeat,

Serial Instruction = 1 FPU OR 1 FXU used out of 4 FPUs and 4 FXUs... they can do tricks like that and more as the Suzuoki patent also described.

I'm not sure about this. A CPU's temperature varies significantly, since tasks will spike in their computational demands and then suddenly calm down, and this rapid change in temperature isn't good. To combat this, some CPUs are fed "busy work" to keep variations at a minimum. Not all CPUs do this, but I believe the POWER4 does.
 
For everybody's sake, it would be nice if DMGA were required to put "IMO" at the end of every one of his sentences. That's just a wish, though.

As for what the PS3 CPU will sustain, I'm sure it will be a higher percentage of peak than what you would normally expect from an x86 architecture. The configuration of local cache, memory bus, and main memory is a radical departure from the typical PC architecture, so that opens the door to radically different performance characteristics for the PS3 CPU. I don't know too many x86 Intel CPUs that host 64 MB of eDRAM on-chip, nor software applications designed with that configuration in mind from the start. That, IMO, will make a big difference in peak-to-average GFLOP performance compared to what we have observed in existing computer products.
 
Thank you for taking some time and doing what other ECE guys should have done too ( me = lazy :( ).

Well, remember that I only do this to procrastinate, I'm pretty sure I'm the lazy one, not you ;)

Capacitance does increase a bit when shrinking to a smaller manufacturing process, as you are not only shortening the gate but also the wires that connect the transistors.

You tend to create more capacitance because transistors are closer together, so the coupling capacitance between them is higher.
Are you talking about the global capacitance of the IC or the individual capacitance of a transistor? I'm only envisioning that if you shrink the size of a transistor, the capacitance of that transistor will decrease in kind (as described by nondescript), not increase. What's the real story?

I think I can answer this one. I believe Panajev is talking about the increase in capacitance between transistors, the signal lines, that kind of thing. Capacitance does increase when the capacitor elements are closer together. However, capacitance in the transistors themselves is lower with smaller device size (like I said earlier). But I would like to add that it is precisely the capacitance between devices that causes crosstalk (well it is not really that simple, but again, in the first approximation). Since chip designers specifically try to minimize crosstalk, this capacitance increase Panajev is talking about is also minimal.

Reducing resistance, by using low-k dielectrics, can allow the operating voltage to be reduced without reducing transistor switching speed. Lowering the operating voltage proportionally lowers power consumption.

After sleeping on it, I realized what I wrote earlier isn't entirely correct -- I must have been too tired or something. The effect is still correct (low-k means lower power consumption), but the physical mechanism isn't right. The "k" is the electric permittivity of a material. As the name suggests, a low-k material allows an electric field to pass through it with less loss. This means a weaker signal can survive longer, so the operating voltage can be lower.
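To first order, an interconnect behaves like a parallel-plate capacitor, C = k·ε0·A/d, so lowering k lowers the wire capacitance, and with it the C·V² switching energy, in direct proportion. A sketch with textbook k values (SiO2 ≈ 3.9, a typical low-k film ≈ 2.7) and purely illustrative geometry:

```python
# First-order parallel-plate model of interconnect capacitance:
# C = k * eps0 * A / d. Lower k -> lower C -> lower C*V^2 switching energy.
EPS0 = 8.854e-12   # F/m, vacuum permittivity

def wire_cap(k, area, gap):
    return k * EPS0 * area / gap

area, gap = 1e-12, 100e-9          # toy geometry: 1 um^2 plate, 100 nm gap
c_sio2 = wire_cap(3.9, area, gap)  # conventional SiO2 dielectric
c_lowk = wire_cap(2.7, area, gap)  # typical low-k film

# At fixed voltage, switching energy (~ C * V^2) falls by the same
# ratio as the capacitance.
print(f"capacitance (and energy) ratio: {c_lowk / c_sio2:.2f}")  # ~0.69
```

So swapping SiO2 for a k ≈ 2.7 film cuts interconnect capacitance (and the associated switching energy) by roughly 30%, before any voltage reduction is even considered.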
 
nondescript said:
Capacitance does increase when the capacitor elements are closer together. However, capacitance in the transistors themselves is lower with smaller device size (like I said earlier). But I would like to add that it is precisely the capacitance between devices that causes crosstalk (well it is not really that simple, but again, in the first approximation). Since chip designers specifically try to minimize crosstalk, this capacitance increase Panajev is talking about is also minimal.

I take it crosstalk is reduced as the transistor-native capacitances decrease, but if you go to higher clock rates (since the transistor capacitances are now lower), does the crosstalk (being frequency-based) creep back in on those smaller capacitances (barring other strategies that may be used to address/reduce crosstalk directly)?
 
....
Gelsinger was heavily involved in the company's 486 chip at the end of the eighties. It began life running at 25MHz but it took three years to get to 50MHz.
"I was proud of that 25MHz," said Gelsinger. "But now, we are adding 25MHz a week. One day, we'll add 25MHz a day."
...

Gelsinger predicted a 30GHz processor will be available...

...
20nm transistor...
...
 
I take it crosstalk is reduced as the transistor-native capacitances decrease, but if you go to higher clock rates (since the transistor capacitances are now lower), does the crosstalk (being frequency-based) creep back in on those smaller capacitances (barring other strategies that may be used to address/reduce crosstalk directly)?

I think crosstalk is independent of clock speed. However, the effects of crosstalk are harder to ignore at higher clock speeds, since timing tolerances are lower.

Remember what crosstalk is: an unwanted signal in a device coming from capacitive coupling (or inductive coupling) with another device.
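That capacitive-coupling picture can be put in first-order numbers: the victim wire sees roughly Cc/(Cc+Cg) of the aggressor's voltage swing, a ratio set by geometry, not by clock speed. The clock only determines how much timing margin is left to absorb the noise. A toy sketch with made-up capacitance values:

```python
# First-order crosstalk estimate: a victim wire sees a fraction of the
# aggressor's voltage swing set by the capacitive divider Cc / (Cc + Cg).
# Toy numbers, purely illustrative.

def coupled_noise(v_swing, c_couple, c_ground):
    """Voltage induced on the victim by capacitive coupling."""
    return v_swing * c_couple / (c_couple + c_ground)

v_noise = coupled_noise(v_swing=1.2, c_couple=20e-15, c_ground=80e-15)
print(f"coupled noise: {v_noise:.2f} V")  # 0.24 V, regardless of clock rate
```

Note that clock frequency appears nowhere in the formula, which is the point: the same 0.24 V of noise simply eats a larger share of the shrinking timing and voltage margins as clocks rise.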

After a little googling, here's an EETimes article with more detail about what I'm talking about.

http://www.eetimes.com/in_focus/mixed_signals/OEG20020322S0062
 
Oh well -- I was figuring that a higher clock rate implied higher-frequency radiation, which would electrostatically couple better to neighboring smaller capacitances. Alternately, a capacitance of a certain size will be more vulnerable to a certain frequency range of electromagnetic radiation/conduction.
 
[Capacitance] Not a problem, though -- combine this with their SOI process and Toshiba is now able to omit the capacitor in their 45 nm e-DRAM cell designs.

Yeah! I know. The 1-T DRAM designs (e-DRAM, 1-T -- the same basic thing as far as I can tell) are very clever. SOI DRAM has always been a problem, and it's finally been solved. Non-destructive reads too, so power consumption is even lower -- no need to refresh as much. The write times are a little slow, I think, but that will improve quickly.

I've been reading some papers on this, I was thinking of putting together a little mini-series for B3D on this stuff, when I have the time...if there's enough interest, I'll find the time to write it up.
 