Name: David Wang 4/27/05
In the ISSCC 2005 article about the CELL processor, the die size of the processor was reported to be 221 mm². It was thus interesting to see that the Microprocessor Report article on the CELL processor states that IBM plans to ship the CELL processor with a die size of 235 mm² [1]. The MPR article did not explain the die size difference, but it has since emerged that IBM went back and re-engineered the PPE, and the PPE (and the CELL processor as a whole) grew bigger.
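To put the two reported die sizes side by side, here is a quick back-of-the-envelope calculation. It uses only the 221 and 235 figures quoted above; the function name is just an illustrative helper.

```python
def area_growth(old_mm2: float, new_mm2: float) -> tuple[float, float]:
    """Return (absolute growth in mm^2, relative growth in percent)."""
    delta = new_mm2 - old_mm2
    return delta, 100.0 * delta / old_mm2


# Figures as reported at ISSCC 2005 and in the MPR article.
delta, pct = area_growth(221.0, 235.0)
print(f"{delta:.0f} mm^2 larger, about {pct:.1f}% growth")
```

So the shipping die is 14 mm² larger than the ISSCC prototype, a bit over 6%.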
In the die photo of the CELL processor released at ISSCC, the PPE and the "512K L2" cache span the same width, but the PPE shares that width with the self test unit [2].
In the latest die photos, the PPE and the "512K L2" still share the same width, but the test unit has been moved so that it shares that width with the L2 cache block rather than with the PPE. Basically, between the two photos the PPE grew by 2X the width of the test block.
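The 2X figure follows from simple arithmetic, sketched below. This assumes that the L2 block and the test unit themselves kept the same widths between the two revisions, which the photos suggest but which I have not measured precisely; the numbers are arbitrary placeholder units, not measurements.

```python
def ppe_growth(l2_width: float, test_width: float) -> float:
    """Return how much wider the PPE is in the newer die photo.

    Assumption: the L2 block and the test unit have unchanged widths.
    """
    # ISSCC photo: PPE and test unit together span the L2 width.
    ppe_old = l2_width - test_width
    # Newer photo: the PPE alone spans the L2 plus the test unit.
    ppe_new = l2_width + test_width
    return ppe_new - ppe_old


# Placeholder units, not taken from the die photos.
print(ppe_growth(l2_width=10.0, test_width=1.0))  # 2.0
```

Whatever the actual widths are, the difference works out to twice the test unit width under that assumption.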
I noticed this interesting issue about a week ago, since I have high resolution photos of not only the new die but also of the prototype discussed at ISSCC 2005. I've been working on it off and on, but it seems that I've been "scooped". A separate discussion has been going on at Beyond3D, looking at the growth of the PPE and speculating as to why IBM went back and re-engineered the PPE, and what IBM changed [3].
A separate issue is that the "512K L2" in the CELL processor is significantly larger than the 512K L2 in the PPC970FX processor, roughly 2X the size. It looks like the data and tag arrays are about the same size, but the block labelled as "512K L2" in the CELL processor has a lot of other structures in it, and I was working to figure out what they are. Collectively, the die photo analysis would have been a third article in the series here, but since the cat's out of the bag and I don't have that much time, that third article looks less likely by the minute. Regardless, the fact that IBM appears to have significantly re-engineered the PPE (which sort of explains the reluctance to discuss the PPE at ISSCC) is an interesting tidbit deserving of some discussion.
[1] IBM PDF
[2] IBM Research
[3] Beyond3D discussion
psurge said: See pages 3-4 of this PDF for some PPE details.
The multithreading design supports fine-grained multithreading with round-robin thread scheduling. If both threads are active, the processor will fetch an instruction from each thread in turn. When one thread cannot issue a new instruction or is not active, the other active thread will be allowed to issue an instruction every cycle.
The wording is confusing (fetch != issue) - at first glance this seems to imply the PPE is single issue.
Anyway, realworldtech has a thread on this very same topic here. Hopefully David Wang will write up his thoughts on the die photos...
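For what it's worth, the round-robin behaviour in the quoted PDF passage can be sketched like this. It is a toy model, not IBM's implementation: it assumes one instruction slot per cycle and treats fetch and issue as a single step, which is exactly the ambiguity noted above. The function name and the per-cycle "can issue" flags are made up for illustration.

```python
def issue_order(ready_a, ready_b):
    """Pick which thread gets the slot in each cycle.

    ready_a / ready_b: per-cycle booleans, True if that thread is
    active and able to issue (hypothetical inputs for this sketch).
    """
    order = []
    last = "B"  # so that thread A goes first when both are ready
    for a_ok, b_ok in zip(ready_a, ready_b):
        if a_ok and b_ok:
            # Both active: alternate between the threads.
            pick = "A" if last == "B" else "B"
        elif a_ok:
            pick = "A"  # B stalled or inactive: A may go every cycle
        elif b_ok:
            pick = "B"  # A stalled or inactive: B may go every cycle
        else:
            pick = None  # neither thread can issue this cycle
        order.append(pick)
        if pick is not None:
            last = pick
    return order


print(issue_order([True] * 4, [True] * 4))
print(issue_order([True] * 4, [True, False, False, True]))
```

The first call alternates A, B, A, B; in the second, thread A takes the slot every cycle while B is stalled and the alternation resumes once B is ready again.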
aaaaa00 said: I built a DShow filtergraph outputting an SD MPEG2 stream to the Null renderer (to remove the video card from the equation), and on this old 1.4 GHz P4 with a 400 MHz FSB (it's an engineering sample I got from Intel many years ago), it consumed about 20% of the CPU. That's with all the overhead of DirectShow/Kernel Streaming/etc.
psurge said: Gubbi - the only reason I can think of for single fetch/dual issue is this:
there are enough pipeline bubbles from high instruction latency and issue restrictions to make IPC above 1 extremely rare (even with 100% cache-hits), making dual fetch pointless (in the average case).
Dual issue could still bring up IPC by enough to make sense:
IPC could exceed 1 for short bursts, and independent producers could issue in parallel, providing results to consumers in less time...
Tacitblue said: Seems David himself is puzzled by the changes between the 2 variants. Here is his question from an email he sent in correspondence.
David Wang quote:
"Let me know if you figure out why some of the
execution units look to have been "flipped", as mirror
images of each other between DD1 and DD2. It's
interesting and puzzling at the same time."
Architecturally there are differences, but the functionality of those differences is still open to debate. We might have to wait for the next chip conference, when IBM decides to let more leak out about it. There's not much to be gained by looking at G5s and the like, because of the differences between a fat-core Power implementation and this one. I'm not even sure these differences can be put down to just a single or dual issue situation all by themselves.
version said: amazing
Jaws said: version said: amazing
What?
version said: Jaws said: version said: amazing
What?
cell
Jaws said: version said: Jaws said: version said: amazing
What?
cell
yes