DeadmeatGA
Banned
Kutaragi's presentation PDF has given me a lot to think about regarding the true nature of the CELL architecture, and has made me reconsider some of my previous estimates based on the Suzuoki patent application.
All my estimates were based on the Suzuoki patent application, which specified that APUs be "preferably rated at 32 GFLOPS", which made everyone jump and scream "4 GHz!!!". I built my estimate around this clock speed, expecting the pipeline to be stretched beyond 20 stages and the chip to consume something like 200 watts to sustain such a high clock. That in turn led me to conclude that SCEI could fit no more than 2 PE cores even at a massive die size of 250 mm², and that it still would not reach 4 GHz in a consumer-level application; 2 GHz seemed more realistic.
What changed my mind was Kutaragi's presentation of a 2 teraflop rack. Kutaragi intended to reach 2 teraflops by putting 64 chips on a rack, and you CANNOT POSSIBLY PUT 64 CHIPS EACH BURNING 200 watts on a rack!!! (That would be about 13 kW per rack, my friend.)
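To spell out the rack arithmetic, here is a quick sketch in Python. The 64 chips and 200 watts per chip are the figures from above; the function name is just something I made up for illustration, and it counts chip power only (no memory, cooling, or power supply losses).

```python
def rack_power_kw(chips: int, watts_per_chip: float) -> float:
    """Total chip power for one rack, in kilowatts (chips only)."""
    return chips * watts_per_chip / 1000


# 64 chips per rack, each burning my old 200 W estimate
hot_rack_kw = rack_power_kw(64, 200)
print(f"64 chips x 200 W = {hot_rack_kw} kW per rack")
```

That works out to 12.8 kW, which is where the "about 13 kW" comes from.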
Let us go back and rethink the original motivation behind IBM's cellular architecture: it was an attempt to extract big compute power by summing up lots of simple, inexpensive processors running at a relatively slow clock speed and burning little power. Blue Gene/L, on which CELL is based, in fact uses a couple of PPC440s packed into one die. Now, why would Kutaragi suddenly decide to go against the very philosophy behind cellular computing and build his dream chip around a massive hyperpipelined processor burning 200 watts to clock at 4 GHz?? It didn't make sense.
Now, if we understand that Suzuoki's APU GFLOPS rating was a "preference", in other words a long-term goal and not the rating of what's coming next year, then everything starts falling into place. CELL, like Blue Gene/L, would be built around something simple like a PPC440, to which 8 recycled EE2-style VUs are attached. Such a PE won't take hundreds of millions of transistors to build because it is very simple; I expect such a PE to be no larger than PSX2OAC. You can in fact cram 4 of these on a die @ 65 nm. Such a device should be able to peak around 1 GHz at a reasonable power consumption of, say, 10~15 watts. The peak flops rating will be around 250 GFLOPS per chip. A very respectable number indeed.
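Here is roughly how the 250 GFLOPS figure falls out, as a sketch in Python. The big assumption is mine: I take the patent's 32 GFLOPS at the rumored 4 GHz to mean 8 flops per cycle per APU, and I assume each recycled VU keeps that same per-cycle throughput at 1 GHz. The function names are made up for illustration.

```python
def peak_gflops(pes: int, vus_per_pe: int,
                flops_per_cycle: float, clock_ghz: float) -> float:
    """Peak chip throughput in GFLOPS: units x flops per cycle x clock."""
    return pes * vus_per_pe * flops_per_cycle * clock_ghz


def rack_power_kw(chips: int, watts_per_chip: float) -> float:
    """Total chip power for one rack, in kilowatts (chips only)."""
    return chips * watts_per_chip / 1000


# ASSUMPTION: 32 GFLOPS at 4 GHz implies 8 flops per cycle per APU/VU
flops_per_cycle = 32 / 4

# 4 PEs per die, 8 VUs per PE, 1 GHz clock
chip_gflops = peak_gflops(4, 8, flops_per_cycle, 1.0)
print(f"peak per chip: {chip_gflops} GFLOPS")

# 64 chips per rack at 10 to 15 W each, instead of 200 W
low_kw = rack_power_kw(64, 10)
high_kw = rack_power_kw(64, 15)
print(f"rack power: {low_kw} to {high_kw} kW")
```

Under those assumptions the chip peaks at 256 GFLOPS, and a 64-chip rack stays under 1 kW of chip power.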
So which sounds more realistic to you: a 20+ stage hyperpipelined processor chip burning 200 watts at 4 GHz, or a 7-stage pipelined processor chip burning 10~15 watts @ 1 GHz???