Discuss: Is a CPU derived from IBM's POWER7 architecture viable for consoles?
Sources:
Wikipedia
Ars
Anand
Information Week
Some quick facts:
- Launched in 2010 on 45nm SOI
- 567mm^2, 1.2B transistors
- 3.0GHz to 4.14GHz
- 33 GFLOPs (peak) per core at 4.14GHz
- 100 to 170W TDP (EDIT: Possible correction, almost 250W for 8 cores at 4.14GHz); IBM has fit both 4 core and 8 core Power7 variants operating at 3.0GHz, the BladeCenter PS700 and PS701 respectively, into a single Blade Slot.
- 4, 6, and 8 Core Variants
- 4 SMT Threads per Core (Power6 was 2 way SMT per core)
- 32+32 KB L1 Cache per Core (Power7: 2 cycles latency; Power6: 4 cycles)
- 256 KB L2 Cache per Core (Power7: 8 cycles latency; Power6: 26 cycles)
- 4MB L3 Cache (eDRAM) per Core; up to 32MB per Chip
- 12 execution units per core (2 fixed-point units; 2 load/store units; 4 double-precision floating-point units; 1 vector unit supporting VSX (AltiVec); 1 decimal floating-point unit; 1 branch unit; 1 condition register unit)
- Aggressive OoOE. Per IBM via Wiki: "Each POWER7 processor core implements aggressive out-of-order (OoO) instruction execution to drive high efficiency in the use of available execution paths. The POWER7 processor has an Instruction Sequence Unit that is capable of dispatching up to six instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to the Instruction Execution units. The POWER7 processor has a set of twelve execution units as described above."
- TurboCore: Half of the cores can be disabled so frequency is ramped up for remaining cores; remaining cores have full access to all of the chip cache and full memory controller
- Although the Power7 architecture operates at lower frequencies than Power6, a Power7 core delivers "up to twice" the performance of a Power6 core
- POWER7 features two DDR3 memory controllers that can do up to 100GB/s
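The 33 GFLOPs per core figure above can be sanity-checked with a quick sketch. The 8 FLOPs per cycle value is my assumption (4 double-precision units, each counted as one fused multiply-add, i.e. 2 FLOPs); it is not stated in the sources, but it lines up with the quoted peak.

```python
# Rough peak-throughput check. FLOPS_PER_CYCLE is an assumption:
# 4 double-precision FP units x 2 FLOPs (one fused multiply-add) each.
FLOPS_PER_CYCLE = 8


def peak_gflops_per_core(freq_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak GFLOPs for a single core at the given clock (GHz)."""
    return freq_ghz * flops_per_cycle


if __name__ == "__main__":
    for freq in (3.0, 4.14):
        print(f"{freq} GHz -> {peak_gflops_per_core(freq):.1f} GFLOPs peak per core")
```

This is peak only; real game code would land well below it.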
The Rumor: A recent unsubstantiated rumor suggested Microsoft's third Xbox edition (code name "Durango") will use an IBM processor with 16 "cores."
The size and power requirements for a Power7 chip, as they currently stand, are far outside the design limitations of a console. Considering the Xbox 360 and PS3 each had a total system power draw in the low 200W range, a Power7 chip alone far exceeds the budget for a console CPU. Furthermore, the silicon budget of an 8 core (32 thread) Power7 chip is equal to, or greater than, the total silicon budget of either past generation console.
Making POWER7 work for Consoles?: If Microsoft has decided on a POWER7 derivative, what would they need to do to fit it into a console in 2013? Some thoughts...
First, with regard to getting the die size within console budgets:
- 4 cores (16 threads). This should cut the die size nearly in half, from 567mm^2 to just under 300mm^2 on 45nm. Still too large for a console.
- Migration to 32nm (or 28nm). iirc IBM has been working with GlobalFoundries on 32nm. This could reduce the total die size by 30-50%, depending on the memory controller and how dense the logic can go. As caches scale better than logic, a cache-heavy design like this one could end up closer to the 50% end of that range.
- Elimination of some under-utilized (for game code) execution units.
- Memory controller re-design. POWER7 has a (max) 100GB/s on two DDR3 memory controllers. On the Xbox 360 the CPU shared a memory controller with the GPU; at a minimum, it would seem a 4 core variant would only need 1 memory controller.
- Reduction in L3 cache size; e.g. a move from 4MB per core to 2MB per core (16MB down to 8MB). This will impact performance, and eDRAM is already both fairly small and fairly power efficient, but it may be judged a fair sacrifice if it trims the area budget without hurting console game code much.
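As a back-of-the-envelope sketch of the first two steps above (halving the core count, then a 30-50% node shrink), the arithmetic looks like this. It assumes area scales linearly with core count and that the shrink applies uniformly to the whole die, which is a simplification; the function name and parameters are just for illustration.

```python
BASE_AREA_MM2 = 567.0  # 8 core POWER7 on 45nm, from the quick facts
BASE_CORES = 8


def estimate_area_mm2(cores, shrink, base_area=BASE_AREA_MM2, base_cores=BASE_CORES):
    """Naive die-area estimate.

    Assumes area is proportional to core count (ignores uncore that does
    not shrink with fewer cores) and that `shrink` (0.0-1.0) is a flat
    fractional reduction from the process move.
    """
    scaled_by_cores = base_area * cores / base_cores
    return scaled_by_cores * (1.0 - shrink)


if __name__ == "__main__":
    best = estimate_area_mm2(cores=4, shrink=0.50)
    worst = estimate_area_mm2(cores=4, shrink=0.30)
    print(f"4 cores on 32nm: roughly {best:.0f}-{worst:.0f} mm^2")
```

Under those assumptions the shrink alone lands somewhere around 140-200mm^2, so the other cuts (execution units, memory controller, L3) matter if the shrink comes in at the low end.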
Moving on to power:
- With (a) the reduction in cores from 8 to 4 and (b) the migration to the 32nm node there should be a significant drop in power usage. A move to 32nm should provide a 30-40% improvement in power efficiency per transistor. Assuming a 3.0GHz, 4 core POWER7 chip is 100W on the 45nm process (unconfirmed, but it is the low end of the range), a 32nm variant could come in as low as 60-70W.
- Reduction in frequency. POWER7 is much, much faster than POWER6 per clock. Further sacrificing frequency for a lower voltage design may be possible while keeping performance in an acceptable range.
- Reduction in Execution Units, Features. As a server oriented chip the POWER7 has a number of features that may be expendable in a console environment.
- Reduction in eDRAM. While eDRAM requires much less power than SRAM, cutting the eDRAM in half (from 16MB to 8MB for a 4 core design) could yield some additional power savings.
- Interposer. I know it is all the rage, but the power required to drive the traces from a CPU to memory is significant. IBM has been developing 4 chip POWER7 interposer designs (up to 32 cores, 128 threads). Still, the chances of a console using an interposer for the CPU/Memory are slim to none.
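The 60-70W figure in the first bullet is just the assumed 100W baseline scaled by the 30-40% efficiency gain. A minimal sketch, treating the gain as a flat reduction in power at the same clock (an assumption; it ignores leakage and voltage changes):

```python
def scaled_power_w(base_watts, efficiency_gain):
    """Power after a process shrink, modeled as a flat fractional saving.

    `base_watts` is the unconfirmed 100W guess for a 4 core, 3.0GHz part
    on 45nm; `efficiency_gain` is the 0.30-0.40 range assumed for 32nm.
    """
    return base_watts * (1.0 - efficiency_gain)


if __name__ == "__main__":
    for gain in (0.30, 0.40):
        watts = scaled_power_w(100.0, gain)
        print(f"{gain:.0%} gain -> about {watts:.0f} W")
```

The frequency, execution unit, and eDRAM reductions would come off on top of that, but I have no numbers to plug in for them.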
Let's say you work for IBM and are trying to sell Microsoft on Power7 for a 2013 console. Your spec sheet for a chip you could produce in 2013 looks something like this:
- POWER7 derivative
- About 150mm^2 on 32nm
- 2.8-3.2GHz, 60W TDP
- 4 Cores, 16 SMT threads
- 32+32 KB L1, 256KB L2, 8MB L3 eDRAM (2MB per core)
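To put rough numbers on that spec sheet, here is a small sketch that derives power density and aggregate peak throughput from it. The 8 FLOPs per cycle per core value is my assumption (4 double-precision units with fused multiply-add), and it presumes the FP units survive the cuts untouched.

```python
# Hypothetical spec sheet from the list above.
SPEC = {
    "cores": 4,
    "area_mm2": 150.0,
    "tdp_w": 60.0,
    "freq_ghz_range": (2.8, 3.2),
}

# Assumption, not from the sources: 8 double-precision FLOPs/cycle/core.
FLOPS_PER_CYCLE = 8


def power_density_w_per_mm2(spec):
    """TDP divided by die area."""
    return spec["tdp_w"] / spec["area_mm2"]


def peak_gflops(spec, freq_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Aggregate peak GFLOPs across all cores at the given clock."""
    return spec["cores"] * freq_ghz * flops_per_cycle


if __name__ == "__main__":
    print(f"Power density: {power_density_w_per_mm2(SPEC):.2f} W/mm^2")
    for freq in SPEC["freq_ghz_range"]:
        print(f"{freq} GHz: {peak_gflops(SPEC, freq):.1f} GFLOPs peak")
```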
Question #1: Is this even remotely possible? Is this far too optimistic or a roughly accurate ball park for what IBM could fit within that silicon/power budget?
Question #2: Would this make a good console CPU?
Question #3: What would you reduce? Frequency, L3, memory controller, execution units, etc? What execution units and why?
Question #4: What would you add? VMX128 support? At what cost?
Question #5: To my knowledge IBM only sells Power7 chips in complete server packages for tens of thousands of dollars for the low end. Would IBM even be interested in creating a console variant of POWER7?
Question #6: How is the POWER7's real code performance compared to an AMD Bulldozer core? Per-mm^2? Per-Watt?
Question #7: Does IBM have a better CPU architecture/solution that can be used in the 150mm^2 / 60W range? (Preferably something that is already in that range or can be scaled DOWN... just scaling chips up, especially the idea of throwing 16 single cores on a chip as if that "just works", is a non-starter. If you don't know why "just" throwing 16 Xenon cores on a die and calling it a day is a non-starter, please skip this question. I want to know what other many-core architectures IBM has actively discussed that may work, not theoretical new designs connected with fanboy duct tape.)
Question #8: How does this theoretical POWER7 design compare against a 2 module / 4 int. core / 480 SP AMD APU at 3.0GHz?
Question #9: As a developer, thinking of the 5-7 year window of console development, would you prefer 4 cores/16 threads in a robust CPU (IBM design) or the shift of budgets to a 2m/4c AMD design but with on-die Shader Array? Why?
Question #10: Would this IBM design need a beefed up vector unit or is the real world performance/throughput on POWER7 chips more than sufficient?
Question #11: Thinking in console contexts, if you could change one thing about POWER7, what would it be?
Question #12: Does a POWER7 design indicate a split memory design?
Question #13: Would TurboCore be a feature valuable to consoles? e.g. For Arcade games that may be single threaded?