IBM Power7 Derivative: A Viable Console CPU?

liolio · Apr 15, 2012

Acert93 said:
Xenon v.2 may be a good excuse for IBM to make a 256-bit SIMD unit. And they could later make it part of the PPC spec, just like AltiVec.

It could, though especially after TUnafish's answer I wonder about the odds of a "reasonably" wide OoO CPU that would include two 4-wide units. That may be easier with regard to data paths, etc.

I won't ask for Carmack's POV, but maybe members like Sebbbi, who also deals with CPUs, could give us their take on such a design

(tough without details of the implementation, but rough ideas about it would be welcome).

fehu · Apr 15, 2012

Is it the right time for a big.LITTLE-like design?
Maybe 2 fat Power7-class cores with a lot of A2-class ones, working as a traditional multicore rather than like Cell.

3dilettante · Apr 16, 2012

big.LITTLE starts to matter when you have a TDP measured in the single-digit watt range and will be running on battery power for days.

The idea is more restrictive since it is meant for significant power savings on low-performance and near-idle loads.
With console TDPs still in the hundreds of watts, the general inefficiency of a power supply and voltage regulators specced for that order of magnitude would probably waste more power than a big.LITTLE chip would save.

fehu · Apr 16, 2012

I was thinking of a more traditional implementation in which you can have all the cores at max frequency at the same time.
You can't have a 16-core Power7, and a 16-core A2 can be good for parallel tasks but bad for single-threaded work, so I was thinking about a heterogeneous design in which you have some powerful cores to run demanding tasks and a lot of decent cores to run the rest.
Considering that all IBM designs at the moment share the same instruction set, it's not unrealistic, but would something like this really be useful for a developer?

3dilettante · Apr 16, 2012

big.LITTLE is concerned with migrating all work off of the big cores to power-optimized low-power cores when performance is not needed.

The more general case with different cores that run different loads based on their strengths is some form of asymmetric or heterogeneous multiprocessing.
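
As a rough illustration of the asymmetric case described above, here is a minimal sketch of a dispatcher that sends tasks dominated by single-thread speed to a few big cores and throughput work to many small ones. The core counts, the `Task` type, the `serial` label, and the `assign` function are all hypothetical names invented for this example; nothing here reflects an actual IBM design or scheduler.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Task:
    name: str
    serial: bool  # assumed label: True if single-thread speed dominates


def assign(tasks, big_cores=2, small_cores=8):
    """Round-robin serial tasks over big cores and the rest over small cores.

    The core counts are illustrative defaults, not a real configuration.
    """
    big = cycle([f"big{i}" for i in range(big_cores)])
    small = cycle([f"small{i}" for i in range(small_cores)])
    return {t.name: next(big if t.serial else small) for t in tasks}


tasks = [
    Task("game_logic", True),
    Task("particles", False),
    Task("audio_mix", False),
    Task("render_submit", True),
]
placement = assign(tasks)
```

Unlike big.LITTLE migration, both pools stay busy at the same time here; the placement decision is about which kind of core suits each task, not about powering the big cores down.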

darkblu · Apr 16, 2012

I'm entirely with Gubbi here. While 47x might seem like a very nice embedded choice, I'm not convinced it would scale up that well, not if the 'should sorta meet the former gen's throughput' part of the order was weighted heavily (apparently speaking primarily of Xenon here rather than any geometry-culling-busy SPU flocks). Basically, the way I see IBM's options this gen is:
(1) Carry on with some Xenon/PPE iteration, trying to strike some sort of a balanced design, or..
(2) Go P7-based. Viewed from my armchair here, P7 seems to offer a lot for trimming.

fehu said:
just a dumb question
Is Freescale a completely different company, or can IBM sell its Power-based designs?

A completely different company. Freescale is the former Motorola semiconductor division (of AIM alliance fame).

3dilettante · Apr 16, 2012

We will have to wait and see.
Taking apart a multibillion transistor chip to make a derivative isn't free either.

The floating point resources would likely need to be redesigned, as well as the memory hierarchy and cache coherence protocol. The front end and scheduling logic would likely face cuts, and this is all assuming IBM even wants its best tech licensed to Microsoft to produce on its own.

It's not a simple task to cut chunks out of a complex core whose components are balanced and verified for that design point. I wouldn't be certain it would be any easier to cut a POWER7 down than it would be to speed an A2-type design up.

I'm not saying it's impossible, but it might not be desirable.

anexanhume · Apr 17, 2012

I guess it all comes down to perceived return on investment. I'm going to guess that a highly customized CPU with high floating-point throughput that is easy to program for isn't necessarily going to translate to more market share in any scenario. No matter how hard the PS4 is to program for, it's still going to get multi-platform games, and those games aren't going to look much more fantastic on the Xbox just because the hardware is vastly superior, because optimization takes time. It will also depend on whether Microsoft gets to own the design (360 GPU) or has to buy it (original Xbox GPU).

I would expect a lightly modified Power7 core as a trade-off between optimization and development cost.

Gubbi · Apr 17, 2012

anexanhume said:
No matter how hard the PS4 is to program for, it's still going to get multi-platform games, and those games aren't going to look a lot more fantastic on the xbox just because the hardware is vastly superior because optimization takes time

It took multi platform games on the PS3 three years to reach parity with the 360, largely because of added complexities in architecture (split memory, CPU).

Developers do the bulk of their development on workstations: bog-standard PC hardware. The fewer pathological cases and gotchas developers experience when moving to production hardware, the better the end result will be.

Cheers

hoho · Apr 17, 2012

Gubbi said:
It took multi platform games on the PS3 three years to reach parity with the 360, largely because of added complexities in architecture (split memory, CPU).

Wasn't the huge difference in CPU and GPU power between the consoles a much bigger contributor than the memory model or the complexity of Cell? I would imagine it's far harder to offload some GPU work to Cell than to balance memory usage between pools.

MrFox · Apr 18, 2012

(I hope this is not completely off-topic, I didn't know where to post this question)

Comparing the A2 (Blue Gene/Q) and Power7, I noticed that the path for passing data between cores looks very different.

The A2 has a massive crossbar switch, and any core can access any of the L2 banks. I remember IBM wanted a crossbar on Cell but there wasn't enough silicon area available. On the A2 die it takes up a very large area, so I would guess IBM saw a great need for it this time. On the Power7, however, the data has to get to the global L3 cache before it can be accessed by another core, and there's a truckload of L3. On the surface there seems to be a major advantage in total bandwidth and latency with the A2, so what's the catch? If IBM developed both of these architectures in parallel, why is the approach so different between the two?

Would one approach benefit a game console more than the other?

EDIT: never mind, I missed the elephant in the room: the L3 is also in banks, and is even more flexible.
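
To make the point about banks concrete, here is a minimal sketch of simple address interleaving, the general idea behind spreading accesses over a banked cache. The line size, the bank count, and the `bank_of` function are assumptions chosen for illustration; the thread does not give the actual mapping used by either the A2's L2 or the Power7's L3, and real designs may hash the address bits instead.

```python
LINE_BYTES = 128  # assumed cache line size, for illustration only
NUM_BANKS = 16    # assumed number of banks, for illustration only


def bank_of(address: int) -> int:
    """Pick a bank from the line index (the bits just above the line offset)."""
    return (address // LINE_BYTES) % NUM_BANKS


# Consecutive cache lines map to consecutive banks, so a streaming access
# pattern from one core is spread over every bank rather than hitting one.
banks = [bank_of(line * LINE_BYTES) for line in range(32)]
```

With a mapping like this, any core can reach any bank, and how much bandwidth the cores actually see depends on how often their requests collide on the same bank.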

Mobius1aic · May 11, 2012

Does anyone have a link to an A2 spec sheet, or could someone produce everything we know about its basic specs? It would be appreciated very much.

IIRC it has 16 in-order dual-issue cores, with I think 4-way SMT per core? Any info on the Altivec/VMX/VSX unit?

Ninjaprime · May 12, 2012

Mobius1aic said:
Does anyone have a link to an A2 spec sheet, or could someone produce everything we know about its basic specs? It would be appreciated very much.

IIRC it has 16 in-order dual-issue cores, with I think 4-way SMT per core? Any info on the Altivec/VMX/VSX unit?

Here's a link to the original IBM presentation: https://www.power.org/events/2010_ISSCC/Wire_Speed_Presentation_5.5_-_Final4.pdf
