#1
Artist formerly known as Acert93
Join Date: Dec 2004
Location: Seattle
Posts: 7,704
Discuss: Is a CPU, derived from IBM's POWER7 architecture, viable for consoles?
Sources: Wikipedia, Ars, Anand, Information Week
Some quick facts:
The Rumor: A recent unsubstantiated rumor suggested that Microsoft's third Xbox (code name "Durango") will use an IBM processor with 16 "cores." The size and power requirements of a Power7 chip, as they currently stand, are far outside the design limits of a console. Considering that the Xbox 360 and PS3 each had a total power draw in the low 200W range, a Power7 chip far exceeds the budget for a console CPU. Furthermore, the silicon budget of an 8-core (32-thread) Power7 chip is equal to, or greater than, the total silicon budget of each of the last-generation consoles.

Making POWER7 work for consoles?: If Microsoft has decided on a POWER7 derivative, what would they need to do to fit it into a console in 2013? Some thoughts... First, with regard to getting the die size within console budgets:
Moving on to power:
Let's say you work for IBM and are trying to sell Microsoft on Power7 for a 2013 console. Your spec sheet for a chip produced in 2013 looks something like this:
Question #1: Is this even remotely possible? Is this far too optimistic, or a roughly accurate ballpark for what IBM could fit within that silicon/power budget?
Question #2: Would this make a good console CPU?
Question #3: What would you reduce? Frequency, L3, memory controller, execution units, etc.? Which execution units, and why?
Question #4: What would you add? VMX128 support? At what cost?
Question #5: To my knowledge, IBM only sells Power7 chips in complete server packages costing tens of thousands of dollars at the low end. Would IBM even be interested in creating a console variant of POWER7?
Question #6: How does POWER7's real-code performance compare to an AMD Bulldozer core? Per mm^2? Per watt?
Question #7: Does IBM have a better CPU architecture/solution that can be used in the 1500mm^2 / 60W range? (Preferably something that is already in that range or can be scaled DOWN. Just scaling chips up, especially the idea of throwing 16 single cores on a chip as if that "just works," is a non-starter. If you don't know why "just" throwing 16 Xenon cores on a die and calling it a day is a non-starter, please skip this question. I want to know what other many-core architectures IBM has actively discussed that may work, not theoretical new designs connected with fanboy duct tape.)
Question #8: How does this theoretical POWER7 design compare against a 2-module / 4-integer-core / 480 SP AMD APU at 3.0GHz?
Question #9: As a developer, thinking of the 5-7 year window of console development, would you prefer 4 cores/16 threads in a robust CPU (IBM design), or shifting the budget to a 2m/4c AMD design with an on-die shader array? Why?
Question #10: Would this IBM design need a beefed-up vector unit, or is the real-world performance/throughput of POWER7 chips more than sufficient?
Question #11: Thinking in console contexts, if you could change one thing about POWER7, what would it be?
Question #12: Does a POWER7 design indicate a split memory design?
Question #13: Would TurboCore be a feature valuable to consoles, e.g. for Arcade games that may be single-threaded?
__________________
"In games I don't like, there is no such thing as "tradeoffs," only "downgrades" or "lazy devs" or "bugs" or "design failures." Neither do tradeoffs exist in games I'm a rabid fan of, and just shut up if you're going to point them out." -- fearsomepirate

#2
Member
Join Date: Sep 2011
Posts: 132
Quote:
Although I don't think it would be a good choice for the PS4/next Xbox, for the reasons outlined here: http://semiaccurate.com/forums/showp...&postcount=171
The 476FP was launched in 2009, though, so Microsoft/Sony would probably be considering its successor and whatever improvements it brings to the design. IBM is an APU and SoC innovator as well; there's no reason you couldn't mix Power7 CPU cores and GCN(+) shader arrays on a future console APU.

#3
Artist formerly known as Acert93
Join Date: Dec 2004
Location: Seattle
Posts: 7,704
This post is appropriate for over here. I added Ninja's link to RealWorldTech about the TDP of an 8-core / 4.14GHz POWER7 being over 240W.
Quote:

#4
Invisible Member
Join Date: Apr 2002
Location: La-la land
Posts: 5,030
It would of course need a much-beefed-up FPU, since even the 8-core, 4.4GHz version doesn't approach the now more-than-half-decade-old Cell processor. Cut this CPU in half for a 4-core version and downclock it to save power, and it's no faster at floating-point calculations than the even older (from an on-the-market point of view), IBM-developed Xenon CPU from the 360.
That would be rather anticlimactic, I think, and wouldn't please developers very much. They have a reasonable expectation of a power increase, not the opposite (even if that power ought to be considerably easier to tap than current console hardware's).
__________________
"If I were a science teacher and a student said the Universe is 6000 years old, I would mark that answer as wrong (why? Because it is)." -Phil Plait

#5
Artist formerly known as Acert93
Join Date: Dec 2004
Location: Seattle
Posts: 7,704
Quote:
A lot of architectural issues impact utilization. As for POWER7, just look at something like Intel's i5 (4 cores, 8 threads), which trashes Cell in almost any application. Sure, there are specific segments of code that run better on Cell than on said i5, but I don't think even the most ardent Cell supporters, given the choice of "what is faster for game code?", would pick the PS3's Cell (higher peak FLOPs and all) over said i5. Take a peek at the NV/AMD architectures prior to GCN, where a GPU's FLOPs didn't dictate which one performed better on real code. We saw the same situation with Cell versus Xenon: not every problem played to Cell's strengths (more cores, fast local memory, SIMD). It wasn't simply an issue of lazy developers or not enough time to extract performance; not every solution maps well to every architecture. This is why, after all, we have a discrete chip for graphics (GPU) and another discrete chip for general-purpose code (CPU).

All that to say that Power7 is a LOT faster per core than Xenon (Waternoose). Having a ton of eDRAM is going to avoid a lot of 600+ cycle cache-miss penalties, and the L2 (8 cycles) is very fast. Latencies and penalties were a big drawback in Xenon, so mitigating many of them (using a bigger eDRAM cache to avoid cache misses and trips to system memory, diminishing penalties, and improving L2 performance) are all architectural changes that make a big improvement. That's not to mention that POWER7 is OoOE, with more execution units and more threads per core (4) to hide stalls. The links I posted in the OP actually have information from IBM comparing it with the Power6 architecture and showing how, even though Power6 had a higher frequency, architectural issues (e.g. a longer pipeline) led to significantly less performance.

FLOPs are no different from frequency. Most people by now understand that frequency alone does not determine performance.
Peak FLOPs are the same: they won't tell you which processor is faster/better for game code. And... what if a developer had code that was embarrassingly parallel and mapped well to SIMD? Sure, a sea of SPEs would be nice in those situations, but that code is going to be very fast on a Power7 (or i5) too. And if a developer were demanding a huge performance leap, you would think the code would be shuffled to the GPU as a compute task at that point, since that sort of problem often works well there. Chances are, an embarrassingly parallel problem that maps well to SIMD and requires significant resources is actually a graphics problem anyway.
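The cache argument above can be made concrete with the textbook average memory access time (AMAT) formula. This is a minimal sketch, not a model of any real chip: the 8-cycle L2 and 600-cycle memory penalty are the figures quoted in the post, while the 3-cycle L1, the 25-cycle eDRAM L3 and every miss rate are assumptions chosen only for illustration.

```python
def amat(levels, mem_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency, local_miss_rate) tuples ordered from L1 outward.
    Computes AMAT = h1 + m1 * (h2 + m2 * (... + mn * mem_latency)).
    """
    cost = mem_latency
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


MEM_LATENCY = 600  # cycles, the "600+ cycle" miss penalty from the post

# Assumed hierarchy without a large last-level cache: L1 (3 cycles, 5% miss),
# L2 (8 cycles, 40% of L2 accesses go on to DRAM).
no_l3 = amat([(3, 0.05), (8, 0.40)], MEM_LATENCY)

# Same L1/L2 plus an assumed 25-cycle eDRAM L3 that catches 75% of L2 misses.
with_l3 = amat([(3, 0.05), (8, 0.40), (25, 0.25)], MEM_LATENCY)

print(f"without L3: {no_l3:.1f} cycles per access")
print(f"with L3:    {with_l3:.1f} cycles per access")
```

With these made-up rates the large cache roughly halves the average access cost; the point is the shape of the formula (the DRAM penalty is multiplied by every miss rate in front of it), not the specific numbers.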

#6
Senior Member
Join Date: Jan 2012
Location: Leicestershire - England
Posts: 1,470
Acert: Fantastic intro. That's got to be the most detailed opening post I've seen. Good stuff!
Personally, I think you have hit the nail on the head with that. The only thing I would question, like Grall says, is the FPU. You would expect an upgraded VMX 256 or something that could fit into the budget. That processor, 4x OoOE cores with 4x SMT, 8MB cache and VMX 256 on 32nm... yes, I think that's certainly possible, and boy would that be awesome for games! If they could find a way to get a Tahiti Pro and 4GB of RAM in there, well, we would all be laughing! Edit: Question: I'm not too hot on these things, so would the 4x SMT apply to FPU instructions as well? Or does that count as one VMX thread per core, i.e. separate from integer?

#7
Invisible Member
Join Date: Apr 2002
Location: La-la land
Posts: 5,030
I'd like to see dual 256-bit VMX units per core. It would be really interesting to see what talented developers could do with some truly astounding, easy-to-use float performance. Shoving work off to the GPU is all well and good for some tasks, perhaps, but it takes away a lot of rendering performance. Time spent doing calculations for... whatever is time not spent drawing stuff that goes on the screen.
And if there's one thing history has shown us since the era of 3D-graphics consoles began, it's that persistent 60Hz screen updates in every game are NOT something we'll see next generation. So I don't want that GPU spending time on anything other than actually drawing graphics, if it's at all possible to avoid it...
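To put "dual 256-bit VMX units per core" into numbers, here is a rough peak-throughput sketch. It assumes single-precision (32-bit) lanes, one fused multiply-add per lane per cycle counted as two FLOPs, both units issuing every cycle, and a hypothetical 4-core, 3.2GHz part; none of these are known specs for any console CPU.

```python
def peak_gflops(cores, ghz, units_per_core, vector_bits, lane_bits=32, fma=True):
    """Peak GFLOPS = cores * GHz * units * lanes per unit * FLOPs per lane per cycle."""
    lanes = vector_bits // lane_bits      # e.g. 256-bit / 32-bit = 8 lanes
    flops_per_lane = 2 if fma else 1      # an FMA counts as a multiply plus an add
    return cores * ghz * units_per_core * lanes * flops_per_lane


# Hypothetical 4-core, 3.2GHz CPU with one 128-bit VMX unit per core.
one_128 = peak_gflops(cores=4, ghz=3.2, units_per_core=1, vector_bits=128)
# The same hypothetical CPU with two 256-bit units per core.
two_256 = peak_gflops(cores=4, ghz=3.2, units_per_core=2, vector_bits=256)

print(f"1 x 128-bit per core: {one_128:.1f} GFLOPS")
print(f"2 x 256-bit per core: {two_256:.1f} GFLOPS")
```

The single 128-bit case works out to the same 8 GFLOPs/GHz per core used elsewhere in this thread; doubling both the width and the number of units quadruples the theoretical peak, before any question of how well code keeps the units fed.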

#8
Member
Join Date: Jun 2008
Posts: 335
Quote:

#9
Artist formerly known as Acert93
Join Date: Dec 2004
Location: Seattle
Posts: 7,704
Quote:
POWER7 peaks at 33.12 GFLOPs per core at 4.14GHz (8 GFLOPs/GHz): http://en.wikipedia.org/wiki/POWER7#Specifications Cell SPEs peak at 25.6 GFLOPs each at 3.2GHz (8 GFLOPs/GHz): http://en.wikipedia.org/wiki/Cell_Pr...ents_.28SPE.29 Even assuming 1 disabled SPE and 1 reserved, 6 SPEs + 1 PPE (another 25.6 GFLOPs) come to 179.2 GFLOPs for Cell. Seeing as there is no way we will see an 8-core POWER7, let alone one at 4.14GHz, I think it is safe to say Cell's peak is better than what we would find in a console (e.g. a 3.2GHz 4-core at 102.4 GFLOPs). Even an 8-core @ 4.14GHz is "only" 265 GFLOPs, and once you apply the same criteria, (a) 1 core disabled and (b) 1 core reserved for the OS, it drops to 199 GFLOPs. And of course Cell variants went up to 4.0GHz, so there you go: if you take a top-end Cell versus a top-end POWER7 and add similar restrictions, Cell wins in peak FLOPs. But I agree that in most situations POWER7 is going to be a lot faster, and the problems that mapped well enough to Cell to hit peak rates would seem to be good candidates in general to move to the GPU.
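The peak numbers above are just usable cores × clock × 8 GFLOPs/GHz, so they are easy to check. This sketch only reproduces that arithmetic under the post's own assumptions (8 GFLOPs/GHz per POWER7 core, SPE or PPE, plus the stated disabled/reserved cores); it says nothing about achieved throughput.

```python
GFLOPS_PER_GHZ = 8  # per POWER7 core, Cell SPE or Cell PPE, as assumed in the post


def chip_peak(usable_units, ghz, per_ghz=GFLOPS_PER_GHZ):
    """Peak GFLOPS for a chip with `usable_units` cores/SPEs running at `ghz`."""
    return usable_units * ghz * per_ghz


cell_ps3 = chip_peak(6 + 1, 3.2)          # 6 usable SPEs + 1 PPE
power7_console = chip_peak(4, 3.2)        # hypothetical 4-core, 3.2GHz console part
power7_full = chip_peak(8, 4.14)          # full 8-core chip at 4.14GHz
power7_limited = chip_peak(8 - 2, 4.14)   # 1 core disabled, 1 reserved for the OS
cell_4ghz = chip_peak(6 + 1, 4.0)         # 4.0GHz Cell variant, same restrictions

for name, value in [("Cell (PS3)", cell_ps3), ("POWER7 4c/3.2GHz", power7_console),
                    ("POWER7 8c/4.14GHz", power7_full), ("POWER7 6 usable", power7_limited),
                    ("Cell @ 4.0GHz", cell_4ghz)]:
    print(f"{name:18s} {value:6.1f} GFLOPS")
```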

#10
Senior Member
Join Date: Feb 2002
Location: San Francisco, CA
Posts: 1,571
Why not include SPEs on the chip itself? A cut-down 4-core Power7 with 4 SPEs per core is something I've been drooling over for some time now. The chip would still be Power7, but also a refined Cell, providing the best of both worlds. It would be a monster of a CPU and would bring so much power to the table. It's not like it would be unworkable or that devs wouldn't be able to get the hang of it; it just might take a while. Of course, 16 SPEs might be overkill; maybe 3 per core would be better in terms of transistor count and manageability. But still, I guess the CPU going into the Wii U is based on either Power7 or A2. If it's Power7-based, then cool; I look forward to seeing how it will compete with the 360 and PS3 in terms of programming and all that. The rumors from last year stated 4 cores with eDRAM, so 4 MB per core would be great, but it seems overkill for a console. The 2 MB you suggested before sounds good.

#11
Member
Join Date: Jun 2008
Posts: 335
Quote:

#12
B3D Shockwave Rider
Join Date: Feb 2002
Posts: 1,810
Quote:
I'm not sure Tim Sweeney would desire VMX units.
Quote:
__________________
When God plays an online shooter he plays Shadowrun. He buys resurrection first round and selects Dwarf. www.shadowrunshow.com

#13
Invisible Member
Join Date: Apr 2002
Location: La-la land
Posts: 5,030
Well, whatever you wanna do to reach the necessary goal, as long as it does not involve dumping CPU processing on the GPU. There should be no particular bias towards any one technology/implementation; it's the end result that matters. If CPUs need scatter/gather, give it to them. And so on.
Then again, I'm not sure I'd listen all that much to Tim Sweeney's predictions of the future; the guy's very good at what he actually does (UE3 is the most flexible, powerful and technically impressive 3D engine out there), but his soothsaying powers have proven fairly weaksauce.

#14
B3D Scallywag
How do they get 100GB/s out of two DDR3 memory controllers? Unless each controller is quad-channel?
FLOPS-wise, assuming it has the same throughput per core per clock as Sandy Bridge, a 4-core 3.2GHz version would come in at 204.8 GFLOPS. It would probably use a single memory controller at most, as well. Drop a load of L3 and maybe you're getting something approaching usable in a console (although probably still not a great choice).
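Both figures in this post come from simple peak-rate arithmetic. The sketch below assumes DDR3-1600 (1600 MT/s on a 64-bit channel), which is not stated in the thread, and Sandy Bridge's AVX peak of 16 single-precision FLOPs per core per cycle (one 8-wide add plus one 8-wide multiply).

```python
import math


def ddr_peak_gbs(mega_transfers, bus_bits=64, channels=1):
    """Peak DRAM bandwidth in GB/s: transfers/s * bytes per transfer * channels."""
    return mega_transfers * 1e6 * (bus_bits // 8) * channels / 1e9


per_channel = ddr_peak_gbs(1600)                  # one assumed DDR3-1600 channel
channels_needed = math.ceil(100 / per_channel)    # channels for ~100 GB/s
per_controller = math.ceil(channels_needed / 2)   # split across two controllers
two_quad_controllers = ddr_peak_gbs(1600, channels=2 * 4)

# Sandy Bridge-style peak: 4 cores * 3.2 GHz * 16 SP FLOPs per cycle.
snb_like_gflops = 4 * 3.2 * 16

print(f"{per_channel:.1f} GB/s per channel, {channels_needed} channels needed, "
      f"{per_controller} per controller -> {two_quad_controllers:.1f} GB/s")
print(f"{snb_like_gflops:.1f} GFLOPS")
```

Under that DDR3-1600 assumption, two controllers only reach roughly 100 GB/s if each drives four channels, which matches the quad-channel guess.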
__________________
PowerVR PCX1 4MB --> Voodoo Banshee 16MB --> GeForce2 MX200 32MB --> GeForce2 Ti 64MB --> GeForce4 Ti 4200 128MB --> 9800Pro 128MB --> 8800GTS 640MB --> Radeon HD 4890 1GB --> GeForce GTX 670 DirectCU II TOP 2GB

#15
Senior Member
Quote:
Power 7 would, in my opinion, be a relatively bad CPU to pick as the basis for a console CPU. You'd get a much nicer result by taking an already relatively simple design and adding console-specific features to it (wide SIMD) than by taking a huge behemoth with enormous resources devoted to mainframe-style workloads (huge internal and external buses, lots of wiring to get power around, ...) and cutting it down to something usable. Even then, P7 would need a SIMD unit added to it, as I don't think it really has anything good enough for consoles.

#16
Quo vadis?
Join Date: Oct 2007
Location: Texas, USA
Posts: 1,338
Quote:
Assuming VSX in Power7 is 128-bit...
With the rich and storied history of PowerPC processors over the past decade, there are a number of hypothetical candidates for the Wii U CPU. How about a quad-core Power5 with expanded L2 cache, VMX 128 or VSX 256, and GDDR5? IIRC Power5 is a dual-issue OoO architecture. I assume a quad version could be approached in a similar manner to Xenon, but devs wouldn't have to worry about the anemic L2 cache and in-order processing problems. It would be a bit limiting compared to today's best solutions, but it would be very familiar territory for current developers, with hugely expanded real-world usable GFLOPS. Even in a quad configuration with 4 MB L2 cache and increased vector processing capability, it would probably come in under 150 mm². Power5+ @ 90 nm was 243 mm²; 32 nm would easily bring that under 100 mm², hence my assumption of under 150 mm² for an improved quad. Lastly, I would ramp the clocks up to 3.2 GHz for parity with the other systems.
It would be pricey to develop a new processor, or even a current one with "bolted on" features. A quad-core Power7 on 32 nm with 256-bit VSX and memory controllers adapted to run GDDR5 makes sense to me if the power and TDP can be brought down. Its clock efficiency, 4 threads per core and brilliant integer performance would be good for the current crop of developers, who are used to such wide cores on PC and sick of the narrow in-order ones on the 360 and PS3.
Last edited by Mobius1aic; 09-Apr-2012 at 18:47.
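The "under 100 mm² easily" estimate follows from ideal area scaling, where die area shrinks with the square of the feature-size ratio. This is a best-case sketch: real shrinks never scale perfectly (I/O pads, analog blocks and wiring lag behind), so treat the result as an optimistic floor rather than a prediction.

```python
def ideal_shrink(area_mm2, old_node_nm, new_node_nm):
    """Best-case die area after a process shrink: area scales with (new/old)^2."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


power5_plus_90nm = 243.0  # mm^2, figure quoted in the post
at_32nm = ideal_shrink(power5_plus_90nm, 90, 32)

print(f"ideal 32 nm area: {at_32nm:.1f} mm^2")
# Even if real-world scaling were three times worse than ideal, it would stay under 100 mm^2.
print(f"3x worse than ideal: {3 * at_32nm:.1f} mm^2")
```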

#17
Member
Join Date: Aug 2011
Posts: 370
Power7 contains a lot of stuff that is completely useless in a console. The core is balanced for single-element double-precision throughput, with 4 individual double-precision execution units (and even one decimal FPU!). This is essentially completely wasted in a console. Any POWER7 CPU cut down to console use cases would no longer resemble a POWER7 CPU very much.
Also, the Power7 line is not designed to be modular and embeddable. In that way, it's no worse than any previous IBM CPU. It's just that in this generation IBM does have a CPU designed to be modular and embeddable. I am, of course, talking about the PowerPC 470S. Its floating-point unit is designed to be swappable, so you can switch out the double-precision one for anything from the VMX line you fancy. Its bus design is built so that it can work as part of a cache-coherent whole with parts not built by IBM, so all the game dev gods get what they want. It's a very energy- and die-space-efficient design, so it produces admirable performance while leaving most of the design TDP and area for the GPU. And while its single-threaded performance is nothing approaching a Power7, it would still be a huge, huge improvement over the present gen, especially in the worst-case situations. Also, the 1.6GHz is not the absolute maximum the design can stand; it's just the frequency IBM chose when pimping it out as a power-efficient embedded CPU. Give it a modern process and just a tiny bit more power budget, and we are talking frequencies that near the "magical" 3GHz barrier last gen shipped at. That's with a 4-issue CPU (compared to the 2-issue ones last gen) and enough OoOE resources that it shouldn't hopelessly stall on every L1 and L2 miss. I really, honestly think that the 470S and its successors are not just the best available options but, considering all design constraints, very close to being the best possible options. I'm really hoping that the "16 cores" leak means MS is shipping a full 470S solution.

#18
B3D Scallywag
Thanks, tunafish; that pretty much puts the Power 7 theories to bed. So we could actually be looking at a genuine 16-core CPU using customised 470S cores.
Any idea how 16 stock 470Ss would perform versus, say, a quad-core Sandy Bridge with Hyper-Threading? I assume the 470 is single-threaded, so 16 of them would still come in at twice the threads of a quad Sandy Bridge with HT?

#19
Member
Join Date: Nov 2006
Location: Somewhere over the ocean
Posts: 634
Isn't the 470 32-bit only?
How does it compare to the PPC A2?

#20
Member
Join Date: Aug 2011
Posts: 370
Quote:
Quote:
Other than that, I'd expect really nice IPC: a short pipeline, a 32-instruction-wide instruction window (not really; it's actually an 8*4-wide instruction window, which is not quite as good), and 2-cycle access to a 32kB L1i cache (so twice as large per thread as SNB or Xenon) should be enough to mask all L1 accesses and get some real work done during L2 ones.
No, and I have no idea how this one started. There hasn't been a new 32-bit Power chip for quite some time; the 470 is 64-bit, with 42 bits of real address space and 49 bits of virtual address space. (I really, really hope they allow putting tags into the upper 16 bits of pointers. That is death for forward compatibility on the PC, so I can see why it's disallowed there, but why not on consoles?)
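Pointer tagging here means storing a small tag in the unused high bits of a 64-bit pointer and masking it off before the pointer is dereferenced. Below is a minimal Python sketch of just the bit manipulation, assuming addresses fit in the low 48 bits (the 49-bit virtual space quoted above would leave only 15 spare bits). In practice an engine would do this in C or C++ on raw pointers, and whether a console OS would allow it is exactly the open question.

```python
ADDR_BITS = 48                       # assumption: usable addresses fit in 48 bits
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_BITS = 64 - ADDR_BITS            # 16 spare high bits in a 64-bit pointer


def tag_pointer(addr, tag):
    """Pack `tag` into the upper 16 bits of a 64-bit pointer value."""
    if addr < 0 or addr > ADDR_MASK:
        raise ValueError("address does not fit in the low 48 bits")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 16 bits")
    return (tag << ADDR_BITS) | addr


def untag_pointer(tagged):
    """Split a tagged pointer back into (address, tag); dereference only the address."""
    return tagged & ADDR_MASK, tagged >> ADDR_BITS


p = tag_pointer(0x7FFF_1234_5678, 0xBEEF)
addr, tag = untag_pointer(p)
print(hex(p), hex(addr), hex(tag))
```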

#21
Member
Join Date: Sep 2011
Posts: 132
The PowerPC 470 series is 32-bit:
http://www-03.ibm.com/technology/logic/powerpc.html

#22
French frog
Join Date: Jun 2005
Location: France
Posts: 4,172
Since you're here, may I ask your POV on something?
You may have read, or taken part in, the old "next generation CPU: will they go back to OoO execution etc." thread, which I couldn't find after multiple searches. It seems that pretty much everybody now agrees that OoO execution should be part of the next-generation CPU. I'm straying a bit from this thread's topic, but I wonder if throughput is still a relevant design goal for a next-generation CPU. What is your opinion on the matter? From your posts I would guess that you think big OoO cores akin to Intel's are the way to go, but I wonder how a more (FP) throughput-oriented CPU would be perceived by those with actual knowledge of these things. Lately I've wondered about the merit of having a fairly "big" CPU core feed more than one SIMD unit. I figured it could be beneficial, especially with a chip supporting 4-way SMT. Basically it would be like Bulldozer in concept, sharing the cost of the front end, OoO engine, etc., not across multiple "cores" but across SIMD units. My idea is that it may be easier to feed two SIMD units than a bigger one (loads and stores on the two units are unlikely to happen at the same time, though they could), and that it could be more efficient overall than having a SIMD unit twice as wide (the two approaches are not mutually exclusive, though). Is that a complete misunderstanding? If not, do you think it could be something desirable for a next-gen CPU?
__________________
What's trying to be a bunch of presentations PS360 youtube channel Sebbbi about virtual texturing Tuned EADGCF and liking it :)

#23
Artist formerly known as Acert93
Join Date: Dec 2004
Location: Seattle
Posts: 7,704
Quote:
EDIT: Looks like the discussion and links got ahead of me. At least I outlined my notes :P

#24
Member
Join Date: Aug 2011
Posts: 370
Quote:
(And now I have no idea whatsoever what they mean when they claim 49 bits of virtual address space. Are they counting process tags or something?)
Quote:
Quote:
Quote:
Then again, I don't actually develop low-level game engine code for a living; I leave that part to the professionals. You could ask them? Carmack is pretty responsive on Twitter (ID_AA_Carmack).

#25
Member
Join Date: Nov 2006
Location: Somewhere over the ocean
Posts: 634
Just a dumb question:
Is Freescale a completely different company, or can IBM sell its Power-based designs?