The PowerPC 470 series is 32bit:
http://www-03.ibm.com/technology/logic/powerpc.html
As you are here, may I ask your POV on something?
Pleasure to read, as usual.
Power7 contains a lot of stuff that is completely useless in a console. The core is balanced for single element double precision throughput, with 4 individual double precision execution units (and even one decimal FPU!). This is essentially completely wasted in a console. Any power7 cpu cut down to console use cases would no longer resemble a power7 cpu very much.
Also, the Power7 line is not designed to be modular and embeddable. In that way, it's no worse than any previous IBM CPU. It's just that in this generation IBM does have a CPU designed to be modular and embeddable. I am, of course, talking of the PowerPC 470S. Its floating point unit is designed to be swappable, so you can switch out the double precision one for anything from the VMX line you fancy. Its bus design is built so that it can work as part of a cache-coherent whole with parts not built by IBM, so all the game dev gods get what they want. It's a very energy and die space efficient design, so it produces admirable performance while leaving most of the design TDP and space for the GPU. And while its single-threaded performance is nothing approaching a Power7, it would still be a huge, huge improvement over the present gen, especially in the worst-case situations.
Also, 1.6GHz is not the absolute maximum the design can stand; it's just the frequency IBM chose to pitch it at as a power-efficient embedded CPU. Give it a modern process and just a tiny bit more power budget, and we are talking frequencies near the "magical" 3GHz barrier last gen shipped at, with a 4-issue CPU (compared to the 2-issue ones last gen) and enough OoOE resources that it shouldn't hopelessly stall on every L1 and L2 miss.
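A quick back-of-the-envelope sketch of the clock vs issue width point. It only computes the theoretical peak issue rate (clock times issue width), which no real core sustains. The 3.2GHz figure for the last-gen 2-issue cores and the 3.0GHz "clock bump" case are assumptions for illustration, and `peak_issue_rate` is a made-up helper name.

```python
def peak_issue_rate(clock_mhz, issue_width):
    """Upper bound on instructions issued per second, in millions.

    Real throughput is lower: stalls, cache misses and dependencies
    all eat into this number, especially on in-order cores.
    """
    return clock_mhz * issue_width

last_gen = peak_issue_rate(3200, 2)        # assumed: 2-issue core at 3.2GHz
embedded_470s = peak_issue_rate(1600, 4)   # 4-issue core at the quoted 1.6GHz
near_3ghz_470s = peak_issue_rate(3000, 4)  # hypothetical clock bump

print(last_gen, embedded_470s, near_3ghz_470s)
```

On paper the 4-issue core at 1.6GHz already has the same peak issue rate as a 2-issue core at 3.2GHz; how much of that peak is actually reached depends on the out-of-order resources, which this sketch does not model.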
I really, honestly think that the 470S and its successors are not just the best available options, but, considering all design constraints, really very close to being the best possible options. I'm really hoping that the "16 cores" leak means that MS is shipping with a full 470S solution.
Isn't 470 32bit only?
How does it compare to the PPC A2?
The PowerPC 470 series is 32bit:
http://www-03.ibm.com/technology/logic/powerpc.html
As you are here, may I ask your POV on something?
You may have read and taken part in the old "next generation CPU will they go back to OoO execution etc." thread, which I couldn't find after several searches.
It seems that pretty much everybody now agrees that OoO execution should be part of the next generation CPU. I'm straying a bit from this thread's topic, but I wonder if throughput is still a relevant design goal for a next generation CPU?
What is your opinion on the matter? From your posts I would assume that you think big OoO cores akin to Intel's are the way to go, but I wonder how a more (FP) throughput-oriented CPU would be perceived by those with actual knowledge of these things.
I actually have a bit of a hard-on for small OoO cores. Big fat cores like Intel's and the Power7 push the single-threaded perf way past the knee of the curve. If that's what you need, then fine. For a game console, I'm probably willing to put up with the complication of twice the threads for an order of magnitude simpler and cheaper cores.
Lately I wondered about the relevance of a pretty "big" CPU core feeding more than one SIMD unit. I figured that it could have benefits, especially with a chip supporting 4-way SMT.
Basically it would be like Bulldozer in concept, sharing the cost of the front end, OoO engine, etc. not across multiple "cores" but across SIMD units.
My idea is that it may be easier to feed two SIMD units than one bigger one (loads and stores on the two units are unlikely to happen at the same time, though they could) and that it could be more efficient overall than having a single SIMD unit twice as big (the two are not mutually exclusive, though).
Is that a complete misunderstanding? If not, do you think it could be something desirable for a next gen CPU?
That is a valid design, and sort of where x86 ended up. IBM seems to prefer simple self-contained VMX units, without the complication of dealing with forwarding and its ilk. Probably for cheaper and safer design as much as for any performance reasons.
Some random questions, to stimulate discussion, for those who may actually know something about POWER7.
Question #1: Is this even remotely possible? Is this far too optimistic or a roughly accurate ballpark for what IBM could fit within that silicon/power budget?
Question #2: Would this make a good console CPU?
I think so.
Question #3: What would you reduce? Frequency, L3, memory controller, execution units, etc.? Which execution units and why?
Reduce the L3 cache to 8MB. Remove the memory controllers and have the CPU interface with the GPU through a fast interface (similar to how the 360 does it). Remove the decimal floating point units. Reduce the 4 issue ports for floating point to two. Use SIMD to get FP throughput. Get rid of all the RAS features.
Question #4: What would you add? VMX128 support? At what cost?
Question #5: To my knowledge IBM only sells Power7 chips in complete server packages costing tens of thousands of dollars at the low end. Would IBM even be interested in creating a console variant of POWER7?
Why wouldn't they? It wouldn't cannibalize any of IBM's product lines and would be revenue for their chip design unit (and possibly fab). Also bragging rights.
Question #6: How is the POWER7's real code performance compared to an AMD Bulldozer core? Per-mm^2? Per-Watt?
Question #9: As a developer, thinking of the 5-7 year window of console development, would you prefer 4 cores/16 threads in a robust CPU (IBM design) or the shift of budgets to a 2m/4c AMD design but with on-die Shader Array? Why?
Question #10. Would this IBM design need a beefed up vector unit or is the real world performance/throughput on POWER7 chips more than sufficient?
Question #11. Thinking in console contexts, if you could change one thing about POWER7, what would it be?
Question #12. Does a POWER7 design indicate a split memory design?
Wasn't Xenon already a highly custom part? Why couldn't the next gen Xbox CPU just inherit some of the better parts of the POWER7 architecture, like cache latency, and throw out the functional units they don't need (while adding the ones they do)? Surely Microsoft is prepared to pay for the custom design behind a CPU that will likely move 50+ million units over the course of 8+ years if the cycle goes like it did this time.
Wasn't Xenon already a highly custom part?
Actually, it was co-developed by Sony, without Sony knowing it XD
The magic of IBM's R&D management
Hey Gubbi, it seems that you along with quite a few of the other serious developers & industry experienced guys here aren't overly enthusiastic about the performance of Xenon as a console CPU. Possibly Cell too although I'm never quite sure of the general opinion there.
Anyway, there was a link posted recently from the developers of Metro 2033 which talked about Xenon (all 3 cores) being equivalent in power to about 75-85% of a single Nehalem core at the same clockspeed. That is, unless you properly vectorise the code, in which case Xenon can actually be faster than a Nehalem on a clock/thread basis. Or in other words, in properly vectorised code, Xenon could have roughly the performance of a quad Nehalem at 3.2GHz.
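To put rough numbers on that claim, here is a small sketch of the arithmetic. It assumes the 75-85% figure is for scalar code at equal clocks, that the Nehalem side stays scalar, and that performance adds up linearly across cores; none of that is stated in the original link, and the function names are just for illustration.

```python
def per_core_share(three_core_percent, cores=3):
    """Percent of one Nehalem core that a single Xenon core delivers."""
    return three_core_percent / cores

def speedup_to_match(target_cores, three_core_percent):
    """Vectorisation speedup all of Xenon would need to equal
    `target_cores` Nehalem cores, given its scalar percentage."""
    return target_cores * 100 / three_core_percent

for pct in (75, 85):
    print(pct,
          round(per_core_share(pct), 1),
          round(speedup_to_match(4, pct), 1))
```

Under those assumptions each Xenon core is worth roughly a quarter of a Nehalem core on scalar code, and matching a quad would take something like a 5x gain from vectorisation.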
What's your take on this? Is it possible to vectorise a significant portion of CPU gaming code to extract that level of performance out of something like Xenon (or Cell)? If so, then it seems that a scaled up version of either of those CPUs could be pretty potent for a next gen console.
Not all code can be vectorized. Some things, like physics or media processing, can be vectorized very neatly, with an almost linear speedup in the vector width. But some things, like "game script" or AI, really gain absolutely nothing from vectorization. Generally, simple "smooth" loads vectorize nicely, but if your problem needs to branch a lot, the vectorized code would suffer a combinatorial explosion in the paths it needs to take, so really it's two ifs and then you'd be better off not bothering at all.
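The branching problem can be sketched in a few lines. This is plain Python standing in for SIMD lanes, not real vector code: `masked_update` mimics the usual predication trick of computing both sides of an `if` for every lane and blending with a mask, and `sides_to_compute` counts leaf paths for a hypothetical fully nested tree of data-dependent branches. All names here are made up for illustration.

```python
def scalar_update(x):
    # Branchy scalar code: each element runs exactly one side.
    if x < 0:
        return -x
    return x * 2

def masked_update(lanes):
    # SIMD-style predication: compute BOTH sides for every lane,
    # then pick per lane with a mask. No branch, but double the work.
    mask = [x < 0 for x in lanes]
    then_side = [-x for x in lanes]
    else_side = [x * 2 for x in lanes]
    return [t if m else e for m, t, e in zip(mask, then_side, else_side)]

def sides_to_compute(nesting_depth):
    # A fully populated tree of nested data-dependent ifs has
    # 2**depth leaf paths, and predicated code evaluates all of them.
    return 2 ** nesting_depth

lanes = [-1, 2, -3, 4]
print(masked_update(lanes))
print([sides_to_compute(d) for d in range(1, 5)])
```

With one `if` the vector version does twice the arithmetic for a 4-wide result, which is still a win. A few levels of nesting later, the wasted work can outweigh the vector width, which is the "two ifs" rule of thumb above.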
Are we talking vectorized code on Xenon vs non-AVX vector code on Nehalem, or with AVX involved? Because I'm pretty sure that with AVX involved, it won't be pretty for Xenon.
Then again, Nehalem doesn't support AVX; that came with Sandy Bridge.
Though yes, it does have 128-bit SSE, but I'm quite sure that even with its extra registers and functions Xenon still probably can't catch up with Nehalem on a per-core basis (as long as the problem isn't heavily cache/memory latency bound).
I'm trying to understand here, is VSX only 128 bits wide, or is it IBM's 256-bit competitor to AVX?
The confusing thing with the VMX128 units is that the 128 refers to the number of registers.
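For reference, a small table of register counts vs register widths, since the two are easy to mix up. The figures are the architectural register counts per thread as I understand them from the public ISA documentation (x86 numbers are for 64-bit mode); the dictionary and helper name are just for illustration.

```python
# name: (architectural registers per thread, register width in bits)
VECTOR_ISAS = {
    "VMX (AltiVec)": (32, 128),
    "VMX128 (Xenon)": (128, 128),
    "VSX (POWER7)": (64, 128),
    "SSE (x86-64)": (16, 128),
    "AVX (x86-64)": (16, 256),
}

def register_file_bytes(name):
    """Architectural vector register state per thread, in bytes."""
    count, width_bits = VECTOR_ISAS[name]
    return count * width_bits // 8

for name, (count, width) in VECTOR_ISAS.items():
    print(f"{name}: {count} x {width}-bit = {register_file_bytes(name)} bytes")
```

So VMX128 and VSX are both 128 bits wide; what changes between the IBM variants is how many registers there are, while AVX is the one that widens the registers themselves.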
Assuming VSX in Power7 is 128........