Future console CPUs: will they go back to OoOE, and other questions.

Are you saying the PC beats Cell and Xenon on FP intensive benchmarks? Are you saying that a PC beats a de/compression program running in SPU local store? Or are you just running standard PC benchmarks recompiled to run on a single Xenon or PPE core without utilising the Xenon's or Cell's strengths?

We've run a massive number of benchmarks from LZ type compression through video codecs, animation, scripting languages etc etc etc.

I'm saying that on most tasks the x86 wins performance-wise on a single thread by a large margin, and that includes what most would consider FP-intensive tasks like animation.

There are cases where the x86 loses, namely where the FP-to-memory-op ratio is extremely high (cloth animation, say).
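Purely for illustration (a made-up kernel shape, not our actual benchmark code), the sketch below is roughly what I mean by a high FP-to-memory-op ratio: a couple of loads per particle followed by a long chain of FP math that stays in registers.

```cpp
// Hypothetical cloth-style relaxation loop, for illustration only.
// Each iteration does a handful of loads/stores and a comparatively
// long chain of floating-point work kept in registers.
#include <cstddef>

struct Particle { float x, y, z; };

void relax(Particle* p, std::size_t n, float stiffness, float damping)
{
    for (std::size_t i = 1; i < n; ++i)
    {
        // Loads for this particle and its neighbour...
        float dx = p[i].x - p[i - 1].x;
        float dy = p[i].y - p[i - 1].y;
        float dz = p[i].z - p[i - 1].z;

        // ...then plenty of FP ops before touching memory again.
        float len2 = dx * dx + dy * dy + dz * dz;
        float corr = stiffness * (1.0f - 1.0f / (len2 + 1.0f)) * damping;

        p[i].x -= dx * corr;
        p[i].y -= dy * corr;
        p[i].z -= dz * corr;
    }
}
```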

Generally, the larger the program, the bigger the margin. I've seen as much as 10x over the PPU, but that's a rare case; it was a memory-bound test and it wasn't recent (might even have been DD1).
 
I don't think anyone would claim that C code recompiled for PPE would lead to comparable performance. I think it is a specious comparison to take code written for other platforms, recompile it, and claim that PPE can't win in the benchmark, without optimizing the benchmark for the PPE architecture.

This reminds me of the Internet programming language shootout where people write an algorithm in 20-30 different languages. Languages other than C originally lost out big time. Why? Because the original author simply ported the C algorithms to OCaml, Haskell, Java, et al. as direct translations, without concern for which idioms are optimal in each language. After OCaml, Java, and even Haskell experts got into the fray, the gap closed significantly. Haskell especially is very sensitive to the style in which code is written.
 
We've run a massive number of benchmarks from LZ type compression through video codecs, animation, scripting languages etc etc etc.

I'm saying that on most tasks the x86 wins performance-wise on a single thread by a large margin, and that includes what most would consider FP-intensive tasks like animation.

There are cases where the x86 loses, namely where the FP-to-memory-op ratio is extremely high (cloth animation, say).

Generally, the larger the program, the bigger the margin. I've seen as much as 10x over the PPU, but that's a rare case; it was a memory-bound test and it wasn't recent (might even have been DD1).

Ah. Single thread - that explains it. Presumably you ran the same benchmark C code on the PPU for larger programs and on a single SPU for small programs.
 
Face it, boys. The console CPUs eat ass sandwiches. We've heard this from a few devs.:LOL:
 
Face it, boys. The console CPUs eat ass sandwiches. We've heard this from a few devs.:LOL:

It's all about the code that's run. There are plenty of PS3 devs out there who would prefer working with Cell over an available x86 chip.
 
I don't think anyone would claim that C code recompiled for PPE would lead to comparable performance. I think it is a specious comparison to take code written for other platforms, recompile it, and claim that PPE can't win in the benchmark, without optimizing the benchmark for the PPE architecture.

This reminds me of the Internet programming language shootout where people write an algorithm in 20-30 different languages. Languages other than C originally lost out big time. Why? Because the original author simply ported the C algorithms to OCaml, Haskell, Java, et al. as direct translations, without concern for which idioms are optimal in each language. After OCaml, Java, and even Haskell experts got into the fray, the gap closed significantly. Haskell especially is very sensitive to the style in which code is written.

It's optimised in general for the platform, at least as much as is practical, and that includes using native vector code where possible. These are benchmarks on platform-specific versions of our core tech.

I should clarify here: I'm not saying that these CPUs are crap, just that they have rather severe trade-offs for the peak FPU performance. As much pounding as x86 and other conventional architectures get, they do a lot of things very well. Also, porting from PC is going to be far from painless again this gen.
 
MrWibble said:
I'm not going to say I prefer vi over Visual Studio however... I did (and still do) most of my development in VS, and did a lot of PS2 stuff on Codewarrior too. Give me an IDE with integrated debugger over command-lines and makefiles any day.

Oh, I don't mind a decent IDE, and I'm not exactly the biggest fan of makefiles either. I usually prefer a more simplistic IDE with a good debugger and some kick-ass profiling tools (honestly, I'm most critical of the profiling tools, followed by the debugger and then the compiler; beyond that I can work pretty much with bare-bones tools). The one thing a good IDE can offer me is good build management for prototyping (e.g. granular build settings with custom overrides, dealing with pre-compiled headers (or even better, dynamic predictive compiling w/run-time linking), and distributed building). But those are surprisingly easy features to add to a good text editor.

fafalada said:
And let's not forget the ratio of good to bad versions is about 1:3-4. Looking back, I can only pick three usable versions of VS (1.52, 6.0, and 2005); everything in between was a mess in my experience, and what makes it worse is that nowadays I don't even require extensive use of advanced features - it's the basic stuff that tends to be bad (2005 still maintains a couple of stupid basic bugs, but at least the GUI has evolved beyond the stupidity of the original .NET incarnation).
But don't mind me - I just like to rant about devtools

VS 6.0 was the only one I could really tolerate...

fafalada said:
Afaik stock single-threaded performance was considerably higher - but that much should be obvious. The 360 PPC requires specialized coding practices and lots of hand-tuning if you want your "general purpose code" to run well. IBM's take on general purpose computing, I guess...

Well, to be fair, most games are still being written in PDP-11 assembler that thinks it's an object system. We're still mainly using compilers built around the assumption that we're all on shallow register/accumulator architectures that can read, write, and move data from one address to another with no penalty, and then we rely on all sorts of tricks to convince the compiler that we know better.
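By "tricks" I mean things like aliasing hints. A minimal sketch, assuming the common (non-standard) __restrict extension that MSVC and GCC/Clang accept; it promises the compiler the two pointers never overlap, so it doesn't have to reload src after every store to dst:

```cpp
// Sketch only: __restrict is a compiler extension, not standard C++.
// The qualifier tells the compiler dst and src never alias, so it can
// keep src values in registers and vectorize the loop aggressively.
void scale(float* __restrict dst, const float* __restrict src, float k, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```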

nAo said:
imho your source is not worth that much; the Sony and Toshiba guys were not minor players at all. They have co-signed dozens of patents with the IBM guys since 2002, and Japanese engineers even moved to the US to work closely with the IBM guys.

That being said, the SPE ISA bears a lot more IBM philosophy (dare I say Motorola) than it does Sony/Toshiba. The EE and ALLEGREX reflect more of Sony's SIMD design philosophy than do the SPEs (which are clearly AltiVec-derived).

ADEX said:
The only other design with OOO is the 970 but it's a completely different design produced in a completely different way. You'd have to design a new OOO section for this chip.

Actually, the PPE/Cell/Waternoose cores are about the only PowerPC (or old skool POWER) design that I can think of that *doesn't* have OoOE...
 
Actually, the PPE/Cell/Waternoose cores are about the only PowerPC (or old skool POWER) design that I can think of that *doesn't* have OoOE...

Original POWER started out with OOO back in 1990. (As did the first PowerPC, the 601, in, uhm, '92/'93?)
However, IBM has produced PPC cores more suitable for embedded use, where OOO may not have been implemented. But here we are still talking about produced and marketed PPC microprocessors; God knows how many CPU design experiments they have lying around at IBM as a whole.

Quick edit: That is, IBM has a couple of decades' worth of experience designing OOO PPC and POWER architecture cores. Whether the PPC cores that have targeted other markets have all been OOO, I daren't say. And it is safe to say that IBM has a large pile of experimental and test designs lying around that were never commercialised. They have vast experience, and some pretty deep thinkers to boot. I would be wary of arguing that they didn't know what they were doing architecturally.
So - did they in fact misjudge in some way?
Are the time-to-market constraints too severe to allow rewriting code? Are developers too inflexible, or the tools unworkable?
Or are there fundamental problems in the sense that the architecture is a bad fit to the problem area?

I find it difficult to put a value on reports that porting single-threaded code to the PPE yields lower performance than a current x86 processor. Well, yes, that's expected. But how relevant is that data point? Does it describe the code the processor will eventually run?
 
ERP said:
It's optimised in general for the platform, at least as much as is practical, and that includes using native vector code where possible. These are benchmarks on platform-specific versions of our core tech.
While I know this is affected by your having very different constraints than most of us, given the sizes of your teams etc., I would use the "platform optimizations" from EA early last generation only as examples of what "not to do" on certain platforms.

I'm sure (hope) things are better nowadays, but you can understand my skepticism about the way cross-platform development works over at EA.

just that they have rather severe trade-offs for the peak FPU performance.
Definitely, although I don't think in-order execution is one of the significant trade-offs made.

archie4oz said:
VS 6.0 was the only one I could really tolerate...
2005 is big, horribly slow, and buggy - but it does a surprising amount of things right as far as GUI and customizing go (custom build rules are a great idea, though they still need to polish up some rough edges there).
And at least size and speed are "user-fixable" :p

I think you'd have liked 1.52 though - it was the first tool that made me feel like Windows programming could actually work, plus it was fast and relatively efficient; the last time that happened with the VS series...

We're still mainly using compilers built around the assumption that we're all on shallow register/accumulator architectures
I know, compiler tech tends to be way behind the hw curve. :(

That being said, the SPE ISA bears a lot more IBM philosophy (dare I say Motorola) than it does Sony/Toshiba. The EE and ALLEGREX reflect more of Sony's SIMD design philosophy than do the SPEs (which are clearly AltiVec-derived).
Indeed, and I maintain this is one area where we regressed a lot this generation. Should make for an interesting argument - if we had the choice between a good ISA and OOOe, which would win out in cost/performance.
Maybe I should start a poll :p
 
That being said, the SPE ISA bears a lot more IBM philosophy (dare I say Motorola) than it does Sony/Toshiba. The EE and ALLEGREX reflect more of Sony's SIMD design philosophy than do the SPEs (which are clearly AltiVec-derived).

Good opportunity here for some excerpts from that old 'Engineers' thread:

+ The basic architecture of Cell shaped up in the fall of 2000. It was unprecedented for the APU to have no cache. Many in the development team doubted the usefulness of the tiny 128k dedicated memory called Local Store, but Takeshi Yamazaki of SCE insisted that realtime response is essential for games while a cache interferes with it, and as a result LS was adopted for the APU. Then the APU ISA was debated, generalized VLIW versus object-code-efficient SIMD, at a hotel in NY. Peter Hofstee of IBM succeeded in persuading the Toshiba engineers to go with SIMD.

+ Masakazu Suzuoki of SCE proposed about 200 instructions for the APU, based on the experience of the EE VUs, at a meeting in Austin. In the room, among engineers in their 30s and 40s, one old man had been writing something on paper. He was Marty Hopkins, one of the architects of the IBM 801 RISC machine. What he had written was a sample program in machine language, and he argued that 200 was too many and that 100 would be sufficient for a compiler to work with. Younger engineers verified it with simulation. Defining the APU ISA was actually what he chose as his last job at IBM.

I'll add these as well for general back-story assist:

+ Later the three companies held meetings to discuss the architecture of CELL. The target performance of the project was 1 TFLOPS. Toshiba proposed the Force System, which has many simple RISC cores and a main core as the controller. Jim Kahle of IBM, the POWER4 architect, proposed an architecture with just multiple identical POWER4 cores. When a Toshiba engineer said maybe the Force System didn't need a main core, Kahle was greatly pissed off (thus the title of this chapter), as without a main core POWER would have no role in the new architecture.

+ Meetings continued for several months, and Yamazaki of SCE was inclined toward the IBM plan and voted for it. But Kutaragi turned it down. Eventually Yamazaki and Kahle talked about the new architecture and agreed to merge the Toshiba plan and the IBM plan. Finally IBM proposed the new plan in which a Power core is surrounded by multiple APUs. The backer of the APU at IBM Austin was Peter Hofstee, one of the architects of the 1GHz Power processor. It was adopted as the CELL architecture.

So what I'm seeing is that Toshiba had the idea for the 'Force' system, but IBM countered with a straight Power design. Toshiba countered with an idea excluding even the controller core of their original design, and IBM got ready to walk away. After a bunch of meetings, Kutaragi said 'no' to the multiple-Power plan, and the thinking went back to Toshiba's original Force concept, with the compromise that the central core would be Power. For the surrounding cores, Sony pushed for and succeeded in getting local storage instead of cache, and IBM succeeded in talking Toshiba into SIMD. Sony proposed an SPE ISA based on the EE VU instruction set, but IBM proposed and got a leaner ISA.
 
So what I'm seeing is that Toshiba had the idea for the 'Force' system, but IBM countered with a straight Power design. Toshiba countered with an idea excluding even the controller core of their original design, and IBM got ready to walk away. After a bunch of meetings, Kutaragi said 'no' to the multiple-Power plan, and the thinking went back to Toshiba's original Force concept, with the compromise that the central core would be Power. For the surrounding cores, Sony pushed for and succeeded in getting local storage instead of cache, and IBM succeeded in talking Toshiba into SIMD. Sony proposed an SPE ISA based on the EE VU instruction set, but IBM proposed and got a leaner ISA.

Design by committee... :)
 
In a standard Tomasulo OO core, the ROB may scale linearly in terms of rename registers and even remain fixed in terms of register ports and result buses.

What does not scale linearly is the cost of dependency checking, which can be done with hardware coupled closely with the ROB or in the scheduling hardware. That will scale quadratically: N^2 - N is the trend in the number of necessary checks, though it is usually less by some fixed factor.

In a modern data-capture scheduler the number of register-tag comparators scales with N.
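A toy back-of-the-envelope of that scaling (window sizes are illustrative only, not taken from any real core):

```cpp
// Illustration of the scaling argument above: pairwise dependency checks
// grow roughly as N^2 - N, while register-tag comparators in a
// data-capture scheduler grow linearly with N (per result bus).
#include <cstdio>

int main()
{
    for (int n : {4, 8, 16, 32, 64})
        std::printf("N=%2d  pairwise checks=%4d  linear comparators=%2d\n",
                    n, n * n - n, n);
    return 0;
}
```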

But store queues would be a problem.

Cheers
 
That being said, the SPE ISA bears a lot more IBM philosophy (dare I say Motorola) than it does Sony/Toshiba. The EE and ALLEGREX reflect more of Sony's SIMD design philosophy than do the SPEs (which are clearly AltiVec-derived).
XBD's provided the details, but it's not surprising the ISA bears a resemblance to IBM philosophy when, in the meetings, an IBM guy penciled out an example ISA that worked so well.

ban25 said:
Design by committee... :smile:
Yep. And I dare say we got a more balanced, better thought-out processor as a result (which is probably a rarity for committee-led projects!)
 
Oh, I don't mind a decent IDE, and I'm not exactly the biggest fan of makefiles either. I usually prefer a more simplistic IDE with a good debugger and some kick-ass profiling tools (honestly, I'm most critical of the profiling tools, followed by the debugger and then the compiler; beyond that I can work pretty much with bare-bones tools). The one thing a good IDE can offer me is good build management for prototyping (e.g. granular build settings with custom overrides, dealing with pre-compiled headers (or even better, dynamic predictive compiling w/run-time linking), and distributed building). But those are surprisingly easy features to add to a good text editor.

I can barely live without a good IDE. I miss Eclipse and Java every day of my life that I'm forced to write C++ in VS2003, and I don't really like Java at all. But Eclipse's refactoring tool (among the rest) is the best human invention after the wheel: I can't live without it. I don't write code in Eclipse, I just click here and there and watch the code unfold on the screen. My productivity skyrockets. Such joy.

Bring this power to C++ IDEs, now!

Fran/Fable2
 
Actually, the PPE/Cell/Waternoose cores are about the only PowerPC (or old skool POWER) design that I can think of that *doesn't* have OoOE...

Yes, but most are 32-bit. The POWER series is 64-bit and has been for some time, but those were designed using automated tools whereas the PPE was designed by hand. None of them were designed for the kind of frequencies the PPE / Xenon run at.

I can barely live without a good IDE. I miss Eclipse and Java every day of my life that I'm forced to write C++ in VS2003, and I don't really like Java at all. But Eclipse's refactoring tool (among the rest) is the best human invention after the wheel: I can't live without it. I don't write code in Eclipse, I just click here and there and watch the code unfold on the screen. My productivity skyrockets. Such joy.

I've been playing around with Eclipse/MyEclipse for J2EE work recently after a long break. I have to agree, it's not so much knowing what code to type as knowing which buttons to press!

Design by committee...

Interestingly, the committee came up with a design which closely mirrors a design by one guy - Seymour Cray. Cell has remarkable similarities to the Cray-2.

Quick edit: That is, IBM has a couple of decades' worth of experience designing OOO PPC and POWER architecture cores.

More than that, OOO was first proposed by an IBM guy back in the 1960s.

I find it difficult to put a value on reports that porting single-threaded code to the PPE yields lower performance than a current x86 processor. Well, yes, that's expected. But how relevant is that data point? Does it describe the code the processor will eventually run?

There seem to be a lot of gotchas in the PPE, though the compiler can probably fix many of these. The designers have in effect moved the complexity out of the hardware and onto the shoulders of the developers.
 
I can barely live without a good IDE. I miss Eclipse and Java every day of my life that I'm forced to write C++ in VS2003, and I don't really like Java at all. But Eclipse's refactoring tool (among the rest) is the best human invention after the wheel: I can't live without it. I don't write code in Eclipse, I just click here and there and watch the code unfold on the screen. My productivity skyrockets. Such joy.

Bring this power to C++ IDEs, now!

Fran/Fable2

I've always preferred NetBeans for Java development, though before that I used JBuilder (back when it wasn't so bloated). Code-completion in NetBeans is fantastic. These days it's almost strictly C/C++ in VS2005, though I have been writing some .NET code recently. If you want to improve VS, check out Visual Assist, as well as the aforementioned ViEmu.
 
Yes, but most are 32-bit. The POWER series is 64-bit and has been for some time, but those were designed using automated tools whereas the PPE was designed by hand. None of them were designed for the kind of frequencies the PPE / Xenon run at.

...but frequency is only part of the equation, and as several have already pointed out, both Xenon and PPE are easily outclassed by the PPC 970 in single-threaded performance -- and very likely Conroe across the board.
 
...but frequency is only part of the equation, and as several have already pointed out, both Xenon and PPE are easily outclassed by the PPC 970 in single-threaded performance -- and very likely Conroe across the board.

You are comparing the PPC970 'whole-chip' to the PPE being 1/9th of a chip?
 
I've always preferred NetBeans for Java development, though before that I used JBuilder (back when it wasn't so bloated). Code-completion in NetBeans is fantastic. These days it's almost strictly C/C++ in VS2005, though I have been writing some .NET code recently. If you want to improve VS, check out Visual Assist, as well as the aforementioned ViEmu.

I use the latest VA, with its refactoring tool, but it's not even remotely as good as Eclipse (or NetBeans, from what I've heard of it).

Fran/Fable2
 