Future console CPUs: will they go back to OoOE, and other questions.

I think Carmack, Sweeney, and Newell, in their complaints, were voicing their very real concerns about cross-platform ports. They are all trying to sell game engines, and making these engines work on such vastly different platforms is a very large and ugly problem, like we've seen with some games.

Face it: consoles today aren't playing the same hand-tuned games of yesteryear - this generation especially, because of just how much the cost escalates with prettier graphics. Ports are very popular now, and the new consoles are very unfriendly in that respect, especially compared to the old Xbox. The previous generation was, IMO, sort of a testing ground for massive porting.
 
Choosing between performance that is higher at its peak and performance that is more forgiving is not a matter of future-proofing; it's a trade-off between having more room to grow (or, conversely, more room to fall back to when applications are less optimized) and having a higher baseline of performance.
 
Imagine a PC now with a 65nm Xenon CPU, but only dual core. It could probably run at 4.5GHz at 40 watts with a die size of less than 80mm2. It could be really cheap, cool and fast. Just dreaming :p
 

And it would suck ass at running modern desktop productivity applications consisting of difficult-to-predict, branchy, object-oriented code...
 

But the idea discussed here (taking the idea of this dual-core XeCPU PC chip out of the equation) is that these chips are allowed to suck on those desktop productivity apps, because as far as the end-user is concerned, they should still run 'fast enough.' Like, Excel's not going to slow to a crawl, y'know?

Rather, the apps where media-centric* consumers often do feel the sluggishness should/could enjoy a substantial speedup.

Basically (as a rough philosophical example), encoding times being cut in half from an hour to thirty minutes trumps GP apps' processing times going from one second to two seconds.
 

Ahhh. OK. I have my doubts though. Could it realistically clock that high? What would you run on it (Linux, compiled with GCC)? Would integer/OO performance really be acceptable? There is a deep stack of stuff going on between Excel and the bare metal, and if it all takes a performance hit, maybe we would notice. Remember that until recently Java was visibly slow on the desktop.
 

Well, I'm just sort of laying out the philosophy behind the discussion taking place here in terms of these IOE chips in general and their desktop performance in GP apps. Not speaking of Cell and the XeCPU directly per se. :)

If indeed Linux on PS3 isn't constrained, then I guess the PS3 and its Cell will make for a great proving ground for these ideas we're batting around back and forth here. And I do so hope it's unconstrained.
 
Fair enough. I think nixing OOOe would kill performance for most of my day-to-day PC usage (development, writing documents, browsing the web, etc.) but on the other hand strong streaming FP performance would boost the digital audio and illustration stuff I do - really I want both!


Linux on PS3 with unconstrained access to the hardware would spur me towards getting one. I doubt they'll allow it though; running under a hypervisor with limited or no access to RSX seems more likely :cry: - I can't see them allowing enough access for unlicensed game development.
 
This was one of the factors in Cell's Linux performance. The PPE may be pretty sucky relative to conventional cores, but would the end user notice on optimized code? As has been pointed out, old sub-GHz PCs were fast enough for most tasks, where the computer is mostly waiting for the user to do something. There are a few applications where CPU speed really makes a difference, like games and media encoding, but if those can be better targeted by SIMD engines and the like, the requirement for super-fast conventional processors is negated. For writing docs and web browsing, you only need the power of a 1 GHz x86 processor (say), and perhaps a 3.2 GHz PPE or similar core running at only a third of the efficiency can meet that target. Then, for the actual meat of the workload that taxes a processor, you have more efficient data processors that forgo niceties like OoOE.

I guess you can try it for yourself easily enough. Run Task Manager (or the Linux equivalent) with a load of programs running, and see how demanding they are. Then consider which of those could be mapped onto something like a bunch of SPEs to be sped up, and which need a beefy branch-strong processor. Okay, that last bit is very hard, as we're still in the infancy of learning what programs can be fitted onto SIMD arrays. GPGPU is showing us more and more surprising results in things people didn't think they'd be good at.
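If you want something a bit more repeatable than eyeballing Task Manager, here's a very rough, Linux-only C sketch of the same idea (purely illustrative, not a real process monitor) - it samples /proc/stat twice, a second apart, and reports how busy the CPU actually was:

Code:
#include <stdio.h>
#include <unistd.h>

/* Read the aggregate "cpu" line from /proc/stat (field layout per proc(5)). */
static int read_cpu(long long *busy, long long *total)
{
    long long v[7] = {0};   /* user, nice, system, idle, iowait, irq, softirq */
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    int n = fscanf(f, "cpu %lld %lld %lld %lld %lld %lld %lld",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);
    fclose(f);
    if (n < 4) return -1;
    *busy  = v[0] + v[1] + v[2] + v[5] + v[6];
    *total = *busy + v[3] + v[4];
    return 0;
}

int main(void)
{
    long long b0, t0, b1, t1;
    if (read_cpu(&b0, &t0) != 0) return 1;
    sleep(1);                            /* sample window */
    if (read_cpu(&b1, &t1) != 0) return 1;
    printf("CPU busy over the last second: %.1f%%\n",
           100.0 * (double)(b1 - b0) / (double)(t1 - t0));
    return 0;
}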

Anyway, for myself, I would have been happy to stick with my 800MHz PIII except that I was ray-tracing and stuff. If someone provided a plug-in SIMD board and optimized software to improve those areas, that 800 MHz PIII would still suit me fine for browsing, Word, etc. I find the HDD is still the major bottleneck in Windows anyway. Whenever there's a stutter or jitter and a moment of freeze, it's background tasks farting around rather than the CPU being choked with the workload.
 
I think Carmack, Sweeney, and Newell, in their complaints, were voicing their very real concerns about cross-platform ports. They are all trying to sell game engines, and making these engines work on such vastly different platforms is a very large and ugly problem, like we've seen with some games.

...

The previous generation was, IMO, sort of a testing ground for massive porting.

Yes, and the lesson learnt from the previous generation is that you share assets as much as you can (3D meshes, animation, textures - in short, art) but then write custom engines for each platform, to play to each platform's strengths.

Right now, 360 and PS3 development has some overlap with the PC. In a few years, PCs will have evolved into something else, and this discussion will no longer be relevant - to get any kind of decent performance out of a console you'll NEED a decent engine fine-tuned to that console.

There are a bunch of different directions games could develop in, but as a consumer I am happy with the way it is now. Who knows, some day consoles may just be a certain PC spec fixed for five years, but I don't know. Not now anyway - maybe we can discuss this again once hardware has become so powerful it is no longer relevant to games, when hardware will always be fast enough. Until that day, different platforms will make different choices, and those will result in different game experiences. And I don't think there's much wrong with that.

(This is quite apart from the discussion of whether a freshly designed CPU aimed at supporting users' modern-day needs would look more like an OOOE Pentium or an in-order Cell - I wouldn't be surprised if we got something much like Cell then anyway.)
 
And it would suck ass at running modern desktop productivity applications consisting of difficult-to-predict, branchy, object-oriented code...
I have my doubts about that :)

The Xenon CPU core has a deep pipeline (20 stages) and was designed to run at a high clock rate. The fact is it privileges FP and SIMD code rather than GP code, but it can run at a higher frequency.

The key to comparing two cores is to have both on the same fabrication process and at the same power dissipation, not to compare the cores at the same frequency.

The Xenon CPU core currently uses a 90nm process, takes about 28mm2 of die space, and runs at 3.2GHz. Now let's scale it down to 65nm: we can expect a ~40% increase in clock speed, which means 3.2 x 1.4 = 4.48 GHz. We can also expect almost half the die space.
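Just to show where those numbers come from, here is a back-of-envelope sketch in C, assuming an ideal linear shrink of 65/90 and optimistic classic scaling (area goes with the square of the shrink, clock roughly with its inverse); real processes rarely deliver the full ideal gains:

Code:
#include <stdio.h>

int main(void)
{
    const double shrink     = 65.0 / 90.0;   /* ~0.72 linear scale factor   */
    const double area_90nm  = 28.0;          /* mm2, Xenon CPU core at 90nm */
    const double clock_90nm = 3.2;           /* GHz, Xenon CPU core at 90nm */

    /* area scales with the square of the linear shrink */
    printf("area:  %.0f mm2 -> %.1f mm2\n", area_90nm, area_90nm * shrink * shrink);
    /* clock scales (ideally) with the inverse of the shrink */
    printf("clock: %.1f GHz -> %.2f GHz\n", clock_90nm, clock_90nm / shrink);
    return 0;
}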

Now, will an IOE Xenon CPU core at 4.5GHz be slower than an OOOE core like Conroe?
Give me specific numbers on how much this IOE is less efficient than the OOOE at the same process and power dissipation.
 
(This is quite apart from the discussion of whether a freshly designed CPU aimed at supporting users' modern-day needs would look more like an OOOE Pentium or an in-order Cell - I wouldn't be surprised if we got something much like Cell then anyway.)
Or something more in the middle like the Xenon CPU cores.
 

There's a lot more to it than OOOE vs IOE multiplied by clock speed. There's the whole rest of the CPU to consider. I'm no CPU expert, and I'm sure a real CPU expert could go a lot deeper into Conroe's merits over the PPU, but off the top of my head you're talking about 3x the execution units in Conroe, double or more the issue width, far bigger caches, better branch prediction...

Personally I find it extremely hard to believe, as many seem to be suggesting, that Intel/AMD have done nothing but focus on making things like Excel go faster for the last 10 years, while spending relatively little effort on making the "media centric" applications go faster - which are clearly what they are being judged on in most of the benchmarks, and have been judged on for years.

ERP has talked quite a bit about the subject and seems to be one of the best placed here to give a true picture, but perhaps he (or the other devs) could answer this question for us: if programming exclusively for the architecture of each, which do you think would produce the better results in a game, assuming all have access to an equal GPU (for the sake of argument, let's say the best available today - the X1950XTX) and the same amount of memory at a suitable speed for that CPU?

Core2 Duo, Cell or Xenon?

And of course, why?
 
There's a lot more to it than OOOE vs IOE multiplied by clock speed. There's the whole rest of the CPU to consider. I'm no CPU expert, and I'm sure a real CPU expert could go a lot deeper into Conroe's merits over the PPU, but off the top of my head you're talking about 3x the execution units in Conroe, double or more the issue width, far bigger caches, better branch prediction...

L2 caches you can change (size, associativity, speed).
Now let's compare the pure cores using the same fabrication process.

Personally I find it extremely hard to believe, as many seem to be suggesting, that Intel/AMD have done nothing but focus on making things like Excel go faster for the last 10 years, while spending relatively little effort on making the "media centric" applications go faster - which are clearly what they are being judged on in most of the benchmarks, and have been judged on for years.
Nobody said they have done nothing, but they have not done all they could.
Intel tried with the P4 (deep pipeline), but without all the flops, SIMD performance, etc...
and the x86 legacy doesn't help anyway.

ERP has talked quite a bit about the subject and seems to be one of the best placed here to give a true picture, but perhaps he (or the other devs) could answer this question for us: if programming exclusively for the architecture of each, which do you think would produce the better results in a game, assuming all have access to an equal GPU (for the sake of argument, let's say the best available today - the X1950XTX) and the same amount of memory at a suitable speed for that CPU?

Core2 Duo, Cell or Xenon?

And of course, why?
And using the same fabrication process and power.
 
I think it's already been mentioned in this thread, but games spend a lot of time moving things around in memory, and the ability to keep the execution units running during that L1 miss is very useful. I would not be surprised if the Cell PPE were equivalent to a sub-1 GHz Pentium 3 or Athlon in terms of real-world application performance -- in fact, I expect it to be, given how narrow it is and how much time it's going to spend stalled waiting on cache misses.
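To make the pointer-chasing point concrete, here is a minimal C sketch of the access pattern in question: walk a "list" far bigger than any cache, laid out in random order, so every hop is a dependent load that usually misses. A narrow in-order core has little to do but stall on each miss, while an OoO core can at least keep some other work in flight:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)   /* ~4M entries x 8 bytes: far bigger than any cache */

int main(void)
{
    size_t *next  = malloc(N * sizeof *next);
    size_t *order = malloc(N * sizeof *order);
    if (!next || !order) return 1;

    /* Lay the "list" out as one big cycle visiting indices in shuffled
       order (Sattolo's algorithm), so sequential prefetching can't help.
       rand() is a crude shuffle, but fine for a demo. */
    for (size_t i = 0; i < N; i++) order[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;              /* j < i */
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < N; i++)
        next[order[i]] = order[(i + 1) % N];
    free(order);

    /* The chase itself: every load depends on the one before it. */
    clock_t t0 = clock();
    size_t p = 0, sum = 0;
    for (size_t i = 0; i < N; i++) {
        p = next[p];
        sum += p;
    }
    clock_t t1 = clock();

    printf("checksum %zu, ~%.1f ns per hop\n", sum,
           1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)N);
    free(next);
    return 0;
}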

I speculate there were two main factors that went into the design of the PPE (and Xenon as well):

1. IBM already had the design on the shelf.
2. The console manufacturers (i.e. Sony) were most experienced with in-order processors (R5900, R3000), and were mostly focused on their streaming architecture concept.

So, the decision to use an in-order core has nothing to do with what's best for today's applications; it's strictly the result of extremely narrow design specifications, targeting a specific market with, presumably, a lack of experience in the broader range of CPU microarchitecture.

Anyone run GCC on the R5900? Of course, that chip had a lot going against it, but still -- it was damned slow.
 
It's not in the middle, it's just SMP on a chip with fairly traditional cores.

If it at least had scatter/gather DMA ...
True, but it has some extra VMX registers and DOT3 - more SIMD and flops in general. And it doesn't have the x86 legacy.
 
Does an OOE processor have an innate advantage with multitasking? And, considering most applications aren't coded by people who know how to optimize things, I think performance would be shockingly terrible in most situations. I think it would be unrealistic to switch to IOE on such a general platform.
 
I have my doubts about that :)

Now, will an IOE Xenon CPU core at 4.5GHz be slower than an OOOE core like Conroe?
Give me specific numbers on how much this IOE is less efficient than the OOOE at the same process and power dissipation.

I can't give you specific numbers, but I think a number of things would need to happen before such a CPU would beat Conroe at a desktop app such as a spreadsheet: for example, I'd expect you'd need a compiler better than GCC for the PPE ISA, presumably with profile-directed branch optimization and other such techniques, to make up for the pretty hefty drop in IPC you'd expect from IOE plus a deep pipeline.

To reverse the question, what performance would you expect from the Xenon CPU on pointer-chasing OOP method calls versus Conroe?
 
I can't give you specific numbers, but I think a number of things would need to happen before such a CPU would beat Conroe at a desktop app such as a spreadsheet: for example, I'd expect you'd need a compiler better than GCC for the PPE ISA, presumably with profile-directed branch optimization and other such techniques, to make up for the pretty hefty drop in IPC you'd expect from IOE plus a deep pipeline.
I don't know about that part, but aren't MS/IBM providing good compilers for the Xenon CPU?

To reverse the question, what performance would you expect from the Xenon CPU on pointer-chasing OOP method calls versus Conroe?
Anyway, my blind guess is a 30% decrease on general code. Then a 3.2GHz Xenon CPU core is probably more like a ~2.2GHz Conroe core, which should be fine for most SOHO applications. The problem is that when someone says to you "Hey, we will give you a 3.2GHz RISC" you expect more, and with higher expectations comes higher frustration. Developers probably had higher expectations.

Then a 4.5GHz Xenon CPU core is more like a ~3GHz Conroe (both on a 65nm process).

But when you need SIMD and flops you will have a different situation :)
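To make that last point concrete, this is the kind of loop the "SIMD and flops" argument is about: a branch-free streaming kernel (a simple a*x + y over big arrays) that maps cleanly onto VMX/SPE-style vector units, and on which an in-order core gives up very little to a fat OoO one. Purely an illustrative C sketch - compile with optimisation (e.g. gcc -O3) so the compiler can vectorise it:

Code:
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void)
{
    float *x = malloc(N * sizeof *x);
    float *y = malloc(N * sizeof *y);
    if (!x || !y) return 1;

    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }

    const float a = 2.5f;
    for (int i = 0; i < N; i++)      /* 2 flops per element, no branches,  */
        y[i] = a * x[i] + y[i];      /* no pointer chasing: SIMD heaven    */

    printf("y[123] = %f\n", y[123]); /* use the result so it isn't optimised away */
    free(x); free(y);
    return 0;
}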
 