Future console CPUs: will they go back to OoOE, and other questions.

Yes, 1 cycle access. It didn't work very well though, they found a slower but bigger cache worked better. The instruction issue on the 21164 was quite limited though and the 264 also improved on that.

I'm fairly certain that a 1-cycle load-to-use latency is what worked best for the 21164, otherwise the Alpha designers would have opted for something different. They didn't just build it without prior knowledge of how it would perform on code (tons of codes analyzed before design).

What allowed the 21264 to perform better with higher L1 load-to-use latency was the self scheduling capability.

You're right about the 21164 having strict instruction issue: it was a 4-wide issue CPU, but two slots had to be integer and two had to be FP. The 21264 has the same FP execution unit resources, yet it performed more than twice as well on SpecFP, which in general has much more exploitable parallelism than SpecInt does. On paper that should favor the in-order design, but it doesn't.

On "general purpose" (to which I read "branchy integer") stuff yes, but on SIMD FP I doubt any desktop processor will touch Xenon and certainly not Cell. On heavily threaded server stuff nothing touches Niagara.

You still need to feed your FP execution units, you still need to store the results, you still need control structures to tell you what to do when. There's a surprising amount of integer "chores" in most FP codes.

Cheers
 
Well in theory Xenon is amazingly powerful. The problem, as we've seen with many of the games so far, is that it's very, very hard to get that speed in an app. Performance relies on highly talented programmers tailoring their code to the CPU. And from all the whining from major devs, "minor" people like Carmack, Sweeney, and Newell, it's really a bitch to pull off.

Today is the time of ports it seems, as almost every game is multi-platform. That doesn't spell happy times for architectures which need a lot of hand holding to get their peak performance, especially when the PS3 is equally challenging but it requires a totally different approach. I wonder if Wii will be better off for ports because its CPU should at least have a semblance of OOO like Cube? I've read Cube's was very basic though.

If you look at one of the most awful ports, say Quake 4, you see how awful things can be. Quake 4 will run fine on 3 year old PC CPUs as long as you have a decent GPU from the past 2 years or so. In fact, it will run like wildfire on a P4 2.6 with an X800XL and at higher quality settings than Xbox 360.

So, IMO, the OOO P3 was a nifty gem for Xbox to have last gen that maybe they should've appreciated more. It certainly would make porting easier. Problem is it obviously is easier and cheaper to really get a lot of theoretical performance out of in-order, and that is very, very good for marketing. I don't think cost of the CPU was as big a deal as people seem to think. MS at least partially paid to develop this new CPU, for sure. There was no R&D cost for a Mobile Celeron 733.
 
So, IMO, the OOO P3 was a nifty gem for Xbox to have last gen that maybe they should've appreciated more. It certainly would make porting easier. Problem is it obviously is easier and cheaper to really get a lot of theoretical performance out of in-order, and that is very, very good for marketing. I don't think cost of the CPU was as big a deal as people seem to think. MS at least partially paid to develop this new CPU, for sure. There was no R&D cost for a Mobile Celeron 733.

a good point. i think MS actually got a bit misled. i think they really wanted to create the gamecube of this generation. for which they approached IBM, assuming the latter would deliver promptly, having experience with console-tailored reasonable PPCs. well, for one reason or another it turned out IBM could not do it this time. i guess at one point there was a decision to be made: a couple (at most) of relatively low-clocked OOOe cores (say, based on 970) or more and higher clocked cores but in-order. at this moment somebody somewhere decided that multiple, high-clocked vector units were of higher importance, plus i guess some PR played a role too - after all the competition had this mysterious & threatening chip on the cooker with its gazillion flops. so the choice was made - ppe-style cores with fat SIMD units. plenty of them, with the mandatory SMT topping.
 
Well in theory Xenon is amazingly powerful. The problem, as we've seen with many of the games so far, is that it's very, very hard to get that speed in an app. Performance relies on highly talented programmers tailoring their code to the CPU. And from all the whining from major devs, "minor" people like Carmack, Sweeney, and Newell, it's really a bitch to pull off.
It might be extremely easy when you know how to do it. Like with most everything else.

They just aren't used to it, haven't got the right mindset, tools and libraries yet, and have to experiment. That's what makes it so hard. No Googling for the right solution for them! But that will change, fast. And that is the part of programming I enjoy most.

;)
 
DiGuru said:
It might be extremely easy when you know how to do it. Like with most everything else.
Having seen first hand what a decent PC dev does with it - I don't see it ever becoming extremely easy. Obviously it'll get a lot better than it is now though.
 
Going back? They never went there in the first place... The Xbox is the exception, and the GCN barely qualifies (the Bandai Pippin was the first of those "barely" ones, and the M2 would've been the second).
 
http://www.xbitlabs.com/news/cpu/display/20050913222050.html
This is getting more and more ridiculous. First you compare a VIA Eden CPU intended for low cost fanless embedded SBC applications with the latest Athlon CPU, claiming the difference is down to oooe. Now you quote a $40 average die manufacturing cost for the P4 (note: die cost, not chip cost) to prove how the latest dual core Athlons and Conroes could have been viable in the Xenon on a cost basis. If you had bothered to read the link you quoted:
Still, an average cost of a die is not necessarily an indicator of average manufacturing cost of a processor, as final products need to be tested, qualified and packaged.

I am looking at and comparing the cost to MICROSOFT of the CPU chip they are putting into the Xbox 360. The die cost you are quoting is the average cost to the foundry of creating a die, which is a small piece of bare silicon, excluding overheads. Also I believe the $40 quoted is typical for a single core P4, not a dual core CPU.

The die cost does not include the cost of packaging, which involves mounting the die on the package and bonding gold wires between the die and the pins; testing each chip; allowing for the cost of defective chips you have to throw away (likely to be around 60% for a large chip at the early phase of its life); and overheads and profits to pay for salaries, maintaining the manufacturing plant, and return on investment. All of these are significant costs. The total cost can easily come to 3 times or more the cost of manufacturing a die. $160 is about right for the cost of a tested and packaged low end dual core Athlon X2 chip to Microsoft. Look up bulk trade prices on components to get an idea of the cost of supplied, tested components.
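The die-to-delivered-chip markup being described can be sketched as a toy cost model (all inputs are illustrative placeholders, not real 2005 figures):

```python
# Rough sketch of how bare-die cost turns into delivered chip cost.
# Every number here is a made-up placeholder for illustration only.

def delivered_chip_cost(die_cost, yield_rate, package_and_test, overhead_multiplier):
    """Cost of a good, packaged, tested chip to the buyer."""
    good_die_cost = die_cost / yield_rate        # amortize discarded defective dies
    manufactured = good_die_cost + package_and_test
    return manufactured * overhead_multiplier    # salaries, plant upkeep, margin

# e.g. a $40 die with hypothetical yield, packaging, and overhead figures
cost = delivered_chip_cost(die_cost=40.0, yield_rate=0.60,
                           package_and_test=15.0, overhead_multiplier=2.0)
print(round(cost, 2))  # → 163.33
```

With these particular placeholder inputs the model lands near the $160 ballpark quoted above, which is the point: the die is only a fraction of what the console maker actually pays.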

As I said, current AMD and Intel dual core chip prices are too high for them to be used in consoles. This would have been even more the case when the XBox 360 started production one and a half years ago (if production of a dual core was indeed feasible for the XBox 360 deadline).


Gubbi said:
From AMD

Of interest is this recent discussion over at Aces hardware.

You are quoting from what someone has posted in a forum?

Here are some more reliable sources for dual core die size.

http://www.overclockers.com.au/article.php?id=489587
Conroe has a die size of 143 square mm and comes with 291 million transistors. As with its predecessor, Conroe is produced with 65nm technology. But Presler comes with 376 million transistors and a die size of 206 square mm. The smaller number of transistors is probably one reason why Conroe’s TDP (Thermal Design Power) could be reduced by up to 40% compared with Pentium D. The technical description provided to us by Intel shows that Core 2 Duo will run with a core voltage range (VID) of 0.8500 volt to 1.3625 volt. The voltage range is dynamically regulated, which we will explore later in this article.

Conroe reduces die area by sharing the cache between the two cores and comes in at 143mm2 on a 65nm fab. If it were done on the 90nm fab which was state of the art when Xbox 360 manufacture started, the die area would be 143 x (90/65)^2 = 274mm2
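The scaling rule used here is ideal area scaling with the square of the feature size (real processes scale less than perfectly, as noted later in the thread):

```python
def scale_die_area(area_mm2, from_nm, to_nm):
    """Ideal area scaling between process nodes: area scales with
    the square of the feature size. Real-world scaling is worse,
    since some layers (e.g. the lowest metal) barely shrink."""
    return area_mm2 * (to_nm / from_nm) ** 2

# Conroe: 143 mm^2 at 65nm, naively scaled back to a 90nm process
print(round(scale_die_area(143, 65, 90)))  # → 274
```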

http://www.gamepc.com/labs/view_content.asp?id=coreduo&page=2&cookie_test=1

http://www.shopping.com/xPR-AMD_AMD...GHz_Socket939_1MB_BOXED_w_fan~RD-193833766532
The smallest and cheapest desktop dual core CPU is the AMD Manchester die which is 199mm2 at 90nm process. The cache is reduced to reduce die size.
AMD Athlon64 X2 4200+
With this impressive justification of my "investment", I dared to move on to benchmarks. The graphics and CPU test 3DMark05 is very demanding and returned a respectable 4950 score for my system. Moving on, the Sandra 2005 cpu benchmark determined 18700 Mips (Dhrystone) and 6996 MFlops (Whetstone FPU) or 9058 MFlops (Whetstone iSSE). Now that's almost twice the numbers of the Athlon 64 3500+ or 3.4GHz Pentium IV. Impressive!
10 Gflops!

The Intel Core Duo (Yonah core) is a low power, cut down P4 dual core with shared cache for portables. The die size of the low cache size version is only slightly larger than the Xenon's (90mm2 at 65nm, or about 173mm2 scaled to 90nm). However its performance is similar to the Pentium M, except that the SSE extensions and FP performance are a little faster than the Pentium M's (but nowhere near the Xenon ballpark).

http://wiki.free60.org/Xenon
Xenon's die is 168mm2 at 90nm process. 115GFlops peak, maybe half that in real benchmarks.
Good floating point performance is important for a console. How much more die area would be required to raise the 199mm2 Athlon X2 core to desirable levels for gaming?
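For what it's worth, the quoted peak falls out of a simple cores-times-clock-times-flops product; the 12 flops/cycle/core below is my back-derived assumption chosen to reproduce the 115 GFLOPS figure (commonly attributed to VMX128 multiply-add plus dot-product throughput), not a datasheet value:

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle_per_core):
    """Theoretical peak throughput: every unit busy every cycle."""
    return cores * clock_ghz * flops_per_cycle_per_core

# Xenon: 3 cores at 3.2 GHz; 12 flops/cycle/core is an assumption
# that matches the quoted ~115 GFLOPS peak.
print(round(peak_gflops(3, 3.2, 12), 1))  # → 115.2
```

As the post says, sustained throughput on real code is a fraction of this, since it assumes no stalls, perfect scheduling, and pure multiply-add work.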

As I said, it is easy to see why Microsoft picked the specs they did for Xenon
1) Triple core oooe chip was not technically feasible when mass production needed to start.
2) Dual core oooe chip was too expensive and probably could not be done in time for the manufacturing deadline.
3) Single core oooe chip performance not adequate.
 
The Intel Core Duo (Yonah core) is a low power, cut down P4 dual core with shared cache for portables. The die size of the low cache size version is only slightly larger than the Xenon's (90mm2 at 65nm, or about 173mm2 scaled to 90nm). However its performance is similar to the Pentium M, except that the SSE extensions and FP performance are a little faster than the Pentium M's (but nowhere near the Xenon ballpark).
Uh nope. Yonah is a dual core Dothan with shared cache. Dothan is a highly modified modernized P3 using P4's bus. Top end Yonah cores will lay waste to basically all P4s in most applications (especially games). It will also do it while using like 30W max power.

As I said, it is easy to see why Microsoft picked the specs they did for Xenon
1) Triple core oooe chip was not technically feasible when mass production needed to start.
2) Dual core oooe chip was too expensive and probably could not be done in time for the manufacturing deadline.
3) Single core oooe chip performance not adequate.

1) sure.
2) not necessarily. Additional R&D costs for IBM's PPC Xenon vs. chip cost alone for an Athlon, say. People always forget the biggest expenditure: R&D.
3) Definitely questionable. Theoretical in-order performance looks to be of dubious, very unproven value. Higher development costs to get the supposedly great performance. Difficult porting. But, yes, if they have enough wicked coders, the performance could be nice. And of greater importance perhaps, just how many companies have enough amazing coders to optimize the ubiquitous cross-platform title for all platforms? That's the big complaint from industry vets who've been wanting to put their PC games on 360/PS3. The new CPUs make it very, very hard. Already released dud ports just prove this.
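The R&D-versus-unit-cost trade-off in point 2 can be sketched as a simple break-even model (every figure below is hypothetical, not an actual Xbox 360 number):

```python
# Hypothetical break-even model: custom silicon trades a large one-off
# R&D bill for a lower per-unit cost. All figures are made up purely
# for illustration.

def total_cpu_cost(rnd_cost, unit_cost, units):
    """Total cost of the CPU program over a console's production run."""
    return rnd_cost + unit_cost * units

units = 20e6  # hypothetical 20-million-console run
custom = total_cpu_cost(rnd_cost=500e6, unit_cost=50, units=units)
off_shelf = total_cpu_cost(rnd_cost=0, unit_cost=100, units=units)
print(custom < off_shelf)  # → True: R&D amortizes away over a big enough run
```

The point being argued either way in the thread is which side of break-even a real console run lands on, which depends entirely on the actual R&D bill and per-chip prices.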
 
Console CPUs just aren't a good indicator of where CPUs are going.
They're generally compromised for cost first, and I still believe that the parts are selected as much by the marketing departments as by the technical departments :p

Ports from PC are going to be hard on all the platforms this gen.

I've said before most apps are limited by cache misses and memory accesses not by FPU performance. If you port those type of apps from a modern PC you're in for a world of hurt on either Xenon or PS3.

If you're building something ground-up for consoles, then you're better off treating both as PS2s and using the sensibilities that requires rather than treating them as PCs.
 
If you're building something ground-up for consoles, then you're better off treating both as PS2s and using the sensibilities that requires rather than treating them as PCs.

Sorry to go back in time, but this then leads us to a great comparison between the unique architecture of the PS2 and the generic architecture of the Xbox.

Which was easier to develop for? And by that I don't mean which was more powerful - rather, which platform gave the best tools and simplest coding practices for the developer?

I know been there done that.. got the T-shirt but a reminder might be nice :)

I know.. you're going to say GameCube to really throw a monkey wrench in this futile exercise.
 
That depends on your background and perspective... For me, I'm much more comfortable working on goofy embedded processors with emacs or vi (although now TextMate is making a very nice impression on me), an assembler, compiler, linker and debugger... I typically find environments like Visual Studio and Eclipse to be big complex morons getting in the way. Others can't live without them and swear by them. To each his own.

Also I've spent more of my time on MIPS and PowerPC processors than I have x86 (hell, I think I could say I've got my 68k and Z80 under my belt better than x86), and I absolutely love trying to work serial problems into parallel solutions (particularly SIMD), and I generally don't get near DirectX. I'm not particularly fond of C++ (I'm more of a C guy), but I'm totally into Objective C, Erlang, and unlike many, I really love Lisp. But again, that's just me. For me, I'd find the Xbox more alien than the PS2 or GCN, where I'd feel quite comfortable. I'd much rather deal with an in-order processor, where the behaviors are more predictable and I can work around them, than a relatively complex OoOE processor that leaves me guessing as to why I'm getting a particular result. But hey, again, that's just me...
 
... But hey, again, that's just me...

I don't think it's just you - I never really got on with the PC as a development platform either, and so I probably preferred the PS2.

I think we're definitely in a minority though.

I'm not going to say I prefer vi over Visual Studio however... I did (and still do) most of my development in VS, and did a lot of PS2 stuff on Codewarrior too. Give me an IDE with integrated debugger over command-lines and makefiles any day.

The development environment on PS2 almost certainly wasn't as polished as that for XBox - Microsoft obviously came in with a big advantage there, which they continue to leverage now. However the tools for PS2 were "good enough" and I preferred the platform, so generally I was happy.
 
Microsoft definitely has the edge in development tools, and for the majority of developers, the more comfortable platform. But there's no one-size-fits-all tool out there and even Microsoft doesn't use VS for everything. Plenty of those guys are using editors like gvim or visual slick edit, with nmake for builds.

By the way, VS 2005 users who long for vi should check this out:

http://www.viemu.com/

Personally, I can't live without it!

 
You are quoting from what someone has posted in a forum?

Why not? I'd be happy to quote from a discussion in this forum.

You can read and think for yourself.

On this page there's a die photo of a X2. Calculate the size of one core compared to the rest of the chip, multiply by 230mm^2, you'll get 32-33 mm^2
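That estimate is just a ratio measurement off the die photo; the 14% core fraction below is an illustrative assumption chosen to land in the quoted 32-33 mm^2 range, not a measured value:

```python
def core_area_from_photo(core_fraction_of_die, die_area_mm2):
    """Estimate one core's silicon area from the fraction of the
    total die it occupies in a die photo."""
    return core_fraction_of_die * die_area_mm2

# If one K8 core occupies ~14% of the 230 mm^2 X2 die photo
# (hypothetical fraction for illustration):
print(round(core_area_from_photo(0.14, 230), 1))  # → 32.2
```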

Here are some more reliable sources for dual core die size.

http://www.overclockers.com.au/article.php?id=489587
Conroe reduces die area by sharing the cache between the two cores and comes in at 143mm2 using 65nm fab. If it was done using 90nm fab which was the state of the art when the XBox manufacture started, the die area would be 143 x (90/65)^2 = 274mm2
Apples and oranges. I was saying that a K8 core in 90nm was ~30mm^2; you're quoting numbers for a brand new core in 65nm.

Not really surprising that a new 2006 state of the art CPU is bigger than what came before it, is it?

But anyway, look at the numbers and the die photo. The cores themselves take up ~35mm^2 a piece, with the rest going to the 4MB cache.

And your scaling presumes perfect scaling from 90 to 65nm, which with 100% certainty isn't the case, since Intel themselves has said that M1 (lowest metal layer) could only be scaled by 5%.

Xenon's die is 168mm2 at 90nm process. 115GFlops peak, maybe half that in real benchmarks.

^^,

Cheers
 
Yonah is a dual core Dothan with shared cache. Dothan is a highly modified modernized P3 using P4's bus. Top end Yonah cores will lay waste to basically all P4s in most applications (especially games). It will also do it while using like 30W max power.

On FP & media operations Yonah is quite a bit behind the P4.

As I said, it is easy to see why Microsoft picked the specs they did for Xenon
1) Triple core oooe chip was not technically feasible when mass production needed to start.
2) Dual core oooe chip was too expensive and probably could not be done in time for the manufacturing deadline.
3) Single core oooe chip performance not adequate.

1 was perfectly feasible.
2 actually happened - the 970MP turned up around the same time as the XBox360.

Microsoft could have used an OOO core if they really wanted it. Xenon was built to order for MS so IBM would have given them whatever they desired.

--

One thing nobody seems to be mentioning is that in order for an OOO processor to deliver high SIMD FP throughput it has to be clocked high. Not only that, the chips all have to run at the same rate, be fairly low powered, supplied in millions and most of all - be cheap.

High end chips from Intel or AMD are made in surprisingly small numbers, the vast majority of the chips they make are for low end machines. They might be able to make a chip capable of the same FP throughput but they wouldn't be able to supply them in any numbers.
 
ADEX said:
1 was perfectly feasible.
2 actually happened - the 970MP turned up around the same time as the XBox360.
1. At 3.5Ghz?
2. Maybe something changed there recently (I really didn't bother following this) but the 970MP was one of IBM's phantom chips last I checked (alongside the 970FX and a bunch of 750xxx derivatives).
On paper IBM has talked about them forever (and they looked really nice), but I'm not aware of any actual real world showing for any of them (but like I said, I haven't followed this development recently. And at any rate, you'll have one hell of a time arguing that the MP could have been ready for very large volume production in 2004).

MrWibble said:
I'm not going to say I prefer vi over Visual Studio however... I did (and still do) most of my development in VS, and did a lot of PS2 stuff on Codewarrior too. Give me an IDE with integrated debugger over command-lines and makefiles any day.
I do most of my work in VS as well - but that doesn't mean I don't have a bunch of issues with it.
The one thing I will always bitch about is the fact it scales power requirements faster than games do. 2005 is borderline unusable on even the fastest single core CPU available - ever since Intellisense became a resource-gobbling monstrosity, the thing requires a minimum dual-core configuration if you want work to be anything other than an exercise in frustration (and with offices taking forever to issue upgrades when asked for them - you can imagine this frustration goes on for a while).
And let's not forget the ratio of good to bad versions is about 1:3-4. Looking back, I can only pick 3 usable versions of VS (1.52, 6.0, and 2005); everything in between was a mess in my experience, and what makes it worse is that nowadays I don't even require extensive use of advanced features - it's the basic stuff that tends to be bad (2005 still maintains a couple of stupid basic bugs, but at least the GUI has evolved beyond the stupidity of the original .NET incarnation).
But don't mind me - I just like to rant about devtools. :p

But back on topic, I happen to agree with Archie and Wibble about platform preference.
archie said:
where the behaviors are more predictable and I can work around them than a relatively complex OoOE processor that leaves me guessing as to why I'm getting a particular result. But hey, again, that's just me...
It's not just you; it's one of the reasons I consider the SPEs to be a much better design than the PPE.

Tahir2 said:
rather which platform gave the best tools and simplest coding practices for the developer?
The first is rhetorical I imagine :p
I think coding practices were probably simpler on PS2 though - but maybe I feel like that just because the whole graphic subsystem had a set of simple rules that were clear and out in the open.
 
1. At 3.5Ghz?

It wouldn't be even close to that clock rate, probably well under 2GHz.

2. Maybe something changed there recently (I really didn't bother following this) but 970MP was one of IBMs phantom chips last I checked (alongside with 970FX and a bunch of 750xxx derivatives).
On paper IBM has talked about forever(and looked really nice), but I'm not aware of any actual real world showing for any of them (but like I said I haven't followed this development recently.
970FX is the 90nm version of 970, it's been in Apple machines since 2004.
970MP replaced the 970FX in PowerMacs in 2005.

The 750VX did a disappearing act but I never heard much about it.

And at any rate, you'll have one hell of a time providing an argument that MP could have been ready for very large volume production in 2004 ).

The 970MP uses a core which is a variation on the core in the POWER4, which shipped as a dual core processor in 2001...

They could supply them in numbers but not at the frequency / power / price required for a console. Actually price might not have been a problem as the 970MP isn't very big.

Thanks for spelling out for me just why pure FP performance is not indicative of actual resulting game performance

On PC's evidently not.

However this thread has asked the question if OOO is better than in-order. There were OOO cores available for use but both Microsoft and Sony have gone for high frequency in-order designs with masses of SIMD-FP capabilities. They'd probably both like OOO but it's evidently not the most important feature.
 
However this thread has asked the question if OOO is better than in-order. There were OOO cores available for use but both Microsoft and Sony have gone for high frequency in-order designs with masses of SIMD-FP capabilities. They'd probably both like OOO but it's evidently not the most important feature.

Well, as has already been pointed out, they both went to the same vendor and more or less got the same core from off IBM's shelf (it was previously designed as part of a research project to achieve high clockspeeds, or so I recall).

And you may make the case that the 970 doesn't reach the same 3.2 GHz frequency as Xenon, but the issue-width is considerably wider. In fact, I wouldn't be surprised if the dual PPC970 X360 devkits yielded better overall CPU performance than the final machine.
 