2005/2006 PC versus PS3/Xbox 360

Well IPC is about the same as the E6750, so given Ninjaprime's figure of "26.8% instruction advantage" and the fact that the PII has a ~900MHz clockspeed advantage, you can do the math.

What is meant by "26.8% instruction advantage"? Does this mean that the E6750 basically executes general code 26.8% faster than Xenon?

I'm just trying to figure how a 2 core 2 thread CPU with lower clocks comes to beat a 3 core 6 thread CPU. I suppose OoO execution and big caches help, but I suppose there are more factors at play.
 
This PC basically can't play any current game... both the CPU and the graphics card are quite slow even for a PC. The 7900GS is quite slow in modern DX9 games, and the Pentium 4 is single-core with very low IPC. Both consoles are MUCH better at running games, and even if devs could optimize games for this single PC configuration, I don't think they would get as much out of it as they are getting from the consoles.

The graphics card running some multiplatform games at similar(?) settings to those of the consoles:
[benchmark screenshot] (30 on the consoles)

[benchmark screenshot] (60?)

[benchmark screenshot] (30)

[benchmark screenshot] (30)

http://en.inpai.com.cn/doc/enshowcont.asp?id=7973&pageid=7970
 
Well IPC is about the same as the E6750, so given Ninjaprime's figure of "26.8% instruction advantage" and the fact that the PII has a ~900MHz clockspeed advantage, you can do the math.

What is meant by "26.8% instruction advantage"? Does this mean that the E6750 basically executes general code 26.8% faster than Xenon?

I'm just trying to figure how a 2 core 2 thread CPU with lower clocks comes to beat a 3 core 6 thread CPU. I suppose OoO execution and big caches help, but I suppose there are more factors at play.

Pretty much what you said. Xenon is simple in-order and executes one instruction per thread: 6 threads, 6 IPC, 2 IPC per core, 19,200 MIPS. Your OoO C2D manages to execute 9.2 IPC across 2 cores, for 4.6 IPC per core and 24,344 MIPS, though it needs almost twice the transistors to do it. Of course MIPS aren't exactly the best performance unit per se, but I think it's relevant here. More interestingly, Sandy Bridge almost doubles that: the 2600K averages 9.43 IPC per core, 128,300 MIPS.
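
If anyone wants to sanity-check those figures, here's the rough arithmetic (just illustrative; I'm assuming stock clocks of 2.66GHz for the E6750 and 3.4GHz for the 2600K, and treating MIPS as sustained whole-chip IPC times clock):

```python
# Back-of-envelope: MIPS ~= sustained IPC per core x cores x clock in MHz.
# IPC figures are the ones quoted above; clocks are assumed stock values.
def mips(ipc_per_core, cores, clock_mhz):
    return ipc_per_core * cores * clock_mhz

print(mips(2.0, 3, 3200))   # Xenon: 3 in-order cores, 2 IPC each -> 19200
print(mips(4.6, 2, 2660))   # E6750 at stock 2.66GHz -> ~24500, close to the 24344 quoted
print(mips(9.43, 4, 3400))  # 2600K at stock 3.4GHz -> ~128200, matching the ~128300 quoted
```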
 
Xenos performance is pretty much unknown AFAIK, but in theory it's pretty neat with its eDRAM, memexport and unified shader architecture. I see some poor texture filtering in 360 games sometimes though (bilinear!), so I wonder about the hardware on that front.
No need to wonder; Microsoft has released a huge amount of information about the X360 and Xenos in their Gamefest slides.
http://www.microsoftgamefest.com/pastconferences.htm

Xenos has 32KB of texture cache, so if it isn't used carefully one can easily thrash the cache and thus hurt performance.
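
To get a feel for how small that is, here's a quick back-of-envelope sketch (the 32-bit uncompressed texel format and the access patterns are my own illustrative assumptions, not something from the slides):

```python
# Rough working-set math for a 32KB texture cache.
cache_bytes = 32 * 1024
texel_bytes = 4                        # assumed uncompressed 32-bit RGBA texels
texels_in_cache = cache_bytes // texel_bytes
print(texels_in_cache)                 # 8192 texels, i.e. roughly a 90x90 pixel region

# Coherent, screen-aligned sampling reuses those texels heavily; widely scattered
# or dependent reads across a large texture miss almost every time, which is the
# kind of thrashing mentioned above. Compressed formats (0.5-1 byte per texel)
# stretch the effective footprint by 4-8x.
```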
 
Interesting. Thank you for that :)

You should have a look at some tables that detail at what point a graphics card becomes bottlenecked by the CPU. If you are comparing a CPU/GPU combination with a console, you cannot draw conclusions about the strength of the CPU directly, because on consoles and on PC the most demanding tasks don't have to run on the same chips. Physics tasks, for instance, will more likely run on the GPU on PC, and on the CPU on console. CPUs are typically not involved in the graphics pipeline on PC nearly as much as on console.

Perhaps someone with more active hands-on multi-platform development experience could say something more useful about this, but I think the above is an important factor. The GTX 260, and even the 8800, are quite a bit more powerful than the GPUs in the consoles.
 
The 8800GTX has ~3.5x the pixel fillrate and >5x the memory bandwidth of both consoles, never mind the ALU resources. It was somewhat faster than 7900GTX SLI on PC. You could run Oblivion at, say, 2048x1152 and get 60fps.

That was 2006. :) Granted, it was $500 and burned about 175W of power...
 
The C2D needs close to double the transistor count of Xenon to do it though; in terms of perf/die area or perf/transistor it loses.

Going back a bit, I do feel compelled to note that my E6750 always runs at 3.2GHz, so the perf/die loss isn't as bad. Also, my CPU has a lot of L2 cache, which takes up a lot of transistors and die area. The cut-down Conroes with half the L2 don't perform that much worse.

Power wise I suspect my CPU doesn't draw much more than Xenon (at least the earlier revisions) since I have it undervolted (even with the 540MHz overclock).

The point I'm trying to make is, would Microsoft be better off with a CPU more similar to a modern PC CPU than what they ended up with? Or in other words, with the same power and die area constraints, would developers prefer a C2D type CPU over Xenon?
 
I don't think PC hardware is ever a good choice, because console makers save money with custom chips with maximum integration, and with some control over the IP, manufacturing and such. Things didn't go very well with the Xbox 1 for some of those reasons.

Also, Core 2 came out in 2006. Xenon should be compared to the Athlon 64 X2 and Pentium D, two chips that sold for huge money in 2005. I suppose Core Duo could also be considered, but it has weak FPU/SIMD and a lower clock rate.
 
The point I'm trying to make is, would Microsoft be better off with a CPU more similar to a modern PC CPU than what they ended up with? Or in other words, with the same power and die area constraints, would developers prefer a C2D type CPU over Xenon?

What part are you thinking could have been in the xbox360 in 2005 that fits the die size and power constraints of XCPU that wouldn't have cost more and/or been an ultimately weaker part?
 
I don't think PC hardware is ever a good choice, because console makers save money with custom chips with maximum integration, and with some control over the IP, manufacturing and such. Things didn't go very well with the Xbox 1 for some of those reasons.

I don't mean a straight copy of a PC chip planted in the console. Of course they could lose some cache and other things that aren't really necessary for a gaming console and come up with a smaller, more power-efficient chip. But if the general architecture of the C2D had been used for Xenon instead of the in-order processor that they ended up with, would that be considered a better choice?

What part are you thinking could have been in the xbox360 in 2005 that fits the die size and power constraints of XCPU that wouldn't have cost more and/or been an ultimately weaker part?

There is no particular part I have in mind since like I said I don't think a straight up PC CPU would be ideal for a console. Too many wasted transistors on unnecessary things. But if the general architecture of C2D had been used with the proper considerations for building a console CPU, how would such a chip compare to Xenon? This is very much a hypothetical question and may be simply impossible to answer, but there are some smart folks here and I'd like to hear their opinions. :smile:
 
But if the general architecture of C2D had been used with the proper considerations for building a console CPU, how would such a chip compare to Xenon?
Intel released the Pentium D (dual core) just before the Xbox 360 launch. It was brand new. Even the Pentium D would have had a hard time being ready for the launch (console hardware must be ready considerably earlier than the launch, as all SDKs, libraries and software need to be developed and optimized for it, developers need final devkits before the launch as well, and the processor needs to be ready for high-yield mass production). It would be better to compare Xenon to the Pentium 4 (with HT) and Athlon X2 instead.

The first Core 2 chips were dual cores (glued-together quads came later, and real quad cores much later), and Core 2 didn't have hyperthreading either. So you would be comparing a lower clocked (2.93GHz) 2 thread CPU with a higher clocked (3.2GHz) 6 thread CPU. Still, the Core 2 would likely have performed better on average, but not enough to warrant a one to 1.5 year later console launch date. I think Microsoft made a really good choice by picking a PPC core instead of an x86 core, considering the x86 options available at that time.
 
So you would be comparing a lower clocked (2.93GHz) 2 thread CPU with a higher clocked (3.2GHz) 6 thread CPU
Agreed on your analysis (there was nothing really at the time that was much better for games work). But is it really fair to call it a "6 thread machine vs 2 thread machine"? I mean, unless the throughput of those 6 threads is fully independent (i.e. they are cores), it's not really fair to imply that it does 6 times the work of a single thread. That's like the GPU folks saying their machines run 20k threads, etc ;)

I'm not super-familiar with the 360 CPU architecture, but I'm assuming it has to round-robin those HW threads. A few folks are telling me that it operates sort of like a 1.6GHz 6 core machine in practice, which, it's worth noting, is actually *worse* than a 3.2GHz 3 core machine in terms of parallel performance, because there is always some amount of overhead for parallelism.

Still, agreed on the conclusion that at the time it was a good choice.
 
A few folks are telling me that it operates sort of like a 1.6GHz 6 core machine in practice, which, it's worth noting, is actually *worse* than a 3.2GHz 3 core machine in terms of parallel performance, because there is always some amount of overhead for parallelism.
Nobody is forcing developers to run 6 threads (two threads per core using SMT). You can also run one thread per core if it suits your workload better (or decide what's best for each core individually).

It's hard to estimate how much extra performance SMT gives. x86 applications (that are designed to scale well) usually run around 20-30% faster with Intel hyperthreading. So you could estimate that HT roughly transforms a 4 core CPU into an 8 core CPU at 62.5% clocks (assuming perfect parallel scaling, and linear clock/performance scaling). In-order CPUs of course have more stall situations, so the gains of running more HW threads per core are usually higher (but of course depend on the game). SMT is very important for current generation console performance. It's not that hard to find enough free parallelism in the games to run six threads efficiently.
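
To spell out that equivalence (just illustrative arithmetic; the 25% figure is simply the midpoint of the 20-30% range above):

```python
# If HT adds ~25% throughput to a 4-core CPU, that matches a hypothetical
# 8-core part at reduced clocks (assuming perfect scaling and linear clock/perf).
base_cores = 4
ht_speedup = 1.25                  # midpoint of the 20-30% range quoted above
equivalent_cores = 8
relative_clock = base_cores * ht_speedup / equivalent_cores
print(relative_clock)              # 0.625 -> the "8 core CPU at 62.5% clocks" figure
```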
 
Do the executables for PC throw out that threading? Lots of recent games still seem to use only 2 cores / major threads.
 
Nobody is forcing developers to run 6 threads (two threads per core using SMT). You can also run one thread per core if it suits your workload better (or decide what's best for each core individually).
I was told that if you run 1 thread/core you literally get half throughput (i.e. you waste the cycles that would have gone to the other HW thread)... is that not true?

SMT is very important for current generation console performance. It's not that hard to find enough free parallelism in the games to run six threads efficiently.
I'm not disputing that. What I'm disputing is the characterization that the 360 CPU behaves like a 6 "core" 3.2GHz machine. At best, it behaves like a 3 core 3.2GHz machine, and it sounds like in practice it's more like a 6 core 1.6GHz machine, since you need to run 6 threads to get peak throughput. And I'm noting that a machine that requires more parallelism to reach equal throughput is actually strictly worse than a machine that can reach that throughput with more serial code. Obviously there are hardware trade-offs that make it easier/cheaper/more power-efficient to go the parallelism route; I'm just saying it shouldn't be looked at as something that is desirable for the programmer :)
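
To put rough numbers on that (a toy Amdahl-style sketch; the 10% serial fraction is a made-up example, not a measurement):

```python
# Amdahl-style illustration: equal peak throughput, different per-thread speed.
def relative_runtime(serial_fraction, threads, clock_ghz):
    # Relative time for a workload with a serial portion (lower is better).
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction + parallel_fraction / threads) / clock_ghz

s = 0.10                              # assume 10% of the frame is serial work
print(relative_runtime(s, 3, 3.2))    # 3 "cores" at 3.2GHz -> 0.125
print(relative_runtime(s, 6, 1.6))    # 6 "cores" at 1.6GHz -> ~0.156, slower despite equal peak
```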
 
The SMT implementation on Xenon is rather simplistic. Actually, as far as I know, only one thread is running at a time -- more like static interleaving, and definitely not as elaborate as other vertical MT architectures. On the other hand, such a method is better suited to the in-order nature of Xenon, mainly because it avoids stalling due to poor memory performance (no out-of-order loads/stores), and the wimpy half-clocked L2 doesn't help much here either. Indeed, it behaves more like 3-way SMT than 6-way.
 
Sure, having some extra register space for a couple of HW threads to hide memory and instruction latencies is always more convenient than doing it yourself with prefetching or similar, but can Xenon even run at full instruction throughput with only one thread and all data in registers? I wouldn't be surprised if the instruction latencies were designed such that you *require* two HW threads/core to avoid pipeline stalls, but someone here should know for sure, right? :)
 
The SIMD register file is designed for dual-thread occupancy, so that the two active threads can follow each other more rapidly... kind of. And I guess it helps with covering the memory access latency.
 
Do the executables for PC throw out that threading? Lots of recent games still seem to use only 2 cores / major threads.
Quad core (and even dual core + HT) Sandy Bridge CPUs are so much faster than the current generation console CPUs that adding extra threads/cores (or clocks beyond a certain point) doesn't improve the performance at all (since most games are designed mainly for consoles). Too bad there aren't many games out there that are designed solely for high-performance PC hardware and fully utilize the 8/12 threads of the Sandy Bridge (and Bulldozer) CPUs.

It would be interesting to perform some CPU benchmarks on the energy efficient (low clocked) dual core Sandy Bridge models (that are present in all the new Ultrabooks, Macbook Air and some of the forthcoming Win8 tablets) to see how much HT really helps in games when the CPU performance is the primary bottleneck. Most games nowadays should scale properly to at least four threads (dual core + HT = 4 threads). But even in these platforms, the GPU is often a bigger bottleneck in games (HD 3000 doesn't keep up that well with the class leading Sandy Bridge CPU performance).
 
It would be interesting to perform some CPU benchmarks on the energy efficient (low clocked) dual core Sandy Bridge models (that are present in all the new Ultrabooks, Macbook Air and some of the forthcoming Win8 tablets) to see how much HT really helps in games when the CPU performance is the primary bottleneck.

I can tell you that a 3.2GHz SNB dual core is tremendously faster than my 3.2GHz Conroe, which lacks HT. There are certainly other factors at play which make the SNB so much faster but I would estimate that HT is playing a rather large role here.
 