Real or fake Anandtech article on Next Gen consoles ?
Didn't see it posted here yet; sorry if I should have dug deeper through all the topics.
Story is that it appeared on Anandtech, but was pulled later.
From Google cache:
Microsoft's Xbox 360 & Sony's PlayStation 3 - Examples of Poor CPU Performance
Date: June 29th, 2005 Author: Anand Lal Shimpi
"In our last article we had a fairly open-ended discussion about many of the challenges facing both of the
recently announced next-generation game consoles. We discussed misconceptions about the Cell processor
and its ability to accelerate physics calculations, as well as touched on the GPUs of both platforms. In
the end, both the Xbox 360 and the PlayStation 3 are much closer competitors than you would otherwise think.
The Xbox 360's Xenon CPU features more general purpose cores than the PlayStation 3's Cell (3 vs. 1); however,
game developers will most likely only be using one of those cores for the majority of their calculations,
leveling the playing field considerably.
The Cell processor derives much of its power from its array of 7 SPEs (Synergistic Processing Elements);
however, as we discovered in our last article, their purpose is far more specialized than we had thought.
Speaking with Epic Games' Tim Sweeney, we got a much more balanced view of what sorts of tasks could take
advantage of the Cell's SPE array.
The GPUs of the next-generation platforms also proved to be quite interesting. In Part I we speculated as
to the true nature of NVIDIA's RSX in the PS3, concluding that it's quite likely little more than a higher
clocked G70 GPU. We will expand on that discussion a bit more in this article. We also looked at Xenos,
the Xbox 360's GPU and characterized it as equivalent to a very flexible 24-pipe R420. Despite the
inclusion of the 10MB of embedded DRAM, Xenos and RSX ended up being quite similar in our expectations for
performance; and that pretty much summarized our findings - the two consoles, although implementing
very different architectures, ended up being remarkably similar.
So we've concluded that the two platforms will probably end up performing very similarly, but there was
one very important element excluded from the first article: a comparison to present-day PC architectures.
The reason a comparison to PC architectures is important is because it provides an evaluation point to
gauge the expected performance of these next-generation consoles. We've heard countless times that these
new consoles would offer better gaming performance than anything we've had on the PC, or anything we would
have for a matter of years. Now it's time to actually put those claims to the test, and that's exactly
what we did.
Speaking under conditions of anonymity with real world game developers who have had first hand experience
writing code for both the Xbox 360 and PlayStation 3 hardware (and dev kits where applicable), we asked
them for nothing more than their brutal honesty. What did they think of these new consoles? Are they
really outfitted with the PC-eclipsing performance we've been led to believe they have? The answer is one
found quite frequently in history: as with anything, you get what you pay for.
Learning from Generation X

The original Xbox console marked a very important step in the evolution of
gaming consoles - it was the first console that was little more than a Windows PC.
It featured a 733MHz Pentium III processor with a 128KB L2 cache, paired up with a modified version of
NVIDIA's nForce chipset (modified to support Intel's Pentium III bus instead of the Athlon XP it was
designed for). The nForce chipset featured an integrated GPU, codenamed the NV2A, offering performance
very similar to that of a GeForce3. The system had a 5X PC DVD drive and an 8GB IDE hard drive, and all
of the controllers interfaced to the console using USB cables with a proprietary connector.
For the most part, game developers were quite pleased with the original Xbox. It offered them a much more
powerful CPU, GPU and overall platform than any console before it. But as time went on, there were
definitely limitations that developers ran into with the first Xbox.
One of the biggest limitations ended up being the meager 64MB of memory that the system shipped with.
Developers had asked for 128MB and the motherboard even had positions silk screened for an additional
64MB, but in an attempt to control costs the final console only shipped with 64MB of memory.
The next problem is that the NV2A GPU ended up not having the fill rate and memory bandwidth necessary to
drive high resolutions, which kept the Xbox from being used as a HD console.
Although Intel outfitted the original Xbox with a Pentium III/Celeron hybrid in order to improve
performance yet maintain its low cost, at 733MHz that quickly became a performance bottleneck for more
complex games after the console's introduction.
The combination of GPU and CPU limitations made 30 fps a frame rate target for many games, while simpler
titles were able to run at 60 fps. Split screen play on Halo would even stutter below 30 fps depending on
what was happening on screen, and that was just a first-generation title. More experience with the Xbox
brought creative solutions to the limitations of the console, but clearly most game developers had a wish
list of things they would have liked to have seen in the Xbox successor. Similar complaints were levied
against the PlayStation 2, but in some cases they were more extreme (e.g. its 4MB frame buffer).
Given that consoles are generally evolutionary, taking lessons learned in previous generations and
delivering what the game developers want in order to create the next-generation of titles, it isn't a
surprise to see that a number of these problems are fixed in the Xbox 360 and PlayStation 3.
One of the most important changes with the new consoles is that system memory has been bumped from 64MB on
the original Xbox to a whopping 512MB on both the Xbox 360 and the PlayStation 3. For the Xbox, that's a
factor of 8 increase, and over 12x the total memory present on the PlayStation 2.
The other important improvement with the next-generation of consoles is that the GPUs have been improved
tremendously. With 6 - 12 month product cycles, it's no surprise that in the past 4 years GPUs have
become much more powerful. By far the biggest upgrade these new consoles will offer, from a graphics
standpoint, is the ability to support HD resolutions.
There are obviously other, less-performance oriented improvements such as wireless controllers and more
ubiquitous multi-channel sound support. And with Sony's PlayStation 3, disc capacity goes up thanks to
their embracing the Blu-ray standard.
But then we come to the issue of the CPUs in these next-generation consoles, and the level of improvement
they offer. Both the Xbox 360 and the PlayStation 3 offer multi-core CPUs to supposedly usher in a new
era of improved game physics and reality. Unfortunately, as we have found out, the desire to bring multi-
core CPUs to these consoles was made a reality at the expense of performance in a very big way.
Problems with the Architecture

At the heart of both the Xenon and Cell processors is IBM's custom PowerPC
based core. We've discussed this core in our previous articles, but it is best characterized as being
quite simple. The core itself is a very narrow 2-issue in-order execution core, featuring a 64KB L1 cache
(32K instruction/32K data) and either a 1MB or 512KB L2 cache (for Xenon or Cell, respectively).
Supporting SMT, the core can execute two threads simultaneously, similar to a Hyper-Threading-enabled
Pentium 4. The Xenon CPU is made up of three of these cores, while Cell features just one.
Each individual core is extremely small, making the 3-core Xenon CPU in the Xbox 360 smaller than a single
core 90nm Pentium 4. While we don't have exact die sizes, we've heard that the number is around 1/2 the
size of the 90nm Prescott die.
IBM's pitch to Microsoft was based on the peak theoretical floating point performance-per-dollar that the
Xenon CPU would offer, and given Microsoft's focus on cost savings with the Xbox 360, they took the bait.
While Microsoft and Sony have been childishly playing this flops-war, comparing the 1 TFLOPs processing
power of the Xenon CPU to the 2 TFLOPs processing power of the Cell, the real-world performance war has
already been lost.
Right now, from what we've heard, the real-world performance of the Xenon CPU is about twice that of the
733MHz processor in the first Xbox. Considering that this CPU is supposed to power the Xbox 360 for the
next 4 - 5 years, it's nothing short of disappointing. To put it in perspective, floating point
multiplies are apparently 1/3 as fast on Xenon as on a Pentium 4.
The reason for the poor performance? The very narrow 2-issue in-order core also happens to be very deeply
pipelined, apparently with a branch predictor that's not the best in the business. In the end, you get
what you pay for, and with such a small core, it's no surprise that performance isn't anywhere near the
Athlon 64 or Pentium 4 class.
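The cost of a deep pipeline with a weak branch predictor can be sketched with the textbook CPI model. Every number below is our own illustrative assumption, not a measured Xenon figure:

```python
# Textbook cost model: effective CPI = base CPI + branch frequency x
# misprediction rate x flush penalty. All figures here are assumptions
# chosen for illustration, not measured Xenon numbers.

BASE_CPI = 0.5            # ideal 2-issue in-order core: 2 instructions/cycle
BRANCH_FREQ = 0.15        # assume roughly 1 in 7 instructions is a branch
FLUSH_PENALTY = 23        # assumed pipeline-flush cost in cycles (deep pipe)

def effective_ipc(mispredict_rate: float) -> float:
    """Instructions per cycle once branch mispredictions are charged."""
    cpi = BASE_CPI + BRANCH_FREQ * mispredict_rate * FLUSH_PENALTY
    return 1.0 / cpi

# A good predictor (5% misses) vs. a poor one (15% misses):
print(f"{effective_ipc(0.05):.2f} IPC vs {effective_ipc(0.15):.2f} IPC")
```

With these assumed numbers, tripling the miss rate cuts sustained throughput by roughly a third - which is the flavor of penalty a deeply pipelined core with a mediocre predictor pays on branchy game code.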
The Cell processor doesn't get off the hook just because it only uses a single one of these horribly slow
cores; the SPE array ends up being fairly useless in the majority of situations, making it little more
than a waste of die space.
We mentioned before that collision detection can be accelerated on the SPEs of Cell, despite being
fairly branch-heavy. The lack of a branch predictor in the SPEs apparently isn't that big of a deal,
since most collision detection branches are essentially random and can't be predicted even by the best
branch predictor. So the missing branch predictor doesn't hurt; what does hurt, however, is the very small
amount of local memory available to each SPE. In order to access main memory, the SPE places a DMA
request on the bus (or the PPE can initiate the DMA request) and waits for it to be fulfilled. From those
that have had experience with the PS3 development kits, this access takes far too long to be used in many
real world scenarios. It is the small amount of local memory that each SPE has access to that limits the
SPEs from being able to work on more than a handful of tasks. While physics acceleration is an important
one, there are many more tasks that can't be accelerated by the SPEs because of the memory limitation.
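A toy model makes the constraint concrete. The 256KB local store per SPE is publicly documented; the DMA latency and workload figures below are hypothetical values of our own, purely for illustration:

```python
# Toy model of why a small SPE local store plus high DMA latency limits
# which tasks fit on an SPE. The 256KB local store is documented for Cell;
# the DMA cost and workload numbers are our own assumptions.

LOCAL_STORE_BYTES = 256 * 1024   # code and data share this space
DMA_ROUND_TRIP_CYCLES = 400      # assumed cost of one main-memory DMA

def dma_overhead(working_set_bytes: int, code_bytes: int,
                 cycles_of_work_per_chunk: int) -> float:
    """Fraction of time spent waiting on DMA when the working set must be
    streamed through whatever local store is left after the code."""
    chunk = LOCAL_STORE_BYTES - code_bytes
    if working_set_bytes <= chunk:
        return 0.0                               # fits: one-time load only
    transfers = -(-working_set_bytes // chunk)   # ceiling division
    wait = transfers * DMA_ROUND_TRIP_CYCLES
    work = transfers * cycles_of_work_per_chunk
    return wait / (wait + work)

# A hypothetical 4MB physics data set streamed through ~192KB chunks:
print(f"{dma_overhead(4 * 1024 * 1024, 64 * 1024, 2000):.0%} of time stalled")
```

The point of the sketch: a task whose working set fits the local store pays almost nothing, while anything that must continuously stream from main memory loses a meaningful slice of its time to DMA waits - and without double-buffering tricks, it loses far more.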
The other point that has been made is that even if you can offload some of the physics calculations to the
SPE array, the Cell's PPE ends up being a pretty big bottleneck thanks to its overall lackluster
performance. It's akin to having an extremely fast GPU but without a fast CPU to pair it up with.
What About Multithreading?

We of course asked the obvious question: would game developers rather have 3
slow general purpose cores, or one of those cores paired with an array of specialized SPEs? The response
was unanimous: everyone we spoke to would rather take the general purpose core approach.
Citing everything from ease of programming to the limitations of the SPEs we mentioned previously, the
Xbox 360 appears to be the more developer-friendly of the two platforms according to the cross-platform
developers we've spoken to. Despite being more developer-friendly, the Xenon CPU is still not what
developers had hoped for.
The most ironic bit of it all is that according to developers, if either manufacturer had decided to use
an Athlon 64 or a Pentium D in their next-gen console, they would be significantly ahead of the
competition in terms of CPU performance.
While the developers we've spoken to agree that heavily multithreaded game engines are the future, that
future won't really take form for another 3 - 5 years. Even Microsoft admitted to us that all developers
are focusing on having, at most, one or two threads of execution for the game engine itself - not the four
or six threads that the Xbox 360 was designed for.
Even when games become more aggressive with their multithreading, targeting 2 - 4 threads, most of the
work will still be done in a single thread. It won't be until the next step in multithreaded
architectures where that single thread gets broken down even further, and by that time we'll be talking
about Xbox 720 and PlayStation 4. In the end, the more multithreaded nature of these new console CPUs
doesn't help paint much of a brighter performance picture - multithreaded or not, game developers are not
pleased with the performance of these CPUs.
What about all those Flops?

The one statement that we heard over and over again was that Microsoft was
sold on the peak theoretical performance of the Xenon CPU. Ever since the announcement of the Xbox 360
and PS3 hardware, people have been set on comparing Microsoft's figure of 1 trillion floating point
operations per second to Sony's figure of 2 trillion floating point operations per second (TFLOPs). Any
AnandTech reader should know for a fact that these numbers are meaningless, but just in case you need some
reasoning for why, let's look at the facts.
First and foremost, a floating point operation can be anything: it can be adding two floating point
numbers together, it can be a dot product of two vectors of floating point numbers, or it can even be
just calculating the complement of a floating point number. Anything that is executed on an FPU is fair
game to be called a floating point operation.
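To make the accounting point concrete, here's a small sketch of our own showing how the very same computation can be quoted at two different flop counts:

```python
# Our own illustration: the identical dot product can be advertised at
# different "flop" counts depending purely on how operations are tallied.

def dot(a, b):
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc

n = 1024
a = [1.0] * n
b = [2.0] * n
result = dot(a, b)           # same math either way

naive_flops = 2 * n          # n multiplies + n adds, counted separately
fused_flops = n              # the same work counted as n multiply-adds
print(result, naive_flops, fused_flops)
```

Identical work, and already a 2x gap in the headline number - before anyone has even decided whether GPU shader operations count toward the total.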
Secondly, both floating point power numbers refer to the whole system, CPU and GPU. Obviously a GPU's
floating point processing power doesn't mean anything if you're trying to run general purpose code on it
and vice versa. As we've seen from the graphics market, characterizing GPU performance in terms of generic
floating point operations per second is far from the full performance story.
Third, when a manufacturer is talking about peak floating point performance there are a few things that
they aren't taking into account. Being able to process billions of operations per second depends on
actually being able to have that many floating point operations to work on. That means that you have to
have enough bandwidth to keep the FPUs fed, no mispredicted branches, no cache misses and the right
structure of code to make sure that all of the FPUs can be fed at all times so they can execute at their
peak rates. We already know that's not the case as game developers have already told us that the Xenon
CPU isn't even in the same realm of performance as the Pentium 4 or Athlon 64. Not to mention that the
requirements for hitting peak theoretical performance are always ridiculous; caches are only so big and
thus there will come a time where a request to main memory is needed, and you can expect that request to
be fulfilled in a few hundred clock cycles, where no floating point operations will be happening at all.
So while there may be some extreme cases where the Xenon CPU can hit its peak performance, it sure isn't
happening in any real world code.
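A back-of-envelope sketch of that argument, with every figure an assumption of ours rather than a published spec:

```python
# Illustrative peak-vs-sustained arithmetic. The clock matches Xenon's
# announced 3.2GHz; the per-cycle throughput, miss rate and stall cost
# are all hypothetical numbers chosen to show the shape of the problem.

CLOCK_HZ = 3.2e9
FLOPS_PER_CYCLE = 8      # assumed per-core peak under ideal scheduling
CORES = 3

peak = CLOCK_HZ * FLOPS_PER_CYCLE * CORES   # the marketing-slide number

# Suppose one in every 50 instructions misses cache and stalls ~500 cycles
# (both assumed). The FPUs then sit idle most of the time:
MISS_RATE = 1 / 50
STALL_CYCLES = 500
busy_fraction = 1 / (1 + MISS_RATE * STALL_CYCLES)

sustained = peak * busy_fraction
print(f"peak {peak / 1e9:.1f} GFLOPs, sustained ~{sustained / 1e9:.1f} GFLOPs")
```

Even with these fairly gentle assumptions the sustained figure lands an order of magnitude below peak, which is exactly why "trillions of operations per second" says so little about shipping game code.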
The Cell processor is no different; given that its PPE is identical to one of the PowerPC cores in Xenon,
it must derive its floating point performance superiority from its array of SPEs. So what's the issue
with the 218 GFLOPs number (2 TFLOPs for the whole system)? Well, from what we've heard, game developers
are finding that they can't use the SPEs for a lot of tasks. So in the end, it doesn't matter what the
peak theoretical performance of Cell's SPE array is if those SPEs aren't being used all the time.
Another way to look at this comparison of flops is to look at integer add latencies on the Pentium 4 vs.
the Athlon 64. The Pentium 4 has two double pumped ALUs, each capable of performing two add operations
per clock, that's a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium 4 can
perform 15.2 billion operations per second. The Athlon 64 has three ALUs each capable of executing an add
every clock; so a 2.8GHz Athlon 64 can perform 8.4 billion operations per second. By this silly console
marketing logic, the Pentium 4 would be almost twice as fast as the Athlon 64, and a multi-core Pentium 4
would be faster than a multi-core Athlon 64. Any AnandTech reader should know that's hardly the case. No
code is composed entirely of add instructions, and even if it were, eventually the Pentium 4 and Athlon 64
will have to go out to main memory for data, and when they do, the Athlon 64 has a much lower latency
access to memory than the P4. In the end, despite what these horribly concocted numbers may lead you to
believe, they say absolutely nothing about performance. The exact same situation exists with the CPUs of
the next-generation consoles; don't fall for it.
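The article's own arithmetic, reproduced directly - exactly the kind of "ops per second" figure the text warns against:

```python
# The add-throughput numbers from the paragraph above, computed from ALU
# counts alone. This is the same flawed accounting the console vendors use.

P4_CLOCK = 3.8e9
P4_ADDS_PER_CLOCK = 4         # two double-pumped ALUs x two adds each
A64_CLOCK = 2.8e9
A64_ADDS_PER_CLOCK = 3        # three single-pumped ALUs

p4_adds = P4_CLOCK * P4_ADDS_PER_CLOCK      # 15.2 billion adds/second
a64_adds = A64_CLOCK * A64_ADDS_PER_CLOCK   # 8.4 billion adds/second
print(p4_adds / a64_adds)     # ~1.8x "faster" on paper, not in real code
```

The on-paper ratio says the Pentium 4 should win by nearly 2x, yet real workloads, with their memory accesses and mixed instruction types, routinely favor the Athlon 64 - the same trap as the console TFLOPs comparison.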
Why did Sony/MS do it?

For Sony, it doesn't take much to see that the Cell processor is eerily similar to
the Emotion Engine in the PlayStation 2, at least conceptually. Sony clearly has an idea of what direction
they would like to go in, and it doesn't happen to be one that's aligned with much of the rest of the
industry. Sony's past successes have really come not because of the hardware, but because of the
developers and their PSX/PS2 exclusive titles. A single hot title can sell millions of
consoles, and by our count, Sony has had many more of those than Microsoft had with the first Xbox.
Sony shipped around four times as many PlayStation 2 consoles as Microsoft did Xboxes. Regardless of the
hardware platform, a game developer won't turn down working with the PS2 - the install base is just that
attractive. So for Sony, the Cell processor may be strange and even undesirable for game developers, but
the developers will come regardless.
The real surprise was Microsoft; with the first Xbox, Microsoft listened very closely to the wants and
desires of game developers. This time around, despite what has been said publicly, the Xbox 360's CPU
architecture wasn't what game developers had asked for.
They wanted a multi-core CPU, but not such a significant step back in single threaded performance. When
AMD and Intel moved to multi-core designs, they did so at the expense of a few hundred MHz in clock speed,
not by taking a step back in architecture.
We suspect that a big part of Microsoft's decision to go with the Xenon core was because of its extremely
small size. A smaller die means lower system costs, and if Microsoft indeed launches the Xbox 360 at $299,
the Xenon CPU will be a big reason why.
Another contributing factor may be the fact that Microsoft wanted to own the IP of the silicon that went
into the Xbox 360. We seriously doubt that either AMD or Intel would be willing to grant them the right
to make Pentium 4 or Athlon 64 CPUs, so it may be that IBM was the only partner willing to work
on Microsoft's terms, and only with this one specific core.
Regardless of the reasoning, not a single developer we've spoken to thinks that it was the right decision.
The Saving Grace: The GPUs

Although both manufacturers royally screwed up their CPUs, all developers have
agreed that they are quite pleased with the GPU power of the next-generation consoles.
First, let's talk about NVIDIA's RSX in the PlayStation 3. We discussed the possibility of RSX offloading
vertex processing onto the Cell processor, but more and more it seems that isn't the case. It looks like
the RSX will basically be a 90nm G70 with Turbo Cache running at 550MHz, and the performance should be quite strong.
One option we didn't discuss in the last article was that the G70 GPU may already feature a number of
disabled shader pipes to improve yield. The move to 90nm may allow those pipes to be enabled, thus
allowing for another scenario where the RSX offers higher performance at the same transistor count as the
present-day G70. Sony may be hesitant to reveal the actual number of pixel and vertex pipes in the RSX
because, honestly, they won't know until a few months before mass production what their final yields will be.
Despite strong performance and support for 1080p, a large number of developers are targeting 720p for
their PS3 titles and won't support 1080p. Those that are simply porting current-generation games over will
have no problems running at 1080p, but anyone working on a truly next-generation title won't have the fill
rate necessary to render at 1080p.
Another interesting point is that although the RSX lacks the Xbox 360's "free 4X AA", in some cases it
won't matter. Titles that use longer pixel shader programs end up being bound by pixel shader performance
rather than memory bandwidth, so the performance difference between no AA and 2X/4X AA may end up being
quite small. Not all titles will push the RSX to the limits however, and those titles will definitely see
a performance drop with AA enabled. In the end, whether the RSX's lack of embedded DRAM matters will be
entirely dependent on the game engine being developed for the platform. Games that make more extensive
use of long pixel shaders will see less of an impact with AA enabled than those that are more texture
bound. Game developers are all over the map on this one, so it wouldn't be fair to characterize all of the
games as falling into one category or another.
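The shader-bound vs. bandwidth-bound trade-off can be sketched with a toy bottleneck model. All of the numbers below are hypothetical, chosen by us purely for illustration:

```python
# Crude bottleneck model: frame time is set by whichever resource saturates
# first. Every number here is hypothetical, for illustration only.

def frame_time_ms(shader_ms: float, bandwidth_ms: float) -> float:
    return max(shader_ms, bandwidth_ms)

AA_BANDWIDTH_FACTOR = 2.5   # assumed extra bandwidth cost of 4X AA

# Shader-bound title (long pixel shaders): the AA cost hides behind shading.
no_aa = frame_time_ms(20.0, 8.0)
with_aa = frame_time_ms(20.0, 8.0 * AA_BANDWIDTH_FACTOR)

# Texture/bandwidth-bound title: the same AA cost lands on the frame time.
no_aa_tex = frame_time_ms(6.0, 8.0)
with_aa_tex = frame_time_ms(6.0, 8.0 * AA_BANDWIDTH_FACTOR)

print(no_aa, with_aa, no_aa_tex, with_aa_tex)
```

In the shader-bound case the frame time doesn't move when AA is enabled; in the bandwidth-bound case it jumps - the same divide the developers describe between long-shader engines and texture-heavy ones.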
ATI's Xenos GPU is also looking pretty good and most are expecting performance to be very similar to the
RSX, but real world support for this won't be ready for another couple of months. Developers have just
recently received more final Xbox 360 hardware, and gauging performance of the actual Xenos GPU compared
to the R420 based solutions in the G5 development kits will take some time. Since the original dev kits
offered significantly lower performance, developers will need a bit of time to figure out what realistic
limits the Xenos GPU will have.
Final Words

Just because these CPUs and GPUs are in a console doesn't mean that we should throw away years
of knowledge from the PC industry - performance doesn't come out of thin air, and peak performance is
almost never achieved. Clever marketing however, will always try to fool the consumer.
And that's what we have here today, with the Xbox 360 and PlayStation 3. Both consoles are marketed to be
much more powerful than they actually are, and from talking to numerous game developers it seems that the
real world performance of these platforms isn't anywhere near what it was supposed to be.
It looks like significant advancements in game physics won't happen on consoles for another 4 or 5 years,
although it may happen with PC games much before that.
It's not all bad news however; the good news is that both GPUs are quite possibly the most promising part
of the new consoles. With the performance that we have seen from NVIDIA's G70, we have very high
expectations for the 360 and PS3. The ability to finally run at HD resolutions in all games will bring a
much needed element to console gaming.
And let's not forget all of the other improvements to these next-generation game consoles. The CPUs,
despite being relatively lackluster, will still be faster than their predecessors and increased system
memory will give developers more breathing room. Then there are other improvements such as wireless
controllers, better online play and updated game engines that will contribute to an overall better gaming
experience.
In the end, performance could be better; the consoles aren't what they could have been had the powers that
be made some different decisions. While they will bring better quality games to market and will be better
than their predecessors, it doesn't look like they will be the end of PC gaming any more than the Xbox and
PS2 were when they were launched. The two markets will continue to coexist, with consoles being much
easier to deal with, and PCs offering some performance-derived advantages.
With much more powerful CPUs and, in the near future, more powerful GPUs, the PC paired with the right
developers should be able to bring about that revolution in game physics and graphics we've been hoping
for. Consoles will help accelerate the transition to multithreaded gaming, but it looks like it will take
PC developers to bring about real change in things like game physics, AI and other non-visual elements of
games.