Future CPU performance

N3xtG3nGam3r

I'm not extremely technical, but I can understand most stuff. I figured I would put that out there before anyone reads what I ask and tries to explain rocket science.

Anyways, we all know about the processors inside the PS3 and the 360. We know that the PS3 uses the Cell, and the 360 uses a tri-core IBM Power processor that is dual-threaded. These CPUs are not even, but would I be correct to say that they are somewhat close? Moving forward, the thing I read most frequently about the Cell processor is that it could possibly outperform the 360's CPU later on. Exactly how true do you guys think this is, and do you think that the CPUs will have been used to their maximum potential by the end of these systems' generation? Do you think that the 360 will fall short a year or two before the end of its life, and not be able to muster any more performance out of the tri-core processor? The same can be asked about the Cell. Do you think developers will have programmed for the Cell enough, by the end of the console's life, for it to sufficiently outperform the 360's CPU?

Is all of this possible in a 4 year time frame?

I have also read other threads talking about the future of these processor designs and plans for next-generation consoles. It really seems like by then we will be at the point where there is a processor for every single thing that needs to be processed. Is that necessary? Considering that the original Xbox didn't even hit its maximum, do you think the same will happen with these consoles? With their processors? With their GPUs?

The question is focused mainly on processors, because I know most of the GPU issues.
 
In short, I don't know.

I think it depends on whether devs can multi-thread code on Cell.

It won't be easy, because of the 256 KB local store (LS) limit on each SPE.
 
I don't think that many would argue with the fact that Cell has more potential power than Xenon, possibly MUCH more on some tasks, but it also takes more work to harness that power, especially for developers used to traditional architectures.

As to whether that power will be harnessed to a large degree within a generation's lifecycle, I think that depends in no small degree on the success of the PS3. The PS2 architecture, for instance, was only so thoroughly explored because of the console's dominant position in the market. The PS3, on the other hand, is off to a slow start so far. That said, Cell knowledge is much more freely available, and progress in using it properly is coming at a much faster rate than it did on the PS2, so who knows.
 
These CPUs are not even, but would I be correct to say that they are somewhat close? Moving forward, the thing I read most frequently about the Cell processor is that it could possibly outperform the 360's CPU later on. Exactly how true do you guys think this is, and do you think that the CPUs will have been used to their maximum potential by the end of these systems' generation? Do you think that the 360 will fall short a year or two before the end of its life, and not be able to muster any more performance out of the tri-core processor? The same can be asked about the Cell. Do you think developers will have programmed for the Cell enough, by the end of the console's life, for it to sufficiently outperform the 360's CPU?

I guess all these questions will be answered in due time. On paper, Cell should outperform the XCPU in the long run, but it will all depend on how much of the processor will be exploited.


Is all of this possible in a 4 year time frame?

The situation seems to be much better than in the PS2 days, so it is possible that developers will be able to get very good performance out of Cell much more quickly and easily than they did with the EE.

I have also read other threads talking about the future of these processor designs and plans for next-generation consoles. It really seems like by then we will be at the point where there is a processor for every single thing that needs to be processed.

I don't understand that. What exactly do you mean by "having a processor for every single thing that needs to be processed"? If anything, it seems things are moving towards a more general purpose mentality, with processors like the SPEs which do not have hardwired functions (like GPUs), and can process a lot of different tasks.

Is that necessary? Considering that the original Xbox didn't even hit its maximum, do you think the same will happen with these consoles? With their processors? With their GPUs?

Consoles never "hit their maximum". It never happened and it never will. However, even if the maximum performance devs could squeeze out of the Xbox was 80%, that was the maximum they could get out of it, so that's 100% of what we're ever gonna get. The whole "hit the maximum in consoles" thing is very inaccurate. Devs will never get to 100% of PS3 or X360, but that doesn't mean we shouldn't go forward and get PS4 or X720, because at the end of the day, 80% of PS4 will still be many times what we get from 80% of PS3.
 
Do you think developers will have programmed for the Cell enough, by the end of the console's life, for it to sufficiently outperform the 360's CPU?

That is a kind of strange question. However, I think it is safe to say that the tool support and available knowledge base for the Cell are vastly better than what existed for the EE when the PS2 launched. Here are a couple of nice articles about some of it.
http://psinext.e-mpire.com/index.php?categoryid=3&m_articles_articleid=722
http://psinext.e-mpire.com/index.php?categoryid=3&m_articles_articleid=667

The fact that there will be free Linux dev kits running on the PS3 means that there will be plenty of ground for growing and sharing knowledge about the Cell. I think we are already witnessing some of that on this forum, and some other forums will definitely become more active in this area. ;)

I think you can be quite sure that there will be developers who will squeeze a lot of power out of the Cell. Consider what Evolution Studios has created with the launch title Motorstorm. What do you think they will be able to create on the PS3 within 4 years?
 
SPUs are crazy fast. People are still under the impression that they are just smaller, crippled versions of full CPU cores. But they are only crippled in the sense that a drag racer is crippled compared to a pickup truck.
 
Cell processor is that it could possibly outperform the 360's CPU later on.

I always find it funny how the Cell is always supposed to be "possibly better later on" while it has much more power than Xenon.
 
I always find it funny how the Cell is always supposed to be "possibly better later on" while it has much more power than Xenon.

It's only been just a little over a month since the PS3 was released. And people are just being cautious.

But in the non-gaming space, Cell has already shown its value against traditional multi-core approaches.

On the PS3, there are a couple titles out that make use of the SPUs - like Resistance and Motorstorm. But they are launch window games and they don't really make use of them to the point where you could not pull the same thing off on the Xenon if you tried really hard.

If Heavenly Sword really does come out in March we might have the first example of a game that you simply could not pull off on Xenon. But otherwise we probably won't see many good examples until late 2007.
 
(This is a genuine question)

What is HS doing that couldn't be done on 360?

Just from the things we know about: large-scale army simulations. Hair simulations. Cloth simulations. Even just these things running at the same time would be impressive.

Of course we won't know the extent to which they are being used until the game comes out.
 
Just from the things we know about: large-scale army simulations. Hair simulations. Cloth simulations. Even just these things running at the same time would be impressive.

Of course we won't know the extent to which they are being used until the game comes out.

Or until Marco and Dean spill the beans!!
 
I doubt Deano or nAo would ever say something like "Our game couldn't be done on the Xbox 360"... ;)

Of course not, but they could tell us what things are being done on Cell, and there would be endless threads on here on how Xenon could or could not handle those tasks at the same time ;)
 
I doubt Deano or nAo would ever say something like "Our game couldn't be done on the Xbox 360"... ;)


How about this quote: link
nAo - Dec 29th 06 said:
PS3 is already matching 360 now, with (probably) inferior tools and less experience on the system wrt 360, it can only get vastly better, no doubt about it. Non multiplatform titles are going to show this quite soon imho
 
Anyways, we all know about the processors inside the PS3 and the 360. We know that the PS3 uses the Cell, and the 360 uses a tri-core IBM Power processor that is dual-threaded. These CPUs are not even, but would I be correct to say that they are somewhat close?

In all likelihood, Cell will offer much better performance than Xenon, once knowledge and toolsets improve.

Moving forward, the thing I read most frequently about the Cell processor is that it could possibly outperform the 360's CPU later on. Exactly how true do you guys think this is, and do you think that the CPUs will have been used to their maximum potential by the end of these systems' generation?

In peak numbers alone, Cell offers significantly more performance. In the PS3, it has a conservative total of 7 full cores (1 PPE + 6 SPEs available to a game).
Xenon has 3 cores that can run two threads.

This is a stupid metric, but in thread count, Cell offers 6 SPE threads and 2 PPE threads, so it sports 33% more threads than Xenon's 6 (SPEs technically aren't quite the same to program for, but there is a rough equivalence good enough for this discussion).
On Cell, most of those threads get a dedicated core. On Xenon, they each get some fraction of a core.
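To make the thread arithmetic above concrete, here is a minimal sketch that only restates the counts quoted in this post (6 game-visible SPEs, 1 dual-threaded PPE, 3 dual-threaded Xenon cores). The variable names are mine, and hardware thread counts say nothing about how fast each thread actually runs.

```python
# Hardware thread counts as quoted in the post; names are illustrative only.
cell_cores = 1 + 6               # 1 PPE + 6 SPEs available to a game
cell_threads = 1 * 2 + 6 * 1     # PPE runs 2 threads, each SPE runs 1
xenon_cores = 3
xenon_threads = xenon_cores * 2  # each Xenon core runs 2 threads

extra = (cell_threads - xenon_threads) / xenon_threads
print(f"Cell threads: {cell_threads}, Xenon threads: {xenon_threads}")
print(f"Cell has {extra:.0%} more threads")

# Threads per core: closer to 1.0 means a thread mostly has a core to itself.
print(f"threads per core, Cell:  {cell_threads / cell_cores:.2f}")
print(f"threads per core, Xenon: {xenon_threads / xenon_cores:.2f}")
```

The second pair of numbers is the point of the follow-up sentence: on Cell almost every thread owns a core, while on Xenon every thread shares one.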

Xenon's cores are expected to play the role of both the PPE and SPE in Cell. They are probably capable of playing the same role as 3 PPEs, but they will not match 6 SPEs. They don't have the silicon.

If we assume the PPE and one of Xenon's cores cancel each other out, it leaves 2 Xenon cores vs 6 SPEs. Those cores are twice as wide as an SPE, so that makes it--in peak numbers only-- 4 to 6 when it comes to instruction issue per clock (actually worse, SPEs have some dual-issue, so it may be back up to something like 4 vs 7 or 4 vs 8).
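A rough sketch of the peak-issue comparison above. The `dual_issue_fraction` knob is my own assumption, introduced only to reproduce the "4 vs 7 or 4 vs 8" guesses; it is not a measured figure, and none of this accounts for stalls, dependencies, or memory.

```python
from fractions import Fraction

def peak_issue(cores, width_per_core):
    """Peak instructions issued per clock across identical cores."""
    return cores * width_per_core

def spe_issue(spes, dual_issue_fraction):
    """SPEs counted as 1-wide, plus a hypothetical share of dual-issued cycles."""
    return spes * (1 + dual_issue_fraction)

# After cancelling the PPE against one Xenon core: 2 Xenon cores, each 2-wide.
xenon_rest = peak_issue(2, 2)

print(xenon_rest, spe_issue(6, Fraction(0)))     # the plain "4 to 6" case
print(xenon_rest, spe_issue(6, Fraction(1, 6)))  # "4 vs 7"
print(xenon_rest, spe_issue(6, Fraction(1, 3)))  # "4 vs 8"
```

Read this way, the post's guesses correspond to the SPEs pairing up instructions on roughly one cycle in six to one cycle in three.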

With thread counts, the number is 4 vs. 6 again, but since this is SMT, each thread in a high-load situation can expect to get only part of a core's peak execution.

Xenon's shared cache is small: 1MB for 3 cores.
When it comes to on-chip memory, the PPE gets 512 KB, and each SPE gets 256 KB.
Cell kicks the crap out of Xenon for on-chip storage, to put it mildly.
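The on-chip storage totals can be added up the same way. This sketch uses only the figures in this post (1 MB shared L2 on Xenon, 512 KB for the PPE, 256 KB per SPE) and assumes the 6 game-visible SPEs; L1 caches are left out on both sides.

```python
KB = 1024

xenon_l2 = 1024 * KB    # shared by all 3 Xenon cores
ppe_l2 = 512 * KB       # PPE's cache
spe_ls = 256 * KB       # local store per SPE

def cell_on_chip(spes):
    """Total PPE cache plus local stores for the given number of SPEs."""
    return ppe_l2 + spes * spe_ls

total = cell_on_chip(6)
print(f"Cell:  {total // KB} KB")
print(f"Xenon: {xenon_l2 // KB} KB")
print(f"ratio: {total / xenon_l2:.1f}x")

# The totals hide how the memory is split: the largest single pool on each chip.
print(f"largest pool, Cell:  {max(ppe_l2, spe_ls) // KB} KB")
print(f"largest pool, Xenon: {xenon_l2 // KB} KB")
```

With 6 SPEs the total comes out to twice Xenon's, but Cell's memory is spread over seven separate pools while Xenon's is one shared cache.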

Neither Xenon nor Cell is as forgiving as a desktop processor when it comes to optimization, and SPEs can get pretty finicky on top of that. However, quantity has a quality all its own.

There are shortcomings to each individual SPE, but Cell just offers more silicon that can be used to compensate. Xenon doesn't have as much to offer, and it has its own share of shortcomings.

Do you think that the 360 will fall short a year or two before the end of its life, and not be able to muster any more performance out of the tri-core processor? The same can be asked about the Cell. Do you think developers will have programmed for the Cell enough, by the end of the console's life, for it to sufficiently outperform the 360's CPU?

It's doing pretty well now. Both CPUs should probably see better utilization with time, but there's a curve associated with the gains. You don't expect each year to show double the performance wrung out of a design, so Xenon's biggest gains are likely going to be used up in a year or two.
Cell, being released later, will likely not get past its peak until a year after that (maybe two if some unforeseen snafu pops up), and it simply has more to wring out.

Is all of this possible in a 4 year time frame?

There's enough money in this to say it is likely that Cell will outshine Xenon on CPU-limited tasks in that time frame. There are other elements to each system that make things hazier, and non-technical reasons (economics, marketing, public perception) that can dominate technical ones.

Most multiplatform games will probably keep their different versions reasonably similar. Most smaller developers will be limited more by the money needed for programming and asset creation. Cell will likely require more programming effort to truly beat Xenon, but even gradually picking up tips, tricks, and improved tools over four years can lead to enough accumulation to use it pretty well.

The big-name AAA exclusive titles are where it is likely that we will see the differences first, though by the end of 4 years it may be that most PS3 titles will be noticeably different in the number of features, units, or effects.

I have also read other threads talking about the future of these processor designs and plans for next-generation consoles. It really seems like by then we will be at the point where there is a processor for every single thing that needs to be processed. Is that necessary? Considering that the original Xbox didn't even hit its maximum, do you think the same will happen with these consoles? With their processors? With their GPUs?

The Xbox hit what could be reasonably expected from the platform in an imperfect world.
No CPU ever hits its limit because something always gets in the way.

Due to silicon scaling issues, some specialization should be expected, if only because general cores will not fit in the power dissipation envelope of a console.

It isn't necessary to have a core for every tiny task, just the big computationally heavy ones, and only enough to do well.
 
The Xenon core is not identical to the PPE. We don't have enough info disclosed on the Xenon to say if it is better overall than the PPE. But we do know Xenon cores have 128 VMX registers each, while the PPE only has 32. This should reduce the need to go to memory/cache as often when running vector heavy code.

Still, based on benchmarks by IBM and others, in the purely SIMD/vector instruction stream processing dept, even a single SPU can crush it by a factor of 2 or more.

3dilettante says the Xenon cores are "twice as wide" as an SPU. I guess he is referring to SMT. All the registers are duplicated, so there are twice as many of them. But the execution units are not. Xenon threads (and the PPU) can only have instructions of different types (either VMX, VXU or FPU) in flight at the same time.

A Xenon core can't have 2 instructions of the same type executing at the same time. And specifically, it cannot have 2 SIMD instructions executing at the same time. If you had 2 threads where one is doing mostly SIMD and the other isn't, they could play neatly together on the same core. But it's not the equivalent of 2 CPUs by any means.

SPUs, on the other hand, have 2 pipelines, and can issue and complete 2 SIMD instructions per cycle. The instructions can't be identical. But with careful instruction ordering you can get a lot of them to run in parallel.
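Here is a toy model of why instruction ordering matters on a two-pipe, in-order core. It assumes two adjacent instructions can issue together only when they need different pipes, which is my reading of "the instructions can't be identical". The `"even"`/`"odd"` labels and the `cycles_in_order` helper are illustrative, and the model ignores latencies, dependencies, and alignment rules, so treat it as a sketch rather than a description of the real SPU scheduler.

```python
def cycles_in_order(stream):
    """Count issue cycles for an in-order core with two pipes.

    Each entry in `stream` names the pipe an instruction needs ("even" or
    "odd"). Two adjacent instructions share a cycle only if they need
    different pipes; otherwise the first one issues alone.
    """
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and stream[i] != stream[i + 1]:
            i += 2  # the pair issues together
        else:
            i += 1  # single issue
        cycles += 1
    return cycles

same_pipe = ["even"] * 8           # eight instructions all needing one pipe
interleaved = ["even", "odd"] * 4  # the same count, alternating pipes

print(cycles_in_order(same_pipe))    # 8
print(cycles_in_order(interleaved))  # 4
```

Both streams contain eight instructions, but the interleaved one finishes issuing in half the cycles, which is the kind of gain careful ordering is after.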

So let's say you need to run a really complicated physics system and you need to leverage as much SIMD power as possible. I don't feel it is unreasonable to say that even in a best case scenario you would need a full Xenon core to perform the job of 1 SPU.

The worse (not worst) case scenario would be that you will need to consume more than 1 full Xenon core just to match the SIMD performance of a single SPU. It might sound crazy, but I'll go out on a limb and say in later generation apps we will see this scenario become common.

These 2 machines are not equals when it comes to SIMD performance. And SIMD is where the real performance is at. And as people understand them better the margins are going to get wider and wider in the Cell's favor.
 
Just from the things we know about: large-scale army simulations. Hair simulations. Cloth simulations. Even just these things running at the same time would be impressive.

Of course we won't know the extent to which they are being used until the game comes out.

There are always compromises depending on platform strengths and weaknesses, but the specific things you mention are things already done in year-old titles (e.g. large-scale armies in games like Kameo and N3, cloth simulation in NBA 2K6, and so forth).

In all likelihood, Cell will offer much better performance than Xenon, once knowledge and toolsets improve.

...

There are shortcomings to each individual SPE, but Cell just offers more silicon that can be used to compensate. Xenon doesn't have as much to offer, and it has its own share of shortcomings.

Fair points, but it is also noteworthy that Cell is about 50% larger in die space and transistors, which explains some of the proportion figures you are using.

Some of my own ramblings...

There is no doubt Cell, as an architecture, is answering some of the problems processors are facing. The fast internal bandwidth and the local store are potentially huge wins. As you noted, both Xenon and Cell have their shortcomings, and Cell has some nice compromises that specifically address issues that can be significant bottlenecks. Tossing in the fact that Cell is a forward-looking approach to parallel processing as well as a foundation for an architectural family that Cell2 (and PS4) can build on (with developers leveraging their existing code and knowledge to hit the ground running), I think Cell has some major upside.

On the other hand, Cell does make some significant compromises beyond what even Xenon makes (Local Stores are not coherent like L2 cache and streaming can eat into LS, no branch prediction in SPEs, total chip memory may be large but it is broken into a number of smaller segments, asymmetric design, and so forth). These not only have an impact on "getting your head around the architecture"; they also mean Cell isn't a performance win for every sort of processing task. There are areas where Cell absolutely rocks (or can/will), especially when you can throw a lot of SPEs at a problem that scales fairly linearly and fits into the Local Store, but from what I have heard this is far from always the case. There is huge potential with Cell, but not every problem is SIMD or easily solved with gross parallelization (at this point, and this has been an area of work for decades). So the simplistic flop, total chip memory (versus, say, largest potential 'thread' memory), and 'core' counting really doesn't explain the differences / performance deviations well. From what I have seen, the areas where Cell rocks have as much to do with the SPE architecture on an individual level, and vice versa for areas where it isn't a win.

I do find it interesting that the IBM roadmap for Cell2 shows 2 PPEs and 32 SPEs. That is 2x as many SPEs per PPE as in the current Cell (16 vs. 8). Those could quite possibly be far superior PPEs, but it seems there are a lot of issues to overcome to get to the point where the PPE resources per SPE are cut in half. But if DeanoC is right that by the end of the PS3's lifecycle the SPEs will not only be doing the heavy lifting but pretty much everything else as well, this could work out. It seems we are a bit away from that at this point... not to mention the consequences for the market in general. Porting from the PC or a similar system (assuming things like Tera-scale from Intel or their 32 processor machine or AMD's HTT 'open door' to stuff like ClearSpeed or Fusion... the future looks hazy) to a system with 32 SPEs could be a HUGE hurdle.

At this point the next 3 years should answer a lot of questions. I think Sony's path is pretty clear and solid. Devs should have a lot of working code and libraries for SPEs, and Cell2 should just let them toss that on there and crank it up over even more SPEs. I think Sony has positioned themselves with a "platform". I do wonder if MS will go with AMD/ATI (a Fusion-like solution; a traditional multi-core with additional cores in the form of something like a large shader array or ClearSpeed), Intel (a traditional CPU with maybe a core or two of Tera-scale processors, maybe a derivative of their 32 core / 128 thread server CPU in development) or maybe even IBM and a Cell configuration (maybe more / more robust PPEs than Cell2?). MS is cloudy right now. While I would expect probably 2x the Cell2s from the IBM roadmap (as I expect PS4 in 2011 at the earliest, I am thinking 2012 and 32nm), I would not put it past MS (and maybe Sony) to shift some silicon budget to the GPU. As GPUs are becoming more versatile "processing pools" that can adequately take up tasks that map well to them (and in some areas overlap with the strengths of a Cell or Tera-scale architecture), the flexibility of tossing resources at, say, "more graphics" or "more physics" may be worth the shift in budget.

We could very well see Sony have something like 2x Cell2 and 1 GPU, and MS go with 1 multi-core processor and an extra large GPU or a more modular GPU design (e.g. 2 GPUs that are able to share resources and are more efficient than what we see with SLI, which wastes memory and has maybe 50-80% performance gains). For Sony I don't think such a design is much of a surprise (especially as Cell can be used more and more for graphics as well), but for MS such a design would follow their trend of making the GPU a more central part of the system as well as offer a path to the PC. Going with an esoteric design on the Xbox3 could damage the synergy they have. Going with an approach that further leverages GPU technology and the DX API may be something they look at.

That said, I think memory architectures are probably going to be the biggest design decisions next generation. All these CPUs have to access memory controller(s) and system memory. Bandwidth, latency, coherence, and a host of other issues are going to play a significant role in actual utilization. We have seen with the SPEs one of the benefits of having a very fast local memory to work in directly. With ZRAM (and thus inflated memory stores) and some of the tiered cache structures being discussed (Intel was talking about cores with like 1MB each, grouped in like quads with 4MB or such shared per quad, and then a number of quads having over 10MB), I think CPU designs and architectures could see some radical changes. I think they are all now testing the waters to find the bottlenecks that need to be addressed to get the best utilization for future designs.
 
ineffient, I'm almost sure you are wrong about the number of instructions the VMX128 unit can issue in one cycle.
I've read that the decoupled FP/VMX unit has its own instruction queue and can issue two instructions per cycle.
I remember this clearly because some time ago I asked whether Xenon could be viewed, under strict circumstances, as a four-issue chip, given that the rest of the chip (integer plus branch) can also issue two instructions per cycle. Archie# (sorry, I can't remember the number in his nickname) replied that Xenon is still a two-issue chip no matter what, and that the reserved queue acts more like a reservation.
I also remember that Erp said his dev team had benchmarked the PPE and the PX, and the PX is slightly better (he added that he didn't know why).

My point is that if you only use one thread that is FP/SIMD heavy, the result from one Xenon core should be close to the result you get with an SPE (what's more, they added some hardwired functions like dot product, etc., which can help). But if you use two FP/SIMD-heavy threads, the PX will act as a 1.6 GHz core ;) (what you were saying about SMT and wideness still holds).

Joker454 hinted that the best way to use Xenon should be to have three FP/SIMD-heavy threads and three integer/branch-heavy threads, to make the most of it.

It doesn't change the fact that Cell has more legs and that Xenon proves to be less flexible than Cell. Like 3dilettante said, quantity has a quality of its own ;)
If you have more than 3 FP/SIMD threads in your engine, at least one core in Xenon will act as a 1.6 GHz CPU.

Feel free to correct me (I'm not knowledgeable, I just try to read the most I can understand lol), but I'm almost sure about the two-issue nature of the decoupled FP/VMX128 unit. I read it in an official IBM paper some time ago.
 
I just want to add that I feel weird about correcting you, because you're far more knowledgeable than me ;)

So my second point is that if I'm wrong or I misunderstood what I've read, feel free (you or 3dilettante) to correct me AND explain it to me ;)
 
Joker454 hinted that the best way to use Xenon should be to have three FP/SIMD-heavy threads and three integer/branch-heavy threads, to make the most of it.

That is also a good approach for the two threads of the PPE core of the Cell.

Chapter 10, "PPE Multithreading", in the Cell BE Programming Handbook describes in detail how best to utilise the PPE core. Some of it probably applies to Xenon as well, like this passage:

Pointer chasing and scattered memory accesses are excellent examples of applications in which multithreading really shines, even if both threads are running the same type of memory-latency-sensitive application. For example a speedup of 2x is entirely possible when running memory-pointer-chases on both threads.

It is worth noting that this type of pointer-chasing code would probably run pretty badly on an SPE. You would want to organise the data differently to make the code run well on an SPE. Meanwhile, the Cell still has the PPE core to run that kind of code, which is anyway a type of code you try to keep to a minimum.
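One way to picture "organising the data differently" is to keep the items in a flat array and stream them through the SPE in batches that fit its 256 KB local store. The sketch below only does the sizing arithmetic. The 64 KB reserved for code and stack, the two buffers (so that one batch could be transferred while another is processed), and the `plan_chunks` helper are all my assumptions for illustration, not figures from the handbook.

```python
LS_BYTES = 256 * 1024  # local store size per SPE

def plan_chunks(n_items, item_bytes, reserved_bytes=64 * 1024, buffers=2):
    """Split n_items into (start, end) batches that fit the local store.

    reserved_bytes: hypothetical space kept for code and stack.
    buffers: number of equally sized data buffers sharing what is left.
    """
    budget = (LS_BYTES - reserved_bytes) // buffers
    per_chunk = budget // item_bytes
    if per_chunk == 0:
        raise ValueError("a single item does not fit in one buffer")
    return [(start, min(start + per_chunk, n_items))
            for start in range(0, n_items, per_chunk)]

# Example: 100,000 items of 32 bytes each (made-up workload).
chunks = plan_chunks(100_000, 32)
print(len(chunks))  # 33
print(chunks[0])    # (0, 3072)
print(chunks[-1])   # (98304, 100000)
```

A linked structure gives no such index ranges to hand out, since each node has to be fetched before the address of the next one is known, which is why that kind of code is better left to the PPE.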
 