Will the PS2 performance increase be repeated this gen?

scooby_dooby · Mar 31, 2007

Sis said:
I disagree, since both systems introduce multithreading paradigms to deal with which I would expect requires significant and non-trivial learning. I would expect both systems to see large jumps from 1st gen games to "end of life gen" games.

The fact that the tooling experience is there only means that they don't have to take as large a step back as they would have without the tooling, in my opinion.

The multiplatform issue is a bit of a non-issue, since it is the 1st party games that have the opportunity to show off their respective systems.

Obviously this is where the majority of the gains will come, but akin to PS2? No.

People need to remember how bad PS2 games looked initially, they showed such a huge growth because the first generation was horrible. We're just not gonna see that with these consoles, the architectures are much more familiar to developers than PS2 ever was. Like I said, I think the first generation will start much higher up on the curve than the PS2 did.

TheChefO · Mar 31, 2007

oli2 said:
I have the feeling this is quite difficult in this type of discussion because how do you separate "intent" of "main idea".
Second thing which is difficulty is sticking to "differences" that leads to speak about "Ghz", "number of true / false cores", etc. which do not cover the overall figure.

Example:
CPU A will allow xyz due to its design (vs CPU B ...)
vs
Company A designed CPU A to enable xyz due to factors 123 (compared to Company B's decision to...)

Arwin · Mar 31, 2007

Capeta said:
EE is not analogous to CELL eg EE in PS2 was used for TnL...CELL in PS3 isn't. Not only that but PS2's main increase in performance output was graphics. Since PS3 is has traditional GPU with built-in TnL as well as better tools, there won't be a major improvement in graphics like there was with PS2. I don't foresee major graphics improvments from "untapped CELL performance"...physics? Sure.

I think you have it backwards. Putting the Cell processor to physics-based work is easier, done from day one by many games (even if sometimes only for cloth-simulation ...). But how the Cell and RSX interact in terms of all sorts of culling, vector animation, post-processing, volumetric rendering and god knows what else, that's the interesting bit that will see a lot of improvements. There's all sorts of fun things you could do, even with textures generated or, I think maybe even more important, manipulated by cell and so on. Procedurally generated geometry?

Also, don't underestimate what programming to the metal means even for the RSX. There's more to its implementation in the PS3 than just being able to texture from both GDDR and XDR memory pools, and I bet even that isn't used by most games yet, never mind streaming textures. There will be people here more qualified to comment on this though (hoping nAo pops in for this one).

The whole memory, FlexIO, XDR, Cell-cores, RSX, GDDR, and the way each can access the other and so on really allows for a lot of different approaches.

Right now, there are two major development paths underway.

One is making sure that a more general approach for multi-platform games works well on the PS3's more outlandish architecture (think more generic SPE job systems). Think CodeMasters, Ubisoft, EA, Epic, etc. For this, Sony among others is providing assistance (think the Neon engine project), and middleware like Havok and Epic's Unreal Engine provide an abstraction layer.

The other is PS3 focussed developers who are developing a specific game, which they can bring into harmony with the hardware as much as possible. Think the likes of Polyphony Digital, Ninja Theory, Naughty Dog, Insomniac, Factor 5, Evolution Studios.

Added to this is the earlier mentioned parallelisation, focussing development on datastreams more than ever, and I think there is a tremendous amount of growth possible there too.

Then there is the whole HDD available by default, which allows for new programming methods at least compared to PS2 games, combined with the option to stream vast quantities of (i.e. up to 50Gb of compressed) data from the BluRay disc.

Oh and let's try not to make any inter-console comparisons in this thread.

oli2 · Mar 31, 2007

TheChefO said:
Example:
CPU A will allow xyz due to its design (vs CPU B ...)
vs
Company A designed CPU A to enable xyz due to factors 123 (compared to Company B's decision to...)

Acknowledged.
I said difficult but noy unfeasible ('cause i think so hard that Microsoft did the best given the time they had).
/arguing with you.

Sis said:
You lost me here. The 360 has 3 homogenous cores, the PS3 has 8 heterogenous cores. But neither one has multiple processors...except being multi-core they might as well be. No?

Excuse-me ... i should have written "processors or cores".

Lefungus · Mar 31, 2007

oli2, If I understood you correctly you're saying that
- parallel programming means parallelizing the same task over multiple cores
- multi-threaded programming means parallelizing multiple tasks on one core.

I believe this is a bad definition. Valve said it best while doing their interview on the multi-threading of the source engine for both x86/cell/xbox cpu.
When parallelizing, you only need to think about your granularity.
- Micro parallelism. You parallelize small steps or functions on different cores. For example, you split quick sort (very easy) on multiple cores. This is the hardest method since some algorithms are simply not designed for parallelism and you will need to research your own solution.
- Macro parallelism. You parallelize huge tasks over multiple cores. For example, you put physics on one core and graphics on another one. What you do is that you take different units of work with different output, that can be processed simultaneously, maybe with some barrier or synchronization sometimes. This is, relatively, the easiest way to achieve multi-threading. That's also what's recommended in some public Microsoft presentations on game performance.

Of course, to get the best performance, you will need to choose between micro/macro and all the various steps between those two extremes.
Regarding that, I believe there is no difference between the Cell and the Xenon CPU. Both will get enhanced performance using either ways.
Now Cell is more flexible/complex because it has asymmetric cores, with differing capabilities while xenon cores are symmetric. (I also believe it's a little more powerful cpu-wise)
However, Cell have no different paradigm shift from Xenon. Both will profit from parallelism at any level.

To get back to the topic. I'm undecided about whether this generation will achieve the same kind of leap as the PS2.
- Regarding the GPU, we are in known grounds. Both GPU are relatively easy to use. We skipped the 'how the hell am i supposed to draw anything with that' step. In no way will be begin as the same low level as PS2.
Regarding the CPU, one one hand, we WILL improve our knowledge and maximize the efficiency of those multiple cores. I don't think current games are very ambitious regarding this yet. One another hand, multi-threading have been a tough nut to crack for the last decades and we sure won't find any magic wand.

homy · Mar 31, 2007

Lefungus said:
Now Cell is more flexible/complex because it has asymmetric cores, with differing capabilities while xenon cores are symmetric. (I also believe it's a little more powerful cpu-wise)
However, Cell have no different paradigm shift from Xenon. Both will profit from parallelism at any level.

I think I have to disagree with you totally.
Cell is a huge shift from Xenon. I mention just one very typical difference here.
In Xenon, you have cache. We all know that cache has an automated replacement policy be it LRU..., where you can't control or know whereabouts of its data in real time.
In Cell, you have local store in each of SPUs. So the SPUs know the locality of its data and you can replace it in real time. You don't have to worry about cache hits or misses.

Capeta · Mar 31, 2007

Arwin said:
I think you have it backwards. Putting the Cell processor to physics-based work is easier, done from day one by many games (even if sometimes only for cloth-simulation ...). But how the Cell and RSX interact in terms of all sorts of culling, vector animation, post-processing, volumetric rendering and god knows what else, that's the interesting bit that will see a lot of improvements. There's all sorts of fun things you could do, even with textures generated or, I think maybe even more important, manipulated by cell and so on. Procedurally generated geometry?

Culling? Isn't that required? How is that going to significantly improve from 1st gen games to final gen?

Vector animation? What is that?

Post processing isn't going to improve much since the same effects are already being done using the GPU.

Volumetric rendering? Again that's already being done.

Also, don't underestimate what programming to the metal means even for the RSX. There's more to its implementation in the PS3 than just being able to texture from both GDDR and XDR memory pools, and I bet even that isn't used by most games yet, never mind streaming textures. There will be people here more qualified to comment on this though (hoping nAo pops in for this one).

ANY GPU can be programmed to the metal, that doesn't mean you'll see a significant boost from 1st to final gen graphics.

The whole memory, FlexIO, XDR, Cell-cores, RSX, GDDR, and the way each can access the other and so on really allows for a lot of different approaches.

Alot of different approaches that get the same end result doesn't equal significant boost. Flexibility doesn't automatically equal major boost.

Right now, there are two major development paths underway.

One is making sure that a more general approach for multi-platform games works well on the PS3's more outlandish architecture (think more generic SPE job systems). Think CodeMasters, Ubisoft, EA, Epic, etc. For this, Sony among others is providing assistance (think the Neon engine project), and middleware like Havok and Epic's Unreal Engine provide an abstraction layer.

The other is PS3 focussed developers who are developing a specific game, which they can bring into harmony with the hardware as much as possible. Think the likes of Polyphony Digital, Ninja Theory, Naughty Dog, Insomniac, Factor 5, Evolution Studios.

Added to this is the earlier mentioned parallelisation, focussing development on datastreams more than ever, and I think there is a tremendous amount of growth possible there too.

That has been the case in previous generations too, doesn't mean it will result in a significant boost.

Then there is the whole HDD available by default, which allows for new programming methods at least compared to PS2 games, combined with the option to stream vast quantities of (i.e. up to 50Gb of compressed) data from the BluRay disc.

How is a HDD or BRD going to allow a significant boost in graphics? Do they have some secret tech embedded in them?

Oh and let's try not to make any inter-console comparisons in this thread.

Huh?

Lefungus · Mar 31, 2007

homy said:
I think I have to disagree with you totally.
Cell is a huge shift from Xenon. I mention just one very typical difference here.
In Xenon, you have cache. We all know that cache has an automated replacement policy be it LRU..., where you can't control or know whereabouts of its data in real time.
In Cell, you have local store in each of SPUs. So the SPUs know the locality of its data and you can replace it in real time. You don't have to worry about cache hits or misses.

I just meant 'paradigm shift' regarding the whole parallelism/multi-threading context. Of course those cpus have lots of differences, but the one you mentioned is implementation specific. Most of the time, the local spe memory will vastly improve the algorithm speed but not its design. Even then, I remember some talk about how xenon cores can lock some parts of the cache (unconfirmed), which you could mix with prefetching. It certainly won't be as fast though.
Anyway, my previous post was mostly about invalidating the whole 'cell is parallel, xenon is ONLY multi-threaded' thingie.

oli2 · Mar 31, 2007

Thanks for your post : it helps me clarifying my view.

Lefungus said:
Valve said it best while doing their interview on the multi-threading of the source engine for both x86/cell/xbox cpu.

I knew i should have read those !

Lefungus said:
Micro parallelism. You parallelize small steps or functions on different cores. For example, you split quick sort (very easy) on multiple cores. This is the hardest method since some algorithms are simply not designed for parallelism and you will need to research your own solution.

That is what i called in my previous posts parallel model.

Lefungus said:
Macro parallelism. You parallelize huge tasks over multiple cores. For example, you put physics on one core and graphics on another one. What you do is that you take different units of work with different output, that can be processed simultaneously, maybe with some barrier or synchronization sometimes. This is, relatively, the easiest way to achieve multi-threading. That's also what's recommended in some public Microsoft presentations on game performance.

That is what i called multi-threaded model.

Lefungus said:
Of course, to get the best performance, you will need to choose between micro/macro and all the various steps between those two extremes.
Regarding that, I believe there is no difference between the Cell and the Xenon CPU. Both will get enhanced performance using either ways.
Now Cell is more flexible/complex because it has asymmetric cores, with differing capabilities while xenon cores are symmetric. (I also believe it's a little more powerful cpu-wise)

I have to disagree there : IMO Cell seems better suited to micro parallelism and Xbox360's CPU to macro parallelism, for very a simple reason : the number of cores. In the cell, you have one brain, the PPE, and 7 cores to use to define your grid of processors (linear, circular linear, hypercube, ring, etc.), each of them strong to computation, able to communicate very fast with another or with the "brain" and with 256 kb LS. For macro-parallelism, i don't think this processor is well suited because there is only one generalist core.

For Xbox 360, it is enough to read your definition of macro-parallelism to understand why it seems better suited to this. Given what i said of micro-parallelism just bellow, i fail to see how it could be well suited to this processor.

Lefungus said:
One another hand, multi-threading have been a tough nut to crack for the last decades and we sure won't find any magic wand.

I agree very much with that.

oli2 · Mar 31, 2007

Lefungus said:
Anyway, my previous post was mostly about invalidating the whole 'cell is parallel, xenon is ONLY multi-threaded' thingie.

Which, IMO, you failed to prove at the moment, because regarding Xenon, "capable of" is no synonym of "efficient enough" or "worth dealing".

You were speaking of paralell sort previously. Would it really be worth implementing it on a 3 node ring, regarding the caches latencies ? I think in most cases not ... With 7 SPEs, this is certainly not the same answer ...

Lefungus · Apr 1, 2007

oli2 said:
Which, IMO, you failed to prove at the moment, because regarding Xenon, "capable of" is no synonym of "efficient enough" or "worth dealing".

You were speaking of parallel sort previously. Would it really be worth implementing it on a 3 node ring, regarding the caches latencies ? I think in most cases not ... With 7 SPEs, this is certainly not the same answer ...

Those kinds of optimizations have been done for ages on SMP x86 cores, and it does improve the speed. For example, I've done picture interpolation, like bicubic, parallelized on the picture level (N horizontal slices per core). I would put it into the micro category (Macro being processing different pictures at once, or doing two different filters). I tested it and got nearly linear increase per core, aka ~2x for amd64 dual core and a little less than ~4x for intel quadro. I'm sure that I would get 3x the speed of a single core on xenon and 6x on the 6 spe, maybe less depending on the overhead which may increase with an higher number of core. On hyperthreaded cores only, which is my current cpu, I got ~15-20% increase due to higher register pressure.
Cell have an advantage in that it has more cores available (N=6), instead of xenon (N=3).
Now quicksort is also an 'embarrassingly parallel' algorithm, so while I didn't specifically implement it myself, I see absolutely no reason why 3 nodes would not be worth doing it against 1 only. I do agree that 7 nodes may yield better speed and results though.

Another note, micro-parallelization is in no way the 'superior' method. It's a case per case deal (check Valve interview conclusion)

Arwin · Apr 1, 2007

Capeta said:
Culling? Isn't that required? How is that going to significantly improve from 1st gen games to final gen?

The thing is that by default, most people are going to presume to do all sorts of things on RSX. But because the CPU and GPU are so closely linked together and much more closely matched in terms of performance, the line of what you would traditionally do on the one or the other becomes blurred. It's not for nothing that you have threads like these:

http://forum.beyond3d.com/showthread.php?t=26223&highlight=backface+culling+on+cell

I just mentioned culling because I remembered it coming up in one of the Sony presentations at GDC mentioned elsewhere on this site.

Vector animation? What is that?

That's me just clumsily hinting at that you can do a lot of realtime vertex manipulation a lot more efficiently using the Cell which feeds its results directly to the GPU. Again, I did not come up with this myself.

Post processing isn't going to improve much since the same effects are already being done using the GPU.

Again, in the traditional scenario the GPU would do it all, or nothing would happen. But in this case, the Cell can ADD effects to what the GPU can pull off. Also, I think some type of post processing could actually be more efficient on Cell.

Volumetric rendering? Again that's already being done.

Yes, but again I am throwing it out there as one of the examples of how Cell and RSX can both do things, but one of them might be better at it than the other, and you might have more resources left over at one or the other. It depends on your bottlenecks, and you have a choice. Sometimes the choice will be obvious from the start, other times not.

ANY GPU can be programmed to the metal, that doesn't mean you'll see a significant boost from 1st to final gen graphics.

But that is simply because with cards of such complexity, the PC space doesn't offer programmers a long enough window to go that far. Maybe the Wii could be an exception, if it changes little from the already fairly transparent (at least that's what I read) GameCube. But the RSX, simple as it may be in relative terms to other chipsets, given enough exposure to hardened programmers she'll give up some new tricks yet. Just give them more time - and that's exactly what they get on a console.

Alot of different approaches that get the same end result doesn't equal significant boost. Flexibility doesn't automatically equal major boost.

Not automatically, no. But the chances of that happening do definitely increase. I'm just putting all the factors together here.

That has been the case in previous generations too, doesn't mean it will result in a significant boost.

It's an important requirement, lest someone argues (again) that in the next generation it's all about multi-platform engines anyway so getting the most out of a system isn't going to happen. Glad to hear you're not going to bring that one up though!

How is a HDD or BRD going to allow a significant boost in graphics? Do they have some secret tech embedded in them?

Nothing secret about having virtual memory, but it wasn't available in the previous generation - at least not on PS2, and Halo 2 was just about the only game that used it on the Xbox.

Huh?

That wasn't directed at you, silly.

ninzel · Apr 1, 2007

I hope so.

oli2 · Apr 1, 2007

Ouf, i almost doubt of what i thought.

The only thing i can say is you seem to misunderstand what parallel programming is. I 'll try to give an overview of it below but if if you are interested, you should go to the site of a faculty or of an informatic school and look to some pdf or ppt describing what parallel architectures / programming are.

Lefungus said:
Those kinds of optimizations have been done for ages on SMP x86 cores, and it does improve the speed. For example, I've done picture interpolation, like bicubic, parallelized on the picture level (N horizontal slices per core). I would put it into the micro category (Macro being processing different pictures at once, or doing two different filters). I tested it and got nearly linear increase per core, aka ~2x for amd64 dual core and a little less than ~4x for intel quadro. I'm sure that I would get 3x the speed of a single core on xenon and 6x on the 6 spe, maybe less depending on the overhead which may increase with an higher number of core. On hyperthreaded cores only, which is my current cpu, I got ~15-20% increase due to higher register pressure.
Cell have an advantage in that it has more cores available (N=6), instead of xenon (N=3).

These informations are interesting, the problem is that you only describe here a "multi-threaded" method.
The basis for parallel system are :
- X cores online
- Each with its own cache and able to perform independently some calculation.
- A core can put / retrieve some datas from the cache of another core
- Most important : these cores are organized (linear or ring or hypercube or ...)

The algorithms do take account of the capability of the core to independently "make a part of the job" and from the organisation of the cores. (see below)

Lefungus said:
Now quicksort is also an 'embarrassingly parallel' algorithm, so while I didn't specifically implement it myself, I see absolutely no reason why 3 nodes would not be worth doing it against 1 only. I do agree that 7 nodes may yield better speed and results though.

Presicely ! Quicksort does not fit well to parallel programming. There are far more "better" method to perform a sort, each based on a given organisation. You can look for example how "Bitonic sort" is performed.

If my assomption is correct, the SPE are organized either linear , or circular linear (maybe i am wrong and this is a ring !). Go look how a sort would be efficiently done on these organizations, and think back if it would be manageable with Xbox360 CPUs.

Lefungus said:
Another note, micro-parallelization is in no way the 'superior' method. It's a case per case deal (check Valve interview conclusion)

No need that Valve say that to agree. The only thing that matters is what is achievable and "doable".
On thing is certain : parallel architecture is more ambitious than multi-threaded one.

PS : I have some PPT that i could transmit to you if you want. They are in french (sorry) but you will certainly understand the point with the schemas. You will see for example, the method to sort an array for a linear organization, how to perform matrix multiplication.

oli2 · Apr 1, 2007

I have made some researches around the Web and found (between others) this : http://www.llnl.gov/computing/tutorials/parallel_comp/#Models

So, and the vocabulary side it seem that you are correct and i am false, because if we stick to it, Xbox360 and PS3 have the same capability to perform parallel programming.

But on the content my words stay correct which can be resumed as : given you will be able to organize the SPEs, and that the Cell 's architecture is specially made to optimize this aspect, some algos will eventually be able to be implemented on CELL that won't be worth dealing on Xbox360, because there is no way you can get a result in the same scale.

Simple as that. Sorry Chef, because it seems i compare the 2 products, even if this is not my purpose.

PS : i definitely wish i could give you some of my PPT, because when you read them, it is obvious.

TheChefO · Apr 2, 2007

oli2 said:
I have made some researches around the Web and found (between others) this : http://www.llnl.gov/computing/tutorials/parallel_comp/#Models

So, and the vocabulary side it seem that you are correct and i am false, because if we stick to it, Xbox360 and PS3 have the same capability to perform parallel programming.

But on the content my words stay correct which can be resumed as : given you will be able to organize the SPEs, and that the Cell 's architecture is specially made to optimize this aspect, some algos will eventually be able to be implemented on CELL that won't be worth dealing on Xbox360, because there is no way you can get a result in the same scale.

Simple as that. Sorry Chef, because it seems i compare the 2 products, even if this is not my purpose.

PS : i definitely wish i could give you some of my PPT, because when you read them, it is obvious.

How will this equate to a ps2 performance-ramp repeat?

LunchBox · Apr 3, 2007

graphically? probably not...

coz it seems like most of the time in game development is spent on physics...

it's looking like this gen is all about fully utilizing game physics...

it's like the new lens flare

Will the PS2 performance increase be repeated this gen?

Will PS2 "performance increase" be repeated?

Yes, it will without a doubt, be repeated

Yes, it is likely

Maybe

No, it is not likely

No, it will without a doubt, not be repeated

scooby_dooby

TheChefO

Arwin

Now Officially a Top 10 Poster

oli2

Lefungus

homy

Capeta

Lefungus

oli2

oli2

Lefungus

Arwin

Now Officially a Top 10 Poster

ninzel

oli2

oli2

TheChefO

LunchBox

Similar threads