How much % acceleration does Cell provide for RSX?

haha.. take a wild guess!

Answer: No, it is from an ATI 4890 and a much more powerful CPU (an Intel E8400 dual-core).

I am not surprised, my friend. That is a very nice image.

If you have the old machine, it would be fun to see how high you can push the frame-rate and settings by disabling or lowering the SPU-type tasks.

For example:
Shadowmap quality can be lowered, because of the possibility of preprocessing on SPU
AA can be disabled
Depth of field can be disabled
Resolution can be made low (1024x768, a similar pixel count to COD4, MGS4, etc.)
etc.

What do you think?

Then we can compare each with the baseline (your first settings) to understand how much drag each one puts on RSX and also to see the percent improvement in FPS.

Of course, for different games these things will have a different amount of effect, but maybe it is possible to get ballpark ranges.
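As a minimal sketch of the bookkeeping proposed above (every number here is an invented placeholder, not a measurement), something like this Python could tabulate the per-setting gains:

```python
# Compare each toggled setting against the baseline run and report the FPS
# gain and the GPU frame time freed. All figures are hypothetical placeholders.
baseline_fps = 25.0

toggled_fps = {
    "lower shadowmap quality": 28.0,
    "disable AA": 31.0,
    "disable depth of field": 27.0,
    "drop resolution to 1024x768": 34.0,
}

for setting, fps in toggled_fps.items():
    gain_pct = (fps - baseline_fps) / baseline_fps * 100.0
    ms_freed = 1000.0 / baseline_fps - 1000.0 / fps  # frame time saved per frame
    print(f"{setting}: +{gain_pct:.0f}% FPS, ~{ms_freed:.1f} ms freed per frame")
```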

If not, still, thank you for the information and links.
 
Well, if it is that interesting: I was asked to help out with the DF article "Can Crysis run on consoles?". The approximation settings, which give the same visuals as the provided images, were compared to the CE3 tech demo Island part. A 7900GT at 430MHz with 256MB VRAM, at 1280x720 with the approximation settings, ran the scenery view part at IIRC somewhere around 20-25fps. Though neither the OS nor the HW was in optimal condition (old drivers, a faulty RAM stick, etc.). The scene had about 1-1.2M polygons per frame. It was also run on CE2, not CE3.

A GeForce 6800GT 256MB at 370MHz can run Crysis playably at low quality (which looks better and more complex than CoD), slows down at medium quality, and is a slideshow at high quality, on either an Athlon XP 3200+ or an Athlon X2 6400+.

I am not surprised, my friend. That is a very nice image.

If you have the old machine, it would be fun to see how high you can push the frame-rate and settings by disabling or lowering the SPU-type tasks.

For example:
Shadowmap quality can be lowered, because of the possibility of preprocessing on SPU
AA can be disabled
Depth of field can be disabled
Resolution can be made low (1024x768, a similar pixel count to COD4, MGS4, etc.)
etc.

What do you think?

Then we can compare each with the baseline (your first settings) to understand how much drag each one puts on RSX and also to see the percent improvement in FPS.

Of course, for different games these things will have a different amount of effect, but maybe it is possible to get ballpark ranges.

If not, still, thank you for the information and links.

MGS4 is programmed very differently than COD4; it's not going to make a good comparison at all.
 
MGS4 is programmed very differently than COD4; it's not going to make a good comparison at all.

I didn't pick that resolution to compare MGS4 vs. COD4, but because 1024x768 is an acceptable console resolution for testing Crysis, and I show it is acceptable by listing MGS4 and COD4, which have resolutions similar to 1024x768 (COD4 only differs in proportions).
 
What if we had a PC with a 7900GT to test the latest versions of COD, UT, Doom, Crysis, Elder Scrolls, Assassin's Creed, etc.?

Maybe start with 30fps settings, then turn things off to see results?

Or maybe start with max settings and turn things off to see if a playable frame-rate is possible?

Can be fun to experiment and see.

Sadly, I do not have such a PC or such games.

Why would anyone want to do that? We already know what kind of chip it is. And beyond that, the PS3 setup as a whole has already proven itself to be comfortably the most powerful console on the market when its potential is fully utilized, so I can't understand the motivation for stuff like this. It seems a bit of a bizarre thing to do when everyone knows the PS3's magic juice is the Cell.
 
Why would anyone want to do that? We already know what kind of chip it is. And beyond that, the PS3 setup as a whole has already proven itself to be comfortably the most powerful console on the market when its potential is fully utilized, so I can't understand the motivation for stuff like this. It seems a bit of a bizarre thing to do when everyone knows the PS3's magic juice is the Cell.

It's not just Cell, it's also RSX+LibGCM.
 
Not for comparison

Why would anyone want to do that? We already know what kind of chip it is. And beyond that, the PS3 setup as a whole has already proven itself to be comfortably the most powerful console on the market when its potential is fully utilized, so I can't understand the motivation for stuff like this. It seems a bit of a bizarre thing to do when everyone knows the PS3's magic juice is the Cell.

Hello.

This thread is not to compare consoles or to say which is best.

Yes, we know Cell has the "magic juice." But we do not have a quantifiable estimate (estimable quantification? you know what I mean) of how much RSX can "appear" to perform beyond expectations (i.e., beyond a 7900GT) without breaking the laws of physics, when rendering work is moved to Cell.
 
I've always looked at it by asking: if the PS3 were two separate consoles, with the only difference being Cell's ability to offload tasks from the GPU, what would the differences be?

PS3 X = No offload ability + RSX
PS3 Y = Offload ability + RSX

In this sense, the only difference would be the level of detail PS3 Y would be able to produce compared to PS3 X. We can all agree that the Cell in the PS3 allows it to produce visuals it normally would not be able to achieve with RSX alone. As such, comparisons of what the performance of the machine would be without the Cell can be used to determine the increase in performance because of it.

While I agree that an apples-vs.-oranges comparison with a PC switching off certain visual effects is an unrealistic way to determine the increase in performance Cell could achieve, I do believe it is the best means currently available to provide enough tangible evidence to estimate the average increase in speed gained by utilizing Cell in such a way.

All I think ihamoitc is looking for is:
If Cell removes 30-50% of the workload from RSX, it means RSX would have to be 30-50% faster to produce the same visuals... therefore it effectively makes RSX 30-50% faster (even though it literally doesn't).

I wouldn't use the information (if it is given) here to make my case against or for the PS3, but it would be nice to know how much benefit the Cell does/can provide to the PS3.
 
IIRC there was a mention in a GG tech presentation about 6 SPEs being able to increase visual rendering performance by upwards of 40-60%. Though not for all types of GPU tasks, but for some, and it totalled a 40-60% boost over RSX.

RSX is 500MHz, 7800GTX is 430MHz (?), 7800GTX 512 is 550MHz, 7900GT is 430-500MHz, 7900GTX is 650MHz with some HW refinements (79xx series), 7950GX2 is 2x7900GT(X).
 
Not unrealistic but imprecise

If Cell removes 30-50% of the workload from RSX, it means RSX would have to be 30-50% faster to produce the same visuals... therefore it effectively makes RSX 30-50% faster (even though it literally doesn't).

Obviously the RSX clock speed will not increase. However, we can ask how much the available cycles per frame can increase or, put another way, how much the RSX rendering time in ms per frame can be reduced.

That is not "unrealistic" but actually well-documented reality. Many games use this, like Uncharted 2, Killzone 2, GT5 and God of War 3.

What we cannot get is precise answers, for many reasons. But who cares about precision?

We can gain an approximate estimation using existing games that run on the 7900GT, such as Assassin's Creed, Bioshock, etc.
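A rough sketch of that cycles-per-frame arithmetic (the frame time and the freed milliseconds below are illustrative numbers, not measurements): if Cell takes over work that previously cost RSX t_off ms out of a T ms frame, the effective speedup is T / (T - t_off). For example, freeing 10 ms of a 33.3 ms (30 fps) frame gives 33.3 / 23.3 ≈ 1.43, i.e. roughly 43% more RSX cycles available per frame. By the same formula, removing 30-50% of the workload corresponds to roughly 43-100% more cycles, which is why the share of workload removed and the effective speedup are not the same number.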
 
Wow

IIRC there was a mention in a GG tech presentation about 6 SPEs being able to increase visual rendering performance by upwards of 40-60%. Though not for all types of GPU tasks, but for some, and it totalled a 40-60% boost over RSX.

That is a great increase, my friend.

I remember a Sony PPT which says that for depth of field, with Cell assistance, the RSX load was reduced from 15ms (45% of a 30fps frame render time) to only 3ms (9%)!

http://www.technology.scee.net/file...09/DeferredLightingandPostProcessingonPS3.ppt
 
I've always looked at it by asking: if the PS3 were two separate consoles, with the only difference being Cell's ability to offload tasks from the GPU, what would the differences be?

PS3 X = No offload ability + RSX
PS3 Y = Offload ability + RSX

In this sense, the only difference would be the level of detail PS3 Y would be able to produce compared to PS3 X. We can all agree that the Cell in the PS3 allows it to produce visuals it normally would not be able to achieve with RSX alone. As such, comparisons of what the performance of the machine would be without the Cell can be used to determine the increase in performance because of it.

Naturally you can achieve a bit better visuals if you offload your GPU work to your CPU cores. However, the trade-off is that the GPU tasks you offload usually require lots of math-heavy parallel processing, and thus eat a lot of CPU cycles you could use to do something else. In many cases you already need those CPU cores to run your high quality physics simulation, collision detection, sophisticated AI, pathfinding, ray cast visibility detection, viewport culling, etc, etc. Using the CPU for GPU tasks is a trade-off, and usually a bad one, since the GPU is much better suited for doing parallel pixel processing. If your game has very simple game logic and you have free CPU cores, then offloading to CPU is going to give you good results, assuming of course your engine can properly hide the additional latency and the additional synchronization needed for mixed GPU/CPU processing. (Personally I really hate additional input to screen lag.)

In our recent games pixel processing (pixel shaders, texture sampling, blending, writing to frame buffer, etc) has consumed around 90% of our GPU cycles. You can offload vertex processing to CPU easily, but it doesn't help that much, since vertex processing cost isn't usually that high in current console games (around 10% of the frame usually). Post processing takes maybe 5% of your frame time (if you are running at 30 fps), and offloading that helps. But in total you should not expect to see more than 10-15% gain in frame rate even if you offload anything you realistically can to the CPU. But it really depends on the game you are developing.
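Applying the same arithmetic to those budget figures (a rough check using only the percentages quoted above): offloading a fraction f of the GPU frame gives a frame-rate gain of 1 / (1 - f) - 1. With f = 0.10 that is about 11%, and with f = 0.13 about 15%, which lines up with the 10-15% ceiling above if the realistically offloadable work is the ~5% of post-processing plus part of the ~10% of vertex processing.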

That is a great increase, my friend.
I remember a Sony PPT which says that for depth of field, with Cell assistance, the RSX load was reduced from 15ms (45% of a 30fps frame render time) to only 3ms (9%)!
http://www.technology.scee.net/file...09/DeferredLightingandPostProcessingonPS3.ppt

15 ms for DOF sounds like complete overkill. In Trials HD we have a cheap screen space GPU DOF implementation that uses less than 1 ms and looks good enough for our needs. With a slight quality compromise you can speed up many GPU algorithms significantly. Comparison of different algorithms is usually apples vs oranges really.
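As a purely illustrative aside (not the Trials HD implementation, and real versions run on the GPU, typically with a downscaled blur buffer), the general shape of such a cheap screen-space DOF is: blur the frame once, derive a per-pixel circle-of-confusion weight from depth, and lerp between the sharp and blurred images. A toy CPU sketch in Python:

```python
# Toy CPU-side sketch of a cheap screen-space DOF: one blur pass, a depth-based
# circle-of-confusion weight, and a per-pixel lerp. Illustrative only.
import numpy as np
from scipy.ndimage import uniform_filter

def cheap_dof(color, depth, focus_depth=0.5, focus_range=0.2, blur_size=9):
    """color: HxWx3 float array in [0,1]; depth: HxW float array in [0,1]."""
    # One fixed-size blur of the whole frame (this is the "quality compromise").
    blurred = np.stack(
        [uniform_filter(color[..., c], size=blur_size) for c in range(3)],
        axis=-1,
    )
    # Circle-of-confusion weight: 0 at the focal plane, 1 fully out of focus.
    coc = np.clip(np.abs(depth - focus_depth) / focus_range, 0.0, 1.0)
    # Per-pixel lerp between the sharp and blurred colour.
    return color * (1.0 - coc[..., None]) + blurred * coc[..., None]
```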
 
I think that it is not easy to compare when the results are different. Take, for instance, AA. In GoW 3 you offload the AA to Cell, and you even get better edge AA than the traditional 2xMSAA used in the demo code before - so how much improvement and acceleration is Cell for RSX in this case? Infinity, because RSX alone would never achieve this AA level... it all depends on the definition and the goal the dev wants to reach.

The same maybe goes for post-processing stuff. If different algorithms and math are used to make it work on Cell, and you get a better result with this new implementation (for your DOF, for instance)... how do you judge and compare the gain due to Cell now?!
 
Motion Blur on TFU2

Moving it to SPU can give 5-10% savings. Maybe for RSX the savings are even more.

GPU (Xenos): ~2.6ms max (variable 5-11 samples) = ~8% of frame-render time at low quality

On the XBox360 it takes about 1.8-2.6 ms at full resolution with a quarter resolution velocity buffer and quarter resolution artifact fighting mechanism, performing variable 5 to 11 samples per pixel and special edge conditions to avoid the halos.

Cell (5 SPUs): ~1.9ms max (16 samples) = ~6% of frame render time at very high quality (maybe 3% for low quality?)

PlayStation3 SPU solution is a lot more advanced in terms of quality and takes about 0.9-1.9 ms on 5 SPUs to do 16 samples and complex edge conditions.

Taken from slide 55:

http://and.intercon.ru/rtfrucvg_html_slides/
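For reference, the "~8%" and "~6%" figures above are just those millisecond costs expressed against a 30 fps frame (33.3 ms): 2.6 / 33.3 ≈ 7.8% and 1.9 / 33.3 ≈ 5.7%.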
 
Yes and no, it means it takes 1.9ms to complete the task. What happens inside this 1.9ms for each SPU depends on the algorithm used.

I think if you read the article, you'd find that the task is bounded by other stuff (e.g., DMA). The SPUs can finish the task even faster but without new data, it's not going to make a real difference to the users. The 360 GPU implementation is bottlenecked by something else too. The important thing here is the entire system needs to work together in a balanced manner. It's not merely a CPU/SPU/GPU question.
 
Doesn't this mean that it takes 1.9ms on every SPU? Total SPU time of about 9ms across 5 SPUs?

Yes, 9ms of SPU time, if you want to (and can) share the workload across a different SPU distribution. But that's the same as saying 100ms of Xenos shader time, at about 2 ms across 50 shader units - that is, a rather nonsensical figure when talking about render times.
 
Yes, 9ms of SPU time, if you want to (and can) share the workload across a different SPU distribution. But that's the same as saying 100ms of Xenos shader time, at about 2 ms across 50 shader units - that is, a rather nonsensical figure when talking about render times.

No it isn't, because what you're interested in is how little space it takes up in your total frame's rendering pipeline, which you'll want to keep as much within one frame as possible. Or? This is the reason why they do these tasks across 5 SPUs - not because they actually need 5 SPUs (these processes most likely use but a fraction of the SPUs) but because they want the latency as low as possible.
 
I don't understand the argument. The comparison between XB360 and PS3 times is how long it actually takes to render the effect. It's this absolute value that matters. How many cores is a side issue, no more important to that final time than how many of XB360's shader arrays are dedicated to the effect as opposed to spread over different tasks. Or in other metrics, the PS3 version is 1.4ms Cell time, 9ms SPU time. The 360 version is 2ms GPU time, 6ms shader array time, 100ms shader time. The only figures that matter here are how quickly you can get the interpolation done, as that tells you how much time you have left for the rest of the game. The only exception I can think of is developers who portion out Cell according to SPU time and want flexibility, who may think, "we can spread that 9ms SPU time across 3 SPUs instead, finishing the interpolation in 3ms and leaving us those 2 SPUs to do such-and-such concurrent jobs." Assuming the scaling works that way.
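To make the bookkeeping explicit with the figures quoted above: the task runs concurrently for about 1.9 ms on each of 5 SPUs, so the aggregate SPU-time is roughly 5 × 1.9 ms ≈ 9.5 ms, while the latency cost to the frame is still only ~1.9 ms. The ~9 ms figure measures how much total SPU capacity the effect consumes, not how long the frame waits for it.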
 
Thanks Shifty... It was just out of curiosity, not to stir things up, because some people tend to think that if you do the effect on one SPU and it needs ~9ms, then you may as well do it across 5 and you will get it down to 1.9ms. But the problem is, they interpret it as if it were the total time, and according to them every SPU would take approx. 0.4ms, which is not true. Similar to that MLAA misunderstanding....
 