RSX Secrets

Status
Not open for further replies.
Until I made my "8 ROPs a bad engineering decision?" thread, I had thought it was generally accepted that the RSX was indeed clocked at 500 MHz. I'm surprised to see a forum with as much affluence as this still debating such an aspect of the PS3.
 
Not sure if 'affluence' is the word there (though hey, it may be true!), but it was never the forum itself that questioned it - hell we were the ones to confirm it long ago - but rather a couple of posters who simply must always believe there's more than there is, and that there are secrets and conspiracies around every corner. And those folk traditionally make a lot of noise.
 
We are a public forum where all are welcome to contribute. This means new blood that dares to challenge the status quo (and rightly so, to make sure the status quo isn't being stupid! Just so long as they know when enough is enough...)
 
Does anyone want to edit the 550 MHz on Wikipedia?

I've been editing it since the first time I heard of the downclock last year, but some people just couldn't accept it I guess :LOL:
 
The only secrets that are held within the PS3 are the programming techniques of the developers themselves. And as far as unknowns go now, the Wii is pretty much the last bastion.
 
What would a higher memory frequency do other than provide what I assume would be unusable bandwidth, since the ROPs in both systems seem to be the bottleneck?

Memory bandwidth is consumable by render targets/framebuffers/blending/alpha/Z (ROP stuff), texturing and geometry fetching. When you add multi-sampling and overdraw, the bandwidth requirements go up and you will not be getting 8 pixels per clock if there is not enough bandwidth.

In other words, was the memory frequency dropped by 50 MHz simply because 8 ROPs at 500 MHz can only push 4 Gpixels/s (with memory at 650 or 700 MHz)?
As has been mentioned, heat/power dissipation is the likely reason. Keep in mind that the GDDR3 chips are in close proximity to RSX, so there would be a greater thermal density on the heat cap than on a similar PC card.
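To put rough numbers on the question above, here is a minimal sketch of peak ROP output against memory bandwidth. The 8 ROPs, the 500 MHz core clock, and the 650/700 MHz memory clocks come from the thread; the 128-bit bus width and the 4 bytes written per pixel are assumptions made for illustration, not figures stated here.

```python
# Peak ROP fill rate versus memory bandwidth (illustrative figures only).
ROPS = 8                 # pixels written per clock, from the thread
CORE_CLOCK_HZ = 500e6    # RSX core clock, from the thread
BUS_WIDTH_BITS = 128     # assumption: width of the GDDR3 interface
BYTES_PER_PIXEL = 4      # assumption: 32-bit colour write only, no Z, no blending


def peak_fill_rate(rops, clock_hz):
    """Pixels per second if every ROP retires one pixel every clock."""
    return rops * clock_hz


def gddr3_bandwidth(mem_clock_hz, bus_width_bits):
    """Bytes per second for double-data-rate memory."""
    return mem_clock_hz * 2 * bus_width_bits / 8


fill = peak_fill_rate(ROPS, CORE_CLOCK_HZ)
needed = fill * BYTES_PER_PIXEL

print(f"peak fill rate: {fill / 1e9:.1f} Gpixels/s")
for mem_clock in (650e6, 700e6):
    available = gddr3_bandwidth(mem_clock, BUS_WIDTH_BITS)
    print(f"{mem_clock / 1e6:.0f} MHz memory: {available / 1e9:.1f} GB/s available, "
          f"{needed / 1e9:.1f} GB/s needed for colour writes alone")
```

Under these assumptions colour writes alone would already use most of the bus at either memory clock, before any Z traffic, blending, or texturing, which is in line with the earlier point that the ROPs cannot sustain 8 pixels per clock without the bandwidth to feed them.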

http://techon.nikkeibp.co.jp/english/NEWS_EN/20061127/124495/?SS=imgview_e&FD=-2047582866&ad_q
 
In my opinion the true secret of RSX is its worse fill-rate bandwidth compared with Xenos under MSAA, and probably some superiority in pixel shader processing and texel rate (the following was translated by Altavista):


Xenos Shader Performance

Shawn Hargreaves' blog says "The Xbox GPU is a shading monster!", so I tried comparing it against my own PC (XPS M1210).
The test uses a Julia set shader whose instruction count easily exceeds 500.
Since it has no texture instructions, it measures pure arithmetic (ALU) performance.
The results are as follows.

XPS M1210: GeForce Go 7400 (450MHz) -> 2.308fps (for RSX, maybe multiply by 6 for the additional ALUs and by 1.22 for the clock advantage)
XBOX360: Xenos (500MHz) -> 17.657fps

Xenos came out approximately 7.651 times faster than the GeForce Go 7400.

GeForce Go 7400: 4 pixel shaders * 2 ALUs = 8 ALUs (RSX has 6 times as many in its pixel shaders)
Xenos: 3 shader pipes * 16 ALUs = 48 ALUs

Going by ALU count, does that make it roughly equal to a GeForce 7800 GTX?
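As a quick check on the figures above, this sketch compares the measured speedup with what naive scaling by ALU count and clock would predict. The numbers are the ones quoted in the post; the "ALUs times clock" model is a simplifying assumption, not something the blog author claims.

```python
# Frame rates, clocks and ALU counts as quoted in the translated post.
go7400 = {"fps": 2.308, "clock_mhz": 450, "alus": 4 * 2}
xenos = {"fps": 17.657, "clock_mhz": 500, "alus": 3 * 16}

measured = xenos["fps"] / go7400["fps"]
alu_ratio = xenos["alus"] / go7400["alus"]
clock_ratio = xenos["clock_mhz"] / go7400["clock_mhz"]
expected = alu_ratio * clock_ratio  # assumption: throughput scales with ALUs x clock

print(f"measured speedup:  {measured:.2f}x")
print(f"ALU ratio:         {alu_ratio:.0f}x")
print(f"clock ratio:       {clock_ratio:.2f}x")
print(f"naive expectation: {expected:.2f}x")
```

The measured ratio comes out somewhat above the naive expectation, which would be consistent with the Julia shader using the Xenos ALUs more efficiently than the G72M's, though the post does not say why.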

http://texhnologix.blogzine.jp/texhnologix/2006/12/xenos_shader_pe.html

Xenos MADD Performance

Using a shader made up only of multiply-add (MADD) operations, I tried measuring the peak performance of Xenos.
The shader is a simple one: a 4-component vector multiply-add, repeated 1024 times.
However, the order of the multiply-add instructions was arranged so that the NV40 architecture reaches its highest efficiency.
The results are as follows.

GeForce Go 7400 (450MHz) -> 2.885fps (for RSX, multiply by 6 for the additional ALUs and by 1.22 for the higher clock, 450MHz to 550MHz in the pixel shader pipes)
Xenos (500MHz) -> 19.543fps
(19.543fps/2.885fps) * (450MHz/500MHz) = 6.10 times

That is approximately 6 times, which matches the ratio of ALU counts beautifully.

(1280 * 720) * 1024op * 2.885fps = 2.722Gops
(1280 * 720) * 1024op * 19.543fps = 18.443Gops

These are the peak values of effective performance.
The effective value is approximately 75% of the theoretical value, which I think is quite excellent.
The difficult point with the NV40 family, though, is that you have to pay attention to instruction order to keep ALU1 and ALU2 working efficiently.
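The arithmetic above can be reproduced with a short sketch. The frame rates, resolution, and op count are from the post, and the ALU counts are the ones it gives for each chip; treating the theoretical peak as one vec4 MADD per ALU per clock is an assumption used here to recover the "about 75%" figure.

```python
WIDTH, HEIGHT = 1280, 720
MADDS_PER_PIXEL = 1024

# (name, measured fps, ALU count, clock in Hz) as quoted in the post
GPUS = (
    ("GeForce Go 7400", 2.885, 8, 450e6),
    ("Xenos", 19.543, 48, 500e6),
)


def measured_rate(fps):
    """vec4 MADD operations per second actually achieved."""
    return WIDTH * HEIGHT * MADDS_PER_PIXEL * fps


def theoretical_rate(alus, clock_hz):
    """Assumption: each ALU issues one vec4 MADD per clock."""
    return alus * clock_hz


for name, fps, alus, clock_hz in GPUS:
    achieved = measured_rate(fps)
    peak = theoretical_rate(alus, clock_hz)
    print(f"{name}: {achieved / 1e9:.3f} Gops of {peak / 1e9:.1f} Gops peak "
          f"({achieved / peak:.1%})")

# Clock-normalised speedup, as in the post
speedup = (19.543 / 2.885) * (450e6 / 500e6)
print(f"clock-normalised speedup: {speedup:.2f}x")
```

With that assumption both chips land at roughly three quarters of their peak, and the clock-normalised speedup is close to the 6:1 ALU ratio.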

http://texhnologix.blogzine.jp/texhnologix/2006/12/xenos_madd_perf.html

Xenos Fill-Rate Performance

Because I had long been curious about the effect of the EDRAM in Xenos, I measured the frame rate with 64x overdraw at D4 resolution.

That means 58.982 Mpixels are drawn per frame.
(1280 * 720) pixels * 64 overdraw = 58.982 Mpixels

Without alpha blending, the results are as follows.

-- G72M --
MSAAx1: 58.982 Mpixels * 15.953 fps = 941 Mpixels/sec (11.292 GB/s) (for RSX, multiply by 1.84 for more bandwidth and by 1.11 for the higher clock)
MSAAx2: 58.982 Mpixels * 8.010 fps = 472 Mpixels/sec (11.328 GB/s)
MSAAx4: 58.982 Mpixels * 3.997 fps = 236 Mpixels/sec (11.328 GB/s)

-- Xenos --
MSAAx1: 58.982 Mpixels * 56.497 fps = 3332 Mpixels/sec (39.984 GB/s)
MSAAx2: 58.982 Mpixels * 55.814 fps = 3292 Mpixels/sec (79.008 GB/s)
MSAAx4: 58.982 Mpixels * 54.895 fps = 3238 Mpixels/sec (155.415 GB/s)

(Z read + Z write + color write) = 12 bytes/pixel

Because the compression feature has been removed from the G72M's Intellisample, its performance drops in proportion to the number of MSAA samples.
It is a puzzle, though, that this works out to 11 GB/s when the actual memory bandwidth is only 7.2 GB/s. Perhaps when the early Z test on the hierarchical Z buffer determines that the pixels clearly have to be drawn, the per-pixel Z test is skipped and the Z read is left out?

Next, with alpha blending, the results are as follows.

-- G72M --
MSAAx1: 58.982 Mpixels * 9.150 fps = 540 Mpixels/sec (8.635 GB/s)
MSAAx2: 58.982 Mpixels * 4.615 fps = 272 Mpixels/sec (8.710 GB/s)
MSAAx4: 58.982 Mpixels * 2.301 fps = 136 Mpixels/sec (8.686 GB/s)

-- Xenos --
MSAAx1: 58.982 Mpixels * 56.497 fps = 3332 Mpixels/sec (53.317 GB/s)
MSAAx2: 58.982 Mpixels * 55.814 fps = 3292 Mpixels/sec (105.345 GB/s)
MSAAx4: 58.982 Mpixels * 54.895 fps = 3238 Mpixels/sec (207.220 GB/s)

(Z read + Z write + color read + color write) = 16 bytes/pixel

I knew that alpha blending on Xenos was supposed to be free, but actually trying it is impressive. The G72M's fill rate falls by about 40%, while on Xenos it does not change at all. The effect of the EDRAM is tremendous.
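For readers who want to re-derive the tables, here is a small sketch of the calculation the post appears to use: fill rate is pixels per frame times fps, and the effective bandwidth multiplies that by the MSAA sample count and by 12 or 16 bytes per sample. The helper names are made up for illustration; the frame rates are copied from the tables.

```python
PIXELS_PER_FRAME = 1280 * 720 * 64  # 64x overdraw at 1280 * 720

# fps per MSAA sample count, copied from the tables above
NO_BLEND = {"G72M": {1: 15.953, 2: 8.010, 4: 3.997},
            "Xenos": {1: 56.497, 2: 55.814, 4: 54.895}}
BLEND = {"G72M": {1: 9.150, 2: 4.615, 4: 2.301},
         "Xenos": {1: 56.497, 2: 55.814, 4: 54.895}}


def fill_rate(fps):
    """Pixels drawn per second."""
    return PIXELS_PER_FRAME * fps


def effective_bandwidth(fps, samples, bytes_per_sample):
    """Bytes per second the ROPs would need if every sample touched memory."""
    return fill_rate(fps) * samples * bytes_per_sample


for label, table, bytes_per_sample in (("no blend", NO_BLEND, 12),
                                       ("alpha blend", BLEND, 16)):
    for gpu, runs in table.items():
        for samples, fps in runs.items():
            gbps = effective_bandwidth(fps, samples, bytes_per_sample) / 1e9
            print(f"{label:11s} {gpu:5s} MSAAx{samples}: "
                  f"{fill_rate(fps) / 1e6:7.0f} Mpixels/s, {gbps:8.3f} GB/s")

# Relative fill-rate loss on the G72M when blending is enabled (MSAAx1)
g72m_drop = 1 - BLEND["G72M"][1] / NO_BLEND["G72M"][1]
print(f"G72M drop with alpha blend: {g72m_drop:.0%}")
```

The GB/s column for Xenos is an equivalent figure: it is what external memory would have to supply to match the measured rate, not traffic on the GDDR3 bus, since the samples are resolved in EDRAM.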

http://texhnologix.blogzine.jp/texhnologix/2007/06/xenos_fillrate_.html

Xenos Fill-Rate Performance (2)

Next, I measured with a texture applied (1024 * 1024, 32bpp).

-- G72M --
ROP: 58.982 Mpixels * 8.010 fps = 472 Mpixels/sec (5.664 GB/s) (for RSX, multiply by 4.1 to 6 for more texture bandwidth, at least when counting XDR/FlexIO access)
TEX: (1024 * 1024 * 32bpp) * 64 * 8.010 fps = 2.147 GB/s

-- Xenos --
ROP: 58.982 Mpixels * 45.024 fps = 2656 Mpixels/sec (31.872 GB/s)
TEX: (1024 * 1024 * 32bpp) * 64 * 45.024 fps = 12.079 GB/s

As expected, memory bandwidth has become the problem for the G72M. On Xenos, performance has dropped by about 20% with texture fetching. Since texture fetching costs about 12 GB/s, that is natural enough.
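The texture-fetch figures can be re-derived in the same way. This sketch assumes, as the formula in the post implies, that each of the 64 overdraw layers reads the whole 1024 * 1024 32bpp texture once per frame; the function name is hypothetical.

```python
TEXTURE_BYTES = 1024 * 1024 * 4  # 32 bits per texel
OVERDRAW = 64


def texture_bandwidth(fps):
    """Bytes per second fetched if every layer reads the full texture once."""
    return TEXTURE_BYTES * OVERDRAW * fps


for gpu, fps in (("G72M", 8.010), ("Xenos", 45.024)):
    print(f"{gpu}: {texture_bandwidth(fps) / 1e9:.3f} GB/s at {fps} fps, "
          f"{texture_bandwidth(int(fps)) / 1e9:.3f} GB/s at {int(fps)} fps")

# Xenos frame-rate loss relative to the untextured MSAAx1 run (56.497 fps)
xenos_drop = 1 - 45.024 / 56.497
print(f"Xenos drop with texturing: {xenos_drop:.0%}")
```

Run as written, the exact frame rates give slightly higher values than the post quotes; the quoted 2.147 and 12.079 GB/s appear to correspond to frame rates truncated to 8 and 45 fps. The roughly 20% drop on Xenos matches the text.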

http://texhnologix.blogzine.jp/texhnologix/2007/06/xenos_fillrate__1.html

(Xenos can maybe, in certain situations, process from 50% to 10 times more fill rate with MSAA than RSX... and RSX, with FlexIO/XDR access, can surpass C1/R500/Xenos in texel rate.)
 
Heinrich4,

The RSX's fillrate is not really at issue. The Cell /can/ generate all the primitives and do all the polygon culling before passing work to the RSX. One of the main challenges of comparing separate console chips is that they function as coprocessors to each other. (Something most online comparisons do not recognize or are unable to address.) Therefore how well they compare depends on how well the programmers have optimized or utilized cooperation between said units.
 
Fillrate may be an issue regardless of cell handling workload. Assuming there can be no fillrate limitation would thus lead us to believe that at any given time numerous spus can be addressed at any given time or an engine would leave 2+ spus free for said tasks at any given time, which may not be the reality.

Optimization cannot relieve a given overhead at any given time.
 
"At any given time" someone may stun you by using such a phrase three times in one sentence. :LOL:

Fillrate could be an issue where multi-layer texturing operations are vital. But so far as polymesh blending and redraw are concerned, all of that is best handled by the Cell. To understand why I believe this, please look up any technical info you can find on "Playstation Edge", "Ratchet and Clank: Future", "Uncharted: Drake's Fortune".
 
:p No, I personally did not do said testing. That is why I have cited these developers. Maybe I need to continue making large bibliographic posts? What I have done is watch video demonstrations in action and listen to the developers themselves explain the process and their decisions, which is better than blindly repeating rumors and gossip. You will also notice that I stress /can/ and cite limitations where the Cell would not excel or benefit from trying.

Why I accept the findings that the Cell excels in vertex operations comes down to simple math. If it takes 7 cycles for the SPU to produce a SIMD result and the SPEs operate at 3.2 GHz versus the RSX at 500 MHz, then an SPE is in effect producing nearly one solution per RSX cycle. So for matrix problems, clipping, and culling hidden surfaces, so long as bus latency isn't an issue, the SPEs make fantastic vector co-processors to the graphics engine.
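The "simple math" can be written out as a tiny sketch. The clocks and the 7-cycle figure are the ones given in the post; the model assumes one result per latency period with no pipelining, which is a simplification and not a statement about how the SPU actually issues instructions.

```python
SPE_CLOCK_HZ = 3.2e9   # from the post
RSX_CLOCK_HZ = 500e6   # from the post


def results_per_rsx_cycle(latency_cycles):
    """SIMD results one SPE finishes per RSX clock, assuming no pipelining."""
    spe_cycles_per_rsx_cycle = SPE_CLOCK_HZ / RSX_CLOCK_HZ
    return spe_cycles_per_rsx_cycle / latency_cycles


print(f"SPE cycles per RSX cycle: {SPE_CLOCK_HZ / RSX_CLOCK_HZ:.1f}")
print(f"results per RSX cycle at 7-cycle latency: {results_per_rsx_cycle(7):.2f}")
```

A shorter latency, or overlapping several operations in flight, would only push the result above one per RSX cycle.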

But then that is just my passing opinion. As all posts are in general simply individual opinions.
 
Luckily, during the drive from Gainesville to Jacksonville no one caught this. Correction (needs checking): this is from memory, back in 2005, since I haven't really gone into it since then. 128-bit SIMD operations take 1-3 cycles, general 32-bit operations 6-7 cycles, and 64-bit operations 11-13 cycles. In any case, more SIMD vector operations can be completed at the higher frequency than the RSX GPU allows.
 
FutureCTO,

How much bandwidth can be saved with Cell-generated primitives, culling, and instancing (acting like a vertex shader, tessellation, etc.) to increase overall Gpixel/texel throughput (despite RSX being in a closed box... the PsEdge talk mentions something like 30% extra performance using 5 SPUs)?

If I'm not mistaken, going by Hiroshige Goto's articles, which describe a G70-like configuration, and by developers' posts in forums, the fill rate of RSX alone is something like 2/3 of the Xenos GPU in the best-case scenario.

(Another question is how much performance Xenos loses by sharing the 22.4 GB/s GDDR3 with the CPU and other applications.)
 
I have some slides on this.

1 SPU: 800,000+ triangles per frame at 60 FPS, 60% culled.

So that's 60% of the bandwidth saved, right?

Also, the performance hit for the Xenos from the shared bus would depend on how heavily you use the CPU.
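A rough way to look at the "60% of the bandwidth" question: culling 60% of triangles removes 60% of the geometry work sent on to the GPU, but geometry is only one consumer of bandwidth alongside render targets and texturing. The slide figures below are from the post; `GEOMETRY_SHARE` is a purely hypothetical value chosen for illustration.

```python
TRIANGLES_PER_FRAME = 800_000  # from the slide
FPS = 60                       # from the slide
CULLED_FRACTION = 0.60         # from the slide
GEOMETRY_SHARE = 0.10          # hypothetical: geometry's share of total GPU bandwidth

input_per_second = TRIANGLES_PER_FRAME * FPS
surviving_per_frame = TRIANGLES_PER_FRAME * (1 - CULLED_FRACTION)
surviving_per_second = surviving_per_frame * FPS

# Only the geometry portion of the traffic shrinks when triangles are culled.
total_saving = GEOMETRY_SHARE * CULLED_FRACTION

print(f"input:     {input_per_second / 1e6:.1f} M triangles/s")
print(f"surviving: {surviving_per_frame:,.0f} per frame, "
      f"{surviving_per_second / 1e6:.1f} M triangles/s")
print(f"overall bandwidth saved (if geometry is {GEOMETRY_SHARE:.0%} of traffic): "
      f"{total_saving:.0%}")
```

So the saving is 60% of the geometry traffic, and the overall figure depends on how much of the frame's bandwidth geometry accounted for in the first place.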
 
Thanks for the information.

That's good information from the PsEdge presentation, but if I'm not mistaken, I read here that Shootmymonkey, Joker, and others talk about real numbers being something like half of this (maybe because of the stage PsEdge development was at, in March '07).

I want to know how "fully reliable" these numbers are at the current stage (with further advances in SPURS, etc.).

About Xenos and bandwidth... we would have to know how many Xenon threads are still constantly being used for compression, sound, textures, geometry, etc., because I read at some point that something like 1/3 of the Xenon CPU's processing power was used for these operations (in games like PGR3, PDZ, Kameo, and others). If that is still the case, there is a good chance of seeing the X360 CPU reach 10 GB/s or even more (and GDDR3 has latencies...).
 
You're talking about this, right?

Q: In the last demo, how many SPUs were being used?
A: 5 SPUs, 1.5 million output triangles. Several million input.

Because if you are, we have no idea how many triangles were input.
 
I doubt its reliability has much to do with Cell or SPURS in the case of a game like MLB 2K8 (backface culling etc.); it has more to do with scene and model setup.
The Edge numbers also include more complex work, like pixel visibility culling under MSAA, which is naturally more computation-heavy.
 
Err, the backface cull removes 50% of faces, by definition.
Add it up with "less than one pixel" triangles and you will get to 60% with no problem. Edge can do some other things to reduce it even more.
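The way the two culling stages combine can be sketched as follows. The 50% backface figure and the 60% total are the ones from the post; the function names are made up for illustration, and the stages are assumed to apply one after the other.

```python
def combined_cull(backface_fraction, extra_fraction_of_survivors):
    """Total fraction culled when a second test runs on what backface culling leaves."""
    remaining = (1 - backface_fraction) * (1 - extra_fraction_of_survivors)
    return 1 - remaining


def extra_needed(backface_fraction, target_total):
    """Fraction of front-facing triangles the later tests must reject to hit the target."""
    return 1 - (1 - target_total) / (1 - backface_fraction)


needed = extra_needed(0.50, 0.60)
print(f"front-facing triangles that must also be rejected: {needed:.0%}")
print(f"check: combined cull = {combined_cull(0.50, needed):.0%}")
```

In other words, once backface culling has removed half the faces, rejecting only one in five of the remaining triangles (for example those smaller than a pixel) is enough to reach 60%.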
 