RSX: Vertex input limited? *FKATCT

Our 360 title does 4x msaa (3 tiles) and can still occasionally hit 60fps, although usually it's locked in the 30fps range. Our ps3 version does not do any anti aliasing, but still falls behind the 360 version in framerate. As stated previously though we are vertex heavy, typically having 6 or so inputs and outputs to the vertex shader, and that's after optimizing! The title of this thread states 'vertex input limited', but actually vertex output is also a limiting factor as per rsx docs. So our title is doubly hit on performance on rsx.

Is there more you can share with us about what your engine is technically doing? The SCE devs have been an accommodating bunch. :)
 
Our 360 title does 4x msaa (3 tiles) and can still occasionally hit 60fps, although usually it's locked in the 30fps range. Our ps3 version does not do any anti aliasing, but still falls behind the 360 version in framerate. As stated previously though we are vertex heavy, typically having 6 or so inputs and outputs to the vertex shader, and that's after optimizing! The title of this thread states 'vertex input limited', but actually vertex output is also a limiting factor as per rsx docs. So our title is doubly hit on performance on rsx.

Very interesting. Can you tell us if your game is doing HDR as well?
 
Very interesting. Can you tell us if your game is doing HDR as well?

I think he mentioned it did, specifically when he mentioned the ability to do MSAA with FP10, and how porting this to the PS3 had the issue that FP16 wasn't compatible with MSAA and there was no time to design a shader-based solution.

granrooster said:
Is there more you can share with us about what your engine is technically doing? The SCE devs have been an accommodating bunch.

He actually has said a lot in his posts. I won't collect all his hints here, but I think one can get a pretty good idea of not only the scope of the game (which he described in one of his recent posts) but could probably figure out to a high degree what game it is. If I could PM him I would shoot him my guess as I am 90% sure I know what title it is :cool:
 
Our 360 title does 4x msaa (3 tiles) and can still occasionally hit 60fps, although usually it's locked in the 30fps range. Our ps3 version does not do any anti aliasing, but still falls behind the 360 version in framerate. As stated previously though we are vertex heavy, typically having 6 or so inputs and outputs to the vertex shader, and that's after optimizing! The title of this thread states 'vertex input limited', but actually vertex output is also a limiting factor as per rsx docs. So our title is doubly hit on performance on rsx.
Even though tiling appears to be a pain in the ass, I am glad devs are starting to take advantage of it.

Thanks for sharing your insight and experiences, joker454. I'm glad you're here, and I hope you stick around.
 
Just considering the performance hit, based on how many tiles you'd need... 720p 4xMSAA is 3 tiles (28.x MB), while 1080p with no MSAA would be 2 tiles, 4 tiles for 2xMSAA, and 7 for 4xMSAA. You can see how this really adds up. Performance penalties will really depend on how the engine deals with tiling, but even if a hit as low as 5% for 3 tiles holds up, depending on how it scales, it could become pretty extreme. In other words, probably never any 4xMSAA at 1080p, though 2x would certainly be possible.
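
The tile counts above can be reproduced with a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes 10 MB of eDRAM on Xenos, 4 bytes of colour plus 4 bytes of Z per sample, and that the whole eDRAM is available to a tile. The function name is made up for illustration, and real tile layouts have alignment constraints that can change the count.

```python
import math

# Assumption: Xenos eDRAM capacity of 10 MB, fully available for one tile.
EDRAM_BYTES = 10 * 1024 * 1024


def tiles_needed(width, height, msaa_samples, bytes_per_sample=8):
    """Return (tile_count, buffer_size_in_MB) for a colour + Z render target.

    bytes_per_sample defaults to 8: 32-bit colour plus 32-bit depth/stencil.
    msaa_samples of 0 or 1 both mean "no multisampling".
    """
    samples = max(msaa_samples, 1)
    total_bytes = width * height * samples * bytes_per_sample
    tile_count = math.ceil(total_bytes / EDRAM_BYTES)
    return tile_count, total_bytes / (1024 * 1024)


if __name__ == "__main__":
    cases = [
        ("720p 4xMSAA", 1280, 720, 4),
        ("1080p no MSAA", 1920, 1080, 0),
        ("1080p 2xMSAA", 1920, 1080, 2),
        ("1080p 4xMSAA", 1920, 1080, 4),
    ]
    for label, w, h, s in cases:
        tiles, size_mb = tiles_needed(w, h, s)
        print(f"{label}: {size_mb:.1f} MB -> {tiles} tiles")
```

Under those assumptions the numbers come out to 3, 2, 4 and 7 tiles, matching the figures quoted in the post.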

So with tiling, are you implying it is possible to have only a ~5% hit compared to not tiling? So why is 4xMSAA not possible @ 1080p with 7 tiles (given all things are equal)?
 
Edit:
Jov said:
So with tiling, are you implying it is possible to have only a ~5% hit compared to not tiling? So why is 4xMSAA not possible @ 1080p with 7 tiles (given all things are equal)?
It is possible. I'm merely suggesting that depending on how a specific engine scales, it could see a pretty significant penalty. Only developers will have profiled and found out what they can do. We've heard 5% hit, but that was rather ambiguous if I remember properly. It could have easily applied to just two tiles. And we have no solid proof of either a) how big the penalty has been in real game engines (i.e., a number of them with different requirements, poly counts, etc.) so far, and b) how the penalty scales with number of tiles. If it got as big as 30%, I'd say that's pretty significant.

In that vein, I shouldn't have said it wouldn't happen. Both because I'm not a games developer, and because there might very well be games that do so. But I don't think those games are going to be the high-profile games that everybody looks to as the next bar in graphics. Very well might be the next Micky-D's Super Mini Go-kart racer, though. Or whatever.
 
So now you have this eDRAM, and logically predicated tiling, and can toss in architecturally the ability for the CPU to stream L2 data to the GPU directly. So on a basic level the Xbox 360 IS very traditional and not radical at all--PPC CPUs, single memory pool, and GPU with eDRAM. Yawn. But the proprietary features and design are quite a bit different and have been hardly touched.

I know it's not been discussed as a prominent Cell feature, but doesn't the PPE also have the same ability to lock parts of the L2 cache down to let an external device like the RSX read from cache?

I mean, I think the XDR is supposed to be fast enough that there's not nearly the advantage that the 360 takes from doing that sort of thing, but my understanding was that the cache line locking was a feature of the underlying PowerPC architecture, and not something that was devised especially for Xenon/Xenos.
 
Acert93 said:
it probably is also the best known in regards to what it can do, and what works well and what doesn't.
Maybe it's just me - but I could swear this thread demonstrates pretty clearly that working well with RSX is not exactly clear to many people out there (developers or general public alike). In fact a lot of the discussion sounds eerily reminiscent of 6 years ago.
 
Maybe it's just me - but I could swear this thread demonstrates pretty clearly that working well with RSX is not exactly clear to many people out there (developers or general public alike). In fact a lot of the discussion sounds eerily reminiscent of 6 years ago.

Are you referring to the PS2's "lack" of VRAM?
 
Soooooo....is there anything we non-techgeeks can derive from this discussion? Any help? :p

I don't understand technical talk, but from what I can understand from all this, RSX has issues and pales in comparison to the 360's GPU?

Can someone also be more clear regarding the PS3's capabilities relative to the 360's? Because lately there is too much negative talk regarding the PS3, I'd like to hear something clear from the developers in these forums. Sometimes it's hard to get the right idea of something, especially when some "journalists" like spreading things that have no relation to reality, like that other time when I read on a 360 site that the PS3 is less than half as powerful as the 360. They even used IBM's reports on Cell's performance and other stuff to back up their claims.
 
So as Dean says, we (NT) use the WWS-provided job manager, which is itself built as a custom SPU job scheduler on the standard PS3 job API (SPURS).
We (NT) also have a layer above the WWS job manager which provides a bunch of high-level things to make our lives easier, but all the hard work is done by WWS and the OS, which is nice :)

which is how it should be :smile:

btw, all this talk about 40+ fpu's, everyone's talking at 1080p or at least 1080i, I assume?

Some random thoughts: it's clear that currently (if ever) from the user Linux side we can't see the GameOS, as is understandable, but can you from your dev side see and auto-mount such a user Linux partition and, for instance, make an option inside your game to stream a section of gaming action to that partition?

That would be a cool way to get master-grade HD ingame clips to put out on the likes of http://www.zudeo.com/az-web/app, the new commercial Azureus (v3 beta) HD video torrent site, to show your friends etc., and a good way to advertise, all without some specialist HD-grade video capturing kit on the user's side.

Perhaps even go as far as realtime encoding the screen capture to AVC/H.264 at preset sizes (pip/SD/HD-720/1080/ip) for the user's convenience, assuming it can do it in realtime!
 
To understand you correctly, does that mean a ~28.1MB backbuffer (720p 4xMSAA 32bit and Z) is then resolved to ~3.5MB frontbuffer (or smaller for 24bit?) that is used as the framebuffer displayed on screen?
Yeah... that's right. You'd have a single AA backbuffer with Z, and a double buffered (normal sized) front buffer pair.

Cheers,
Dean
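
For anyone wanting to check the sizes being discussed, here is a small sketch of the arithmetic. It assumes 720p, 32-bit colour plus 32-bit Z per sample for the AA backbuffer, and 32-bit colour for each resolved front buffer. The helper name is hypothetical and padding or alignment is ignored.

```python
MIB = 1024 * 1024


def buffer_mib(width, height, bytes_per_pixel, samples=1):
    """Size in MB of a render target, ignoring any padding or alignment."""
    return width * height * bytes_per_pixel * samples / MIB


# 4xMSAA backbuffer: 4 bytes colour + 4 bytes Z per sample (assumed layout).
back = buffer_mib(1280, 720, 8, samples=4)

# Resolved front buffer: 32-bit colour only, one sample per pixel.
front = buffer_mib(1280, 720, 4)

# One AA backbuffer plus a double-buffered front buffer pair.
total = back + 2 * front

print(f"AA backbuffer: {back:.1f} MB")
print(f"front buffer:  {front:.1f} MB each")
print(f"total:         {total:.1f} MB")
```

That gives roughly 28.1 MB for the backbuffer and 3.5 MB per front buffer, in line with the figures in the question.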
 
That rather slow bus is as fast as the PS3's VRAM bus... It's just that in the X360 the same bus has to provide the CPU-RAM bandwidth as well.

Of course, I meant in relation to the EDRAM bandwidth. Still, RSX does not need this additional step, nor is there the need for tiling.

I just wonder what the performance penalty is:

RSX: Rendering the image with MSAA and postprocessing in VRAM (1 pass)

Xenos: Rendering multiple tiles, apply MSAA in EDRAM, tiles moved back to VRAM, put together and postprocessing applied (??-passes).
 
Where PS3 is radical is in the CELL SPEs; instead of multiple traditional CPU cores like the PPE, it has 1 PPE and 7 asymmetric processors that are simpler (e.g. no branch prediction

Slight nitpick, but that is incorrect, isn't it? The SPEs do have (albeit quite limited) branch prediction.
 
You can stream off an HDD on the 360 so that isn't in the PS3's favor. Blu-Ray won't increase graphic fidelity (it will allow you to hold more varied data but you are still limited by RAM). Cell would be the major difference I would think.

By saying 'You can stream off an HDD on the 360 so that isn't in the PS3's favor. Blu-Ray won't increase graphic fidelity', are you referring to AVC/VC-1 streaming video content?

According to the two main MS lead devs/coders/testers, veffremov and zambelli (http://forum.doom9.org/showthread.php?p=897452#post897452), the 360 has some serious problems (nicely spun but clearly there) in rescaling, and they say that ALL the decoding is done in SOFTWARE, even for/with the add-on HD-DVD with its specially coded faster software VC-1 codec. It's all rather strange given the varied and informed thread content here so far.

As far as I know the PS3 also uses software-only HD 1080p decoding, but has the benefit of the SPEs for that option?

Read that whole thread above to get the picture, given the doom9 locals are specialists in their own right in that thread. veffremov is new; zambelli, on the other hand, is a vet of D9, as you can probably tell ;) .
 
Oooo, fuzzy memory. But I'm pretty sure SPEs have no branch prediction hardware, and branch 'prediction' is performed by the developer with hints.

As for this thread, it's getting very messy. It's great when the devs get talking, but then every Joe jumps in to ask questions, and every other Joe jumps in to answer them. I'm kinda losing track of what's been said. I think Joker has said the problem he's encountering with PS3 is the memory, which seems to be the OS taking 32-64 MB more RAM than XB360 plus about 30 MB of backbuffer, and the vertex power of RSX. We know RSX has half the vertex setup of Xenos, but wasn't it also hinted that it was more efficient in selecting which vertices to actually render?

Has the original point and later comment been addressed yet?...
Joker454 said:
As stated previously though we are vertex heavy, typically having 6 or so inputs and outputs to the vertex shader, and that's after optimizing! The title of this thread states 'vertex input limited', but actually vertex output is also a limiting factor as per rsx docs. So our title is doubly hit on performance on rsx.
Is RSX really much more limited in vertex throughput?
 
Of course, I meant in relation to the EDRAM bandwidth. Still, RSX does not need this additional step, nor is there the need for tiling.

I just wonder what the performance penalty is:

RSX: Rendering the image with MSAA and postprocessing in VRAM (1 pass)

Xenos: Rendering multiple tiles, apply MSAA in EDRAM, tiles moved back to VRAM, put together and postprocessing applied (??-passes).

Assuming 720p 4xMSAA, it's better to have 3 instant passes with a 5% hit on performance than 1 single slow pass with a hit of 35-45%.
Don't confuse apples with oranges: the ROPs on Xenos use a 158 GB/s internal bus.
 