RSX: Vertex input limited? *FKATCT

A 720p 4xMSAA framebuffer is ~ 30MB. A fully resolved framebuffer is (taking a stab at this) ~ 3.5MB? Someone feel free to correct my math.
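A quick back-of-the-envelope check of those figures; a rough sketch assuming RGBA8 colour plus a 32-bit Z/stencil sample (actual formats will vary per title):

```cpp
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const int width = 1280, height = 720;
    const int bytesColor = 4, bytesDepth = 4;   // assumed RGBA8 + 32-bit Z/stencil

    // 4xMSAA backbuffer: every pixel carries 4 colour samples and 4 depth samples.
    double msaaBuffer = double(width) * height * 4 * (bytesColor + bytesDepth) / MiB;

    // Resolved frontbuffer: one colour value per pixel, no depth needed for scanout.
    double resolved = double(width) * height * bytesColor / MiB;

    printf("720p 4xMSAA colour+Z: %.1f MiB\n", msaaBuffer);  // ~28.1 MiB
    printf("720p resolved colour: %.1f MiB\n", resolved);    // ~3.5 MiB
    return 0;
}
```

So roughly 28MB for colour+Z at 4xMSAA (close enough to "~30MB") and ~3.5MB for the resolved colour buffer.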
On RSX, like all NVidia GPUs of that generation, the frontbuffer retains its AA data for its entire lifetime, with the RSX display hardware resolving the AA data to produce the picture on screen with every refresh of the screen (e.g. 60Hz), regardless of the frame render rate (say, 30fps).

So, as I understand it, the 720p "framebuffer" consists of a ~30MB backbuffer during rendering and a separate ~30MB frontbuffer for actual display. Once a frame has been rendered and needs to be displayed, the two memory areas swap roles.

Jawed
 
On RSX, like all NVidia GPUs of that generation, the frontbuffer retains its AA data for its entire lifetime, with the RSX display hardware resolving the AA data to produce the picture on screen with every refresh of the screen
That is incorrect, as the auto-resolving of AA buffers on output is/was a function of the display controller on nVidia cards, not of the GPU itself. Front buffers on RSX have to be at the real display resolution (e.g. 1280x720) rather than any AA-adjusted variant, which means applications are responsible for the resolve via a full-screen pass.

Cheers,
Dean
 
That is incorrect, as the auto-resolving of AA buffers on output is/was a function of the display controller on nVidia cards, not of the GPU itself. Front buffers on RSX have to be at the real display resolution (e.g. 1280x720) rather than any AA-adjusted variant, which means applications are responsible for the resolve via a full-screen pass.

Cheers,
Dean

To understand you correctly, does that mean a ~28.1MB backbuffer (720p, 4xMSAA, 32-bit colour and Z) is then resolved to a ~3.5MB frontbuffer (or smaller for 24-bit?) that is what actually gets displayed on screen?
 
So now I'm a little bit confused (more than before ;):

My summary, correct me if I'm wrong:

Xenos has a rendering advantage by saving memory when traditional rendering is done. When post-processing effects are used (like in every game nowadays), this advantage is gone, since the scene processed in eDRAM has to be written back to GDDR via a rather slow bus?

RSX on the other hand has the advantage that the frontbuffer resides in VRAM all the time and can thus be processed more easily, since no additional transfers to another buffer (eDRAM) have to be performed? IMO that sounds like rendering on PS3 is a lot easier to do, with the exception of non-HDTV resolutions (where no tiling is required) with AA applied, as is done in many current 360 games (PGR3, Tony Hawk's, CoD3...)?

My conclusion:

All the memory problems on PS3 result from the OS, not from the missing eDRAM? Guess this cannot be answered because of NDAs...
 
When post-processing effects are used (like in every game nowadays), this advantage is gone, since the scene processed in eDRAM has to be written back to GDDR via a rather slow bus?

That rather slow bus is as fast as the PS3's VRAM bus... It's just that in the X360 the same bus has to provide the CPU-RAM bandwidth as well.
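For reference, a rough sketch of the commonly quoted figures, assuming 128-bit GDDR3 at 700MHz (double-data-rate) on the 360's unified bus and 650MHz on RSX's VRAM bus:

```cpp
#include <cstdio>

int main() {
    // Peak bandwidth = bus width in bytes * effective transfer rate.
    const double busBytes = 128.0 / 8.0;        // 128-bit GDDR3 bus on both machines
    const double x360Rate = 2.0 * 0.700e9;      // 700 MHz DDR -> 1.4 GT/s (assumed clock)
    const double rsxRate  = 2.0 * 0.650e9;      // 650 MHz DDR -> 1.3 GT/s (assumed clock)

    printf("X360 GDDR3 (CPU + GPU share it): %.1f GB/s\n", busBytes * x360Rate / 1e9); // ~22.4
    printf("PS3 GDDR3  (RSX VRAM only):      %.1f GB/s\n", busBytes * rsxRate  / 1e9); // ~20.8
    return 0;
}
```

Same ballpark, as you say; the difference is that on 360 all CPU traffic rides that same bus.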
 
You completely forgot the point of eDRAM is not to free up memory but to provide FREE AA, and if I am not mistaken it can also provide free HDR.

Yes on his mixing and mashing of space and performance (I will let a dev answer most of his questions), but the point of the eDRAM is not to give free AA. The goal of eDRAM is to remove the backbuffer traffic, which is a large bandwidth client, from main system memory. The backbuffer is fairly small, but consumes a disproportionate amount of bandwidth for its footprint, so the eDRAM basically isolates a lot of the ROP activity. That is the purpose of eDRAM. One of the benefits is that the eDRAM provides just enough bandwidth for 4xMSAA at the full 4 Gigapixels/s of fillrate, so there is no bandwidth crunch or contention. Likewise the ROPs are designed around it and can do single-cycle 4xMSAA, so there is no computational bottleneck either. Of course at 720p with 4xMSAA there is the hurdle of tiled rendering, which does have a performance hit; how big depends on many variables.
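To put numbers on the "just enough bandwidth" point, a rough sketch assuming 8 ROPs at 500MHz, 4 samples per pixel, and a read-modify-write of 4 bytes of colour plus 4 bytes of Z per sample:

```cpp
#include <cstdio>

int main() {
    const double ropPixelsPerSec = 8 * 500e6;   // 8 ROPs * 500 MHz = 4 Gpixels/s
    const int samplesPerPixel    = 4;           // 4xMSAA
    const int bytesPerSample     = 4 + 4;       // colour + Z (assumed formats)
    const int readModifyWrite    = 2;           // each sample is read and written back

    double bytesPerSec = ropPixelsPerSec * samplesPerPixel * bytesPerSample * readModifyWrite;
    printf("Peak ROP traffic at 4xMSAA: %.0f GB/s\n", bytesPerSec / 1e9);   // ~256 GB/s
    return 0;
}
```

which lands right on the 256GB/s figure quoted for the eDRAM: the ROPs can run flat out at 4xMSAA without ever touching main memory.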

As for HDR, it is really a group of techniques (iris effects, bloom, render targets with high-precision blending, etc.) and there are a number of different ways to accomplish it. There are shader-based HDR schemes (like NAO32 in Heavenly Sword and Valve's approach in HL2: Lost Coast), and more specialized hardware-based approaches that use FP16 and FP10 filtering and blending. The Xbox 360 can use FP10-based approaches with no penalty, as they consume the same footprint and bandwidth as a typical 32-bit target, but FP16 formats consume 2x the bandwidth and memory footprint and are 2x as computationally expensive. FP16-based HDR formats are slower to process and require more tiles.
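A rough sketch of the footprint side of that, with the per-pixel sizes assumed for illustration:

```cpp
#include <cstdio>

int main() {
    const int width = 1280, height = 720;
    const double MiB = 1024.0 * 1024.0;

    struct Fmt { const char* name; int bytesPerPixel; };
    const Fmt fmts[] = {
        { "RGBA8 (LDR)",       4 },   // 8:8:8:8
        { "FP10 (packed HDR)", 4 },   // 10:10:10:2 packed into 32 bits, same cost as RGBA8
        { "FP16 (half float)", 8 },   // 16:16:16:16, twice the footprint and bandwidth
    };

    for (const Fmt& f : fmts)
        printf("%-18s 720p colour buffer: %.1f MiB\n",
               f.name, width * height * double(f.bytesPerPixel) / MiB);
    return 0;
}
```

At 4xMSAA those numbers scale up again before resolve, which is why FP16 is also what pushes you into extra tiles on Xenos.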

In a nutshell eDRAM does more than "free MSAA" and has a lot of benefits even when MSAA is not used -- but also has some design "gotchas" that require some elbow grease when using MSAA at HD resolutions -- and HDR is fairly unrelated to MS's choice of eDRAM unless talking about FP16, and in which case eDRAM may help in some areas (like bandwidth) but doesn't help in regards to tiling or computational bottlenecks in the ROPs.

EDIT: Marik2 added the comments about eDRAM being used for more than just MSAA after I posted, all is good.
 
My knowledge of software development isn't as deep as some on these forums, but I feel I should offer my two cents to see if I can add to the discussion. So tell me if I've misunderstood anything.

It appears to me that the architectures of the two consoles are very different and that in comparison PS3 is the least understood, not just because of time but also because of the radical nature of its design. It seems unfair at this stage to label PS3 as the least able of the two, particularly if you're applying the rules of one architectural design to a system that clearly hasn't been designed with the same thought process. It seems memory is the current bugbear in the discussion, which in comparison to development on 360 appears to be a genuine concern. But could it not be that to tackle the problem, rather than look at it from the perspective of work conducted on Microsoft's console, the developer has to address his/her understanding of the issue by looking at the particularities of the PS3 design?

Of course I may just be stating the obvious, but it does appear that some are looking at this from a comparative-deficiency angle, which to me seems a little harsh considering the state of third-party development on PS3 at the moment.
 
It appears to me that the architectures of the two consoles are very different and that in comparison PS3 is the least understood, not just because of time but also because of the radical nature of its design.

I think that "radical" only applies partially, mainly to CELL. RSX is an adaptation of a PC GPU and is fairly straight forward and 'traditional' evolution of what has been happening in the PC GPU space over the last half decade. Developers have had access to the general overall architecture that predates RSX since 2004 (NV40) and G70, which RSX is derived and makes substantial improvements and efficieny gains on NV40, was released in early Summer 2005. RSX also is using fairly mature APIs and tools like OpenGL derivatives and Cg.

Of the GPUs and CPUs on both consoles I think many would agree with me that RSX is the least radical design; it probably is also the best known in regards to what it can do, and what works well and what doesn't. Just my 2 cents on that :smile:
 
Yes on his mixing and mashing of space and performance (I will let a dev answer most of his questions), but the point of the eDRAM is not to give free AA. The goal of eDRAM is to remove the backbuffer traffic, which is a large bandwidth client, from main system memory. The backbuffer is fairly small, but consumes a disproportionate amount of bandwidth for its footprint, so the eDRAM basically isolates a lot of the ROP activity. That is the purpose of eDRAM. One of the benefits is that the eDRAM provides just enough bandwidth for 4xMSAA at the full 4 Gigapixels/s of fillrate, so there is no bandwidth crunch or contention. Likewise the ROPs are designed around it and can do single-cycle 4xMSAA, so there is no computational bottleneck either. Of course at 720p with 4xMSAA there is the hurdle of tiled rendering, which does have a performance hit; how big depends on many variables.

...

Slightly OT, but it has been mentioned that with the latest firmware update the Xbox 360 can output 1080p.

So if a project targets native 1080p (no upscaling) with an engine designed from the ground up around predicated tiling, is this possible on the Xbox 360? If so, what kinds of challenges and trade-offs are involved?
 
Just considering the performance hit, based on how many tiles you'd need... 720p 4xMSAA is 3 tiles (28.x MB), while 1080p with no MSAA would be 2 tiles, 4 tiles for 2xMSAA, and 7 tiles for 4xMSAA. You can see how this really adds up. Performance penalties will really depend on how the engine deals with tiling, but if that 'as low as 5% hit for 3 tiles' figure holds up, depending on how it scales, it could become pretty extreme. In other words, probably never any 4xMSAA at 1080p, though 2x would certainly be possible.
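The tile counts fall straight out of the 10MB eDRAM limit; a rough sketch assuming 4 bytes of colour plus 4 bytes of Z per sample:

```cpp
#include <cstdio>
#include <cmath>

// Tiles needed to fit a multisampled colour+Z backbuffer into 10MB of eDRAM.
int tilesNeeded(int width, int height, int samples) {
    const double edramBytes     = 10.0 * 1024.0 * 1024.0;
    const double bytesPerSample = 4 + 4;                    // assumed RGBA8 + 32-bit Z
    double backbuffer = double(width) * height * samples * bytesPerSample;
    return int(std::ceil(backbuffer / edramBytes));
}

int main() {
    printf("720p  4xMSAA: %d tiles\n", tilesNeeded(1280, 720, 4));    // 3
    printf("1080p no AA : %d tiles\n", tilesNeeded(1920, 1080, 1));   // 2
    printf("1080p 2xMSAA: %d tiles\n", tilesNeeded(1920, 1080, 2));   // 4
    printf("1080p 4xMSAA: %d tiles\n", tilesNeeded(1920, 1080, 4));   // 7
    return 0;
}
```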
 
...
Of the GPUs and CPUs on both consoles I think many would agree with me that RSX is the least radical design; it probably is also the best known in regards to what it can do, and what works well and what doesn't. Just my 2 cents on that :smile:

From a component only perspective this may be the case, but from the view of the console architecture as a whole I would disagree. It’s analogous to viewing the GS in the PS2 as a crippled GPU compared to the XGPU (or any modern PC GPU in the last 6 or so years), when it was designed to work with the EE.

Traditionally, the GPU handled all the heavy lifting when it comes to graphics, but if the SPU(s) can reduce the load by carrying out things like backface and/or occlusion culling more effectively, then RSX has room for more exploitation. The same applies to the Xenon/Xenos relationship.
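For illustration, the kind of per-triangle test a CPU-side culling pass would run before handing an index list to the GPU; this is a plain scalar sketch, not actual SPU code (which would be SIMD-vectorised and would DMA its data through local store):

```cpp
#include <cstddef>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Keeps only the triangles facing the camera (counter-clockwise winding assumed),
// so the GPU never has to fetch or transform the back-facing half of the mesh.
// Returns the number of triangles written to outIndices.
size_t cullBackfaces(const Vec3* pos, const unsigned* indices, size_t triCount,
                     Vec3 eye, unsigned* outIndices) {
    size_t kept = 0;
    for (size_t t = 0; t < triCount; ++t) {
        unsigned i0 = indices[3*t], i1 = indices[3*t + 1], i2 = indices[3*t + 2];
        Vec3 n = cross(sub(pos[i1], pos[i0]), sub(pos[i2], pos[i0]));  // face normal
        if (dot(n, sub(eye, pos[i0])) > 0.0f) {                        // facing the eye?
            outIndices[3*kept]     = i0;
            outIndices[3*kept + 1] = i1;
            outIndices[3*kept + 2] = i2;
            ++kept;
        }
    }
    return kept;
}
```

On Cell this is the sort of loop you would spread across SPEs each frame, feeding RSX a trimmed index buffer.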

Regarding RSX being the best known, I often wonder if this is truly the case, and whether there is a bit more to it than we are led to believe.
 
deathkiller said:
The only visible result of this is the "flag algorithm" (Motorstorm, GTHD and Heavenly Sword flags look mostly the same)

Where did you get this?! Good fantasy! Our flag code was written internally...

I think he may have misinterpreted the IGN Gran Turismo HD Concept quote (see bottom), possibly thinking all developers are sharing the same code/algorithm to generate their flag movement.

IGN said:
The game has also been spiced up with some nice effects, like an impressive glare as you emerge from the track's tunnel, and those waving flags that PS3 developers seem to love (see MotorStorm and Heavenly Sword).
 
From a component only perspective this may be the case, but from the view of the console architecture as a whole I would disagree. It’s analogous to viewing the GS in the PS2 as a crippled GPU compared to the XGPU (or any modern PC GPU in the last 6 or so years), when it was designed to work with the EE.


The PS3 isn't that radical though in regards to general architecture. CPU with memory (XDR System memory) and GPU with memory (GDDR3 VRAM) with some FlexIO voodoo connecting them. Sounds a lot like a PC, especially when you consider that the PPE is a PPC chip.

Where PS3 is radical is in the CELL SPEs; instead of multiple traditional CPU cores like the PPE, it has 1 PPE and 7 asymmetric processors that are simpler (e.g. no branch prediction and no L2 cache) and require some extra elbow grease (DMAs to and from system memory) but have an insanely fast local store that makes them little monsters when you can fit within the confines of that memory (and they are even better at SIMD). No doubt, that is radical -- both in terms of departure and in coolness :cool:
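For a flavour of that elbow grease, here is a sketch of the classic double-buffered streaming pattern on an SPE. dmaGet/dmaPut/dmaWait are hypothetical stand-ins for the MFC intrinsics (reduced to memcpy so the sketch is self-contained), and the chunk size is just an example:

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical stand-ins for the MFC DMA intrinsics. Real SPE code issues asynchronous
// DMAs against 64-bit effective addresses and blocks on tag groups; here the "DMA" is a
// plain memcpy and the waits are no-ops, just to show the shape of the pattern.
static void dmaGet(void* ls, const void* ea, size_t size, int /*tag*/) { std::memcpy(ls, ea, size); }
static void dmaPut(const void* ls, void* ea, size_t size, int /*tag*/) { std::memcpy(ea, ls, size); }
static void dmaWait(int /*tag*/) {}

const size_t CHUNK = 16 * 1024;                     // example chunk size; local store is only 256KB
static float bufA[CHUNK / sizeof(float)];
static float bufB[CHUNK / sizeof(float)];

static void process(float* data, size_t count) {    // placeholder for the real work
    for (size_t i = 0; i < count; ++i) data[i] *= 2.0f;
}

// Double-buffered streaming: while one chunk is being processed in local store,
// the next chunk is (conceptually) already in flight from system memory.
void streamAndProcess(const float* src, float* dst, size_t totalFloats) {
    float* buf[2] = { bufA, bufB };
    const size_t perChunk = CHUNK / sizeof(float);
    const size_t chunks   = totalFloats / perChunk;  // assume an exact multiple for brevity

    dmaGet(buf[0], src, CHUNK, 0);                   // prime the pipeline with chunk 0
    for (size_t i = 0; i < chunks; ++i) {
        int cur = int(i & 1), nxt = cur ^ 1;
        if (i + 1 < chunks) {
            dmaWait(nxt);                            // that buffer's previous write-back must be done
            dmaGet(buf[nxt], src + (i + 1) * perChunk, CHUNK, nxt);   // prefetch the next chunk
        }
        dmaWait(cur);                                // current chunk must have arrived
        process(buf[cur], perChunk);
        dmaPut(buf[cur], dst + i * perChunk, CHUNK, cur);             // write results back
    }
    dmaWait(0); dmaWait(1);                          // drain outstanding transfers
}
```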

The Xbox 360 has 3 CPU cores (PPC) with a shared L2 cache; these CPUs connect to a unified memory pool shared with the GPU. Like Cell, the hurdle of multicore is fairly new to many developers and is a departure from the Xbox, GC, gaming PCs (up to 2005) and in many ways the PS2. For Xbox developers there are a lot of other little issues like big/little endianness, the PPC ISA, VMX units and libraries, the in-order design, and so forth. Not really radical, but different. The GPU on the other hand has its feet in both pools. It is an SM3.0 GPU using a custom DX and HLSL. But on the other hand you have quite a few unique features not found in previous GPUs (call it DX9.5: unified shaders, coherent memory reads and writes, tessellation, etc.), a shared memory pool (like the Xbox, but unlike the PC), and a small eDRAM pool with 256GB/s of internal bandwidth and 32GB/s of bandwidth between the two GPU dies. So now you have this eDRAM, and logically predicated tiling, and can toss in architecturally the ability for the CPU to stream L2 data to the GPU directly. So on a basic level the Xbox 360 IS very traditional and not radical at all--PPC CPUs, single memory pool, and GPU with eDRAM. Yawn. But the proprietary features and design are quite a bit different and have been hardly touched.

And hence the dovetail with the post I replied to. The argument was that the PS3 was very radical and not well known. Of the major components on all the systems, RSX is by far the best known and understood. Is there headroom? Of course! Are there better ways to use RSX--even without Cell help--than are currently being shown? Absolutely. Is this because RSX isn't as well understood as the Xbox 360's hardware, as the post I replied to indicated? No.

Regarding RSX being the best known, I often wonder if this is truly the case, and whether there is a bit more to it than we are led to believe.

Yes, it is G80. A driver update will resolve the confusion... hello Brimstone! :p

From a practical perspective this isn't realistic, not with developers like Joker asking Sony, "How can I best get the most out of RSX?"

Traditionally, the GPU handled all the heavy lifting when it comes to graphics, but if the SPU(s) can reduce the load by carrying out things like backface and/or occlusion culling more effectively, then RSX has room for more exploitation. The same applies to the Xenon/Xenos relationship.

So we don't understand either :p

I agree that SPEs can be leveraged for certain graphics work. "Traditionally" CPUs did a lot of this, but over time much of it was offloaded to GPUs. There seems to be some rebalancing in both directions (both consoles have CPUs capable of more of this work; on the PC, DX10 allows the GPU to do more without CPU dependence). So there is some learning to be done on both ends.

I would argue that PS3 devs will get a chance to take a hack at this before Xbox devs. First because RSX is a pretty well-known quantity, and second because the SPEs open up a lot of doors. I think devs will be playing with Xenos for a while (ditto PS3 devs with SPEs) and use the CPU for other stuff... all that insignificant stuff like game code :LOL:
 
Performance penalties will really depend on how the engine deals with tiling, but if that 'as low as 5% hit for 3 tiles' figure holds up, depending on how it scales, it could become pretty extreme. In other words, probably never any 4xMSAA at 1080p, though 2x would certainly be possible.

Our 360 title does 4xMSAA (3 tiles) and can still occasionally hit 60fps, although usually it's locked in the 30fps range. Our PS3 version does not do any anti-aliasing, but still falls behind the 360 version in framerate. As stated previously though, we are vertex heavy, typically having 6 or so inputs and outputs to the vertex shader, and that's after optimizing! The title of this thread says 'vertex input limited', but actually vertex output is also a limiting factor as per the RSX docs. So our title is doubly hit on performance on RSX.
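To make the vertex input/output point concrete, a rough sketch of what attribute trimming buys; the layouts below are made up for illustration, not our actual formats:

```cpp
#include <cstdio>
#include <cstdint>

// A "fat" vertex: everything at full float precision.
struct FatVertex {
    float position[3];
    float normal[3];
    float tangent[3];
    float uv0[2];
    float uv1[2];
    float color[4];
};   // 17 floats = 68 bytes per vertex

// A packed vertex: half-precision / normalised-integer attributes where precision allows.
struct PackedVertex {
    float    position[3];   // keep positions at full precision
    int16_t  normal[4];     // signed-normalised, 4th component is padding
    int16_t  tangent[4];
    uint16_t uv0[2];        // would be half floats in practice; raw 16-bit storage here
    uint16_t uv1[2];
    uint8_t  color[4];
};   // 40 bytes per vertex

int main() {
    const double vertsPerFrame = 1.0e6;   // example workload
    printf("fat   : %zu bytes/vertex, %.1f MB of vertex fetch per frame\n",
           sizeof(FatVertex),    vertsPerFrame * sizeof(FatVertex)    / 1e6);
    printf("packed: %zu bytes/vertex, %.1f MB of vertex fetch per frame\n",
           sizeof(PackedVertex), vertsPerFrame * sizeof(PackedVertex) / 1e6);
    return 0;
}
```

Fewer and smaller attributes means less vertex fetch traffic into RSX, and trimming interpolators on the output side eases the other limit mentioned in the docs.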
 

The PS3 isn't that radical though in regards to general architecture. CPU with memory (XDR System memory) and GPU with memory (GDDR3 VRAM) with some FlexIO voodoo connecting them. Sounds a lot like a PC, especially when you consider that the PPE is a PPC chip.

Where PS3 is radical is in the CELL SPEs; instead of multiple traditional CPU cores like the PPE, it has 1 PPE and 7 asymmetric processors that are simpler (e.g. no branch prediction and no L2 cache) and require some extra elbow grease (DMAs to and from system memory) but have an insanely fast local store that makes them little monsters when you can fit within the confines of that memory (and they are even better at SIMD). No doubt, that is radical -- both in terms of departure and in coolness :cool:

Yes it does sound very much like a PC with some radical bits and pieces. ;)


Yes, it is G80. A driver update will resolve the confusion... hello Brimstone! :p

Never crossed my mind. If anything, having a G80-like design would cause more overlap in functionality and purpose (SPU -> GPU and G80's GPGPU -> CPU). BTW, who's Brimstone? :smile:

...

I would argue that PS3 devs will get a chance to take a hack at this before Xbox devs. First because RSX is a pretty well-known quantity, and second because the SPEs open up a lot of doors. I think devs will be playing with Xenos for a while (ditto PS3 devs with SPEs) and use the CPU for other stuff... all that insignificant stuff like game code :LOL:

Agreed, things are balancing out more between the workloads of the two main chips, and that will produce different, if not more interesting, games.
 