liolio said:
*First, I think I have a misunderstanding about when post-processing happens. One would answer "after the processing is done",
but what do we call processing, though?
My understanding is that post-processing happens after blending and the MSAA resolve have happened, is that right?
Yes, that is correct. You render the scene in some manner, with the results eventually written to a render target (directly or not). That is resolved to main memory, which includes an MSAA resolve if needed.
You then render (usually) full-screen effects to the back buffer (or to other intermediate buffers). These effects may be simple things like colour filters, or more complex effects like bloom (which is multipass, requiring extra render targets and passes). Some of these effects may still require depth information from the rendering pass, e.g. an explosion warping effect.
After that, you then usually render the UI on top of everything else.
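To make the 'simple colour filter' case concrete, here is a minimal HLSL sketch of such a full-screen pass (a sketch only: the sampler name and Tint constant are assumptions, and the resolved scene is assumed to have already been copied out to main memory and bound as a texture):
[code]
// Full-screen colour-filter post pass (illustrative sketch).
sampler2D SceneTex : register(s0);   // resolved scene colour, copied out of EDRAM

float3 Tint;                         // hypothetical per-frame constant, e.g. (1.0, 0.9, 0.8) for a warm filter

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float4 scene = tex2D(SceneTex, uv);        // one fetch per pixel from the resolved buffer
    return float4(scene.rgb * Tint, scene.a);  // apply the filter and write to the back buffer
}
[/code]
The UI would then be drawn on top of whatever this pass writes.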
liolio said:
*About the data: after resolve and blending, is the previous data (Z buffer, colors, alpha) present in EDRAM lost? Is only the resolved frame buffer left, or does the data stay in EDRAM with the results going straight to main RAM?
Yes, the data in EDRAM either gets resolved or it doesn't. You can only access it as a texture if it gets resolved.
liolio said:
*If you don't do post-processing, you copy the frame buffer to RAM, and actually you do the same if you want to do post-processing, as Xenos can't read from EDRAM. How long does that take?
As it's basically a memory copy, it's subject to the limitations of system bandwidth. You can get a rough idea by running the numbers.
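For example (rough numbers, assuming a 1280x720 target at 32 bits per pixel and the 360's theoretical 22.4 GB/s of main-memory bandwidth): 1280 x 720 x 4 bytes is about 3.7 MB, and 3.7 MB / 22.4 GB/s is roughly 0.16 ms for the resolve write alone, before any contention from the CPU or other GPU traffic. Chain a few resolves per frame and it adds up.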
liolio said:
*During post-processing the GPU has to access the frame buffer, which is in RAM. The only way for the GPU to touch RAM is through a texture fetch, right? It takes some time, so until the operation completes we can consider ALU cycles as free. That's the idea, right?
Yes, and this is why a GPU has a scheduler. It will keep as many 'threads' (which might be pixel quads) in flight as it can. The number of threads is usually limited by the number of temporary registers a shader uses (there's only so much space to store them). When a texture fetch stalls because the data isn't in the texture cache, the GPU will put that thread aside and try to replace it with one which hopefully needs data that is in the cache.
When you become fetch limited, it basically means these threads are all stalled, waiting on fetches from main memory. At that point, you could be doing ALU work for 'free', as the ALUs would otherwise be idle.
This is one of the gotchas with the Xbox: its texture cache isn't huge, and there is a *big* penalty for missing the cache (which is hopefully hidden by other threads). So if your shader is jumping around like mad, sampling all over the place, performance can simply implode. The Xbox has a few tricks to help out, such as the ability to specify that a texture will be tiled.
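To put rough, made-up numbers on the latency-hiding idea: threads needed ≈ fetch latency / ALU work per thread, so if a cache miss costs on the order of 500 cycles and each thread only has about 50 cycles of ALU work between fetches, you need roughly 10 threads in flight per ALU just to stay busy. Run out of threads (for example because the shader uses too many registers) and those stalls stop being hidden.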
liolio said:
*At which rate can Xenos fetch textures? So usually, how long does it take to fetch the whole framebuffer into Xenos? I remember reading some milliseconds.
Once again, it's a bandwidth issue. A simple operation (such as drawing a texture to the entire screen) should get very close to the theoretical performance limits.
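As a rough sketch (with the same assumptions as above): a single full-screen pass over a 1280x720 32-bit buffer fetches about 3.7 MB from main memory (the writes land in EDRAM), which is on the order of 0.2 ms at the theoretical 22.4 GB/s. Chain a handful of passes, each of which also has to be resolved back out, and you are quickly into the 'some milliseconds' range you remember reading about.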
liolio said:
1) You sample the frame buffer as a texture; what effects would be achieved this way?
2) You want to make some changes through shaders. Say you use your framebuffer to texture 1280x720 pixels and then run your calculations; various questions here.
Can the GPU swallow that without an alpha value, or do you have to inject/set an alpha value?
Can the GPU swallow that without a Z value, or do you have to create a matching "flat" Z-buffer?
Could you memexport the color buffer, or do you have to resolve/blend first (which would be useless)?
3) You send the whole frame buffer to EDRAM as a color buffer, set a matching alpha accordingly, compute something else during the time it takes to move the framebuffer back, then the GPU blends it all together and sends it back to RAM. (Like in KZII, where a render target computed on the SPUs is blended with the framebuffer.)
1) I think you misunderstand. The frame buffer should be considered a special-case texture. Otherwise, all rendering is done to a render target (i.e. a texture that can be drawn to). So it's more a case of 'render to a texture, draw to the back buffer with FX' than 'render to the back buffer, copy the back buffer to a texture, draw that texture to the back buffer with FX'. There is no difference in resources; one just has a redundant copy. (Or perhaps I misunderstand the question.)
2) Pretty much all texture fetches will fetch the entire value for the given texel in the texture, except for some bizarro formats that no one uses. So if you wrote an RGBA texture, you will sample an RGBA value. Whether you use the alpha value is up to you. The hardware is set up to fetch in certain sizes, such as 32 bits (8-bit RGBA), etc.
Z buffers are separate from colour buffers. However, some chips can read Z buffers as if they were textures (most Z buffers are 24-bit FP with the remaining 8 bits used for stencil masking).
3) I'm sure you could memexport the colour buffer, and in fact, I believe Deano does exactly this in Brink.
But I'd expect you'd only do it in very special cases with very special requirements, say, if you had some extra info that you only wanted for every 4th pixel (or something like that), on top of the normal colour output.
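To illustrate point 2, here is a hedged HLSL sketch of a post pass that samples the resolved colour buffer and, where the hardware exposes it, the depth buffer as a texture; the names, bindings and the fog ramp are assumptions for illustration:
[code]
sampler2D SceneTex : register(s0);   // resolved RGBA8 colour buffer
sampler2D DepthTex : register(s1);   // depth resolved to a texture, where the hardware allows it

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float4 rgba  = tex2D(SceneTex, uv);    // the full RGBA texel comes back whether or not you use .a
    float  depth = tex2D(DepthTex, uv).r;  // raw (non-linear) depth value

    // Hypothetical use of that depth: a screen-space fog ramp over the far end of the depth range.
    float fog = saturate((depth - 0.95) / 0.05);
    return lerp(rgba, float4(0.5, 0.6, 0.7, rgba.a), fog);
}
[/code]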
liolio said:
I'm lost about what is really happening and the flexibility Xenos provides, but as I try to figure it out, there is one thing I understand: not being able to read from your render target(s) is a pain in the ass.
Actually I have come to the conclusion that post-processing as a name is misleading. It's not "after processing/rendering" but more "do what you could not afford to do at an earlier point in rendering".
As I try to figure this out, it becomes clear that developers' hands are more tied than I thought.
I don't know how depth of field is handled/faked in video games, but to do it properly you would need a per-pixel Z value.
It's pretty much not possible to read from the render target currently being written. It's one of those things that would be quite useful - but GPU architectures make it very impractical (this is my understanding at least). Random-read would be undefined behavior.
Post processing can usually best be thought of as operations that occur in screen space. Applying bloom, for instance, without post processing would be close to impossible (or at least impractically slow).
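As a sketch of why bloom lives in screen space: the first step of a typical bloom chain reads the rendered scene and keeps only the bright parts, writing them into a small intermediate target that then gets blurred and added back over the scene. The threshold and luminance weights below are illustrative assumptions, not any particular engine's values:
[code]
sampler2D SceneTex : register(s0);   // resolved scene colour

float Threshold;                     // illustrative tuning constant, e.g. 0.8

// Bloom 'bright pass': everything below Threshold goes to black,
// everything above fades in. The result gets blurred in later passes.
float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float3 c    = tex2D(SceneTex, uv).rgb;
    float  luma = dot(c, float3(0.299, 0.587, 0.114));              // rough luminance
    float  keep = saturate((luma - Threshold) / (1.0 - Threshold)); // 0 below threshold, 1 well above it
    return float4(c * keep, 1.0);
}
[/code]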
Yes, basically. Developers use what they have, and they cheat the system as best they can to get away with what they can. Limitations are everywhere, it's a battle.
And yes, for DOF, you need a depth value. The common way to do this is to take your scene output, copy it to a smaller texture (say, 4x smaller) and blur it once or twice (which takes two passes each time, with an intermediate render target). You then have the original, sharp rendered image and a smaller blurred version. The DOF is then simulated by interpolating between the two based on depth.
This is fast, but has some pretty nasty accuracy issues, such as fringing around depth edges. For most games, though, it's 'good enough'.
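A minimal HLSL sketch of that blend step, assuming the sharp scene, the blurred copy and a resolved depth texture are all bound as textures; FocalDepth and FocalRange are hypothetical tuning constants, and real implementations usually shape the blur factor more carefully than this:
[code]
sampler2D SceneSharp   : register(s0);  // full-resolution scene
sampler2D SceneBlurred : register(s1);  // downsampled and blurred copy
sampler2D SceneDepth   : register(s2);  // depth resolved to a texture

float FocalDepth;   // depth of the focal plane (in whatever space the engine stores depth)
float FocalRange;   // how quickly things go out of focus either side of it

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float4 sharp   = tex2D(SceneSharp, uv);
    float4 blurred = tex2D(SceneBlurred, uv);
    float  depth   = tex2D(SceneDepth, uv).r;

    // 0 = in focus, 1 = fully blurred
    float blur = saturate(abs(depth - FocalDepth) / FocalRange);

    return lerp(sharp, blurred, blur);
}
[/code]
The fringing mentioned above comes from the blurred copy leaking foreground colours across depth edges, which this simple lerp does nothing to prevent.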