Mintmaster said:
Also, doesn't ping-ponging require extra writes? Here's how I thought it worked:
1. Switch render targets
2. Read from the main buffer and write to the temporary one while doing your pixel shader blending.
3. Switch back to the original render target
4. Copy from the temporary buffer to the main one.
(Of course steps 2 and 4 can be appropriately interchanged)

Not quite. As long as your framebuffer is in a blendable format, you could use one pass to calculate how much you have to add to the framebuffer in order to get the desired color output, reading the framebuffer in if it's stored in a nonlinear format. So a slightly modified algorithm would be:
1. Switch render targets.
2. Read from the main buffer and compute the difference between the desired color and the color currently stored in the frame buffer. Store this difference in another buffer, with a write mask in the alpha channel.
3. Switch back to the original render target.
4. Add the difference buffer to the frame buffer with the blender, using an alpha test to implement the write mask, so that all of the blends are applied at once.
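
To make the four steps concrete, here's a rough CPU-side sketch in Python. It's only an illustration, not real graphics API code: the single-channel pixels, the square-root "nonlinear" encoding, the 0.5 alpha reference and all of the function names are assumptions of mine. It also keeps the difference as a signed float, which a real FX8 target would not let you do directly.

```python
# Toy simulation of the difference-buffer idea with one scalar channel per pixel.
# The sqrt/square encoding is a hypothetical stand-in for a nonlinear format.

def encode(linear):
    """Linear value -> stored (nonlinear) value. Hypothetical encoding."""
    return linear ** 0.5


def decode(stored):
    """Stored (nonlinear) value -> linear value."""
    return stored ** 2


def difference_pass(framebuffer, blends):
    """Steps 1-2: render into a second buffer.

    `blends` maps pixel index -> linear amount to add at that pixel.
    Each output texel is (difference, alpha), where alpha is the write mask.
    """
    diff_buffer = []
    for i, stored in enumerate(framebuffer):
        if i in blends:
            desired = encode(decode(stored) + blends[i])
            diff_buffer.append((desired - stored, 1.0))  # alpha 1: write
        else:
            diff_buffer.append((0.0, 0.0))               # alpha 0: masked out
    return diff_buffer


def blend_pass(framebuffer, diff_buffer, alpha_ref=0.5):
    """Steps 3-4: additive blend into the frame buffer, gated by an alpha test."""
    return [
        stored + diff if alpha > alpha_ref else stored
        for stored, (diff, alpha) in zip(framebuffer, diff_buffer)
    ]


framebuffer = [0.5, 0.2]            # stored (nonlinear) values
blends = {0: 0.25}                  # add 0.25 in linear space to pixel 0 only
diff_buffer = difference_pass(framebuffer, blends)
result = blend_pass(framebuffer, diff_buffer)
```

Pixel 0 ends up holding the encoded form of its old linear value plus 0.25, while pixel 1 fails the alpha test and is left alone. Note that the frame buffer is only read in the first pass and only blended into in the second.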
So it's not necessarily going to use up the same amount of bandwidth as FP16 blending, but for this to be an optimization, you need to be able to work temporarily in a linear, FX8 buffer for some number of blends before writing the output. This will place limits upon what you can do with the technique.