Pete said:
So, is ATI AAing the (tone-mapped) back-buffer or the FP RT?
They're using AA on the FP RT, then downsampling it and using it as a texture (there is no way to use a multisampled RT directly as a texture on current hardware; this will change with D3D10). This is slightly problematic because tonemapping should conceptually be applied before the downsampling.
The back buffer itself is a simple non-multisampled, color-only buffer, since the downsampling has already taken place.
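To see why that ordering matters: the MSAA resolve is a linear average of the samples, while tonemapping is non-linear, so resolving first and tonemapping afterwards gives a different result than tonemapping each sample and then averaging. A minimal sketch, using the simple Reinhard operator x/(1+x) purely as an assumed example (nothing here says which operator Far Cry actually uses):

```cpp
#include <cstdio>

// Assumed example operator: Reinhard, x / (1 + x). Any non-linear
// tonemapper shows the same ordering problem.
float tonemap(float x) { return x / (1.0f + x); }

int main() {
    // Two MSAA samples at a silhouette edge: a bright HDR sky sample
    // and a dark foreground sample.
    float bright = 16.0f, dark = 0.1f;

    // What current hardware forces: resolve (average) the FP16 samples
    // first, then tonemap the single resolved value.
    float resolveThenTonemap = tonemap((bright + dark) * 0.5f);

    // What you'd conceptually want: tonemap each sample, then average
    // the already-displayable values.
    float tonemapThenResolve = (tonemap(bright) + tonemap(dark)) * 0.5f;

    std::printf("resolve, then tonemap: %f\n", resolveThenTonemap); // ~0.89
    std::printf("tonemap, then resolve: %f\n", tonemapThenResolve); // ~0.52
    return 0;
}
```

The bright sample dominates the linear average, so the resolved-then-tonemapped edge pixel comes out much brighter than the average of the two tonemapped samples. The same non-linearity argument is behind the transparency-blending caveat that comes up in the list further down.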
Does tone-mapping necessitate rendering to something other than the back-buffer first, or does the lack of FP16 back buffers simply mean tone-mapping makes more sense when writing the RT to the back buffer? If NV does end up moving the ROPs into the pixel shaders, would this speed up Far Cry's FP RT -> back buffer process by reducing the # of steps required?
Tone mapping in the HDR context is the process of mapping a non-displayable range of color values to a displayable range. There are basically three points in the pipeline where you could apply tonemapping:
- at the end of the shader, writing to an LDR back buffer/RT. This has to be performed per rendered pixel (including overdraw), but saves framebuffer bandwidth. The limitation is that transparency looks odd because you're blending pixels in tonemapped color space rather than in linear color space.
- at the end of a frame, reading an FP16 RT and outputting it to the LDR back buffer (a sketch of this pass follows the list). This is performed per pixel per frame and needs an HDR RT plus LDR back and front buffers.
- on output, displaying an FP16 front buffer directly. This could be implemented with the color LUT already used for gamma correction, extended to 16-bit inputs. This is performed once per pixel per screen refresh and needs HDR front and back buffers.
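A minimal sketch of the second option, the end-of-frame pass, assuming the same Reinhard operator as above plus an exposure factor (both assumptions). In the real pipeline the loop body would be a pixel shader on a fullscreen quad sampling the resolved FP16 RT; here it's plain C++ over raw pixel data:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed operator: exposure scale followed by Reinhard. Picking (and
// changing) this freely is exactly what the shader approach allows.
float tonemap(float x, float exposure) {
    float v = x * exposure;
    return v / (1.0f + v);
}

// End-of-frame pass: read the HDR RT once per pixel per frame, write
// the displayable result into the 8-bit back buffer.
void tonemapPass(const std::vector<float>& hdrRT,          // resolved FP16 RT
                 std::vector<std::uint8_t>& ldrBackBuffer, // LDR back buffer
                 float exposure) {
    for (std::size_t i = 0; i < hdrRT.size(); ++i) {
        float mapped = tonemap(hdrRT[i], exposure);         // maps into 0..1
        ldrBackBuffer[i] = static_cast<std::uint8_t>(mapped * 255.0f + 0.5f);
    }
}
```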
The latter two are very similar to AA downsampling at the end of a frame vs. downsampling on scanout. For double buffering and FP16, they need the same amount of memory, and bandwidth requirements depend on the fps:refresh ratio.
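To put rough numbers on the fps:refresh point, here's a back-of-the-envelope comparison counting only the dominant FP16 reads; resolution, frame rate and refresh rate are all assumed figures:

```cpp
#include <cstdio>

int main() {
    // All figures assumed for illustration only.
    const double pixels  = 1600.0 * 1200.0; // resolution
    const double bpp     = 8.0;             // FP16 RGBA, bytes per pixel
    const double fps     = 30.0;            // rendered frames per second
    const double refresh = 85.0;            // screen refreshes per second

    // End-of-frame tonemapping reads the FP16 RT once per rendered frame;
    // scanout tonemapping reads the FP16 front buffer once per refresh.
    const double mib = 1024.0 * 1024.0;
    std::printf("end of frame: %.0f MiB/s\n", pixels * bpp * fps / mib);
    std::printf("on scanout:   %.0f MiB/s\n", pixels * bpp * refresh / mib);
    return 0;
}
```

At 30 fps on an 85 Hz display the scanout path reads roughly 85/30 ≈ 2.8x more data; render above the refresh rate and the ratio flips, just as with the two AA downsampling strategies.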
End-of-frame tonemapping, however, is more flexible because it can use shaders, whereas scanout tonemapping is limited to either a small LUT doing piecewise-linear interpolation or some fixed-function tonemapping.
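For contrast with the shader approach, here is a sketch of what such a scanout LUT could look like: a small table with piecewise-linear interpolation between neighbouring entries, baked once with a fixed curve. The structure, sizes and curve are all assumptions, not a description of any actual hardware:

```cpp
#include <algorithm>

// Hypothetical scanout LUT: 257 entries spanning an assumed HDR range,
// with piecewise-linear interpolation between neighbouring entries.
// The curve is baked once up front; a shader pass could instead
// evaluate any operator, even an image-dependent one, every frame.
struct ScanoutLut {
    float entries[257];
    float maxInput = 16.0f; // assumed top of the HDR input range

    void bake() {
        for (int i = 0; i <= 256; ++i) {
            float x = maxInput * i / 256.0f;
            entries[i] = x / (1.0f + x); // same assumed Reinhard curve
        }
    }

    float apply(float x) const {
        float t = std::min(std::max(x / maxInput, 0.0f), 1.0f) * 256.0f;
        int   i = std::min(static_cast<int>(t), 255);
        float f = t - static_cast<float>(i);
        return entries[i] + (entries[i + 1] - entries[i]) * f; // lerp
    }
};
```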
Moving the ROPs into the shaders changes nothing; it's a die-space optimization that replaces fixed-function hardware with programmable hardware that is already present.