RoOoBo said:
Dingly Dell said:
http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/talk-html/
Just skimming the paper and slides, but in what way is that different from the R300's 4 render targets (pixel shader output to floating-point textures)? The paper even states in section '5.4 Where does F-Buffer data enter the fragment pipeline?' that texture units could be used to access F-Buffers as inputs.
My guess (taking into account that the source is TheInquirer) is that there isn't anything new in the R350. Maybe the drivers are now able to automatically multipass long shaders using the render targets. I wonder if some kind of hardware could avoid the need to send the geometry again for each pass, something like the ability of a fragment to 'loop' over the pixel shader with different programs. The output fragment would be sent back to the fragment FIFO with a pointer to a new pixel shader program. However, that would be very inefficient if the new pixel shader program had to be loaded from memory each time; some kind of scheduling and batching of fragments that takes into account which shader programs they use would be needed.
The whole point of having an F-buffer is to avoid complete and explicit software-defined multipassing, i.e. splitting shaders, handling outstanding intermediates (using texture intermediates sized to the primitive's bounding box), and re-issuing primitives. Nasty business in a driver.
HW support would simplify the driver's task. An F-buffer could be designed to work in a single pass over an entire primitive or in multiple passes over a section of a primitive. An F-buffer that works over an entire primitive would store all the pass intermediates in FIFO order. This would likely have to go off chip. For a 30-fragment primitive, something like:
SUB_PASS_1 : EXEC 30 fragments and PUSH 10
SUB_PASS_2 : EXEC 30 fragments and POP 10 and PUSH 5
SUB_PASS_3 : EXEC 30 fragments and POP 5 and DRAW 30
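The three sub-passes above can be simulated with a single FIFO to see why the storage probably has to go off chip. This is a minimal sketch, not a description of any real hardware: reading "PUSH 10" as "10 intermediate values per fragment" is an assumption, and `run_subpasses` and `SUB_PASSES` are hypothetical names.

```python
from collections import deque

# Each sub-pass is (pop_per_fragment, push_per_fragment, draws).
# ASSUMPTION: "PUSH n" / "POP n" are read as n intermediate values per fragment.
SUB_PASSES = [
    (0, 10, False),  # SUB_PASS_1: EXEC and PUSH 10
    (10, 5, False),  # SUB_PASS_2: EXEC, POP 10 and PUSH 5
    (5, 0, True),    # SUB_PASS_3: EXEC, POP 5 and DRAW
]

def run_subpasses(num_fragments, sub_passes):
    fbuf = deque()  # the F-buffer: one FIFO shared by all fragments of the primitive
    peak = 0
    drawn = []
    for pass_idx, (pop_n, push_n, draws) in enumerate(sub_passes):
        # every sub-pass visits the fragments in the same rasterization order
        for frag in range(num_fragments):
            popped = [fbuf.popleft() for _ in range(pop_n)]
            # FIFO order guarantees we get back exactly this fragment's values
            assert all(owner == frag for owner, _ in popped)
            for k in range(push_n):
                fbuf.append((frag, (pass_idx, k)))  # placeholder for a real value
            peak = max(peak, len(fbuf))
            if draws:
                drawn.append(frag)
    return peak, drawn, len(fbuf)

peak, drawn, left = run_subpasses(30, SUB_PASSES)
print(peak, len(drawn), left)
```

Under these assumptions the FIFO peaks at 30 × 10 = 300 values at the end of SUB_PASS_1, and that peak grows linearly with primitive size, which is why the whole-primitive variant likely cannot stay on chip.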
For each pass, the "drawing" and pushing/popping will occur at the instruction level. It's likely that the driver will have to modify the shader, explicitly changing instructions to write their outputs into a "special" F-buffer register. So it is also likely that the driver would preprocess long shaders to help the hardware do the multipassing. Still, this is much easier (and faster) than explicit software multipassing.
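The driver preprocessing described above could look roughly like the following: cut a long shader into sub-passes and use liveness analysis to find which registers must be redirected to the F-buffer at each cut. This is a hypothetical sketch; it assumes straight-line code with no branches, and it assumes interpolated inputs (here `v0`, `v1`) are regenerated in every sub-pass so they never need pushing. `split_shader` and the register names are illustrative only.

```python
def split_shader(instrs, max_len, outputs, free_inputs=frozenset()):
    """Split straight-line code into sub-passes of at most max_len instructions.

    instrs: list of (dest, sources) tuples. Returns a list of (chunk, pushed)
    pairs; `pushed` lists the registers the driver would redirect to the
    F-buffer at the end of that chunk.
    """
    # Backward liveness: live_before[i] holds the registers that are read at or
    # after instruction i before being overwritten.
    live = set(outputs)
    live_before = [None] * len(instrs)
    for i in range(len(instrs) - 1, -1, -1):
        dest, srcs = instrs[i]
        live = (live - {dest}) | set(srcs)
        live_before[i] = live
    sub_passes = []
    for start in range(0, len(instrs), max_len):
        end = min(start + max_len, len(instrs))
        if end < len(instrs):
            # anything still needed later must survive the pass boundary
            pushed = sorted(live_before[end] - set(free_inputs))
        else:
            pushed = []  # the last sub-pass draws instead of pushing
        sub_passes.append((instrs[start:end], pushed))
    return sub_passes

# Hypothetical 5-instruction shader; v0/v1 are ASSUMED interpolated inputs.
SHADER = [
    ("r0", ["v0", "v1"]),
    ("r1", ["r0", "v0"]),
    ("r2", ["r1", "r0"]),
    ("r3", ["r2", "v1"]),
    ("out", ["r3", "r1"]),
]

for n, (chunk, pushed) in enumerate(split_shader(SHADER, 2, {"out"}, {"v0", "v1"}), 1):
    print(f"SUB_PASS_{n}: EXEC {len(chunk)} instruction(s), PUSH {pushed}")
```

With a two-instruction limit, `r1` is pushed after the first sub-pass, popped in the second, and pushed again because the final sub-pass still reads it, matching the POP-then-PUSH pattern in the listing.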
Since chips store shaders in static memories (at least today they probably do), this would complicate shader scheduling by requiring a DMA to fetch each sub-pass from memory. And, of course, the F-buffer would live in local memory. Relatively easy to implement.
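The scheduling cost of fetching sub-pass programs (and the batching idea in the quoted post) can be illustrated with a toy count of program fetches. This is a sketch under loud assumptions: only one sub-pass program is resident at a time, every program change costs one DMA fetch, and fragments loop back for their next sub-pass. `fetches_fifo` and `fetches_batched` are hypothetical names, not any driver's or chip's actual scheduler.

```python
from collections import defaultdict, deque

def fetches_fifo(start_stages, num_programs):
    """Looped fragments re-enter one FIFO; the sub-pass program is fetched
    whenever the next fragment needs a different one than is resident."""
    queue = deque(enumerate(start_stages))  # (fragment id, next sub-pass index)
    resident = None
    fetches = 0
    while queue:
        frag, prog = queue.popleft()
        if prog != resident:
            resident = prog
            fetches += 1
        if prog + 1 < num_programs:
            queue.append((frag, prog + 1))  # loop back for the next sub-pass
    return fetches

def fetches_batched(start_stages, num_programs):
    """Group waiting fragments by sub-pass and run each group with one fetch."""
    waiting = defaultdict(list)  # sub-pass index -> fragment ids
    for frag, prog in enumerate(start_stages):
        waiting[prog].append(frag)
    fetches = 0
    while waiting:
        prog = min(waiting)       # earliest sub-pass first, so groups merge as they advance
        batch = waiting.pop(prog)
        fetches += 1              # one fetch serves the whole batch
        if prog + 1 < num_programs:
            waiting[prog + 1].extend(batch)
    return fetches

# Four fragments caught at different sub-passes of a 3-sub-pass shader.
print(fetches_fifo([0, 1, 0, 1], 3), fetches_batched([0, 1, 0, 1], 3))
```

In this toy case the plain FIFO order needs 7 fetches while batching by sub-pass needs 3, one per program; real hardware would trade that against the latency of holding fragments back until a batch forms.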
Other implementations can work, but they raise separate issues.
- SM