Since the early days of hardware-accelerated 3D graphics, GPUs have been able to perform a small, fixed set of blending functions while drawing polygons: Add, Subtract, Min, Max, and so on, with selectable source and destination factors.
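For reference, everything the fixed-function unit can do fits one formula of the form result = src * srcFactor op dst * dstFactor. Expressed as GLSL (purely conceptually; this stage runs in dedicated hardware after the shader, and a shader cannot substitute its own function here), the classic alpha-blending configuration computes:

```glsl
// Conceptual sketch of what fixed-function blending evaluates for
//   glBlendEquation(GL_FUNC_ADD);
//   glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
vec4 fixedFunctionBlend(vec4 src, vec4 dst) {
    return src * src.a + dst * (1.0 - src.a);
}
```

Swapping the equation or the factors changes the coefficients in that one line, and nothing more; that is the entire design space the hardware exposes.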
What I find strange is that blending has not evolved beyond primitive functions. The first GPUs with programmable pixel shaders were released over 15 years ago, and yet the newest versions of OpenGL, Direct3D and Vulkan lack programmable blending.
Programmable blending seems simple: Just let pixel shaders read the last framebuffer value. It's not as if nobody ever thought of this before. In the GLSL 1.10 specification, there is this issue:
23) Should the fragment shader be allowed to read the current location in the frame buffer?

DISCUSSION: It may be difficult to specify this properly while taking into account multisampling. It also may be quite difficult for hardware implementors to implement this capability, at least with reasonable performance. But this was one of the top two requested items after the original release of the shading language white paper. ISVs continue to tell us that they need this capability, and that it must be high performance.

RESOLUTION: Yes. This is allowed, with strong cautions as to performance impacts.

REOPENED on December 10, 2002. There is too much concern about impact to performance and impracticality of implementation.

CLOSED on December 10, 2002.
A whole decade later, programmable blending still isn't standard. If we want it, we are forced to use workarounds involving auxiliary framebuffers or atomic image loads and stores.
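To sketch what such a workaround looks like, a fragment shader can bypass fixed-function blending entirely by binding the color buffer as an image and doing the read-modify-write itself via image load/store (GL_ARB_shader_image_load_store, core since OpenGL 4.2). The binding point and variable names here are hypothetical, and note the catch: without extra synchronization this is racy wherever fragments overlap.

```glsl
#version 420
// Hypothetical manual-blending workaround: the color buffer is bound as an
// image unit instead of being attached as a render target. Overlapping
// fragments race on the load/modify/store sequence below.
layout(binding = 0, rgba8) coherent uniform image2D colorBuf;

in vec4 srcColor;

void main() {
    ivec2 p = ivec2(gl_FragCoord.xy);
    vec4 dst = imageLoad(colorBuf, p);        // read the last value
    vec4 blended = srcColor * srcColor.a      // any blend function we like;
                 + dst * (1.0 - srcColor.a);  // here, ordinary "over"
    imageStore(colorBuf, p, blended);         // write it back
}
```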
It seems that manufacturers are slowly catching on to the fact that programmable blending is valuable to developers. Recently, the OpenGL extensions GL_ARM_shader_framebuffer_fetch and GL_EXT_shader_framebuffer_fetch appeared. These extensions expose a gl_LastFragData variable to pixel shaders, with some caveats: multiple render targets may not be supported, depth and stencil data are not available without a further extension, and floating-point buffers are not supported.
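With framebuffer fetch, a custom blend is just ordinary shader code. A minimal GLSL ES sketch using GL_EXT_shader_framebuffer_fetch (the texture input and the choice of overlay-style blending are illustrative assumptions, picked because no fixed-function factor combination can express a per-pixel branch like this):

```glsl
#version 100
#extension GL_EXT_shader_framebuffer_fetch : require
precision mediump float;

uniform sampler2D tex;
varying vec2 uv;

void main() {
    vec4 src = texture2D(tex, uv);
    vec4 dst = gl_LastFragData[0];  // the previous framebuffer value
    // An overlay-style blend: the function applied depends on the
    // destination's luminance, which fixed-function blending cannot do.
    float lum = dot(dst.rgb, vec3(0.299, 0.587, 0.114));
    gl_FragColor = (lum < 0.5)
        ? 2.0 * src * dst
        : vec4(1.0) - 2.0 * (vec4(1.0) - src) * (vec4(1.0) - dst);
}
```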
The most obvious performance problem with programmable blending is that the GPU cannot exploit mathematical properties of the blending function. There is no way for the hardware to tell whether a shader-defined blend is commutative or associative, so if a group of triangles is submitted and some of them overlap in framebuffer space, correct blending requires rendering the overlapping triangles one at a time. This could be mitigated by extensions like GL_INTEL_fragment_shader_ordering, which give shaders more control over the order of memory operations.
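As a sketch of how such an ordering extension composes with a manual read-modify-write (the image binding is again a hypothetical setup): GL_INTEL_fragment_shader_ordering provides a beginFragmentShaderOrderingINTEL() built-in, after which memory accesses by overlapping fragments at the same pixel are serialized in primitive order, so the racy load/store pattern becomes deterministic.

```glsl
#version 420
#extension GL_INTEL_fragment_shader_ordering : require

layout(binding = 0, rgba8) coherent uniform image2D colorBuf;
in vec4 srcColor;

void main() {
    // After this call, image reads and writes by overlapping fragments at
    // this pixel happen in primitive submission order.
    beginFragmentShaderOrderingINTEL();

    ivec2 p = ivec2(gl_FragCoord.xy);
    vec4 dst = imageLoad(colorBuf, p);
    imageStore(colorBuf, p, srcColor * srcColor.a + dst * (1.0 - srcColor.a));
}
```

Note this only restores correctness for overlapping fragments; it still forces serialization at those pixels, so it does not recover the parallelism the hardware could exploit if it knew the blend were commutative.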
I feel that the standards bodies cannot keep ignoring programmable blending. Developers want more flexible blending capabilities instead of being forced into hacks and workarounds. Framebuffer fetch is also part of Apple's Metal API, which means it already ships on millions of devices on the market today.