It's super-useful! Fully programmable blending, Z and stencil are just the tip of the iceberg... what about building up a K-buffer in a single pass? How about building a deep shadow map in a single pass? It's useful in any situation where you need to build up a data structure from all of the fragments that hit a given pixel.
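To make the K-buffer case concrete, here is a rough CPU-side sketch of the per-fragment read-modify-write that a programmable blend stage could run. Everything in it is an assumption for illustration: the names (`Fragment`, `KBufferPixel`, `kbuffer_insert`), the choice of K = 4, and the policy of keeping the K nearest fragments sorted by depth. It is not any real GPU API, just the shape of the per-pixel update.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical K: how many fragments each pixel keeps.
constexpr std::size_t K = 4;

// Hypothetical fragment payload: depth plus a packed color.
struct Fragment {
    float depth;
    std::uint32_t rgba;
};

// Per-pixel storage: the K nearest fragments seen so far, sorted near to far.
struct KBufferPixel {
    std::array<Fragment, K> frags;
    std::size_t count = 0;
};

// The read-modify-write a programmable blend stage would perform for each
// incoming fragment: insertion sort into a fixed-size list, dropping the
// farthest entry once the list is full.
inline void kbuffer_insert(KBufferPixel& px, Fragment f) {
    // List is full and the new fragment is no nearer than the farthest kept one.
    if (px.count == K && f.depth >= px.frags[K - 1].depth) {
        return;
    }
    // Start at the first free slot, or at the last slot (evicting the farthest).
    std::size_t i = (px.count < K) ? px.count : K - 1;
    // Shift farther fragments back by one to open a slot for f.
    while (i > 0 && px.frags[i - 1].depth > f.depth) {
        px.frags[i] = px.frags[i - 1];
        --i;
    }
    px.frags[i] = f;
    if (px.count < K) {
        ++px.count;
    }
}
```

Calling `kbuffer_insert` once per fragment, in whatever order the rasterizer produces them, leaves each pixel holding its K nearest fragments in depth order after a single pass over the geometry.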
And on a tiled renderer like Larrabee, it goes even beyond that: all of the render target data is sitting close to the processors, so you can happily work away on this data in a R/W fashion and only write out the final results. Deferred shading w/ MSAA and tone-mapping with only the final, resolved 32-bit RGBA buffer ever leaving the local cache? Yes please!
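A minimal sketch of that last step, assuming 4x MSAA, HDR float samples held in tile-local storage, and a simple Reinhard operator picked purely for illustration. The names (`Rgb`, `tonemap`, `resolve_pixel`, `resolve_tile`) are made up for this example and say nothing about how Larrabee actually does it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed MSAA sample count per pixel.
constexpr std::size_t kSamples = 4;

// HDR color of one shaded sample, as it might sit in tile-local memory.
struct Rgb {
    float r, g, b;
};

// Reinhard tone-mapping operator, used here only as a stand-in.
inline float tonemap(float x) {
    return x / (1.0f + x);
}

// Convert a [0, 1] float to an 8-bit channel with rounding.
inline std::uint8_t to_unorm8(float x) {
    const float c = std::min(std::max(x, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(c * 255.0f + 0.5f);
}

// Tone-map every sample, average them (the MSAA resolve), and pack the
// result as 32-bit RGBA with R in the low byte and opaque alpha.
inline std::uint32_t resolve_pixel(const std::array<Rgb, kSamples>& samples) {
    float r = 0.0f, g = 0.0f, b = 0.0f;
    for (const Rgb& s : samples) {
        r += tonemap(s.r);
        g += tonemap(s.g);
        b += tonemap(s.b);
    }
    const float inv = 1.0f / static_cast<float>(kSamples);
    return static_cast<std::uint32_t>(to_unorm8(r * inv)) |
           (static_cast<std::uint32_t>(to_unorm8(g * inv)) << 8) |
           (static_cast<std::uint32_t>(to_unorm8(b * inv)) << 16) |
           (0xFFu << 24);
}

// Resolve a whole tile: the per-sample HDR data stays local, and only one
// packed 32-bit value per pixel is produced for writing out.
inline std::vector<std::uint32_t> resolve_tile(
        const std::vector<std::array<Rgb, kSamples>>& tile_hdr) {
    std::vector<std::uint32_t> out;
    out.reserve(tile_hdr.size());
    for (const auto& px : tile_hdr) {
        out.push_back(resolve_pixel(px));
    }
    return out;
}
```

The point of the sketch is the data flow: the fat per-sample HDR buffer is only ever an input to `resolve_tile`, and the one thing that comes out is a single 32-bit value per pixel.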
Exactly. It also allows completely new algorithms to be implemented that nobody has attempted on the GPU before because of this limitation. IMHO this is a much more important feature than geometry shaders or GPU tessellation (and it likely consumes less die space if implemented correctly).