Fast A-Buffer Algorithm Demo Using OpenGL 4.0

There's no limit on the number of samples per pixel, only on the total storage for all the pixels. At least, that's the case if they use the same approach AMD did with DX.
 
The OpenGL implementation is different from both AMD's Mecha demo and the stencil-routed A-buffer Humus implemented. I don't think it does anything AMD hardware doesn't support, but maybe OpenGL has some limitations I'm unaware of.

As described on the developer's blog, memory is pre-allocated for every fragment that might be needed, say 16 fragments per pixel. I don't think it's a complete A-buffer implementation, because I didn't see a coverage mask used anywhere when I looked quickly through the shaders.
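
A rough CPU-side sketch of the pre-allocated layout described above, just to make the memory behaviour concrete. This is not the demo's shader code: the names (`store_fragment`, `resolve`, `MAX_FRAGS`), the tiny 4x4 framebuffer and the drop-on-overflow policy are all assumptions for illustration, and the plain counter update stands in for what would have to be an atomic increment on the GPU.

```python
# Hypothetical sketch: fixed-capacity per-pixel fragment arrays.
WIDTH, HEIGHT, MAX_FRAGS = 4, 4, 16  # 16 slots per pixel, as in the example above

# One counter per pixel, plus WIDTH * HEIGHT * MAX_FRAGS slots allocated up front.
counts = [0] * (WIDTH * HEIGHT)
slots = [None] * (WIDTH * HEIGHT * MAX_FRAGS)

def store_fragment(x, y, depth, color):
    """Append a fragment to pixel (x, y); return False if the pixel is full."""
    pixel = y * WIDTH + x
    n = counts[pixel]              # on a GPU this would be an atomic increment
    if n >= MAX_FRAGS:
        return False               # assumed policy: drop fragments on overflow
    counts[pixel] = n + 1
    slots[pixel * MAX_FRAGS + n] = (depth, color)
    return True

def resolve(x, y):
    """Return the pixel's stored fragments sorted back to front."""
    pixel = y * WIDTH + x
    base = pixel * MAX_FRAGS
    frags = slots[base:base + counts[pixel]]
    return sorted(frags, key=lambda f: f[0], reverse=True)
```

Note that storage is paid for every pixel whether or not it ever receives 16 fragments, and a pixel that receives more than 16 loses the extras.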

AMD's Mecha demo uses a per-pixel linked list, so it can use less memory in most cases.
 
Yeah, from the code it looks like a standard k-buffer (A-buffers are generally per-pixel linked lists, while k-buffers are pre-allocated arrays). AMD's method should actually be both faster and use less memory with a decent hardware implementation, since the UAV "counters" it uses are much faster than even uncontended global atomics (they're a special case that can be nicely batched up).
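
For comparison, here's a minimal CPU-side sketch of the per-pixel linked list idea: one shared node pool, one global counter handing out node indices, and a head index per pixel. It's not AMD's code; `append_fragment`, `fragments`, `POOL_SIZE` and the 4x4 framebuffer are made-up names and sizes, and the counter bump and head swap shown as plain assignments would be an atomic counter increment and an atomic exchange on real hardware.

```python
# Hypothetical sketch: per-pixel linked lists backed by one shared node pool.
WIDTH, HEIGHT = 4, 4
POOL_SIZE = 32   # assumed total node budget shared by all pixels
END = -1         # marks the end of a list

heads = [END] * (WIDTH * HEIGHT)   # index of the newest node for each pixel
nodes = [None] * POOL_SIZE         # each node is (depth, color, next_index)
node_counter = 0                   # stands in for the global append counter

def append_fragment(x, y, depth, color):
    """Link a new fragment into pixel (x, y); return False if the pool is full."""
    global node_counter
    if node_counter >= POOL_SIZE:
        return False
    index = node_counter           # on a GPU: atomic counter increment
    node_counter += 1
    pixel = y * WIDTH + x
    nodes[index] = (depth, color, heads[pixel])
    heads[pixel] = index           # on a GPU: atomic exchange on the head
    return True

def fragments(x, y):
    """Walk the pixel's list, newest fragment first."""
    out = []
    index = heads[y * WIDTH + x]
    while index != END:
        depth, color, next_index = nodes[index]
        out.append((depth, color))
        index = next_index
    return out
```

A single pixel can take more than 16 fragments here as long as the pool has room, which is the "no per-pixel limit, only a total storage limit" behaviour mentioned earlier in the thread.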

It's worth pointing out that it sucks that the pixel shader UAV stuff in DirectX 11 wasn't included in any form in OpenGL 4.0. That's a pretty big omission, considering OpenGL 4.0 brings it up to the same level as DirectX in almost every other way. Still, it's nice to see an NVIDIA extension fill the gap somewhat.
 
I was going to suggest using OpenCL for the append stuff, but there's no append concept in OpenCL either. What were they thinking?...

Jawed
 
There's an extension in OpenCL for the counters IIRC, but you really need pixel shader append for OIT unless you want to rasterize in OpenCL :p
 