Fast A-Buffer Algorithm Demo Using OpenGL 4.0

Broken Hope · Jun 17, 2010

It's a shame that this uses an NVIDIA-exclusive extension. I'm guessing ATI doesn't support anything similar?

http://www.geeks3d.com/20100610/3d-programming-fast-a-buffer-algorithm-demo-using-opengl-4-0/

Jawed · Jun 17, 2010

Is the stuff that's in D3D11 to make this work really not in OpenGL 4.0?

The Mecha demo is much the same conceptually:

http://developer.amd.com/samples/de...sRealTimeDemos.aspx?cmpid=DevBanner_5800Demos

Is the OpenGL approach better?

nbohr1more · Jun 17, 2010

Is there any advantage over this method, other than it being OpenGL?

http://www.humus.name/index.php?page=3D&ID=76

MfA · Jun 17, 2010

No limit on the number of samples per pixel (only on the total storage for all the pixels). At least if they use the same approach as AMD did with DX.

3dcgi · Jun 18, 2010

The OpenGL implementation is different than AMD's Mecha demo and the stencil routed A-buffer Humus implemented. I don't think it does anything AMD hardware doesn't support, but maybe OpenGL has some limitations I'm unaware of.

As described on the developer's blog, memory is pre-allocated for every fragment that might be needed, say 16 fragments per pixel. I don't think it's a complete A-buffer implementation, because I didn't see a coverage mask used anywhere when quickly looking at the shaders.
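A minimal sketch of that pre-allocated layout, written as a CPU-side Python model rather than shader code. The class name `FixedFragmentBuffer`, the `(depth, color)` fragment tuple, and the drop-on-overflow policy are all illustrative assumptions, not taken from the demo's shaders.

```python
class FixedFragmentBuffer:
    """CPU model of a pre-allocated per-pixel fragment array (hypothetical names)."""

    def __init__(self, width, height, max_fragments=16):
        self.width = width
        self.height = height
        self.max_fragments = max_fragments
        # One fragment counter per pixel; a GPU version would bump this atomically.
        self.counts = [0] * (width * height)
        # Storage is reserved up front for every fragment that might be needed.
        self.slots = [None] * (width * height * max_fragments)

    def insert(self, x, y, depth, color):
        pixel = y * self.width + x
        slot = self.counts[pixel]
        if slot >= self.max_fragments:
            return False  # assumption: fragments beyond the limit are dropped
        self.slots[pixel * self.max_fragments + slot] = (depth, color)
        self.counts[pixel] = slot + 1
        return True

    def fragments(self, x, y):
        pixel = y * self.width + x
        base = pixel * self.max_fragments
        return self.slots[base:base + self.counts[pixel]]


# Usage: a 4x4 buffer with room for only 2 fragments per pixel.
buf = FixedFragmentBuffer(width=4, height=4, max_fragments=2)
accepted = [buf.insert(1, 1, depth, "c") for depth in (0.5, 0.2, 0.9)]
```

The third insert is rejected because the pixel's slots are full, while every other pixel's reserved slots sit unused. That is the trade-off being discussed here.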

AMD's Mecha demo uses a per-pixel linked list so it can use less memory in most cases.
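For comparison, a per-pixel linked list can be sketched on the CPU as below. This is an assumption-laden model, not AMD's code: the names `LinkedListFragmentBuffer`, `heads`, `pool`, and `next_free` are hypothetical, and the plain integer counter stands in for whatever atomic counter and exchange the GPU version would use.

```python
END_OF_LIST = -1


class LinkedListFragmentBuffer:
    """CPU model of per-pixel linked lists sharing one node pool (hypothetical names)."""

    def __init__(self, width, height, pool_size):
        self.width = width
        # One head index per pixel, pointing at the most recently added node.
        self.heads = [END_OF_LIST] * (width * height)
        # Shared pool sized for the whole frame, not per pixel.
        self.pool = [None] * pool_size
        self.next_free = 0  # stands in for a global atomic counter

    def insert(self, x, y, depth, color):
        if self.next_free >= len(self.pool):
            return False  # assumption: fragments are dropped once the pool is full
        node = self.next_free
        self.next_free += 1
        pixel = y * self.width + x
        # The new node links to the old head, then becomes the head itself.
        self.pool[node] = (depth, color, self.heads[pixel])
        self.heads[pixel] = node
        return True

    def fragments(self, x, y):
        out = []
        node = self.heads[y * self.width + x]
        while node != END_OF_LIST:
            depth, color, node = self.pool[node]
            out.append((depth, color))
        return out


# Usage: 16 pixels share a pool of 8 nodes; one pixel takes three of them.
buf = LinkedListFragmentBuffer(width=4, height=4, pool_size=8)
for depth in (0.5, 0.2, 0.9):
    buf.insert(1, 1, depth, "c")
buf.insert(2, 2, 0.1, "c")
```

Only four of the eight pool nodes are consumed, and no single pixel has a fixed cap; the limit is on the total across all pixels.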

MfA · Jun 18, 2010

So it's actually closer to the stencil buffer demo than AMD's.

Andrew Lauritzen · Jun 20, 2010

Yeah, from the code it looks like a standard k-buffer (A-buffers are generally linked lists per pixel, while k-buffers are pre-allocated arrays). AMD's method should actually be both faster and use less memory with a decent hardware implementation, since the UAV "counters" used are actually much faster than even uncontended global atomics (they're a special case that can be nicely batched up).
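To put rough numbers on the memory claim, here is a back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration (1280x720 target, 8 bytes per fragment, 4-byte indices, an average of 3 fragments per pixel); none of them comes from either demo.

```python
def k_buffer_bytes(width, height, k, bytes_per_fragment):
    """Pre-allocated array: every pixel pays for k fragments, used or not."""
    return width * height * k * bytes_per_fragment


def linked_list_bytes(width, height, total_fragments,
                      bytes_per_fragment, bytes_per_index=4):
    """Linked lists: one head index per pixel, plus one node per stored fragment."""
    head_bytes = width * height * bytes_per_index
    node_bytes = total_fragments * (bytes_per_fragment + bytes_per_index)
    return head_bytes + node_bytes


# Hypothetical scene: 1280x720, 8-byte fragments, 3 fragments per pixel on average.
width, height = 1280, 720
fixed = k_buffer_bytes(width, height, k=16, bytes_per_fragment=8)
linked = linked_list_bytes(width, height,
                           total_fragments=width * height * 3,
                           bytes_per_fragment=8)
```

Under these assumed numbers the linked-list layout needs well under half the storage, even after paying for a next-index in every node. The gap narrows as the average depth complexity approaches k.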

It is worth pointing out that it sucks that the pixel shader UAV stuff in DirectX 11 wasn't included in any form in OpenGL 4.0 (that's a pretty big omission, considering 4.0 brings OpenGL up to the same level as DirectX in almost every other way), but it's nice to see an NVIDIA extension fill the gap somewhat.

Jawed · Jun 20, 2010

I was going to suggest using OpenCL for the append stuff, but there's no append concept in OpenCL either. What were they thinking?...

Jawed

Andrew Lauritzen · Jun 20, 2010

Jawed said:
I was going to suggest using OpenCL for the append stuff, but there's no append concept in OpenCL either. What were they thinking?...

There's an extension in OpenCL for the counters IIRC, but you really need pixel shader append for OIT unless you want to rasterize in OpenCL.

Jawed · Jul 23, 2010

http://blog.icare3d.org/2010/07/opengl-40-abuffer-v20-linked-lists-of.html

...I implemented a variant of the recent OIT method presented at the GDC2010 by AMD and using per-pixel linked lists. The main difference in my implementation is that fragments are not stored and linked individually but into small pages of fragments (containing 4-6 fragments). Those pages are stored and allocated in a shared pool whose size is changed dynamically depending on the scene demands.
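The paged variant described in that quote can be modeled roughly as follows. This is a CPU-side sketch under stated assumptions: the names (`PagedFragmentBuffer`, `pages`, `_allocate_page`) are hypothetical, the page size of 4 is picked from the quoted 4-6 range, and doubling the pool on demand is a stand-in for the blog's dynamic resizing, whose actual policy isn't given here.

```python
END_OF_LIST = -1


class PagedFragmentBuffer:
    """CPU model of per-pixel linked lists of small fragment pages (hypothetical names)."""

    def __init__(self, width, height, page_size=4, initial_pages=1):
        self.width = width
        self.page_size = page_size
        # One head index per pixel, pointing at that pixel's newest page.
        self.heads = [END_OF_LIST] * (width * height)
        # Shared pool of pages; each page is (fragment list, next page index).
        self.pages = [None] * initial_pages
        self.next_free = 0

    def _allocate_page(self, next_page):
        if self.next_free == len(self.pages):
            # Assumption: grow the pool by doubling when the scene demands more.
            self.pages.extend([None] * len(self.pages))
        page = self.next_free
        self.next_free += 1
        self.pages[page] = ([], next_page)
        return page

    def insert(self, x, y, depth, color):
        pixel = y * self.width + x
        head = self.heads[pixel]
        # A new page is linked in only when the pixel has none or its head is full.
        if head == END_OF_LIST or len(self.pages[head][0]) == self.page_size:
            head = self._allocate_page(head)
            self.heads[pixel] = head
        self.pages[head][0].append((depth, color))

    def fragments(self, x, y):
        out = []
        page = self.heads[y * self.width + x]
        while page != END_OF_LIST:
            frags, page = self.pages[page]
            out.extend(frags)
        return out


# Usage: six fragments on one pixel spill into a second page.
buf = PagedFragmentBuffer(width=2, height=2, page_size=4, initial_pages=1)
for depth in range(6):
    buf.insert(0, 0, depth, "c")
depths = [d for d, _ in buf.fragments(0, 0)]
```

Compared with linking fragments individually, one link and one allocation are shared by up to `page_size` fragments, at the cost of some unused slots in each pixel's newest page.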

jeraldfler · Sep 8, 2010

Fast is subjective
