Road to Anti-Aliasing in BRE

I want to implement anti-aliasing in BRE, but first I want to explore what aliasing is, what causes it, and which techniques mitigate it. That is why I am going to write a series of articles about rasterization, aliasing, anti-aliasing, and how I am going to implement it in BRE.

Article #1: Rasterization

All suggestions and improvements are very welcome! I will update this post with new articles.
 
Interesting results you got from the instancing tests on your Fermi (GTX 680) GPU:
https://nbertoa.wordpress.com/2016/02/02/instancing-vs-geometry-shader-vs-vertex-shader/
https://nbertoa.wordpress.com/2016/02/04/instancing-vs-geometry-shader-vs-vertex-shader-round-2/

Geometry shaders usually don't win, but in this special case your instance vertex count is tiny: only 8 vertices. Fermi doesn't seem to pack vertices from multiple instances into a single warp. As a warp is 32 threads and each instance is only 8 vertices, you are likely seeing only 25% utilization of vertex waves. There could be some other bottlenecks as well. The bottleneck disappears when you bump up the vertex count to 44 per instance, which is still a very low vertex count per object by today's standards. In practice most objects have more vertices than that.
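To make the 25% figure concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which every instance gets its own waves and nothing is packed across instances; the function name laneUtilization is made up for illustration, and real hardware (vertex reuse, other bottlenecks) will not match this exactly.

```cpp
#include <cstdint>

// Fraction of vertex-shader lanes doing useful work for one instance.
// ASSUMPTION: simplified model, vertices of different instances are never
// packed into the same wave, so each instance rounds up to whole waves.
constexpr double laneUtilization(std::uint32_t verticesPerInstance,
                                 std::uint32_t waveSize)
{
    if (verticesPerInstance == 0 || waveSize == 0) {
        return 0.0; // nothing to run, avoid dividing by zero
    }
    // Number of waves needed for one instance (round up).
    const std::uint32_t waves = (verticesPerInstance + waveSize - 1) / waveSize;
    // Useful lanes divided by allocated lanes.
    return static_cast<double>(verticesPerInstance) /
           static_cast<double>(waves * waveSize);
}
```

Under this model, laneUtilization(8, 32) is 0.25, which is the 25% above, and laneUtilization(44, 32) is 0.6875 because 44 vertices occupy two 32-thread warps.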

On an AMD card (GCN) you would see different results, since AMD packs multiple instances into each wave (64 threads). AMD GCN1-3 (Radeon 7000 series, 200 series, 300 series) also have poor strip rendering performance. Geometry shaders output strips and strip cuts, which is bad for AMD's architecture. Polaris and Vega have improved strip rendering performance, but I would still expect instancing to beat geometry shaders, even at very low vertex counts.

There is a workaround for the instance packing inefficiency. You create a vertex buffer with N copies of the same object, for example N=4. Then you use SV_InstanceId and SV_VertexId to calculate the actual instance id, and do a custom fetch of instance data from a buffer. Use a constant buffer for instance data if your instance count is small, since Nvidia and Intel have special hardware (and special on-chip memory) for fetching and storing constants. Draw calls with a huge number of instances need to use Buffer<T>, StructuredBuffer<T> or ByteAddressBuffer for instance data.
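The index math for that workaround could look roughly like this. This is a CPU-side C++ sketch of what the vertex shader would compute, not actual shader code. It assumes the N copies are laid out back to back, so the vertex id runs from 0 to N * verticesPerObject - 1. All names here (PackedIds, decodePackedIds, hardwareInstanceCount) are hypothetical.

```cpp
#include <cstdint>

// Result of decoding the hardware-provided ids into "real" ids.
struct PackedIds {
    std::uint32_t instance;    // index into the instance-data buffer
    std::uint32_t localVertex; // vertex index within one copy of the object
    bool valid;                // false for padding copies past the last instance
};

// vertexId plays the role of SV_VertexId, hwInstanceId of SV_InstanceId.
// ASSUMPTION: verticesPerObject > 0 and the copies are stored consecutively.
inline PackedIds decodePackedIds(std::uint32_t vertexId,
                                 std::uint32_t hwInstanceId,
                                 std::uint32_t verticesPerObject,
                                 std::uint32_t copiesPerDraw,   // N
                                 std::uint32_t totalInstances)
{
    // Which of the N copies this vertex belongs to.
    const std::uint32_t copy = vertexId / verticesPerObject;
    // Each hardware instance covers N real instances.
    const std::uint32_t instance = hwInstanceId * copiesPerDraw + copy;
    PackedIds ids;
    ids.instance = instance;
    ids.localVertex = vertexId % verticesPerObject;
    // The last hardware instance may contain copies with no real instance.
    ids.valid = (copy < copiesPerDraw) && (instance < totalInstances);
    return ids;
}

// Instance count to pass to the draw call (rounded up to cover all instances).
inline std::uint32_t hardwareInstanceCount(std::uint32_t totalInstances,
                                           std::uint32_t copiesPerDraw)
{
    return (totalInstances + copiesPerDraw - 1) / copiesPerDraw;
}
```

With 8 vertices per object and N=4, each hardware instance carries 32 vertices, which matches one 32-thread warp. When the instance count is not a multiple of N, the shader has to discard the padding copies somehow, for example by emitting degenerate triangles when valid is false.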

There are also various tricks you can use to avoid instancing completely and reduce the vertex data size. Very helpful when rendering lots of instances with tiny vertex counts:
Thread: https://forum.beyond3d.com/threads/programmable-vertex-fetching-and-index-buffering.57591/
Post about emulating multidraw with index packing: https://forum.beyond3d.com/posts/1900656/
 
Thanks, sebbbi, for reading those articles and for the explanation of the different vendors and architectures. I did not know those details.

I will check the links that you mentioned.
 