The comparison is not valid. AOIT is a lossy OIT algorithm whereas a fragment sort gives you correct ordering.
True, but the visual difference is negligible with as few as 4 nodes (see the paper for more results). Even 2 nodes is usually fine, especially if you have even a rough sort (which most games do).
Being an approximation doesn't make a comparison invalid; it just means you have to compare both the image error and the performance. Hell, blending pretty much anything (particles, hair, etc.) is already a huge approximation compared to reality, so it's hard to argue on a theoretical purity level. And as we all know, game developers don't really care about ground truth anyways as long as it looks good, behaves well and is fast.
Anyways, my only point there was to emphasize how expensive sorting fragments is on GPUs. It's not a particularly SIMD-friendly algorithm, especially with linked lists. To be honest, I think fragment sorting could be made to work better in a sort-middle architecture than in an IMR, as then you could use local memory and organize the data a lot better than linked lists. It's quite unfortunate that the DX/UAV/IMR model has forced us into the global atomics/scatter solution.
Now to be fair, I'm a linked-list hater even on the CPU (where they are less bad), but I'm in good company judging from the game dev Twitter conversation the other day.
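For anyone who wants to see what the "sort fragments, then blend" baseline actually computes per pixel, here is a rough CPU-side sketch in Python. It is not GPU code and not anyone's shipping implementation: the `(depth, color, alpha)` tuple layout, the scalar color, and the function names are all assumptions made up for illustration. On a GPU the list would typically be a per-pixel linked list built with atomics, and the sort is the expensive part.

```python
def composite_in_order(fragments, background=0.0):
    """Blend front-to-back in whatever order the list is given.

    fragments: list of (depth, color, alpha) tuples (hypothetical layout,
    scalar color for brevity). Depth is ignored here on purpose.
    """
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for _depth, frag_color, alpha in fragments:
        color += transmittance * alpha * frag_color
        transmittance *= 1.0 - alpha
    return color + transmittance * background


def composite_sorted(fragments, background=0.0):
    """Exact result: sort by depth (nearest first), then blend."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return composite_in_order(ordered, background)


# Two half-transparent fragments submitted far-then-near.
pixel = [(2.0, 1.0, 0.5), (1.0, 0.0, 0.5)]
exact = composite_sorted(pixel)        # depth-correct blend
unsorted = composite_in_order(pixel)   # submission-order blend
```

With those two fragments the sorted result is 0.25 and the submission-order result is 0.5, which is the kind of ordering error the sort exists to remove.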
A better comparison would be to compare AOIT with a K-nearest fragment sort, whereby only the first K fragments are sorted and the remaining ones are composited (or blended out of order).
For a given K/storage size, AOIT already gets you a better result than a K-buffer (arguably a K-buffer is just a different replacement strategy). The key insight is that "nearest" isn't the greatest heuristic in a lot of cases if the transmittance of those fragments is very high. It's better to optimize for the error in transmittance over the curve (i.e. contribution to the final pixel) directly. Again, this is all covered in the paper from HPG 2011.
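To make the "optimize for error in transmittance" idea a bit more concrete, here is a simplified CPU-side sketch in Python. Treat it as an illustration of the general approach, not the exact algorithm from the paper: the names (`build_visibility`, `resolve`), the `(depth, color, alpha)` tuple layout, the scalar color, and the specific merge rule (fold a node into its nearer neighbor, picking the one that removes the least area under the transmittance curve) are all assumptions chosen for brevity.

```python
import bisect


def build_visibility(fragments, max_nodes):
    """Build a fixed-size step function approximating transmittance vs depth.

    Returns (depths, trans) where trans[i] is the transmittance behind
    depths[i]. Fragments may arrive in any order. Assumes max_nodes >= 1.
    """
    depths, trans = [], []
    for depth, _color, alpha in fragments:
        i = bisect.bisect_right(depths, depth)
        in_front = trans[i - 1] if i > 0 else 1.0
        depths.insert(i, depth)
        trans.insert(i, in_front)
        # The new fragment attenuates itself and everything behind it.
        for j in range(i, len(trans)):
            trans[j] *= 1.0 - alpha
        if len(depths) > max_nodes:
            # Fold node k into node k-1, choosing the k whose removal
            # changes the area under the curve the least.
            k = min(
                range(1, len(depths)),
                key=lambda n: (depths[n] - depths[n - 1]) * (trans[n - 1] - trans[n]),
            )
            trans[k - 1] = trans[k]
            del depths[k], trans[k]
    return depths, trans


def resolve(fragments, max_nodes, background=0.0):
    """Second pass: weight each fragment by the visibility in front of it."""
    depths, trans = build_visibility(fragments, max_nodes)
    color = 0.0
    for depth, frag_color, alpha in fragments:
        i = bisect.bisect_left(depths, depth)  # nodes strictly closer
        visibility = trans[i - 1] if i > 0 else 1.0
        color += visibility * alpha * frag_color
    total = trans[-1] if trans else 1.0
    return color + total * background
```

With enough nodes to hold every fragment this matches a full sort exactly. With fewer nodes the merge rule keeps the total transmittance intact and only perturbs the per-fragment weights, which is why the replacement strategy matters more than simply keeping the nearest K.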
That said, even simpler heuristics work pretty well in practice. I think Marco's upcoming paper will discuss some of that in more detail as well.