While it's true that anisotropic filtering is not correct if you filter in ln space, you can still generate your mip maps in ln space, which means that anisotropic filtering can't deviate that much from the 'correct' answer. In practice, what I've noticed in my tests is some minor overdarkening.
Interesting. What I was thinking was that when you linearly filter ln values, you're no longer doing a weighted average of the original exponentials and are instead doing a weighted geometric average, so that would mess up your edge filtering. However, since you're prefiltering, it's probably not a big deal now that I think about it.
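To make the geometric-average point concrete, here is a minimal sketch in plain Python. The function names and sample values are made up for illustration; nothing here is taken from an actual shader. It compares filtering the stored values directly against filtering their logarithms and exponentiating afterwards.

```python
import math

def filter_linear(values, weights):
    """Weighted arithmetic mean of the stored values."""
    return sum(w * v for v, w in zip(values, weights))

def filter_in_ln_space(values, weights):
    """Linearly filter ln(values), then exponentiate.

    This equals the weighted geometric mean of the values.
    """
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))

# Two hypothetical texel values with equal filter weights.
values = [1.0, 4.0]
weights = [0.5, 0.5]

print(filter_linear(values, weights))       # arithmetic mean
print(filter_in_ln_space(values, weights))  # geometric mean
```

For these inputs the arithmetic mean is 2.5 and the ln-space result is about 2.0. The geometric mean is never larger than the arithmetic mean, which would be consistent with the slight overdarkening mentioned above, assuming the filtered value is later multiplied into the visibility term.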
What is not working in ln space?
The estimated max function. I was talking about my use of the exponent, not ESM. The thing is that I didn't think of being able to use exp(O-R) to recover the depth test, and was instead thinking of linear ways to stop bleeding from the otherwise nice VSM.
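For reference, the exp(O-R) idea can be sketched roughly as follows. This assumes the usual ESM setup where exp(c*O) is stored and filtered, then multiplied by exp(-c*R) at the receiver; the sharpness constant `c = 80.0`, the function name, and the sample depths are all illustrative assumptions.

```python
import math

def esm_visibility(occluder_depths, weights, receiver_depth, c=80.0):
    """Approximate the depth test from filtered exponentials.

    occluder_depths: depths O stored in the shadow map (hypothetical samples)
    weights:         filter weights, assumed to sum to 1
    receiver_depth:  depth R of the shaded point from the light
    c:               sharpness constant (assumed value, tune per scene)
    """
    # Filtering happens on exp(c * O), which can be done ahead of time.
    filtered = sum(w * math.exp(c * o) for o, w in zip(occluder_depths, weights))
    # exp(-c * R) * E[exp(c * O)] = E[exp(c * (O - R))]
    visibility = filtered * math.exp(-c * receiver_depth)
    # Values above 1 mean the receiver is in front of the stored depths.
    return min(1.0, visibility)

weights = [0.25, 0.25, 0.25, 0.25]
print(esm_visibility([0.2, 0.2, 0.2, 0.2], weights, 0.8))  # occluded receiver
print(esm_visibility([0.8, 0.8, 0.8, 0.8], weights, 0.8))  # receiver is the stored surface
```

The first call comes out very close to 0 and the second close to 1. In between, the result is a smooth falloff rather than a hard step, which is where both the soft edges and the artifacts come from.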
Actually, shadow intersections are fine in ESM as long as your receiver lies on a plane (from the light's POV) within the filtering region.
True, but when that's not true the problem is similar, though located in a different area. One of the test cases I like for these fancy shadow techniques is two layers of objects over a ground plane. VSM shows bleeding on the ground plane, and ESM shows it on the second object.
Now I'm only using the positive moments of the depth distribution; the challenge is to also use the negative ones.
I don't see how the negative moments help you (at least from the perspective of a negative exponential). They give you information about the closest samples in your filter kernel.
The way I see it, the positive moments give you information about the weight of the furthest samples, and when they're all on the same plane as your receiver, then you can actually retrieve that weight and it equals visibility.
One thing I tried is using regular distance, a positive exponential, and a negative exponential. Then I can get the average, max, and min for an arbitrary kernel, and can get a nice gradient by linearly interpolating. The results are mostly good, but when there are three or more distinct groups of distance values in the kernel, somewhere in the scene there's an artifact. It's better than VSM, but not perfect.
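A rough sketch of how the three filtered quantities give an average, an estimated max, and an estimated min. The constant `c = 200.0`, the function name, and the sample depths are assumptions for illustration, and the interpolation step is left out since its exact form isn't spelled out here.

```python
import math

def kernel_estimates(depths, weights, c=200.0):
    """Estimate (min, average, max) depth from three linearly filtered values.

    depths:  depth samples inside the filter kernel (hypothetical)
    weights: filter weights, assumed to sum to 1
    c:       sharpness constant (assumed); larger values tighten the estimates
    """
    avg = sum(w * d for d, w in zip(depths, weights))
    pos = sum(w * math.exp(c * d) for d, w in zip(depths, weights))
    neg = sum(w * math.exp(-c * d) for d, w in zip(depths, weights))
    est_max = math.log(pos) / c    # positive exponential is dominated by the furthest samples
    est_min = -math.log(neg) / c   # negative exponential is dominated by the closest samples
    return est_min, avg, est_max

# Three distinct depth groups, equally weighted.
print(kernel_estimates([0.2, 0.5, 0.9], [1 / 3, 1 / 3, 1 / 3]))
```

With these inputs the estimates land within about 0.01 of the true min (0.2) and max (0.9). Note that the middle group at 0.5 only shows up in the average, which hints at why three or more distinct depth groups are hard to represent with just these three values.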
For ESM, I think a more interesting direction for improvement is clipping the furthest values in each texel's kernel when prefiltering. It could introduce other artifacts, but maybe they're less objectionable.