Our technical AD and I were very impressed by your visual quality and consistency. You have achieved very nice results with such a small team.
The required functionality for custom resolves has been available for a long time, but it hasn't been popular at all. This is probably due to the higher cost on PC. Three years ago, when I did lots of experimentation, every GPU I tried performed quite a bit worse with a custom resolve. It was especially bad on AMD GPUs, which have a high fixed overhead just from accessing an MSAA texture as an SRV. Nvidia has gotten better in recent years, and the cost is not so bad now. I'm not sure about AMD PC hardware, as I haven't tried anything recently. On consoles it's possible to do it rather cheaply, since you have access to low-level data that can accelerate the process. In fact, I was able to beat the hardware resolve with my compute shader when I didn't have temporal AA enabled.
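For readers unfamiliar with the term: a custom resolve just replaces the fixed-function box average with your own shader code over the subsamples. A toy sketch of why you'd want that (the tonemapped-average weighting shown here is one common trick against HDR edge fireflies, not necessarily what either engine does):

```python
# Minimal custom-resolve sketch (illustrative, not any engine's actual code):
# instead of the fixed-function box average, combine each pixel's subsamples
# in shader, here averaging in tonemapped space so one very bright HDR
# subsample can't dominate the resolved pixel.

def tonemap(c):
    return c / (1.0 + c)

def inv_tonemap(c):
    return c / (1.0 - c)

def custom_resolve(subsamples):
    """Average subsample colors in tonemapped space, then invert the map."""
    mapped = [tonemap(s) for s in subsamples]
    avg = sum(mapped) / len(mapped)
    return inv_tonemap(avg)

# A bright 'firefly' subsample no longer dominates the resolved pixel:
weighted = custom_resolve([100.0, 0.1, 0.1, 0.1])   # stays close to the dark samples
box = sum([100.0, 0.1, 0.1, 0.1]) / 4               # plain box resolve: 25.075
```

On a GPU this runs as a compute or pixel shader reading the MSAA texture as an SRV, which is exactly the access pattern that carried the high fixed overhead mentioned above.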
On Xbox 360 you couldn't access the multisampled EDRAM data without a fixed function resolve (custom resolve was not possible). I believe this was one of the (many) reasons why custom resolve based pipelines were not that popular last gen.
From what I understood from your presentation, you guys aren't exactly doing a "typical resolve", since you're not actually using MSAA to oversample anything. You're instead using it as a way to shade and output your G-Buffer at a lower frequency, and you then do an in-place upsample by interpolating the UVs and tangent frame to your subsample locations. Which is very clever, by the way!
In our current renderer, we actually use 8xMSAA and pack four (2x2) 2xMSAA pixels inside it, so we effectively have 2xMSAA per pixel. We use a custom MSAA pattern to give all four quadrants of the 8xMSAA pixel an identical 2xMSAA sampling pattern. Without custom sampling patterns this technique produces (slight) jitter at object edges.
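The packing idea can be sketched as follows. The sample offsets here are purely illustrative (not our actual hardware positions); the point is that programmable sample positions let every quadrant of the 8xMSAA "fat pixel" reproduce the same relative 2x pattern:

```python
# Hypothetical sketch: one 8xMSAA "fat pixel" covers a 2x2 block of screen
# pixels. Programmable sample positions place two samples in each quadrant
# at identical relative offsets, so every screen pixel sees the same 2xMSAA
# pattern and object edges don't jitter from pixel to pixel.

# Desired 2xMSAA pattern, relative to a quadrant's corner (each quadrant is
# a 0.5 x 0.5 region of the fat pixel). Offsets are illustrative only.
PATTERN_2X = [(0.125, 0.375), (0.375, 0.125)]

QUADRANT_ORIGINS = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]

def build_8x_positions():
    """Return the 8 sample positions inside the fat pixel ([0,1)^2)."""
    positions = []
    for qx, qy in QUADRANT_ORIGINS:
        for sx, sy in PATTERN_2X:
            positions.append((qx + sx, qy + sy))
    return positions

def relative_pattern(positions, quadrant):
    """Sample offsets of one quadrant, relative to that quadrant's corner."""
    qx, qy = QUADRANT_ORIGINS[quadrant]
    return [(x - qx, y - qy) for (x, y) in positions[quadrant * 2:quadrant * 2 + 2]]

positions = build_8x_positions()
# Every quadrant reproduces the same relative 2x pattern:
assert all(relative_pattern(positions, q) == PATTERN_2X for q in range(4))
```

Without programmable positions, the standard 8x pattern scatters samples unevenly across the four quadrants, which is exactly where the edge jitter comes from.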
We have also experimented with EQAA. Because we pack four pixels into a single 8xMSAA pixel, we actually have fewer "unknown" samples, meaning that 2xMSAA + 2xEQAA (actually 8xMSAA + 8xEQAA shared between four pixels) produces a result closer to 4xMSAA. Your AA quality is stunning (very stable). I will certainly be looking at your implementation and adapting some ideas.
I was about to post follow-up questions regarding your SDSM and EVSM implementations, but I noticed that you have already explained these very well on your blog. In Trials Evolution we also used a simplified SDSM implementation similar to yours (depth-only fitting). We also had problems with readback latency (culling on the CPU side), and we had to use several conservative approximations to hide the issues. This eventually led to the idea of GPU-driven rendering.
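The depth-only fitting idea is simple enough to sketch in a few lines. This is a toy version, not Trials' actual code; the function names and the standard practical split scheme (blending uniform and logarithmic splits) are illustrative assumptions:

```python
import math

# Toy sketch of depth-only SDSM fitting: reduce the frame's depth buffer to
# its min/max, then fit the cascade near/far planes to that observed range
# instead of to the whole camera frustum.

def fit_cascades(depth_buffer, num_cascades, lam=0.7):
    """Compute per-cascade (near, far) from the observed depth range.

    Uses the common practical split scheme: blend between uniform and
    logarithmic splits with weight `lam`.
    """
    zmin = min(depth_buffer)
    zmax = max(depth_buffer)
    splits = [zmin]
    for i in range(1, num_cascades + 1):
        t = i / num_cascades
        log_split = zmin * (zmax / zmin) ** t
        lin_split = zmin + (zmax - zmin) * t
        splits.append(lam * log_split + (1.0 - lam) * lin_split)
    return [(splits[i], splits[i + 1]) for i in range(num_cascades)]

# Visible depth samples clustered in [5, 60] tighten all cascades to that
# span, no matter how far the camera's actual far plane is:
cascades = fit_cascades([5.0, 12.0, 33.0, 60.0], num_cascades=3)
```

The readback-latency problem mentioned above comes from the `min`/`max` reduction running on the GPU: by the time the CPU sees the result and culls for the shadow passes, it is a frame or more stale, hence the conservative padding.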
We also used EVSM with 16 bit channels (only the positive moments, so two channels) and hardware MSAA in our depth maps (with the Xbox 360 fixed function hardware resolve). Did you use a custom resolve for your MSAA shadow depth maps?
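For reference, the positive-warp-only EVSM math can be sketched as below. The warp constant and epsilon are illustrative; with 16 bit float channels the exponential warp constant has to stay small (around 5-6) to avoid overflowing the fp16 range:

```python
import math

C = 5.0  # positive exponential warp constant (kept small to fit fp16 range)

def store_moments(depth):
    """What the shadow pass writes per texel: exp(c*z) and its square."""
    w = math.exp(C * depth)
    return (w, w * w)

def chebyshev_visibility(moments, receiver_depth):
    """Upper-bound visibility from a (possibly filtered) pair of moments,
    via Chebyshev's one-sided inequality."""
    m1, m2 = moments
    w = math.exp(C * receiver_depth)
    if w <= m1:
        return 1.0  # receiver is in front of the occluder distribution
    variance = max(m2 - m1 * m1, 1e-6)  # clamp to avoid divide-by-zero
    d = w - m1
    return variance / (variance + d * d)

occluder = store_moments(0.3)
vis_front = chebyshev_visibility(occluder, 0.29)  # in front: fully lit (1.0)
vis_back = chebyshev_visibility(occluder, 0.8)    # well behind: near zero
```

Because the stored values are just moments, they filter linearly, which is what makes hardware MSAA resolves (and prefiltering in general) legal on these shadow maps in the first place.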
One nice thing about virtual shadow mapping is that it compresses the depth range as tightly as possible based on the screen space receiver geometry (=pixel) depth values (in shadow space). Our culling pipeline produces min/max depth for each shadow page, allowing us to tightly pack the depth values. This makes 16 bit shadow maps much more useful. It would be a nice experiment to combine this with 16 bit EVSM. Currently we use single-tap PCF, which provides nice antialiasing for our 1:1 resolution-matched shadows, but I really miss blurry soft shadows. Prefiltering is another advantage of variance-based techniques: since we use XOR hashing of the virtual shadow map pages (to detect whether they changed), the filtering cost is only paid when a page changes.
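The depth-range compression is easy to illustrate with numbers. A rough sketch, assuming a simple fixed-point remap per page (the function names are hypothetical):

```python
# Sketch of per-page depth range compression: with min/max depth known for
# each shadow page, depths are remapped into [0,1] over that range before
# 16 bit quantization, so all 65536 steps cover only the depth span the
# page's receivers actually use.

def quantize16(depth, zmin, zmax):
    """Remap depth into the page's [zmin, zmax] range, quantize to 16 bits."""
    t = (depth - zmin) / (zmax - zmin)
    return round(t * 65535)

def dequantize16(q, zmin, zmax):
    return zmin + (q / 65535) * (zmax - zmin)

# A page whose receivers span only [10.0, 12.0] in shadow space:
zmin, zmax = 10.0, 12.0
d = 11.2345
q = quantize16(d, zmin, zmax)
err = abs(dequantize16(q, zmin, zmax) - d)
# The quantization step is (zmax - zmin) / 65535, roughly 3e-5 units here:
# far better than spending the same 16 bits on the full shadow frustum.
```

The same per-page min/max also makes the XOR-hash change detection cheap to pair with prefiltering: refilter (or requantize) only the pages whose hash changed.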