Okay, here's what I'm pretty sure is happening. Clues:
1) nVidia says they're using machine learning and super sampling.
2) Some screenshots demonstrate upscaling, e.g. 1080p source for a 1440p display.
3) Other screenshots demonstrate better detail than even same-resolution rendering with no AA.
What I suspect is going on here is that, in the example of a 1440p output resolution, they use 1080p as the minimum render resolution. However, each individual pixel in the 1080p framebuffer might actually include data for several sub-pixel samples. They could be entirely flexible about this: each pixel in the framebuffer would store up to some maximum number of samples (say, 8), and they could reduce the storage cost with compression. The point is that they never need to render a full 1440p image before producing the final 1440p output: they take the 1080p image, with extra samples packed into the pixels that need them, so you get the benefit of super-sampled 1440p exactly where it matters.
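To make that concrete, here's a toy Python sketch of what a variable-rate sample buffer could look like. The layout, the MAX_SAMPLES cap, and every name in it are mine, purely for illustration:

    import numpy as np

    # Hypothetical variable-rate sample buffer: a 1080p grid where each pixel
    # can hold anywhere from 1 to MAX_SAMPLES color samples.
    MAX_SAMPLES = 8
    HEIGHT, WIDTH = 1080, 1920

    # One slot per possible sample (RGB), plus a count of how many are in use.
    samples = np.zeros((HEIGHT, WIDTH, MAX_SAMPLES, 3), dtype=np.float32)
    sample_count = np.ones((HEIGHT, WIDTH), dtype=np.uint8)  # at least 1 per pixel

    def resolve_pixel(y, x):
        """Average whatever samples a pixel actually has: its super-sampled color."""
        n = sample_count[y, x]
        return samples[y, x, :n].mean(axis=0)

In practice you'd compress this rather than allocate the worst case up front, but the idea is the same: a 1080p buffer that can carry extra detail only where it's wanted.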
They would also output a set of numbers representing the rasterization inputs. These could be things like identifiers for the input geometry and the shaders applied (or even just the number of them), the colors of lights applied, distance from the camera, etc. They don't have to output these values for every single pixel, just a representative sample of them. There's a tradeoff between storing more data per pixel and storing data for more pixels. This set of inputs matters because it's what the learning model trains on.
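Something like this is what I imagine that per-pixel input record looking like; the specific fields are guesses based on the examples above, not anything Nvidia has documented:

    from dataclasses import dataclass

    # Hypothetical per-pixel feature record fed to the learning model.
    @dataclass
    class PixelFeatures:
        mesh_id: int          # identifier for the geometry covering this pixel
        shader_count: int     # how many shader passes touched it
        light_color: tuple    # dominant light color (r, g, b)
        depth: float          # distance from the camera

        def to_vector(self):
            """Flatten to the numeric vector the model would actually consume."""
            return [float(self.mesh_id), float(self.shader_count),
                    *self.light_color, self.depth]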
The final step is rescaling the image to 1440p. During this step, the rescaler has access to the colors of neighboring pixels, so it can estimate how much aliasing made it into the final image. A very simple aliasing score would be the color contrast between a pixel and its neighbors. But they might do something a little different to make sure that more detail means a higher score.
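As a toy version of that contrast idea, something like this would do; the 4-neighbor max-contrast choice is mine, not whatever scoring they actually use:

    import numpy as np

    def aliasing_score(image):
        """Crude per-pixel aliasing score: the largest color difference against
        any of the 4 neighbors. `image` is (H, W, 3) floats in [0, 1]. Note that
        np.roll wraps at the borders, which a real implementation would handle."""
        h, w, _ = image.shape
        score = np.zeros((h, w), dtype=np.float32)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            shifted = np.roll(image, (dy, dx), axis=(0, 1))
            diff = np.abs(image - shifted).max(axis=2)  # worst channel difference
            score = np.maximum(score, diff)
        return score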
The two data outputs from this process, the set of rasterization inputs and the per-pixel aliasing score, are combined each frame to update the learning model. The model then takes the inputs for a pixel and estimates how many sub-samples it should get. This calculation is probably the biggest limitation on how many inputs they can actually use: the math is basically a matrix multiplication, which these cards are very good at, but with too many inputs it starts to overwhelm the rest of the rasterization work.
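As a stand-in for that, the model could be as dumb as a single linear layer; this is my own toy formulation, just to show that the per-pixel cost really is one small matrix multiply plus a cheap update:

    import numpy as np

    NUM_FEATURES = 6  # matches the toy feature vector above
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.01, size=NUM_FEATURES)

    def predict_samples(features, max_samples=8):
        """features: (N, NUM_FEATURES). Returns an integer sample budget per pixel."""
        raw = features @ weights  # the matrix multiply these cards are good at
        return np.clip(np.round(raw), 1, max_samples).astype(int)

    def update(features, aliasing_scores, lr=1e-3):
        """Nudge the weights so badly aliased pixels get more samples next frame.
        Targets map a score in [0, 1] to a budget in [1, 8]."""
        global weights
        target = 1 + aliasing_scores * 7
        error = features @ weights - target
        weights -= lr * (features.T @ error) / len(features)

The real thing is presumably a deeper network, but the shape of the tradeoff is the same: every extra input feature adds a column to that multiply, for every pixel, every frame.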
Finally, why 1080p? Why not have the minimum resolution be 720p? Or keep it at 1440p for quality?
Performance is surely part of the answer. But I think the bigger answer is simply that learning models always have problems with tail effects: they make ridiculous errors, and it seems to be pretty much impossible to avoid them entirely. Performance suggests the minimum should be a lower resolution. The tail-error issue suggests it should not be 1440p, because some areas of the scene would end up with no anti-aliasing at all. And making the minimum resolution too low has the same issue, only worse. So going down by a half-resolution step to 1080p is perfect: performance should be good, and you get a little bit of automatic anti-aliasing no matter how badly the ML algorithm fucks up.
One last thing: the nature of this kind of algorithm is such that it would probably benefit greatly from pre-baked learning models for each game. Which might explain why game support is important.