That's the opposite of what is happening. None of the temporal upscaling algorithms apply any kind of AA before upscaling (except for some very basic morphological AA in the case of TSR). The idea behind all modern temporal upscalers is to first upsample the low-res input image to the higher output resolution, then accumulate detail frame by frame at that resolution by warping the previous frame (essentially resampling it via MVecs), using low-res motion vectors for moving objects (for static geometry they are typically computed on the fly from the camera projection to save bandwidth). Every resampling stage adds blur, since resampling is a weighted average of several neighboring pixels, and any such averaging softens the image. There are techniques to minimize that, though.
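As a rough illustration, here is a minimal C++ sketch of that accumulation loop for a single pixel. Everything in it (Color, sampleBilinear, accumulate, the alpha value) is invented for illustration and doesn't come from any particular upscaler:

```cpp
#include <algorithm>
#include <vector>

struct Color { float r = 0, g = 0, b = 0; };

Color lerp(const Color& a, const Color& b, float t) {
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

struct Image {
    int width = 0, height = 0;
    std::vector<Color> pixels; // row-major, width * height
    Color at(int x, int y) const { return pixels[y * width + x]; }
};

// Bilinear resample: a weighted average of the 4 nearest pixels.
// This is the warp step -- and the reason every reprojection adds a bit
// of blur: averaging neighboring pixels always softens the signal.
Color sampleBilinear(const Image& img, float x, float y) {
    int x0 = std::clamp((int)x, 0, img.width - 1);
    int y0 = std::clamp((int)y, 0, img.height - 1);
    int x1 = std::min(x0 + 1, img.width - 1);
    int y1 = std::min(y0 + 1, img.height - 1);
    float fx = std::clamp(x - x0, 0.0f, 1.0f);
    float fy = std::clamp(y - y0, 0.0f, 1.0f);
    Color top = lerp(img.at(x0, y0), img.at(x1, y0), fx);
    Color bot = lerp(img.at(x0, y1), img.at(x1, y1), fx);
    return lerp(top, bot, fy);
}

// One output pixel of the accumulation pass: warp the previous output
// along the motion vector, then fold the current upsampled sample into
// it with a small alpha, so detail builds up over many frames at
// output resolution.
Color accumulate(const Image& history, const Image& currentUpsampled,
                 int x, int y, float mvecX, float mvecY,
                 float alpha = 0.1f) {
    Color warped  = sampleBilinear(history, x - mvecX, y - mvecY);
    Color current = currentUpsampled.at(x, y);
    return lerp(warped, current, alpha);
}
```

The small alpha is what makes the scheme converge: each frame contributes only ~10% of the output, so after a dozen frames of accumulation the effective sample count per output pixel is far higher than the low-res input would suggest.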
The issue with TAA lies not just in resampling but also in the pixel rectification step, which typically uses neighborhood color clamping. For every pixel, a color bounding box is built from its 3x3 neighborhood in the current frame, and the corresponding sample from the warped previous frame (the history buffer) is tested against it. If the history color differs too much, i.e. falls outside the current frame's color bounding box, it is either discarded or clamped to that box.
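In code, the rectification step looks roughly like this (a sketch reusing the Color struct from the snippet above; real kernels are fancier, e.g. variance clipping in YCoCg space instead of a raw RGB min/max box):

```cpp
#include <algorithm> // std::min / std::max / std::clamp (C++17)

// Neighborhood color clamping for one pixel. `neighborhood` is the 3x3
// block of the *current* frame around the pixel; `history` is the warped
// previous-frame sample for the same pixel.
Color rectifyHistory(const Color neighborhood[9], const Color& history) {
    // Build the current frame's color bounding box over the 3x3 window.
    Color lo = neighborhood[0], hi = neighborhood[0];
    for (int i = 1; i < 9; ++i) {
        lo.r = std::min(lo.r, neighborhood[i].r);
        lo.g = std::min(lo.g, neighborhood[i].g);
        lo.b = std::min(lo.b, neighborhood[i].b);
        hi.r = std::max(hi.r, neighborhood[i].r);
        hi.g = std::max(hi.g, neighborhood[i].g);
        hi.b = std::max(hi.b, neighborhood[i].b);
    }
    // Any history color outside the box is assumed stale and clamped to it.
    // A genuinely disoccluded pixel gets fixed this way -- but so does a
    // thin subpixel feature that's simply missing from this frame's window.
    return { std::clamp(history.r, lo.r, hi.r),
             std::clamp(history.g, lo.g, hi.g),
             std::clamp(history.b, lo.b, hi.b) };
}
```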
The problem with this approach is that the heuristic can't tell a genuine disocclusion from small subpixel geometry or single-pixel detail that happens to be missing from the current frame's neighborhood, so it throws both away. This is why TAA never converges for subpixel details and introduces a significant amount of blur: it is inherently lossy.
Modern DL-based temporal upscalers ditch color clamping altogether and instead use an autoencoder CNN to decide where to blend history and where not to (that's also why they can often get away with 8-bit precision). Such a network is far smarter than a simple heuristic like color clamping and can be trained on a wide variety of cases, so it can effectively analyze what is happening in the frame, blending where it's safe and rejecting history where it isn't.
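Conceptually, the learned resolve replaces the clamp with a per-pixel blend decision, something along these lines (the network call is a stub here, not any real API; an actual upscaler evaluates a trained autoencoder on the GPU):

```cpp
// Sketch of the DL-style resolve: no color box, no clamp. A trained network
// looks at the current sample, the warped history, and auxiliary inputs
// (depth, motion, etc.) and outputs how much history to keep per pixel.
// `predictBlendFactor` is a placeholder stand-in for that network.
float predictBlendFactor(const Color& current, const Color& warpedHistory) {
    (void)current; (void)warpedHistory;
    // Stand-in for the CNN's learned output: values near 0 keep the
    // accumulated history (converged static detail), values near 1
    // discard it (disocclusion, ghosting risk).
    return 0.1f;
}

Color resolveLearned(const Color& current, const Color& warpedHistory) {
    float a = predictBlendFactor(current, warpedHistory);
    return lerp(warpedHistory, current, a);
}
```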
FSR 2.0 does not eliminate color clamping; instead it tries to restore subpixel details by applying a pixel-locking heuristic on top (with varying degrees of success). As the name suggests, pixel locking locks thin features in place and then releases the locks via several other heuristics (depth discontinuities between frames, the reactive mask, etc.). Unfortunately for FSR 2.0, these lock-removal heuristics are quite fragile, so you often see high-frequency ghosting on the interior surfaces of highly reflective objects such as water, especially when those objects lack motion vectors. That's why water often looks messy with FSR 2.0, and the same goes for fire and other particle effects. As a result, FSR 2.0 typically runs into more problems than the DL methods.
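To make the lock idea concrete, here is a heavily simplified, speculative sketch of a per-pixel lock lifecycle in the spirit of FSR 2.0. The inputs and thresholds are invented, and the actual FidelityFX code is considerably more involved:

```cpp
#include <algorithm>

// A lock shields a thin feature from rectification while it lasts.
// (Real implementations track more state, e.g. the luminance captured
// when the lock was created; omitted here.)
struct PixelLock {
    float lifetime = 0.0f; // > 0 means this pixel's detail is protected
};

void updateLock(PixelLock& lock,
                bool thinFeatureDetected, // e.g. isolated bright pixel in the 3x3 window
                float depthDelta,         // depth discontinuity vs. the previous frame
                float reactiveMask)       // app-provided mask for particles, water, etc.
{
    if (thinFeatureDetected)
        lock.lifetime = 1.0f; // (re)create the lock on a detected thin feature

    // Lock-removal heuristics: a strong signal invalidates the locked detail.
    // When these misfire (no MVecs on water/fire, missing reactive mask), the
    // lock outlives the feature and the locked pixels ghost across the surface.
    if (depthDelta > 0.01f || reactiveMask > 0.5f)
        lock.lifetime = 0.0f;
    else
        lock.lifetime = std::max(0.0f, lock.lifetime - 0.1f); // gradual decay
}
```

The fragility described above falls straight out of this structure: every lock-release path depends on an auxiliary signal (depth, MVecs, the reactive mask), so any surface where those signals are missing or wrong keeps its locks too long and ghosts.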