What if nvidia continues training at their servers and releases updates once in a while?
If my supposition is correct, the issue isn't so much training. The thing is, actually running the learning model on each pixel has to be fast. And I'm just skeptical that a fully pre-trained system can get there.
Here's the basic idea. Let's say they have access to 16 input numbers when computing the DLSS result per-pixel, along with a simple learning model. With 16 inputs, the learning algorithm will try to find some number of correlations between those inputs and the final result, and each correlation factor increases the size of the model. For instance, if they just took the bare 2-point correlations (which have a different value for each pair of the 16 inputs, including the self-correlations), they've got N(N+1)/2 = 16*17/2 = 136 independent factors. The model might even take 3-point correlations into account (of which there are many, many more). But typically they'll discard factors whose values are too small to be meaningful and only keep the top few (say, 50 or so). In that case, computing the final DLSS result takes something like 50 scalar multiply-add operations per pixel.
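To make the bookkeeping concrete, here's a toy sketch of that kind of pruned correlation model. This is purely my own illustration with made-up weights, not anything Nvidia has published:

```python
import numpy as np

# Toy per-pixel model: 16 assumed inputs, all 2-point correlation terms,
# pruned down to the top 50 factors. Illustrative only.

N = 16                                   # assumed number of per-pixel inputs
rng = np.random.default_rng(0)

# All pairwise products, including self-products: N*(N+1)/2 = 136 terms.
pairs = [(i, j) for i in range(N) for j in range(i, N)]
assert len(pairs) == N * (N + 1) // 2    # 136

# Stand-in weights; a trained model would have many near-zero entries here.
weights = rng.normal(size=len(pairs))

# Keep only the 50 largest-magnitude factors.
keep = np.argsort(np.abs(weights))[-50:]
kept = [(weights[k], pairs[k]) for k in keep]

def dlss_pixel(x):
    """Roughly 50 scalar multiply-adds per pixel after pruning."""
    return sum(w * x[i] * x[j] for w, (i, j) in kept)

print(dlss_pixel(rng.normal(size=N)))
```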
And, well, that's probably not enough to get a good result. You could probably train a model like the above to work really well on a narrow set of scene types, but there are a huge number of scene types out there. This is where deep learning comes in.
With deep learning, instead of just a single flat model, you split the network into multiple levels. At the lowest level you have a set of nodes that compute the final numbers. The next level up decides which of those nodes should count for more than the others. Conceptually, the low-level nodes compute candidate results, while the high-level nodes determine which set of low-level nodes to use. So it's kind of like the low-level nodes represent a class of models, while the high-level nodes pick which model to use for the scene at hand.
More low-level nodes buy you accuracy on particular kinds of scenes, while more high-level nodes buy you flexibility in the number of different sorts of scenes you can render well.
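Here's a toy sketch of that two-level split as I'm picturing it (again my own illustration, with made-up sizes and weights, nothing like the real DLSS network):

```python
import numpy as np

# Low-level nodes each produce a candidate result; high-level "gating"
# nodes decide how much weight each candidate gets for the current inputs.

rng = np.random.default_rng(1)
N_IN, N_LOW = 16, 8                            # assumed sizes

W_low = rng.normal(size=(N_LOW, N_IN))         # low-level nodes: candidate results
W_high = rng.normal(size=(N_LOW, N_IN))        # high-level nodes: which candidates matter

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_level_pixel(x):
    candidates = W_low @ x                     # each low-level node's answer
    gate = softmax(W_high @ x)                 # high-level nodes weight them
    return gate @ candidates                   # blended final result

print(two_level_pixel(rng.normal(size=N_IN)))
```

In that picture, adding rows to W_low is the "accuracy on certain scenes" knob, while the gating capacity is the "flexibility across scene types" knob.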
The reason on-the-fly training might be useful is that a pre-baked model has to cope with every single game situation that ever gets thrown at the video card. The huge number of potential configurations, games, and situations within those games could easily make the size of an optimal model for the above explode. You could gain a little mileage by asking the game dev to tell you when a particular level has been loaded, so that nVidia just trains a different learning model for each level and the card switches models as you progress through the game.
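That per-level workaround would amount to something like this (hypothetical names; in reality the driver would be swapping weight blobs, not Python strings):

```python
# Hypothetical per-level model table: the game signals which level is
# loaded and the driver swaps in the matching pre-trained weights.
PER_LEVEL_MODELS = {
    "level_01": "dlss_level_01.weights",
    "level_02": "dlss_level_02.weights",
}

current_model = "dlss_generic.weights"

def on_level_load(level_id):
    """Called by the game when a level finishes loading."""
    global current_model
    current_model = PER_LEVEL_MODELS.get(level_id, "dlss_generic.weights")
```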
But it might be better to split the work: use the datacenter to do the lion's share of the training, and let the video card take care of the rest by re-training some of the high-level nodes as it renders scenes. You'd have to account for this re-training when doing the datacenter training, but it should increase the flexibility of the system as a whole, because the shipped model doesn't really need to retain all of the information about all of the different types of models: it just kicks things off and lets the video card figure the rest out as it goes.
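Continuing the toy example from above, that split might look like this: the low-level weights get baked in at the datacenter and frozen, while the card keeps taking small gradient steps on the high-level gating weights as frames come in. Again this is a made-up illustration, with an assumed learning rate and a squared-error loss against some higher-quality reference:

```python
import numpy as np

rng = np.random.default_rng(2)
N_IN, N_LOW, LR = 16, 8, 1e-3                  # assumed sizes and learning rate

W_low = rng.normal(size=(N_LOW, N_IN))         # frozen: trained in the datacenter
W_high = rng.normal(size=(N_LOW, N_IN))        # fine-tuned on the card

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def on_card_update(x, target):
    """One squared-error gradient step on the high-level weights only."""
    global W_high
    candidates = W_low @ x
    gate = softmax(W_high @ x)
    pred = gate @ candidates
    err = pred - target
    # Softmax chain rule for the gating logits; W_low is left untouched.
    grad_logits = err * gate * (candidates - pred)
    W_high -= LR * np.outer(grad_logits, x)
    return 0.5 * err ** 2

# e.g. one update from a rendered pixel and its higher-quality reference value:
print(on_card_update(rng.normal(size=N_IN), target=0.7))
```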
Basically, with on-the-fly training you could get away with a far, far smaller learning model, which means higher performance.
Of course, the above argument might not actually apply. It's conceivable I misunderstood what Tom's Hardware measured, with it taking a little while for the model to "kick in". Perhaps different scenes in different games really are similar enough that a single model with no more than a couple hundred parameters can do the trick. But it seems unlikely.