You're right that a true fork would cost a lot of state: roughly N times as much as without the fork, where N is the number of samples. But it could be seen more as a "virtual" fork. The state at the fork position is stored, and the SSAA part is then run completely to the end, one subpixel at a time, starting from that stored state each time.
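To make the "virtual" fork a bit more concrete, here's a rough sketch in Python. It only models the control flow, not any real hardware or shader API; `run_with_virtual_fork`, `shared_stage` and `per_sample_stage` are names I made up for the illustration, and the offsets are arbitrary.

```python
import copy


def run_with_virtual_fork(shared_stage, per_sample_stage, sample_offsets):
    """Run the shared part once, then the per-sample part once per subpixel."""
    # The part before the fork runs once per pixel, as in MSAA.
    state_at_fork = shared_stage()
    results = []
    for offset in sample_offsets:
        # Each subpixel restarts from a copy of the stored state, so only one
        # extra copy of the state is live at any time instead of N copies.
        state = copy.deepcopy(state_at_fork)
        results.append(per_sample_stage(state, offset))
    return results


# Hypothetical stages, for illustration only.
def shared_stage():
    return {"base_color": 0.5}


def per_sample_stage(state, offset):
    state["base_color"] += offset  # the mutation stays local to this subpixel
    return state["base_color"]


colors = run_with_virtual_fork(
    shared_stage, per_sample_stage, [0.0, 0.125, 0.25, 0.375]
)
```

The trade is time for storage: the per-sample part runs N times in sequence, but the stored state is only kept once.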
But that's of course just a subset of your idea.
You don't even need dynamic loops to do it your way. The number of subsamples should be fixed at render time, so the per-sample code can simply be repeated once per sample. It could be accessed through one new PS instruction: SET_SAMPLE Rn; where Rn holds an integer that selects which sample is currently accessed. Let it automatically add offsets to iterators, and change which framebuffer sample is read/written. If Rn<0 then switch back to MSAA mode (you'd need to do the downfiltering yourself).
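Here's a toy model of what I mean by SET_SAMPLE, written in Python just to pin down the behaviour. Everything in it is an assumption for illustration: the `PixelContext` class, its method names, and the choice that a write in MSAA mode goes to every sample (coverage masks are ignored). No real hardware or API works exactly like this.

```python
class PixelContext:
    """Toy model of one pixel with a fixed number of subsamples."""

    def __init__(self, center_uv, sample_offsets):
        self.center_uv = center_uv
        self.sample_offsets = sample_offsets  # fixed at render time
        self.framebuffer = [0.0] * len(sample_offsets)
        self.current = -1  # negative means MSAA mode

    def set_sample(self, rn):
        """Model of SET_SAMPLE Rn: select a subsample, or MSAA mode if rn < 0."""
        if rn >= len(self.sample_offsets):
            raise ValueError("sample index out of range")
        self.current = rn if rn >= 0 else -1

    def iterator_uv(self):
        """Read an iterator; the sample offset is added automatically."""
        if self.current < 0:
            return self.center_uv
        du, dv = self.sample_offsets[self.current]
        return (self.center_uv[0] + du, self.center_uv[1] + dv)

    def write(self, value):
        """Write to the framebuffer sample(s) selected by the current mode."""
        if self.current < 0:
            # Assumption: in MSAA mode one value goes to every sample.
            self.framebuffer = [value] * len(self.framebuffer)
        else:
            self.framebuffer[self.current] = value


ctx = PixelContext((0.5, 0.5), [(-0.25, -0.25), (0.25, 0.25)])
ctx.set_sample(1)
uv_sample_1 = ctx.iterator_uv()  # center plus the offset of sample 1
ctx.write(1.0)                   # only sample 1 is written
```

Since the sample count is fixed, the shader would just contain SET_SAMPLE 0, SET_SAMPLE 1, and so on, each followed by the same per-sample code, with no loop instruction needed.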
Actually changing the sample pattern within the PS is unlikely to be easy (at least for geometry sampling), so that's probably best left constant. But even with the method above, you could do it by loading an iterator to a temp reg before you do SET_SAMPLE, and then adding your own constants as shifts to change the texture sample positions. And in fact, if you just want to supersample textures (or texture + some nonlinear filtering), then blend the result to one value and use that for MSAA, it's possible today.
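The "possible today" case is really just several texture lookups at shifted coordinates, averaged into one color. A minimal sketch in Python, assuming a plain box filter for the blend; `supersample_texture`, `hard_edge` and the offsets are made up for the example, and a real shader would do the same with constants and a handful of texture instructions.

```python
def supersample_texture(sample_texture, uv, offsets):
    """Average several texture lookups taken at shifted coordinates."""
    total = 0.0
    for du, dv in offsets:
        # The shift constants play the role of the per-sample offsets.
        total += sample_texture(uv[0] + du, uv[1] + dv)
    # Box-filter blend down to the single value that MSAA then uses.
    return total / len(offsets)


# Hypothetical procedural "texture" with a hard vertical edge at u = 0.5.
def hard_edge(u, v):
    return 1.0 if u >= 0.5 else 0.0


offsets = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
blended = supersample_texture(hard_edge, (0.5, 0.5), offsets)
# Two lookups land on each side of the edge, so blended == 0.5.
```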
I'm not sure what you mean by changing sample size; it sounds like something that is very hard to do.
The number of samples is also something that I think should be constant (not surprising, given that I think the sample pattern should be constant). But again, if you mean a number of samples that will be blended to one MSAA color inside the PS, then yes, that should be possible.
But now we need this to be used. As far as I know, no games are explicitly coded for FSAA yet, other than possibly having an in-game FSAA level slider. That is far from what we talked about above. So it must be possible for the driver to change a shader to add the "supersampled" fog and frame buffer blending at the end.
As long as the hardware supports shaders that are a bit longer, and has a few more temp registers than reported, it should be possible.
But how about the performance?
Well, that is of course a problem. The cost of doing fog and FB-blend in the PS in SS style, when all else is MS, could be quite a big part of the PS instruction budget for short shaders. So the instructions that do that part had better be efficient.
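To put rough numbers on that, here's a back-of-the-envelope calculation in Python. The instruction counts are invented purely for illustration (8 or 64 instructions for the rest of the shader, 2 instructions per sample for fog plus blend, 4 samples); they are not measurements of any real hardware.

```python
def per_sample_share(base_instructions, per_sample_instructions, num_samples):
    """Fraction of all executed instructions spent on the per-sample tail."""
    extra = per_sample_instructions * num_samples
    return extra / (base_instructions + extra)


# Made-up example: short shader, 8 shared instructions, 2 per sample, 4 samples.
short_share = per_sample_share(8, 2, 4)   # 8 / 16 = 0.5

# Same tail on a longer, also made-up, 64-instruction shader.
long_share = per_sample_share(64, 2, 4)   # 8 / 72, about 0.11
```

With these assumed numbers the per-sample tail is half the work for the short shader but only about a ninth for the long one, which is why it matters most for short shaders.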