GT212 isn't coming anytime soon and they need something to counter RV770X2 with.
On the other hand, 384 SPs and 96 TMUs (presuming they're going with 32 SPs/8 TMUs per TPC for GT21x) on 40nm is nothing to shout about. Such a chip should end up in the same league as G92 in terms of die size.
Purely theoretically, and only in terms of ALU throughput, with a decent ALU frequency it could end up close to or slightly over 2 TFLOPs. You mean G92@65nm? If so, nothing to disagree with (and all of the above is of course 100% speculative).
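For anyone who wants to check the 2 TFLOPs figure, here's a rough back-of-the-envelope sketch. It assumes the usual G80/GT200-style marketing count of 3 FLOPs per SP per clock (MADD + MUL), which may or may not hold for a GT21x part, and the function names are mine, purely for illustration. The 1750 MHz clock is a hypothetical value, not a leaked spec.

```python
# Back-of-the-envelope peak ALU throughput for a speculative 384 SP part.
# ASSUMPTION: 3 FLOPs per SP per clock (MADD + MUL), as NVIDIA counts it
# for G80/GT200. Function names and clocks are illustrative only.

def peak_gflops(shader_units: int, shader_clock_mhz: float,
                flops_per_clock: int = 3) -> float:
    """Theoretical peak in GFLOPs: units * FLOPs/clock * clock (MHz) / 1000."""
    return shader_units * flops_per_clock * shader_clock_mhz / 1000.0


def clock_for_target_mhz(shader_units: int, target_gflops: float,
                         flops_per_clock: int = 3) -> float:
    """Shader clock (MHz) needed to reach a target peak throughput."""
    return target_gflops * 1000.0 / (shader_units * flops_per_clock)


if __name__ == "__main__":
    # Sanity check against a shipping part: GTX 280, 240 SPs at 1296 MHz.
    print(f"GTX 280: {peak_gflops(240, 1296):.0f} GFLOPs")

    # Speculative 384 SP chip at a hypothetical 1750 MHz shader clock.
    print(f"384 SPs @ 1750 MHz: {peak_gflops(384, 1750):.0f} GFLOPs")

    # Shader clock required to hit exactly 2 TFLOPs with 384 SPs.
    print(f"Clock for 2 TFLOPs: {clock_for_target_mhz(384, 2000):.0f} MHz")
```

Under that assumption, 384 SPs would need a shader clock of roughly 1.74 GHz to reach 2 TFLOPs, which is why the figure only works "with a decent ALU frequency".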
I wouldn't count on any changes to the basic building blocks until DX11 GT300 or whatever they'll call it.
But since they'll still need to change the ROPs to support GDDR5, there is a possibility that they'll 'fix' 8x MSAA performance in their GDDR5 cards...
If the problem lies in the triangle setup (it's a rumour that's circulating; as a layman I've no idea if it even makes any sense), then the answer is probably no. Pardon me, but what's the big deal about 8xMSAA anyway? Personally, if I had an SLI or CrossFire system I'd still go for the highest possible resolution, and I seriously doubt that in the majority of cases you'd end up with playable performance with more than 4x MSAA samples. And no, before anyone else says it, there's really no good excuse either for NV not to have 8x sparse MSAA performing as fast as AMD's (since it's my understanding that 8xMSAA should require only two cycles), but what hardware engineers always say (and they're essentially right) is that you can never get everyone satisfied.
Personally, give me a combination of coverage sampling with a fast-performing edge-detect custom filter AA in the future (always on top of at least 4xMSAA) and I'll be a much happier user than with ordinary 8xMSAA. Box filters won't cut it for very long, and that is of course, as always, IMHLO.