Where are we sitting right now in terms of where we think this is going?
Judging from the latest rumors:
I'm thinking a roughly 50% upgrade from G92 with a 512-bit bus: 192sp with a 512-bit bus. On 65nm nothing else makes sense; there aren't a lot of plausible scenarios given that G92 on the same process is already so big. This idea has been floating around chiphell since the inception of those rumors and it does seem the most likely.
The 384sp figure could mean a couple of things:
1. 192sp x 2 ops per MADD = 384. I see this as HIGHLY probable. Perhaps the MUL is excluded, as it's not used for general shading, but rather for GPGPU. Who knows, maybe the MUL will get a new name, like the "on-die physX engine" or something else marktastic™. It's very possible many new enhancements (through CUDA) will use the MUL (other than just SF), so I don't think it's crazy to look at it this way, as nvidia starting to separate the two. I.e. 384SP, but computational power of >1TF because the MUL is used for CUDA and presumably PhysX.
2. 2 x 192sp = 384sp. Perhaps an SLI or 45nm GX2 product. You know that with a die this big they will shrink it down, as they did with G80, and repackage it along with a GX2 version. With R600/RV670 and G80/G92 as testaments, I believe nvidia will continue the Intel mack-daddy tick/tock R&D.
TANGENT =
On the other hand, I think AMD will pursue their route through power consumption, die size, and a balanced arch that can scale through multiple chips, with the optimal goal of succeeding with one chip that could span all markets, using 1/2/3?/4 cores. The main objective here would be to keep power consumption suited to their respective markets. 1 die card = <75W, 2 die <150W, 4 die <300W, or as power connectors go: board power only, board power plus one PCI-E 6-pin, and board power plus one 6-pin and one 8-pin. You can already see this pattern emerging with RV670 and its 105W/190W designs, and further perfected with a two-die system of RV770 supposedly using 135/250W, i.e. the dual card again landing roughly 20W under twice the single-die figure (same as the RV670 designs). I wouldn't be surprised to see R800 use a finished four core model shooting for 70W/120W/190W?/260W. A 256-bit mem controller is also a great place to start, as by using different memory it can scale from low (GDDR2) to high (GDDR5), with acceptable ram buffers for all (256MB, 512MB, 1GB, 2GB?) to allow for an appropriate buffer in a multi-gpu situation.
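To put numbers on the connector tiers, here's a quick Python sketch. The 75W slot, 75W 6-pin and 150W 8-pin limits are the standard PCI-E figures; mapping them onto 1/2/4 die cards is just my speculation from above, and the names are made up for illustration.

```python
# PCI-E power limits in watts (slot and auxiliary connectors).
PCIE_SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150

# Hypothetical mapping of die count to connector setup (speculation, not a spec).
CONNECTORS_BY_DIE_COUNT = {
    1: [PCIE_SLOT_W],                          # board power only
    2: [PCIE_SLOT_W, SIX_PIN_W],               # slot + one 6-pin
    4: [PCIE_SLOT_W, SIX_PIN_W, EIGHT_PIN_W],  # slot + 6-pin + 8-pin
}

def power_ceiling_w(die_count):
    """Maximum board power available for a given die count under this mapping."""
    return sum(CONNECTORS_BY_DIE_COUNT[die_count])

for dies in (1, 2, 4):
    print(f"{dies} die: up to {power_ceiling_w(dies)}W")
```

That gives ceilings of 75W, 150W and 300W, which is where the <75W / <150W / <300W targets come from.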
/TANGENT
At any rate...
If you figure G92's 334mm2 x 1.5, the very dirty and non-mathematically-proportionate way, you get 501mm2, or roughly what we've been hearing to expect (a little bit bigger than G80?). We hear greater than R600 in die size, and currently perhaps 310W power usage (from SSX in Taiwan; I don't know if he's reliable, but I seem to recall that name). It just seems to line up after deducting some die space for things that would not need to be duplicated, and allowing for different units using different amounts of transistors/space.
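For anyone who wants to play with the numbers, the dirty estimate looks like this in Python. The 334mm2 figure is the one used in this post, and the assumption that area scales linearly with shader count is exactly the "very dirty" part, so treat it as a rough sketch only.

```python
# Naive die-size estimate: scale G92's area by the shader-count ratio.
# Assumption: area scales linearly with shader count. It doesn't really,
# since memory controllers, ROPs and I/O scale differently.
G92_AREA_MM2 = 334     # figure used in this post
G92_SHADERS = 128
RUMORED_SHADERS = 192  # the rumored 50% bump

def naive_die_area(base_area_mm2, base_shaders, new_shaders):
    """Scale die area in proportion to shader count."""
    return base_area_mm2 * new_shaders / base_shaders

estimate = naive_die_area(G92_AREA_MM2, G92_SHADERS, RUMORED_SHADERS)
print(f"Estimated die size: {estimate:.0f}mm2")
```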
Also, 192 x 3 = 576. To reach 1TF, it would require a shader speed of just under 1750MHz. To me this sounds very realistic, although I could see how it may require some juice on an already large die. G80 was a huge die of most likely comparable size, and it wasn't incredibly power hungry, so there could be room there to play with. Either way, it's more probable than seeing a die this big running 2000MHz shaders, as we see in the original rumor.
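Here's the shader clock math spelled out as a small Python sketch. It assumes 3 flops per shader per clock (MADD + MUL) as above, takes 1TF as 1000 GFLOPS, and the function names are just mine.

```python
def required_clock_mhz(shaders, flops_per_clock, target_gflops):
    """Shader clock (MHz) needed to hit a target GFLOPS figure."""
    return target_gflops * 1000 / (shaders * flops_per_clock)

def gflops_at(shaders, flops_per_clock, clock_mhz):
    """Theoretical GFLOPS for a given shader count and clock."""
    return shaders * flops_per_clock * clock_mhz / 1000

# 192sp counting MADD + MUL (3 flops per clock)
print(f"1TF with MADD+MUL: {required_clock_mhz(192, 3, 1000):.0f}MHz")
# 192sp counting MADD only (2 flops per clock)
print(f"1TF with MADD only: {required_clock_mhz(192, 2, 1000):.0f}MHz")
# What the rumored 2000MHz shader clock would give with MADD + MUL
print(f"At 2000MHz: {gflops_at(192, 3, 2000):.0f} GFLOPS")
```

With the MUL counted you need roughly 1736MHz for 1TF; with MADD only you would need over 2600MHz, which is why I think the MUL has to be part of any >1TF claim.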
Regardless, I think the main point we recognize is that the current nvidia architecture is severely bandwidth limited; even the 8800GTX could have used more bandwidth. Even with an increase of only 50% in the shaders, a somewhat massive increase in performance could come not only from raw units, but from better overall efficiency using the massive bandwidth of the 512-bit bus. A 512-bit bus (allowing for a 1GB buffer) should help with multi-gpu scenarios as well.
It's also worth noting that a design like this would bode well for a "GTS" part. Using the same formula as the 8800GTS (rev1), it would contain 144 shaders and a 384-bit bus with 768MB of memory...something that looks eerily similar to a part we've seen before, using roughly the same amount of die space. The difference is that, unlike its brother, it would have increased bandwidth because of the use of faster ram (at least 2400MHz GDDR3/4, compared to G80's 2000MHz).
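A rough Python sketch of that hypothetical GTS part, assuming a flat 75% cut of everything (which is what the 144/384/768 figures work out to) and using the effective memory clocks quoted above. None of this is confirmed; the names are only for illustration.

```python
def cut_down(value, numerator=3, denominator=4):
    """Apply a flat 75% cut (assumption: same ratio for shaders, bus, memory)."""
    return value * numerator // denominator

def bandwidth_gbs(bus_bits, effective_mhz):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_bits / 8 * effective_mhz / 1000

# Rumored full part: 192sp, 512-bit bus, 1GB (1024MB)
gts_shaders = cut_down(192)
gts_bus_bits = cut_down(512)
gts_memory_mb = cut_down(1024)
print(f"GTS: {gts_shaders}sp, {gts_bus_bits}-bit, {gts_memory_mb}MB")

# Same 384-bit bus width, different effective memory clocks
print(f"384-bit @ 2000MHz: {bandwidth_gbs(384, 2000):.1f} GB/s")
print(f"384-bit @ 2400MHz: {bandwidth_gbs(384, 2400):.1f} GB/s")
print(f"512-bit @ 2400MHz: {bandwidth_gbs(512, 2400):.1f} GB/s")
```

So on the same 384-bit bus the faster ram alone is worth about 20% more bandwidth (roughly 115 vs 96 GB/s), and the full 512-bit part would sit around 154 GB/s at the same memory speed.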