Hmm, Charlie seems convinced that NVidia is building something very similar to Larrabee, with little fixed-function hardware. Does that seem likely?
Maybe it could happen, though like Charlie I would question whether it would be wise to try to out-Larrabee Larrabee.
Nvidia has set a precedent for shaking things up a fair amount on occasion, thanks to G80, but there was at least a significant API shakeup to coincide with that change.
A clean-sheet design that would basically abandon a huge chunk of the G80-GT200 framework would take time and resources to bring about. Given the design cycles for something like that, the roughly four years since the completion of G80 (assuming GT200's somewhat underwhelming improvements meant it was a secondary effort) would be a frighteningly tight timeline in which to architect a general-purpose VLSI design.
Larrabee-esque is also something of a broad categorization with a lot of wiggle room. That there could be commonalities seems inevitable, since anything extending programmability or closing the read/write loop on GPUs would appear to be Larrabee-like.
The amount of time since Intel released actionable details of Larrabee would not be enough to complete G300, so a direct rip-off seems unlikely and also wrongheaded (might be a patent minefield there as well). There are corners of Larrabee's design that were necessitated by the choice of a specific x86 core and the x86 architecture. There would be no reason to voluntarily inflict them on a new design.
To top it off, with Theo, Fuad, and Charlie postulating, we have three current or former Inq writers spouting on future GPU designs (three and a half if we count the gallstone Charlie must have, given the sheer excess of bile we see in his writing. Seriously, I think he gave my monitor jaundice. Nvidia must have taken advantage of him on prom night or something), whose various stories need to be stitched together, or at least tracked, to see in a few quarters how much each was riding the "make shit up" train.
Could it be that NVidia is simply adding D3D11 features by running them on the shaders, e.g. the tessellator?
If that's so, then that doesn't necessarily mean NVidia's ditching most fixed-function units, such as ROPs.
Possibly, maybe.
There are ways to make special-purpose hardware sit alongside general-purpose cores: as units embedded within the cores, driven through memory messages, or hooked up over special signalling paths (there's a rough sketch of the memory-message flavour below).
x86 made the first difficult, the second necessary*, and the third for the most part impossible for Larrabee.
Nvidia had the advantage of adding whatever bells and whistles they wanted.
*Maybe if they had done what I would have done...
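To make the memory-message option a bit more concrete, here's a purely hypothetical host-side sketch (every name, like TessCommand and the doorbell, is made up, and an ordinary variable stands in for what would really be a memory-mapped register window on hardware) of a core handing work to a fixed-function block by writing a descriptor and ringing a doorbell:

```cuda
// Purely hypothetical sketch of the "memory messages" route: a general-purpose
// core drives a fixed-function block by writing a command descriptor into a
// window the block watches, then ringing a doorbell. Every name is invented,
// and ordinary globals stand in for a memory-mapped register window.
#include <atomic>
#include <cstdint>
#include <cstdio>

struct TessCommand {           // invented descriptor for a fixed-function tessellator
    uint32_t patch_addr;       // where the input patch lives
    uint32_t output_addr;      // where the expanded geometry should land
    uint16_t tess_factor;      // requested tessellation factor
    uint16_t flags;            // e.g. partitioning mode
};

// Stand-ins for a memory-mapped command window and doorbell register.
static TessCommand g_cmd_window;
static std::atomic<uint32_t> g_doorbell{0};

void submit_to_fixed_function(const TessCommand& cmd)
{
    g_cmd_window = cmd;                                   // 1. write the message
    std::atomic_thread_fence(std::memory_order_release);  // 2. make the descriptor visible first
    g_doorbell.store(1, std::memory_order_relaxed);       // 3. ring the doorbell
}

int main()
{
    submit_to_fixed_function({0x1000, 0x8000, 16, 0});
    printf("doorbell=%u tess_factor=%u\n",
           g_doorbell.load(), g_cmd_window.tess_factor);
    return 0;
}
```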
Could that strategy be playing out in this shader-centric D3D11-features model? Most of the new stuff runs solely on the ALUs, and if it performs like shit, who cares? The D3D10 feature set will have made the most of the die space, and the increase in compute performance is what matters most for CUDA's sake.
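For flavour, here's a toy CUDA kernel, emphatically not a claim about what Nvidia's hardware actually does, showing the spirit of tessellation-on-the-ALUs: each thread bilinearly interpolates one vertex of an N x N grid across a quad patch, the kind of regular vertex generation a fixed-function tessellator would otherwise spit out.

```cuda
// Toy sketch of running a D3D11-style tessellation step on the ALUs:
// one thread per generated vertex, bilinear interpolation across a quad patch.
#include <cstdio>

struct float3s { float x, y, z; };

__device__ float3s lerp3(float3s a, float3s b, float t)
{
    return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
}

__global__ void tessellate_quad(const float3s* corners, float3s* verts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row
    if (i >= n || j >= n) return;

    float u = i / float(n - 1);
    float v = j / float(n - 1);
    // Bilinear interpolation across the four patch corners.
    float3s top    = lerp3(corners[0], corners[1], u);
    float3s bottom = lerp3(corners[2], corners[3], u);
    verts[j * n + i] = lerp3(top, bottom, v);
}

int main()
{
    const int n = 17;                                // 17x17 vertices = 16x16 quads per patch
    float3s h_corners[4] = { {0,0,0}, {1,0,0}, {0,0,1}, {1,0,1} };
    float3s *d_corners, *d_verts;
    cudaMalloc(&d_corners, sizeof(h_corners));
    cudaMalloc(&d_verts, n * n * sizeof(float3s));
    cudaMemcpy(d_corners, h_corners, sizeof(h_corners), cudaMemcpyHostToDevice);

    dim3 block(8, 8), grid((n + 7) / 8, (n + 7) / 8);
    tessellate_quad<<<grid, block>>>(d_corners, d_verts, n);
    cudaDeviceSynchronize();
    printf("generated %d vertices on the ALUs\n", n * n);
    cudaFree(d_corners); cudaFree(d_verts);
    return 0;
}
```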
A big problem I see, as was noted in the discussion concerning the latency of Nvidia's atomic ops, is how long the read-write-read round trip is on GPUs with their read-only caches.
As far as general computation is concerned, rearchitecting how the caches interact would be something Nvidia would be interested in looking at...
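To put that in concrete terms, the boring CUDA histogram below leans entirely on atomicAdd: every increment is a read-modify-write on global memory, and with read-only caches that update can't be kept on-chip, so each one is effectively a round trip out to the memory controller and back, which is where those long atomic latencies come from.

```cuda
// Plain global-memory histogram: each atomicAdd is a read-modify-write that
// a read-only cache hierarchy (G80/GT200 class) cannot service on-chip.
#include <cstdio>

__global__ void histogram(const unsigned char* data, int n, unsigned int* bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);   // global read-modify-write per element
}

int main()
{
    const int n = 1 << 20;
    unsigned char* h_data = new unsigned char[n];
    for (int i = 0; i < n; ++i) h_data[i] = (unsigned char)(i % 256);

    unsigned char* d_data;  unsigned int* d_bins;
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_bins, 256 * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, 256 * sizeof(unsigned int));

    histogram<<<(n + 255) / 256, 256>>>(d_data, n, d_bins);

    unsigned int h_bins[256];
    cudaMemcpy(h_bins, d_bins, sizeof(h_bins), cudaMemcpyDeviceToHost);
    printf("bin[0] = %u (expect %d)\n", h_bins[0], n / 256);

    cudaFree(d_data); cudaFree(d_bins); delete[] h_data;
    return 0;
}
```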