but you could use PCIe x1 or x4 slots, similar to Matrox's use of PCIe x1 for their G4x series port, then again that isn't giving your PPU much bandwidth
as for interesting concepts, why not mimic 3DLabs or 3dfx and stick a geometry unit on the board, with a coprocessor alongside it to do the crunching? you could cut the cost per chip by not producing these 300-400M transistor devices, and cut cooling needs by spreading components out (a University of Virginia experiment showed this is viable for thermal management even when moving components around on the same die, and two separate dies in separate areas would disperse heat even more)
the only reason I consider this is that dual core processors, and the multi-cores coming soon, aren't being fully utilized by games (both cores aren't), so why not make one of those 2.8-3.6GHz (or equivalent) cores do the number crunching for a slimmed down NV40 or G70 design?
the processing power seems to be there, the only issue might be bandwidth between CPU and GPU in this solution, but with HT that shouldn't be a problem
but if you did something like that, and used an on-motherboard socket for this geometry/graphics unit, you could upgrade more freely (it would seem) due to the cheaper package
then again, this is very similar to the concept of integrated graphics, and the RAM usage would go up along with IRQ requests made in order to perform calculations
the only other viable socket option I could see would be a continued utilization of PCIe and boards with sockets on them, when the socketed board "expires" you upgrade to a new one
RAM would be a potential issue due to speeds and bus width however, possibly a system with dynamically adjustable channels? the ability to re-configure the RAM any way needed, from single or dual channel (32-64 bit) up to higher widths like 256-bit or 512-bit, you'd still have all the capacity, and your speed wouldn't have to come from pure clock speed
if the design used, say, 1GHz GDDR3 (two transfers per clock) that can scale from 32-bit to 512-bit, you could scale from 8GB/s to 128GB/s, easily enough bandwidth to handle anything from a GeForce3 Ti 200 to a Radeon X1800XT (with bandwidth left over)
the memory controller design for that however would be costly, to say the least
yet it would allow for a better upgrade cycle, and when 128GB/s is maxed out you would have to upgrade to a new board, maybe an XDR-II board @ 8.4GHz with full 64-1024 bit adjustment, who knows?
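for anyone who wants to check the 8GB/s to 128GB/s numbers, here's a rough sketch of the math, assuming peak bandwidth = clock x transfers per clock x bus width in bytes, with 2 transfers per clock for DDR-type memory like GDDR3 (the function name and the list of widths are just made up for illustration, and these are theoretical peaks, not what you'd actually see in practice)

```python
def peak_bandwidth_gb_s(clock_ghz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s.

    clock_ghz           -- memory clock in GHz (assumed 1.0 for the GDDR3 example)
    bus_width_bits      -- bus width in bits (32 up to 512 in the example)
    transfers_per_clock -- 2 for double data rate memory (assumption)
    """
    bytes_per_transfer = bus_width_bits / 8
    return clock_ghz * transfers_per_clock * bytes_per_transfer


if __name__ == "__main__":
    # hypothetical set of widths the adjustable controller could switch between
    for width in (32, 64, 128, 256, 512):
        print(f"{width:>3}-bit @ 1GHz: {peak_bandwidth_gb_s(1.0, width):.0f} GB/s")
```

so each doubling of the bus width doubles the peak bandwidth at the same clock, which is the whole point of not having to chase clock speed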
idk, just my opinions
as for an on-motherboard socket, the cooling requirements that would add make me squirm...just no