Full virtualisation of memory and shaders, so that you can define different shading units that do different tasks and bind shaders to them. The hardware then profiles how much workload each defined unit gets and schedules a different number of its internal processing units (the 16 we have now) to the different user-defined units.
That way you could, for now, define the usual vertex and pixel shading units; or, if the hardware is designed right, you could instead define the three units of RenderMan to map its shaders directly to hardware, or define an intersection unit and a shading unit to do hardware raytracing.
And each of those configurations (and many more possible ones) could run at 100% GPU speed, unlike now, where you have to leave the vertex shader units idle, for example, if you try to do some raytracing.
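Just to make the idea concrete, here is a toy sketch in C of the scheduling part. None of this is a real API; the names (ShadingUnit, schedule, NUM_ALUS) are made up, and it only illustrates the "give each user-defined unit a share of the 16 internal units proportional to its measured load" idea:

#include <stdio.h>

#define NUM_ALUS 16  /* the 16 internal processing units we have now */

typedef struct {
    const char *name;   /* user-defined unit, e.g. "vertex", "pixel", "intersect" */
    float       load;   /* workload the hardware profiled for it last frame */
    int         alus;   /* internal processing units it gets next frame */
} ShadingUnit;

/* hand out the internal units in proportion to the measured load */
static void schedule(ShadingUnit *units, int count)
{
    float total = 0.0f;
    int   given = 0, i;
    for (i = 0; i < count; ++i)
        total += units[i].load;
    for (i = 0; i < count; ++i) {
        units[i].alus = (int)(NUM_ALUS * units[i].load / total);
        given += units[i].alus;
    }
    units[0].alus += NUM_ALUS - given;  /* leftover units go to the first unit */
}

int main(void)
{
    int i;
    /* e.g. hardware raytracing defined as two units: intersection + shading */
    ShadingUnit units[] = { { "intersect", 70.0f, 0 },
                            { "shade",     30.0f, 0 } };
    schedule(units, 2);
    for (i = 0; i < 2; ++i)
        printf("%s: %d of %d internal units\n", units[i].name, units[i].alus, NUM_ALUS);
    return 0;
}

With those example numbers, an intersection unit eating 70% of the load would end up with 12 of the 16 internal units and the shading unit with the remaining 4; the same mechanism would hand everything back to vertex and pixel units when you return to plain rasterisation.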
Another gain from this is the ability to have multiple thread units on the GPU, allowing (Longhorn-style) different threads to run on the GPU at the same time. That means two Longhorn windows, both doing some shading (one playing a modern video processed by video shaders, the other doing some complex rendering task, say 3dsmax in the background while you watch the movie), and each gets its own share of the GPU.
And the most important: pluggability. Remove the actual video output from the generic GPU and make it a render card, not a graphics card. Instead, introduce a display card, which is merely there to talk to a lot of HD screens in different configurations. So you buy a small PCIe 1x card for the ordinary 1024x768 TFT you have had for a long time, a PCIe 4x card for the new HDTV you just bought, and simply plug them in. The PCIe 16x GPGPUs, on the other hand, just render and don't care which display card they have to send the images to. Full hardware acceleration all the time, independent of any (multi-)display mode.
Another thing on my mind is temporal antialiasing. If performance keeps improving, this could even turn into motion blur. I mean, getting several hundred fps in Q3 while capped at 60 fps with vsync on leaves 5 or more rendered frames (and at lower resolutions, many more) per "screen frame". Those could be accumulated directly in an accumulation buffer to get motion blur. Maybe something directly selectable in a next driver revision, or on next-gen hardware at least? (There is hardware accumulation on Radeons, they have fancy post-effects like those SmartShaders we had a contest on, and they have triple-buffering support too; in combination, we could have it even now.)
Those were some random things that moved through my brain.