Hrm, does anyone know how difficult it would be to design a chip that can switch per frame between TBR + OD, IM + OD (HyperZ style), and plain IM? That'd be interesting. If the T&L is being done on the GPU, it could do a quick and dirty calculation to figure out which mode would be faster; if it's on the CPU, the choice could be exposed via an extension, which hopefully wouldn't be hard to implement. If some game had compatibility issues, the user could force a certain mode of operation or stop the card from using whichever mode is causing the problem. I just thought it'd be a neat idea.
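The "quick and dirty calculation" could be something like the sketch below: weigh TBR's per-triangle binning overhead against the shading work saved by deferring, versus what hierarchical-Z rejection saves in immediate mode. All the cost weights and the rejection rate here are made-up numbers for illustration, not real hardware figures.

```python
def choose_mode(tri_count, visible_px, hidden_px,
                bin_cost_per_tri=0.5, hiz_reject_rate=0.8):
    """Rough per-frame heuristic: pick the cheapest rendering mode.

    tri_count     -- triangles submitted this frame
    visible_px    -- pixels that end up visible
    hidden_px     -- overdrawn (occluded) pixels
    All costs are in arbitrary comparable units (invented for this sketch).
    """
    # TBR: pay a binning cost per triangle, but shade only visible pixels.
    tbr_od = tri_count * bin_cost_per_tri + visible_px
    # IM + HyperZ-style occlusion: most hidden pixels rejected before shading.
    im_od = visible_px + hidden_px * (1.0 - hiz_reject_rate)
    # Plain immediate mode: every covered pixel gets shaded.
    im = visible_px + hidden_px
    costs = {"TBR+OD": tbr_od, "IM+OD": im_od, "IM": im}
    return min(costs, key=costs.get)
```

So a low-poly, high-overdraw frame would flip the chip into TBR, while a dense mesh with little overdraw would stay in immediate mode with occlusion culling, since the binning overhead scales with triangle count.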
I'd also like to see internal rendering precision of 32 bits or more.
I would really, really like to see an OPEN STANDARD for hardware texture compression/decompression (bring back FXT1 if possible, seeing as it's already done) and for vertex compression/decompression.
To tell you the truth, right now I'm more interested in improving older features than "moving" on to new ones, and I'm not concerned about shaders. I'd like to see fill rate go up through better use of available bandwidth, and compression schemes that free up bandwidth to allow better FSAA. I'd also like to see more aggressive filtering methods. Once those happen, bring on the new features.
Also, what's the feasibility (in terms of a minimal price increase and the additional performance hit relative to current FSAA methods taking an equal number of samples) of implementing a supersampling FSAA method that varies the number of samples and their orientation on a 4x4 grid based on depth and on location within the triangle? I know SMOOTHVISION comes close to this, but I believe its sampling pattern doesn't adapt to location within the triangle, and I don't think it reduces the number of samples with distance.
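The per-pixel sample-count part of that idea could look roughly like this: spend the full 4x4 grid only near triangle edges and on close-up geometry, and fall back to a single sample in the interior. The formula and thresholds below are invented for illustration, not how SMOOTHVISION or any real chip does it.

```python
def samples_for_pixel(depth, edge_dist, max_samples=16):
    """How many of the 16 positions on a 4x4 grid to actually sample.

    depth     -- normalized depth in [0, 1], 1.0 = far plane
    edge_dist -- distance in pixels from the nearest triangle edge
    Both the weighting and the cutoffs are made-up for this sketch.
    """
    # Interior pixels see no geometric edge, so one sample is enough.
    if edge_dist > 1.0:
        return 1
    # Scale sample count down with distance and with distance from the edge.
    n = max_samples * (1.0 - depth) * (1.0 - edge_dist)
    return max(1, min(max_samples, round(n)))
```

The appeal is that the expensive full-grid case only fires on edge pixels of nearby triangles, which is where aliasing is actually visible, so the average cost per frame should sit well below brute-force 16x supersampling.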