I'm looking forward to the day when there's no dedicated texture-filtering hardware in at least one GPU :smile: But for the time being it seems the die-size balance favours keeping it fixed-function.
This may be purely because of the range of SKUs that an architecture needs to cover, something like a 10-fold range in performance.
E.g. on a high-end GPU with 2000 ALU lanes at 1GHz there might not be any need for dedicated TF, but on the $30 GPU a couple of hundred lanes, even at 1GHz, won't be enough.
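A back-of-envelope sketch of that comparison, for what it's worth. The lane counts and the 1GHz clock are the figures above; the ALU cost per bilinear texel and the share of ALU time that could be given over to filtering are pure placeholders I've made up for illustration, not measurements of any real GPU, and the function names are hypothetical.

```python
# Rough ALU budget for doing texture filtering in the shader core.
# Lane counts and clock come from the post; everything marked
# "assumption" is an illustrative placeholder, not a measured value.

CLOCK_HZ = 1_000_000_000        # 1GHz, as in the post
HIGH_END_LANES = 2000           # high-end GPU, as in the post
LOW_END_LANES = 200             # "a couple of hundred" lanes on the $30 GPU

OPS_PER_BILINEAR_TEXEL = 12     # assumption: ALU ops to filter one bilinear result
ALU_SHARE_FOR_FILTERING = 0.25  # assumption: fraction of ALU time spent filtering


def lane_ops_per_second(lanes, clock_hz):
    """Peak scalar ALU operations per second: one op per lane per clock."""
    return lanes * clock_hz


def filtered_texels_per_second(lanes, clock_hz, ops_per_texel, alu_share):
    """Bilinear results per second if alu_share of the ALUs do the filtering."""
    return lane_ops_per_second(lanes, clock_hz) * alu_share / ops_per_texel


high = filtered_texels_per_second(
    HIGH_END_LANES, CLOCK_HZ, OPS_PER_BILINEAR_TEXEL, ALU_SHARE_FOR_FILTERING
)
low = filtered_texels_per_second(
    LOW_END_LANES, CLOCK_HZ, OPS_PER_BILINEAR_TEXEL, ALU_SHARE_FOR_FILTERING
)

print(f"high-end: {high / 1e9:.1f} G filtered texels/s")
print(f"low-end:  {low / 1e9:.1f} G filtered texels/s")
print(f"ratio:    {high / low:.0f}x")
```

Whatever per-texel cost you plug in, the ratio between the two parts stays at the 10-fold lane difference, which is the point: a budget that looks comfortable at the top of the range shrinks by the same factor at the bottom.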
As to the actual cost of TF, one of these days perhaps we'll have a thread that tries to get to the bottom of it. I don't know how to split out the cost of TF from the rest of a TU. I'm hazarding a guess that the whole lot is in the region of 125M transistors in R670 (caches, thread arbitration, instruction issue, point addressing, filtered addressing, fetching point samples, fetching for bilinear, filtering). A fair amount of the TU needs sizing up in order to increase the TA:TF ratio.
Needless to say, I'm pessimistic about the degree of architectural change in R7xx. ATI's designed a set of knobs (SIMD and TU width, SIMD count, RBE count, MC count, MC width) and will frobnicate them for R7xx.
Jawed