I think the TUs cost so many transistors that we won't see a substantial increase in their number.
With batched texturing, a 64-object batch requires 4 clocks in RV670 TUs, and this duration is extremely unlikely to change. So then you have to consider a batch size on the TUs that matches the batch size on the ALUs (4 clocks * SIMD width).
So 480 ALU lanes is either 6 SIMDs with 64-object batches or 4 SIMDs with 96-object batches. The latter would lead to 24 TUs, which I think is in transistor-budget-busting territory. Plus I think an increase in batch size is unlikely. Finally, I expect the ALU:TEX ratio to continue increasing, so 6:1 fits the bill. It would be ironic if it turns out that RV670 is generally ALU-bound, like R520 was, such that the TUs are underutilised... Certainly there are plenty of individual shaders in many games that are ALU-bound - it's just a question of whether games are globally ALU-bound.
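To make the arithmetic explicit, here's a rough sketch of how those two configurations fall out. It assumes 5 ALU lanes per object (the R6xx-style 5-wide ALU) and that the TU count equals batch size divided by the 4-clock texturing duration. The function and field names are mine, purely for illustration; this is a back-of-envelope model, not a description of real hardware.

```python
# Back-of-envelope model of the batch-size / SIMD / TU trade-off.
CLOCKS_PER_BATCH = 4  # from the post: a 64-object batch takes 4 clocks in the TUs
LANES_PER_OBJECT = 5  # assumption: R6xx-style 5-wide ALU per object


def config(total_alu_lanes, batch_size):
    """Return the SIMD/TU layout implied by a lane count and batch size."""
    # Batch size = 4 clocks * SIMD width, so width = batch size / 4.
    simd_width = batch_size // CLOCKS_PER_BATCH
    lanes_per_simd = simd_width * LANES_PER_OBJECT
    simds, remainder = divmod(total_alu_lanes, lanes_per_simd)
    if remainder:
        raise ValueError("lane count does not divide into whole SIMDs")
    # Assumption: TUs must also chew through one batch in 4 clocks.
    tus = batch_size // CLOCKS_PER_BATCH
    # ALU:TEX counted as objects processed per clock on each side.
    alu_tex_ratio = (simds * simd_width) / tus
    return {
        "simd_width": simd_width,
        "simds": simds,
        "tus": tus,
        "alu_tex_ratio": alu_tex_ratio,
    }


if __name__ == "__main__":
    for batch in (64, 96):
        print(batch, config(480, batch))
```

With 480 lanes, the 64-object case gives 6 SIMDs and 16 TUs (6:1), while the 96-object case gives 4 SIMDs and 24 TUs (4:1), which is why the first option looks more plausible to me.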
It may turn out that RV670 runs out of batches in flight all too easily, turning theoretically ALU-bound shaders into TEX-bound ones. Who knows...
As for TAs: the point-sampling and filtering parts of the TUs are independent and able to run in parallel. Point sampling is not just about fetching vertex data. I will admit that this part of R6xx is pretty poorly understood, e.g. how far the compiler is able to parallelise unfiltered and filtered fetches in pixel shader code.
Jawed