Why is everyone doubting the possibility of 8 texel sampling units per pixel pipeline for R350?
That is a good question. The reason most people doubt "2 TMUs per pipe" is that it is believed to be a significant architectural change from the R-300, which most tend to assume won't happen, given the constraints of both time and transistor budget on 0.15u.
That being said, it would not surprise me at all if R-350 turns out to be more or less what you are proposing: double the texel sampling units per pipe, though expressly limited to filtering capability. That seems especially plausible if we consider that the R-300 might already have the transistors there for it, just not enabled or completely functional.
Consider the difference between GeForce256 and GeForce2 GTS. AFAIC, the GeForce256 had a "broken" or just not fully implemented 2nd TMU per pipe. The GeForce256 could do 8 samples per pipeline, "with only 1 TMU per pipe." The GeForce2 could also do 8 samples per pipeline, even though it had "2 TMUs" per pipe. Seems to me that the GeForce256 essentially had 2 TMUs per pipe also, but that 2nd TMU was not completely functional (intentional or not), and was limited to just reading in texels for Trilinear, as opposed to being a fully flexible and independent unit.
This might apply similarly to R-300: the extra texel reading units might be there already, just not "functional." I still do NOT expect to see a true "2nd TMU" per pipe, meaning a fully flexible and independent unit that would in theory (for example) double the result of the multitexture fill rate test on 3D Mark 2002.
I do accept as possible, however, that the R-350 might be able to do exactly as you propose: enable double the texel reads for increased trilinear / anisotropic performance. That would very nicely explain how there are two camps that claim seemingly contradictory specs: 1 TMU or 2 TMUs per pipe.
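To put rough numbers on the distinction, here is a toy model in Python. It is only a sketch under my own assumptions, not a description of any real chip: a bilinear sample is 4 texels, a trilinear sample is 8, a pipe is limited both by how many texels it can read per clock and by how many different textures it can address per clock. The names (Pipe, clocks_per_pixel, ONE_TMU, FILTER_ONLY, TWO_TMUS) are made up for illustration.

```python
from dataclasses import dataclass
from math import ceil

BILINEAR_TEXELS = 4   # texels blended for one bilinear sample
TRILINEAR_TEXELS = 8  # two bilinear samples from adjacent mip levels


@dataclass
class Pipe:
    texel_reads_per_clock: int  # raw texel fetches available per clock (assumed)
    independent_tmus: int       # units that can each address a different texture


def clocks_per_pixel(pipe: Pipe, layers: int, texels_per_sample: int) -> int:
    """Clocks one pipe needs to texture one pixel with `layers` textures."""
    # Filtering limit: total texels needed vs. texels readable per clock.
    by_reads = ceil(layers * texels_per_sample / pipe.texel_reads_per_clock)
    # Addressing limit: each independent unit handles one texture per clock.
    by_layers = ceil(layers / pipe.independent_tmus)
    return max(by_reads, by_layers)


# Three hypothetical pipe designs.
ONE_TMU = Pipe(texel_reads_per_clock=4, independent_tmus=1)
FILTER_ONLY = Pipe(texel_reads_per_clock=8, independent_tmus=1)  # extra reads, no 2nd texture
TWO_TMUS = Pipe(texel_reads_per_clock=8, independent_tmus=2)

for name, pipe in [("1 TMU", ONE_TMU), ("filter-only 2nd unit", FILTER_ONLY), ("2 TMUs", TWO_TMUS)]:
    trilinear = clocks_per_pixel(pipe, layers=1, texels_per_sample=TRILINEAR_TEXELS)
    multitex = clocks_per_pixel(pipe, layers=2, texels_per_sample=BILINEAR_TEXELS)
    print(f"{name}: single-texture trilinear = {trilinear} clk, dual-texture bilinear = {multitex} clk")
```

In this model the filter-only design matches the full 2-TMU design on single-texture trilinear, but is no faster than the plain 1-TMU design on a dual-texture bilinear fill rate test. That is the sense in which both camps could be describing the same hardware.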
Exactly because the definition of "TMU" is not clear....
Also, referring to these as "TMUs" nowadays seems a bit anachronistic to me.
Right.