I meant texture fetches, assuming I know what you're talking about. If the R300 is indeed capable of taking up to 16 samples from a single texture per pixel pipeline per clock for use in bilinear, trilinear, or anisotropic filtering, then that is very impressive.
Well, it can do 16 texld's in a single pass. You can assign all 16 to the same texture if you want to. Whether it can do this in a single clock on 8 pipelines simultaneously is another matter. It's probably not even technically possible given the memory bandwidth, but given that it's coming off the same cached data, who knows?
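To put a rough number on the bandwidth point, here's a back-of-the-envelope sketch. Everything in it beyond "8 pipelines, 16 fetches" is my own assumption for illustration: a 300 MHz core clock, 32-bit texels, 4 texels per bilinear fetch, and no texture cache at all. The function name is made up too.

```python
def uncached_texture_bandwidth(pipelines, fetches_per_pipe, texels_per_fetch,
                               bytes_per_texel, clock_hz):
    """Bytes per second needed if every texel came straight from memory.

    This ignores the texture cache entirely, so it is a worst-case figure,
    not a prediction of what the hardware actually moves.
    """
    bytes_per_clock = (pipelines * fetches_per_pipe
                       * texels_per_fetch * bytes_per_texel)
    return bytes_per_clock * clock_hz


# Assumed numbers (not from any spec sheet): 300 MHz clock, bilinear
# filtering (4 texels per fetch), 32-bit (4-byte) texels.
needed = uncached_texture_bandwidth(
    pipelines=8, fetches_per_pipe=16,
    texels_per_fetch=4, bytes_per_texel=4,
    clock_hz=300_000_000,
)
print(f"{needed / 1e9:.1f} GB/s with no cache hits")
```

Under those assumptions you'd need hundreds of GB/s, which is why it only looks plausible if nearly all of the fetches hit the cache.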
Eight pixel pipelines capable of 16 texture fetches, all operating in 128-bit? It just seems to me that that would need more transistors than the Radeon offers.
It seems that's what ATI is telling us.
Additionally, I don't believe there is any problem with using "merely" 64-bit fp color, even for multipass, as long as the internal processing is all carefully done to minimize error as much as possible.
Well, if all your calculations are done on 128-bit FP color, then you run into serious precision issues in multi-pass shader implementations if you can only output 64-bit FP color. By maintaining the same precision as the internal format, your shaders can be arbitrarily long without fear of losing precision as the number of passes grows.
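Here's a small sketch of what I mean, using one color channel. 128-bit RGBA is four 32-bit floats and 64-bit RGBA is four 16-bit floats, so I round the intermediate result to fp16 or fp32 after each pass. The "shader" (add 0.001 per pass), the starting value, and the function names are all made up for illustration; real shaders will behave differently in detail.

```python
import struct


def store(value, fmt):
    """Round a value to a storage format: "e" = fp16, "f" = fp32."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def run_passes(passes, output_fmt, start=0.5, step=0.001):
    """Do fp32 math each pass, then write the result out in output_fmt."""
    value = start
    for _ in range(passes):
        value = store(value + step, "f")   # internal math at fp32
        value = store(value, output_fmt)   # precision of the pass output
    return value


def reference(passes, start=0.5, step=0.001):
    """Same computation kept in double precision, as a yardstick."""
    value = start
    for _ in range(passes):
        value += step
    return value


for passes in (1, 8, 64):
    exact = reference(passes)
    err16 = abs(run_passes(passes, "e") - exact)
    err32 = abs(run_passes(passes, "f") - exact)
    print(f"{passes:3d} passes: fp16 output error {err16:.2e}, "
          f"fp32 output error {err32:.2e}")
```

With the fp16 output, each pass throws away a little of the fp32 result, and the loss adds up with the pass count. With the fp32 output, the error stays near the precision of the internal math.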
I really don't believe 128-bit color will be used much at all, except in certain special cases, such as with high-resolution normal maps for use with bump mapping or displacement mapping.
Huh? For displacement mapping?
Oh, and one last thing. Using 128-bit color in the framebuffer would make it very hard to use much of any FSAA at high resolutions.
You mean from a memory footprint point of view? MSAA takes care of that. In any case, we're not talking about displayable 128-bit color. We're simply talking about 128-bit output for intermediate shader passes. Presumably there is some mechanism for converting this to 24- or 30-bit for display.
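For reference, here's the naive footprint arithmetic behind the worry. It assumes every AA sample stores a full color value with no compression, and I picked 1600x1200 with 4 samples purely as an example. It is an upper bound, not a description of how any particular chip lays out its buffers.

```python
def color_buffer_bytes(width, height, bits_per_pixel, samples=1):
    """Naive color buffer size: one full color value per sample, no compression."""
    return width * height * samples * bits_per_pixel // 8


# Example resolution and sample count are my own picks, not from the thread.
for bits in (32, 64, 128):
    size = color_buffer_bytes(1600, 1200, bits, samples=4)
    print(f"{bits:3d}-bit color, 4 samples: {size / 2**20:.1f} MiB")
```

If only the intermediate render targets are 128-bit and they aren't multisampled, the `samples=1` case is the relevant one, which is a quarter of the 4-sample size.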
There may be efficiency issues involved here, and there's also the fact that complex shaders (lots of textures) won't be much worse on a 4x2 pipeline.
But they will be. All of the existing 4x2 architectures have much lower memory bandwidth as well as fewer texture fetches allowed per pass. There are going to be a lot more memory fetch stalls on an R200 or NV25. There's no question in my mind that the R300 should technically have the most efficient shaders of all existing solutions.