NV40: TMU Pooling and proper filtering?

DaveB presented the following in his NV40 preview:
In our discussions, David Kirk suggested that the texture units on NV40 will operate not at the pixel level, but at the quad level, such that if it was determined that the texture sampling requirements of the entire quad were less than the overall sampling abilities of the quad of texture units then potentially trilinear sampling may be achievable in a single cycle over the entire quad, where two cycles would be required with each texture sampler fixed to a pipeline.
We asked Emmett if this was the case with NV40 and he replied that "some in the NV4x range would feature this as it has benefits and drawbacks."
Emmett mentioned some drawbacks; what would those drawbacks be? Has anyone found out whether this is the case for NV40?

On a second note, Dave added the following to his description of NV40's filtering method:
It is thought that the normal GeForce Anisotropic Filtering mechanism is still available within NV40, however its understood not to be presently available in the drivers - presumably the "High Quality" mode of the "Image Settings" option will enable this.
Why is it thought that normal anisotropic filtering is available in NV40? Any additional comments or information on this bit?
 
My first guess would be that doing what was quoted would require extra cache, and so the lower-end versions may cut out such a feature to save transistors.
 
FWIW it is my understanding that even the texture filtering in NV2A/25 worked at the quad level.

As long as you could fetch the required number of samples for all of the pixels in the quad, filtering was free. This is backed up by some of the benchmarks I've run on NV2A.
 
I think the non-bolded part of the first quote is the notable part and that is what the second quote is referring to. Not just operating on quads at a time.
 
3dcgi said:
I think the non-bolded part of the first quote is the notable part and that is what the second quote is referring to. Not just operating on quads at a time.

To be clear, I was referring to the first bolded part specifically.
The real limit, on NV2A at least, wasn't one bilinear pixel but N texture samples per quad. I don't remember off the top of my head what the exact number was.
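
To illustrate the "N samples per quad" idea, here is a rough toy model in Python. It is only a sketch under stated assumptions, not a description of how NV2A or NV40 actually work: the budget of 16 texels per cycle is a made-up placeholder (the real N is not given above), the function names are hypothetical, and it assumes that texels shared between neighbouring pixels in the quad are only fetched once.

```python
import math

def bilinear_footprint(u, v, level):
    """Texels touched by one bilinear fetch at (u, v), given in level-0 texel units."""
    scale = 1.0 / (1 << level)
    # Shift by half a texel so we pick the 2x2 texel centres around the sample point.
    x0 = math.floor(u * scale - 0.5)
    y0 = math.floor(v * scale - 0.5)
    return {(level, x0 + dx, y0 + dy) for dx in (0, 1) for dy in (0, 1)}

def quad_cycles_pooled(coords, levels, budget):
    """Cycles for a quad when the limit is `budget` unique texels per cycle (assumed)."""
    unique = set()
    for u, v in coords:
        for level in levels:
            unique |= bilinear_footprint(u, v, level)
    return max(1, math.ceil(len(unique) / budget)), len(unique)

def quad_cycles_fixed(levels):
    """Cycles when each pixel owns one sampler doing one bilinear fetch per cycle."""
    return len(levels)

BUDGET = 16          # hypothetical: 4 samplers x 4 texels, NOT a known hardware figure
TRILINEAR = (0, 1)   # trilinear blends two adjacent mip levels

# Quad whose pixels are half a texel apart: footprints overlap heavily.
tight_quad = [(10.5, 10.5), (11.0, 10.5), (10.5, 11.0), (11.0, 11.0)]
# Quad whose pixels are 1.5 texels apart: little overlap at level 0.
wide_quad = [(10.0, 10.0), (11.5, 10.0), (10.0, 11.5), (11.5, 11.5)]

print(quad_cycles_pooled(tight_quad, TRILINEAR, BUDGET))  # (1, 13)
print(quad_cycles_pooled(wide_quad, TRILINEAR, BUDGET))   # (2, 25)
print(quad_cycles_fixed(TRILINEAR))                       # 2
```

In this toy model the tightly packed quad fits inside the per-quad budget and gets trilinear in one cycle, while the widely spaced quad needs two cycles, the same as with a sampler fixed to each pipeline. Whether real hardware sees the first case often enough to matter is exactly the open question.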
 
I'd guess the drawback is more latency, something they've aggressively worked at reducing in the NV4x. Could be wrong though.

Uttar
 