TA vs. TF

mczak

So, G92 has different texture units than G80 (like G84's).
However, I still don't quite see how the original G80 arrangement makes sense. Per texture unit, it could do 4 TA and 8 TF (G92/G84 is 8 TA/8 TF, HD2900 is 8 TA/4 TF per unit).
To me, the G92 arrangement makes the most sense (the same texture addressing capability as what's needed for (bilinear) filtering).
The HD2900 arrangement makes a bit less sense (it seems it would only provide a benefit if you use point sampling / fetch4; is that really common?).
But how does the G80 arrangement make sense? What do you use the additional texture filtering capability for? It was said to provide free trilinear (or free 2xAF bilinear). How would that work? You still need 8 texture addresses for trilinear filtering (4 for one mipmap + 4 for the other mipmap).
Or were those 4 texture addresses actually 8, but with parameter restrictions that have now been lifted in G92? That is, it would actually calculate 8 addresses, but 4 of them can't be set up independently; they are a by-product of the other 4, so they can only be used for the same texture (and the next smaller mipmap).
Oh, and there's another flaw in those calculations. Obviously, you need to fetch 4 texels for 1 bilinear-filtered output pixel. So you also need four times as many texture addresses; where are all these addresses coming from?
 
When we say TA, we're talking about the full texture address calculation: start from texcoords, figure out LOD, and determine what address the texels are at.

For G80, the extra TF units don't need to do all that. The LOD calculation already tells you which two mipmap levels are needed, and if you've already figured out where one of them is in memory, then the other is in a related location. The direction of walking through a texture for anisotropic filtering and the number of samples needed are also part of the LOD calculation; the second TF only needs to step along this direction. For wider bit depths, which would halve filtering throughput without the extra TF, you only need one address anyway.
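To make the "related location" point concrete, here is a minimal Python sketch. The LOD formula is the common textbook isotropic estimate, not necessarily what G80 implements, the function names are hypothetical, and power-of-two texture sizes are assumed. Once the LOD picks two adjacent levels, the coarser level's coordinate follows from the finer one by a halving, with no second full LOD/address setup.

```python
import math


def lod_from_derivatives(dudx, dvdx, dudy, dvdy, width, height):
    """Isotropic LOD: log2 of the longer screen-space footprint axis, in level-0 texels."""
    len_x = math.hypot(dudx * width, dvdx * height)
    len_y = math.hypot(dudy * width, dvdy * height)
    return math.log2(max(len_x, len_y, 1e-12))


def trilinear_levels(lod, num_levels):
    """Pick the two mip levels to blend and the blend weight given to the coarser one."""
    lod = min(max(lod, 0.0), num_levels - 1)
    fine = int(math.floor(lod))
    coarse = min(fine + 1, num_levels - 1)
    return fine, coarse, lod - fine


def texel_coord(u, size, level):
    """Texel-space coordinate of normalized u at a mip level (power-of-two size assumed)."""
    return u * (size >> level) - 0.5


def next_level_coord(x):
    """Derive the level n+1 coordinate from the level n one by halving: no new LOD work."""
    return (x + 0.5) / 2.0 - 0.5
```

For example, `trilinear_levels(1.25, 9)` selects levels 1 and 2 with weight 0.25 on level 2, and `next_level_coord(texel_coord(0.3, 256, 1))` matches `texel_coord(0.3, 256, 2)` up to rounding.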

That's why G80 didn't need another full featured TA to achieve all these things.
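A toy throughput model helps show why this arrangement gives "free" trilinear. It is a sketch under assumptions, not vendor data: each filtered request costs one full TA, the coarser mip level (or second 2xAF sample) costs only an extra TF as described above, and all of a unit's TAs are treated as interchangeable (a simplification, especially for HD2900's unfiltered fetch path). The per-unit counts are the ones quoted at the top of this thread; the names `COST`, `UNITS` and `requests_per_clock` are hypothetical.

```python
import math

# Assumed (TA, TF) cost per texture request: one full TA each; the second mip level
# or second aniso sample is derived from that address, so it only costs an extra TF.
COST = {
    "point": (1, 0),             # unfiltered fetch: address only, no filtering
    "bilinear": (1, 1),
    "trilinear": (1, 2),
    "aniso2x_bilinear": (1, 2),
}

# Per-texture-unit (TA, TF) counts as quoted in this thread.
UNITS = {
    "G80": (4, 8),
    "G92": (8, 8),
    "HD2900": (8, 4),
}


def requests_per_clock(unit, mode):
    """Texture requests per clock for one texture unit: the scarcer resource wins."""
    ta, tf = UNITS[unit]
    ta_cost, tf_cost = COST[mode]
    ta_rate = ta / ta_cost
    tf_rate = tf / tf_cost if tf_cost else math.inf
    return min(ta_rate, tf_rate)


for name in UNITS:
    print(name, {mode: requests_per_clock(name, mode) for mode in COST})
```

Under this model G80 sustains 4 requests per clock for both bilinear and trilinear, G92 drops from 8 to 4 when going to trilinear, and HD2900 only reaches 8 for point sampling, which lines up with the intuition in the opening post.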

As for the other "flaw", I'm sure you can figure out how the 4 texels needed for a bilinear fetch are related. ;)
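Spelled out, the hint is that the four texels of a bilinear fetch are the 2x2 neighbourhood around one coordinate, so one address plus fixed +1 offsets covers all of them. A minimal sketch, assuming repeat (wrap) addressing and the usual texel-centre convention; `bilinear_footprint` is a hypothetical name.

```python
import math


def bilinear_footprint(u, v, width, height):
    """Return the 2x2 texels (x, y, weight) for one bilinear sample with repeat addressing."""
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx, fy = x - x0, y - y0
    texels = []
    for dy, wy in ((0, 1 - fy), (1, fy)):
        for dx, wx in ((0, 1 - fx), (1, fx)):
            # Python's % is non-negative for a positive size, so -1 wraps to size - 1.
            texels.append(((x0 + dx) % width, (y0 + dy) % height, wx * wy))
    return texels
```

Only `(x0, y0)` needs the full calculation; the other three texels are at +1 in x, y, or both, and the four weights always sum to 1.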
 
Ok. So TA does not include the actual calculation of a texel's memory address (which, at least for npot textures, is a bit more complex than just a few shifts). That's part of texture fetch (and actually the R600 article listed it separately: fetch 16 texels to feed into TF, plus 4 unfiltered). So in that case a G80/G92 texture unit can fetch 32 texels (all of which it feeds into TF). Or maybe you could say that for G80/G92, texture fetch (including figuring out where the texels are in memory) is part of TF.
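The "more complex than just a few shifts" remark can be illustrated with a plain row-major layout. This is a simplifying assumption: real GPUs store textures in tiled/swizzled layouts that this thread doesn't describe, and all function names here are hypothetical. With a power-of-two width the row pitch, texel size and wrap are shifts and masks; an arbitrary width needs a real multiply and a true modulo.

```python
def texel_address_pot(base, x, y, log2_width, log2_bytes_per_texel):
    """Power-of-two width: row pitch and texel size reduce to shifts."""
    return base + (((y << log2_width) + x) << log2_bytes_per_texel)


def texel_address_linear(base, x, y, width, bytes_per_texel):
    """Arbitrary (npot) width: needs a real multiply by the row width."""
    return base + (y * width + x) * bytes_per_texel


def wrap_pot(coord, size):
    """Repeat addressing for a power-of-two size: a bit mask."""
    return coord & (size - 1)


def wrap_npot(coord, size):
    """Repeat addressing for an arbitrary size: a true modulo."""
    return coord % size
```

For a 256-wide RGBA8 texture both address functions agree (texel (3, 2) lands 2060 bytes past the base), but only the linear form works for, say, a 100-wide texture.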
 
If I understand it correctly, the "figuring out where the texels are in memory" is up to the TA. The additional texels needed for trilinear/AF are then retrieved using much simpler calculations based on the initial base address. It helps that those texels are probably in cache anyway, hence the "free trilinear/AF".
 
I thought the "free trilinear" was a function of having twice the TFs (G80).
 
Ok. So TA does not include the actual calculation of the memory address of a texel (which is at least for npot textures a bit more complex than just a few shifts).
They do include it, but it's not the only thing a TA does.

For trilinear mipmapping, deriving the second memory location from the first is easy. Anisotropic filtering address changes per sample simply involve a couple of additions for every sample beyond the first. You have to worry about tiling modes across the edge of a texture, but if that were expensive, G80 could simply avoid using the second TF in those rare situations.
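The "couple of additions per sample" can be sketched as follows. The even spacing of samples along the major axis is an assumption for illustration (actual G80 sample placement isn't documented here), and `aniso_sample_positions` is a hypothetical name; it expects `n >= 1`. Only the first position needs multiplies; every later one is one addition per coordinate.

```python
def aniso_sample_positions(cx, cy, axis_x, axis_y, n):
    """Spread n sample centres evenly along the anisotropy major axis (texel units)."""
    step_x = axis_x / n
    step_y = axis_y / n
    # First sample: the only one that needs multiplies.
    x = cx - step_x * (n - 1) / 2
    y = cy - step_y * (n - 1) / 2
    positions = [(x, y)]
    for _ in range(n - 1):
        # Every further sample: just two additions.
        x += step_x
        y += step_y
        positions.append((x, y))
    return positions
```

For a centre of (10, 5), an 8-texel major axis along x and 4 samples, this yields x = 7, 9, 11, 13 at y = 5.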

npot textures aren't mipmapped, so you don't need a second memory address calculation. Who knows whether npot textures support AF; even if they do, G80 could again just skip the second TF, because you never see this in games.
 
The "free trilinear" is indeed a function of having twice the TFs. I was merely pointing out that the extra texels used by the "spare" TF are probably already present in cache anyway, so there's no additional bandwidth hit. The 1:1 TA:TF ratio probably has better bandwidth usage, though, as there are more independent memory requests in flight for the scheduler/optimizer to work with.
 
Ok. I think I'll just think of it this way: there are indeed a lot more addresses involved, but since they are "easy" to calculate, they aren't mentioned anywhere explicitly (and only the 4 or 8 fully independent ones are listed for the TA unit).
Are npot textures still not mipmapped in DX10? In OGL they sure are, with full orthogonality for npot textures (wrap modes, mipmapping, etc.) as of ARB_texture_non_power_of_two, which is an OGL 2.0 core feature. (And yes, I know there's a lot of hardware out there that can't do it, but the driver announces OGL 2.0 and desperately tries to hack up that feature somehow.) There's NO difference between pot and npot textures from an application's point of view any longer.
 
Really? How is that possible? Are the mipmap dimensions rounded up from the parent texture? Then how do you fill them?

If you used a repeat mode, what happens with filtering at the boundary when a lower mipmap is used?

EDIT: never mind, I googled ARB_texture_non_power_of_two and read the spec. You're right, that does make things more complicated. Anyway, if the address calculations for the second TF are significant, then G80 probably just doesn't use it. npot+filtering is very rarely needed.
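For the record, the extension rounds down rather than up: each level is the floor of half the previous one, clamped to 1, down to 1x1. The minimal sketch below (hypothetical function names) builds that chain and shows why it complicates the trilinear shortcut: a 5-wide level is followed by a 2-wide one, a 0.4 scale rather than an exact halving, so the coarser level's coordinates can't simply be derived by halving the finer one's.

```python
def npot_mip_chain(width, height):
    """Mip level sizes under the round-down (floor) rule, clamped to 1, down to 1x1."""
    levels = [(width, height)]
    while levels[-1] != (1, 1):
        w, h = levels[-1]
        levels.append((max(1, w // 2), max(1, h // 2)))
    return levels


def level_scale(levels, n):
    """Width ratio between level n+1 and level n; exactly 0.5 only when the width is even."""
    return levels[n + 1][0] / levels[n][0]
```

A 640x480 texture gets 10 levels (640x480 down to 1x1), and `level_scale(npot_mip_chain(5, 3), 0)` is 0.4.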
 