TMU Asymmetry

I've noticed a trend among some GPUs (e.g. the PlayStation 3's RSX) to use more than one texture filtering unit for every texture addressing unit. Do such lopsided ratios really offer "free" anisotropic filtering, as the vendors like to claim, and are the extra units completely wasted when high-quality filtering isn't used?
 
That trend actually reversed a very long time ago. As for RSX, I'm not sure the ratio is what you think it is. For filtered texture accesses, the number of filtering and addressing units is the same, i.e. 24 pixels/clock for bilinear, 12 pixels/clock for trilinear or 2xAF with bilinear, etc. IIRC it's point sampling that can run faster.

G80 had faster texture filtering than texture addressing, but that gap was already reduced in G92, and the two became genuinely the same rate (i.e. 1/2 rate trilinear) on GT200. It hasn't really changed for NVIDIA since then in terms of filtered texturing. Unfiltered has varied a bit over time for both NVIDIA and AMD, but IIRC at the moment only AMD has a difference between the two: 32bpp point is full rate, 32bpp bilinear is full rate, 64bpp point is full rate, but 64bpp bilinear is 1/2 rate (or at least it was).

Was it a good idea at the time for e.g. G80? I'm not sure. It was nice in reviews to make 16xAF really cheap and really high quality compared to DX9 chips; I remember myself and others being excited about that at the time. But things have moved on and I'm not sure it matters as much, especially as there's more and more post-processing, which will never use more than bilinear or point filtering. So for most of the frame you're not using trilinear or anisotropic, and the extra filtering units would just be idle hardware. There are probably other reasons I'm not thinking of why it doesn't make much sense nowadays, but hopefully this is enough to satisfy your curiosity... :)
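For anyone who wants to play with the ratios, here is a rough sketch of the rates discussed above. It is only a toy model, not a description of how any real TMU is built: it assumes each addressing unit issues one texture lookup per clock, each filtering unit produces one bilinear sample per clock, and trilinear or 2xAF bilinear costs two bilinear samples. The function name and the table of costs are made up for illustration; the unit counts (24/24 for RSX, 32 addressing vs. 64 filtering for G80) are the ones quoted in this thread.

```python
# Toy throughput model (illustrative assumptions, not vendor data):
# - one addressing unit issues one texture lookup per clock
# - one filtering unit produces one bilinear sample per clock
BILINEAR_SAMPLES = {
    "bilinear": 1,
    "trilinear": 2,       # one bilinear sample from each of two mip levels
    "2xAF bilinear": 2,   # two bilinear taps along the axis of anisotropy
}

def pixels_per_clock(address_units, filter_units, mode):
    """Peak filtered pixels/clock: limited by whichever side runs out first."""
    filter_limit = filter_units / BILINEAR_SAMPLES[mode]
    return min(address_units, filter_limit)

# (addressing units, filtering units) as quoted in the thread
chips = {"RSX": (24, 24), "G80": (32, 64)}

for name, (addr, filt) in chips.items():
    for mode in BILINEAR_SAMPLES:
        print(f"{name:4s} {mode:14s} {pixels_per_clock(addr, filt, mode):g} pixels/clock")
```

Under these assumptions a 1:1 chip halves its rate for trilinear (24 down to 12), while a 1:2 chip holds the same rate for bilinear and trilinear because addressing is the limit in both cases. That is the sense in which trilinear or 2xAF is "free", and also why the second set of filtering units does nothing for plain bilinear work.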
 
On the other hand, console games tend to forgo AF a little too often (bandwidth problem?).
 
Thanks for the info. Yeah, my curiosity was partly sparked by the discrepancy in ratios between G80 and G92, wondering what the implications were. It's interesting how even the 8800 GTS 512 can best the GTX in some tests, but I assume that mostly comes down to the former's higher shader core clock. I've scoured the internet for benchmarks that proved to be TMU bound, but I haven't succeeded yet.
 
Well, with addressing (32) vs. filtering (64) units in the TMUs, having fewer addressing than filtering units means peak bilinear filtering speed may be limited (?).
Either way, AnandTech did a nice table showing the differences between G80, G92, GT200 and GF100:
[Image: TMU.png]
Source: LINK (close to the end of the page)

I wish there were a similar table for Kepler, Maxwell, Pascal and Turing...
Also, since NV made TMUs = "addressing units" for Fermi, shouldn't the same be done for G80 as well (i.e. full G80 should have 32 TMUs, not 64)?
 
Shh, don't embarrass GTX even further over its intimidatingly rivaling GTS "apprentice".
Or is it the other way around, 9800 GTX shamefully struggling to categorically outperform the 8800 GTX? Yeah, that is definitely the dunce cap for the former GPU.
 