RV870 texture filtering

I'm still a bit confused on this as well. In the context of some of the reviews it almost sounded like all kinds of values are now interpolated on the SPs, courtesy of the new instructions, including both vertex attributes and texture samples. After reading this discussion though, it sounds like only vertex attributes are interpolated by the shader core (which was my initial impression) and texture filtering is still fixed function. Is that correct?
Yes of course that is correct.

If you did bilinear filtering in the ALUs, they would also need to do the rest of the filtering (including AF), and there would be no TMUs ;)

Doing the vertex value interpolation in the ALUs is a wise design decision: more performance per die space.
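
To make "vertex value interpolation in the ALUs" concrete, here is a minimal Python sketch of barycentric attribute interpolation written as two multiply-adds, which is the kind of work that moves onto the general shader ALUs. The function name and the (i, j) parameterization are illustrative assumptions, not the actual RV870 instruction semantics.

```python
def interpolate_attribute(p0, p1, p2, i, j):
    """Interpolate one scalar vertex attribute across a triangle.

    p0, p1, p2: attribute values at the three vertices.
    i, j: barycentric weights of vertices 1 and 2 (vertex 0 gets 1 - i - j).
    Hypothetical helper, for illustration only.
    """
    # Two multiply-adds: p0 + i*(p1 - p0) + j*(p2 - p0)
    partial = p0 + i * (p1 - p0)
    return partial + j * (p2 - p0)


# At a vertex the result equals that vertex's own value.
at_v0 = interpolate_attribute(2.0, 4.0, 8.0, 0.0, 0.0)
at_v1 = interpolate_attribute(2.0, 4.0, 8.0, 1.0, 0.0)
at_v2 = interpolate_attribute(2.0, 4.0, 8.0, 0.0, 1.0)
# At the centroid all three vertices contribute equally.
at_centroid = interpolate_attribute(2.0, 4.0, 8.0, 1.0 / 3.0, 1.0 / 3.0)
```

Texture filtering is a different operation (a weighted blend of fetched texels), which is why it can stay in fixed-function hardware while this per-pixel arithmetic runs on the SPs.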

G80 and GT200 used separate mini-SPs for this work. They are located in the shader core, but not used for general shading. E.g. G80 had 128 SPs for general shading and 128 mini-SPs for interpolation and special functions.
AFAIK GT200 actually can use the MUL if it is not used for interpolation or the SFU is used for transcendentals. Also I've seen an NV patent that claims that there are many transistors shared between the circuits for interpolation and transcendentals.
 
The irony in all this is that where the filtering happens is NOT as important as that other number, 272 billion unfiltered 32-bit fetches per sec, under the assumption that the programmer can now bypass the filtering bottleneck using unfiltered texture fetches and directly hit L1 (general purpose vector gather).

This could be quite useful in CS work such as in-game full screen post processing, etc!

So with all the confusion on R800 details, this is one I'd like to see profiled!

There are multiple possibilities here such as,

(a.) That number is bogus and cannot be realized using point sample texture fetches. In which case the number simply refers to sample capacity in the TUs themselves.

(b.) That number represents fetch4/gather4 performance only. Must access a 2x2 texel quad. Seems possible if filtering is in the TUs. Possibly would NOT require a wider TU to SIMD core interconnect if it didn't expand texels to floats prior to the transfer (under the assumption filtered texels are always expanded to floats as a result of texture filter interpolation).

(c.) That number represents peak performance of single 32-bit sample point sampled texture fetch, which would be unlikely.
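
The 272 billion figure is consistent with simple arithmetic if each texture unit returns four unfiltered 32-bit values per clock. A back-of-envelope check in Python, assuming the HD 5870's published 80 texture units and 850 MHz core clock (those two numbers are not from this thread):

```python
# Assumed HD 5870 specs (published figures, not stated in this thread).
TEXTURE_UNITS = 80
CORE_CLOCK_MHZ = 850
# Four raw 32-bit values per unit per clock, i.e. one 2x2 bilinear footprint.
FETCHES_PER_UNIT_PER_CLOCK = 4

# Bilinear-filtered rate in millions of texels per second.
bilinear_mtexels = TEXTURE_UNITS * CORE_CLOCK_MHZ
# Unfiltered rate in millions of 32-bit fetches per second.
unfiltered_mfetches = bilinear_mtexels * FETCHES_PER_UNIT_PER_CLOCK
```

The arithmetic alone cannot distinguish (b.) from (c.): both give four values per unit per clock, and the open question is whether those four values must come from one 2x2 quad or can come from four independent addresses.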
 
(c.) That number represents peak performance of single 32-bit sample point sampled texture fetch, which would be unlikely.
I dunno ... a banked cache with 4 independently addressed 32 bit banks/ports kinda makes sense for a texture cache, and it would behave like that if you exposed that addressing to the shaders.
 
This
(b.) That number represents fetch4/gather4 performance only. Must access a 2x2 texel quad. Seems possible if filtering is in the TUs. Possibly would NOT require a wider TU to SIMD core interconnect if it didn't expand texels to floats prior to the transfer (under the assumption filtered texels are always expanded to floats as a result of texture filter interpolation).
is how prunedtree's matrix multiplication is able to exceed 1 TFLOPS single-precision on HD4890:

http://forum.beyond3d.com/showthread.php?t=54842

So 16 fp32s as a quad makes 64 bytes, which is a nice size for a cache line.
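
The 64-byte figure follows from a 2x2 quad of four-channel fp32 texels. A quick check in Python; the 64-byte cache line is the suggestion made above, not a confirmed hardware detail:

```python
TEXELS_PER_QUAD = 4      # 2x2 footprint
CHANNELS_PER_TEXEL = 4   # e.g. RGBA, each stored as fp32
BYTES_PER_FP32 = 4

values_per_quad = TEXELS_PER_QUAD * CHANNELS_PER_TEXEL
bytes_per_quad = values_per_quad * BYTES_PER_FP32
```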

Now the intrigue is what's the performance like on HD5870...

Jawed
 
(c.) That number represents peak performance of single 32-bit sample point sampled texture fetch, which would be unlikely.
In order for that to work, you would need:
Code:
texld r0, r0, s0
texld r1, r1, s0
texld r2, r2, s0
texld r3, r3, s0
to work as fast as:
Code:
texld r4, r4, s1
where s0 is point-sampled and s1 is bilinear filtered. This isn't reasonable in the general case. Gather/Fetch4 works because we already get the 4 point samples when we do bilinear filtering so all that needs to be done is return the raw values instead.

Note that DX11 Gather specifies some things. For one, you specify what channel you want returned. For example, say you have a 32-bit ARGB surface and you want the 4 alpha values, then you would select that alpha channel for the Gather. In other words, it will take multiple instructions to get more than one channel.

Also remember, Gather takes a single address as input.
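
The relationship between bilinear filtering and Gather can be sketched in Python: both read the same 2x2 footprint from a single address, and Gather returns the four raw values of one selected channel instead of the weighted blend. This is a toy model under stated assumptions: the texture is a list of rows of (A, R, G, B) tuples, addressing is clamped, coordinates are in texel units, and the return order is plain row-major rather than the order the API defines.

```python
import math


def footprint(texture, u, v):
    """Return the 2x2 texels around (u, v) and the fractional weights."""
    height, width = len(texture), len(texture[0])
    # Texel centers sit at integer + 0.5, so shift by half a texel.
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def clamp(i, n):
        return max(0, min(n - 1, i))

    xs = (clamp(x0, width), clamp(x0 + 1, width))
    ys = (clamp(y0, height), clamp(y0 + 1, height))
    quad = [texture[j][i] for j in ys for i in xs]  # row-major order
    return quad, fx, fy


def gather4(texture, u, v, channel):
    """One address in, four raw values of a single channel out."""
    quad, _, _ = footprint(texture, u, v)
    return [texel[channel] for texel in quad]


def bilinear(texture, u, v, channel):
    """Same footprint as gather4, but blended into one value."""
    quad, fx, fy = footprint(texture, u, v)
    a, b, c, d = (texel[channel] for texel in quad)
    top = a + fx * (b - a)
    bottom = c + fx * (d - c)
    return top + fy * (bottom - top)


# 2x2 texture; channel 0 is alpha in this made-up (A, R, G, B) layout.
ALPHA = 0
texture = [
    [(0.0, 1.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)],
    [(2.0, 1.0, 1.0, 1.0), (3.0, 1.0, 1.0, 1.0)],
]
alphas = gather4(texture, 1.0, 1.0, ALPHA)
blended = bilinear(texture, 1.0, 1.0, ALPHA)
```

Getting a second channel means calling `gather4` again with a different `channel`, which mirrors the point above that more than one channel takes multiple instructions.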
 
Ati 5870 Filtering

This link appears to show that the 5870 is still using some kind of optimisation in its texture filtering. The filter tester shows a perfect implementation of AF, but there are comparison videos further down that show shimmering on actual 'gamelike' textures. They mention underfiltering in the review, but the translation isn't the easiest to understand.

I thought the 5870 was supposed to offer perfect AF and to be doing trilinear all the time by default? Is this simply a bug? Or is it another case of overenthusiastic optimisations?
 
Interesting.

I didn't know Novum wrote a flat plane version too ... I wonder why they didn't use his earlier pipe version (it could test both depth translation and rotation, although I guess you could just add a rotation to the flat plane version as well).
 
This isn't looking good. Here's another review showing the same problem and coming to the conclusion that the AF quality of the 5870 is equal to a Quality setting on a current Geforce. Not even High Quality. What on earth is the point of having near perfect AF if you then go and fudge it up by screwing with the mip blending?
 
This isn't looking good. Here's another review showing the same problem and coming to the conclusion that the AF quality of the 5870 is equal to a Quality setting on a current Geforce. Not even High Quality. What on earth is the point of having near perfect AF if you then go and fudge it up by screwing with the mip blending?
It doesn't have near perfect AF. The shimmering is caused by AF undersampling, not (as much) by mipmap filtering.
The HD5870 was never supposed to have perfect AF. Angle independence matters far less for near perfect AF than the absence of texture shimmering (which is caused by an inadequate number of AF samples). It was just that far too many reviewers drew hasty conclusions from faulty logic. From these facts:

'Angle dependencies' cause 'blurring at some angles'
'AF undersampling' causes 'texture shimmering'


They basically drew the faulty conclusion that:

no 'Angle dependencies' means no 'texture shimmering'


They really should know better, because this is really basic stuff, but unfortunately far too many reviewers don't know what they are talking about. Good to see that some do.
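
Why undersampling shimmers regardless of angle can be shown with a toy 1D model in Python. It is not the hardware's actual sample selection: it assumes a checkerboard texture, a fixed mip level, and evenly spaced point taps along the footprint, and it contains no angle at all, so whatever shimmer appears comes purely from the tap count.

```python
import math


def checker(x):
    """1D checkerboard: texel values alternate 1, 0, 1, 0, ..."""
    return 1.0 if math.floor(x) % 2 == 0 else 0.0


def af_filter(offset, footprint_len, taps):
    """Average `taps` evenly spaced point samples across the footprint."""
    spacing = footprint_len / taps
    samples = [checker(offset + (i + 0.5) * spacing) for i in range(taps)]
    return sum(samples) / taps


def shimmer(footprint_len, taps):
    """Swing of the filtered value as the footprint slides over the texture."""
    offsets = [k * 0.25 for k in range(9)]  # 0.0 to 2.0 texels
    values = [af_filter(o, footprint_len, taps) for o in offsets]
    return max(values) - min(values)


enough_taps = shimmer(16, 16)   # one tap per texel: stable grey
too_few_taps = shimmer(16, 4)   # one tap per four texels: flips black/white
```

With one tap per texel the result stays at a constant 0.5 as the footprint moves; with a quarter of the taps the output swings across the full range, which on screen reads as shimmering.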
 
In fact, shimmering seems to be slightly better than on RV770. Not as good as on G80/GT200 at maximum quality, but it's a step in the right direction.
 
I think it was that one, and yes, the test in general is quite math intensive. However, we aren't exactly short on math power, so in the previous gen this test is interpolator limited.

Sorry to dig this up again, but just today I've stumbled upon this older test (of ours):
http://www.pcgameshardware.com/aid,...w-AMD-graphics-card-reviewed/Reviews/?page=12

The HD 4670 performed abnormally slowly in Vantage's Perlin Noise test, but it certainly couldn't be interpolator limited, could it?
 