Hellbinder says full trilinear would cost 50% performance?

bloodbob

Trollipop
Veteran
Hellbinder is the one in the know for ATI; he always has the scoop, and here is what he had to say.
Originally posted by Hellbinder
That's not accurate, because ATI with their optimizations can run at 16x AF with virtually no loss of speed, whereas the GF4 is only doing 8x.

That means that the primary texture is pushed so far back into the scene before the first mip-map is even generated that you almost don't need to use it. 16x AF is a whole different ball game.

Scene complexity per frame is growing at an astronomical rate, especially in the coming year. Cards simply will not have the ability to do full trilinear on an entire frame without at least a 50% hit in performance. That's just the bottom line.


Well I guess this is probably why ATI won't give us a full trilinear option in the control panel.
 
While I could easily be quite wrong, I think moving from single-cycle* to dual-cycle trilinear would cost you 50% of your performance only if you were constantly limited by texture lookup/filtering. If, on the other hand, you were bound by CPU, system, video bandwidth, transform, fillrate, or pixel shader ops, that would not be the case, and I assume those scenarios are more likely.

*This is assuming that the ATI "trylinear" is single cycle, which to my knowledge has not yet been confirmed or denied.
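
To put rough numbers on that argument, here is a minimal sketch in Python. It assumes a very simplified model in which texture filtering overlaps completely with all other work, so the frame time is just the slower of the two. The function names and the millisecond figures are made up for illustration, not measurements from any card.

```python
def frame_time(texture_ms, other_ms):
    """Frame time under the assumed model: the slower stage sets the pace."""
    return max(texture_ms, other_ms)


def trilinear_fps_loss(texture_ms, other_ms, trilinear_cost=2.0):
    """Fraction of frame rate lost when texture time is multiplied by
    trilinear_cost (2.0 = going from single-cycle to dual-cycle filtering)."""
    before = frame_time(texture_ms, other_ms)
    after = frame_time(texture_ms * trilinear_cost, other_ms)
    return 1.0 - before / after


# Hypothetical workloads (milliseconds per frame).
texture_bound = trilinear_fps_loss(texture_ms=10.0, other_ms=4.0)  # texture is the limit
balanced = trilinear_fps_loss(texture_ms=6.0, other_ms=10.0)       # texture becomes the limit
other_bound = trilinear_fps_loss(texture_ms=4.0, other_ms=10.0)    # texture never the limit

print(f"texture bound: {texture_bound:.0%}")
print(f"balanced:      {balanced:.0%}")
print(f"other bound:   {other_bound:.0%}")
```

Under these assumptions the full 50% only shows up when texture filtering is the bottleneck both before and after the change; in the other two cases the loss is smaller or zero.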

nota bene: What exactly does Hellbinder mean by saying that you don't even need the first mipmap? If you have a 256x256 texture on a wall and the wall occupies a ~128x~128 area in screen space, there would have to be a smaller mipmap to prevent undersampling and aliasing.
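
The wall example can be worked through with the usual isotropic level-of-detail rule, where the mip level is log2 of the number of texels that map onto one pixel. This is a textbook simplification for a square, screen-aligned surface, not a description of what any particular card does, and `mip_level` is a hypothetical helper name.

```python
import math


def mip_level(texture_size, screen_size):
    """Isotropic mip level for a square texture of texture_size texels
    covering screen_size pixels along one axis (simplified model)."""
    texels_per_pixel = texture_size / screen_size
    lod = math.log2(texels_per_pixel)
    # Clamp: level 0 is the base texture, the last level is the 1x1 mipmap.
    return min(max(lod, 0.0), math.log2(texture_size))


wall = mip_level(256, 128)       # the case from the post
same_size = mip_level(256, 256)  # one texel per pixel
far_away = mip_level(256, 64)    # four texels per pixel

print(wall, same_size, far_away)
```

For the 256x256 texture on a 128x128 area this gives level 1, that is, the 128x128 mipmap, which is the smaller mipmap the post says is needed.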

I also don't understand why he says trilinear will become more of a burden in the future. I would figure that, with arithmetic operations and non-color texture lookups (shadow maps, etc.) taking up more and more computational resources, applying trilinear to traditional color textures would actually be less of a burden in percentage terms.
 
In my understanding, if you decide to implement fast anisotropic sampling you are fetching many texture samples from the L1 cache, and you can easily fetch enough samples to approximate the lower-level texture.
After all, aniso in one direction is approximating a stretched box filter; this is just generalising it :)

Just like with CPUs, the cost of accessing memory in a non-linear manner (unlike FB accesses) has become extremely prohibitive, and extra calculations are more economical in terms of time (in the same way as switching from maths tables to actually calculating series in games).
 
Re: Hellbinder says full trilinear would cost 50% preformanc

bloodbob said:
Hellbinder is the one in the know for ATI; he always has the scoop, and here is what he had to say.
Originally posted by Hellbinder
That's not accurate, because ATI with their optimizations can run at 16x AF with virtually no loss of speed, whereas the GF4 is only doing 8x.

That means that the primary texture is pushed so far back into the scene before the first mip-map is even generated that you almost don't need to use it. 16x AF is a whole different ball game.

Scene complexity per frame is growing at an astronomical rate, especially in the coming year. Cards simply will not have the ability to do full trilinear on an entire frame without at least a 50% hit in performance. That's just the bottom line.
That's just blatantly false. The performance hit certainly is not going to exceed 50% for full trilinear, but it can get close to that. It really depends upon where the limits are. If the program is almost entirely fillrate-limited, then the performance hit can get close to 50%. If it is mostly memory bandwidth-limited, the performance hit will be much less than that (as an example, the most highly bandwidth-limited GPU ever was the GeForce2 GTS: it had essentially free trilinear).

This isn't an entirely different scenario from any previous GPU. It's just that current GPUs tend to be more fillrate-limited.

Edit: By the way, with current GPUs, the answer to getting full anisotropic/trilinear without a performance hit would be to hide the latency involved in filtering the texture via long pixel shaders. It may not be possible today to do this, but I doubt it'll be more than 1-2 generations.
 
Re: Hellbinder says full trilinear would cost 50% preformanc

Chalnoth said:
Edit: By the way, with current GPUs, the answer to getting full anisotropic/trilinear without a performance hit would be to hide the latency involved in filtering the texture via long pixel shaders. It may not be possible today to do this, but I doubt it'll be more than 1-2 generations.


What makes you think this isn't the case already? Run any test that has long pixel shaders with more ALU instructions than texture instructions, turn AF on, and check the perf difference... if it isn't free you have a sucky card... return it :)
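
A rough way to see why long shaders can make filtering look free: the sketch below assumes the ALUs and the texture units run fully in parallel with perfect latency hiding, so a pixel costs whichever of the two takes longer. The instruction counts and the one-versus-two cycles per fetch are hypothetical, and the function names are invented for this example.

```python
def shader_cycles(alu_ops, tex_fetches, cycles_per_fetch):
    """Per-pixel cost when ALU and texture work overlap perfectly (assumed)."""
    return max(alu_ops, tex_fetches * cycles_per_fetch)


def filtering_slowdown(alu_ops, tex_fetches, cheap=1, expensive=2):
    """Slowdown factor when each fetch goes from `cheap` to `expensive` cycles."""
    slow = shader_cycles(alu_ops, tex_fetches, expensive)
    fast = shader_cycles(alu_ops, tex_fetches, cheap)
    return slow / fast


# ALU-heavy shader: 20 ALU ops hide 4 fetches even at 2 cycles each.
long_shader = filtering_slowdown(alu_ops=20, tex_fetches=4)
# Texture-heavy shader: nothing to hide behind, so the cost doubles.
short_shader = filtering_slowdown(alu_ops=2, tex_fetches=2)

print(long_shader, short_shader)
```

In this model the more expensive filter is only visible once the texture cycles exceed the ALU cycles, which matches the "more ALU instructions than texture instructions" condition in the post.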
 
Crazyace said:
Just like with CPUs, the cost of accessing memory in a non-linear manner (unlike FB accesses) has become extremely prohibitive, and extra calculations are more economical in terms of time (in the same way as switching from maths tables to actually calculating series in games).
Memory bandwidth is a small part of the cost of texture filtering.
 
Chalnoth said:
Memory bandwidth is a small part of the cost of texture filtering.
Well, I think back on the GF2 the memory bandwidth was 100% of the performance hit, as trilinear was a completely single-pass fixed-function operation.
 
Edit: By the way, with current GPUs, the answer to getting full anisotropic/trilinear without a performance hit would be to hide the latency involved in filtering the texture via long pixel shaders. It may not be possible today to do this, but I doubt it'll be more than 1-2 generations.

That would be the "cheapest" solution in terms of hardware cost. I'd prefer a more costly approach that gets rid of any angle dependency altogether.
 
Isn't the GeForce 4's multitexture fill rate effectively halved when anisotropic filtering is enabled, because the second TMU does not work in Direct3D? That was my simplistic understanding of it.

Changing a GeForce 4 Ti 4200 from 1000/2000 to about 600/600 in fill rates?

HB's conclusion doesn't make sense if you take that into account, no?

I mean, having your fill rate halved could effectively explain where he got his 50% conclusion?
 
ChrisRay said:
Isn't the GeForce 4's multitexture fill rate effectively halved when anisotropic filtering is enabled, because the second TMU does not work in Direct3D? That was my simplistic understanding of it.
IIRC the NV25 cannot use the second TMU while bilinear AF is in use. When you're using trilinear AF, the NV25 can use the second TMU. The NV20 doesn't have this limitation. This is the reason why a GF3 Ti 500 was sometimes faster than a GF4 Ti.
 
Re: Hellbinder says full trilinear would cost 50% preformanc

bloodbob said:
Well I guess this is probably why ATI won't give us a full trilinear option in the control panel.

Figures I'm getting are more in the region of 2%! (Although I suspect that's not wholly accurate.) However, the higher the level of AF applied, the more you are using samples from just the first mip level in the first place, so the comparative reduction in performance is reduced with AF.
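
The point about AF pushing the mip transitions back can be sketched with the commonly described anisotropic LOD rule: the footprint's long axis is divided by the number of AF samples before taking log2. This is a simplified version (no rounding of the sample count, no driver optimizations), `af_lod` is a hypothetical helper, and the 16:1 footprint is just an example of a steeply angled surface.

```python
import math


def af_lod(major_texels, minor_texels, max_aniso):
    """Mip level for a pixel footprint major_texels long and minor_texels wide,
    with up to max_aniso samples taken along the major axis (simplified)."""
    ratio = min(major_texels / minor_texels, max_aniso)
    ratio = max(ratio, 1.0)  # never use fewer than one sample
    return max(0.0, math.log2(major_texels / ratio))


# The same 16:1 footprint under different filtering settings.
no_af = af_lod(16, 1, max_aniso=1)   # plain isotropic filtering
af_8x = af_lod(16, 1, max_aniso=8)
af_16x = af_lod(16, 1, max_aniso=16)

print(no_af, af_8x, af_16x)
```

With 16x AF this footprint stays on the base level, so there is no second mip level to blend with there. Whether the hardware actually skips the extra trilinear work in that case is an assumption here, not something the thread confirms.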
 
Re: Hellbinder says full trilinear would cost 50% preformanc

DaveBaumann said:
bloodbob said:
Well I guess this is probably why ATI won't give us a full trilinear option in the control panel.

Figures I'm getting are more in the region of 2%! (Although I suspect that's not wholly accurate.) However, the higher the level of AF applied, the more you are using samples from just the first mip level in the first place, so the comparative reduction in performance is reduced with AF.

That just proves Hellbinder wrong.
 
Re: Hellbinder says full trilinear would cost 50% preformanc

mikechai said:
DaveBaumann said:
bloodbob said:
Well I guess this is probably why ATI won't give us a full trilinear option in the control panel.

Figures I'm getting are more in the region of 2%! (Although I suspect that's not wholly accurate.) However, the higher the level of AF applied, the more you are using samples from just the first mip level in the first place, so the comparative reduction in performance is reduced with AF.

That just proves Hellbinder wrong.

Well, if it's only 2% they should just put the full option in and still end up beating NVIDIA in benchmarks.

Personally, this subject is getting way out of hand.
 
Re: Hellbinder says full trilinear would cost 50% preformanc

DaveBaumann said:
Figures I'm getting are more in the region of 2%! (Although I suspect that's not wholly accurate.) However, the higher the level of AF applied, the more you are using samples from just the first mip level in the first place, so the comparative reduction in performance is reduced with AF.

Sadly, we do not have the X800 Pro with us anymore, but Dave, could you be so kind as to see how performance changes at default clock speed using UT2003 and the pyramid demo?

Our tests with colored mipmaps showed a ~20% performance degradation in a real game scenario, but that was with AA enabled* to better keep CPU and system influences out of the benchmark results.

Maybe you could just try and bench 16x12x3 with max. details and 16xAF?
If you're still CPU-bound there, maybe you can enable some AA as well.

edit:
*and the X800 clocked down to R9800XT's Fillrate and Bandwidth
 
@ quasar
Borsti said:
I received additional info from Epic:

xxx @ NVIDIA just pointed out that filling textures with a constant color/alpha might
cause more pixels to be written to the framebuffer (passing alpha test) which could
affect performance. Totally forgot about that one and I should probably change the code
to leave the alpha channel alone for our next generation engine :)


This would explain why there's always a performance hit, even with full bilinear and with NV40 as well.

Lars - THG

http://www.beyond3d.com/forum/viewtopic.php?t=12486&postdays=0&postorder=asc&start=240

Wouldn't this mean that results from the UT engine would be kinda inexact or misleading?
 
christoph said:
Quasar said:
... but then, how greatly did this affect performance on our R9800XT?
it may influence the results though...

No it didn't.

Compare the R9800XT results of the normal run and the run with colored mipmaps (aka "speciale" firstcoloredmip 1).
 