Do many xbox titles use trilinear filtering?

Luminescent · Nov 4, 2002

Just wondering if titles the likes of Halo, DOA 3, Splinter Cell, Shenmue 2, etc. use trilinear texture filtering. I know that the NV2A incurs a performance hit with trilinear (half the bilinear fillrate, unless developers figured out how to use both of a pipeline's TMUs in conjunction for filtering?). Do many titles utilize this superior method, and is it plausible on the Xbox with acceptable performance?

Can someone who develops on the Xbox expound on the reasons behind the trilinear performance penalties on the NV2A?

Thank you

ERP · Nov 4, 2002

Just wondering if titles the likes of Halo, DOA 3, Splinter Cell, Shenmue 2, etc. use trilinear texture filtering.

I doubt that there are many XBox games that don't use trilinear.

I know that the NV2A incurs a performance hit with trilinear (half the bilinear fillrate, unless developers figured out how to use both of a pipeline's TMUs in conjunction for filtering?). Do many titles utilize this superior method, and is it plausible on the Xbox with acceptable performance?

Your information is incorrect. Trilinear is for all intents and purposes free on NV2A. In our product, one of our debug modes allowed you to force various texture modes, overriding the artist-specified default. There was no measurable difference in performance between trilinear and point sampling.

You do take a penalty when you enable aniso; the actual penalty is very dependent on the max aniso level and the scene. There is also a similar penalty for forcing a negative mipmap bias.

Luminescent · Nov 4, 2002

Thank you for your reply ERP, I guess I got aniso and trilinear performance mixed up.

Panajev2001a · Nov 4, 2002

ERP, are you sure there is no performance hit with tri-linear ( BTW, Shenmue II is not using tri-linear, it really looks like anisotropic filtering + bi-linear ) ?

Are you saying it can pump out ( 250 MHz clock ) 2 GTexels/s with tri-linear ? AFAIK, didn't it do 2 GTexels/s with bi-linear and 1 GTexels/s with tri-linear ?

I'm not saying anything you do not know, but having ZERO hit when you are requesting 2x the number of texels per pixel ( for tri-linear ) and you only have two TMUs per pipe, it is tough to maintain the same speed...

Unless each TMU in the 4 NV2A pipes can do a single texel tri-linear filtered I cannot see how the performance of tri-linear is THE same as bi-linear... 2 textures per cycle AND tri-linear filtering... ? uhm...
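For readers following the numbers, the arithmetic behind the 2 GTexels/s and 1 GTexels/s figures in this post can be sketched as below. This is only an illustration of the poster's model, not a statement about the hardware: the clock, pipeline count and TMU count are the values quoted in the thread, and `cycles_per_sample` is a hypothetical parameter standing in for the assumption that a trilinear sample occupies a TMU for two clocks.

```python
# Theoretical peak texel rate, using the figures quoted in the post above.
CLOCK_HZ = 250e6       # clock quoted in the thread
PIPELINES = 4          # pixel pipelines quoted in the thread
TMUS_PER_PIPE = 2      # texture units per pipeline quoted in the thread


def peak_texel_rate(clock_hz, pipelines, tmus_per_pipe, cycles_per_sample=1):
    """Filtered samples per second if each sample ties up a TMU
    for `cycles_per_sample` clocks (a simplifying assumption)."""
    return clock_hz * pipelines * tmus_per_pipe / cycles_per_sample


bilinear = peak_texel_rate(CLOCK_HZ, PIPELINES, TMUS_PER_PIPE)
# Hypothetical "half rate" model: trilinear costs two TMU cycles per sample.
trilinear_if_two_cycles = peak_texel_rate(
    CLOCK_HZ, PIPELINES, TMUS_PER_PIPE, cycles_per_sample=2
)

print(f"bilinear peak:            {bilinear / 1e9:.1f} GTexels/s")
print(f"trilinear, 2-cycle model: {trilinear_if_two_cycles / 1e9:.1f} GTexels/s")
```

Whether the two-cycle assumption actually holds on NV2A is exactly what is being asked here.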

ERP · Nov 4, 2002

Are you saying it can pump out ( 250 MHz clock ) 2 GTexels/s with tri-linear ? AFAIK, didn't it do 2 GTexels/s with bi-linear and 1 GTexels/s with tri-linear ?

In the real world (bandwidth constrained and all) it will never pump out 2 GTexels/s. In benchmarks it hits about 1.3-1.4 GTexels/s peak, with or without trilinear.

I'm not saying anything you do not know, but having ZERO hit when you are requesting 2x the number of texels per pixel ( for tri-linear ) and you only have two TMUs per pipe, it is tough to maintain the same speed...

The cache is optimised for the trilinear case. Its read bandwidth is such that it can supply enough texels for trilinear in the majority of cases. Yes, "majority" implies that there are cases where trilinear takes longer, and my understanding is that this is the case; however, I have not observed a significant cost when enabling trilinear. Bear in mind though that we used almost exclusively DXTC textures, and that reduces texture bandwidth significantly.
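To put rough numbers on the DXTC remark, here is a small sketch. The texel counts per sample are the standard definitions of each filter, and the DXT block sizes are the standard ones (a 4x4 block in 64 bits for DXT1, 128 bits for DXT3/DXT5). The function name is made up for illustration, and the model deliberately ignores cache reuse and the fact that compressed data is fetched in whole blocks, so treat it as an upper bound on raw fetch cost rather than a description of the NV2A cache.

```python
# Source texels read per filtered sample (standard filter definitions).
TEXELS_PER_SAMPLE = {"point": 1, "bilinear": 4, "trilinear": 8}

# Storage cost per texel in bits. DXT1 packs a 4x4 block into 64 bits,
# DXT3/DXT5 pack a 4x4 block into 128 bits.
BITS_PER_TEXEL = {
    "RGBA8888": 32,
    "RGB565": 16,
    "DXT1": 64 / 16,
    "DXT5": 128 / 16,
}


def bits_per_sample(filter_mode, fmt):
    """Bits touched for one filtered sample, assuming no cache reuse at all."""
    return TEXELS_PER_SAMPLE[filter_mode] * BITS_PER_TEXEL[fmt]


for fmt in ("RGBA8888", "DXT1"):
    for mode in ("bilinear", "trilinear"):
        print(f"{fmt:9s} {mode:10s} {bits_per_sample(mode, fmt):6.0f} bits")
```

Under this simplified model a trilinear sample from a DXT1 texture touches fewer bits than a bilinear sample from an uncompressed 32-bit texture.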

Unless each TMU in the 4 NV2A pipes can do a single texel tri-linear filtered I cannot see how the performance of tri-linear is THE same as bi-linear... 2 textures per cycle AND tri-linear filtering... ? uhm...

This is a basic misconception. Your understanding of how the pipelines work is flawed; it's based on simplistic explanations that float around on the internet. The resources are not all separate, the pipelines are tied together and the spatial coherency implied by this is exploited.
The limitation isn't the ability to compute the filtering, it's the bandwidth required to read the source texels from the cache. And as I mentioned above, this limit is such that in the majority of cases trilinear takes 1 cycle.
The exploitation of the coherency is probably why setting a negative mipmap bias is so expensive: it increases the size of the sample mask for the pipeline, exceeding the maximum read rate from the cache in a larger percentage of cases.
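A sketch of why a negative bias spreads the samples out, using the textbook mip selection rule (level = log2 of the texel-to-pixel ratio, plus the bias). The function and parameter names are made up for illustration, and nothing here models the actual NV2A cache; it only shows how far apart, in texels of the selected level, the samples of neighbouring pixels land.

```python
import math


def texel_step(rho, bias, max_level=10):
    """Distance, in texels of the selected mip level, between the samples
    taken by two adjacent screen pixels.

    rho: base-level texels covered per screen pixel.
    bias: mipmap LOD bias (negative selects a more detailed level).
    """
    lod = min(max(math.log2(rho) + bias, 0.0), max_level)
    return rho / 2.0 ** lod


for bias in (0, -1, -2, -3):
    step = texel_step(16.0, bias)
    # Footprint of a group of neighbouring pixels grows with step squared.
    print(f"bias {bias:2d}: step {step:.0f} texels, area x{step * step:.0f}")
```

With no bias, neighbouring pixels read neighbouring texels and share most of their fetches. Each step of negative bias doubles the spacing, so the same group of pixels touches roughly four times the texel area.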

Panajev2001a · Nov 4, 2002

This is a basic misconception. Your understanding of how the pipelines work is flawed; it's based on simplistic explanations that float around on the internet. The resources are not all separate, the pipelines are tied together and the spatial coherency implied by this is exploited.
The limitation isn't the ability to compute the filtering, it's the bandwidth required to read the source texels from the cache. And as I mentioned above, this limit is such that in the majority of cases trilinear takes 1 cycle.
The exploitation of the coherency is probably why setting a negative mipmap bias is so expensive: it increases the size of the sample mask for the pipeline, exceeding the maximum read rate from the cache in a larger percentage of cases.

well I never read detailed specs and user manuals of the NV2x family so yes my ideas might be wrong... and I'm glad you take the time of schooling me on this

I remember Korval ( Tryarch ) talking about how it was the cache bandwidth limiting you from doing tri-linear at the same speed as bi-linear ( 2 textures/pixels in one cycle ) now that I think about it...

Even reading your post, you "seem" to make it clear that those benchmarks were stressing the GPU a bit with bi-linear filtering on, and the switch to tri-linear might not have dropped performance a lot "maybe" because the limitation was elsewhere... I'm sorry to note a bit of an angered tone ( as there might be after the 100th time you repeat things you have coded yourself and seen in the XDK to people who do not have that info... ) and I can understand why... still I won't mind it since you are providing a good explanation so far...

Steve Dave Part Deux · Nov 4, 2002

Korval was talking about the Geforce3's tri-linear performance. It was assumed at the time that the cache architecture of the GF3 and NV2A were nearly identical.

Panajev2001a · Nov 4, 2002

so they are not ?

strange... that would put the NV2A texture performance 2x as high as Flipper's even with tri-linear... in the real world having e-DRAM on Flipper will sustain the fill-rate a bit better and the NV2A's advantage will be much smaller ( and in some cases performance will be identical )

ERP · Nov 5, 2002

Panajev2001a,

Firstly I want to say that a lot of what people believe about NV2X is based largely on benchmarks run on PCs with GF3/4s; the problem is that it is difficult on a PC to understand what's happening. Between the machinations of the driver and the DirectX runtime it's pretty much impossible to know what you're measuring; it's even difficult to know if you're CPU or GPU bound.

Xbox DX is a very thin interface: I can see all the interrupts, and I can set up the buffers so I know there won't be stalls. The net result is that on Xbox I can measure CPU and GPU usage separately and accurately. So in general, if I'm talking about graphics performance, I will not give CPU-limited results.

OK I don't have anywhere I can currently rerun the tests so these results are from my memory.

The test involves trivially filling the screen repeatedly with large polygons, Z set to LE, source textures compressed.

1 texture, bi or tri: just over 700 MPixels/s
2 textures, bi or tri: somewhere between 600-700 MPixels/s (1.2 to 1.4 GTexels/s)
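As a quick check on the units, converting the two-texture pixel rate into a texel rate gives figures in GTexels, in line with the 1.3-1.4 GTexels peak mentioned earlier in the thread. This is just the multiplication written out; the 2 GTexels/s reference value is the theoretical figure quoted earlier by another poster, not something measured here.

```python
def texel_rate(pixel_rate, textures_per_pixel):
    """Texels per second when every pixel samples `textures_per_pixel` textures."""
    return pixel_rate * textures_per_pixel


QUOTED_PEAK = 2e9  # theoretical figure quoted earlier in the thread

low = texel_rate(600e6, 2)
high = texel_rate(700e6, 2)

print(f"{low / 1e9:.1f} to {high / 1e9:.1f} GTexels/s")
print(f"{low / QUOTED_PEAK:.0%} to {high / QUOTED_PEAK:.0%} of the quoted peak")
```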

My in-game tests show no noticeable difference (read <2%) in GPU performance between trilinear and point sampling, and performance in this case is largely limited by fillrate. If you search in the main forum, I posted some in-game aniso costs along with MIP bias costs as well; they're not all-encompassing but they give a general indication of cost.

Yes, for all intents and purposes NV2A will out-fill Flipper if you just fill the screen over and over. Flipper seems to be somewhat less sensitive to small tris (i.e. it doesn't seem to slow down as much) but it's hard to do direct comparisons because of the large discrepancy in T&L performance.

FWIW I have never seen a reasonable situation where Flipper outperforms NV2A. Filling the screen over and over with transparent textures might tip the balance, but I've never done the test to verify it.

I'm not trying to slam flipper here, GC is a really nice platform to work on, but Flipper just isn't going to win many benchmarks against NV2A.

Sorry this is one of those misconceptions that annoys me for some reason :/

The other one is people assuming that NV2A is limited by framebuffer memory bandwidth. IME this is hardly ever the case. Once we were no longer CPU bound, we found that we actually hit fill rate limits before we ran out of memory bandwidth. We could relatively easily test this by comparing performance with various AA modes and looking at the performance difference.

Panajev2001a · Nov 5, 2002

thanks a lot ERP... your post was very interesting

I guess I'll have to do more research on the NV2A, I like learning

( and do not worry, I do understand how certain things might annoy people, I just wanted to let you know it wasn't intentional ).

It doesn't surprise me that you found Flipper to have that kind of performance... even reaching or remaining close to NV2A is a success for Flipper given the clock speed disadvantage ( what would you have thought of a 202 MHz Flipper ? ) and the fact that Flipper had a somewhat smaller budget than NV2A...

Something that surprised me about Flipper was the T&L unit... assuming we mostly stick to static meshes and DX7-class features, it does perform quite well...

If you see how you add one, two textures and then one and two lights ( local not only infinite ones ) and see how the performance is still quite high... that did impress me

megadrive0088 · Nov 5, 2002

I sometimes wonder how close Flipper and XGPU would have been if they had shipped at their full original specifications: XGPU as a 300 MHz NV25, and Flipper at 202.5 MHz with more on-die 1T-SRAM.

Perhaps any eventual "Flipper2" and "XGPU2" will be a lot closer in performance in 2006. Actually, I do believe the differences will be almost insignificant by then, to all except useless benchmarks.

I know this is slightly off-topic, but I'll say it anyway. All I am really hoping for in the next generation of consoles is television-show quality prerendered graphics in real-time games. Do any of you believe that is possible in the timeframe we are talking about, of 4-5 years?

Goldni · Nov 5, 2002

Anyone know whatever happened to Korval? He really knew his stuff. After he spanked that Deadmeat guy, that was the last I heard of him... well, when PGC Forums went freebie, ads and all.

BoddoZerg · Nov 5, 2002

If NV2A is mostly fillrate limited, then why don't we see Antialiasing in more Xbox games?

Also, does the NV2A take the same humongous fillrate hit from anisotropic filtering that the GeForce4 suffers?

Tagrineth · Nov 5, 2002

BoddoZerg said:
If NV2A is mostly fillrate limited, then why don't we see Antialiasing in more Xbox games?

Well obviously if fill-rate is the limiting factor, that means there IS no extra fill-rate left for AA - it's all being used already!

(EDIT = Abject Stupidity Removed To Save Face)

Geeforcer · Nov 5, 2002

Tagrineth said:
BoddoZerg said:

If NV2A is mostly fillrate limited, then why don't we see Antialiasing in more Xbox games?

Well obviously if fill-rate is the limiting factor, that means there IS no extra fill-rate left for AA - it's all being used already!

Multisampling is largely fillrate-free.

ERP · Nov 5, 2002

If NV2A is mostly fillrate limited, then why don't we see Antialiasing in more Xbox games?

I've covered this before here: it's still not free.
Given 2x multisampling, there is some additional cost incurred because the Z compression doesn't function as effectively. In addition, the filter/copy-forwards operation is expensive.
The cost isn't necessarily prohibitive, but it does need to be planned for.

Also, does the NV2A take the same humongous fillrate hit from anisotropic filtering that the GeForce4 suffers?

OK here are my LOD Bias/Aniso benchmarks from our title, posted previously in the other forum.
Times are in ms to complete rendering of the background with various combinations of forced Aniso and Mipmap bias.
Mipmap bias across the top, aniso setting down the side. So the bottom right cell is a mipmap bias of -3 and aniso 4 (8x). If you're just interested in aniso then just look at the left-hand column.

Code:

              0     -1    -2    -3
Linear       5.2   5.4   6.0   6.3
Aniso 2      5.7   6.2   7.1   7.4
Aniso 3      6.1   6.8   8.1   8.3
Aniso 4      6.4   7.4   8.7   8.8

So to help you out: 2x = 10% slower, 8x = 23% slower. Obviously this would vary with the scene in question. And the real cost is slightly higher, since this is the total frame time and includes things like clearing the screen.

In the actual game we allowed artists to specify aniso level and mip LOD on a material-by-material basis, using it where it made a difference; in general the performance cost for using it in this fashion was extremely low.
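For anyone who wants the rest of the cells as percentages, the same calculation can be applied to the whole table. The values below are copied from the table above; the dictionary layout and function name are just for illustration.

```python
# Frame times in ms from the table above.
# Columns are mipmap bias 0, -1, -2, -3; rows are the forced aniso setting.
BIASES = [0, -1, -2, -3]
TIMES_MS = {
    "Linear":  [5.2, 5.4, 6.0, 6.3],
    "Aniso 2": [5.7, 6.2, 7.1, 7.4],
    "Aniso 3": [6.1, 6.8, 8.1, 8.3],
    "Aniso 4": [6.4, 7.4, 8.7, 8.8],
}
BASELINE_MS = TIMES_MS["Linear"][0]  # linear filtering, no bias


def slowdown_pct(time_ms, baseline_ms=BASELINE_MS):
    """Extra frame time relative to the baseline, in percent."""
    return (time_ms / baseline_ms - 1.0) * 100.0


print("bias     " + "".join(f"{b:>6d}" for b in BIASES))
for name, row in TIMES_MS.items():
    print(f"{name:9s}" + "".join(f"{slowdown_pct(t):5.0f}%" for t in row))
```

The left-hand column reproduces the 10% and 23% figures, and the bottom right cell (aniso 4 with a bias of -3) comes out at roughly 69% over the baseline.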

Legion · Nov 7, 2002

didn't Deadmeat say the flipper was an overclocked version of some card called "alladin7"?

archie4oz · Nov 7, 2002

Legion said:
didn't Deadmeat say the flipper was an overclocked version of some card called "alladin7"?

*Snicker*....

As in the motherboard embedded graphics core of the ALi ALaddin?
