How does trilinear filtering affect the NV2A/NV20 hardware?

Xbox's graphics processor, the NV2A aka XGPU, has a total of 8 TMUs:
2 per pixel pipeline, with 4 pipelines. It's the same configuration as the GeForce2 GTS, GF2 Ultra, all GeForce3 and all GeForce4 Ti chips. The
XGPU is basically a souped-up GeForce3, or just short of the GeForce4 Ti.
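For rough numbers, here's a quick back-of-the-envelope sketch in Python; the 233 MHz core clock is the commonly published figure, and this assumes one bilinear texel fetch per TMU per clock:

```python
# Theoretical NV2A fillrate from the pipeline/TMU configuration above.
# Assumes the commonly published 233 MHz core clock and one bilinear
# texel fetch per TMU per clock.

PIPELINES = 4        # pixel pipelines
TMUS_PER_PIPE = 2    # texture units per pipeline
CLOCK_MHZ = 233      # NV2A core clock (published figure)

pixel_fillrate = PIPELINES * CLOCK_MHZ                   # Mpixels/s
texel_fillrate = PIPELINES * TMUS_PER_PIPE * CLOCK_MHZ   # Mtexels/s

print(f"Pixel fillrate: {pixel_fillrate} Mpixels/s")  # 932
print(f"Texel fillrate: {texel_fillrate} Mtexels/s")  # 1864
```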
 
8 TMUs.

As for trilinear, there isn't a simple answer to this. I've tried to explain it previously; hunt around the technology forum and here. Suffice to say that it isn't as simplistic as most internet explanations make out.

Assuming 0 LOD bias, trilinear is in most (not all) cases free.

There are a number of limitations that can affect this; it's all really about cache architecture, not TMUs.
 
Trilinear filtering is not free on Xbox AFAIK, same with GF3 and GF4 Ti.
Bilinear filtering is free, but it takes twice the fillrate to do trilinear, I believe. There might be some situations where trilinear is free.

The GameCube's Flipper gets trilinear for free with some types of textures.
 
Bilinear filtering is free, but it takes twice the fillrate to do trilinear, I believe. There might be some situations where trilinear is free.

This is not true...
Both in raw fillrate tests and in-game, trilinear is basically free, with some exceptions.

This assumes that the texels are available in the cache. It is unlikely that a 32-bit texture would run at full speed in trilinear mode (unless it was small), but that has nothing to do with the TMUs; it has to do with texture read bandwidth and cache design.
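To make the cache argument concrete, here's a toy model in Python. The bytes-per-texel figures and the 16 bytes/clock of cache output bandwidth are purely illustrative assumptions, not NV2A specifics:

```python
# Toy model of why trilinear cost is about cache read bandwidth,
# not TMU count. All bandwidth numbers are illustrative.

def texels_per_sample(filter_mode):
    # Bilinear reads 4 texels from one mip level; trilinear reads
    # 4 texels from each of two adjacent mip levels.
    return {"bilinear": 4, "trilinear": 8}[filter_mode]

def clocks_per_sample(filter_mode, bytes_per_texel, cache_bytes_per_clock):
    """Clocks per filtered sample if the cache can deliver
    cache_bytes_per_clock bytes of texel data per clock."""
    needed = texels_per_sample(filter_mode) * bytes_per_texel
    return max(1.0, needed / cache_bytes_per_clock)

# With ~16 B/clock of cache output, a compact texture format stays
# free in trilinear mode while 32-bit textures drop to half speed.
for fmt, bpt in [("compact, 1 B/texel", 1), ("32-bit, 4 B/texel", 4)]:
    for mode in ("bilinear", "trilinear"):
        cost = clocks_per_sample(mode, bpt, cache_bytes_per_clock=16)
        print(f"{fmt:>18}, {mode:>9}: {cost:.1f} clocks/sample")
```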
 
Can you come up with a mock scenario chart of fillrate effects for 1-8 textures for the Xbox and GC? I would like to see how their fillrates would be affected.
 
Can you come up with a mock scenario chart of fillrate effects for 1-8 textures for the Xbox and GC? I would like to see how their fillrates would be affected.

Not without causing a flame war...

I'm sure someone else can provide theoretical maximums, the Xbox numbers will be somewhat below the theoretical maximums, especially the 1/2/5/6 texture cases.

This is a relatively meaningless metric out of context anyway; on real models with real shaders you are always partly vertex limited, partly bandwidth limited and partly fill limited. Exactly how that breaks down varies with triangle size, shader complexity and the hardware in question. A model that is entirely vertex bound on GC could well be entirely fill limited on Xbox.
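As a toy illustration of that point (Python, with entirely invented numbers that are deliberately not tied to either console):

```python
# Frame time is gated by whichever stage is slowest, and the binding
# stage shifts with the workload and the machine. Numbers invented.

def bottleneck_ms(verts, pixels, bytes_moved,
                  verts_per_sec, pixels_per_sec, bytes_per_sec):
    times = {
        "vertex": verts / verts_per_sec,
        "fill": pixels / pixels_per_sec,
        "bandwidth": bytes_moved / bytes_per_sec,
    }
    # Stages overlap, so frame time is roughly the max, not the sum.
    stage = max(times, key=times.get)
    return stage, times[stage] * 1000

# Same scene, two hypothetical machines with different balances.
scene = dict(verts=2_000_000, pixels=50_000_000, bytes_moved=200_000_000)
machines = {
    "machine A": dict(verts_per_sec=60e6, pixels_per_sec=500e6, bytes_per_sec=3e9),
    "machine B": dict(verts_per_sec=30e6, pixels_per_sec=900e6, bytes_per_sec=6e9),
}
for name, rates in machines.items():
    stage, ms = bottleneck_ms(**scene, **rates)
    print(f"{name}: {stage}-limited, ~{ms:.1f} ms")  # A: fill, B: vertex
```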

Attempting to analyze performance based on any single metric is a waste of time. It's useful from a developer standpoint to know what the maximums are, but only as a point of reference.
 
Heh, it's quite a shame that you can't divulge information without the fear of starting a flame war. *sigh*
 
I would love to, but I don't know. I will see if I can figure it out, though, if I get some time tonight in between writing my final speech.
 
How well does the NV2A or NV20 handle texture reads/writes?

On average, how many clock cycles does it take to process a trilinear texture sample?

How many clock cycles does it take to read from RAM and write?

How does latency play into this?

If it takes 8 texel samples to process 1 trilinear texture sample, why is it possible for the GC to take no hit from trilinear-filtered texels?
 
Btw, I have read a couple of posts here about the Xbox texture cache; it seems many think it's 128-256 KB. I'm going pretty roughly from memory here, from when I was talking to a game developer.
It was in March this year, and the subject came up when I asked him what he thought about the GC's on-chip RAM.
His "clear" opinion was 4 KB. Doesn't that sound small?
Does someone here know?
 
When working with audio, how much bandwidth does streaming to the MCPX usually take? How much external bandwidth is generally used?
 
How well does the NV2A or NV20 handle texture reads/writes?

I have no idea what sort of answer you're looking for.
The texture cache at least appears to me to be very efficient. NVidia do not publicly disclose the exact function of the cache, and I won't discuss it here. If you had to pick a single architectural feature that had the largest impact on GPU performance (all other things being equal), the texture cache is probably it.

On average, how many clock cycles does it take to process a trilinear texture sample?

I'm unclear if the question is how many clock cycles trilinear takes, in which case it's 1; as mentioned above, it's basically free. Or if you're asking how many clocks a pixel spends in the pipeline, in which case the answer is I don't know, other than a lot (of the order of hundreds).

How does latency play into this?

The nice thing about 3D graphics is that memory accesses are very predictable; as a result, memory latency can be almost entirely hidden in the pipeline. The one big exception to this is when you do a deferred texture read with what amount to random texture positions.
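A grossly simplified throughput model of that point (Python; the latency and in-flight counts are invented, and real hardware still overlaps work across pixels, so treat this as illustrative only):

```python
# With predictable addresses the pipeline keeps many fetches in
# flight, so latency is amortised; a deferred read with effectively
# random coordinates exposes (much of) the round trip. Numbers invented.

def clocks_per_fetch(mem_latency, max_in_flight, dependent):
    if dependent:
        # Worst case: each fetch pays the full round trip.
        return float(mem_latency)
    # Pipelined case: latency amortised over the fetches in flight.
    return max(1.0, mem_latency / max_in_flight)

LATENCY = 200     # clocks of memory latency (illustrative)
IN_FLIGHT = 256   # fetches the pipeline can keep in flight (illustrative)

print("streamed fetch :", clocks_per_fetch(LATENCY, IN_FLIGHT, False), "clk")  # 1.0
print("deferred fetch :", clocks_per_fetch(LATENCY, IN_FLIGHT, True), "clk")   # 200.0
```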


If it takes 8 texel samples to process 1 trilinear texture sample, why is it possible for the GC to take no hit from trilinear-filtered texels?

If your cache was architected with enough output bandwidth, it wouldn't matter if it took 800 texels...
It's a design decision: the designer picks what he's optimising for, and the extra resources are just wasted on simpler pixels.

When working with audio, how much bandwidth does streaming to the MCPX usually take? How much external bandwidth is generally used?

Clearly it depends on the number of channels in use, the sampling frequencies and the format.
Reading 1 channel of raw stereo 44 kHz audio takes 44,000 * 4 bytes/sec, i.e. 176 KB/second.
So to read 256 channels would take 256 * 176 KB = about 45 MB/second, or not very much of the total available bandwidth.
Now the MCP does a bit more than this; there are intermediate buffers, and positional samples won't be stereo, but from a bandwidth standpoint it probably isn't a major contributor.
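Spelling that arithmetic out in Python (16-bit stereo is 4 bytes per sample frame):

```python
# Back-of-the-envelope audio read bandwidth, as in the text above.

SAMPLE_RATE = 44_000      # Hz (44.1 kHz rounded, as above)
BYTES_PER_FRAME = 2 * 2   # 16-bit samples x 2 channels
CHANNELS = 256

per_channel = SAMPLE_RATE * BYTES_PER_FRAME     # 176,000 B/s
total = CHANNELS * per_channel                  # ~45 MB/s

print(f"per channel:  {per_channel / 1000:.0f} KB/s")   # 176
print(f"256 channels: {total / 1e6:.0f} MB/s")          # 45
```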

Now what worries me is that these questions sound a lot like you're trying to build some funny math to argue some point with someone. Do yourself and everyone else a favor: if this is your intention, forget about it. There is no magical mathematical formula that will give you a number that reflects real-world performance.
 
overclocked said:
Btw, I have read a couple of posts here about the Xbox texture cache; it seems many think it's 128-256 KB. I'm going pretty roughly from memory here, from when I was talking to a game developer.
It was in March this year, and the subject came up when I asked him what he thought about the GC's on-chip RAM.
His "clear" opinion was 4 KB. Doesn't that sound small?
Does someone here know?

I'm not sure who you were talking to, or whether you just misunderstood, but this statement is pretty much entirely inaccurate.
 
I'm sure someone else can provide theoretical maximums, the Xbox numbers will be somewhat below the theoretical maximums, especially the 1/2/5/6 texture cases.

This is a relatively meaningless metric out of context anyway; on real models with real shaders you are always partly vertex limited, partly bandwidth limited and partly fill limited. Exactly how that breaks down varies with triangle size, shader complexity and the hardware in question. A model that is entirely vertex bound on GC could well be entirely fill limited on Xbox.

Attempting to analyze performance based on any single metric is a waste of time. It's useful from a developer standpoint to know what the maximums are, but only as a point of reference.

What are the shader limitations as you see them, and what can we expect from future games like Halo 2 that improve on this a lot?
I know there's a lot of art direction involved, but assume good art direction and optimization for the Xbox.
 
ERP, what does NV2X hardware do with its internal caches?

How, if at all, does the P3 733 within the Xbox limit its performance?

Performance-wise, how does the mobile version of the P3 733 compare to the standard socket P3?

How does a socket P3 733 compare to the slot P3 733? Are there major or minor latency issues?

Does the P3 733 support double-precision integers?
 
ERP, what does NV2X hardware do with its internal caches?

Cache stuff :p

There are a lot of caches and FIFOs of varying size throughout the chip; they're there to hide latency and prevent redundant reads/writes to main memory. Only NVidia know all the details, but a number are documented.

.... Uninteresting questions about P3 ....

Can't compare it to other P3s, since it's the only one I've ever used; I've done sod-all PC work. To me it was always just plain fast, but I'd previously been working on an N64, where an FP multiply takes 5 cycles and your clock rate is <100 MHz.

In terms of bottlenecks, yes, it CAN be a big one, but it needn't be. I've said before that the Xbox is a machine that is easy to get poor performance out of.

Like every processor I've used in recent memory (with the partial exception of GC), its performance is gated largely by external memory accesses (latency): the less you do, the faster it runs. If you try to be too clever with your graphics engine, you will be CPU limited.

Xbox games should be GPU limited, not CPU limited. It can be less than trivial to get to this point; there are some non-obvious tradeoffs that need to be made, especially when high polygon counts are involved. Even significantly increasing the speed of the processor probably wouldn't have made this much easier, to be honest, though it would have been nice from a game logic standpoint.
 