How does trilinear filtering affect the NV2A/NV20 hardware?
How many TMUs does the Xbox have?
How does trilinear filtering affect the texel fillrate?
megadrive0088
11-Dec-2002, 01:50
XBox's graphics processor, the NV2A aka XGPU, has a total of 8 TMUs.
2 per pixel pipeline, with 4 pipelines. It's the same configuration as the GeForce2 GTS, GF2 Ultra, all GeForce3 and all GeForce4 Ti chips. The XGPU is basically a souped-up GeForce3, or just short of a GeForce4 Ti.
8 TMUs.
As for trilinear, there isn't a simple answer to this. I've tried to explain it previously (hunt around the technology forum and here); suffice to say that it isn't as simple as most internet explanations make out.
Assuming a LOD bias of 0, trilinear is free in most (not all) cases.
There are a number of limitations that can affect this; it's all really about cache architecture, not TMUs.
megadrive0088
11-Dec-2002, 01:51
Trilinear filtering is not free on XBox AFAIK. Same with GF3 and GF4 Ti.
Bilinear filtering is free, but it takes twice the fillrate to do trilinear, I believe. There might be some situations where trilinear is free.
The GameCube's Flipper gets trilinear for free with some types of textures.
Bilinear filtering is free, but it takes twice the fillrate to do trilinear, I believe. There might be some situations where trilinear is free.
This is not true...
Both in raw fillrate tests and in games, trilinear is basically free, with some exceptions.
This assumes that the texels are available in the cache. It is unlikely that a 32-bit texture would run at full speed in trilinear mode (unless it was small), but that has nothing to do with the TMUs; it has to do with texture read bandwidth and cache design.
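To make the "8 texels per trilinear lookup" figure discussed in this thread concrete, here is a minimal Python sketch of textbook trilinear filtering: one bilinear sample (4 texels) from each of the two nearest mip levels, blended by the fractional LOD. It is an illustration under assumptions of mine (mip levels as plain lists of float rows, clamp-to-edge addressing, texel centres at +0.5), not how NV2A or Flipper implement it. The point being made above is that whether those 8 reads cost extra clocks depends on whether the texture cache can supply them, not on this arithmetic.

```python
import math


def bilinear(level, u, v):
    """Bilinearly filter one mip level at texel-space coordinates (u, v).

    `level` is a list of rows of floats; texel centres sit at integer + 0.5
    and addressing clamps to the edge. Returns (value, texels_read).
    """
    h, w = len(level), len(level[0])
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):
        # Clamp-to-edge addressing.
        ix = min(max(ix, 0), w - 1)
        iy = min(max(iy, 0), h - 1)
        return level[iy][ix]

    # The 4 texel reads of a bilinear sample.
    t00, t10 = texel(x0, y0), texel(x0 + 1, y0)
    t01, t11 = texel(x0, y0 + 1), texel(x0 + 1, y0 + 1)
    top = t00 + (t10 - t00) * fx
    bottom = t01 + (t11 - t01) * fx
    return top + (bottom - top) * fy, 4


def trilinear(mips, s, t, lod):
    """Trilinear filter at normalised coordinates (s, t) and level of detail `lod`.

    Bilinear samples from levels floor(lod) and floor(lod) + 1 are blended by
    the fractional part of lod (at the smallest level both samples come from
    the same level). Returns (value, texels_read).
    """
    lod = min(max(lod, 0.0), len(mips) - 1)
    l0 = math.floor(lod)
    l1 = min(l0 + 1, len(mips) - 1)
    frac = lod - l0
    results = []
    for lvl in (l0, l1):
        h, w = len(mips[lvl]), len(mips[lvl][0])
        results.append(bilinear(mips[lvl], s * w, t * h))
    (a, n0), (b, n1) = results
    return a + (b - a) * frac, n0 + n1


mips = [
    [[0.0] * 4 for _ in range(4)],  # level 0: 4x4, all black
    [[1.0] * 2 for _ in range(2)],  # level 1: 2x2, all white
    [[1.0]],                        # level 2: 1x1
]
value, reads = trilinear(mips, 0.5, 0.5, 0.5)
print(value, reads)  # 0.5 8
```

Every trilinear lookup touches 8 texels in this model; a chip whose cache can deliver 8 texels per clock per pipe pays nothing extra, while one sized for 4 has to spend a second clock.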
Can you come up with a mock scenario chart of the fillrate effects for 1 - 8 textures for the Xbox and GC? I would like to see how their fillrates would be affected.
Not without causing a flame war....
I'm sure someone else can provide theoretical maximums. The Xbox numbers will be somewhat below the theoretical maximums, especially in the 1/2/5/6 texture cases.
This is a relatively meaningless metric out of context anyway. On real models with real shaders you are always partly vertex limited, partly bandwidth limited and partly fill limited. Exactly how that breaks down varies with triangle size, shader complexity and the hardware in question. A model that is entirely vertex bound on GC could well be entirely fill limited on Xbox.
Attempting to analyze performance based on any single metric is a waste of time. It's useful from a developer standpoint to know what the maximums are, but only as a point of reference.
Heh, it's quite a shame that you can't divulge information without the fear of starting a flame war. *sigh*
Can someone then please provide the theoretical maximums?
I would love to, but I don't know. I will see if I can figure it out, though, if I get some time tonight in between writing my final speech.
ERP, why don't you just PM me the info? I give you my word not to post it anywhere.
How well does the NV2A or NV20 handle texture reads/writes?
On average, how many clock cycles does it take to pass a trilinear texture?
How many clock cycles does it take to read from RAM and write?
How does latency play into this?
If it takes 8 texel samples per trilinear texture lookup, why is it possible for the GC to take no hit from trilinear filtering?
overclocked
11-Dec-2002, 05:20
Btw, I have read a couple of posts here about the Xbox texture cache, and it seems many think it's 128-256 KB. I'm going pretty roughly from memory here, from when I was talking to a game developer.
It was in March this year, and we came upon the subject when I asked him what he thought about the GC's on-chip RAM.
His "clear" opinion was 4 KB; doesn't that sound small??
Does someone here know?
When working with audio, how much bandwidth does streaming to the MCPX usually take? How much external bandwidth is generally used?
How well does the NV2A or NV20 handle texture reads/writes?
I have no idea what sort of answer you're looking for.
The texture cache at least appears to me to be very efficient. NVidia do not publicly disclose the exact function of the cache, and I won't discuss it here. If you had to pick a single architectural feature that had the largest impact on GPU performance (all other things being equal), the texture cache is probably it.
On average, how many clock cycles does it take to pass a trilinear texture?
I'm unclear whether the question is how many clock cycles trilinear takes, in which case it's 1; as mentioned above, it's basically free. Or whether you're asking how many clocks a pixel spends in the pipeline, in which case the answer is I don't know, other than a lot (on the order of hundreds).
How does latency play into this?
The nice thing about 3D graphics is that memory accesses are very predictable; as a result, memory latency can be almost entirely hidden in the pipeline. The one big exception to this is when you do a deferred (dependent) texture read with essentially random texture positions.
If it takes 8 texel samples per trilinear texture lookup, why is it possible for the GC to take no hit from trilinear filtering?
If your cache were architected with enough output bandwidth, it wouldn't matter if it took 800 texels...
It's a design decision: the designer picks what he's optimising for, and the extra resources are just wasted on simpler pixels.
When working with audio, how much bandwidth does streaming to the MCPX usually take? How much external bandwidth is generally used?
Clearly it depends on the number of channels in use, the sampling frequencies and the format.
Reading 1 channel of raw 16-bit stereo 44 kHz audio takes 44,000 * 4 bytes/s = 176 KB/s.
So reading 256 channels would take 256 * 176 KB/s = about 45 MB/s, which is not very much of the total available bandwidth.
Now the MCP does a bit more than this, there are intermediate buffers and positional samples won't be stereo, but from a bandwidth standpoint, it probably isn't a major contributor.
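The arithmetic above can be written out as a tiny Python helper. The 4 bytes per sample frame assumes 16-bit stereo PCM (2 bytes x 2 channels), which is what the 44000*4 figure implies. The 6.4 GB/s comparison figure is my own assumption for the Xbox's unified memory (200 MHz DDR on a 128-bit bus), not a number given in this thread.

```python
def audio_read_bandwidth(voices, sample_rate_hz=44_000, bytes_per_frame=4):
    """Bytes/s needed to stream `voices` raw PCM voices.

    bytes_per_frame=4 assumes 16-bit stereo (2 bytes x 2 channels).
    """
    return voices * sample_rate_hz * bytes_per_frame


# Assumed Xbox total: 200 MHz DDR (2 transfers/clock) on a 128-bit (16-byte) bus.
XBOX_TOTAL_BYTES_PER_S = 200e6 * 2 * 16

one = audio_read_bandwidth(1)
many = audio_read_bandwidth(256)
print(f"1 voice:    {one / 1000:.0f} KB/s")
print(f"256 voices: {many / 1e6:.1f} MB/s ({many / XBOX_TOTAL_BYTES_PER_S:.2%} of total)")
```

This prints 176 KB/s and 45.1 MB/s (0.70% of total). As noted above, the MCP also uses intermediate buffers and positional voices are mono, so treat this as a rough estimate rather than a measurement.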
Now, what worries me is that these questions sound a lot like you're trying to build some funny math to argue some point with someone. Do yourself and everyone else a favor: if this is your intention, forget about it. There is no magical mathematical formula that will give you a number that reflects real-world performance.
Btw, I have read a couple of posts here about the Xbox texture cache, and it seems many think it's 128-256 KB. I'm going pretty roughly from memory here, from when I was talking to a game developer.
It was in March this year, and we came upon the subject when I asked him what he thought about the GC's on-chip RAM.
His "clear" opinion was 4 KB; doesn't that sound small??
Does someone here know?
I'm not sure who you were talking to, or whether you just misunderstood, but this statement is pretty much entirely inaccurate.
overclocked
11-Dec-2002, 07:54
ERP, I should say that I'd had a couple of beers and a few vodka shots, so my memory is weak. :wink:
overclocked
11-Dec-2002, 08:23
I'm sure someone else can provide theoretical maximums. The Xbox numbers will be somewhat below the theoretical maximums, especially in the 1/2/5/6 texture cases.
This is a relatively meaningless metric out of context anyway. On real models with real shaders you are always partly vertex limited, partly bandwidth limited and partly fill limited. Exactly how that breaks down varies with triangle size, shader complexity and the hardware in question. A model that is entirely vertex bound on GC could well be entirely fill limited on Xbox.
Attempting to analyze performance based on any single metric is a waste of time. It's useful from a developer standpoint to know what the maximums are, but only as a point of reference.
What are the shader limitations as you see them, and what can we expect to improve much in future games like Halo 2?
I know a lot of it comes down to art direction, but assume good art direction and optimization for the Xbox.
ERP, what does NV2X hardware do with its internal caches?
How, if at all, does the P3 733 in the Xbox limit its performance?
Performance-wise, how does the mobile P3 version of the 733 compare to the standard socket P3?
How does a socket P3 733 compare to the slot P3 733? Are there major or minor latency issues?
Does the P3 733 support double-precision integers?
ERP, what does NV2X hardware do with its internal caches?
Cache stuff :P
There are a lot of caches and fifos of varying size throughout the chip, they're there to hide latency and prevent redundant reads/writes to main memory. Only NVidia know all the details, but a number are documented.
.... Uninteresting questions about P3 .....
Can't compare it to other P3s, since it's the only one I've ever used; I've done sod all PC work. To me it was always just plain fast, but I'd previously been working on an N64, where an FP multiply takes 5 cycles and your clock rate is <100 MHz.
In terms of bottlenecks, yes it CAN be a big one, but it needn't be. I've said before the Xbox is a machine that it is easy to get poor performance out of.
Like every processor I've used in recent memory (with the partial exception of GC), its performance is gated largely by external memory accesses (latency); the less you do, the faster it runs. If you try to be too clever with your graphics engine you will be CPU limited.
XBox games should be GPU limited, not CPU limited. It can be non-trivial to get to this point; there are some non-obvious tradeoffs that need to be made, especially when high polygon counts are involved. Even significantly increasing the speed of the processor probably wouldn't have made this much easier, to be honest, though it would have been nice from a game logic standpoint.
BenSkywalker
11-Dec-2002, 22:03
Legion-
can some one then please provide with the theoretical maximums?
Texture layers: One / Two / Three / Four / Five / Six / Seven / Eight (theoretical Mpixels/s)
XBox: 932 / 932 / 466 / 466 / 311 / 311 / 233 / 233
PS2: 1,200 / 600 / 400 / 300 / 240 / 200 / 171 / 150
GC: 648 / 324 / 216 / 162 / 130 / 108 / 93 / 81
Certain effects will take a differing number of passes depending on the hardware (as a generalized example, the XBox can do Dot3 in a single pass while the PS2 needs two); overall, the XBox can do the most in a single pass while the PS2 can do the least. Also, the above are pixel rates, not texel rates (and purely theoretical).
Edit- Screwed up recalling the GS clock rate :oops:
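The pattern in the table follows from a simple model: peak pixels per clock divided by the number of clocks (loopbacks or passes) each pixel needs for its texture layers. The per-chip parameters below are my assumptions, chosen because they reproduce the table: NV2A as 4 pipes with 2 texture units at 233 MHz, the GS as 8 textured pixels per clock at about 150 MHz with one texture per pass, and Flipper as 4 pipes with 1 texture unit at 162 MHz. These are theoretical peaks only; the sketch ignores bandwidth, cache behaviour and per-pass overheads.

```python
import math

# Assumed (pipelines, textures per pipeline per clock, core clock in MHz).
GPUS = {
    "XBox": (4, 2, 233),
    "PS2": (8, 1, 150),
    "GC": (4, 1, 162),
}


def pixel_rate(pipes, tex_per_clock, clock_mhz, layers):
    """Theoretical Mpixels/s when each pixel needs `layers` texture layers.

    A pipeline applying `tex_per_clock` textures per clock needs
    ceil(layers / tex_per_clock) clocks (loopbacks or passes) per pixel.
    """
    clocks_per_pixel = math.ceil(layers / tex_per_clock)
    return pipes * clock_mhz / clocks_per_pixel


for name, spec in GPUS.items():
    row = [round(pixel_rate(*spec, n)) for n in range(1, 9)]
    print(name, row)
```

This reproduces the three rows above; the XBox row drops in steps of two layers because each pipe applies two textures per clock.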
Tagrineth
11-Dec-2002, 23:04
Certain effects will take a differing number of passes depending on the hardware (as a generalized example, the XBox can do Dot3 in a single pass while the PS2 needs two); overall, the XBox can do the most in a single pass while the PS2 can do the least. Also, the above are pixel rates, not texel rates (and purely theoretical).
WRONG! Flipper can do the most in one pass. It can do all eight layers, XGPU maxes at four.
Also, PS2's GS is twice as fast (2.4 Gp/s) when not texturing. :)
BenSkywalker
12-Dec-2002, 01:31
WRONG! Flipper can do the most in one pass. It can do all eight layers, XGPU maxes at four.
Utilizing pixel shaders, the XGPU can do more than the Flipper per layer, is what I meant to say, although it wouldn't surprise me if that carried over to absolutes as well. What takes Flipper six passes to do may take the XGPU four.
Tagrineth
12-Dec-2002, 01:33
Actually I'm pretty sure ERP said the TEV can interleave combines within a pass whereas XGPU can't, so that just went out the window ;)
BenSkywalker
12-Dec-2002, 02:16
ERP
The major difference is that NV2A does all of its texture reads first, so you can't do a deferred read based on a calculated result (unless it's one of the calculations supported by the texture stage) without resorting to multipass.
In the TEV, texture fetches are interleaved with the combiner operations (though not totally freely), so you can theoretically do arbitrary deferred lookups.
However, in most ways Flipper's texture lookups are more limited, as is its combiners' flexibility.
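The distinction above is about data dependency: in a deferred (dependent) read, the coordinates of a second texture fetch are computed from the result of a first one, as in environment-mapped bump mapping. The Python sketch below only illustrates that dependency. The names `fetch` and `dependent_lookup` are hypothetical, nearest-neighbour sampling with wrap addressing is an assumption, and it does not model how NV2A's texture stages or Flipper's TEV actually schedule fetches.

```python
import math


def fetch(texture, u, v):
    """Nearest-neighbour fetch with wrap addressing; `texture` is a list of rows."""
    h, w = len(texture), len(texture[0])
    return texture[math.floor(v * h) % h][math.floor(u * w) % w]


def dependent_lookup(bump_map, env_map, u, v, scale=1.0):
    """Two fetches where the second's coordinates depend on the first's result."""
    # First fetch: a (du, dv) perturbation stored in the bump map.
    du, dv = fetch(bump_map, u, v)
    # Combiner-style math on the fetched value (here just a scale).
    u2, v2 = u + du * scale, v + dv * scale
    # Second (dependent) fetch: its address is unknown until the first fetch
    # returns, so it cannot be predicted ahead of time like an ordinary read.
    return fetch(env_map, u2, v2)


bump = [[(0.25, 0.0)]]     # 1x1 bump map: shift right by a quarter
env = [[10, 20, 30, 40]]   # 4x1 environment map
print(dependent_lookup(bump, env, 0.0, 0.0))             # 20
print(dependent_lookup(bump, env, 0.0, 0.0, scale=0.0))  # 10
```

Hardware that performs all texture reads before any combiner math can only support this pattern when the middle step is one of the operations built into the texture stage; otherwise the second fetch needs another pass.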
Tagrineth
12-Dec-2002, 02:25
Yeah, so? That doesn't talk about most things in one pass.
Sounds to me like Flipper might never need multipass for anything... except obvious stuff like stencil shadows :P ERP, or anyone with some dev experience, could you at least answer this? Which core tends to need multipass more often?
BenSkywalker
12-Dec-2002, 02:43
Yeah, so? That doesn't talk about most things in one pass.
It allows the NV2A to do more in one layer (which is what I meant to say in the first place :) ).
Tagrineth
12-Dec-2002, 02:47
Well of course NV2A can do the most with one layer! :P
But seriously, I'd like to hear from ERP on this, just how well can Flipper work within a single pass compared to NV2A? I'd imagine Flipper can do a LOT more (quantity, not necessarily quality) per pass from what's been said thus far, but I dunno really.
About virtual texturing:
Did the 3dfx chips use a form of virtual texturing?
In virtual texturing, when are textures segmented? Do the texture pages actually represent texel data?
Does the Xbox support methods of virtual texturing?
Texturing:
How many clock cycles does it take to read from and write to RAM on the Xbox?
How does RAM latency affect texture reads/writes?
What are loopbacks and passes? Is a pass just the number of clock cycles it theoretically takes to do something?
Simon F
12-Dec-2002, 08:58
(as a generalized example, the XBox can do Dot3 in a single pass while the PS2 needs two)
That doesn't sound right. Sony's own article on doing dotprod bump mapping on PS2 had about 6-8 passes! (I can't recall the exact figure)
BenSkywalker
12-Dec-2002, 10:23
That doesn't sound right. Sony's own article on doing dotprod bump mapping on PS2 had about 6-8 passes!
We had a discussion about this a while ago, and Faf explained how it was possible in two passes; IIRC there were a few people who explained how it could be done in five passes or fewer.
PC-Engine
12-Dec-2002, 10:36
I think it was only in theory and not in practice; otherwise, common sense would tell us that there would be games using it. It's just like the great JPEG compression argument against S3TC, which is possible in theory but not in practice. Heck, PS2 can do RAY TRACING in theory :roll:
How does the Dreamcast fit into that chart?
BenSkywalker
12-Dec-2002, 11:01
I think it was only in theory and not in practice, otherwise common sense would tell us that there would be games using it.
Why do you think that? How many GC games support Dot3 despite being very easy to implement? Taking two passes and relying on the VUs for calcs isn't like it is a 'free' feature. Apply a base map and a Dot3(with nothing else) and you are looking at three passes and a spike in the T&L calcs that are on the VU compared to native support on the Cube or even the XBox for that matter(there are more games supporting Dot3 on the Box, but still not nearly all or even most).
It's just like the great JPEG compression argument against S3TC which is possible in theory but not in practice.
I don't recall Faf or Archie ever saying that that was reasonable for in-game usage, while they did for Dot3.
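For readers following the Dot3 argument, this is the per-pixel math involved: decode the normal from the normal map, dot it with the light direction, clamp at zero, and modulate the base texture. A minimal Python sketch, assuming the common 0..255 to [-1, 1] normal encoding and a unit-length light vector already in the same (tangent) space as the normals. How many passes this costs is a property of each chip's combiners, which the sketch does not model.

```python
def decode_normal(rgb):
    """Map an 8-bit RGB normal-map texel (0..255 per channel) to [-1, 1]."""
    return tuple(c / 127.5 - 1.0 for c in rgb)


def dot3(normal_rgb, light_dir):
    """Per-pixel diffuse term N.L, clamped at zero as a combiner would saturate it."""
    n = decode_normal(normal_rgb)
    d = sum(a * b for a, b in zip(n, light_dir))
    return max(0.0, d)


def shade(base_color, normal_rgb, light_dir):
    """Base texture colour modulated by the Dot3 term (two texture reads per pixel)."""
    k = dot3(normal_rgb, light_dir)
    return tuple(c * k for c in base_color)


# A normal pointing straight out of the surface, lit head-on and from behind.
print(shade((200, 100, 50), (128, 128, 255), (0.0, 0.0, 1.0)))   # (200.0, 100.0, 50.0)
print(shade((200, 100, 50), (128, 128, 255), (0.0, 0.0, -1.0)))  # (0.0, 0.0, 0.0)
```

On hardware with a native dot-product combiner this is one operation per pixel; without one, the same result has to be assembled from several blending passes.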
PC-Engine
12-Dec-2002, 11:41
Why do you think that? How many GC games support Dot3 despite being very easy to implement? Taking two passes and relying on the VUs for calcs isn't like it is a 'free' feature. Apply a base map and a Dot3(with nothing else) and you are looking at three passes and a spike in the T&L calcs that are on the VU compared to native support on the Cube or even the XBox for that matter(there are more games supporting Dot3 on the Box, but still not nearly all or even most).
Well because there aren't any games using bumpmapping on PS2. If it only took two passes we would've seen games using it because it wouldn't be too expensive to implement. RL on GCN used bumpmapping to name one. I think developers will only use bumpmapping on GCN games if they think it would help the look of the game. All games aren't going to use it obviously.
Regarding the JPEG compression on PS2, a lot of the PS2 backers always bring it up in arguments because in theory the PS2 could do it, but we all know that it isn't being used in any PS2 games. It's only brought up to show how the PS2 is superior when compressing textures since JPEG could have 50:1 compression ratios. This didn't come from Faf, just the PS2 backers.
Fafalada
12-Dec-2002, 12:05
Simon,
while it's true that you touch pixels more times, you can't really count frame buffer math as 'passes'. First, it's a fixed cost, so it'll easily be a fraction of the rest of rendering, and it would confuse the hell out of people around here, who already have about 20 different definitions of 'texture/rendering pass'.
Framebuffer math is cheap on GS; it's one of the things embedded RAM is best at.
Anyway, the rendering cost per triangle is 2-4 passes depending on mesh topology and type of lights used.
PCEngine,
that particular argument was skewed out of proportion in the past by fanboys on both sides.
That aside, we've been using IPU in practice for some time now.
BenSkywalker
12-Dec-2002, 12:08
Well because there aren't any games using bumpmapping on PS2. If it only took two passes we would've seen games using it because it wouldn't be too expensive to implement.
Why do you think that? It is even less expensive on the GameCube, look at the "lengthy" list of titles that use it.
RL on GCN used bumpmapping to name one.
And? One launch game out of over 100 titles. It only takes a single pass on the GameCube and ~1% of games use it, while it is twice as costly on the PS2 and there should be several.....?
I think developers will only use bumpmapping on GCN games if they think it would help the look of the game.
EMBM would have made WaveRace an incredible looking title. Why wasn't it in(it is a supported feature)?
Regarding the JPEG compression on PS2, a lot of the PS2 backers
I'm talking about PS2 developers, not fans of the platform.
PC-Engine
12-Dec-2002, 12:20
Off the top of my head, I remember RE and RE0 use it, but I'm pretty sure there are other games that use bumpmapping on GCN; I'm just too lazy to look up all the games to find out which ones support it :D
Like I said in theory PS2 can do raytracing.
Regarding the IPU, I don't doubt it's being used for certain things :wink:
Excuse me for interjecting, but could someone answer my questions?
london-boy
12-Dec-2002, 18:56
RL on GCN used bumpmapping to name one.
And? One launch game out of over 100 titles. It only takes a single pass on the GameCube and ~1% of games use it, while it is twice as costly on the PS2 and there should be several.....?
:lol: :lol:
love to see clever people fighting and bitching..... :lol: :lol:
what can you do?
london-boy
12-Dec-2002, 19:40
....and most of all, arguing and bitching about something that's got absolutely nothing to do with the actual topic.....
:lol: :lol: :lol:
I am just waiting for ERP to come back. Everyone else is just arguing i guess.
Legion-
can some one then please provide with the theoretical maximums?
Single texture layer/ Two/ Three/ Four/ Five/ Six/ Seven/ Eight-
XBox- 932MPixels/ 932Mp/ 466Mp/ 466Mp/ 311Mp/ 311Mp/ 233Mp/ 233Mp
Doesn't the Xbox have a 4x2 architecture? That would give it double the fillrate you have specified:
XBox- 1864MPixels/ 1864Mp/ 932Mp/ 932Mp/ 622Mp/ 622Mp/ 466Mp/ 466Mp
Am I correct? Or maybe I drank something "bad"? :?
EDIT: I didn't notice he specified it was PIXEL fillrate instead of TEXEL fillrate, but why show the pixel fillrate instead of the texel one, which is what is used when you texture?
P.S.: I'm a semi-total newbie in 3D graphics, so be nice if I make some mistakes, which is very, very, very likely to happen.
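The short answer to the 4x2 question is that the second texture unit in each pipeline adds texels per pixel, not pixels: each pipeline still writes at most one pixel per clock. A minimal Python sketch of that relationship, using the 4 pipes x 2 TMUs x 233 MHz figures from this thread (the function name `fill_rates` is hypothetical):

```python
import math


def fill_rates(pipes, tmus_per_pipe, clock_mhz, layers):
    """Return (Mpixels/s, Mtexels/s) when each pixel needs `layers` textures.

    Each pipeline writes at most one pixel per clock and can apply
    `tmus_per_pipe` textures per clock; extra layers cost extra clocks.
    """
    clocks_per_pixel = math.ceil(layers / tmus_per_pipe)
    mpixels = pipes * clock_mhz / clocks_per_pixel
    return mpixels, mpixels * layers


for layers in (1, 2, 3, 4):
    px, tx = fill_rates(4, 2, 233, layers)
    print(f"{layers} layer(s): {px:.0f} Mpixels/s, {tx:.0f} Mtexels/s")
```

The pixel rate never exceeds 932 Mpixels/s; the 1864 figure is a texel rate, reached only when both texture units are busy (2 or 4 layers). With a single layer, half the TMUs sit idle.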
Personally I'm surprised ERP hasn't taken a gun to his noggin with all questions thrown at him (and being quoted elsewhere on top of that) :P
I thought a dev mentioned here that the MCP-X throws hundreds of megs of data around the HT bus for audio encoding?
btw, welcome back Ben! ;)
edit: nm, didn't know this thread was so old!!
Personally I'm surprised ERP hasn't taken a gun to his noggin with all questions thrown at him (and being quoted elsewhere on top of that)
I'm surprised that someone hasn't already taken a gun to Apoc's head for dragging up a 14-month-old thread ;)
Ouch! You know... I was reading old threads, and I didn't notice this was a VERY OLD one, sorry boys :( As I said, I think I smoked something....
I really can't see the problem with dragging an old thread up from the depths of the forum if you think it's relevant and you want to add something to it.
I really can't see what harm it should do, unless people wouldn’t want to see their old posts again for some reason. :)
Why do people call the Xbox a GeForce 4 or a souped-up GeForce 3?
Pixel and vertex shading is far above that of a GeForce 3 (I think 2-3x?), but everything else is a good percentage less. Even with 2x the shader performance, that just brings the Xbox's shader performance into a usable range for games, but in most other areas it is a bit lacking. Whoopie, the Xbox can do Halo 2 (while a GeForce 3 probably maxes out on Halo 1), which still doesn't look or run as well as a GeForce 3 running UT2003, or it can run full-quality Doom 3 at 10 fps while a GeForce 3 does 5. You could call the XGPU a souped-up GeForce 3 Ti 200, maybe. Of course, I could be wrong about all of this; I'm not positive about the performance figures for the Xbox and the GeForce 3.
Megadrive1988
11-Feb-2004, 01:29
Xbox's graphics processor (NV2A) is just short of a GeForce4 (NV25).
Both NV2A and NV25 are essentially just souped-up GeForce3s (NV20), with almost the same architecture. NV2A and NV25 both have 2 geometry/lighting units (vertex shaders), whereas the NV20 has one.
All of them have a 4x2 configuration (4 pixel pipelines with 2 texture units per pipe).
Ok, well, I believe a stock GeForce 3 runs at 200 MHz core, 230 MHz memory.
From what I can find, the XGPU runs at 233 MHz core (I thought it was 200; I know it got downgraded from the original rating), with 200 MHz DDR memory.
I believe GeForce 4 speeds are something like 300 MHz core and 300 MHz RAM.
The GeForce 3 Ti 500 runs at 240 MHz core and 250 MHz memory.
The GeForce 3 Ti 200 runs at 175 MHz core and 200 MHz memory.
So the XGPU has higher fillrate and other features over the GeForce 3, but lower memory bandwidth, made even lower since it has to share bandwidth with everything else. I guess you could say the XGPU has a good performance advantage over a GeForce 3 (but it's not the huge difference people make it out to be; it still isn't GeForce 3 Ti 500 level), but if it's bandwidth limited, it probably wouldn't be faster than a Ti 200. (I guess it wouldn't be bandwidth limited unless it does HDTV resolutions or AA, and it's really not likely to hit the bandwidth limit if it goes heavy on pixel shaders.)
I don't, personally; I would choose a GeForce 3 in the Xbox over the XGPU if it meant having Halo with UT2003-like graphics at 60 fps over Halo graphics (assuming the Pentium 3 would be fast enough to get it up to 60 fps).
I'm still just a little peeved that from day one with my Ti 200, I found out it wouldn't be able to actually use any of those fancy pixel shader effects and maintain 30 fps. (Well, it sort of did: I put it in a second computer and it runs Halo at a pretty much solid 30 fps at 640x480, but it looks horrible compared to my Radeon 9700 Pro at 640x480.)
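The clock figures above turn into peak fillrate and memory bandwidth with two multiplications. A Python sketch using the clocks as quoted in this post, with the 4-pipeline width stated earlier in the thread and one assumption of mine: that every chip listed uses a 128-bit DDR memory bus.

```python
def peak_fillrate_mpix(core_mhz, pipes=4):
    """Peak Mpixels/s: one pixel per pipeline per clock."""
    return core_mhz * pipes


def ddr_bandwidth_gbs(mem_mhz, bus_bits=128):
    """Peak GB/s for DDR memory: two transfers per clock across the bus.

    bus_bits=128 is an assumption for all chips below.
    """
    return mem_mhz * 1e6 * 2 * (bus_bits // 8) / 1e9


CHIPS = {  # name: (core MHz, memory MHz), as quoted in the post above
    "GeForce 3": (200, 230),
    "GeForce 3 Ti 500": (240, 250),
    "GeForce 3 Ti 200": (175, 200),
    "XGPU (memory shared with CPU/audio)": (233, 200),
}

for name, (core, mem) in CHIPS.items():
    print(f"{name}: {peak_fillrate_mpix(core)} Mpixels/s, "
          f"{ddr_bandwidth_gbs(mem):.2f} GB/s")
```

Under these assumptions the XGPU comes out at 932 Mpixels/s and 6.40 GB/s, against 800 Mpixels/s and 7.36 GB/s for a stock GeForce 3 and 960 Mpixels/s and 8.00 GB/s for a Ti 500, which is the trade-off the post describes.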
aaaaa00
11-Feb-2004, 03:40
Ok, well, I believe a stock GeForce 3 runs at 200 MHz core, 230 MHz memory.
From what I can find, the XGPU runs at 233 MHz core (I thought it was 200; I know it got downgraded from the original rating), with 200 MHz DDR memory.
I believe GeForce 4 speeds are something like 300 MHz core and 300 MHz RAM.
The GeForce 3 Ti 500 runs at 240 MHz core and 250 MHz memory.
The GeForce 3 Ti 200 runs at 175 MHz core and 200 MHz memory.
The Xbox GPU is closest in terms of core clock and feature set to a GF4 Ti 4200, which ships at a 250 MHz core clock.
The Xbox GPU was originally to have been run at 300 MHz, but was downgraded to 233 MHz due to cost and production yield reasons.
You can't directly compare the core clocks of the GF3 and the GF4 because of the differences in the design.
Megadrive1988
11-Feb-2004, 03:42
while the GeForce 3 Ti 500 has higher bandwidth and higher fillrate than XGPU, all of that is needed for higher resolutions than 640x480.
The XGPU crushes the GeForce 3 Ti 500 in terms of polygon & lighting power because XGPU has twice as many Vertex Shaders (2) compared to Ti 500 (just 1 like all GF3s)
DeathKnight
11-Feb-2004, 03:52
The NV2A even has higher shader ability than NV25 (more instructions for both pixel and vertex shaders).