Old Cards Doing Better In New Games Than "Newer Cards"

David G.

Newcomer
This is a very interesting situation to me. In an interview with the UT2003 developers, when asked about Voodoo3 and TNT2 performance, they said that the Voodoo3 can do 20 FPS or better at 640x480 in 16-bit in UT2003, while the TNT2 got the same result only at 320xwhatever. A nice showing from the Voodoo3 cards.

Also, in the latest Anand UT2003 GPU test, ATi's 3-textures-per-pipeline feature shows its power at MAX quality, always coming out ahead of the GF2 Ti, Pro and GF2 Ultra. This is interesting, since JC stated that in DOOM III the Radeon will be slower than the GF2 GTS, the 7500 slower than the GF2 Ti and the R200 slower than the GF3... Now... I doubt that he was actually right.
What do you think? I mean, I never thought the 3 textures per pipeline on the Radeon would ever help...
Would the R200 benefit noticeably from its 12-textures-per-pass capability? Especially as drivers will develop much further until the 2003 DOOM III release.
 
The R200 cannot do 12 textures per pass. It has 4 pipelines with two texels per pipeline clock = 8 textures per pass.

I think you are confusing it with the R100 (original Radeon), which had 2 pipelines and 3 textures per pipeline clock = 6 textures per pass.

Regards
 
misae said:
The R200 cannot do 12 textures per pass. It has 4 pipelines with two texels per pipeline clock = 8 textures per pass.

No, that would mean two texel layers per pixel (4 pixels at a time) per pass if it weren't for the loopback. The R200 can do up to 3 loopbacks for up to 6 textures per pass AFAIK.
 
misae said:
The R200 cannot do 12 textures per pass. It has 4 pipelines with two texels per pipeline clock = 8 textures per pass.

I think you are confusing it with the R100 (original Radeon), which had 2 pipelines and 3 textures per pipeline clock = 6 textures per pass.

Regards

The number of texture units has nothing to do with the number of textures per pass. It tells you how many textures can be blended per pixel pipeline per cycle.

DX9 has the requirement for 8 or 16 textures per pass. Do you really think hardware is going to have 16 texture units per pipe? LOL! Most games today use only 1 or 2 textures, and only a few use 3. In Doom 3, they are using stencil buffers, which involve several rendering passes of ZERO textures per pass. If cards had 16 texture units per pipe, that would be a hell of a lot of silicon wastage 99% of the time. Even Matrox's card is overkill with 4 texture units per pipe.

Radeon 8500 can do 6 textures per pass, except it takes 3 cycles to do it. You then effectively generate only 1.33 pixels per clock, but 8 texels per clock. It can also sample twice from each texture, for a total of 12 samples per pass. If any program used this, it would reduce the pixel rate to 0.67 pixels per clock, but again 8 texels per clock.

Kyro can do 8 textures per pass, but has only one texture unit per pipe, and only 2 pipes. In this situation, that's 0.25 pixels per clock, but 2 texels per clock.

The original Radeon does not have loopback, so you only have 3 textures per pass, and in this situation 2 pixels per clock. That's a total of 6 texels per clock, but only 3 textures per pass.
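
The pixel and texel rates quoted above follow from simple arithmetic, which can be sketched roughly as below. This is only a back-of-the-envelope model, not how any driver or chip actually schedules work: the function name `throughput` and its parameters are made up for illustration, and it assumes every pipe loops through its texture units until all lookups for a pixel are done.

```python
from math import ceil

def throughput(pipes, tmus_per_pipe, lookups_per_pass):
    """Return (pixels_per_clock, texels_per_clock) for one rendering pass.

    Hypothetical model: each pipe needs ceil(lookups / TMUs) cycles
    (loopbacks) per pixel, and all pipes work in parallel.
    """
    cycles = max(1, ceil(lookups_per_pass / tmus_per_pipe))
    pixels_per_clock = pipes / cycles
    texels_per_clock = pipes * lookups_per_pass / cycles
    return pixels_per_clock, texels_per_clock

# (pipes, texture units per pipe, texture lookups per pass), as quoted in the post
cards = {
    "Radeon 8500, 6 textures": (4, 2, 6),
    "Radeon 8500, 12 samples": (4, 2, 12),
    "Kyro, 8 textures": (2, 1, 8),
    "Original Radeon, 3 textures": (2, 3, 3),
}

for name, cfg in cards.items():
    px, tx = throughput(*cfg)
    print(f"{name}: {px:.2f} pixels/clock, {tx:.0f} texels/clock")
```

With zero lookups per pass (the stencil passes mentioned above), the same model gives the full pixel rate and no texel traffic, which is why extra texture units would sit idle there.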


Anyway, back to the original post of this thread. I too was impressed with how well the Radeon 7500 performed. I didn't know that it was the third texture unit doing that, but it makes sense. Still, it shows how good the Radeon architecture is. The only problem was that ATI was too far ahead of the developers with that video card, and there was just way too much silicon sitting idle with the games of its time. If you upgrade your video card once every 2-3 years or so, Radeon was a great card to buy in late 2000, and the 7500 is a good budget card too.
 
I am not so sure that the impressive performance of the Radeon 7500 is due to its third TMU, since this last TMU is AFAIR not a full-fledged TMU and can only do some of the ops that a regular TMU (i.e. one of the first two) can.
IMHO it's rather a sophisticated version of the G400's EMBM hardware, which shows up in some 3D applications (3DMark2001 and some of the PowerVR tech demos) as a 3rd TMU.

UT2003 uses, AFAIK, only cube maps, which cannot be rendered with this third TMU.

IMHO, the good performance compared to both the Radeon 8500 and the GF2 line of cards is mostly due to driver problems (in the case of the R8500) and the lack of HSR/bandwidth-saving efforts (in the case of the GF2 series). I could be wrong though...
 
WRT the Radeon 8500 6 textures/12 textures/2 TMUs confusion, there is an element of truth to what David G initially points out.

Radeon 8500 has 2 TMUs per pipe, yes. It also has 6 texture registers per pipe, which is why ATi quote 6 textures per pass. Now, I believe that having 6 texture registers means the data you read from each of the textures can be stored internally completely separately, i.e. the texture units do not combine the results in the intermediate stages (i.e. loops); the combination (or operation) only occurs when all 6 registers are filled.

Now, if we remember back to JC’s .plan update concerning Radeon 8500’s fragment abilities, he actually talks about collapsing DoomIII’s rendering into 1 pass because Radeon 8500 can actually perform 11 texture stages in a single pass. How does this happen? On the first pass all 6 registers are filled (via the 3 loops), and then for the second pass the information in the 6 registers is combined; rather than writing this combined result back out to the frame buffer, it is written back to a register, leaving the other 5 registers free to carry out more operations. This is how Radeon 8500 is able to carry out 11 texture operations in one pass when using its pixel shader functionality.
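
The 6 + 5 = 11 count described above can be written out as a tiny counting sketch. It only models the register bookkeeping as explained in this post (the name `max_texture_ops` and its parameters are hypothetical) and makes no claim about how the hardware is really organised.

```python
def max_texture_ops(registers=6, phases=2):
    """Count texture operations available across the phases of one pass.

    Assumption taken from the post: the first phase can fill every
    register; each later phase keeps one register for the combined
    result of the previous phase, so one fewer is free.
    """
    total = 0
    for phase in range(phases):
        free = registers if phase == 0 else registers - 1
        total += free
    return total

print(max_texture_ops())  # 6 in the first phase + 5 in the second = 11
```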

Now, I believe that’s how it works. I’m sure someone will correct me if I’m wrong.
 
This is a very interesting situation to me. In an interview with the UT2003 developers, when asked about Voodoo3 and TNT2 performance, they said that the Voodoo3 can do 20 FPS or better at 640x480 in 16-bit in UT2003, while the TNT2 got the same result only at 320xwhatever. A nice showing from the Voodoo3 cards.

Yeah, Kyro II is also doing surprisingly well, especially in outdoor areas and at high detail:

Medium detail outdoor areas at 640x480 = 84fps

High detail outdoor areas at 640x480 = 67.7fps (within a couple of fps of the Geforce 2 Ultra)

Medium detail indoor areas at 640x480 = 115fps

High detail indoor areas at 640x480 = 90fps

From what I can see from those tests, Kyro II could be quite playable at 1024x768x32 high detail. In outdoor areas at 1024x768x32 high quality it's getting 32fps (faster than the Geforce 2 Ultra in this instance) and in indoor areas at 1024x768x32 high detail it's getting 48fps. So it looks like Kyro II should be OK at 1024x768x32, at least until Doom3 comes out.

NOTE: while Kyro II is faster than the Geforce 2 Ultra in some situations in UT2, one thing to remember is that Kyro II isn't rendering the cube maps while the Geforce 2 Ultra is (cube maps are on in this test, right?). This also has to be taken into account with the Voodoo range of cards, as they are also not rendering the cube maps.
 
Re: Old Cards Doing Better In New Games Than "Newer Car

... they said that the Voodoo3 can do 20 FPS or better at 640x480 in 16-bit in UT2003, while the TNT2 got the same result only at 320xwhatever.

Well, did they compare these two on equal settings? Or did they switch on the 32Bit-Mode on the TNTs just because they could do it?
 
Now, if we remember back to JC’s .plan update concerning Radeon 8500’s fragment abilities, he actually talks about collapsing DoomIII’s rendering into 1 pass because Radeon 8500 can actually perform 11 texture stages in a single pass. How does this happen? On the first pass all 6 registers are filled (via the 3 loops), and then for the second pass the information in the 6 registers is combined; rather than writing this combined result back out to the frame buffer, it is written back to a register, leaving the other 5 registers free to carry out more operations. This is how Radeon 8500 is able to carry out 11 texture operations in one pass when using its pixel shader functionality.

That's pretty much right.

PS1.4 capable hardware can do a particular number of lookups from textures in two different passes, and has a particular number of maximum different textures that can be looked-up-in. These limits aren't connected.
 
Err, no, at least for the cube maps. I'm not sure if there are any shader effects on GF3/4/8500 which also need to be taken into account. The UT2003 official FAQs refer to them though.
 
They have a powerful texture engine that will overlap a bunch of textures on top of each other to provide the desired effect. That's what they are calling shaders, which I thought was the same as what Q3 did. Not sure if that's the same thing as PS. However, in the UE's texture tool there is a preview window where they have a GF2 compatibility fallback check box, which is supposed to render the shader for that class of hardware. I remember it made a difference when I saw it. I still don't have my copy of the demo yet, so I cannot show you :(
 
DaveBaumann said:
Now, if we remember back to JC’s .plan update concerning Radeon 8500’s fragment abilities, he actually talks about collapsing DoomIII’s rendering into 1 pass because Radeon 8500 can actually perform 11 texture stages in a single pass. How does this happen? On the first pass all 6 registers are filled (via the 3 loops), and then for the second pass the information in the 6 registers is combined; rather than writing this combined result back out to the frame buffer, it is written back to a register, leaving the other 5 registers free to carry out more operations. This is how Radeon 8500 is able to carry out 11 texture operations in one pass when using its pixel shader functionality.

Now, I believe that’s how it works. I’m sure someone will correct me if I’m wrong.

That's exactly what I was talking about.

JC stated that he ideally needs a card capable of doing 11 textures per pass, while saying that the R200 was capable of doing 12.
 
Re: Old Cards Doing Better In New Games Than "Newer Car

Quasar said:
... they said that the Voodoo3 can do 20 FPS or better at 640x480 in 16-bit in UT2003, while the TNT2 got the same result only at 320xwhatever.

Well, did they compare these two on equal settings? Or did they switch on the 32Bit-Mode on the TNTs just because they could do it?

16 Bit for both.
 
My bad guys, I messed up with what I said earlier.

With 11/12 textures per pass, the R200 should on paper be faster than the GF4, but something about the GF4 being more efficient makes it faster. Or is it that hardware bug that JC found a couple of months ago in the R200, which everyone has forgotten about and thinks has been resolved with newer drivers?
Methinks the newer drivers probably route around the problem rather than fixing it, and there is some kind of limitation placed on the R200 core due to some errata (bugs). And I bet it has something to do with how the R200 accesses caches or looks up the data needed to perform 11/12 textures per pass efficiently. Or was it a bug in the way the R200 processed high numbers of polys?

Again apologies for sounding like I don't know what I am talking about (well in this case I really don't have a clue ;)).
 
An interesting point is that multipass vs. multitexture isn't necessarily a clear win.

If the rendering operation is very pixel-shader-execution-time limited and in no way memory bandwidth limited, they will both have identical performance assuming the rendering time in that state is long relative to the size of internal buffering for Z, C etc.

The Doom3 shader is pretty long...
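
A toy model may make the point above easier to see. It assumes shader execution and framebuffer traffic overlap perfectly, so the cost per pixel is whichever of the two is larger; the function `pixel_time`, its parameters and the numbers used below are all invented for illustration and are not measurements of any card.

```python
def pixel_time(shader_cycles, passes, bytes_per_pass, mem_bytes_per_cycle):
    """Cycles per pixel when compute and memory traffic overlap fully.

    Hypothetical model: the total shader work is the same however it is
    split into passes, but every pass reads/writes the framebuffer again.
    """
    compute = float(shader_cycles)
    memory = passes * bytes_per_pass / mem_bytes_per_cycle
    return max(compute, memory)

# Long shader: execution time dominates, so 1 pass and 3 passes cost the same.
print(pixel_time(40, passes=1, bytes_per_pass=8, mem_bytes_per_cycle=4))  # 40.0
print(pixel_time(40, passes=3, bytes_per_pass=8, mem_bytes_per_cycle=4))  # 40.0

# Short shader: the extra framebuffer traffic of multipass becomes the limit.
print(pixel_time(4, passes=1, bytes_per_pass=8, mem_bytes_per_cycle=4))   # 4.0
print(pixel_time(4, passes=3, bytes_per_pass=8, mem_bytes_per_cycle=4))   # 6.0
```

Under these assumptions, collapsing passes only helps once the memory term exceeds the shader term, which is the condition described in the post.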
 