8500 outscores the GF4 running vertex programs?

Luminescent · Feb 19, 2003

Everything else equal, Radeon 8500s seem to outscore GeForce 4s in vertex shader benchmarks (3DMark01, 3DMark03). The 8500s are averaging 6-7 fps in 3DMark03 while GeForce 4s average in the 4-5 fps range. This is with vertex programs and not T&L, where triangle setup rate has little performance effect. With this in mind, is it possible that the 8500 has a more robust/efficient vertex program implementation?

In this thread:
http://www.beyond3d.com/forum/viewtopic.php?t=1952&highlight=r200+nv20+vertex
I brought up a similar topic. The processors are by no means cutting edge, but it is rare to have an older processor outperform a newer one. The attitude of many at the time of the GeForce 4's release was one of absolute superiority (PS flexibility aside) over the competition. I beg to differ after encountering the above information.

Rancidm · Feb 19, 2003

It's funny... with all the talk about 3DMark and pixel shader 1.4, it seems like the 8500 is finally 'coming into its own' so long after its debut. It's actually getting a lot more positive attention now (that's not about buggy drivers, potential performance, or messed up AA). It almost makes me feel like my card isn't obsolete anymore!

Luminescent · Feb 19, 2003

Same here!

MuFu · Feb 19, 2003

Yeah - your cards are still great for "playing" all the latest benchmarks.

Doesn't R200 outperform NV25/NV28 in the Nature Test of 3DM2001SE on a per-clock basis? Maybe I am mistaken...

MuFu.

Edit - sp.

Luminescent · Feb 19, 2003

I believe the R200 does outperform the NV25 clock for clock, Mufu. It is kind of odd, though, because the old nature scene only used VS and PS 1.1 and NV25 seems to outperform the R200 when running PS 1.1. I guess the benchmark makes heavy use of vertex shaders.
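To make "clock for clock" concrete, here is a minimal sketch that divides a score by the core clock before comparing. Everything in it is an assumption for illustration: the `per_clock` helper is hypothetical, the frame rates are placeholders rather than measured results, and the clocks are assumed nominal values for an 8500 and a Ti 4600.

```python
def per_clock(score, core_mhz):
    """Return a benchmark score normalized per MHz of core clock."""
    if core_mhz <= 0:
        raise ValueError("core clock must be positive")
    return score / core_mhz


# Placeholder (fps, core MHz) pairs. These are NOT measured results;
# the clocks are assumed nominal values used only for illustration.
cards = {
    "R200 (assumed 275 MHz)": (30.0, 275.0),
    "NV25 (assumed 300 MHz)": (31.0, 300.0),
}

normalized = {name: per_clock(fps, mhz) for name, (fps, mhz) in cards.items()}

# List the cards from highest to lowest per-clock score.
for name, value in sorted(normalized.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.4f} fps per MHz")
```

With these placeholder numbers the card with the lower raw score comes out ahead per MHz, which is the kind of situation being described for the Nature test.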

This link gives a little more insight to Carmack's comments on the R200 with regards to vertex/pixel processing:
http://www.tech-report.com/onearticle.x/3393

It seems that the R200 has a limited cache architecture which restricts its performance, even when working in PS 1.4. The RV250 improved upon this by doubling the texture cache (from 3 KB to 6 KB) and, as a result, displayed much improved performance in shading-intense, low-multitexturing scenarios.

MuFu · Feb 19, 2003

Luminescent said:
I believe it does, Mufu, although it seems funny because the old nature scene only used VS and PS 1.1 and NV25 seems to perform better with PS 1.1. I guess the benchmark makes heavy use of vertex shaders.

Cool - I was just trying to refresh my memory. Obviously a level playing field-type situation (i.e. no PS 1.4 advantage for R200) but a VS-centric benchmark, you're right.

Hmm... does NV25/NV28 execute fixed-function (DX7) geometry ops in its VS unit? Perhaps there were some compromises made in order to emulate legacy calcs in the shader pipeline. R200 has a native TCL pipe IIRC. Might need to double check that though - can't remember exactly.

MuFu.

Luminescent · Feb 19, 2003

The R200 does, in fact, contain a fixed function T&L pipe. Here is a snippet from the Radeon SDK, found in the following thread:
http://www.beyond3d.com/forum/viewtopic.php?t=4426

Fixed function vs. programmable pipelines
RADEON 8500/9000 chips have implemented both fixed function
and programmable vertex processing in the silicon. Using fixed
function with these chips can be slightly more efficient than using
vertex shaders because of the optimized hardware implementation of
the TnL pipeline. Using fixed function TnL also simplifies shader
management and reduces the associated application and driver
overhead.

I don't know how this would change anything, though; the R200 is outperformed by the NV25 in regular T&L tests anyway.

JF_Aidan_Pryde · Feb 19, 2003

Same!!

How does the 8500 vs Ti4200 stack in 3DMark2003? Pixel shaders?

MrB · Feb 19, 2003

weird. If I remember correctly the 9000 eliminated the fixed portion of pipeline. You can notice this by running a comparison using 3DMark 2001 SE, the 9000's vertex performance is lower.

http://www.rage3d.com/reviews/radeon9000pro/images/3dmarkchart.gif

MuFu · Feb 19, 2003

MrB said:
weird. If I remember correctly the 9000 eliminated the fixed portion of pipeline. You can notice this by running a comparison using 3DMark 2001 SE, the 9000's vertex performance is lower.

http://www.rage3d.com/reviews/radeon9000pro/images/3dmarkchart.gif

Yeah - RV250 is like NV20. One pipe for VS and FF T&L.

MuFu.

Luminescent · Feb 19, 2003

Mufu, the Radeon SDK seems to indicate the RV250 has a fixed function T&L unit; however, benchmarks indicate otherwise. Is it possible that what they refer to as fixed function T&L is in fact something else?

Well, it would be nice to have someone post their 8500 128mb scores (3DMark03 game tests use more than 64mb of textures, 64mb may cause swapping to occur), and someone else their Ti 4200/4400/4600 scores.

MuFu · Feb 19, 2003

Interesting that the dual VS units take up a considerable amount of die space also:

PS unit is the same as the one in the NV20 I believe.

MuFu.

MuFu · Feb 19, 2003

Luminescent said:
Mufu, the radeon SDK seems to indicate the RV250 has a fixed function T&L unit...

Really? I have some excerpts from the register spec that state explicitly that one of the changes from R200>>>RV250 is the removal of the T&L pipe. Edit - said "TCL"; meant "T&L"

MuFu.

Luminescent · Feb 19, 2003

MuFu said:
Really? I have some excerpts from the register spec that state explicitly that one of the changes from R200>>>RV250 is the removal of the TCL pipe.

Sorry, didn't know that. It does make sense, according to benchmarks.

MuFu · Feb 19, 2003

Well from a programming point of view it probably makes more sense to think of it as having a separate T&L unit, so I can see why they suggest that in the SDK.

Sorry - I didn't mean removal of the TCL pipe. I meant just one unified unit instead of separate FF/VS hardware (a la R200). The weird thing is that "TCL" makes you think DX7 (and all the Charisma engine blurb). It may just be that they have carried the terminology for the geometry unit over in the DX7>>>DX8 transition.

Have a look at this (from a reg spec header - didn't post it when I first got it for obvious reasons):

Hardware differences from RV250 to R200 and RV250/M

Problem texdepth, solution - remove Hierarchical Z

HOS removed

Single TCL pipe

One texture pipe for six texture

Texturization internal cache increased from 2K to 4K

Obviously those aren't the only differences - I think RV250 uses a single channel, 128-bit mem controller vs. the R200's 64bit x 2 etc etc.

MuFu.

demalion · Feb 19, 2003

My recollection is "vertex shader and memory controller borrowed from the R300". However, I think that was either second hand or a comment from an interview (i.e., PR "enhanced"), but I'm curious about what makes up a single R300 vertex processor (perhaps minus some VS 2.0 functionality, I'd guess...?)

About the memory controller, the above would make it sound like 2x64-bit, wouldn't it? Again, that depends on how much PR/second-hand distortion is in the above.

But, perhaps your info can address the above statements?

MrB · Feb 19, 2003

I'm pretty sure the fixed function part was removed. Also that chart I referred you to has 3dmark scores for a 128mb Radeon 8500. The AIW 8500 is a 128meg board.

Luminescent · Feb 19, 2003

Mufu, it is understood that the 8500 sports 2 VS units. This is why it doesn't seem to make sense (both transistor- and performance-wise) for the 8500 to contain a fixed function TCL unit and 2 VS units. It seems to be more of a conceptual DX7 (fixed function, Charisma Engine mode) vs DX8 (programmable mode) thing, as you say (Mufu), for both the 8500 and the 9000. The 8500's T&L performance numbers seem to indicate a little less than twice the performance of the 9000 Pro (3DMark2001), which may go to show that the 2 VS units of the 8500 give the advantage, not a hardwired T&L unit. Otherwise the performance difference would not be justified. How could a fixed function T&L unit give such a boost?

In the following benchmark, by reactor critical:
http://www.reactorcritical.com/review-battletitans2/review-battletitans2_2.shtml
the 8500 is compared with the Geforce 3 Ti500 and the Radeon 7500 (which had a hardwired T&L unit), and it is almost twice as fast as the 7500 (@290MHz). To me, this signifies that the 8500 did not contain a hardwired T&L unit, but used its 2 VS pipelines to accelerate generic T&L. If this were not the case, the 7500 would have been able to beat the 8500, since its clockspeed is higher.

Just as the R300 scrapped legacy hardware, it seems more like ATI to have gone with a forward-looking solution, doing away with the fixed function T&L unit entirely in both the 8500 and the 9000.

P.S. The only reason I claimed the 8500 had a fixed function T&L unit was because I saw it in the Radeon SDK. After Mufu brought to my attention that it might not be true, I am reverting to my initial theory that the 8500 has no fixed function T&L unit.

MuFu · Feb 19, 2003

It is my understanding that the 8500 has dedicated FF and VS hardware also - I didn't mean to suggest otherwise, sorry. Obviously that occupies quite a bit of silicon, and so legacy T&L support was one of the first things to go when they drafted RV250. From The Tech Report:

Vertex shader - As in the GeForce3, the vertex shader replaces the old fixed-function transform and lighting (T&L) unit of the GeForce2/Radeon with a programmable unit capable of bending and flexing entire meshes of polygons as organic units.

The Radeon 8500 also includes an entire fixed-function T&L unit (PIXEL TAPESTRY II), which can operate in parallel with the 8500's vertex shader. The GeForce3, by contrast, implements its backward-compatible fixed-function T&L capability as a vertex shader program.

Heh - I don't think we'll get anywhere in this discussion without knowing more details about the actual throughput/degree of parallelism of the T&L/VS units we're talking about. :-\

MuFu.

mczak · Feb 20, 2003

demalion said:
My recollection is "vertex shader and memory controller borrowed from the R300". However, I think that was either second hand or a comment from an interview (i.e., PR "enhanced"), but I'm curious about what makes up a single R300 vertex processor (perhaps minus some VS 2.0 functionality, I'd guess...?)

I'd guess the VS is more "PR enhanced" than anything else. Maybe a bit optimized. It must be very close to the R200 VS, otherwise you couldn't just use the R200 driver on the RV250 (the R200/RV250 open source Linux driver is the same; when the RV250 was introduced someone just hacked the chip IDs in the driver, and it worked. This driver does not expose all features of the R200/RV250, but still this suggests very close similarities. Interestingly enough, even multitexturing seems to work the same at the register level.)

About the memory controller, the above would make it sound like 2x64-bit, wouldn't it? Again, that depends on how much PR/second-hand distortion is in the above.

IIRC, there were different rumours about that. Some said its memory controller is borrowed from the R300, some said it's borrowed from the RV200 (Radeon 7500, i.e. the same as on the original Radeon). Benchmarks suggest it's 1x128-bit (low single texture fillrate, but only at 32-bit).
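As a side note on why benchmarks are needed to tell the two layouts apart: on paper, 1x128-bit and 2x64-bit give the same peak bandwidth, and the difference is only in how many bytes each channel moves per transfer. A rough sketch of that arithmetic follows. The `peak_bandwidth_gbs` helper is hypothetical and the 275 MHz DDR memory clock is an assumed value for illustration, not a figure from this thread.

```python
def peak_bandwidth_gbs(channels, width_bits, mem_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 models DDR memory.
    """
    bytes_per_transfer = channels * width_bits / 8
    return bytes_per_transfer * mem_mhz * 1e6 * transfers_per_clock / 1e9


MEM_MHZ = 275  # assumed memory clock, for illustration only

single = peak_bandwidth_gbs(channels=1, width_bits=128, mem_mhz=MEM_MHZ)
dual = peak_bandwidth_gbs(channels=2, width_bits=64, mem_mhz=MEM_MHZ)

print(f"1x128-bit: {single:.1f} GB/s, 16 bytes per transfer on one channel")
print(f"2x64-bit:  {dual:.1f} GB/s, 8 bytes per transfer on each of two channels")
```

Both layouts come out identical in peak terms, so any difference would only show up in access patterns where the narrower, independent channels waste less of each transfer.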

btw MuFu what is the texdepth problem on the RV250?
