8500 outscores the GF4 running vertex programs?

Xmas · Feb 20, 2003

MuFu said:
Interesting that the dual VS units take up a considerable amount of die space also:

I don't think this picture can be used as a 'correct' map of the chip.

Luminescent · Mar 11, 2003

Irrelevant as it may be, the age-old public relations conundrum of whether the 8500 (and, consequently, the 9000) contains both programmable and fixed-function geometry pipelines has been solved (with the help of persistence and informative tidbits). After observing these benchmarks ( http://www.tomshardware.com/graphic/200207181/radeon9000-11.html ), reading the Radeon SDK, and conducting a little research on the Radeon 7500 (which I now believe carries the fixed-function TCL pipeline of the 8500/9000), I conclude that the R200 originally implemented two onboard geometry engines, each containing a programmable and a fixed-function pipeline. As MuFu stated, the 9000 removed one of those engines (both its fixed-function and programmable units), which becomes evident in its performance in the Tom's Hardware benchmarks.

The evidence for my theory can be found in this information I have compiled:

Radeon SDK:
Fixed function vs. programmable pipelines
RADEON 8500/9000 chips have implemented both fixed function
and programmable vertex processing in the silicon. Using fixed
function with these chips can be slightly more efficient than using
vertex shaders because of the optimized hardware implementation of
the TnL pipeline. Using fixed function TnL also simplifies shader
management and reduces the associated application and driver
overhead.

-From this we can conclude that both the 8500 and 9000 (R200 and RV250) contain fixed-function and programmable vertex pipelines.

Mufu:
Hardware differences from RV250 to R200 and RV250/M

Problem texdepth, solution - remove Hierarchical Z

HOS removed

Single TCL pipe

One texture pipe for six texture

Texturization internal cache increased from 2K to 4K

-Indicates that the 9000 has only one of the geometry engines found in the 8500 (each engine contains both a fixed-function and a programmable part).

In the following benchmark by Reactor Critical:
http://www.reactorcritical.com/review-battletitans2/review-battletitans2_2.shtml
the 8500 is compared with the GeForce3 Ti500 and the Radeon 7500 (which has a hardwired T&L unit), and it is almost twice as fast as the 7500 (@290MHz).

-Demonstrates that the 8500 has almost twice the performance of the 7500 when executing standard T&L.

Radeon 7500 SDK ( http://216.239.37.100/search?q=cach...adeon+7500+polygons+per+second&hl=en&ie=UTF-8 ):
Up to 40 million transformed triangles per second

-Shows the 7500 to have approximately half of the 8500's claimed rate of 69-75 million polygons per second (which fits with the 8500 having two fixed-function pipelines).

-The fact that the 9000 remains behind the 8500 in vertex shading performance hints that it contains only one programmable pipeline. *The 9000 is only slightly ahead of the 7500 in the traditional 3DMark T&L tests, which points to a single fixed-function T&L pipeline.

*The Radeon 9000 implements a more efficient memory/pipeline architecture than the 7500 (correct me if I'm wrong), so the observed fixed-function T&L discrepancies (in the Tom's Hardware article) were to be expected.
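As a quick sanity check on the "approximately half" argument, the quoted peak figures can be compared directly. This is only a sketch: it uses the vendor-claimed peak rates quoted above (40 million for the 7500, 69-75 million for the 8500), not measured throughput, it does not normalize for clock speed, and the even two-way split in `per_engine` is an assumption that mirrors the two-engine theory rather than anything ATI has confirmed.

```python
# Vendor-claimed peak triangle rates, in millions per second (figures quoted above).
RADEON_7500_PEAK = 40.0
RADEON_8500_PEAK_RANGE = (69.0, 75.0)

def ratio_range(single, dual_range):
    """Return the (low, high) ratio of the 8500 claim to the 7500 claim."""
    low, high = dual_range
    return low / single, high / single

def per_engine(dual_range, engines=2):
    """Per-engine rate if the 8500 claim were split evenly across `engines` (assumption)."""
    return tuple(rate / engines for rate in dual_range)

lo, hi = ratio_range(RADEON_7500_PEAK, RADEON_8500_PEAK_RANGE)
print(f"8500/7500 claimed ratio: {lo:.3f} to {hi:.3f}")
print("Per-engine rate under a two-engine assumption:", per_engine(RADEON_8500_PEAK_RANGE))
```

The ratio lands between roughly 1.7x and 1.9x, so "approximately half" holds for the paper specs, though the clock differences mean it is suggestive rather than conclusive.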

MrB · Mar 12, 2003

hehe Luminescent.

If I'd known you were that interested in solving this problem, I could've made a few inquiries at ATI and gotten a pretty decent answer.

Hyp-X · Mar 12, 2003

MuFu said:
From The Tech Report:

Vertex shader - As in the GeForce3, the vertex shader replaces the old fixed-function transform and lighting (T&L) unit of the GeForce2/Radeon with a programmable unit capable of bending and flexing entire meshes of polygons as organic units.


The GeForce 3 seems to have a dedicated FF T&L unit as well; in OpenGL one can even use it together with VS (at least the 'T' part).
It might be so that the two units share arithmetic units.

Some benchmarks indicate that the FX might even have dedicated FF T&L units.

KimB · Mar 12, 2003

Hyp-X said:
The GeForce 3 seems to have a dedicated FF T&L unit as well; in OpenGL one can even use it together with VS (at least the 'T' part).
It might be so that the two units share arithmetic units.

Where do you get this from?

From the NV_vertex_program specs:

What part of OpenGL do vertex programs specifically bypass?
Vertex programs bypass the following OpenGL functionality:
o Normal transformation and normalization
o Color material
o Per-vertex lighting
o Texture coordinate generation
o The texture matrix
o The normalization of AUTO_NORMAL evaluated normals
o The modelview and projection matrix transforms
o The per-vertex processing in EXT_point_parameters
o The per-vertex processing in NV_fog_distance
o Raster position transformation
o Client-defined clip planes
Operations not subsumed by vertex programs
o The view frustum clip
o Perspective divide (division by w)
o The viewport transformation
o The depth range transformation
o Clamping the primary and secondary color to [0,1]
o Primitive assembly and subsequent operations
o Evaluator (except the AUTO_NORMAL normalization)

It really seems to me that you cannot use fixed-function processing in OpenGL alongside vertex programs.
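The second list in the spec quote names stages that remain fixed-function even with a vertex program bound. A minimal sketch of three of them (perspective divide, viewport transformation, depth range transformation), assuming the standard OpenGL formulas for `glViewport(x, y, width, height)` and `glDepthRange(near, far)`. The function name `clip_to_window` is hypothetical, and frustum clipping is not modeled.

```python
def clip_to_window(clip, viewport, depth_range=(0.0, 1.0)):
    """Map a clip-space (x, y, z, w) vertex to window coordinates.

    viewport: (x, y, width, height), as passed to glViewport.
    depth_range: (near, far), as passed to glDepthRange.
    Assumes the vertex has already passed frustum clipping (w > 0).
    """
    x_c, y_c, z_c, w_c = clip
    if w_c <= 0.0:
        raise ValueError("vertex must be clipped before the perspective divide")
    # Perspective divide (division by w): clip space -> normalized device coordinates.
    x_n, y_n, z_n = x_c / w_c, y_c / w_c, z_c / w_c
    vx, vy, width, height = viewport
    near, far = depth_range
    # Viewport transformation.
    x_w = (x_n + 1.0) * (width / 2.0) + vx
    y_w = (y_n + 1.0) * (height / 2.0) + vy
    # Depth range transformation.
    z_w = z_n * (far - near) / 2.0 + (far + near) / 2.0
    return x_w, y_w, z_w

print(clip_to_window((0.0, 0.0, 0.0, 1.0), (0, 0, 640, 480)))
```

Whatever a vertex program writes to its position output still flows through these fixed steps, which is why the spec lists them as "not subsumed".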

Hyp-X · Mar 12, 2003

Chalnoth said:
Where do you get this from?

From the NV_vertex_program1_1 specs:

This extension also supports a position-invariant vertex program
option. A vertex program is position-invariant when it generates
the _exact_ same homogeneous position and window space position
for a vertex as conventional OpenGL transformation (ignoring vertex
blending and weighting).

By default, vertex programs are _not_ guaranteed to be
position-invariant because there is no guarantee made that the way
a vertex program might compute its homogeneous position is exactly
identical to the way conventional OpenGL transformation computes
its homogeneous positions. In a position-invariant vertex program,
the homogeneous position (HPOS) is not output by the program.
Instead, the OpenGL implementation is expected to compute the HPOS
for position-invariant vertex programs in a manner exactly identical
to how the homogeneous position and window position are computed
for a vertex by conventional OpenGL transformation. In this way
position-invariant vertex programs guarantee correct multi-pass
rendering semantics in cases where multiple passes are rendered and
the second and subsequent passes use a GL_EQUAL depth test.

Should something be said about the relative performance of
position-invariant vertex programs and conventional vertex programs?

RESOLUTION: For architectural reasons, position-invariant vertex
programs may be _slightly_ faster than conventional vertex programs.
This is true in the GeForce3 architecture. If your vertex program
transforms the object-space position to clip-space with four DP4
instructions using the tracked GL_MODELVIEW_PROJECTION_NV matrix,
consider using position-invariant vertex programs. Do not expect a
measurable performance improvement unless vertex program processing
is your bottleneck and your vertex program is relatively short.
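For reference, the "four DP4 instructions" pattern in the resolution above computes each clip-space component as a dot product of one row of the tracked modelview-projection matrix with the object-space position. A minimal Python sketch with hypothetical helper names; the register layout in the docstring (matrix tracked into `c[0]`..`c[3]`) is an assumption about how an application might set it up, not something the spec mandates.

```python
def dp4(a, b):
    """Four-component dot product, the operation a DP4 instruction performs."""
    return sum(x * y for x, y in zip(a, b))

def transform_to_clip(mvp_rows, obj_pos):
    """Clip-space position from a row-major 4x4 modelview-projection matrix.

    One DP4 per output component, mirroring a vertex program such as
    (assumed register layout):
      DP4 o[HPOS].x, c[0], v[OPOS];
      DP4 o[HPOS].y, c[1], v[OPOS];
      DP4 o[HPOS].z, c[2], v[OPOS];
      DP4 o[HPOS].w, c[3], v[OPOS];
    """
    return tuple(dp4(row, obj_pos) for row in mvp_rows)

# A pure translation by +5 along x.
translate_x = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(transform_to_clip(translate_x, (1, 2, 3, 1)))
```

A position-invariant program simply omits these four instructions and lets the implementation produce HPOS itself, which is where the "slightly faster" remark comes from.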

KimB · Mar 12, 2003

That has nothing to do with running the two in parallel. This is useful for computing one pass using the fixed-function pipeline and a second pass using the vertex program pipeline.

Obviously there is some fixed-function specific hardware, but you can't run the two in parallel.

If it's any consolation, I thought the same thing the first time I read that.
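Why the spec cares about "exactly identical" positions in that multi-pass case: floating-point addition is not associative, so two paths that evaluate the same transform in a different order can produce values that differ in the last bit. For depth, that is enough to make a GL_EQUAL test reject the second pass. The toy illustration below is plain Python doubles and is not a model of any particular GPU's arithmetic.

```python
def dot_left_to_right(row, pos):
    """Accumulate ((a0*b0 + a1*b1) + a2*b2) + a3*b3."""
    total = 0.0
    for a, b in zip(row, pos):
        total += a * b
    return total

def dot_pairwise(row, pos):
    """Same math grouped as (a0*b0 + a1*b1) + (a2*b2 + a3*b3)."""
    p = [a * b for a, b in zip(row, pos)]
    return (p[0] + p[1]) + (p[2] + p[3])

row = (1.0, 1.0, 1.0, 1.0)
pos = (0.0, 0.1, 0.2, 0.3)
pass_one = dot_left_to_right(row, pos)
pass_two = dot_pairwise(row, pos)
print(pass_one, pass_two, pass_one == pass_two)
```

Position invariance sidesteps this by guaranteeing both passes compute the position the same way, rather than by making either order "more correct".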
