Vertex Shader performance

RoOoBo · Nov 18, 2002

From Tom's Hardware:

In contrast, the GeForceFX uses a highly programmable floating-point array, which allows for a triangle transformation rate of over 350 Mverts/s. For comparison, the GeForce4 Ti can offer 136 Mverts/s, while the Radeon 9700 PRO achieves about 325 Mverts/s.

Normalized for clock speed, this gives us the following picture:

NVIDIA GeForce4 Ti4600 (300 MHz): 0.453 Mverts / clock
NVIDIA GeForceFX (500 MHz): 0.7 Mverts / clock
ATI Radeon 9700 PRO (325 MHz): 1 Mverts / clock

Where do those numbers come from? Are they just PR? How are they calculated?

What workload is assumed for generating a single vertex? One vertex shader instruction (maybe just the exit)? Four instructions (just a matrix transformation)? No shader instructions at all?

Without this information I don't really know what the numbers are telling me.

I guess the 325 million figure for the R300 could come from the 4 instructions of a vector-matrix transformation running on its 4 vertex shader units (so it would be 4 vertices / 4 clocks = 1 vertex/clock).

But I don't know how to explain the GeForce4 Ti numbers.
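The R300 guess can be made concrete. A 4x4 matrix times a 4-component vertex is four dot products, i.e. four DP4-style instructions. Assuming (as the guess does) that each of the four vertex units retires one vec4 instruction per clock and works on its own vertex, throughput is simply units divided by instructions per vertex. This is a Python sketch of that assumption, not a documented description of the hardware; `dp4`, `transform` and `vertices_per_clock` are hypothetical helpers.

```python
def dp4(a, b):
    """One DP4-style instruction: a 4-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def transform(matrix, vertex):
    """Row-major 4x4 matrix times a vec4: one DP4 per output component."""
    return [dp4(row, vertex) for row in matrix]

INSTRUCTIONS_PER_VERTEX = 4  # four DP4s and nothing else (minimal shader)
VERTEX_UNITS = 4             # R300 vertex units, ASSUMED to retire 1 instruction/clock each

def vertices_per_clock(units, instructions_per_vertex):
    """Each unit needs `instructions_per_vertex` clocks per vertex."""
    return units / instructions_per_vertex

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(transform(identity, [1.0, 2.0, 3.0, 1.0]))                     # [1.0, 2.0, 3.0, 1.0]
print(vertices_per_clock(VERTEX_UNITS, INSTRUCTIONS_PER_VERTEX))        # 1.0
print(vertices_per_clock(VERTEX_UNITS, INSTRUCTIONS_PER_VERTEX) * 325)  # 325.0 (Mverts/s at 325 MHz)
```

Under this model the 325 Mverts/s figure falls out exactly, which is why the guess is plausible; it does not prove that this is how the figure was measured.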

ERP · Nov 18, 2002

Unfortunately without a lot of detail there is no way to tell.

It could be a setup limit, or a limit based on the number of FMAD units, or any number of things.

And as a number it's not a useful comparison anyway: just because card A outperforms card B with a minimal shader doesn't mean the same is true with a 100-instruction shader.
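ERP's point can be illustrated with a toy model in which a chip's vertex rate is the lower of a fixed setup limit and its shader throughput. Both chips below are hypothetical (invented unit counts and limits, not any real product), and the one-instruction-per-clock-per-unit issue rate is an assumption; they only show how the ranking can flip as the shader grows.

```python
from dataclasses import dataclass

@dataclass
class Chip:
    name: str
    setup_limit: float  # max vertices/clock outside the shader (hypothetical)
    shader_units: int   # vertex units, ASSUMED to retire 1 instruction/clock each

    def verts_per_clock(self, shader_length: int) -> float:
        """Vertex rate is bounded by whichever limit is hit first."""
        shader_rate = self.shader_units / shader_length
        return min(self.setup_limit, shader_rate)

# Hypothetical cards, chosen only to illustrate a crossover.
card_a = Chip("A", setup_limit=1.0, shader_units=4)
card_b = Chip("B", setup_limit=0.5, shader_units=8)

for length in (4, 100):
    a = card_a.verts_per_clock(length)
    b = card_b.verts_per_clock(length)
    print(f"{length:3d} instructions: A={a:.3f}  B={b:.3f} verts/clock")
```

With the 4-instruction shader, A leads 1.0 to 0.5 because B is setup-limited; with 100 instructions, B leads 0.08 to 0.04 because only shader throughput matters.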

GraphixViolence · Nov 18, 2002

I think Tom's Hardware screwed up by listing "Mverts/clock" instead of "verts/clock". We know the R300 can do 1 vertex/clock, or 325 Mverts/s at 325 MHz. Nvidia claimed 136 Mverts/s for the Ti4600 @ 300 MHz, which works out to 0.453 vertices per clock. The GeForceFX can do 350 Mverts/s @ 500 MHz, which works out to 0.7 vertices/clock.
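The normalization is just rate divided by clock. Both quantities carry a "mega" prefix (millions of vertices per second, millions of cycles per second), so the prefixes cancel and the unit is vertices per clock. A minimal Python sketch using the peak figures quoted above (press/vendor numbers, not measurements); `verts_per_clock` is a hypothetical helper name.

```python
# Quoted peak vertex rates (Mverts/s) and core clocks (MHz).
chips = {
    "GeForce4 Ti4600": (136, 300),
    "GeForceFX": (350, 500),
    "Radeon 9700 PRO": (325, 325),
}

def verts_per_clock(mverts_per_s: float, mhz: float) -> float:
    """(1e6 vertices/s) / (1e6 cycles/s) = vertices per cycle."""
    return mverts_per_s / mhz

for name, (rate, clock) in chips.items():
    print(f"{name}: {verts_per_clock(rate, clock):.3f} verts/clock")
```

This prints 0.453, 0.700 and 1.000 verts/clock respectively, matching the per-clock figures above once the unit is corrected.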

megadrive0088 · Nov 18, 2002

How many vertex shader pipelines does NV30 have?

RoOoBo · Nov 18, 2002

megadrive0088 said:
how many Vertex Shader pipelines does NV30 have?

Unknown, or not applicable.

It seems to use the same approach as the 3DLabs P10: an array (or pool) of single-precision FP units (I suppose they will be FMACs). Or at least that is what is stated at Tom's and in other places. But there are no figures on how many of those units there are, either.

Although I can see some benefit to this approach when executing scalar instructions (RCP and others), and perhaps even more if it can detect instructions with masked components, the fetch/decode logic seems far more complex. And all that just for a (small?) performance improvement, or to lower the transistor count (fewer units could be needed for the same performance). I think ATI's approach, with a scalar unit plus a SIMD unit, is far better (though I have no further knowledge of NV30's real architecture).
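One rough way to see the trade-off is to count how many FP lane-slots each organization occupies for the same instruction stream. The sketch assumes a vec4 SIMD unit occupies all four lanes for every vector instruction regardless of the write mask, that scalar ops (RCP, RSQ, ...) go to a separate one-lane scalar unit, and that an idealized pool of scalar FMACs occupies exactly one unit per written component. The instruction mix is invented, uses only component-wise ops so that work equals components written, and says nothing about NV30's actual design, which is unknown.

```python
from typing import List, Tuple

# (opcode, components written by the destination write mask).
Program = List[Tuple[str, int]]

SCALAR_OPS = {"RCP", "RSQ", "EXP", "LOG"}

def lanes_vec4_plus_scalar(program: Program) -> int:
    """Vector ops occupy all 4 SIMD lanes whatever the mask;
    scalar ops occupy the single lane of the separate scalar unit."""
    return sum(1 if op in SCALAR_OPS else 4 for op, _ in program)

def lanes_scalar_pool(program: Program) -> int:
    """Idealized scalar pool: only written components occupy a unit."""
    return sum(width for _, width in program)

# Hypothetical instruction mix, for illustration only.
program = [("MAD", 4), ("MAD", 4), ("MUL", 3), ("ADD", 3),
           ("MOV", 2), ("RCP", 1), ("RSQ", 1)]

useful = lanes_scalar_pool(program)  # in this model every pool slot is useful work
for name, lanes in (("vec4 + scalar", lanes_vec4_plus_scalar(program)),
                    ("scalar pool", lanes_scalar_pool(program))):
    print(f"{name}: {lanes} lane-slots, {useful / lanes:.0%} useful")
```

Here the vec4 + scalar organization spends 22 lane-slots on 18 components of useful work (about 82%), while the pool spends 18. The pool only realizes that gain if its scheduler can pack masked and scalar work into free units every cycle, which is exactly the extra fetch/decode complexity questioned above.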
