FX and PS 1.4, DX9 tests?

Bambers · Jan 29, 2003

er.. It took less than 5mins

Doomtrooper · Jan 29, 2003

Yes, my Excel skills need some work.

Hyp-X · Jan 29, 2003

It would be interesting to check how the GeForce FX numbers differ when using FP16 instead of FP32.
One has to append "_pp" to shader instructions in D3D to hint that lower precision is sufficient.

I was expecting 8 FP16 per cycle or 4 FP32 per cycle, so I didn't expect the GF FX to turn out to be faster.
But that would be only 23% slower.
In many of these tests it's more than 50% slower!
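
For anyone wanting to check the 23% figure: it seems to fall out of the clock speeds, assuming 500 MHz for the GeForce FX 5800 Ultra, 325 MHz for the Radeon 9700 Pro, and 8 FP ops per clock on the R300. None of those three numbers are stated in this post, so treat the following as a rough sketch of the arithmetic rather than a measurement:

```python
# Peak shader-op rates under ASSUMED clocks and per-clock issue rates.
# 500 MHz (GeForce FX), 325 MHz (R300) and 8 ops/clock on the R300 are
# assumptions for illustration, not figures taken from the benchmark.

def peak_mops(ops_per_clock, clock_mhz):
    """Theoretical millions of shader ops per second."""
    return ops_per_clock * clock_mhz

def percent_slower(rate, reference):
    """How much slower `rate` is than `reference`, in percent."""
    return (1 - rate / reference) * 100

r300 = peak_mops(8, 325)       # 2600
nv30_fp32 = peak_mops(4, 500)  # 2000
nv30_fp16 = peak_mops(8, 500)  # 4000

print(f"FP32: {percent_slower(nv30_fp32, r300):.0f}% slower than R300")
print(f"FP16: {nv30_fp16 / r300:.2f}x the R300 rate")
```

Under those assumptions the FP32 case comes out about 23% behind, so a measured gap of more than 50% would need some other explanation.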

LeStoffer · Jan 29, 2003

Hyp-X said:
I was expecting 8 FP16 per cycle or 4 FP32 per cycle, so I didn't expect the GF FX to turn out to be faster.

Yeah? I thought the consensus here was 8 FP32 per cycle. OTOH this is the first evidence that you might well be right and would also explain why they were going full out for a very high core speed. Interesting indeed.

OpenGL guy · Jan 29, 2003

LeStoffer said:
I thought the consensus here was 8 FP32 per cycle.

I certainly am not so sure of it

The results are still inconclusive because there may be driver inefficiencies at work.

antlers · Jan 29, 2003

Lots of things point to NVidia knowing that they didn't have good execution speed on their shaders. Why would they keep their 32-bit fixed-function path "for performance on older applications" if they could run the functions in the shaders at high speed the way the R300 does? Why would they need to have a 64-bit option, twice as fast as the 128-bit, but still slower than the 32-bit fixed function?

Let's face it, if the 64-bit ran fast enough there would be no need for the fixed-function pipeline (which is there, and is clearly plenty fast).

I think they are dispatching fewer 64-bit instructions per clock per pipe than the R300 (maybe not doing different ops in parallel, the way the R300 does) and running at 128-bit halves the rate.

Why wouldn't NVidia design the chip this way? They had no way of knowing in advance that the R300 would run DX9 shaders as well as it does, and the shader performance from the design perspective would seem adequate for the 64-bit path (it's not like it's orders of magnitude behind).

I don't see how drivers could slow down synthetic shader benchmarks like this--the NV30 performance varies across the benchmarks in much the same way that the R300 performance varies, so both chips must be doing basically the same work.

Of course, I lack OpenGL guy's expertise in this area, but I'm not at all convinced that this is a driver problem, unless the drivers are doing some really silly things. How long has this chip been in a simulator? Why would drivers be doing silly things at this point unless they were patching over problems in the silicon?

Luminescent · Jan 29, 2003

According to Digit Life (their article gives a lot of information), the NV30's performance drops by half with floating point in comparison to integer operation. Let's also remember that the NV30 hardware was developed around the time of the R300's, so it wouldn't surprise me if NV's implementation was inferior. However, without eliminating driver performance from the picture, we should not jump to conclusions.

The register combiners of the NV30 seem to be used for backwards compatibility with any pixel shaders below 1.4. That would explain why regular pixel shader performance is superior on the NV30: it sports 2 combiners per pipeline. The advanced pixel shader performance of the NV30 is inferior to that of the R300; it apparently drops when it uses its fragment processor. Something is fishy here.

It also seems that a great part of the 15 million transistor difference between the 2 processors comes from this legacy support in the NV30, which the R300 lacks. I mean, how many transistors do 16 8-stage register combiners take up? I know the R300 contains a simplified TruForm processor and can encode and decode some more texture formats, but that seems like far fewer transistors than 16 integer combiners.

Dave Baumann · Jan 29, 2003

According to Digit Life (their article gives a lot of information), the NV30's performance drops by half with floating point in comparison to integer operation.

Didn't that same article state they can do two integer ops per cycle?

Bambers · Jan 29, 2003

I can't actually download the ShaderMark test; it's showing a 404 on the website. But assuming the tests are the same as the ATI Treasure Chest demo it's based on, except using PS2.0 (the names are all the same), the results for the last few, most complex shaders are equal to or even lower than the original PS1.4 versions running on my 8500 (e.g. the 4-light diffuse bump mapping gives between 30 and 40 fps on my 8500). :|

tb · Jan 29, 2003

I've taken down the download of ShaderMark, because of some bugs in the 2.0 shaders, which I found (only by using debug runtime + maximum validation feature).
I'll release 1.6a in a few hours with corrected 2.0 shaders.

Thomas

Luminescent · Jan 29, 2003

Yes, the Digit Life article states that the NV30 can execute 2 integer ops per clock per pipeline; however, I am not sure if these are integer ops using the register combiners or the fragment processor. With PS=<1.4 it would use the fragment processor. The article also states 1 floating-point op per cycle, but never specifies whether it is FP16 or FP32 precision.

Brent · Jan 29, 2003

tb said:
I've taken down the download of ShaderMark, because of some bugs in the 2.0 shaders, which I found (only by using debug runtime + maximum validation feature).
I'll release 1.6a in a few hours with corrected 2.0 shaders.

Thomas

Sweet, I'll re-benchmark and edit my posts when you do.

Luminescent · Jan 29, 2003

If you guys take a look at the results of this Codecreatures benchmark from HardOCP,
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfM18xX2wuZ2lm,

you'll notice the FX is not greatly ahead of the 9700 Pro, even without AA or aniso, at 1024x768. It is scary that the FX can execute 2 int ops per clock in each of its shader pipelines (the test uses PS1.1) while the 9700 only executes 1 per clock, yet their framerates are so close. With a clock speed advantage and low bandwidth demand (no AA or aniso at 1024x768), shouldn't the FX be blowing away the R9700? Is something wrong with the latency of operations on the FX?

tb · Jan 30, 2003

@Brent

http://www.tommti-systems.de/main-Dateien/files.html

Version 1.6a is online. The 2.0 shaders are fixed and speed is improved. Maybe you could also try the DX8 version; the Radeon 9700 is faster with the DX8 version, maybe the GeForce FX too.

I've also released version 1.5 of my DX9 overdraw/fillrate tester. It tests
-front to back rendering
-back to front rendering
-50% fb + 50% bf rendering
and the texel/pixel fillrate with 0-8 textures.

Texture size goes from 128x128 to 1024x1024 for all tests, and you can switch from DXT1 to uncompressed textures.

Thanks,
Thomas
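
To illustrate why the front-to-back and back-to-front cases in an overdraw test can differ so much, here is a minimal sketch of a depth test at a single pixel. It is not taken from the tester itself: the function name, the layer depths, and the assumption that rejected fragments cost no shading work (i.e. the depth test runs before shading) are all hypothetical. The mixed 50/50 case is not modelled.

```python
# Toy model of overdraw at ONE pixel with a "less than" depth test.
# Assumption: a fragment that fails the depth test is rejected before
# shading, so only passing fragments cost shader/fill work.

def shaded_fragments(layer_depths):
    """Count fragments that pass the depth test, in submission order.

    layer_depths: depths of the layers covering this pixel, in the
    order they are drawn (smaller value = nearer to the camera).
    """
    nearest = float("inf")  # cleared depth buffer
    shaded = 0
    for depth in layer_depths:
        if depth < nearest:  # passes the z-test: shade and write depth
            nearest = depth
            shaded += 1
        # otherwise the fragment is rejected and costs nothing here
    return shaded

layers = [0.1, 0.3, 0.5, 0.7]  # hypothetical depths, 4 layers of overdraw

print("front to back:", shaded_fragments(layers))
print("back to front:", shaded_fragments(layers[::-1]))
```

In this model, front-to-back order shades only the nearest layer, while back-to-front order shades every layer, which is the gap a Z-reject test is meant to expose.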

Ante P · Jan 30, 2003

tb said:
@Brent

http://www.tommti-systems.de/main-Dateien/files.html

Version 1.6a is online. The 2.0 shaders are fixed and speed is improved. Maybe you could also try the DX8 version; the Radeon 9700 is faster with the DX8 version, maybe the GeForce FX too.

I've also released version 1.5 of my DX9 overdraw/fillrate tester. It tests
-front to back rendering
-back to front rendering
-50% fb + 50% bf rendering
and the texel/pixel fillrate with 0-8 textures.

Texture size goes from 128x128 to 1024x1024 for all tests, and you can switch from DXT1 to uncompressed textures.

Thanks,
Thomas

Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.

Tahir2 · Jan 30, 2003

Er, Ante P, they are.

Ante P · Jan 30, 2003

Tahir said:
Er, Ante P, they are.

Perhaps it's kept in my browser cache then..
Well, since the files should be new, that shouldn't be a problem.

OpenGL guy · Jan 30, 2003

Ante P said:
Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.

The version of ShaderMark I downloaded says 1.6a in the readme.txt. I think just the HTML has not been updated to reflect the new version.

Tahir2 · Jan 30, 2003

OpenGL guy said:
Ante P said:

Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.

The version of ShaderMark I downloaded says 1.6a in the readme.txt. I think just the HTML has not been updated to reflect the new version.

Bingo!

tb · Jan 30, 2003

Ante P said:
tb said:

@Brent

http://www.tommti-systems.de/main-Dateien/files.html

Version 1.6a is online. The 2.0 shaders are fixed and speed is improved. Maybe you could also try the DX8 version; the Radeon 9700 is faster with the DX8 version, maybe the GeForce FX too.

I've also released version 1.5 of my DX9 overdraw/fillrate tester. It tests
-front to back rendering
-back to front rendering
-50% fb + 50% bf rendering
and the texel/pixel fillrate with 0-8 textures.

Texture size goes from 128x128 to 1024x1024 for all tests, and you can switch from DXT1 to uncompressed textures.

Thanks,
Thomas

Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.

Haven't updated the link descriptions....
