FX and PS 1.4, DX9 tests?

It would be interesting to check how the GeForce FX numbers differ when using FP16 instead of FP32.
One has to append "_pp" to shader instructions in D3D to hint that lower precision is sufficient.

I was expecting 8 FP16 per cycle or 4 FP32 per cycle, so I didn't expect the GF FX to turn out to be faster.
But that would be only 23% slower.
In many of these tests it's more than 50% slower!
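
For reference, the 23% figure can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the clock speeds (500 MHz for the GeForce FX 5800 Ultra, 325 MHz for the Radeon 9700 Pro) and the R300 rate of 8 FP ops per clock are assumptions not stated in the post, and the function names are made up for illustration.

```python
# Rough theoretical shader throughput in millions of ops per second.
# Assumed (not stated in the post): 500 MHz core for the GeForce FX 5800 Ultra,
# 325 MHz core and 8 FP ops per clock for the Radeon 9700 Pro (R300).

def throughput(core_mhz: float, ops_per_clock: int) -> float:
    """Peak ops per second, in millions."""
    return core_mhz * ops_per_clock

def percent_slower(candidate: float, reference: float) -> float:
    """How much slower candidate is than reference, in percent."""
    return (1.0 - candidate / reference) * 100.0

r300 = throughput(325, 8)      # R300, assumed 8 ops per clock
fx_fp32 = throughput(500, 4)   # FX, the poster's guess of 4 FP32 per clock
fx_fp16 = throughput(500, 8)   # FX, the poster's guess of 8 FP16 per clock

print(f"FX at FP32 vs R300: {percent_slower(fx_fp32, r300):.0f}% slower")
print(f"FX at FP16 vs R300: {fx_fp16 / r300:.2f}x")
```

Under these assumptions the FP32 case lands at roughly 23% behind the R300, so a measured gap of more than 50% would have to come from something other than the raw per-clock rate.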
 
Hyp-X said:
I was expecting 8 FP16 per cycle or 4 FP32 per cycle, so I didn't expect the GF FX to turn out to be faster.

Yeah? I thought the consensus here was 8 FP32 per cycle. OTOH this is the first evidence that you might well be right and would also explain why they were going full out for a very high core speed. Interesting indeed.
 
Lots of things point to NVidia knowing that they didn't have good execution speed on their shaders. Why would they keep their 32 bit fixed-function path "for performance on older applications" if they could run the functions in the shaders at high speed the way the R300 does? Why would they need to have a 64-bit option, twice as fast as the 128-bit, but still slower than the 32-bit fixed function?

Let's face it, if the 64-bit ran fast enough there would be no need for the fixed-function pipeline (which is there, and is clearly plenty fast).

I think they are dispatching fewer 64-bit instructions per clock per pipe than the R300 (maybe not doing different ops in parallel, the way the R300 does) and running at 128-bit halves the rate.

Why wouldn't NVidia design the chip this way? They had no way of knowing in advance that the R300 would run DX9 shaders as well as it does, and the shader performance from the design perspective would seem adequate for the 64-bit path (it's not like it's orders of magnitude behind).

I don't see how drivers could slow down synthetic shader benchmarks like this--the NV30 performance varies across the benchmarks in much the same way that the R300 performance varies, so both chips must be doing basically the same work.

Of course, I lack OpenGL guy's expertise in this area, but I'm not at all convinced that this is a driver problem--unless the drivers are doing some really silly things. How long has this chip been in a simulator? Why would drivers be doing silly things at this point unless they were patching over problems in the silicon?
 
According to Digit Life (their article gives a lot of information), the NV30's performance drops by half with floating point compared with integer operation. Let's also remember that the NV30 hardware was developed around the same time as the R300's, so it wouldn't surprise me if NV's implementation was inferior. However, without eliminating driver performance from the picture, we should not jump to conclusions.

The register combiners of the NV30 seem to be used for backwards compatibility on any pixel shaders <1.4. This is why the regular pixel shader performance seems superior on the NV30: it sports 2 combiners per pipeline. The advanced pixel shader performance of the NV30 is inferior to that of the R300; it apparently drops when it uses its fragment processor. Something is fishy here.

It also seems that a great part of the 15 million transistor difference between the 2 processors comes from this legacy support in the NV30, which the R300 lacks. I mean, how many transistors do 16 8-stage register combiners take up? I know the R300 contains a simplified TruForm processor and can encode and decode some more texture formats, but that seems to need far fewer transistors than 16 integer combiners.
 
According to Digit Life (their article gives a lot of information), the NV30's performance drops by half with floating point compared with integer operation.

Didn't that same article state they can do two integer ops per cycle?
 
I can't actually download the ShaderMark test; it's showing a 404 on the website. But assuming the tests are the same as the ATI Treasure Chest demo it's based on, except using PS2.0 (the names are all the same), then the results of the last few most complex shaders are equal to or even lower than the original PS1.4 versions running on my 8500 (e.g. the 4 light diffuse bump mapping gives between 30 and 40fps on my 8500). :|
 
I've taken down the download of ShaderMark, because of some bugs in the 2.0 shaders, which I found (only by using debug runtime + maximum validation feature).
I'll release 1.6a in a few hours with corrected 2.0 shaders.

Thomas
 
Yes, the Digit Life article states that the NV30 can execute 2 integer ops per clock per pipeline; however, I am not sure whether these are integer ops using the register combiners or the fragment processor. With PS=<1.4 it would use the fragment processor. The article also states 1 floating-point op per cycle, but never specifies whether it is FP16 or FP32 precision.
 
tb said:
I've taken down the download of ShaderMark, because of some bugs in the 2.0 shaders, which I found (only by using debug runtime + maximum validation feature).
I'll release 1.6a in a few hours with corrected 2.0 shaders.

Thomas

sweet, i'll re-benchmark and edit my posts when you do
 
If you guys take a look at the results of this Codecreatures benchmark from HardOCP,
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfM18xX2wuZ2lm,

you'll notice the FX is not greatly ahead of the 9700 Pro, even without AA or aniso, at 1024*768. It is scary that the FX can execute 2 int ops per clock in each of its shader pipelines (the test uses PS1.1) and the 9700 only executes 1 per clock, yet their framerates are so close. With a clock speed advantage and low bandwidth load (no AA or aniso at 1024*768), shouldn't the FX be blowing away the R9700? Is something wrong with the latency of operations on the FX?
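
To put a rough number on "blowing away", here is a small sketch of the theoretical PS1.1 integer-op rates implied by the figures above. The clock speeds (500 MHz FX, 325 MHz 9700 Pro) and the 8 pipelines on the 9700 Pro are assumptions not stated in the post; the FX pipeline count is left as a parameter because the thread does not settle it, and the function name is hypothetical.

```python
# Theoretical integer-op rate = core MHz * pipelines * int ops per clock per pipe.
# Assumptions (not from the post): 500 MHz FX core, 325 MHz 9700 Pro core,
# 8 pipelines on the 9700 Pro. The FX pipeline count is tried at both 4 and 8.

def int_op_rate(core_mhz: float, pipelines: int, ops_per_clock_per_pipe: int) -> float:
    """Peak integer ops per second, in millions."""
    return core_mhz * pipelines * ops_per_clock_per_pipe

r9700 = int_op_rate(325, 8, 1)   # 1 int op per clock per pipe

for fx_pipes in (4, 8):
    fx = int_op_rate(500, fx_pipes, 2)   # 2 int ops per clock per pipe
    print(f"FX with {fx_pipes} pipes: {fx / r9700:.2f}x the 9700 Pro")
```

On paper that is roughly 1.5x with 4 pipes and roughly 3.1x with 8, so near-identical framerates would suggest the benchmark is limited by something other than peak integer-op rate.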
 
@Brent

http://www.tommti-systems.de/main-Dateien/files.html

Version 1.6a is online. 2.0 shaders are fixed and speed is improved. Maybe you could also try the dx8 version, the radeon 9700 is faster with the dx8 version, maybe the geforce fx too.

I've also released version 1.5 of my dx9 overdraw/fillrate tester. It tests
-front to back rendering
-back to front rendering
-50% fb + 50% bf rendering
and the texel/pixel fillrate with 0-8 textures.

Texture size goes from 128x128 to 1024x1024 for all tests and you can switch from DXT1 to non-compressed textures.

Thanks,
Thomas
 
Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.
 
Ante P said:
Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.

The version of ShaderMark I downloaded says 1.6a in the readme.txt. I think just the HTML has not been updated to reflect the new version.
 
OpenGL guy said:
The version of ShaderMark I downloaded says 1.6a in the readme.txt. I think just the HTML has not been updated to reflect the new version.

Bingo! :p
 
Ante P said:
Overdraw / Z-Reject Tester v1.3 (15.01.2003)
ShaderMark v1.6 Dazu (29.01.2003)

Doesn't seem like the new versions are online.

Haven't updated the link descriptions....
 