Futuremark Announces Patch for 3DMark03

Joe DeFuria said:
nggalai said:
Correction: it's not different smoke from 330 to 340, but two different runs with 340. I.e. do one run on 340, take a screenshot, do another run, take a screenshot -> different smoke.

Hmmm...does the score for GT2 change from one run to the next?

I suspect not, but that would make the IQ tests for GT2, where NVIDIA got different smoke from 330 to 340, worthless, since ATI got similar problems.

I'm not following the whole thing very closely personally, so if that was stupid, please don't bash me :)


Uttar
 
AFAIK, Futuremark has not modified the instruction order. They have made register name inversions (i.e. r0 -> r1 and r1 -> r0) and operand order inversions (i.e. r0 + r1 -> r1 + r0).

Here's an example of a shader used in Mother Nature:

v330
ps_2_0

dcl t0
dcl t1
dcl t2
dcl t3
dcl t4
dcl_2d s0
dcl_2d s1
dcl_2d s2
dcl_2d s3
dcl_cube s4
dcl_2d s5
texld r0 , t0 , s3
texld r1 , t3 , s3
texld r2 , t0 , s0
mad r2 , r2 , c1.xxxx , c1.yyyy
rcp r3.w , t1.wwww
mul r3 , r3.wwww , t1
mad r3 , r2 , c0 , r3
mad r0 , r0 , c1.xxxx , c1.yyyy
mad r1 , r1 , c1.xxxx , c1.yyyy
add r0 , r0 , r1
mul r0 , r0 , c2.xxxx
dp3 r1.x , r0 , t2
add r2.x , r1.xxxx , r1.xxxx
mad r0 , r2.xxxx , r0 , -t2
add r4.x , c1.wwww , -r1
pow r4.x , r4.xxxx , c1.zzzz
texld r0 , r0 , s4
texld r1 , r3 , s1
texld r2 , r3 , s2
texld r3 , t4 , s5
mul r4.x , r4.xxxx , r3.wwww
add r1 , r0 , r1
lrp r0 , r4.xxxx , r1 , r2
mul r0 , r0 , r3
mov oC0 , r0

v340
ps_2_0

dcl t0
dcl t1
dcl t2
dcl t3
dcl t4
dcl_2d s0
dcl_2d s1
dcl_2d s2
dcl_2d s3
dcl_cube s4
dcl_2d s5
texld r0 , t0 , s3
texld r4 , t3 , s3
texld r3 , t0 , s0
mad r3 , c1.xxxx , r3 , c1.yyyy
rcp r2.w , t1.wwww
mul r2 , t1 , r2.wwww
mad r2 , c0 , r3 , r2
mad r0 , c1.xxxx , r0 , c1.yyyy
mad r4 , c1.xxxx , r4 , c1.yyyy
add r0 , r4 , r0
mul r0 , c2.xxxx , r0
dp3 r4.x , t2 , r0
add r3.x , r4.xxxx , r4.xxxx
mad r0 , r0 , r3.xxxx , -t2
add r1.x , r4 , -c1.wwww
pow r1.x , r1.xxxx , c1.zzzz
texld r0 , r0 , s4
texld r4 , r2 , s1
texld r3 , r2 , s2
texld r2 , t4 , s5
mul r1.x , r2.wwww , r1.xxxx
add r0 , r4 , r0
lrp r0 , r1.xxxx , r0 , r3
mul r0 , r2 , r0
mov oC0 , r0
 
digitalwanderer said:
What should? I didn't see nothing, I just heard that damned loud "WHOOOSHING!" noise overhead again. :(
DW, these forums would be a lot less fun without you.
:LOL:
 
Xmas said:
That should be fine.

Fine? Try to convince nVidia of that! ;)

I'll give Futuremark top grades for this solution: they have now shown that they will only 'trust' nVidia after going through the released drivers and - if needed - building a validated patch to work explicitly with that driver version.

This is the only way it can work when you're beyond the point of trust. And for those of us that are pissed at this situation, Futuremark is not the one to blame here, m’kay?
 
nggalai said:
Correction: it's not different smoke from 330 to 340, but two different runs with 340. I.e. do one run on 340, take a screenshot, do another run, take a screenshot -> different smoke.
Sounds like a pseudo-random number reseeding issue. They're probably seeding the generator once and not restoring the seed value between runs to ensure identical output.
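To illustrate (a minimal sketch in Python, not Futuremark's actual code - every name here is made up): seed once at startup and each run continues the same random stream; reseed at the start of every run and the output is identical each time.

Code:
import random

# Seeded once at startup: each benchmark run continues the same
# PRNG stream, so the generated smoke differs from run to run.
random.seed(1234)

def run_seeded_once():
    return [random.random() for _ in range(3)]

# Reseeded at the start of every run: identical output each time.
def run_reseeded(seed=1234):
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

print(run_seeded_once())  # differs from the next call
print(run_seeded_once())
print(run_reseeded())     # identical to the next call
print(run_reseeded())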
 
The two shader codes presented by Tridam gave me an idea: there is an optimizing shader compiler in the 52.xx drivers (I don't know the exact number). The two different shader codes should be recognized and optimized equally (register name differences shouldn't make much fuss), so with and without the patch the compiled code should be equally good (if there is no application detection, of course). But it isn't, so I suspect that the driver first does an application check; if the application is found, an internal, hand-optimized shader is used and the optimizing compiler is "switched off". When the application is not found, the optimizing compiler does its work.
So if this is the case, the Gainward guy was totally wrong: the compiler is switched off when 330 is used and on when 340 is. Funny, isn't it?
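In pseudocode, the suspected logic would look something like this (a speculative Python sketch of the behaviour described above, not NVIDIA's actual driver code - the table, names and hash choice are all invented):

Code:
import hashlib

# Hypothetical table of hand-optimized replacements, keyed by a
# fingerprint of the submitted shader text.
HAND_TUNED = {
    "fingerprint-of-known-330-shader": "hand-optimized replacement",
}

def generic_optimizing_compile(shader_text):
    return "generically optimized code"

def driver_compile(shader_text, app_is_3dmark):
    fingerprint = hashlib.md5(shader_text.encode()).hexdigest()
    if app_is_3dmark and fingerprint in HAND_TUNED:
        # Known benchmark shader: substitute the canned version and
        # "switch off" the optimizing compiler entirely.
        return HAND_TUNED[fingerprint]
    # Unknown shader (e.g. 340 renamed the registers): the
    # optimizing compiler does its work.
    return generic_optimizing_compile(shader_text)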
 
digitalwanderer said:
Xmas said:
That should be fine.
What should? I didn't see nothing, I just heard that damned loud "WHOOOSHING!" noise overhead again. :(
Basically usage order inversions are about the least 'invasive' changes that Futuremark could have made to the code to try to get around any possible shader detection.

Consider :
Code:
mul   a, b, c
replaced by :
Code:
mul   a, c, b
... where a is the destination register and b and c are the two sources.

The data access of these two instructions is identical - even the register names and usage remain the same. A compiler that is performing aggressive register-usage optimisations should not be fazed by this change (if it is affected then it is being seriously twitchy - having a compile dependence on a change of this nature would give pretty appallingly unpredictable results). A compiler performing aggressive instruction scheduling should also not be affected by this sort of modification, since the instruction order and the data access patterns are identical.

Essentially, these sorts of changes should simply alter the binary 'fingerprint' of the shader to avoid direct binary recognition, and shouldn't really affect a generic optimiser.
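A quick illustration of that last point (a Python sketch of my own, since nobody outside the driver team knows how the matching really works): swapping two commutative sources changes a byte-for-byte fingerprint, while a compiler that canonicalizes its input sees the same instruction either way.

Code:
import hashlib

v330 = "mul r0, r1, r2"  # a = b * c
v340 = "mul r0, r2, r1"  # a = c * b - same result, mul is commutative

# The exact-text fingerprints differ, so a lookup keyed on the raw
# shader bytes no longer matches after the patch.
print(hashlib.md5(v330.encode()).hexdigest())
print(hashlib.md5(v340.encode()).hexdigest())

# A generic optimizer working on a canonical form is unaffected.
def canonical(instr):
    op, args = instr.split(" ", 1)
    dst, *srcs = [a.strip() for a in args.split(",")]
    return op + " " + dst + ", " + ", ".join(sorted(srcs))

assert canonical(v330) == canonical(v340)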
 
digitalwanderer said:
Xmas said:
That should be fine.
What should? I didn't see nothing, I just heard that damned loud "WHOOOSHING!" noise overhead again. :(
The new code should produce the same results as the old code. Only some registers are different. But it seems to be enough to disable the shader detection & replacement.
 
digitalwanderer said:
Xmas said:
That should be fine.
What should? I didn't see nothing, I just heard that damned loud "WHOOOSHING!" noise overhead again. :(

What Futuremark's engineers did was just a register name change. When you look at the code, all the different operations like "mul" for multiplication are IN THE SAME PLACES in both versions of the code. This is important, because the order of the different operations usually affects the way the GPU (or CPU!) runs the code. Both ATI and Nvidia can increase performance by reordering the instructions so that the maximum number of instructions are executed at the same time. Naturally, this reordering doesn't affect the calculations done. All the same operations (muls, adds etc.) are done and the result is the same.
This is a basic and valid code optimization technique.

Now, the Futuremark people just changed the register names. It's just as if I were to write, for example, the code
A = 0
B = 1
C = 1
A = B + C.

Then I would write the code
B = 0
A = 1
C = 1
B = A + C.


Both of these code snippets do exactly the same calculation. The NAMES of the operands are just different. In the case of these 3DMark PS 2.0 shaders, the registers are just like the names above. After the patch, Nvidia could not detect the shader code anymore, because the names of the registers are different. So they have to use the normal shader compiler, instead of cheating with some custom-made shader code which is "almost the same but faster".
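A toy renaming pass makes this concrete (my own Python illustration, not Futuremark's tool): apply a consistent register permutation in a single pass and every name changes while the dataflow stays identical.

Code:
import re

def rename_registers(shader, mapping):
    # Single-pass substitution, so swaps like r0 <-> r1 cannot
    # clobber each other the way sequential replaces would.
    return re.sub(r"\br\d+\b",
                  lambda m: mapping.get(m.group(0), m.group(0)),
                  shader)

old = "add r0, r0, r1\nmul r2, r0, r1"
print(rename_registers(old, {"r0": "r1", "r1": "r0"}))
# add r1, r1, r0
# mul r2, r1, r0  <- same dataflow, different names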

:D Ah, you are all too fast!
 
Xmas said:
Even simple code reordering can make a driver choke on your application, but that doesn't mean all drivers will.
Who said anything about code reordering? Shouldn't the driver/hardware give the same correct result in either case?
Code reordering was an example. As an experienced programmer, you should certainly be aware that even slight changes can result in big differences.
Even if there were some problem where an optimizer couldn't achieve the best result, the result should be the same, right? So please explain the discrepancies.
 
OpenGL guy said:
Even if there were some problem where an optimizer couldn't achieve the best result, the result should be the same, right? So please explain the discrepancies.
I can't, because
- I don't have an NVidia card right here
- I don't have 3DMark03
- I don't know enough about what 3DMark03 does internally
- I don't have time for it
- and finally, because I'm not arguing there are no application-specific hacks. But it's possible that part of the performance drop is a result of other changes.
 
Could someone with a GFFX card try 3D-Analyze's "Anti-Detect-Mode" (shaders option only) with the new 340 patch and the official drivers? I only want to know if the Pixel Shader 2.0 test performance will decrease... (Two bmp/png screenshots would be nice, one with and one without "ADM" -> thomas@tommti-systems.com)

Best Regards,
Thomas
 
Heck, could someone try the 52.70 drivers that someone found on MSI's site and posted up over at www.guru3d.com with the 3.40 patch, to see if it scores the same as the 52.16 set or the same as the pre-3.30 build? (The files in the beta are dated October 23; how long was the 3.33 out for, and did FM let nVidia have a copy of it? If so, when-ish?)
 
tb said:
Could someone with a GFFX card try 3D-Analyze's "Anti-Detect-Mode" (shaders option only) with the new 340 patch and the official drivers? I only want to know if the Pixel Shader 2.0 test performance will decrease... (Two bmp/png screenshots would be nice, one with and one without "ADM" -> thomas@tommti-systems.com)

Best Regards,
Thomas

The result is the same with 330 and with 340, with ADM and without ADM: 50.9 FPS.
 
I'm currently testing 3 driver revisions (44.03, 45.23 and 52.16 - all WHQL) with an FX5900 Ultra and a GF4 Ti4600. I'll post all the details once I've finished, but so far, on the FX, the 44.03 drivers take much bigger performance drops than the 52.16 ones.
 
Tridam said:
tb said:
Could someone with a gffx card try 3D-Analyze's "Anti-Detect-Mode"(shaders option only) with the new 340 patch and the official drivers? I only want to know, if the Pixel Shader 2.0 test performance will decrease... (two bmp/png screenshots would be nice, one with and one without "ADM" -> thomas@tommti-systems.com)

Best Regards,
Thomas

The result is the same with 330 and with 340, with ADM and without ADM: 50.9 FPS.

So, the shader compiler seems to do a good job. Thanks.
 