3DMark-type cheating in DX9/DX8/OGL games?

As I understand it - thanks to the Beyond3D and ExtremeTech articles on 3D graphics and 3DMark03:

An application sends shader code via an API (OpenGL or DX) to the graphics driver. The driver interprets, optimises and then dispatches 3D assembler commands to the execution units of the GPU.

Optimisation is normally application independent. What happens is:

1) all code is scanned for better ways to express an instruction sequence in the GPU's command set (e.g. why do a MUL, then an ADD, if you can simply do a MAD)

2) instructions affecting independent data items may be re-ordered by a clever driver to avoid pipeline stalls

3) workload may be balanced across the GPU's execution unit queues to keep every unit as busy as possible so the job gets done faster.

After this optimisation the instructions are dispatched.
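
To make point 1) concrete, here is a toy sketch of the kind of peephole pass a driver's shader compiler might run. The instruction format, opcodes and register names are all invented for illustration - this is not any vendor's actual code.

Code:
// Toy illustration only: a peephole pass that fuses a MUL followed by a
// dependent ADD into a single MAD, in the spirit of what a driver's shader
// compiler does. The instruction format and register names are invented.
#include <iostream>
#include <string>
#include <vector>

struct Instr {
    std::string op;             // "MUL", "ADD", "MAD", ...
    std::string dst;
    std::string src0, src1, src2;
};

// MUL r0, a, b  followed by  ADD r0, r0, c  becomes  MAD r0, a, b, c
std::vector<Instr> fuseMad(const std::vector<Instr>& in) {
    std::vector<Instr> out;
    for (size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size() && in[i].op == "MUL" && in[i + 1].op == "ADD" &&
            in[i + 1].dst == in[i].dst && in[i + 1].src0 == in[i].dst) {
            out.push_back({"MAD", in[i].dst, in[i].src0, in[i].src1, in[i + 1].src1});
            ++i;                            // the ADD has been folded in, skip it
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

int main() {
    std::vector<Instr> shader = {
        {"MUL", "r0", "v0", "c0", ""},
        {"ADD", "r0", "r0", "c1", ""},
    };
    for (const Instr& it : fuseMad(shader))
        std::cout << it.op << " " << it.dst << "\n";   // prints: MAD r0
}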

Point 2 above is where ATI specifically intervened, giving their driver better-than-normal re-ordering because of the extent of their knowledge of the shader code. By identifying two shaders by name they were able to trigger human-level rather than machine-level optimisation at run time.
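
Nobody outside ATI has published exactly how the detection works, but conceptually it only needs a lookup keyed on the shader's identity (a name, or a hash of its text). A purely hypothetical sketch - the function name, hashes and shader text are all invented:

Code:
// Hypothetical sketch, not ATI's (or anyone's) real driver code: hash the
// shader text the application hands over and, if it matches a known
// benchmark shader, return a hand-scheduled replacement instead of whatever
// the generic optimiser would have produced.
#include <functional>
#include <map>
#include <string>

static const std::map<size_t, std::string> handTunedReplacements = {
    // { hash of the original benchmark shader, hand-reordered equivalent }
    { std::hash<std::string>{}("MUL r0, v0, c0\nADD r0, r0, c1\n"),
      "MAD r0, v0, c0, c1\n" },
};

// Imagined driver front end, called for every shader the application submits.
std::string selectShader(const std::string& sourceFromApp) {
    auto it = handTunedReplacements.find(std::hash<std::string>{}(sourceFromApp));
    if (it != handTunedReplacements.end())
        return it->second;      // human-level optimisation triggered at run time
    return sourceFromApp;       // everything else goes through the normal optimiser
}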

To me this is a gray area but I would prefer (in descending order of preference):

1) their driver's optimisation is made better for all shader code, to optimise execution dispatch

2) users are allowed to select human-level shader optimisation, or

3) machine-level shader optimisation only, or

4) no optimisation at all (to see how slowly unoptimised code runs)

This is just like compiler theory. ATI slipped some of 2) into 3) above - human-level optimisation presented as if it were the machine's. To my mind that is a distortion that comes close to a cheat, depending on what rules FM set, but FM must rule on that.

All drivers do code optimisation; that is where major performance gains normally come from.

NVIDIA executed shaders that weren't what FM asked them to execute. Their drivers didn't really run the eight shaders in FM's benchmark but instead ran poor substitutes that gave fairly similar outputs - while skimping on workload, colour precision and clean-up. NVIDIA have made no comment on this other than, in effect, 'it costs a lot to optimise drivers for our chip the way FM likes to code DX9 shaders, and they did it this way to sabotage us'. Wow!!! FM's PS 1.4 and 2.0 shader code and fp precision are being argued over on several web sites, but it doesn't appear they were biased.

I fear NVIDIA shot above and below the mark for DX9 with the NV3x cards. They need better PS 1.4 and PS 2.0 capability and, more seriously, lacking fp24 means that for DX9 high colour precision they have to default to fp32, which is seriously slow compared to their fp16 capabilities (it roughly halves the throughput of their pixel shaders compared to their fp16 and scalar performance).

So NVIDIA is exceptionally fast and well suited to DX 7.0 / 8.0 and 8.1 code, but in DX9.0 high colour precision over any large fragment of the screen causes them headaches. These two design constraints showing up in FM may be why they are upset enough with FM to cheat in their tests. NV35 exceeds the requirements of DX9 - but not with speed. In some cases it can't meet the requirements of DX9 with speed.

Crazy as it sounds, NVIDIA could do code-level substitution in popular games in their drivers to speed up games at the cost of image quality.
 
Humus said:
Brent said:
then to me, as a reviewer, this whole thing gives me something to definitely watch out for in games, i didn't know that a video card developer in their drivers could replace shader code like that with their own

Nothing gets out to the card without first passing through the driver. Well, except if you're locking an onboard vertex buffer and retrieve a pointer to it, but that's just about it. The shader is written as standard ASCII text. In OpenGL the ASCII is just fed directly to the driver, and it's up to the driver to take it from there. It can do just about anything it chooses to. In D3D the driver receives an array of tokens (afaik anyway), but that doesn't make it any harder to replace it.

I also thought it prudent to point out that there is almost certainly no direct 1->1 correspondence between DX9 pixel shader instructions and the hardware instructions that are actually executed.

Although DX9 shaders are low level, they are still compiled by the driver, and hence they are always "optimised" or rearranged to some extent.
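
For reference, this is roughly what "the ASCII is just fed directly to the driver" looks like on the application side with the ARB program extensions. Sketch only: it assumes a current GL context and that the ARB_fragment_program entry points are available (e.g. loaded via wglGetProcAddress); the helper function name and the little program itself are just made-up examples.

Code:
// Sketch: a fragment shader handed to an OpenGL driver as plain ASCII text.
// Assumes a current context and that glGenProgramsARB, glBindProgramARB and
// glProgramStringARB are available from the ARB_fragment_program extension.
#define GL_GLEXT_PROTOTYPES     // expose the extension prototypes for this sketch
#include <GL/gl.h>
#include <GL/glext.h>
#include <cstring>

void loadExampleFragmentProgram() {
    static const char* program =
        "!!ARBfp1.0\n"
        "# example program, not from any real benchmark\n"
        "MAD result.color, fragment.color, program.env[0], program.env[1];\n"
        "END\n";

    GLuint id;
    glGenProgramsARB(1, &id);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, id);
    // From this call onwards the text belongs to the driver; it can compile,
    // rearrange or (in principle) replace it however it likes.
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)std::strlen(program), program);
}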
 
As you guys know, this is really a fundamental problem. There needs to be some kind of reference against which we can compare how cards render things. That is probably the most practical way to check the drivers, but there needs to be *some* way to do peer review. I really wish we had open-source video drivers. I understand the reasons why ATI/NVIDIA want to keep things closed, but there would be so many benefits to the consumer. We'd have peer review of the drivers, not only to find/verify cheating, but also for optimization and bug-hunting purposes. It would really allow much more involvement on the consumer side, and from what I've seen of the Rage3D forums (don't know about nvnews) I think a lot of people would be happy to be involved. It would benefit those that don't want to be involved as well.

Probably a pipe dream though. :(

Nite_Hawk
 
Humus said:
Well, except if you're locking an onboard vertex buffer and retrieve a pointer to it, but that's just about it.
Where do you get that pointer from? How do you know it's really a pointer to the onboard vertex buffer? :)

There's no way at all of getting direct to the chip.
 
Is it terribly hard to reverse compile / disassemble the drivers to look for application-specific cheats? I know it wouldn't be much fun sifting through several tens of thousands of lines of assembler code - but since we know which shaders they cheated on in 3DMark, you could search for those and see if they're grouped as a table of application-specific cheats. Imagine doing that and finding proof positive of cheats or named application mods for 10 - 20 top benchmarks... seeing whether Quake or RTCW or Serious Sam or MoG or C&C or 3DMark get the lower-quality stuff...

Is it possible to do this nowadays, given that without the symbol table you'd just have raw assembler and a few clues to guide you?
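
Even without a full disassembly, a first pass could be as crude as dumping the printable strings out of the driver binary and looking for shader-like fragments or application names, much like the Unix `strings` tool. A rough sketch - the driver file name and keyword list are placeholders, not things I know to be in any driver:

Code:
// Rough sketch: pull printable ASCII runs out of a binary and flag ones that
// look like shader code or app-detection strings. File name and keywords are
// placeholders for illustration only.
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::ifstream f("nv4_disp.dll", std::ios::binary);   // placeholder path
    if (!f) { std::cerr << "cannot open driver binary\n"; return 1; }

    const std::vector<std::string> keywords =
        { "ps_2_0", "ps.1.4", "!!ARBfp", "dcl_2d", "3DMark", "UT2003" };

    std::string run;
    char c;
    while (f.get(c)) {
        if (std::isprint(static_cast<unsigned char>(c))) {
            run += c;                       // keep building the printable run
            continue;
        }
        if (run.size() >= 6) {              // ignore short accidental runs
            for (const std::string& kw : keywords)
                if (run.find(kw) != std::string::npos) {
                    std::cout << run << "\n";
                    break;
                }
        }
        run.clear();
    }
    return 0;
}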
 
Dio said:
Humus said:
Well, except if you're locking an onboard vertex buffer and retrieve a pointer to it, but that's just about it.
Where do you get that pointer from? How do you know it's really a pointer to the onboard vertex buffer? :)

There's no way at all of getting direct to the chip.

Ehm, because glMapBufferARB() returns a pointer and you can use it just like any other pointer.

Code:
void *glMapBufferARB(GLenum target, GLenum access);

Yes, it may return a pointer to a system memory array too, but there's nothing preventing you from getting a pointer directly to onboard memory, and that is generally the expected behavior if you set the access parameter right.
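
For anyone who hasn't used it, the calling pattern looks roughly like this. Sketch only: it assumes a current GL context, an already-created buffer object and the ARB_vertex_buffer_object entry points being available; the helper function name is just for illustration.

Code:
// Sketch: fill a previously created vertex buffer object through the pointer
// returned by glMapBufferARB. With GL_WRITE_ONLY_ARB the driver is free to
// hand back a pointer into onboard (or AGP) memory rather than a system
// memory copy.
#define GL_GLEXT_PROTOTYPES     // expose the extension prototypes for this sketch
#include <GL/gl.h>
#include <GL/glext.h>
#include <cstring>

void fillVertexBuffer(GLuint vbo, const float* vertices, size_t bytes) {
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    void* dst = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    if (dst) {
        std::memcpy(dst, vertices, bytes);   // writing straight into the buffer
        glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
    }
}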
 
Humus said:
Yes, it may return a pointer to a system memory array too, but there's nothing preventing you from getting a pointer directly to onboard memory, and that is generally the expected behavior if you set the access parameter right.

Not that I'm an expert, but it sounds to me like unfettered access to onboard memory is a recipe for WHQL failure.
 
I have the entirety of the shader code from the 44.03 drivers if anybody's interested; strangely enough it's been there for quite a few revisions but has gotten longer over time.

I don't know whether this has any direct implications on the current affair. But I won't post it straight up just in case ;)
 
I don't really have much of an objection to what the hardware guys do to optimize for actual 3D games, Brent. To me, the issues are completely different for software like 3D Mark and software like UT2K3.

The idea behind 3D Mark is that it serves as a neutral, third-party basis for comparing VPUs to GPUs, etc. That can't happen if one company is rigging its drivers to do a lot less work than the benchmark creators intended while appearing to run the benchmark, for obvious reasons. The solution there is for the hardware guys to produce drivers which don't recognize benchmarks in any fashion.

The idea behind software like UT2K3 is to play the game. If the optimizations one company implements versus another produce differences in image quality then such differences are fair game to the hardware reviewers--just as is presently the case. As long as a review of a product also contains a comparison of that product with another then any credible hardware review has to include an examination of image quality within the comparison. Otherwise, the product comparison is rendered moot and ineffectual. I don't see this as a problem because nVidia isn't likely to instruct its drivers to insert clip planes when running UT2K3 when the camera faces in a certain direction, right?...;)

Basically, my idea of what's acceptable is:

(1) The driver guys certify their drivers do not recognize benchmarks (all benchmarks).

(2) 3D games are up for grabs--have at it, guys

The only exception I'd make to this rule is that of product demos, which openly make no pretense of being neutral and are acceptable for that reason.
 
I do not have a problem with changing the code as long as the OUTPUT of the code remains the same. Changing code in a way that reduces the rendered output explicitly for benchmarking, and then back again for normal gaming, is just plain sick and wrong.

Brent said:
Humus said:
Yes. It could be done in real games, DX and OGL.

then to me, as a reviewer, this whole thing gives me something to definitely watch out for in games, i didn't know that a video card developer in their drivers could replace shader code like that with their own

that is pretty sneaky, but now i'll have to really watch out for that and make sure the images look like they were intended to by the developer

i'm glad futuremark was able to investigate it and find out exactly what they were doing, it helps us all look for these things now in games, we know just how sneaky drivers can be

this right here shows you at least one need for synthetic benchmarks
 
RussSchultz said:
Humus said:
Yes, it may return a pointer to a system memory array too, but there's nothing preventing you from getting a pointer directly to onboard memory, and that is generally the expected behavior if you set the access parameter right.

Not that I'm an expert, but it sounds to me like unfettered access to onboard memory is a recipe for WHQL failure.

It works this way in both D3D and OpenGL. In OpenGL you also have the choice of letting the driver upload the data for you with glBufferDataARB(), while in D3D locking the buffer is the only access method available. This should not affect WHQL one bit. If this functionality wasn't there in D3D you would not pass WHQL though, since 99% of apps just wouldn't work as their vertex buffers broke. The reason this exists is that it avoids a memory copy, which may speed up apps with a lot of dynamic data.

You can't directly access arbitrary memory though, just the previously defined vertex buffer. That memory is mapped into the client's address space and can be accessed directly, but if you read or write outside it you will of course get an access violation exception and be terminated by the OS.
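
The alternative copy path mentioned above looks like this (same caveats as the earlier glMapBufferARB sketch: current context and extension entry points assumed, function name invented):

Code:
// Sketch of the glBufferDataARB path: the driver performs the copy, so the
// application never sees a pointer into onboard memory at all.
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

void uploadVertexBuffer(GLuint vbo, const float* vertices, GLsizeiptrARB bytes) {
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, vertices, GL_STATIC_DRAW_ARB);
}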
 
I think the secret is to create a wrapper for the API, similar to what the MIT guys did for the Dawn demo.

You could use your knowledge about what is passed into the API (like shaders) and alter it slightly (instruction shuffle) so that the driver is not aware that you are running Quake, or Serious Sam, etc.

It would be really cool to see how fast the cards actually are when running the original code.
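
A purely hypothetical sketch of the core of such a wrapper - nothing to do with the actual Dawn wrapper: intercept the shader text on its way to the driver and perturb it so that detection keyed on the exact text or its hash no longer matches. A real instruction shuffle needs dependency analysis, so this sketch settles for inserting a harmless comment line after the program header, which cannot change what the shader computes. The function and typedef names are invented.

Code:
// Hypothetical wrapper sketch: wrapped_glProgramStringARB would be exported
// in place of the real entry point, perturb the ASCII program and forward it.
// A driver that tokenises the program before matching would not be fooled by
// a comment; defeating that would take a genuine instruction shuffle or
// register rename.
#include <GL/gl.h>
#include <string>

typedef void (*ProgramStringFn)(GLenum target, GLenum format,
                                GLsizei len, const GLvoid* string);
static ProgramStringFn real_glProgramStringARB = 0;   // resolved at load time

void wrapped_glProgramStringARB(GLenum target, GLenum format,
                                GLsizei len, const GLvoid* string) {
    std::string text(static_cast<const char*>(string), len);
    std::string::size_type afterHeader = text.find('\n');
    if (afterHeader != std::string::npos)
        text.insert(afterHeader + 1, "# perturbed by wrapper\n");
    real_glProgramStringARB(target, format,
                            (GLsizei)text.size(), text.c_str());
}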
 
YeuEmMaiMai said:
I do not have a problem with changing the code as long as the OUTPUT of the code remains the same. Changing code in a way that reduces the rendered output explicitly for benchmarking, and then back again for normal gaming, is just plain sick and wrong.
Agreed on the last part, but what about when the output remains the same in a game? The driver writers are effectively deciding that the game developer hasn't done a good enough job at producing optimal code, despite their own developer relations help, and so are ignoring the code the developer produced in favour of their own. That's morally dubious, isn't it? The developer can no longer say they wrote the program that's running, because they didn't. Unless you're referring to an intelligent assembler which recognises general cases, in which case I agree it's fine.
 