GF4 has inflated 3DMark scores, so says the INQ.....

OK, so these are the facts:
MadOnion is aware of this issue.
They claim that nVidia's drivers cause this behaviour.
They do NOT however state which score is the more "correct" - with or without splash screens.

That last observation is interesting - if the situation had been the reverse, i.e. if nVidia's drivers encountered some problem caused by the splash screens, MadOnion would simply have said that owners of nVidia cards should run 3DMark2001 without splash screens to get an accurate result.

In this case, however, they make sure to wash their hands of any responsibility, say that it is nVidia's doing, and do not in any way indicate what is actually going on.


The only way I can see this making sense is if nVidia makes 3DMark-specific adjustments triggered by, or in some other way connected to, the splash screens. Depending on your perspective, this may or may not be immoral. An nVidia PR spinner would probably phrase it as "nVidia makes sure that their hardware is taken full advantage of", which sounds reasonable enough, whereas others might feel that nVidia misrepresents the performance of their cards by making adjustments that are not representative of applications in general.

{This is similar to having a SPEC compiler that identifies specific subtests in the SPEC suite, and generates code in a specific manner only for that specific test. Note to MadOnion: similar problems have been encountered and dealt with previously.}

If the above assumption is correct, MadOnion is in a tricky situation. They know a vendor makes benchmark specific "optimisations", but also know that if they blow the whistle on them, it would both hurt the credibility of 3DMark, and hurt the relationship with the vendor. Besides, it's not obviously disallowed in any way for a vendor to apply application specific adjustments in their drivers. Furthermore, it is quite difficult to cover all possible methods of tweaking for a given application that can be done within drivers, and MadOnion probably has little interest in opening that can of worms.

Have nVidia done something wrong? Well, they have taken advantage of their knowledge of this particular benchmark in order to maximise their score. It is not explicitly forbidden anywhere I have seen, so nVidia is doing what might be expected in a competitive environment. From a customer perspective however, this gets in the way of trying to objectively compare products in the marketplace.


I'd say that both the ATI "quack" incident and this one serve the purpose of making at least some part of the public more aware of the politics and pitfalls of benchmarketing. Given the competitiveness of the graphics business, these kinds of tricks are perhaps even more likely to get exploited in the future. To vendors, 3DMark2001 is a marketing tool. Let's make no mistake about that. But if customers start to realize that manufacturers can manipulate their scores to a large extent (particularly since the output isn't deterministic, i.e. pixel output ("quality") isn't measured), then their interest in 3DMark scores will wane.

Some serious thinking about policies might be in order.

Entropy
 
Matt,

We already had some speculation over at SC that it might be related to memory "pre-charging" (which I suppose would affect GF4s to a higher degree); that's why I said I doubt we'll see this kind of behaviour in normal games.

Call me naive if you want, but both "quack" in the past and this one now always register in my mind as driver bugs/quirks.

Of course, whatever it should turn out to be, if NV and MadOnion were aware of it, it does in fact produce unrealistic scores, and both were suspiciously silent about it all the time, then there is something fishy about it in the end.

I agree with Entropy's last post; combine that with the fact that a user detected said issue as early as November last year, and the silence in between is not exactly flattering, especially for MadOnion.

edit:

Some serious thinking about policies might be in order.

Entropy,

You don't mind if I add ethics to that sentence too, do you?
 
You will not find either policies or ethics in any large corporation; it's a dog-eat-dog world out there and everyone wants a piece of somebody.
My main issue with this is what is being done with the ORB scores, if the database now holds incorrect benchmarks...

I can never understand how someone can hate a company. A company comprises many individuals who, in NVIDIA's case, have gone to college, gained a degree, got a good job, and work long hours daily. Usually they have a family - a wife, and even kids. They pay the bills, put food on the table, and they try to get by day after day by working at that company.

Hating a company is one thing, but having a preference is another. Be it:

1) I'm a SONY man
2) I'm a Ford Guy
3) I'm a GM guy
4) I'm an Import Guy
5) I only like Logitech mice
6) I only like Linux
7) I only like Microsoft OS
8) I like the frosted side of shredded wheat
9)....


I could go on forever, and for everyone who states they have no bias towards any company, I could easily point out something (be it a toaster) that they grew up thinking was superior, be it Sony TVs or Ford trucks, influenced by family or friends. That's life...
As for your comment about company employees paying the bills etc...well, I look after my own family first, which means making smart decisions about purchases...I don't give a rat's ass what somebody at a big corporation is making for a living. I care what they are giving me for the value of MY money...
 
Matt Burris said:
Here's what he wrote to me:

Saw your comment on the nVidia "bug" that causes results to drop in 3DMark 2001 without splash screens. That's an optimization for sure, we used to do the same sort of tricks at Real3D in Winbench 2D and 3D. We would detect the splash screen (check for a texture of a certain size and format, etc.) and at that time we would flush all our buffers and stuff. At that point you know that your scores aren't being measured, so that's a REAL good time to do any critical "clean up and get ready" work or anything that takes more time than normal.

Gonna post this up on 3DGPU tomorrow morning, I'm about to head to bed, but thought I'd throw this out there for you guys to talk over.

That would require that the texture used for the title screen is unique, not just a standard size/color texture. I suppose that's possible, but it sounds odd. The title screens are not stored in the texture.ras file along with the other textures, and are probably compiled into the executable itself, so I don't see how there's any way for us (or NVIDIA) to know what size/format they are. You can also tell by the blurring of the font in the title screens that the texture itself isn't the same size as the display, so you can't just check for a 1024x768x32-bit texture being used. The logical assumption would be that the screens were made in a 4:3 format and downsampled to a standard 1:1 texture size (like 512x512), then stretched out to the original dimensions when displayed. If that's the case, the best you could hope for in detecting it would be to check and see if you're drawing two polygons that take up the entire screen. I don't even know if that's possible for a driver to know when you're using hardware T&L.

Besides, the display is reset before each test, and since it has to load completely new textures and geometry for each test, one has to assume it dumps out all the old info anyway, so I don't see what kind of "clean up and get ready" work could be done that isn't already being done.
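To make that "two polygons that take up the entire screen" check concrete, here is a minimal, purely hypothetical sketch in C. Nothing in it comes from any real driver: the types and tolerance value are invented, and it assumes the driver even sees post-transform vertex positions, which, as noted above, may not hold with hardware T&L.

/* Hypothetical sketch of the "two polygons covering the whole screen" check
 * described above. NOT real driver code from any vendor; everything here is
 * invented purely to illustrate the idea. */

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    float x, y;   /* post-transform position in normalized device coordinates */
} Vertex2D;

/* Returns true if a draw call of exactly two triangles (a quad) spans the
 * full viewport, i.e. its bounding box reaches all four NDC corners. */
static bool looks_like_fullscreen_quad(const Vertex2D *verts, int vert_count,
                                       int triangle_count)
{
    if (triangle_count != 2 || vert_count < 4)
        return false;

    float min_x = 1.0f, max_x = -1.0f, min_y = 1.0f, max_y = -1.0f;
    for (int i = 0; i < vert_count; ++i) {
        if (verts[i].x < min_x) min_x = verts[i].x;
        if (verts[i].x > max_x) max_x = verts[i].x;
        if (verts[i].y < min_y) min_y = verts[i].y;
        if (verts[i].y > max_y) max_y = verts[i].y;
    }

    const float eps = 0.01f;   /* tolerance for floating-point slop */
    return min_x <= -1.0f + eps && max_x >= 1.0f - eps &&
           min_y <= -1.0f + eps && max_y >= 1.0f - eps;
}

int main(void)
{
    /* A textured quad stretched over the entire screen, the way a splash
     * screen would be drawn. */
    Vertex2D quad[4] = { {-1, -1}, {1, -1}, {-1, 1}, {1, 1} };
    printf("fullscreen quad detected: %d\n",
           looks_like_fullscreen_quad(quad, 4, 2));
    return 0;
}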
 
Ailuros said:
Doomtrooper,

I know the usual reply.....3dmark yadda yadda.....industry standard etc etc. I take it for what it really is and the last thing I base my purchase decision on is a 3dmark score.

Well, you are also a lot more versed in 3D than the average Joe who just looks at the box of a retail video card and buys it because it looks 'cool'...
This site, along with the other fan sites, accounts for maybe 0.05% of the people who actually buy video cards, but OEMs like Hercules and Visiontek ship 3DMark with their product, and now possibly it wasn't giving accurate info.
When I had my computer business, kids in their late teens would come in and all they cared about was 3DMark scores...it is a huge marketing tool for video card sales.
 
Optimizing for a game is a lot different from optimizing for a benchmark. The fact that Quake III is a benchmark as well blurs the line a little. 3DMark is a tea kettle, though.
I appreciate a company doing a survey of popular games to ensure they have all their ducks in a row. Still, keying off the exe name in the driver is a little extreme, and not very robust.
Optimizing for 3DMark specifically is just wrong, since its purpose is to test various areas of a card in a general way and return numbers. However, if it inspires a reworking of code paths otherwise ignored by most games, I think this is its function and a good thing.
I have the general sense, after reading reviews and message board posts, that NVIDIA cards are sort of benchmark darlings and don't always pan out as well in the real world across a wide spectrum of games. On the flip side, a vendor would be foolish to market a card that wasn't at its best when rendering a teapot.
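For what it's worth, "keying off the exe name" boils down to something like the following hypothetical sketch; the executable names and profile table are illustrative only, not taken from any actual driver. Renaming the binary defeats the check entirely, which is exactly why it is not very robust.

/* Purely illustrative sketch of application detection by executable name.
 * The names and profiles are made up for this example. */

#include <ctype.h>
#include <stdio.h>

typedef enum { PROFILE_DEFAULT, PROFILE_QUAKE3, PROFILE_3DMARK } AppProfile;

/* Portable case-insensitive string comparison, for the example only. */
static int iequals(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;
}

/* Pick a driver profile purely from the executable's file name. */
static AppProfile profile_for_exe(const char *exe_name)
{
    if (iequals(exe_name, "quake3.exe"))
        return PROFILE_QUAKE3;
    if (iequals(exe_name, "3dmark2001.exe"))   /* illustrative name */
        return PROFILE_3DMARK;
    return PROFILE_DEFAULT;
}

int main(void)
{
    printf("%d\n", profile_for_exe("Quake3.exe")); /* 1: special-cased       */
    printf("%d\n", profile_for_exe("quack.exe"));  /* 0: a rename slips past */
    return 0;
}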
 
It is VERY VERY simple to detect or even "see" a texture that is compiled into an executable. Any decompiler can grab it.

Playing with that, one could easily see what texture file is being used. Video card drivers CAN pick up that texture use easily as well. Tell the driver to look for a certain instruction, the same way it would look for a texture map being placed on a polygon.

The drivers could then say "here's the splash screen, let's dump our cache and precharge again".

It wouldn't be that hard for any moderately skilled programmer to take advantage of.
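A rough, entirely hypothetical sketch of what such a texture-fingerprint check could look like at a driver's texture-upload point. The dimensions, format value, and checksum are made up, and no claim is made that nVidia's (or anyone's) driver actually works this way; it just illustrates the mechanism being described.

/* Hypothetical sketch: compare an incoming texture against a stored
 * signature (dimensions, format, checksum of the pixel data) and, on a
 * match, do the "clean up and get ready" work from the quoted e-mail. */

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t width, height;
    uint32_t format;     /* e.g. an enum value standing in for A8R8G8B8 */
    uint32_t checksum;   /* checksum over the first bytes of pixel data */
} TextureSignature;

/* Hypothetical fingerprint of a benchmark splash-screen texture. */
static const TextureSignature k_splash_sig = { 512, 512, 21u, 0xDEADBEEFu };

static uint32_t checksum32(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum = sum * 31u + data[i];
    return sum;
}

/* Imagined hook in the texture-upload path: does this surface match the
 * stored signature? */
static bool is_known_splash_texture(uint32_t w, uint32_t h, uint32_t fmt,
                                    const uint8_t *pixels, size_t len)
{
    if (w != k_splash_sig.width || h != k_splash_sig.height ||
        fmt != k_splash_sig.format)
        return false;
    size_t n = len < 4096 ? len : 4096;
    return checksum32(pixels, n) == k_splash_sig.checksum;
}

/* The frame is known not to be timed, so housekeeping here is "free". */
static void on_splash_detected(void)
{
    /* flush pending work, reset caches, pre-charge, etc. (hypothetical) */
}

int main(void)
{
    uint8_t fake_pixels[16] = { 0 };
    if (is_known_splash_texture(512, 512, 21u, fake_pixels, sizeof fake_pixels))
        on_splash_detected();
    else
        printf("no splash-screen signature match\n");
    return 0;
}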

This either points out a good trick that nVidia is using to gain performance, or it shows that nVidia's drivers are not able to take full advantage of the hardware without some assistance.


Looks like Splash_a.pcx might be the file; anyone else wanna confirm this?
 
Jerry Cornelius said:
Optimizing for a game is a lot different from optimizing for a benchmark. The fact that Quake III is a benchmark as well blurs the line a little. 3DMark is a tea kettle, though.
I appreciate a company doing a survey of popular games to ensure they have all their ducks in a row. Still, keying off the exe name in the driver is a little extreme, and not very robust.
Optimizing for 3DMark specifically is just wrong, since its purpose is to test various areas of a card in a general way and return numbers. However, if it inspires a reworking of code paths otherwise ignored by most games, I think this is its function and a good thing.
I have the general sense, after reading reviews and message board posts, that NVIDIA cards are sort of benchmark darlings and don't always pan out as well in the real world across a wide spectrum of games. On the flip side, a vendor would be foolish to market a card that wasn't at its best when rendering a teapot.

I have no problem with optimizations as long as it's not lowering IQ etc...Applications are optimized for specific processors all the time to run as fast as possible with SSE etc... Sysmark, for example:

http://www.anandtech.com/cpu/showdoc.html?i=1543&p=5

When Kyle Bennett posted the Quack thing on HardOCP, they posted that they had 'found a reference to Quake' in the drivers... Whoopee, that reference had been there since the Radeon 1, and whatever ATI was doing with their optimization was not affecting Radeon 1 quality; so, to put it more clearly, ATI was doing application detection with the Radeon 1 too.
The Radeon DDRs won all the IQ crowns with that reference in the drivers, so whatever the optimization was doing, it was not lowering IQ on the Radeon 1; it just didn't work so well with the 8500.
Quack was not an 8500 debut thing, as HardOCP led people to believe :rolleyes:
 
Side note: it would also help if people mentioned which OS they were running. Makes it a little easier to figure out what the cause is when you know all the conditions affecting it.
 
Pete said:

roflmao.gif
 
Hey!! :devilish: I work for a company that is owned by a company that is owned by Enron!!! :oops: You may too and not know it :rolleyes:

Still, it falls on MadOnion to ensure their benchmark is accurate. It looks like using their benchmark can lead to some very false conclusions if you are trying to compare card A to card B. If MadOnion doesn't at least attempt to explain what is happening, then it is as good as them secretly conspiring with Nvidia to get a higher score. I think this is a test of MadOnion's credibility.

I've found 3DMark to be very useful for optimizing my hardware. But saying it is as useful for comparing other video cards may be very faulty, and this is just one more reason why this benchmark shouldn't be used for that.

Now, how flushing caches, pre-charging, etc. would help the score, I haven't a clue. Aren't caches constantly changing depending on the data going to the GPU? Or are you saying the cache is being filled with fixed data that is used most throughout the Game 4 test, giving the increased frame rate due to a greater number of cache hits? That would be something that has to be very specifically tested and optimized for, and which wouldn't reflect actual game play, since the dynamic differences in games would make such an optimization impossible. The frame rate is indeed faster using splash screens; now what is the reason for that?
 
I by no means know a whole lot about coding video drivers... but I do know that there are all kinds of optimizations that can be done that can increase performance without reducing IQ. (Though speaking of reduced IQ, remember not all that long ago there were Dets that really did have crappy 3DMark IQ?? I would have to do some forum searches, but there were pics and posts about it all over the place... perhaps an early implementation of the bench detection routines that had bugs? Just like ATI's bug regarding their Q3-optimized code path? It would be incredibly interesting to see what happened with those Dets if the splash screens were turned off...)

Anyway, that wasn't why I posted. Regarding the increased performance... If you know that a game (or bench) only uses a subset of the branches in your driver code, you can create a specific code path that is launched, say, on detection of each splash screen in 3DMark; when detected, that code path in the drivers only follows the driver functions needed for that bench.

It makes great sense as a way to optimize for 3DMark... just like people will run each individual game test separately from the others in one bench session (once all 4 have been run, you do get a 3DMark score) so that they can overclock their cards to the extreme limit for each of the tests. Different tests allow you to reach different maximum clock speeds without causing a crash.

So... why not do the same in the driver code paths? Nvidia knows they don't need to support pixel shaders in the first three game tests, so when they detect those splash screens, the code branch is optimized for their old T&L path and the specific driver functions needed for rendering those tests. When they get to the Nature game test and detect that specific splash screen, the drivers go down a different code path, which optimizes the pipeline more for accomplishing the pixel shader effects while doing whatever else that test needs.
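A minimal, hypothetical sketch of that per-test path selection. The test list, the feature flags, and the assumption that game tests 1-3 never touch pixel shaders come from the reasoning above, not from any real driver.

/* Hypothetical sketch: once a splash screen has identified the upcoming
 * test, pick a reduced driver path covering only what that test is known
 * to need. All names and flags here are illustrative. */

#include <stdio.h>

typedef enum {
    TEST_UNKNOWN,
    TEST_CAR_CHASE,   /* game tests 1-3: assumed fixed-function T&L only */
    TEST_DRAGOTHIC,
    TEST_LOBBY,
    TEST_NATURE       /* game test 4: uses pixel shaders */
} GameTest;

typedef struct {
    int use_pixel_shader_path;  /* set up the shader pipeline at all?    */
    int full_state_validation;  /* do every check the general path does? */
} DriverPath;

static DriverPath select_path(GameTest detected)
{
    DriverPath p = { 1, 1 };    /* unknown app: keep the safe general path */
    switch (detected) {
    case TEST_CAR_CHASE:
    case TEST_DRAGOTHIC:
    case TEST_LOBBY:
        p.use_pixel_shader_path = 0;  /* shaders known not to be needed   */
        p.full_state_validation = 0;  /* skip checks the test never trips */
        break;
    case TEST_NATURE:
        p.use_pixel_shader_path = 1;  /* Nature needs the shader pipeline */
        p.full_state_validation = 0;
        break;
    default:
        break;
    }
    return p;
}

int main(void)
{
    DriverPath p = select_path(TEST_NATURE);
    printf("pixel shaders: %d, full validation: %d\n",
           p.use_pixel_shader_path, p.full_state_validation);
    return 0;
}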

Seems like exactly the kind of thing a video company would do to get the most performance out of their card in a specific bench. Is it accurate, right, or good? Well, everyone is going to have a loud opinion on that... Myself, if it is rendering everything it should, the way it should, you can't knock them too badly for it. It should however be made known, because it invalidates those benches as references for the general performance of the card and drivers.
 
(Though speaking of reduced IQ, remember not all that long ago there were Dets that really did have crappy 3DMark IQ??

Do you mean that one set that displayed heavy dithering in the fog in the Dragothic scene?
 
Matt Burris said:
(Though speaking of reduced IQ, remember not all that long ago there were Dets that really did have crappy 3DMark IQ??

Do you mean that one set that displayed heavy dithering in the fog in the Dragothic scene?

I'm not entirely sure... I don't know if it was just heavy dithering, or more washed-out textures, or whatever else. I really don't remember what the issue was, but as I was writing up my post I was reminded there was some hubbub about something like that regarding crappy IQ on GeForce cards a while ago in 3DMark. That might have been something entirely different from what I was getting at in my post; it's all theory anyway, as I've no way to test it myself.
 
Yeapper, I remember that dithering. I have tested it out and do not see it. I also saw about a 500-point difference running 3DMark with or without splash screens on my GF4 Ti4200. I used the latest Det drivers from nVidia's web site, but I did not try the new 3DMark patch.
 
Yep, that's the set he means. Shark has mentioned it before, but it got fixed pretty damn quick IIRC.
 
The danger is that driver engineering decisions will be made that favour benchmark programs, or that time is wasted specifically optimizing for benchmarks instead of improving the driver.
Unfortunately, they have little choice, since scores in 3DMark and Quake III seem to decide the comparative performance of graphics cards.
Look at the Parhelia. It will likely have the fastest hq filtering available, but it's already yesterday's news because it won't take a GF4 in standard QIII benches.
 
Jerry Cornelius said:
The danger is that driver engineering decisions will be made that favour benchmark programs, or that time is wasted specifically optimizing for benchmarks instead of improving the driver.
Unfortunately, they have little choice, since scores in 3DMark and Quake III seem to decide the comparative performance of graphics cards.

Of course it focuses on optimizing for benches, and not for general performance... but there is indeed great pressure on companies to perform the absolute fastest possible in 3DMark and other mainstream review benchmarks.

It certainly provides a strong motive for a company to do something along the lines of what I suggested in my earlier post.
 