Use of Custom Demos in Reviews

Hellbinder said:
OK, I am having some problems with some of the other numbers in this review, like these:

http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page12.asp

I simply do not believe that the Nvidia card is really doing AA+AF in this game. There is just no freaking way that the FX is going to be 39 FPS faster than the 9800 Pro with AA+AF applied.

I repeat.. NO WAY.

I find a couple of other oddities in the numbers as well. Some of it can be accounted for because of the Nvidia card's core speed advantage. But..



They certainly look much different than these numbers:

http://www.extremetech.com/article2/0,3973,846476,00.asp


Granted these are without AA and AF, but I still wouldn't expect such a drastic flip flopped difference.

Actually, here are some with AA and AF:

http://www.extremetech.com/article2/0,3973,920348,00.asp

and more comparing the 9800 to the 5900 without AA and AF:

http://www.extremetech.com/article2/0,3973,1075334,00.asp

and

http://www.extremetech.com/article2/0,3973,1075381,00.asp


Definitely quite different results from the FS review. Maybe AA isn't actually enabled? We have seen more and more of that lately. Maybe it's a "bug" in the NV drivers, and the auto-disable of AA that is used for Splinter Cell is being carried over to other benchmarks?
 
Now that is getting frightening...
Although nVidia *could* be using game-specific optimizations which would be reflected in gameplay but which couldn't be enabled in the latest point release.
I doubt it makes all of the huge performance boost though! Certainly deserves some further investigation.


Uttar
 
Brandon sent me these demos the other night to verify that I was getting similar results and, yes, more or less I was under SS:SE (not tested Q3 yet though). He has about 4 or 5 Q3 demos that he's recorded and I think he's also looking at them as well.
 
I told Kyle and Brent that their desire to use timedemos from games will just shift the attention to 'optimizations' on those timedemos. Anything with a timedemo shipping on the CD, like Serious Sam 2 and UT 2003, is a target, and so is [H]'s own UT 2003 timedemo, as it is freely downloadable.

I've watched Nvidia's scores increase after a driver release on their timedemo...

42.72 driver [H]'s UT 2003 Benchmark 350/350

1047311589S4MzihRI0V_6_3.gif


44.03 driver [H]'s UT 2003 Benchmark 400/350

1053587960qo2f5WxATf_4_3.gif


5600 Ultra scores 13.4 with the 42.72 driver (the day the benchmark was released to the public) here:

http://www.hardocp.com/article.html?art=NDQ0LDY=

Then six weeks later the magical 44.03 drivers increase [H]'s UT 2003 benchmark score to 39.7... now a 50 MHz core clock bump doesn't triple performance :LOL:
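
To put rough numbers on that, here is a back-of-the-envelope check in Python. It assumes the "350/350" and "400/350" labels are core/memory clocks in MHz and, generously, that performance scales linearly with core clock. Only the two scores and the clocks come from the posts above; the function names are made up for illustration.

```python
def clock_gain(old_mhz: float, new_mhz: float) -> float:
    """Ratio of new to old core clock."""
    return new_mhz / old_mhz


def score_gain(old_fps: float, new_fps: float) -> float:
    """Ratio of new to old benchmark score."""
    return new_fps / old_fps


# Figures quoted in the thread (5600 Ultra, [H]'s UT 2003 benchmark).
OLD_FPS, NEW_FPS = 13.4, 39.7      # 42.72 driver vs 44.03 driver
OLD_CORE, NEW_CORE = 350.0, 400.0  # assumed to be core clock in MHz

# Assumption: best case, fps scales one-to-one with core clock.
expected_fps = OLD_FPS * clock_gain(OLD_CORE, NEW_CORE)
unexplained = NEW_FPS / expected_fps

print(f"clock gain:  {clock_gain(OLD_CORE, NEW_CORE):.2f}x")
print(f"score gain:  {score_gain(OLD_FPS, NEW_FPS):.2f}x")
print(f"fps expected from clock alone: {expected_fps:.1f}")
print(f"gain left unexplained by clock: {unexplained:.2f}x")
```

Even under the linear-scaling assumption, the clock bump accounts for only a small slice of the jump. The rest has to come from somewhere else in the driver, which is the part worth investigating.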

Every website should be using its own custom timedemo and NOT allowing it to be downloaded. I suggest contacting some clans on the UT 2003 ladder, as I've played against and with them in my day; their matches would be far more accurate real-world stress tests :)
 
Geforce FX ..... “F” is for Fiction. “X” is an algebraic expression representing any benchmark result you wish. ;)
 
T2k said:
Unbe-frikkin'-lievable... is this just me or...?

Compare this one:
http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page9.asp

to this older one: http://firingsquad.gamers.com/hardware/nvidia_geforce_fx_5900_ultra/page18.asp

Obviously I'm not talking in terms of fps but the %!

That's the one that caught my eye too. :oops:

One thing's for sure - there will be a cute little video released about all this in a few months' time... "My, what fools we were back then!"... "Dude, I told you 'CheatonatorFX' would have been a better name!"... "Dooooood, I hope the enthusiast community will forgive us!", blah, blah, blah... :LOL: :rolleyes:

Edit - potential for a Kyle Bennett cameo somewhere in there, I feel.

MuFu.
 
Custom benchmark timedemos should indeed be rotated with publicly available recorded sessions that differ vastly from one another, in order to rule out timedemo-specific "optimizations" by IHVs.

BUT.. that being said.. there are also a number of legitimate reasons why you might see a flip-flop in performance superiority between different IHV's hardware just by selecting a different timedemo.

Since different 3D hardware has different strengths and weaknesses, it would be common sense that a timedemo that exercises one IHV's weakness for the majority of the script would have diminished performance on that particular IHV. Timedemos that illustrate the reverse (i.e. contain a majority of time illustrating a weakness of another IHV's hardware) would then also possibly reverse the situation.

I can give you a good example, albeit not with a timedemo since the game at hand does not have such a feature: Morrowind. If you put a 9800 Pro head to head with say a Geforce4 card.. with all settings maxed (max view distance, texture quality, shadows, etc.etc.) the 9800 Pro will benchmark much faster in almost all circumstances. A current weakness exists, with either the drivers or the shadow method, such that if you spawn a Winged Twilight (fairly rare creature), the 9800 Pro's framerates nosedive whereas an NVIDIA card has no problem. So if you could theorize a timedemo in this game.. one that is taken in a busy town with a long horizon viewing distance, 10 guards walking around.. you'd have a 9800 Pro powering past the NVIDIA card in framerates. Now theorize a timedemo in an underground dungeon, with little viewing distance due to tunnels and passageways and 3 or 4 Winged Twilights.. Voilà: you have a graph that is the exact opposite.

In a Quake3 sense, the same circumstances might be relevant. Theorize a particular IHV sees a reduction in performance when looking at the skies. A timedemo outdoors with a viewpoint that always has 20-30% of the screen with the skybox would yield poor performance on this IHV... yet timedemos recorded in indoor levels with no skybox, or with the player's viewing angle mostly looking downwards at the ground, might yield the opposite effect.

This is nothing new- and in some cases, certain websites would use this to their advantage to push a particular advantage. By cherry picking certain benchmark scripts, you could pretty much fabricate a "win" just by ensuring a demo/benchmark made ample use of a weakness of a particular IHV that you were trying to compete with. It didn't require any driver "cheats" or "optimizations"- just a good eye and understanding of things that particular platforms did well.. and what other platforms didn't handle well. (alpha effects/smoke and explosion effects on generations past comes to mind the most..)
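
A toy illustration of how this plays out, with entirely made-up numbers: two hypothetical cards, each with one scene type it handles badly, and two timedemos that differ only in how many frames they spend in each scene type. Nothing here is measured data; the card names, scene names and fps figures are assumptions chosen to show the flip.

```python
# Hypothetical sustained fps per scene type for two made-up cards.
FPS = {
    "card_a": {"outdoor_sky": 40.0, "indoor": 120.0},
    "card_b": {"outdoor_sky": 80.0, "indoor": 90.0},
}


def timedemo_fps(card: str, frame_counts: dict) -> float:
    """Average fps over a demo: total frames divided by total render time."""
    total_frames = sum(frame_counts.values())
    total_seconds = sum(n / FPS[card][scene] for scene, n in frame_counts.items())
    return total_frames / total_seconds


def winner(frame_counts: dict) -> str:
    """Card with the higher average fps on this demo."""
    return max(FPS, key=lambda card: timedemo_fps(card, frame_counts))


# Two demos of equal length, differing only in scene mix.
mostly_outdoor = {"outdoor_sky": 900, "indoor": 100}
mostly_indoor = {"outdoor_sky": 100, "indoor": 900}

for name, demo in (("mostly_outdoor", mostly_outdoor), ("mostly_indoor", mostly_indoor)):
    scores = {card: round(timedemo_fps(card, demo), 1) for card in FPS}
    print(name, scores, "winner:", winner(demo))
```

Same cards, same "drivers", opposite graphs. The average is dominated by whichever scene type eats the most render time, so the choice of demo alone can decide the ranking.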

So whether this is a case of driver "cheats" still remains to be seen. Whether the nature of the timedemos is vastly different, or whether some sort of shortcoming (performance-wise, be it driver or hardware) can be found between the timedemos, is what should be researched before crying foul play.

If we also take a page from the past, cases like these that turn out to be shortcomings usually improve things for consumers.. a lot of times they point out bugs or inefficiencies in drivers that were just simply not caught by IHV X. They might simply be pointing out a few lines of dead code or less than optimal approach to a particular function or feature. When mistakes or inefficiencies like these are illustrated by opposing benchmarks, the end user usually sees fixes/improvements in drivers from these rather quickly. :)
 
The only thing I just can't understand is: how could Nvidia be so stupid in its ways? What were they thinking, that reviewers would be too stupid to notice the sudden performance changes between different driver releases? They could've done it "properly", I mean, hold back the FX cards from reviewers until the "optimized" drivers are ready and then this whole thing might have remained a secret... But no, they're so confident that they are doing it in front of the whole world... Amazing. So are they just stupid, or that arrogant?
 
It worked in 1998. In 1999. In 2000. In 2001. In 2002. Maybe they didn't do it to such a level. I don't know. But they did cheat according to some people here and at other places.
So, well... "Why not in 2003?"...

I've got only one explanation:
We're becoming smarter 8) ;)


Uttar
 
Doomtrooper et al, we do know like everyone else about all the timedemo issues.

In fact Kyle has already asked me what I think about recording our own timedemos and not releasing them to anyone.

Please don't think we are oblivious to what is happening out there, we do stay informed.
 
I talked with the reviewer at FS (Brandon). He assures me that AA is on in NASCAR. He told me he updated the review with screenshots.

http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page13.asp

check the above links for the update.
Editor's Note: We've received emails from a few of you questioning whether the GeForce FX cards are performing anti-aliasing in IL-2 Sturmovik or not. In light of the Splinter Cell AA issue, we felt it best to provide you with screenshots directly from IL-2 demonstrating AA in action on the GeForce FX 5900 Ultra reference board. As you can see, AA is working (4xAA with 8xAF, in one shot, just like our test conditions) but just barely for a "4x" setting. Please download the bitmaps to see for yourself.
So while AA is on it looks like they are stretching the limits of what *4x* AA is supposed to be... :rolleyes:

Still, I am having problems accepting the stock NASCAR and IL-2 scores as well... which I have passed on to the ATi driver guys for analysis.
 
In fact Kyle has already asked me what I think about recording our own timedemos and not releasing them to anyone.

This would be more detrimental than using timedemos that could possibly have driver "optimizations" in use within them.

Non-publicly available timedemos basically give results in a vacuum. They also strain credibility since you are not giving the readership the right to reproduce the results. You also then become incomparable with other websites if the demo isn't somewhat accepted as the "new" standard to be used.

The true solution lies in having a rotated, fresh demo pool that stays fairly current and generally doesn't have any favored conditions that may support one IHV over another (see my previous post about non-"cheat" possibilities that have been used in the past). If you rewind back to the Quake1 days, if you wanted to show a massive lead by a 3dfx card over a PowerVR, you just used a scene with 10 people chain firing rockets.. if you wanted a marginal lead, you just had 5 people running around using standard guns... etc.etc.
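
As a rough sketch of what "rotated" could mean in practice, here is one possible selection rule in Python: prefer demos that were not used in the previous review, and seed the shuffle so readers who have the same public pool can reproduce the pick. The function name, the demo file names and the seed value are all hypothetical; this is one way to do it, not a description of any site's actual process.

```python
import random


def pick_demos(pool, previously_used, count, seed):
    """Pick `count` demos from `pool`, preferring ones not used last time.

    The shuffle is seeded, so the same pool and seed give the same pick.
    """
    rng = random.Random(seed)
    fresh = [d for d in pool if d not in previously_used]
    stale = [d for d in pool if d in previously_used]
    rng.shuffle(fresh)
    rng.shuffle(stale)
    # Fresh demos come first; stale ones are only used if the pool runs short.
    return (fresh + stale)[:count]


# Hypothetical public pool and the demos used in the last review.
pool = ["dm_indoor_01", "dm_outdoor_02", "ctf_rockets_03", "dm_mixed_04", "ctf_sky_05"]
last_review = {"dm_indoor_01", "ctf_rockets_03"}

print(pick_demos(pool, last_review, count=3, seed=2003))
```

Because the pool stays public, results remain reproducible and comparable across sites, while the rotation makes it harder to tune drivers against one fixed recording.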

Having a proprietary and "secret" timedemo is as good as using non-public released drivers. The conditions and specifics of the timedemo are undefined, so the end results are therefore undefined (and useless..) as well.
 
Sharkfood said:
In fact Kyle has already asked me what I think about recording our own timedemos and not releasing them to anyone.

This would be more detrimental than using timedemos that could possibly have driver "optimizations" in use within them.

Non-publicly available timedemos basically give results in a vacuum. They also strain credibility since you are not giving the readership the right to reproduce the results. You also then become incomparable with other websites if the demo isn't somewhat accepted as the "new" standard to be used.

The true solution lies in having a rotated, fresh demo pool that stays fairly current and generally doesn't have any favored conditions that may support one IHV over another (see my previous post about non-"cheat" possibilities that have been used in the past). If you rewind back to the Quake1 days, if you wanted to show a massive lead by a 3dfx card over a PowerVR, you just used a scene with 10 people chain firing rockets.. if you wanted a marginal lead, you just had 5 people running around using standard guns... etc.etc.

Having a proprietary and "secret" timedemo is as good as using non-public released drivers. The conditions and specifics of the timedemo are undefined, so the end results are therefore undefined (and useless..) as well.

yeah, there are many sides for sure

i think rotated is best as well

just letting people know we aren't sitting on our hands here, we are actually discussing these things, we know the current method isn't perfect
 
Having a proprietary and "secret" timedemo is as good as using non-public released drivers. The conditions and specifics of the timedemo are undefined, so the end results are therefore undefined (and useless..) as well.

Why? They are more likely to bear some relation to a game you are likely to play than the prerecorded ones, since most of the time they are just made of games we've played.

Anyway, there is a slight variation on a theme...
 
Hellbinder said:
I talked with the reviewer at FS (Brandon). He assures me that AA is on in NASCAR. He told me he updated the review with screenshots.

http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page13.asp

check the above links for the update.
Editor's Note: We've received emails from a few of you questioning whether the GeForce FX cards are performing anti-aliasing in IL-2 Sturmovik or not. In light of the Splinter Cell AA issue, we felt it best to provide you with screenshots directly from IL-2 demonstrating AA in action on the GeForce FX 5900 Ultra reference board. As you can see, AA is working (4xAA with 8xAF, in one shot, just like our test conditions) but just barely for a "4x" setting. Please download the bitmaps to see for yourself.
So while AA is on it looks like they are stretching the limits of what *4x* AA is supposed to be... :rolleyes:

Still, I am having problems accepting the stock NASCAR and IL-2 scores as well... which I have passed on to the ATi driver guys for analysis.

What's interesting here is if you look at those screen shots, the "4xAA" just appears to be a blurred mess.

Look at the brown thing just below the gun sight...
 
Hellbinder said:
I talked with the reviewer at FS (Brandon). He assures me that AA is on in NASCAR. He told me he updated the review with screenshots.

http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page13.asp

check the above links for the update.
Editor's Note: We've received emails from a few of you questioning whether the GeForce FX cards are performing anti-aliasing in IL-2 Sturmovik or not. In light of the Splinter Cell AA issue, we felt it best to provide you with screenshots directly from IL-2 demonstrating AA in action on the GeForce FX 5900 Ultra reference board. As you can see, AA is working (4xAA with 8xAF, in one shot, just like our test conditions) but just barely for a "4x" setting. Please download the bitmaps to see for yourself.
So while AA is on it looks like they are stretching the limits of what *4x* AA is supposed to be... :rolleyes:

Still, I am having problems accepting the stock NASCAR and IL-2 scores as well... which I have passed on to the ATi driver guys for analysis.

Well, it looks like 2xAA with a blur filter to me, and a pretty hefty blur at that. Perhaps it is Quincunx or something; anyway, it looks ugly IMO.
 
It would have been nice to see results with the standard timedemos as control, so we could isolate the effects of changes in settings, platform, moon phases, etc. As it is we're left feeling suspicious, but without any proof that the change in demos really accounts for the change in rankings.

Although, as Sharkfood cautions, one has to be careful not to interpret such a direct head-to-head comparison of standard vs. custom timedemos as a final test of whether the drivers are cheating--different GPUs react to different situations differently. Perhaps a good solution would be to try to catch cheats that special-case specific timedemos by recording a custom timedemo which matches as closely as possible the standard timedemo, and comparing results on the two. Or even--I know this sounds silly, but sillier things have happened--renaming the standard timedemos? Or editing them in trivial ways? It seems to me that it should be quite possible to throw off any timedemo-recognition mechanisms that may be in place.
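
A minimal sketch of that comparison, assuming you have already collected fps figures from several runs of the standard timedemo and of a renamed (or trivially edited) copy of it. The 5% tolerance is an arbitrary stand-in for run-to-run noise, and the sample numbers are invented; both would need to be replaced with real measurements.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def relative_gap(standard_runs, renamed_runs):
    """How much faster the standard demo runs than its renamed copy, as a fraction."""
    standard = mean(standard_runs)
    renamed = mean(renamed_runs)
    return (standard - renamed) / renamed


def looks_special_cased(standard_runs, renamed_runs, tolerance=0.05):
    """Flag the demo if the standard copy beats the renamed copy by more than noise.

    `tolerance` is an assumed noise threshold, not a calibrated value.
    """
    return relative_gap(standard_runs, renamed_runs) > tolerance


# Invented example figures (fps per run).
honest_standard, honest_renamed = [100.0, 102.0, 98.0], [99.0, 101.0, 100.0]
suspect_standard, suspect_renamed = [100.0, 102.0, 98.0], [70.0, 71.0, 69.0]

print(looks_special_cased(honest_standard, honest_renamed))
print(looks_special_cased(suspect_standard, suspect_renamed))
```

Since the renamed copy renders exactly the same frames, the hardware-strengths argument does not apply here: a gap well outside run-to-run noise points at the driver recognizing the demo rather than at the workload.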

Then, one can check to see if the standard timedemos accurately represent dissimilar situations by recording dissimilar timedemos. I don't think it would surprise anyone if the driver teams at both IHVs prioritize game-specific optimizations based on how much the effects they address contribute to the workload of standard timedemos, rather than how frequent they are in "normal" gameplay.
 
I just got another reply from Brandon; he says NASCAR AA is on and looks fantastic. He is posting shots now.

How is this possible??? One application has nice AA, the other is a blurry mess. Could Nvidia be doing per-application tweaked AA???? I mean for more than just performance.
 