Use of Custom Demos in Reviews

micron said:
Ok, which PowerVR board was he talking about then?
Sorry if I'm showing blatant stupidity :?

The PCX / PCX-2 ... ?
 
Brent said:
Doomtrooper et al., we do know, like everyone else, about all the timedemo issues.

In fact Kyle has already asked me what I think about recording our own timedemos and not releasing them to anyone.

Please don't think we are oblivious to what is happening out there, we do stay informed.

The problem I have with entirely exclusive demos is that it then becomes impossible to verify the reviewer's results for yourself. I'm not implying you are fudging your figures in the slightest, but the only check we have as readers is access to the demos used in the review.

Perhaps a half and half solution?

Use one demo publicly, and have a private, unreleased demo for the same game that sanity-checks the public demo's figures.

Philip
 
BRiT said:
micron said:
Ok, which PowerVR board was he talking about then?
Sorry if I'm showing blatant stupidity :?

The PCX / PCX-2 ... ?
Shit. Every time I think I know what I'm talking about (happens rarely), I don't know what I'm talking about. :(
 
Doomtrooper said:
Yep, I linked the wrong graph; still a significant increase without AF, so my assumption is just fine.

[two benchmark graphs]


Still over double from the 42.72 drivers:

2x FSAA / 8x AF, 42.72 drivers: 20.2 FPS
2x FSAA / 8x AF, 43.03 drivers: 51 FPS!! That's almost 2.5 times the performance on the same benchmark, and no, a 50 MHz core clock bump doesn't account for it.
I compared the no-AF scores too; the speed improvement is the same, so your optimized-AF idea is wrong.

Well, yeah. Now you are proving your point :)

But your final sentence is still wrong. The no-AF scores do not show the same speed improvement. In fact, in the original review there are only no-AF no-FSAA scores, which increased from 40/60 in the old review to 60/90 in the new review, respectively. While the FSAA/8xAF scores show 2.5 times the performance, the no-FSAA/no-AF scores only show a +50% improvement.
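For what it's worth, both ratios can be checked directly from the figures posted in this thread (a quick sketch; nothing is assumed beyond the quoted FPS numbers):

```python
# Sanity-checking the driver-to-driver speedup ratios quoted above.
old_aa_af = 20.2   # 2x FSAA / 8x AF, 42.72 drivers
new_aa_af = 51.0   # 2x FSAA / 8x AF, 43.03 drivers

old_plain = [40, 60]   # no-FSAA / no-AF scores, old review
new_plain = [60, 90]   # no-FSAA / no-AF scores, new review

aa_speedup = new_aa_af / old_aa_af                               # ~2.52x
plain_speedups = [n / o for n, o in zip(new_plain, old_plain)]   # 1.5x each

print(f"FSAA/AF speedup: {aa_speedup:.2f}x")          # FSAA/AF speedup: 2.52x
print(f"no-FSAA/no-AF speedups: {plain_speedups}")    # no-FSAA/no-AF speedups: [1.5, 1.5]
```

So the FSAA/AF case really does gain ~2.5x while the plain case gains only 1.5x, which is the asymmetry being argued about.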

Semantics aside, I think this thread proves without a doubt that NVIDIA is detecting and "optimizing" UT2003: http://www.beyond3d.com/forum/viewtopic.php?t=6480.
 
micron said:
It wasn't too long ago that I was playing games on a Kyro1, and I don't recall experiencing 1-2 fps at any time during Quake...
Though I might have been having too much fun to notice.....

1-2 FPS?! Luxury!! In my day.... :LOL:
 
Hanners said:
micron said:
It wasn't too long ago that I was playing games on a Kyro1, and I don't recall experiencing 1-2 fps at any time during Quake...
Though I might have been having too much fun to notice.....

1-2 FPS?! Luxury!! In my day.... :LOL:
NOooooo :D
 
It wasn't too long ago that I was playing games on a Kyro1, and I don't recall experiencing 1-2 fps at any time during Quake...
Though I might have been having too much fun to notice.....

Yes, notice Quake1 + Voodoo. I'm talking about the Voodoo, PVR, Rendition, and Rage days... looong before even the Voodoo2s and GeForces. PCX1 versus Monster3D was the website shootout of the time.
 
I would also look at Mike Chambers' numbers from nV News, as he uses FRAPS to get an average, and his numbers don't match up with any 'timedemo'-reviewed UT2003 numbers.

I don't think Mike's numbers were meant to be used as end-all, be-all numbers. They were just some "I just slapped the card in my system and ran a few quick benchmarks" results; they weren't in a review, just casual information posted in a forum thread. He also explained he's not using the full retail version of UT2003 (most likely the freebie download/demo version), which is most definitely an old revision and may differ from benchmarks elsewhere.
 
I think FRAPS is probably most useful for in-game checking. For an absolutely fair benchmark of an app, you need a demo run with a fixed number of frames that must all be rendered; FRAPS doesn't work that way.
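One concrete reason live FPS logging and a fixed-frame timedemo can disagree: if you summarize a run by averaging per-frame instantaneous FPS values, fast frames dominate the average, while a timedemo effectively reports total frames divided by total time. A small illustrative sketch with made-up frame times (not real FRAPS data):

```python
# Two ways to summarize the same run from per-frame render times (in seconds).
# Hypothetical data: 90 fast frames (10 ms) and 10 slow frames (100 ms).
frame_times = [0.01] * 90 + [0.10] * 10

# Naive: arithmetic mean of per-frame instantaneous FPS (overweights fast frames).
naive_avg = sum(1.0 / t for t in frame_times) / len(frame_times)

# Timedemo-style: total frames divided by total elapsed time.
true_avg = len(frame_times) / sum(frame_times)

print(f"naive average: {naive_avg:.1f} fps")   # naive average: 91.0 fps
print(f"true average:  {true_avg:.1f} fps")    # true average:  52.6 fps
```

The gap (91 vs ~53 fps here) shows why the summarization method matters as much as the capture tool, on top of live gameplay simply not being repeatable frame-for-frame.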
 
Is it possible for a driver to cause FRAPS to report incorrectly? Is there no end to what a reviewer has to look out for nowadays?
 
Is there no end to what a reviewer has to look out for nowadays?

No, there is definitely an "end", but the degree of knowledge that is assumed is directly the responsibility of the reviewer. It's totally self-inflicted.

An example: look at typical gaming magazines and their hardware reviews. They list the product and price, run a 3DMark and a Quake3 timedemo, then score the hardware "86%" or "9 out of 10" and they're done. They haven't taken a bite into anything tangible or technical. Just a joe-generic, low-tech, medium-value hardware review.

The problem starts when a reviewer decides to try to create some sort of technical baseline of comparison: when they start running histograms, providing "comprehensive" image quality analysis, and running side-by-side featureset comparisons and charts, all done with less than stellar knowledge of the hardware they are comparing. It leads to putting less-than-scientific results in print and building on a basis which MAY not be correct. It's the time-honored "biting off more than one can chew" syndrome and should only be attempted by trained stuntmen. :)

So that is the reality of the situation: you have MANY sources trying to create scientific/comprehensive comparisons when they lack the knowledge or savvy to attempt such a feat. It's the 10% rule: you have to be 10% smarter than your readers, and if you are instead 50% less knowledgeable, sparks are going to fly.

Benchmarking is no joke; to do it objectively and thoroughly, you need to understand the elements at play, create controls to ensure your results, and test, test, test and retest! When publishing comprehensive results, a reviewer needs to act as if some multi-million dollar decision is riding on the accuracy of their results, because in the long run, there very well might be. A review that creates a (false) image of a product shortcoming might cause a delta of >$1 million in lost sales of one product and increased sales of another, with the end consumer being the real victim. I'm sure that if there were accountability for reviewers, you would see MUCH fewer false benchmarks, fewer AA/AF tests where neither is actually enabled on one IHV versus another, and more actual in-depth study and research into what featuresets are supposed to do and how they are delivered by different IHVs.
 
Doomtrooper said:
I would also look at Mike Chambers' numbers from nV News, as he uses FRAPS to get an average, and his numbers don't match up with any 'timedemo'-reviewed UT2003 numbers.

Yes, you are correct. My results will not match those from the typical timedemos since they are from "gameplay" sessions. During these sessions, I play the game and use the logging mode in FRAPS to capture frame rates.

When I publish the results in a review, I always make sure the settings I use are provided. This allows the reader to conduct similar tests on their PC in order to make a comparison. For example, in UT2003 I normally play against six skilled bots on DM-Antalus. I also list the in-game graphics settings and enable high quality sound.
 
Sharkfood said:
I don't think Mike's numbers were meant to be used as end-all, be-all numbers. They were just some "I just slapped the card in my system and ran a few quick benchmarks" results; they weren't in a review, just casual information posted in a forum thread. He also explained he's not using the full retail version of UT2003 (most likely the freebie download/demo version), which is most definitely an old revision and may differ from benchmarks elsewhere.

I finally purchased the full version of UT2003 yesterday :) And I am eager to get started testing the Radeon 9800 Pro under the same gameplay scenarios I used with the NV35.
 
MikeC said:
Yes, you are correct. My results will not match those from the typical timedemos since they are from "gameplay" sessions. During these sessions, I play the game and use the logging mode in FRAPS to capture frame rates.

When I publish the results in a review, I always make sure the settings I use are provided. This allows the reader to conduct similar tests on their PC in order to make a comparison. For example, in UT2003 I normally play against six skilled bots on DM-Antalus. I also list the in-game graphics settings and enable high quality sound.

Taking that one step further and jumping online for a few matches on a populated server would be the cat's ass, IMO.
 
The question becomes (I doubt that you would do this, Brent): what prevents Kyle from giving your recorded timedemo to his friend at NVIDIA?

Brent said:
Doomtrooper et al., we do know, like everyone else, about all the timedemo issues.

In fact Kyle has already asked me what I think about recording our own timedemos and not releasing them to anyone.

Please don't think we are oblivious to what is happening out there, we do stay informed.
 
Doomtrooper said:
Taking that one step further and jumping online for a few matches on a populated server would be the cat's ass, IMO.

Sounds like a great idea! Will probably do the same with Wolfenstein Enemy Territory. I'm one heck of a medic :)
 
eVGA e-GeForce FX 5900 Ultra Review

Has anyone read the new FX 5900 Ultra review from FiringSquad? They've again recorded new timedemos and the scores have again changed: the FX is beating the R9800 Pro by a large margin. And yep, this card is clocked higher than the card from the last test.

In Serious Sam2 they went from this http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page8.asp to this http://firingsquad.gamers.com/hardware/evga_e-geforce_fx_5900_ultra_review/page8.asp

For example, in Quality 1024x768x32 with 4xAA and 8xAF, the GeForce scored 161 fps and the Radeon 9800 Pro 114 fps. It's strange, though, that in 4 days they have recorded a new demo and changed the testing methods completely: you can't find SS2 scores with aniso & AA turned on in the review posted a few days ago, and now you can't find SS2 scores without aniso or AA. The same goes for the UT2k3 tests.
 
Re: eVGA e-GeForce FX 5900 Ultra Review

Miksu said:
Has anyone read the new FX 5900 Ultra review from FiringSquad? They've again recorded new timedemos and the scores have again changed: the FX is beating the R9800 Pro by a large margin. And yep, this card is clocked higher than the card from the last test.

In Serious Sam2 they went from this http://firingsquad.gamers.com/hardware/msi_geforce_fx5900-td128_review/page8.asp to this http://firingsquad.gamers.com/hardware/evga_e-geforce_fx_5900_ultra_review/page8.asp

For example, in Quality 1024x768x32 with 4xAA and 8xAF, the GeForce scored 161 fps and the Radeon 9800 Pro 114 fps. It's strange, though, that in 4 days they have recorded a new demo and changed the testing methods completely: you can't find SS2 scores with aniso & AA turned on in the review posted a few days ago, and now you can't find SS2 scores without aniso or AA. The same goes for the UT2k3 tests.

Yes, it's strange. Did I read it correctly: NASCAR is now more stressful, but SS2 and Q3A are less stressful?
 