Fraps Benchmarking

Nathan

Newcomer
Introduction

With the current difficulties inherent in obtaining reliable benchmarks, many reviewers and enthusiasts are starting to use Fraps in order to benchmark games that do not have any built-in benchmarking capability. One method of benchmarking with Fraps is detailed in Reverend's recent review of the GeForce FX 5600 Ultra (Rev2) (edit by Rev -- this review is by Neeyik, not me). However, this method can only be used with a few games. Using Fraps to record a standard gameplay situation provides a wealth of benchmarking opportunities. The problem is, no one knows how repeatable the results are. So let's find out...


Test System

CPU: AMD 2800+ running at 180*11.5
Motherboard: Epox 8RDA+, unified nForce drivers 2.45
Ram: Geil PC3700 256meg x 2
Video card: Gigabyte 9700pro, Cat 3.6
Operating system: Windows XP, service pack 1

Unless otherwise specified the following driver settings were used:
AA: 4x
AF: Quality 8x
Texture Preference: High Quality
Mipmap Detail Level: High Quality
Wait for vertical sync: Default Off
Truform: Always Off


Benchmark

Quake3, point release 1.32
Level: Q3DM4

The following settings were used:
Graphics Settings: High Quality
Screen Resolution: 1152 x 864
Geometric Detail: High
Default Level Bots
Difficulty: Hardcore

Q3DM4 was selected because I like the level, and since I had to play it a lot - I might as well like it. :)


Test Procedure

The actual death match length and the average, min and max frame rates were recorded with Fraps for 1, 2, 4 and 8 minute nominal length death matches. A minimum of 10 death matches were played for each death match length. Fraps was started as soon as the death match started and stopped when the match finished. No attempt was made to visit every part of the map or to produce consistent scores in any way.


Results

Code:
1 min
4AA
Sample                     Time (s)            Min (fps)           Average (fps)       
1                          63.797              113                 216.232             
2                          61.39               147                 223.342             
3                          61.375              148                 220.806             
4                          61.235              101                 218.747             
5                          61.156              117                 212.963             
6                          61.219              154                 234.649             
7                          60.641              130                 217.229             
8                          61.328              137                 204.963             
9                          62.813              145                 211.58              
10                         61.2973             111                 212.816             
Sample Mean                61.62513            130.3               217.3327            
Sample Standard Deviation  0.939704570655907   18.6252516761519    8.00590182372571    
Max                        63.797              154                 234.649             
Min                        60.641              101                 204.963             
Range                      3.156               53                  29.686              
Confidence Interval (95%)  0.58242386898597    11.5438314134092    4.96201514869886    

11                         60.641              156                 236.721             
12                         61.297              143                 222.8               
13                         60.985              153                 223.136             
14                         61.078              140                 208.307             
15                         61.125              140                 218.159             
16                         62.078              75                  224.507             
17                         61.109              71                  214.289             
18                         61.547              129                 197.296             
19                         61.046              126                 223.765             
20                         60.985              126                 218.201             
21                         60.609              93                  220.775             
22                         61.109              153                 231.111             
23                         61.125              143                 226.47              
24                         61.063              142                 205.967             
25                         60.984              105                 204.233             
26                         61.047              101                 216.357             
27                         60.782              120                 214.685             
28                         61.766              156                 230.045             
29                         61.156              148                 221.09              
30                         61.234              151                 228.925             
Sample Mean                61.3005766666667    129.133333333333    218.6722            
Sample Standard Deviation  0.6367920494667     23.8034093955167    9.19708126639583    
Max                        63.797              156                 236.721             
Min                        60.609              71                  197.296             
Range                      3.188               85                  39.425              
Confidence Interval (95%)  0.227868781953625   8.51777893559193    3.29107077806208    

2 min
4AA
Sample                     Time (s)            Min (fps)           Average (fps)       
1                          121.093             144                 225.933             
2                          121.015             137                 212.882             
3                          122.359             88                  207.425             
4                          122.765             131                 217.423             
5                          121.016             74                  206.518             
6                          121.204             86                  207.187             
7                          121.219             96                  212.804             
8                          121.094             72                  215.089             
9                          121.687             139                 217.697             
10                         121.047             108                 216.7               
Sample Mean                121.4499            107.5               213.9658            
Sample Standard Deviation  0.625364507046685   28.0960653631232    6.0051247521504     
Max                        122.765             144                 225.933             
Min                        121.015             72                  206.518             
Range                      1.75                72                  19.415              
Confidence Interval (95%)  0.387597578105219   17.4137911031454    3.72194421641432    

4 min
4AA
Sample                     Time (s)            Min (fps)           Average (fps)       
1                          241.453             103                 211.892             
2                          241.578             126                 215.098             
3                          241.485             78                  213.532             
4                          241.515             73                  209.601             
5                          240.797             136                 213.827             
6                          241.563             117                 214.565             
7                          240.797             141                 213.258             
8                          241.266             117                 211.003             
9                          241.343             88                  213.389             
10                         240.438             115                 216.247             
Sample Mean                241.2235            109.4               213.2412            
Sample Standard Deviation  0.400535266857954   23.4245834778574    1.9654169927948     
Max                        241.578             141                 216.247             
Min                        240.438             73                  209.601             
Range                      1.14000000000001    68                  6.64600000000002    
Confidence Interval (95%)  0.248249617032201   14.5184316056224    1.21815494450061    

8 min
4AA                                                         
Sample                     Time (s)            Min (fps)           Average (fps)       
1                          480.656             124                 211.059             
2                          481.281             75                  209.757             
3                          480.016             85                  208.313             
4                          481.344             87                  212.968             
5                          480.547             127                 214.732             
6                          481.172             90                  207.726             
7                          480.562             84                  212.958             
8                          481.25              65                  211.929             
9                          480.609             120                 214.823             
10                         481.328             74                  218.456             
Sample Mean                480.8765            93.1                212.2721            
Sample Standard Deviation  0.457161350176893   22.3728506106049    3.26615827506475    
Max                        481.344             127                 218.456             
Min                        480.016             65                  207.726             
Range                      1.32799999999997    62                  10.73               
Confidence Interval (95%)  0.283346160735418   13.8665731973385    2.02434743714822    

4 min
2AA
Sample                     Time (s)            Min (fps)           Average (fps)       
1                          241.063             155                 233.026             
2                          240.968             126                 231.478             
3                          241.125             138                 225.215             
4                          241.234             140                 226.062             
5                          240.625             153                 238.636             
6                          243.797             145                 237.988             
7                          241.281             118                 231.257             
8                          240.907             145                 231.786             
9                          241.282             127                 234.584             
10                         241.016             132                 233.702             
Sample Mean                241.3298            137.9               232.3734            
Sample Standard Deviation  0.889376660622191   12.0963722752824    4.36261005464231    
Max                        243.797             155                 238.636             
Min                        240.625             118                 225.215             
Range                      3.172               37                  13.421              
Confidence Interval (95%)  0.551230899413252   7.49726686584818    2.70392238821223

If you need a statistics refresher, a glossary of statistics terms can be found here:

http://www.cas.lancs.ac.uk/glossary_v1.1/main.html


Analysis 1 - The Effect of Sample Sizes

30 death matches of 1 minute duration were recorded in order to get an insight into the number of samples required to get a reasonably accurate 95% confidence interval. Because the confidence interval is based on the sample standard deviation and not the population standard deviation, the accuracy of the confidence interval depends on how close the sample standard deviation is to the population standard deviation. The first series in the graph below shows the 95% sample confidence intervals of the average frame rate for varying numbers of samples. The second series approximates the population confidence intervals for the same sample sizes by using the 30-sample standard deviation for each confidence interval calculation. The general trend is that increasing the number of samples reduces the size of the confidence interval. The glaring exception is at 5 samples, which produces a fantastically low result. This is merely a statistical anomaly produced by a tight grouping of the first 5 samples. From 10 to 30 samples the sample confidence intervals match the "pseudo" population confidence intervals much more closely. However, it should be remembered that the population confidence intervals are only an approximation, and the apparent convergence of the intervals occurs more quickly here than it would with the actual population standard deviation.

The main result that can be gained from this is that 10 samples is the minimum number required to achieve reasonably consistent results.

Graph1.jpg
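The confidence intervals in the tables appear to use the normal (z) approximation rather than Student's t, which is what Excel's CONFIDENCE function does; that is the assumption in the sketch below. Given the first ten 1-minute average frame rates, the 95% interval half-width works out like this (Python, standard library only):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Average frame rates of the first ten 1-minute death matches,
# transcribed from the results table above.
averages = [216.232, 223.342, 220.806, 218.747, 212.963,
            234.649, 217.229, 204.963, 211.580, 212.816]

n = len(averages)
m = mean(averages)               # sample mean
s = stdev(averages)              # sample standard deviation
z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value (~1.96)
ci = z * s / sqrt(n)             # half-width of the 95% confidence interval

print(f"mean = {m:.4f} fps, sd = {s:.4f}, 95% CI = +/-{ci:.4f} fps")
```

This reproduces the tabled values for the first ten samples (mean 217.3327, sd 8.0059, CI 4.962). Using a t critical value instead of z would give a slightly wider interval at this sample size.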



Analysis 2 - The Effect of Death Match Length

In order to test the effect of death match length, 10 matches were played at 1, 2, 4 and 8 minutes duration. The graph below shows the average of the min, average and max frame rates for each of the death match lengths. The error bars show the 95% sample confidence intervals for each result.

Graph2.jpg


If we examine the average frame rates first, we see that they all have very similar means and confidence intervals. Performing an analysis of variance (ANOVA) confirms that it is likely that the population averages for each death match length are the same, as F is considerably lower than F crit. The average frame rates can be considered quite accurate and repeatable.

Code:
Anova: Single Factor - Nominal Death Match Length, Average Frame Rate

SUMMARY
Groups              Count       Sum         Average     Variance                          
1 min               10          2173.327    217.3327    64.0944640                        
2 min               10          2139.658    213.9658    36.0615232                        
4 min               10          2132.412    213.2412    3.86286395                        
8 min               10          2122.721    212.2721    10.6677898                        


ANOVA
Source of Variation SS          df          MS          F           P-value     F crit    
Between Groups      145.047199  3           48.3490665  1.68630159  0.18721065  2.86626544
Within Groups       1032.17977  36          28.6716602                                    

Total               1177.22696  39

As expected, the longer death matches have closer means and smaller confidence intervals than the 1 minute matches. However, it seems that increasing the death match length above 4 minutes will not yield a significant increase in accuracy over either the 2 or 4 minute match lengths.
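The ANOVA tables were presumably generated with a spreadsheet package, but a single-factor ANOVA is simple enough to reproduce by hand. As a sketch, the F statistic for the average frame rates (data transcribed from the results tables above) can be computed with nothing but the Python standard library:

```python
from statistics import mean, variance

# Average frame rates for ten death matches at each nominal length,
# transcribed from the results tables above.
groups = {
    "1 min": [216.232, 223.342, 220.806, 218.747, 212.963,
              234.649, 217.229, 204.963, 211.580, 212.816],
    "2 min": [225.933, 212.882, 207.425, 217.423, 206.518,
              207.187, 212.804, 215.089, 217.697, 216.700],
    "4 min": [211.892, 215.098, 213.532, 209.601, 213.827,
              214.565, 213.258, 211.003, 213.389, 216.247],
    "8 min": [211.059, 209.757, 208.313, 212.968, 214.732,
              207.726, 212.958, 211.929, 214.823, 218.456],
}

data = list(groups.values())
k = len(data)                          # number of groups
n = sum(len(g) for g in data)          # total number of observations
grand = mean(x for g in data for x in g)

# Between-groups and within-groups sums of squares
ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in data)
ssw = sum((len(g) - 1) * variance(g) for g in data)

f = (ssb / (k - 1)) / (ssw / (n - k))  # F statistic with (k-1, n-k) df
print(f"F = {f:.4f}")
```

This gives F = 1.6863, matching the table, which is well below the F crit of 2.8663 for (3, 36) degrees of freedom.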

The min frame rates tell a very different story. There is a general trend towards lower frame rates for the longer death match lengths, which is not surprising given the nature of what is being measured. The longer a match is, the more chances there are for computationally and graphically intensive scenarios to arise. Also of interest are the relatively large sample confidence intervals. Performing an ANOVA on the min frame rates shows that it is unlikely that they have the same population mean; F is considerably larger than F crit in this case. Despite the min frame rate being quite an important statistic, it is difficult to recommend using the min frame rate values obtained from Fraps.

Code:
Anova: Single Factor - Nominal Death Match Length, Min Frame Rate

SUMMARY                                                                                   
Groups              Count       Sum         Average     Variance                          
1 min               10          1303        130.3       346.900000                        
2 min               10          1075        107.5       789.388888                        
4 min               10          1094        109.4       548.711111                        
8 min               10          931         93.1        500.544444                        


ANOVA
Source of Variation SS          df          MS          F           P-value     F crit    
Between Groups      7042.875    3           2347.625    4.29664106  0.01086496  2.86626544
Within Groups       19669.9     36          546.386111                                    

Total               26712.775   39

Finally, there is the relatively useless max frame rate measurement. These results behave much like the min frame rate measurements, except the general trend is of increasing max frame rates for longer match lengths. Performing an ANOVA shows that it is likely that the measurements have the same population average. However, given the fairly large confidence intervals and the fact that it is not a very useful statistic, there is not much to be gained from using the max frame rate results.

Code:
Anova: Single Factor - Nominal Death Match Length, Max Frame Rate

SUMMARY                                                                                   
Groups              Count       Sum         Average     Variance                          
1 min               10          3179        317.9       458.988888                        
2 min               10          3249        324.9       109.211111                        
4 min               10          3301        330.1       269.655555                        
8 min               10          3340        334         119.111111                        


ANOVA
Source of Variation SS          df          MS          F           P-value     F crit    
Between Groups      1455.27500  3           485.091666  2.02762200  0.12734480  2.86626544
Within Groups       8612.69999  36          239.241666                                    

Total               10067.9749  39


Analysis 3 - 2xAA - 4xAA Comparison Test Case

In order to test the legitimacy of the results above, 10 death matches of 4 minutes duration were recorded with 2xAA enabled. Comparing the results with the 4xAA, 4 minute results produced the graph below. Despite concluding that the min and max measurements were not accurate/useful enough, I have included them anyway. Why? Because I can.

Graph3.jpg


Doing yet another ANOVA on the average frame rates shows that it is extremely unlikely that both sets of data have the same mean. The P-value gives the probability of incorrectly rejecting the null hypothesis (the null hypothesis for this test is: average 2xAA = average 4xAA); in this case it is a very-easy-to-ignore 2.16e-10. We can be quite certain that there is a statistically significant difference between the measured average frame rates.

Code:
Anova: Single Factor - 2xAA/4xAA, Average Frame Rate

SUMMARY
Groups              Count       Sum         Average     Variance                          
2AA                 10          2323.734    232.3734    19.0323664                        
4AA                 10          2132.412    213.2412    3.86286395                        


ANOVA
Source of Variation SS          df          MS          F           P-value     F crit    
Between Groups      1830.20538  1           1830.20538  159.876564  2.16E-10    4.41386305
Within Groups       206.057073  18          11.4476152                                    

Total               2036.26245  19

[edit]Fixed incorrect Average Frame Rate ANOVA results[/edit]
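For a two-group comparison like this, a single-factor ANOVA is algebraically equivalent to a pooled two-sample t-test, with F = t^2. As a cross-check, here is a sketch of that t-test on the same average frame rate data:

```python
from math import sqrt
from statistics import mean, variance

# Average frame rates for the ten 4-minute death matches at each AA
# setting, transcribed from the results tables above.
aa2 = [233.026, 231.478, 225.215, 226.062, 238.636,
       237.988, 231.257, 231.786, 234.584, 233.702]
aa4 = [211.892, 215.098, 213.532, 209.601, 213.827,
       214.565, 213.258, 211.003, 213.389, 216.247]

n1, n2 = len(aa2), len(aa4)
# Pooled variance assumes both groups share a common population variance.
pooled = ((n1 - 1) * variance(aa2) + (n2 - 1) * variance(aa4)) / (n1 + n2 - 2)
t = (mean(aa2) - mean(aa4)) / sqrt(pooled * (1 / n1 + 1 / n2))

print(f"t = {t:.4f}, t^2 = {t * t:.2f}")
```

The resulting t of about 12.64 squares to roughly 159.88, matching the F value in the ANOVA table above.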

Checking out the min frame rates now, we see that even they manage to produce some useful results, with a paltry 0.3% chance of having the same average.

Code:
Anova: Single Factor - 2xAA/4xAA, Min Frame Rate

SUMMARY
Groups              Count       Sum         Average     Variance                          
2AA                 10          1379        137.9       146.322222                        
4AA                 10          1094        109.4       548.711111                        


ANOVA
Source of Variation SS          df          MS          F           P-value     F crit    
Between Groups      4061.25     1           4061.25     11.6864898  0.00306301  4.41386305
Within Groups       6255.29999  18          347.516666                                    

Total               10316.55    19


Doh! The max frame rates let us down. Or do they? It's quite possible that the max frame rates are more system limited than fill rate limited. Without more information we can only guess.

Code:
Anova: Single Factor - 2xAA/4xAA, Max Frame Rate

SUMMARY
Groups              Count       Sum         Average     Variance                          
2AA                 10          3441        344.1       251.433333                        
4AA                 10          3301        330.1       269.655555                        


ANOVA
Source of Variation SS          df          MS          F           P-value     F crit    
Between Groups      980         1           980         3.76135442  0.06828722  4.41386305
Within Groups       4689.79999  18          260.544444                                    

Total               5669.79999  19

Conclusion

In general, it seems that Fraps works very well as a benchmarking program. A fairly minor change from 2xAA to 4xAA produced some very conclusive results for the average frame rates. The statistical significance of the max and min frame rates is dubious and, in my opinion, they should be used with caution - or not at all. The sweet spot for minimizing benchmarking time and maximizing accuracy is probably around 10 - 20 samples with death match lengths of around 2 - 4 minutes.

The way the results are displayed should also be considered. In general, I think it is prudent to display the calculated confidence intervals with any graphs, as they give a very good visual indication of the ANOVA results.

It must also be remembered that only one level in one game was tested. It is probably quite dangerous to extrapolate the results obtained here to other games and levels. More testing is required to get a fuller picture of how games behave when using Fraps. Depending on interest, I may update this with extra levels at a later date. Ideally, we need someone with both a GeForce FX 5900 Ultra and a Radeon 9800 Pro to do some testing, then the truth will finally be revealed. :D
 
Interesting...not sure if I understand your methodology, though. Were you running the same pre-recorded timedemo over and over again, or were you simply playing each sample individually according to your time limits?

If you were playing each sample "fresh" then it's not surprising you'd see major deviations the fewer the samples, which would progress towards a mean the more samples you took...and it would seem your scores were simply averaging out because of the sheer number of repetitions you executed. In a small number of repetitions your individual play would deviate significantly, since you'd never be repeating the previous game's play exactly. These differences would naturally tend toward a mean the more samples you add until, inevitably, with enough samples (since you are constrained to the same software, system, and level) you'd hit a mean--which you seemed to do at 30. I'm not sure what this tells us about Fraps, though...;) Now, if you got the major deviations with Fraps running pre-recorded timedemos 10 times, then I think you could build a good case for Fraps being totally inaccurate in the absence of averaging out the results over dozens of repetitions. Heh...;) But maybe I misunderstand your point...

Out of curiosity, I'd like to see someone compare Fraps averages to the averages of a game's inbuilt frame counter, such as found in UT2K3. If your hands haven't expired by now out of sheer activity, maybe you could look into it...;)
 
Were you running the same pre-recorded timedemo over and over again, or were you simply playing each sample individually according to your time limits?

I was playing a bog standard deathmatch with a time limit instead of a frag limit.

If you were playing each sample "fresh" then it's not surprising you'd see major deviations the fewer the samples, which would progress towards a mean the more samples you took...and it would seem your scores were simply averaging out because of the sheer number of repetitions you executed. In a small number of repetitions your individual play would deviate significantly, since you'd never be repeating the previous game's play exactly. These differences would naturally tend toward a mean the more samples you add until, inevitably, with enough samples (since you are constrained to the same software, system, and level) you'd hit a mean--which you seemed to do at 30. I'm not sure what this tells us about Fraps, though... Now, if you got the major deviations with Fraps running pre-recorded timedemos 10 times, then I think you could build a good case for Fraps being totally inaccurate in the absence of averaging out the results over dozens of repetitions. Heh... But maybe I misunderstand your point...

The point was to find out what sort of accuracy could be obtained by using Fraps. For instance, the comparison of 2xAA and 4xAA shows an overwhelming probability that the means of the average framerates are different. If the standard deviation were too high, it would be difficult to conclude whether the results were significant or not, which is the case for the maximum frame rates.
Clearly an infinite number of samples would provide the highest accuracy. Though in the real world, it seems that 10 samples may well be good enough. Quite a reduction, if I do say so myself. ;)

Out of curiosity, I'd like to see someone compare Fraps averages to the averages of a game's inbuilt frame counter, such as found in UT2K3. If your hands haven't expired by now out of sheer activity, maybe you could look into it...

Sheesh, you're a slave driver, Walt. Unfortunately, I don't have a copy of UT2003.
 
So, let me try to understand the results of your Fraps experiment here. What you are saying is that by playing a game - any game, on the same map or taking the same routes or whatever - about 10 times and using Fraps framerate averages, you can get a fairly accurate average framerate for that game. If I play Vice City, for example, leave my hideout, drive around roughly the same area of the city causing various random bits of mayhem, then restart and repeat about 10 times - that would be a good indication of framerate for that game, and if I take roughly the same in-game route on various systems, I can get perfectly comparable results?
 
Ratchet said:
So, let me try to understand the results of your Fraps experiment here. What you are saying is that by playing a game - any game, on the same map or taking the same routes or whatever - about 10 times and using Fraps framerate averages, you can get a fairly accurate average framerate for that game. If I play Vice City, for example, leave my hideout, drive around roughly the same area of the city causing various random bits of mayhem, then restart and repeat about 10 times - that would be a good indication of framerate for that game, and if I take roughly the same in-game route on various systems, I can get perfectly comparable results?

That's pretty much it. It's a little bit more complicated than that, though. Once you have your 10 results, you should find the mean and the standard deviation. This will allow you to calculate a confidence interval. A confidence interval basically gives the range of values that the POPULATION mean (i.e. the average framerate obtained from driving around for an infinitely long time) will be in. For example, a 95% confidence interval of 3fps and a mean of 100fps means that there is a 95% chance that the population mean will be between 97fps (100-3) and 103fps (100+3).

This explains why we have to do multiple runs, and not one really long run. The more samples you take, the more accurate the value for the standard deviation gets. It's actually impossible to get a standard deviation for a single run, so there is no way of telling how close the result is to the population mean.

A problem also occurs if you try to do lots and lots of really short runs. In this case, it's possible to get a large variation between runs due to variations in scene complexity, etc. This makes the standard deviation larger, which increases the size of the confidence interval.

There is an optimum point somewhere in the middle, using a reasonable number of samples of a reasonable length. For Quake3 on map Q3DM4 this is about 10-20 samples of 2-4 minutes.
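To put rough numbers on that trade-off: the interval half-width shrinks with the square root of the number of runs, so doubling the run count only tightens the interval by about 30%. A quick sketch, assuming a z-based 95% interval and the roughly 8 fps standard deviation seen in the 1-minute results:

```python
from math import sqrt

z = 1.96  # two-sided 95% critical value (normal approximation)
s = 8.0   # assumed standard deviation in fps (ballpark from the 1-minute averages)

for n in (5, 10, 20, 40):
    ci = z * s / sqrt(n)  # half-width of the 95% confidence interval
    print(f"{n:2d} runs: 95% CI = +/-{ci:.2f} fps")
```

Of course this assumes the standard deviation estimate is already trustworthy, which, as the sample size analysis showed, it may not be at small sample counts.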
 
I don't understand the obsession with using 8x AF on ATI cards when benchmarking. I think we could probably count on one hand the number of gamers who use a setting besides 16x AF or application AF.

Certainly an interesting benchmarking take, though. An advantage of the method not stated in your conclusion is that the reviewer not only collects FPS information, but also gets a sample of how the game looks and, more importantly, feels while using card X.
 
Quitch said:
I don't understand the obsession with using 8x AF on ATI cards when benchmarking. I think we could probably count on one hand the number of gamers who use a setting besides 16x AF or application AF.
How about all those gamers who have NVIDIA cards then?
 
The problem is when people limit ATI cards to 8x AF because Nvidia cards can't do more, and then claim to be comparing the best IQ.
 
Quitch said:
I don't understand the obsession with using 8x AF on ATI cards when benchmarking. I think we could probably count on one hand the number of gamers who use a setting besides 16x AF or application AF.

Um, that's not really the point. I could have run it at any settings at all, as long as they were consistently applied. The key point is that when a fairly minor setting is changed (i.e. 4xAA to 2xAA), there is a statistically significant change in the mean frame rate. I could have changed the resolution, AF setting, or even an application setting - I chose to change AA.

This is NOT a video card test. It is testing the usefulness of Fraps as a benchmarking tool. I did it because Fraps is becoming quite widely used, yet people don't seem to understand exactly what it is they are measuring with Fraps. Have you ever seen a review that even mentions the standard deviations of the results obtained? Admittedly, most readers couldn't care less about confidence intervals - that's fine. However, the reviewer should have a good idea of the accuracy of the data they present. For highly repeatable benchmarking utilities like 3DMark, doing a couple of runs then averaging them is good enough. But if you use Fraps to record the frame rates of a single deathmatch then present the information as a consistent result, you are not providing an accurate picture of the data you have obtained.
 
Nathan said:
This is NOT a video card test. It is testing the usefulness of Fraps as a benchmarking tool. I did it because Fraps is becoming quite widely used, yet people don't seem to understand exactly what it is they are measuring with Fraps. Have you ever seen a review that even mentions the standard deviations of the results obtained? Admittedly, most readers couldn't care less about confidence intervals - that's fine. However, the reviewer should have a good idea of the accuracy of the data they present. For highly repeatable benchmarking utilities like 3DMark, doing a couple of runs then averaging them is good enough. But if you use Fraps to record the frame rates of a single deathmatch then present the information as a consistent result, you are not providing an accurate picture of the data you have obtained.

Wipes a tear from my eye.
There is still hope.

Entropy
 
Nathan said:
I was playing a bog standard deathmatch with a time limit instead of a frag limit.

....

Sheesh, you're a slave driver, Walt. Unfortunately, I don't have a copy of UT2003.

Heh...;) Sorry about that--but I thought I'd ask...;)

I guess my only point here is that the fact that your samples are not identical would tend to undercut your statistical conclusions about Fraps, that's all. By "not identical" I mean that you aren't doing exactly the same thing in each compared sample - each repetition may be similar to the one before, but it can never be exact, so the frame rates will differ between repetitions simply because you aren't doing exact repetitions.

My opinion would be that if you used a pre-recorded timedemo for every sample with Fraps (which you could construct to meet your time limits), that you'd hit a statistical mean in frame rates much faster than you would using your current method. You might see that you could hit a norm in 2-3 samples as opposed to 10, or certainly 30.

But perhaps I'm not looking at it as you intended...;) That's what I'm interested in you explaining. (Not trying to be picky...)
 
Out of curiosity, I'd like to see someone compare Fraps averages to the averages of a game's inbuilt frame counter, such as found in UT2K3. If your hands haven't expired by now out of sheer activity, maybe you could look into it...
I've often had FRAPS running while using an in-game framerate counter (QIII, UT2K3, 3dmark, Tenebrae, Elite ForceII) and they are pretty much identical - they update at slightly different intervals, so the counters are w/in 1 fps of each other.
I guess my only point here is that the fact that your samples are not identical would tend to undercut your statistical conclusions about Fraps, that's all
No, that's not correct. The whole point of needing statistical analysis in the first place is that the samples are not identical.
What Nathan has done (very nice work, btw :)), is show the viability of using FRAPS to benchmark games that do not have any recorded timedemo (or those whose recorded timedemos have become suspect due to their being targets for excessive "optimisations") - just fire up a set of standard gameplay conditions, and then perform some stats analysis to determine confidence intervals. Sure, results will not be completely repeatable, but they will give valid benchmarks w/ a high rate of confidence.

The real problem is, of course, the large amount of work this would mean for a reviewer. Nevertheless, it is a viable possibility for certain less-benchmarked games.
 
WaltC said:
My opinion would be that if you used a pre-recorded timedemo for every sample with Fraps (which you could construct to meet your time limits), that you'd hit a statistical mean in frame rates much faster than you would using your current method. You might see that you could hit a norm in 2-3 samples as opposed to 10, or certainly 30.

I agree completely. The method used by Neeyik in his latest review (sorry about giving credit to Rev instead of you, Neeyik :oops: ) does just that and is very well done.
 
Nathan said:
Quitch said:
I don't understand the obsession with using 8x AF on ATI cards when benchmarking. I think we could probably count on one hand the number of gamers who use a setting besides 16x AF or application AF.

Um, that's not really the point. I could have run it at any settings at all, as long as they are consistently applied. The key point is that when a fairly minor setting is changed (i.e. 4xAA to 2xAA), there is a statistically significant change in the mean frame rate. I could have changed the resolution, the AF setting, or even an application setting - I chose to change AA.

This is NOT a video card test. It is testing the usefulness of Fraps as a benchmarking tool. I did it because Fraps is becoming quite widely used, yet people don't seem to understand exactly what it is they are measuring with Fraps. Have you ever seen a review that even mentions the standard deviations of the results obtained? Admittedly, most readers couldn't care less about confidence intervals - that's fine. However, the reviewer should have a good idea of the accuracy of the data they present. For highly repeatable benchmarking utilities like 3DMark, doing a couple of runs and then averaging them is good enough. But if you use Fraps to record the frame rates of a single deathmatch and then present the information as a consistent result, you are not providing an accurate picture of the data you have obtained.

You are right of course, my apologies. Just bringing a grudge into this topic I guess :)
 
WaltC said:
My opinion would be that if you used a pre-recorded timedemo for every sample with Fraps (which you could construct to meet your time limits), that you'd hit a statistical mean in frame rates much faster than you would using your current method.

Of course we know that there is negligible variation with recorded timedemos compared to random scenes. Which is exactly why we need statistical techniques to find out how we can arrive at reliable results using random scenes.

There are some good reasons for using bot deathmatches (or randomised flybys) instead of recorded timedemos. First, they are immune to the much-publicised "on-rails" cheats found in 3DMark03, and possibly in some widely used, popular benchmarking timedemos for actual games. Secondly, they might be more representative samples of graphics card performance, because they expose the card (and driver) to a wider selection of scenes than a short timedemo that is always identical.
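For anyone wanting to try the "random scenes plus statistics" approach, the significance test involved can be sketched in pure standard-library Python (a sketch only - Welch's two-sample t statistic, with invented fps samples standing in for two AA settings):

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two
    independent samples (e.g. average fps at 2xAA vs 4xAA).

    Compare |t| against the Student-t critical value for df degrees of
    freedom to decide whether the difference in means is statistically
    significant.
    """
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb                    # squared standard error of the difference
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Invented example: ten deathmatch runs each at two AA settings
fps_4xaa = [216.2, 210.5, 221.8, 208.9, 215.3, 219.1, 212.7, 217.4, 214.0, 211.6]
fps_2xaa = [231.4, 226.0, 235.9, 224.1, 229.8, 233.2, 227.5, 230.6, 228.3, 225.7]
t, df = welch_t(fps_2xaa, fps_4xaa)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large |t| relative to the critical value says the gap between the two settings is real rather than run-to-run noise, which is exactly the question a reviewer using random gameplay samples needs to answer.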
 
Has anyone made a functional yet remarkably stupid bot that takes negligible amounts of CPU power?
 
WaltC said:
Out of curiosity, I'd like to see someone compare Fraps averages to the averages of a game's inbuilt frame counter, such as found in UT2K3.

I did that ages ago, and posted the results on here, I think. Need to search; they were close enough for me :)

BTW nice detailed work Nathan.

edit:

Yep, here it is: http://www.beyond3d.com/forum/viewtopic.php?t=3056&postdays=0&postorder=asc&start=80 Note the average is very close, though the min and max aren't. I basically started & stopped Fraps just after the start and just before the end of the benchmark.
 