New way of presenting benchmark results possible?

PP

I'm preparing a product review for my site (in Spanish), and I was thinking about a new way of presenting benchmark results. We have averages, max and min values. IMO, the max and min FPS that some sites show in their reviews are completely useless. Graphs of FPS logs like those at HardOCP or PCPers don't give you precise data, just the occasional headache when you try to work out which FPS averages are more consistent.

I did some experimenting with Excel and a FRAPS FPS log, and made this (some figures aren't real, because I only have a 6800 GT right now):

[image: gpumania31fc.png]


The top chart is an FPS average; the bottom chart is a "new" (at least I haven't seen it before) average FPS fall below a given FPS point (<60, <40, etc.). This is a way of showing how solid the FPS are. Do you think I should use it?
 
Well, FRAPS gives an FPS figure every second, right? All values that fall below X are taken (FPS value minus X, so the result is negative) and an average is made of those.
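
In case it helps, here's a minimal sketch of that calculation in Python, assuming a plain-text log with one FPS value per line; the file name and the 60 FPS threshold are just examples:

# Average "framerate fall" below a threshold, as described above.
# Assumes one FPS sample per line in the log; file name and threshold are examples.

def framerate_fall(fps_values, threshold=60.0):
    """Average of (fps - threshold) over the samples that fall below the threshold."""
    falls = [fps - threshold for fps in fps_values if fps < threshold]
    if not falls:
        return 0.0  # the framerate never dropped below the threshold
    return sum(falls) / len(falls)

with open("fraps_fps_log.csv") as f:
    samples = [float(line) for line in f if line.strip()]

print(framerate_fall(samples, threshold=60.0))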
 
PP said:
Well, FRAPS gives an FPS figure every second, right? All values that fall below X are taken (FPS value minus X, so the result is negative) and an average is made of those.

Sounds mathematically incorrect.
The average of averages means nothing in maths, unless there are as many samples in each of the averages being 'averaged'.

A more correct thing to do would be the standard deviation; it only tells you about dispersal around the average framerate, but that makes it very useful for seeing how 'stable' a framerate is.

At the very least you can correct your average formula to take into account the number of samples per 'average' (i.e. compute it with all the values instead of the FRAPS averages).
 
Now that I think about it, why not show:
-min FPS (bigger is better)
-max FPS (bigger is better)
-average FPS (bigger is better)
-standard deviation (smaller is better)

on the same graph.
With a good scale it should look fine; that means 4 values per column, but it should be very informative.
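
For what it's worth, a rough sketch of pulling those four numbers out of a per-second FPS log (the file name is hypothetical, and this uses the population standard deviation):

import math

# Assumes one FPS sample per line in the FRAPS log; the file name is made up.
with open("fraps_fps_log.csv") as f:
    fps = [float(line) for line in f if line.strip()]

avg = sum(fps) / len(fps)
std_dev = math.sqrt(sum((x - avg) ** 2 for x in fps) / len(fps))  # population std dev

print(f"min {min(fps):.1f}  max {max(fps):.1f}  avg {avg:.1f}  std dev {std_dev:.1f}")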
 
IMO max FPS means nothing to a gamer; it isn't statistical data, just a curiosity. Same for min. The average framerate itself isn't valid for another reason: an average can be made of high and low FPS, so we can't tell how stable the framerates are. I speak from a gamer's standpoint, of course, and that's why I think the average framerate fall helps more than anything.

The problem is that I can't compute with all the values instead of the FRAPS averages, because I need a fixed input (60, 120, 1000 samples, etc.) to calculate the whole thing. Even if my method isn't completely accurate, I think it's valid because the possible deviation will be the same for all cards, won't it?
 
PP said:
IMO max FPS means nothing to a gamer; it isn't statistical data, just a curiosity. Same for min.
Well, min FPS is clearly meaningful data, since it tells you how bad things get at worst.
Max FPS is its natural counterpart; although you might not need it, I would include it so that you can see whether your average is closer to the worst or the best.

PP said:
The average framerate itself isn't valid for another reason: an average can be made of high and low FPS, so we can't tell how stable the framerates are. I speak from a gamer's standpoint, of course, and that's why I think the average framerate fall helps more than anything.
Well, the average framerate gives you an idea of where you sit between min and max; if it's closer to the min than to the max it's not too good, because it means you have more low values than high ones.
As for framerate stability, that's EXACTLY what standard deviation tells you: it tells you how much the framerate deviates from the average framerate, on average.

I disagree; while framerate fall is an interesting idea, you choose the threshold, which is not universal: some users will ask for 30 FPS, others for 60 FPS...
It tells you how often things went bad (under the threshold).

PP said:
The problem is that I can't compute with all the values instead of the FRAPS averages, because I need a fixed input (60, 120, 1000 samples, etc.) to calculate the whole thing. Even if my method isn't completely accurate, I think it's valid because the possible deviation will be the same for all cards, won't it?

It would be better to have the frame 'duration' or frame timestamp for a number of samples and then work on that data; that would be 'pure' data, as opposed to the already 'refined' data that FRAPS is giving you.
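
A small sketch of what working on that 'pure' data might look like, assuming you could get per-frame timestamps in milliseconds (the numbers here are made up):

# From per-frame timestamps (ms) to per-frame durations and instantaneous FPS.
# The timestamps below are invented for illustration.

timestamps_ms = [0.0, 16.5, 33.1, 51.0, 90.2, 107.0]

durations_ms = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
instant_fps = [1000.0 / d for d in durations_ms]

print(durations_ms)  # about [16.5, 16.6, 17.9, 39.2, 16.8] - one long frame sticks out
print(instant_fps)   # roughly [60.6, 60.2, 55.9, 25.5, 59.5]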

Looking at the 4th column carefully, I wonder how you got a 15 FPS framerate fall for the card running @ 30 FPS on average with a threshold set @ 60 FPS.

----------

The boxplot is basically what I was thinking of (although I was unaware it existed), except it's missing the standard deviation, which tells you how steady your framerate is.
 
Well, the average framerate gives you an idea of where you sit between min and max; if it's closer to the min than to the max it's not too good, because it means you have more low values than high ones.
As for framerate stability, that's EXACTLY what standard deviation tells you: it tells you how much the framerate deviates from the average framerate, on average.

OK, I see your point. I agree that standard deviation is another useful way of extracting figures. But tell me if I'm wrong, because maths is not my strong point: you are taking into account FPS both above and below the average, right? My method only takes into account FPS below a given point (which couldn't be the average, because the other cards' results at the same resolution will each have their own averages), precisely because the interesting thing is what falls below, not everything else. Did I explain that correctly? (Sorry for my English.)

The threshold could be a problem, but most people will agree that for high-end graphics cards it should be <60, except at some resolutions or in games with a low FPS average, where <40 or <24 could be useful. That could be adapted to other situations, except in a large comparison chart where cards from all segments are mixed.
 
Ingenu said:
The boxplot is basically what I was thinking of (although I was unaware it existed), except it's missing the standard deviation, which tells you how steady your framerate is.
The inter-quartile range (the 'box') serves a similar purpose, though it isn't quite as descriptive.
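
For reference, a small sketch of the numbers a boxplot is built from (the quartile convention here is simple linear interpolation; a stats package may differ slightly):

# Five-number summary behind a boxplot; 'fps' stands in for the per-second samples.

def percentile(sorted_values, p):
    """Percentile by simple linear interpolation, 0 <= p <= 100."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def five_number_summary(fps):
    s = sorted(fps)
    q1, median, q3 = (percentile(s, p) for p in (25, 50, 75))
    return min(s), q1, median, q3, max(s)

fps = [62, 61, 58, 60, 63, 40, 59, 61, 62, 60]  # made-up samples
mn, q1, median, q3, mx = five_number_summary(fps)
print(mn, q1, median, q3, mx)  # 40 59.25 60.5 61.75 63
print(q3 - q1)                 # the inter-quartile range (the height of the 'box')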
 
IMO, min and max FPS are only useful information when they accompany an average framerate. That at least gives the reader some extra information with which to theorize how wide the framerate variance around the average may be (best and worst). Even this, though, can be misleading.

In all actuality, full timescale histograms would better depict how a videocard truly performs. Many benchmark "averages" (even if accompanied by min/max fps) may give artificially suggestive results if a particular driver revision runs 80% of the benchmark like dirt, but then spins at insanely high peak framerates for the final 20%.

"Peaky" or "bursty" framerates versus more consistent, solid framerates really don't buy most consumers what they are looking for, so I think it's a bold (and much needed) step to start at least discussing this need for hardware reviews. Even if only one or two sources adopt this, it would be great cross-reference material to compare with other reviews that simply post (possibly misleading) average framerates.
 
PP said:
Well, FRAPS gives an FPS figure every second, right? All values that fall below X are taken (FPS value minus X, so the result is negative) and an average is made of those.
So if X is 60, and you get something like 61, 61, 61, ..., 61, 40, 61, 61, ..., you would get -20 because 40 is the only value below 60?
Now imagine something like 59, 63, 59, 63, ..., 40.

Why not have a probability curve with average, median and standard deviation?
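
To make that concrete, a quick sketch of how the fall average treats those two (made-up) runs with the threshold at 60:

# Both runs are invented; the threshold is 60 FPS, as in the example above.

def framerate_fall(fps_values, threshold=60.0):
    falls = [fps - threshold for fps in fps_values if fps < threshold]
    return sum(falls) / len(falls) if falls else 0.0

run_a = [61] * 59 + [40]          # steady at 61 with a single dip to 40
run_b = [59, 63] * 29 + [59, 40]  # constant jitter around 60, plus the same dip

print(framerate_fall(run_a))  # -20.0: only the single dip counts
print(framerate_fall(run_b))  # about -1.6: the dip is diluted by the many 59s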
 
PP, take a careful look at the Boxplot and explanation Fodder linked, and think up some ways to graphically present it. I think you will find it efficiently conveys the information you are trying to get across, in a graphically expressive way.
 
My first inclination was Ingenu's idea.

The boxplot is an intriguing compromise. Accommodating outliers is an especially nice touch, as I'm never sure whether min framerates are isolated instances or indicative of repeated lows.

I also thought Halo's built-in benchmark reporter, which basically spat out histograms, was a good idea. Similarly, you could change your graph to show the percentage of total frames that fall below a set framerate (rather than their average drop). It's not as detailed as a histogram, but you're looking for a compromise between detail and simplicity (both in presentation and in screen space).
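
A sketch of that 'percentage below a set framerate' idea, again from a per-second FPS log (the threshold and data are just examples):

# Percentage of samples below a chosen framerate, rather than their average drop.

def percent_below(fps_values, threshold=60.0):
    below = sum(1 for fps in fps_values if fps < threshold)
    return 100.0 * below / len(fps_values)

fps = [62, 61, 58, 60, 63, 40, 59, 61, 62, 60]  # made-up samples
print(f"{percent_below(fps, 60.0):.0f}% of the run was under 60 FPS")  # 30%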

HOCP's graphs are probably the most obvious, but they're also (obviously) cumbersome in that you're restricted to a single res per graph (and thus either more screen space or fewer tests).

I wonder if B3D's fillrate charts could accommodate at least the box part of the boxplot without becoming illegible.
 
Max/min figures, even when quoted with average values, are fairly meaningless unless repeated consistently over multiple benchtests (with the testing procedure fully disclosed) - otherwise, one is left to judge whether the max/min are simply testing glitches producing anomalous results. Therefore, it would seem logical to incorporate histograms or boxplots in reviews.

However, I would say that sometimes too much information can be a bad thing - there are so many variables behind the benchmark results one sees that the results cannot be taken as anything more than an overview of the product's performance; if one attempts to explore and account for every variable within a review, one is bound to reduce the reader's interest to the point where they will go somewhere else.

Timedemos are not truly indicative of real-time gameplay anyway so the benchmark results should not be used to give impressions as to how a product will produce actual max/min/SD results.
 
I think what's most important is what you want to reveal in your benchmarks, which means you need to be careful about the kind of demo you use in a benchmark.
 
Neeyik said:
Timedemos are not truly indicative of real-time gameplay anyway so the benchmark results should not be used to give impressions as to how a product will produce actual max/min/SD results.

Exactly; and as you mentioned, the min FPS is usually due to some unpredictable disk page access or a spike in CPU utilisation by some rogue Windows service.
 
Time series are nice as they give all the info, but it could be too much info.

If there is no dependence between successive framerates then an empirical probability distribution (histogram) gives the same info.

But there is dependence in the time series... maybe an autocorrelation plot would be interesting, just to check whether there's any noticeable difference between cards (although likely not).

It would be interesting if you computed the empirical probability distribution and then averaged a loss function over it. For example, suppose anything greater than 50 is worth 100, 40-50 is worth 60, etc.

Of course, the problem might be that everyone has their own loss function (but you could use something reasonably generic that most would agree with and go ahead with it).

But if you like you could get the empirical probability distributions and then have a web application that asks the viewer to provide a simple personal loss function. Then you could produce personalized card comparisons on the fly!
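
A sketch of that idea; averaging the loss (here a 'score') over the empirical distribution is the same as averaging it over the samples, so it stays simple. The bands and values below are just the made-up example from this post:

# Score each FPS sample with a simple utility/loss function and average it.
# Bands and scores follow the example above (>50 worth 100, 40-50 worth 60, ...);
# the score for anything under 40 and the sample data are invented.

def score(fps):
    if fps > 50:
        return 100
    elif fps >= 40:
        return 60
    else:
        return 20

samples = [62, 61, 58, 60, 63, 40, 59, 61, 62, 30]  # made-up per-second FPS log
overall = sum(score(s) for s in samples) / len(samples)
print(overall)  # 88.0 for this run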
 