What sort of information do you seek in a game benchmark?

Reverend

I know what I would like to see from the POV of a hardware reviewer and programmer, but that isn't necessarily the same as what "the public" seeks when reading a hardware (mainly video card) review... after my years of experience as a hardware reviewer, I am no longer sure what the public "wants", as these "wants" appear to demand ever more information as time goes by (Beyond3D, of course, is a reason for this :) ).

So, besides the usual stuff like :

- Average framerate (during a "timedemo" of a, well, demo)
- Minimum framerate (same)
- Maximum framerate (same) <-- pretty useless this one, yes?

... what other stuff would you, the Beyond3D crowd specifically, like to see in a game benchmark? Go on... you can say anything you want... like texture details (average, min, and max texture counts over the course of the timedemo... or texture usage over time... or whatever), polygon stats, index buffers (since a game may have many objects), SW-vs-HW vertex processing, etc. etc. (so many things!!!).

I like to bug the hell out of (many) developers about this kind of stuff (and can pay a price for doing so)... and I will most probably continue to do so as more and more development houses sprout up all over the game development industry. I already have the list of stuff that I would like to see reported in a game benchmark (instead of a single average-framerate figure), but I'm not sure my "wants" are sufficient, or the same as yours, since you guys undoubtedly need more information from a game benchmark than most "regular" hardware-review websites that I know.

PS. I have suggested, and am still suggesting, stuff to include in an eventual "patch" release for a newish DX8.1 game by a certain well-known developer; the patch will incorporate a benchmark mode (not available officially, and publicly, before this upcoming patch). Some, but not all, of my suggestions will be included (some can't be done with the release of this existing game due to either engine limitations or priorities regarding the developer's next game). This is the reason and basis for this post of mine, and I will alert this particular developer (and many others) to this thread.
 
A histogram that charts not only framerate, but texture load (memory), vertices, texture passes (average, low, and peak would be nice), and shader ops would be very nice IMO. If we could also get some CPU usage stats thrown in, sampled at the same rate, I think it would be near perfect.

That would allow us to see not only which particular things a card has problems with, but also which combination of rasterizing effects a board does or does not have trouble with.
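Something like this rough sketch is what I have in mind for the log itself. The metric names and numbers here are just made-up placeholders; a real benchmark mode would have to pull the actual counters from the engine:

```python
# Sketch of a per-frame metrics log, one row per frame, all sampled together.
# fake_frame_sample() is a hypothetical stand-in for whatever counters a real
# benchmark mode would expose; it just returns synthetic numbers here.
import csv
import random

def fake_frame_sample(frame):
    return {
        "frame": frame,
        "frame_time_ms": random.uniform(10, 40),
        "texture_mem_mb": random.uniform(40, 120),
        "vertices": random.randint(20_000, 150_000),
        "texture_passes": random.randint(1, 4),
        "shader_ops": random.randint(1_000, 20_000),
        "cpu_usage_pct": random.uniform(20, 90),
    }

with open("frame_metrics.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fake_frame_sample(0).keys())
    writer.writeheader()
    for frame in range(1000):
        writer.writerow(fake_frame_sample(frame))
```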
 
Speaking as part of the public: A framerate graph.

As data points, maximum framerate is useless; average framerate gives consistent relative values, and that is useful, but the number itself is less interesting. Minimum framerate is interesting, but is prone to artifacts and can be caused by atypical circumstances.

A framerate graph solves this.

The reason is the classic one: everyone who has played an action game has run along smoothly only to wind up in the thick of battle, where the framerate gets choppy at the very moment you have to dodge and spin around to take out an enemy. Nobody is interested in what framerates you get when running along a corridor, nor in their contribution to the average. The problem has been addressed by making demos that are basically all thick fighting and using those for benchmarking. Useful, but primitive.

And as part of the public, a framerate graph is all I want.
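As a rough sketch of what I mean, assuming the benchmark simply logs per-frame times (the data below is synthetic, standing in for a real log):

```python
# Sketch: turn a per-frame time log into a framerate-over-time graph.
import random
from itertools import accumulate
import matplotlib.pyplot as plt

frame_times_ms = [random.uniform(10, 50) for _ in range(2000)]   # placeholder data

fps = [1000.0 / t for t in frame_times_ms]                       # instantaneous fps
elapsed_s = [t / 1000.0 for t in accumulate(frame_times_ms)]     # demo time per frame

plt.plot(elapsed_s, fps, linewidth=0.8)
plt.xlabel("Demo time (s)")
plt.ylabel("Frames per second")
plt.title("Framerate over the timedemo")
plt.show()
```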

BenSkywalker is more ambitious, but I feel that there are differences between the gaming public and the ultrageeks, in that the public is primarily interested in the results, and not the underlying mechanisms.

Entropy
 
Hmm...reminds me of a UT2k3 discussion, and so does my guess for what game you are talking about...

Anyways:

With frame-based rendering, time to completion of the rendering workload seems a suitable data point, which could be related to figures on triangle count sent and shader instruction counts (both vertex and fragment) in use during that time. I.e., more CPU-style metrics where they make sense.

Image quality tests would be neat too, if some way to express deviation from expected results could be done efficiently, but I may be dreaming. Being able to specify a frame to snapshot for a frame based rendering run might be sufficient.

Accurately representing the proportion of each frame spent on CPU game calculation, sound, and graphics would be nice... and maybe some thought to expressing "idle" time (with bus transfer time isolated if possible).
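Something like this crude per-frame accounting is the idea; update_game/update_sound/render_frame are hypothetical stand-ins for whatever the engine actually calls, and the sleeps just make the sketch runnable:

```python
# Sketch of per-frame time accounting across subsystems plus leftover "idle" time.
import time

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def update_game():  time.sleep(0.004)   # hypothetical subsystem stand-ins
def update_sound(): time.sleep(0.001)
def render_frame(): time.sleep(0.008)

TARGET_FRAME = 1 / 60                   # assumed 60 Hz frame budget
for frame in range(10):
    start = time.perf_counter()
    game_t = timed(update_game)
    sound_t = timed(update_sound)
    render_t = timed(render_frame)
    total = time.perf_counter() - start
    idle_t = max(0.0, TARGET_FRAME - total)   # time left before the next frame is due
    print(f"frame {frame}: game {game_t*1000:.1f} ms, sound {sound_t*1000:.1f} ms, "
          f"gfx {render_t*1000:.1f} ms, idle {idle_t*1000:.1f} ms")
```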
 
Frame time logging would give us everything we need when coupled with even rudimentary analysis tools, like Excel.
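A rough sketch of the idea, assuming the game just dumps per-frame times (synthetic numbers here) to a CSV that Excel can chew on:

```python
# Sketch: log per-frame times to a CSV (openable in Excel), then print the same
# quick summary Excel would give you. frame_times_ms is placeholder data.
import csv
import random
import statistics

frame_times_ms = [random.uniform(12, 45) for _ in range(1800)]

with open("frame_times.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "frame_time_ms"])
    writer.writerows(enumerate(frame_times_ms))

fps = [1000.0 / t for t in frame_times_ms]
print(f"min {min(fps):.1f} fps, max {max(fps):.1f} fps, avg {statistics.mean(fps):.1f} fps")
```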
 
RussSchultz said:
Frame time logging would give us everything we need when coupled with even rudimentary analysis tools, like Excel.
Sure, but I thought the question was related to ways to present the data, and specifying which additional data for the frame could be useful, not just the base functionality...

Since my guess for the application he is mentioning is Splinter Cell, which is based on the Unreal engine, and UT 2k3 already has what you mention, it didn't seem important to emphasize it: it would have to be done to do what I said in any case, and it seems easy to implement in the base engine. Of course, that assumption could be wrong depending on how my guess fares.
 
Would this game happen to be Splinter Cell?

Anyhow, minimum, maximum, and the average would be good.

Have each frame in the test numbered so that the problems of comparing IQ will be gone... and have the ability to go back to any frame at will implemented.
 
I'll echo the frame rate graph request. With one graph we can see the max, min, and avg frame rate.
 
An average frame rate, with an fps cap (set to the required quality), across different timedemos of the same game.
 
If they don't want to save the data for a frame rate graph, the standard deviation gives the same sort of info.
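A quick sketch of what I mean, with made-up fps numbers standing in for the real log:

```python
# Sketch: summarizing framerate variability without storing a full graph.
import random
import statistics

fps = [random.gauss(60, 15) for _ in range(2000)]   # synthetic placeholder data

print(f"avg {statistics.mean(fps):.1f} fps, std dev {statistics.stdev(fps):.1f} fps")
# A low standard deviation means a steady framerate; a high one means the average
# is hiding big swings, which is exactly what a framerate graph would show visually.
```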
 
As other people have mentioned, histograms, standard deviations, and standard errors would all be extremely helpful. Some method for comparing image quality to a control image would be excellent too.

A bit more unlikely, I'd like to know for each frame how much bandwidth is being used, how many triangles are generated, what shaders are being used and how quickly they are being executed, and any other information that could help explain the current framerate.

Nite_Hawk
 
If there is a benchmark demo the ability to specify a point in the demo at which to take a screenshot and have that be repeatable at different frame rates would be nice.
 
I'd like to see image comparisons where the reader doesn't know which tested card each shot is from and votes for what he/she feels is best.
There should be no option to preview the results until the vote has either reached a certain number or ends on a certain date.
 
I think the histogram idea would be the most useful from a gameplay perspective. Even though this would be simple to derive from the same data used to generate a frame rate graph, I think it would more clearly convey the gameplay quality of a given setup.

The problem with graphs is that they are often difficult to interpret at a glance. What you really want to know is, how smooth will my gameplay be on average? Will it be smooth most of the time? Will it be choppy occasionally? Will it stutter?

My suggestion is as follows - generate a histogram with the following four buckets:

Ideal - fps >= display refresh
Good - fps >= 30 fps (perhaps up this to 60 fps for fast shooters)
Acceptable - fps >= 10-25 fps
Unacceptable - fps < acceptable threshold

Now for a given period of time (e.g. 60 seconds), you determine what percentage of the period is spent in each bucket. For example, if you took 30ms to render every frame during that minute, you would have 0% for ideal, 100% for good, and 0% for acceptable/unacceptable. On the other hand, if you had 50 frames with 60ms render times and 2 frames with 1000ms render times during the minute (with the remaining 55 seconds rendered at good frame rates), then you would be spending 50x60ms = 3 sec in the acceptable category and 2x1000ms = 2 sec in the unacceptable category. Your histogram would then show 0% ideal, 92% good, 5% acceptable, and 3% unacceptable. Note that this method weights longer update times proportionally higher than a standard frame-count histogram would, which I think makes sense because longer delays are that much more obvious during gameplay.

This simple chart would give you a lot of information on the playability of a game at a quick glance. Presumably you'd want to spend most of your time in the good/ideal range, with occasional drops into the acceptable range, and no time in the unacceptable category. This method could be applied over the length of a time demo, or displayed as a moving average during actual gameplay. If the bars aren't where you want them, you could adjust your game settings until you get something you are happy with. Slightly more sophisticated users could play around with the thresholds to match their own preferences.
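Here is a rough sketch of the bucketing, with synthetic frame times and thresholds that are just my assumptions (tweak them to taste):

```python
# Sketch of the time-weighted buckets described above: each frame contributes
# its render time (not a count of 1) to the bucket its fps falls into.
import random

REFRESH_HZ = 85        # assumed display refresh for the "ideal" bucket
GOOD_FPS = 30
ACCEPTABLE_FPS = 20    # pick anywhere in the suggested 10-25 fps range

frame_times_ms = [random.uniform(8, 80) for _ in range(3000)]   # placeholder data

buckets = {"ideal": 0.0, "good": 0.0, "acceptable": 0.0, "unacceptable": 0.0}
for t in frame_times_ms:
    fps = 1000.0 / t
    if fps >= REFRESH_HZ:
        buckets["ideal"] += t
    elif fps >= GOOD_FPS:
        buckets["good"] += t
    elif fps >= ACCEPTABLE_FPS:
        buckets["acceptable"] += t
    else:
        buckets["unacceptable"] += t

total = sum(buckets.values())
for name, ms in buckets.items():
    print(f"{name:12s} {100.0 * ms / total:5.1f}% of the run")
```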
 
Yes, I agree, the Serious Sam histograms rule, aside from the fact that I never cared for SS. Also, the performance in it doesn't seem to match up well with Quake and Unreal engine games, which I do tend to stick to, so I prefer to see the information on those. I recommend dropping RTCW in favor of Q3, as the latter is less CPU limited and therefore a better benchmark for video cards IMHO.

Other than that, I say just give me every damn benchmark you can; the more popular the games the better, but don't be afraid to throw in less popular ones along with good reasons for using them. When I look at benchmarks I try to take into account that I never know how one game will perform based on the numbers for another. Hence, I look at all the benches I can find that suit the settings I would use, give them all more or less equal weight, and combine them to formulate my opinion of where each card stands.


Also, about this:
Acceptable - fps >= 10-25 fps

Hell no! Dropping into the 20s during real-time scripted sequences, such as when the reactor blows in Half-Life, is OK. If the framerate drops below 30fps while I am actually trying to play the game, it is a huge distraction and simply destroys any chance of "fun", and therefore it stops being a game.
 
THe_KELRaTH said:
I'd like to see image comparisons where the reader doesn't know which tested card each shot is from and votes for what he/she feels is best.

With screenshots this is pointless with regard to texture shimmering.

I would like framerate graphs a la SS and the reviewer's opinion of IQ vs. playable game settings, i.e.

Y Game - 4xAA/16xAF is smooth, no HUD or menu issues with AA, good texture clarity, no obvious slowdowns or fps spikes in the game

X Game - Only 2xAA is playable; mip-maps are odd, with a high LOD bias that MSAA and 16xAF cannot sort out, so a supersampling solution would be better

That kind of thing.
 
Standard deviation / graph as well here.

Also, I would prefer benchmarks where each run executes the same data rather than trying to maintain a certain frame rate; if it's slow motion I don't care, do it frame by frame. At least as an option; I realise most people want more of an actual game experience measured.
 