Are synthetic benchmarks relevant?
Reverend
07-Jun-2002, 15:43
They can be useful for specific investigations of 3D technology. If they are used in a "review" of a card in such a manner, how relevant are the results to you, supposing that the reviewer explains what the results could mean, whether you know the reviewer knows what he's doing or not? Especially if the reviewer gives an opinion if the tests can be relevant and duplicated in real games.
I think a synthetic benchmark is still very important for evaluation.
Real application benchmarks are good for determining the current performance of a piece of hardware. However, they generally tell you nothing about its potential. I don't want to buy hardware just for playing today's games. I want it to last for a year or longer. So it is still good to know how hard you can push the hardware.
Entropy
07-Jun-2002, 16:05
I would turn it around and propose that application benchmarks are typically useless for everyone who doesn't use the particular application, and even those individuals are very unlikely to use the particular benchmarking settings, but will rather modify parameters until they get the performance they desire.
Synthetic benchmarks expose the underlying basic performance capabilities of a card, which is interesting in and of itself, and which can then be used for predicting application behaviour (insert caveats).
Entropy
I'd like one more choice for the poll:
"No less relevant than any other form of benchmark ..."
Joe DeFuria
07-Jun-2002, 16:07
I vote for a 3rd option:
* Yes, according to what I say below. :)
I'm just going to build on what Simon F mentioned in another thread:
Rev, since current games don't seem to be doing much with stencils, you may well be correct. OTOH for future games that start doing shadowing in a serious way, I would think it would be quite relevant.
Synthetic tests are all about trying to gauge how well today's hardware might run future games. As such, well designed "Synthetic" tests are always relevant to me. Problem is, many reviewers take the results of synthetic tests, and extrapolate poorly, or not at all. But I would say there are two types of synthetic tests:
Type 1) Those such as 3D Mark 2001 "high quality" Game tests. These are not actual games, but are designed such that the goal is to use the hardware in ways that actual games would...with the exception of being much more stressful than today's "typical" game scene. I would also classify the UnrealTournament 2003 "torture test" that Anand has used as such a test.
Type 2) Those that target specific sub-systems of a card, such as pixel shaders, vertex shaders, fill rate, poly throughput, z-rejection / occlusion culling, etc. For example, the 3D Mark "non game" tests, village test, etc.
The tests that fall into category 1 will allow us to rather directly extrapolate how today's hardware might run future games. The "drawback" of category 1 "stress" tests, is that we don't always know if future games will progress in complexity in the same way that the torture test is defined. (More stress on polys, fill-rate, per pixel lighting, etc.)
Tests that fall into category 2 may help identify potential "bottlenecks" or other strengths / weaknesses in a particular architecture. This can be useful if we learn that "future games" will tend to stress one subsystem more than another. It can help give us an idea of what cards would fare better in the future, if we have our own ideas of how future games will increase in complexity. The drawback of category 2 tests is that stressing individual sub-systems can be misleading...results may differ when stressing multiple sub-systems together. (One card might have both a higher raw poly rate and a higher fill rate, but lower peak "filled poly" rates relative to another card.)
The most important thing to do when using "synthetic tests" in a review, is to describe what that test does, and what it is "designed to test". It's also important to define how such a test might manifest itself in a game, so that the reader can draw his own conclusions on how much relevance such a test has.
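To illustrate the parenthetical above about raw rates versus "filled poly" rates, here is a minimal sketch in Python. It assumes a toy two-stage model (geometry, then fill) with a made-up `overlap` factor describing how well the two stages run in parallel. The cards, rates and scene are hypothetical and do not describe any real hardware.

```python
def frame_time(triangles, pixels, tri_rate, fill_rate, overlap):
    """Seconds per frame under a toy two-stage model.

    overlap=1.0: geometry and fill run fully in parallel (time = slower stage).
    overlap=0.0: the stages serialize (time = sum of both stages).
    """
    geometry = triangles / tri_rate
    fill = pixels / fill_rate
    slower, faster = max(geometry, fill), min(geometry, fill)
    return slower + (1.0 - overlap) * faster


# Hypothetical cards; none of these figures describe real hardware.
# Card A wins both isolated tests, but its stages do not overlap.
card_a = {"tri_rate": 30e6, "fill_rate": 1000e6, "overlap": 0.0}
card_b = {"tri_rate": 25e6, "fill_rate": 900e6, "overlap": 1.0}

# Hypothetical scene: 100k triangles, 4M pixels drawn including overdraw.
scene = {"triangles": 100_000, "pixels": 4_000_000}

for name, card in (("A", card_a), ("B", card_b)):
    t = frame_time(**scene, **card)
    print(f"card {name}: {t * 1000:.2f} ms/frame, {1 / t:.0f} fps")
```

In this sketch card A posts the better number in each isolated test, yet card B finishes the combined scene sooner. That is the kind of result a per-subsystem test cannot show on its own.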
Magnum PI
07-Jun-2002, 16:28
I think synthetic benchmarks can be misleading for the consumer!
They can be of some interest, but you have to take into account that they don't translate into real-world performance.
What to think of this review that tries to compute some synthetic indicators for graphics cards?
http://www.digit-life.com/articles/digest3d/index.html
I think it has improved a lot from older versions, but it seems very difficult to do such a thing.
I think synthetics have a place and agree with what Rev is saying. Yes, you can test all forms of function of a card and see what its potential is, which is great to know.
But until a game comes along that actually uses that potential, does it really matter to the average Joe (not you Joe D.)? To expose and use the potential of a card you usually need software support. And we know that for any major function/feature it usually takes time for that to happen. And at the rate at which cards are turning out (newer hardware every 6 months) it's really hard for that potential to get used before the card is replaced. Agreed, not everyone upgrades often, but some do.
To me, real world benchmarks are more relevant as that's what the gamer will be doing, playing games on it.. Well, unless you're the typical 3dmark nut; then you will be OC'ing, tweaking and doing everything in your power just to get another 200 points :P
I guess I would like to see a healthy mix with a bit more emphasis on real world results. Trying to predict future performance from a synthetic bench is an almost completely useless exercise. You would almost be better off calling a 1-900 psychic hotline. A better way is to look at the effective memory bandwidth and effective fill rate to see how a card will hold up in future games, as for now those always seem to be the limiting factors (note that "effective" is needed due to memory-saving tech or other forms of rendering, e.g. deferred).
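As a rough sketch of the fill rate and bandwidth arithmetic mentioned above, written in Python: the peak formulas are the usual back-of-envelope ones (pipelines times core clock, bus width times effective memory clock). The `savings` knob standing in for bandwidth-saving tech is an assumption made for illustration, and the card below is hypothetical.

```python
def peak_fill_rate(pipelines, core_clock_hz):
    """Theoretical pixel fill rate: one pixel per pipeline per clock."""
    return pipelines * core_clock_hz


def peak_bandwidth(bus_width_bits, effective_mem_clock_hz):
    """Theoretical memory bandwidth in bytes per second."""
    return (bus_width_bits / 8) * effective_mem_clock_hz


def effective_fill_rate(peak_fill, bandwidth, bytes_per_pixel, savings=0.0):
    """Fill rate once the memory bus is taken into account.

    savings is the fraction of per-pixel traffic avoided by
    bandwidth-saving techniques (0.0 = none); it is a made-up knob.
    """
    traffic = bytes_per_pixel * (1.0 - savings)
    return min(peak_fill, bandwidth / traffic)


# Hypothetical card: 4 pipelines at 250 MHz, 128-bit bus, 500 MHz effective.
fill = peak_fill_rate(4, 250e6)
bandwidth = peak_bandwidth(128, 500e6)

# Assumed traffic per pixel: 4 bytes colour write + 4 bytes Z read
# + 4 bytes Z write, ignoring texture fetches entirely.
bytes_per_pixel = 12

for savings in (0.0, 0.25, 0.5):
    rate = effective_fill_rate(fill, bandwidth, bytes_per_pixel, savings)
    print(f"savings {savings:.0%}: {rate / 1e6:.0f} Mpixels/s")
```

Under these assumptions the card cannot reach its peak fill rate until roughly a third of the per-pixel traffic is avoided, which is why the "effective" figures matter more than the headline ones.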
Kristof
07-Jun-2002, 20:29
IMHO they are very valid BUT they require a lot of thinking and knowledge from whoever is judging the results. Just showing the graphs and not commenting is dangerous since most people will be unable to appreciate the result. With clear comments and explanations the results will be meaningful, and there will almost always be some kind of link with reality.
Knowing the polygon throughput with certain vertex shaders, pixel rate with certain pixel shaders, stencil performance, EMBM performance... is useful and has an impact on real game performance. Synthetic tests allow judging of the subsystems of a graphics chip so you can understand why certain games might perform better or worse on certain hardware. A game uses a scene so varied that it's only useful for getting a general impression of the hardware; one troubled feature or driver flaw is hard to identify, but a synthetic test will immediately pinpoint the imbalance in the hardware or driver.
But as said, great care has to be taken when judging the results since it's very easy to draw a wrong conclusion or make wrong statements.
K-
Chalnoth
08-Jun-2002, 01:01
I think that synthetic tests like 3DMark2k1 have absolutely no place being shown. Neither 3DMark2000 nor 3DMark2001 has shown anything close to what real game benchmarks show. The only thing cool about the 3DMarks is the ability to compare scores online.
Synthetic tests that can be beneficial are ones where we know exactly what is being stressed, for example, high-poly tests, multitexturing tests, and so on. Tests like these, for example, exposed some problems in the Radeon 8500 when more than two textures were in use (I think I had heard that the problems have since been fixed through more optimized drivers...).
Even better, of course, are real game benchmarks, but the very specific synthetic benchmarks can expose problems with hardware or drivers. Of course, for the casual user, such synthetic benchmarks are just confusing. Real game benchmarks are the only ones that should matter to the casual user.
Personally, I dislike using many synthetic benchmarks in my reviews. It's far more important to me how a game performs. That doesn't mean I won't use synthetic benchmarks if they serve a purpose, such as measuring performance under a specific situation, but it isn't a priority to me.
You all know my opinion on most reviews in general. Numbers are one thing, but not everything ;). I'm a little guilty of it myself (take my Ti4200 preview)
Nice demo btw Kristof, it's cute :D
BTW I voted yes, but am troubled by the choices...
Don't you need a synthetic benchmark when you try to develop your application, just to see what's possible?
Yes. If nothing else, it shows what the hardware can do when the companies focus their attention on extracting maximum performance without negatively impacting image quality.
Chalnoth
08-Jun-2002, 03:41
Yes, it is most certainly nice to use to expose weaknesses in architectures, so that enthusiasts can campaign for improving certain aspects of 3D rendering, but it doesn't mean a thing to the more casual user...
Tests like these, for example, exposed some problems in the Radeon 8500 when more than two textures were in use (I think I had heard that the problems have since been fixed through more optimized drivers...).
And at the same time they can show some things that look like bugs, but are they? Take the impact aniso has on the fill rate test for the GF4 cards. Looking at just that test you would believe it has issues in games with low aniso performance, and yet we know that's not the case....
Joe DeFuria
08-Jun-2002, 05:38
I think that synthetic tests like 3DMark2k1 have absolutely no place being shown. Neither 3DMark2000 nor 3DMark2001 has shown anything close to what real game benchmarks show.
I disagree. The last time I looked at any rankings (and remember, 3DMark rankings are NOT simply FPS rankings, but they also take into account some feature support), I would consider it a fairly good generalization of which cards are "best."
And I could name some "game benchmarks" (like Quake3) that are far less representative of how the MAJORITY of today's games actually run on today's hardware.
No single benchmark is perfect, and any one benchmark is only really good at telling you how that card runs that benchmark. 3D Mark certainly has its uses, but not in isolation.
Tests like these, for example, exposed some problems in the Radeon 8500 when more than two textures were in use (I think I had heard that the problems have since been fixed through more optimized drivers...).
I don't remember that issue...(seriously...link?) Maybe you are confusing that with the "GeForce4 can't multitexture when Aniso is enabled" issue that someone noticed when running the fill rate tests.
Chalnoth
08-Jun-2002, 06:00
I disagree. The last time I looked at any rankings (and remember, 3DMark rankings are NOT simply FPS rankings, but they also take into account some feature support), I would consider it a fairly good generalization of which cards are "best."
Heh, just looked at the overall rankings over at Madonion, and I noticed two big problems:
1. The Radeon 8500 is significantly above the GF3 Ti 500, and just barely below the GF4's (After looking at some real scores, the 8500 is usually about 1000 points, around 15%, ahead of the GF3...are there any real-game benchmarks where the 8500 wins by that much?).
2. The GF4 Ti 4200 is on top (though this will most likely change in the fairly near future).
Anyway, this is also similar to 3DMark2000. That benchmark showed an incredibly unrealistic advantage to nVidia's video cards.
The other big problem is that 3DMark2001 shows far, far more difference between different processor speeds than real game benchmarks show.
And I could name some "game benchmarks" (like Quake3) that are far less representative of how the MAJORITY of today's games actually run on today's hardware.
Which is why reviewers should try to benchmark as wide a variety of games as possible. Fortunately, some do.
I don't remember that issue...(seriously...link?) Maybe you are confusing that with the "GeForce4 can't multitexture when Aniso is enabled" issue that someone noticed when running the fill rate tests.
A link is hiding over at Digit-Life somewhere, searching now...
Update:
Found it!
Check out the synthetic performance graphics here:
http://www.digit-life.com/articles/r8500.new/index2.html
Take note of these benchmarks:
Vertex Shader
Vertex Matrix Blending
Pixel Shader
The connection between these? All use combinations of separate data sets that lie in different locations in video memory. nVidia's crossbar memory architecture has a much easier time with these things, though it is conceivable that through smarter use of the on-chip caches, the Radeon 8500 could have improved performance in these programs (which really are technology demonstrations...).
Update2:
If you'd like to look at current performance in those situations, download the DX8 SDK. I'd gladly pit my GF4 Ti 4200 against a Radeon 8500 in these benches...
Sharkfood
08-Jun-2002, 06:23
I believe that in theory, synthetic benchmarks can be very useful and handy to use for extrapolating conditions that are of interest concerning 3d hardware.
That being said, that is all theory and isn't the case today. Unfortunately, performance and conditions in benchmarks have become so specific, tailored, and directly optimized in drivers and hardware that they serve little to no function today.
It is useless to look at X benchmark marks concerning texturing, polygon throughput, p/v shader performance, fillrate scores, etc.etc. as these tests have not only been written to exploit a particular chipset feature that not a single game or application will exert the same effort in implementation, but IHVs have added lines of code to optimize for these specific paths... so extrapolation of synthetics is no longer feasible.
Benchmarks in applications do hold some merit, albeit not normally of much use either. On the surface, it would seem a 55% improvement in Quake3 timedemos might lead one to believe hardware XYZ will perform 55% better in the game Quake3. Unfortunately, once again, optimizations that are specific to saturated pipelines, or corners cut for these specific conditions, do not assure that running the game in its natural state will show the same improvement.
When you build a better mousetrap, nature always builds a better mouse. Thus is the way of things.
I still believe the most value can be received from fairly subjective findings from a trusted source. It's a lot like buying a car: you are less interested in what its 0-60 is, how many G's it pulls on the skid track and what its braking distance is (although these are very interesting to read).. it's more how the reviewer describes its road manners... how nimble it "feels" and how quiet/smooth it handles different roads compared to competing models.
dksuiko
08-Jun-2002, 07:17
The 3DMark series of benchmarks is a DirectX benchmark. That doesn't necessarily mean it's a gaming benchmark, though. Each 3DMark released uses most, if not all, of the features in the DirectX of its time. Games don't. That's the big difference. Then again, it might be said that it's a benchmark for the 'future' games, because developers might use those features then..
But will they? T&L, texture compression (even DX6, I think) and bump mapping were a big part of 3DMark 2000, yet we still don't see those features used much today. 3DMark 2001 has those features, plus pixel and vertex shaders; will we see those used in more games before 3DMark 2002? If not, then 3DMark is not good for gauging today's or even tomorrow's gaming performance.
Anyway, my opinion is that it's not a good gaming benchmark, but it's a great DirectX benchmark. That, btw, isn't a really bad thing unless you totally base your buying decision on 3DMark. 3DMark is always fun to watch for the first time. :)
Chalnoth
08-Jun-2002, 10:27
If it's just a great DirectX benchmark, then what good is it?
If it's just a great DirectX benchmark, then what good is it?
It's as good as any other 3D graphics benchmark. Nobody can seriously claim that something like Quake 3 is a much more relevant benchmark just because it's using the same engine as the game itself - the timedemos just run through a fixed sequence of frames just as with 3DMark. Certainly the maps the timedemos use are the same that are in the game (which isn't the case with 3DMark and Max Payne) but neither takes into account AI routines, input, or any networking going on.
All benchmarks are "synthetic" in that they will never truly represent how well that product will work on your own system. It can only give an indication of how good/fast/etc it is in general - so that a "high" score in 3DMark is just as acceptable as a "high" frame rate in Q3A. I get average frame rates of 50+ in the Nature test but I don't in Morrowind. I get average frame rates of 150+ in Q3 timedemos but I don't in SOF2.
Ah, just don't get me all started with 3DMark...I get rather defensive about it at times :wink:
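To make the point about timedemos and average frame rates concrete, here is a small sketch in Python. It assumes a timedemo simply reports frames rendered divided by total time. The two runs are made-up data, not measurements from any game or card.

```python
def timedemo_stats(frame_times):
    """Return (average fps, worst-frame fps) for frame times in seconds."""
    total = sum(frame_times)
    average_fps = len(frame_times) / total
    worst_fps = 1.0 / max(frame_times)
    return average_fps, worst_fps


# Two made-up runs of 100 frames that both take one second in total.
steady = [0.010] * 100                   # every frame takes 10 ms
spiky = [0.005] * 90 + [0.055] * 10      # mostly fast, with ten 55 ms stalls

for name, run in (("steady", steady), ("spiky", spiky)):
    average, worst = timedemo_stats(run)
    print(f"{name}: average {average:.0f} fps, worst frame {worst:.0f} fps")
```

Both runs report the same average, yet the second one would feel far worse to play. A single average figure, whether from 3DMark or a timedemo, hides that.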
Entropy
08-Jun-2002, 13:25
If it's just a great DirectX benchmark, then what good is it?
It's as good as any other 3D graphics benchmark. Nobody can seriously claim that something like Quake 3 is a much more relevant benchmark just because it's using the same engine as the game itself - the timedemos just run through a fixed sequence of frames just as with 3DMark. Certainly the maps the timedemos use are the same that are in the game (which isn't the case with 3DMark and Max Payne) but neither takes into account AI routines, input, or any networking going on.
All benchmarks are "synthetic" in that they will never truly represent how well that product will work on your own system.
Well put.
The purpose of benchmarking is to gain information.
This discussion isn't new. :) Back before SPEC was formed these issues were endlessly rehashed on comp.benchmarks (this was before PC-users had much access to usenet/internet).
The overwhelming consensus was, and still is, that you should benchmark with the application you are interested in running, using typical settings, and a representative working set of data. The SPEC initiative grew out of a need to compare machines that you didn't have access to (rather than try to predict how they would perform on unknown problems of future need - this is obviously IMPOSSIBLE.)
Even with this limitation, many rejected the notion of creating a general benchmark suite as pointless, because they felt that the predictive value of such benchmarks would effectively be too small to be useful. (Ironically, in the end the creation of a vendor-independent SPEC suite was ensured by vendors supporting the initiative.)
Major pros/cons of synthetic benchmarking:
# You know exactly what you are testing.
# It provides basic information as to an architecture's capabilities.
# Isolating particular performance areas is artificial and cannot detect problems which arise when subsystems must work together. (This is true to some extent with application benchmarks as well, as Neeyik argues above.)
# In order to gain any insight from the data, you must be knowledgeable enough about the applications you are interested in that you can translate the basic performance numbers into a prediction about application performance. (This is often difficult to do beyond the trivial.)
Application benchmarking has only one potential advantage over synthetics:
# If you benchmark with the application you are interested in, using appropriate settings, working set, and surrounding computer subsystem, the predictive capabilities are better for that particular problem.
# In all other cases you run into transferability problems - how valid is this result for other parts of the same game even, or in actual play as opposed to the benchmarking scenario? The same engine, but used in another game with different polygon/texture loads? Is it useful in predicting behaviour using other graphics engines?
Frankly speaking - for predictive purposes application benchmarking is largely useless, and doesn't have the redeeming quality of synthetics that it provides some insight in the underlying architecture. The one real life benefit is that it provides a feeling for the performance span you might encounter with various programs on the market, if you test a sufficiently large number of them, under conditions that are relevant to an actual user. Good luck. Sounds like an awfully large amount of work for very little benefit, but if someone would like to do this, OK.
Madonion has the right of it. They provide a single figure of merit that keeps the kids occupied and happy, and provide a small but hopefully somewhat relevant sample of underlying performance metrics for the geeks to try to make sense of, or at least garner a feeling of superiority from understanding. :) Since it is so well understood, and so widely used, Quake3 is the best application benchmark around. We have a good idea of polygon loads and overdraw, so we can relate performance to the requirements of the application, and there is lots of comparative data around. Since the program spends most of its time in OpenGL calls you can additionally hypothesize about the maturity of drivers if you see cards that perform poorly at low resolutions but better at high.
Is it possible to do much better, really?
I honestly don't think so.
Entropy
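The remark above about low versus high resolution results can be sketched as a simple scaling check in Python. The assumption is that a frame rate which barely drops as the pixel count rises points to a limit outside the fill/bandwidth path (CPU or driver). The 10% threshold and both result sets are hypothetical.

```python
def scaling_report(results):
    """results maps (width, height) -> average fps from the same demo.

    Returns (resolution, fps, megapixels per second) sorted by pixel count.
    """
    ordered = sorted(results.items(), key=lambda kv: kv[0][0] * kv[0][1])
    return [((w, h), fps, w * h * fps / 1e6) for (w, h), fps in ordered]


def looks_cpu_or_driver_bound(results, tolerance=0.10):
    """True if fps barely moves between the lowest and highest resolution."""
    report = scaling_report(results)
    low_fps, high_fps = report[0][1], report[-1][1]
    return (low_fps - high_fps) / low_fps < tolerance


# Made-up timedemo averages for two hypothetical cards.
card_x = {(640, 480): 120, (1024, 768): 118, (1600, 1200): 112}
card_y = {(640, 480): 200, (1024, 768): 150, (1600, 1200): 70}

for name, results in (("X", card_x), ("Y", card_y)):
    for (w, h), fps, mpix in scaling_report(results):
        print(f"card {name} {w}x{h}: {fps} fps, {mpix:.0f} Mpixels/s")
    print(f"card {name} CPU/driver bound? {looks_cpu_or_driver_bound(results)}")
```

Card X is slower at low resolution but holds its frame rate as the resolution climbs, which is the pattern that invites questions about driver or CPU overhead. Card Y scales with pixel count, so its limit more plausibly sits in fill rate or memory bandwidth.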
Type 1) Those such as 3D Mark 2001 "high quality" Game tests. These are not actual games, but are designed such that the goal is to use the hardware in ways that actual games would...with the exception of being much more stressful than today's "typical" game scene. I would also classify the UnrealTournament 2003 "torture test" that Anand has used as such a test.
The tests that fall into category 1 will allow us to rather directly extrapolate how today's hardware might run future games. The "drawback" of category 1 "stress" tests, is that we don't always know if future games will progress in complexity in the same way that the torture test is defined. (More stress on polys, fill-rate, per pixel lighting, etc.)
The drawback you mention is pretty important. Future games will run on future systems with a faster CPU, faster FSB, a faster host memory subsystem and maybe even faster AGP. Future games will also run on future 3D engine versions, which may behave differently and show different bottlenecks. That said, every attempt to "simulate" how future games will perform by just stressing a current engine with a higher poly count or higher texture load will fail.
Type 2 you mention is the way synthetic benchmarks should go. For all the rest, we need benchmarks of relatively widely used and up-to-date games (as they will hit the store), as many as possible.