[H] 3D Testing Methodology Discussion

Discussion in '3D Hardware, Software & Output Devices' started by Arty, Jan 28, 2008.

Thread Status:
Not open for further replies.
  1. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    I like Hardocp's review style because I can quickly get a general idea of the settings I can run with my computer in various games. Their "optimal" settings have been pretty accurate for me. I have to admit, no other site provides me with that service and I am very appreciative of Hardocp's reviews. Believe it or not, I don't even view Hardocp's reviews as reviews; I view them more as a source of information. Many of you have criticized their conclusions, but to be honest their conclusions are probably the least valuable part of their reviews. I think the point is that you are supposed to make your own conclusions from the information that Hardocp provides (I view their conclusions as something extra to the reviews).

    But it is true that not everyone can use the information that Kyle/Brent provide. Some people have LCDs with a fixed native resolution, some don't care about AA, etc., and those people have different "real world" scenarios than Kyle/Brent. However, you can't fault Kyle/Brent for that. They make it very clear what information they are providing (and if that information doesn't apply to you, go to another site). Some of you believe this information is meant for the whole general public; it's clearly not. If their review style doesn't apply to you, simply stop visiting their site.
     
  2. Skrying

    Skrying S K R Y I N G
    Veteran

    Joined:
    Jul 8, 2005
    Messages:
    4,815
    Likes Received:
    61
    Clearly not for everyone? I'm sorry, but they're the ones who constantly slam every other review site's method... Maybe you should tell them that.
     
  3. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    It's their opinion and they are entitled to it. That doesn't mean I agree with them, nor does it make their information any less valuable to me. It seems to me people have more of a problem with their conclusions than with their information. I believe the thread's title is "[H] 3D Testing Methodology Discussion", not "[H] 3D Conclusion Discussion".
     
  4. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile
    I have a feeling--though I could be wrong--that Kyle and Brent might take exception to your saying that their reviews aren't "meant for the whole general public"...;) I mean, I don't see any disclaimers there which would lead me to believe that they aren't, like: "This review is only for people with 30" LCD monitors" or "This review is meant for people who never drop below 1280x1024" or "This review is meant expressly for people who don't care about FSAA or AF or setting their textures to maximum," etc., ad infinitum. I mean, if all of that were true then [H] couldn't very well use the term "real world" with a straight face, could it?...;)

    As I mentioned in my earlier post, the things I talk about are by no means limited to [H], so this isn't to single out [H] exclusively. What I wanted to do was to bring up some of what I think are serious shortcomings in the *general crop* of 3d reviews I've read of late. I think a sort of tunnel vision about "how things should be done" is afflicting this entire industry, and it's costing the industry at large something very valuable, and that is "perspective." I think in the interests of time that a lot of corners are being cut and that the makers of 3d-cards, monitors, and game developers in general are getting the short end of the stick--not to mention of course "the general public." 3d-gaming offers a lot of cool stuff that I lament I'm not seeing mention of in today's typical 3d-card review.
     
  5. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    I think an argument can be made for either approach. If you have a long enough playthrough, then even with the inherent variance of playing it, the length of the run gives enough data points to neutralize outliers such as extreme lows or extreme highs, and there should be a trend, a common pattern, a general value around which the framerate varies.
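    As a rough illustration of that averaging argument, here is a minimal Python sketch. It assumes a playthrough has been logged as one fps value per second; the function name and the 5% trim fraction are hypothetical choices, not anything a particular review site uses. With a long run, the median and the trimmed mean stay put even when the log contains a single hitch or spike.

```python
from statistics import mean, median

def summarize_run(fps_samples: list[float], trim: float = 0.05) -> dict[str, float]:
    """Summarize one playthrough logged as one fps value per second.

    trim is the fraction of samples dropped from each end before taking
    the trimmed mean (a hypothetical choice, for illustration only).
    """
    if not fps_samples:
        raise ValueError("empty run")
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(fps_samples)
    k = int(len(ordered) * trim)          # samples dropped from each end
    trimmed = ordered[k:len(ordered) - k]
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "trimmed_mean": mean(trimmed),
        "min": ordered[0],
        "max": ordered[-1],
    }

# A 10-minute run hovering at 40 fps with one 5 fps hitch and one 120 fps spike.
run = [40.0] * 598 + [5.0, 120.0]
print(summarize_run(run))
```

    The min and max still record the outliers, so nothing is hidden; they just stop dominating the headline number.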

    The trouble is that it takes a buttload of time, and it forces you to invent a reason for not spending your entire lifetime testing vid-cards: see the supposed need to find the best possible playable settings. This is a noble endeavour...for your mentally challenged reader-base who couldn't figure out by themselves what is playable FOR themselves and thus have to use what is playable for either Kyle or Brent. The rest, I think, can handle the task of figuring out what they're comfortable with in terms of settings by themselves.

    You also lose some parts of the picture that may or may not be relevant, depending on one's interest WRT the architecture: scaling with settings like the number of AF samples, AA samples, resolution and whatnot. On the flip side, you get to know how a particular card will play a particular game in reality. A timedemo might lose the AI aspect of gameplay, or some other computations in the background. MIGHT is the key word here.

    OTOH, with timedemos you can investigate a pile of things quite efficiently in terms of consumed time and repeatability.

    The [H] obviously has its place in the panoply of review sites...but it's a part of a whole, not the be-all-end-all of benchmarking, as some would have you believe. And you should congratulate Kyle on how he generated controversy with the article, thus generating hits and traffic for the [H]. Or does somebody think that's accidental? :)
     
  6. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    Why aren't you testing on a more modern PC? A dual-core with 2GB of RAM seems a bit mundane when comparing $450 cards at this point.
     
  7. Brent

    Regular

    Joined:
    Apr 11, 2002
    Messages:
    584
    Likes Received:
    4
    Location:
    Irving, TX
    We use the highest playable setting, which means the highest setting (resolution, AA/AF, in-game settings) that stays playable throughout the whole game. For example, in Crysis we test from the opening beach maps, to the Assault harbor map, to the ice maps, to the final boss scene. We find the setting that is playable in all of this, and that's what is reported in the table. The graphed framerate over time is completely secondary; it is simply there as a visual aid to back up our experience.
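    The selection rule Brent describes can be sketched roughly as below. This is only an illustration under assumptions: [H] judges playability by playing, not against a fixed number, so the 25 fps cutoff, the function name and the data layout are all hypothetical.

```python
from typing import Optional

# Hypothetical cutoff: [H] decides playability by playing, not by a fixed
# number; 25 fps is only a stand-in so the rule can be expressed in code.
PLAYABLE_MIN_FPS = 25.0

def highest_playable(results: list[tuple[str, dict[str, float]]],
                     threshold: float = PLAYABLE_MIN_FPS) -> Optional[str]:
    """Return the highest setting whose worst section stays at or above threshold.

    results: (setting name, {game section: minimum fps seen there}) pairs,
    ordered from the lowest setting to the highest.
    """
    best = None
    for setting, section_mins in results:
        # A setting only counts if it is playable in *every* section tested.
        if section_mins and min(section_mins.values()) >= threshold:
            best = setting
    return best

# Made-up numbers, purely to exercise the function.
results = [
    ("1280x1024, no AA", {"beach": 45, "harbor": 38, "ice": 30, "boss": 33}),
    ("1600x1200, no AA", {"beach": 38, "harbor": 31, "ice": 26, "boss": 27}),
    ("1600x1200, 2xAA",  {"beach": 33, "harbor": 27, "ice": 21, "boss": 24}),
]
print(highest_playable(results))  # prints 1600x1200, no AA
```

    Taking the minimum across sections is what makes the result "playable in all of this": one heavy level (the ice section in the made-up data) is enough to rule out a setting.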
     
  8. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    I'm not as convinced (I have seen on [H]'s forums time and time again Kyle telling people who don't want to use their information to go to another site), but something you should definitely take up with Kyle/Brent if it's that imperative to you.
     
  9. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    It's ironic that if they'd included a 3870 CrossFire setup as a control in the test, this whole controversy probably wouldn't have happened. If the 3870s backed up what they were saying, then no one would be able to doubt their testing methods. I like the testing methods; they actually give different information than other review sites. You can read half a dozen reviews, but there's only one like the OCP.

    It's funny, Brent and Kyle come here to defend their own review methods! Wow, that's personal!

    AMD hacks, you can give up now! We've discredited it enough for one day! (JK)

    Back to my main point: having a control card in the testing is an excellent way of reinforcing your results. An 8800 GTX or CrossFire X2, something standard in every test as a baseline, i.e. review that card every time so people don't call the methodology into question, as you can point out consistency between tests.
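    A control card could be put to work along the lines Squilliam describes by rerunning it in every test session and flagging the games where its numbers drift from its original reference run. Below is a minimal sketch, assuming per-game average fps as input; the function names, the 3% tolerance and the game labels are hypothetical.

```python
def control_drift(reference_fps: dict[str, float],
                  session_fps: dict[str, float]) -> dict[str, float]:
    """Percent change of the control card's result per game versus its reference run."""
    return {game: 100.0 * (session_fps[game] - ref) / ref
            for game, ref in reference_fps.items() if game in session_fps}

def inconsistent_games(reference_fps: dict[str, float],
                       session_fps: dict[str, float],
                       tolerance_pct: float = 3.0) -> list[str]:
    """Games where the control card moved by more than tolerance_pct
    (a hypothetical default) between the reference run and this session."""
    drift = control_drift(reference_fps, session_fps)
    return sorted(game for game, pct in drift.items() if abs(pct) > tolerance_pct)

reference = {"game A": 30.0, "game B": 80.0}  # control card, original session
today = {"game A": 30.6, "game B": 72.0}      # same card, new session
print(inconsistent_games(reference, today))   # ['game B']
```

    A flagged game suggests something in the rig, drivers or test procedure changed, so that session's numbers should not be compared directly with older data.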
     
  10. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    I think before bashing ourselves for different methods, it might be a good idea to realize everyone's method right now could use a lot of improvement. The fundamental problem with HardOCP's method IMO is that it just takes too much time, which means the amount of data generated isn't enough in many people's books: you can't test as many games, your data isn't as objectively repeatable, you only get one sweet-spot resolution, etc.

    Maybe I'm a tad subjective because I've been thinking a fair bit about how all this can be improved, but I'm extremely convinced there's still the potential to make benchmarking processes a lot better, and give a lot more data for the user to look at - without overloading him with information unless he wants to be.

    I won't be commenting on the techniques I've been investigating (except for the fact 'disruptive' would be the understatement of the year *if* it happens), but regarding the idea to play through the entire game to determine if a setting is playable... This seems suboptimal to me. A more logical way to handle the problem is to play through the entire game once, generating FRAPS data, so that you know what the game-wide bottoms are and where you'll want to benchmark.

    With that amount of information, you'd likely want to save screenshots every X seconds, associate them with the fps data, and use a simple program to automatically find the game-wide bottoms. Then you'd give the screenshots to the reviewer so he knows where he should consider making gameplay benchmark runs.
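    The automated part of Arun's suggestion could look something like the sketch below. It assumes the playthrough log has already been reduced to one fps value per second and that a screenshot was saved every `shot_interval` seconds starting at second 0; the function name, window length, count and interval are hypothetical parameters, not anything Arun specified.

```python
def game_wide_bottoms(fps_per_second: list[float], window: int = 10,
                      count: int = 3, shot_interval: int = 30) -> list[tuple[int, float, int]]:
    """Find the `count` lowest-fps stretches of a full playthrough.

    Returns (start second, average fps over the window, screenshot index)
    tuples, lowest first. Windows never overlap, so one long slowdown is not
    reported several times. The screenshot index is the last screenshot
    saved at or before the window's midpoint.
    """
    n = len(fps_per_second)
    if n < window:
        return []
    # Rolling sum: sums[s] is the total fps over seconds s .. s + window - 1.
    sums = [sum(fps_per_second[:window])]
    for i in range(window, n):
        sums.append(sums[-1] + fps_per_second[i] - fps_per_second[i - window])
    picked: list[int] = []
    for start in sorted(range(len(sums)), key=lambda s: sums[s]):
        if all(abs(start - p) >= window for p in picked):
            picked.append(start)
            if len(picked) == count:
                break
    return [(s, sums[s] / window, (s + window // 2) // shot_interval) for s in picked]

# Synthetic log: 60 fps everywhere except two heavy fights.
log = [60.0] * 120
log[20:30] = [20.0] * 10
log[80:90] = [30.0] * 10
print(game_wide_bottoms(log, window=10, count=2))  # [(20, 20.0, 0), (80, 30.0, 2)]
```

    The screenshot indices are what would go to the reviewer, pointing him at the spots worth a manual gameplay run.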
     
  11. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    17,884
    Likes Received:
    5,334
    I like Squill's idea of a control card, and maybe a "card x = 20% faster, card y = 10% slower" type of graph.
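    Davros's "card x = 20% faster" style of chart boils down to expressing each card's result as a percentage of the control card's result in the same test. A minimal sketch, assuming average fps per card as input (the function name and the "control" label are hypothetical):

```python
def relative_to_control(avg_fps: dict[str, float], control: str) -> dict[str, str]:
    """Label each card's result relative to the control card in the same test."""
    base = avg_fps[control]
    labels = {}
    for card, fps in avg_fps.items():
        if card == control:
            continue
        pct = 100.0 * (fps - base) / base
        labels[card] = f"{abs(pct):.0f}% {'faster' if pct >= 0 else 'slower'}"
    return labels

print(relative_to_control({"control": 50.0, "card x": 60.0, "card y": 45.0}, "control"))
# {'card x': '20% faster', 'card y': '10% slower'}
```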
     
  12. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    My idea was similar: it was actually to keep hold of 1-2 specific cards of a known and generally accepted quantity. For me this means an 8800 GT, GTS or GTX, and keeping that card until DirectX 11 comes out in games, then replacing it with another known quantity. It's to give objectivity to what is (or is perceived to be) a somewhat subjective test. Think of it like the control of using the fastest CPU on a GPU test or vice versa. That way, as the test rig evolves, there is still a baseline that stays the same.

    Your idea is excellent as well! It can be used to show how cards evolve over time. That's how I see it, although you may have other insights that I haven't thought of regarding this.
     
  13. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    That's an implementation detail of the graph mechanism, however. It does imply something else, though: standard benchmark runs, so you can reuse old data from old cards for new reviews without having to rerun everything. Of course, they would need to be redone from time to time due to driver changes anyway, but heh.
     
  14. Borsti

    Newcomer

    Joined:
    Feb 14, 2003
    Messages:
    91
    Likes Received:
    0
    I think both methods have their pros and cons. Across multiple reviews, all readers should get an idea of a product (and if they also read the text and not just look at the graphs, they also find out what's behind it).

    In the end, seeing how many decision makers - who should know better - base all their decisions on just 3DMark scores (even overlooking the CPU factor in the 3DMark06 score), this discussion is "complaining at a very high level".

    On top of that, just see how 3DMark scores of unannounced hardware are being discussed all over the web weeks before launch. Does it make sense without knowing which CPU etc.? No!

    Just my personal 2 cents

    Lars
     
  15. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,756
    Likes Received:
    722
    Location:
    Germany
    Honestly: all these "real world performance" reviews with their pre-recorded timedemos (in particular flybys) never show the real world performance.
    If you want to see the real world performance, you play the game: you find the heaviest part of the game (e.g. a fight with some enemies) and record the fps with FRAPS (for example) during your fight. And you do it again and again and again. Then you look for a part of the game that's not so heavy and you do another FRAPS recording. ...
    Eventually you get so many results that you can evaluate them and say "hey, if you want to play this game, you need XX CPU with XY video card".
    But...
    How many people want to read such reviews?
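    The repeat-and-evaluate step Arnold describes could be summarized per game section roughly as follows. This is a sketch under assumptions: each repetition has been reduced to a list of per-second fps values, and the function name and choice of statistics (average of run averages, run-to-run standard deviation, worst minimum) are illustrative rather than any site's actual method.

```python
from statistics import mean, stdev

def aggregate_runs(runs: list[list[float]]) -> dict[str, float]:
    """Combine repeated recordings of the same game section.

    runs: one list of per-second fps values per repetition.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate run-to-run spread")
    if any(not r for r in runs):
        raise ValueError("empty run")
    averages = [mean(r) for r in runs]
    return {
        "avg_fps": mean(averages),                   # typical result
        "run_to_run_stdev": stdev(averages),         # how repeatable the runs are
        "worst_min_fps": min(min(r) for r in runs),  # worst dip seen in any run
    }

print(aggregate_runs([[30.0, 40.0], [34.0, 36.0], [32.0, 38.0]]))
```

    A large run-to-run spread is the signal that more repetitions are needed before the section's numbers can be trusted.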
     
  16. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Everyone would like to read such reviews if it wasn't the only data available ;) It's just like perf/dollar charts at sites like Hexus: they're a nice bonus, but you wouldn't want that to be the only chart in the entire review. The difficulty thus lies in having that data available with significantly less effort. Traditional techniques (i.e. doing it manually) are fundamentally flawed and doomed.
     
  17. Veridian3

    Newcomer

    Joined:
    Jan 31, 2003
    Messages:
    120
    Likes Received:
    0
    2 comments on a couple of things from this thread that I can answer.

    1) In response to post 2: The fan noise and HD-DVD playback not working.
    Fan noise was a problem in one of the drivers ATI provided between the first press driver and the final driver on Thursday of last week. I haven't seen the fan spinning up and down with the latest driver. HD-DVD playback works fine and our results show that: http://www.driverheaven.net/reviews/AMDX2Review/movies.php

    2) I have to agree with Kyle/Brent on the Crysis changes. We saw much smaller performance increases in our testing compared to what ATI said we should see.
     
  18. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    I guess the proper question to ask now is: why? Why are the performance gains apparent only in the included GPU_benchmark? I can see why ATi got it from there (I don't think they have the time to do real-world testing; they need a practical way to test what their optimisations do, and a game with an included benchmark script is probably a gift to 'em: code driver, install, quick test, voila :D), but why does whatever they did work so well there and do little in other scenarios?

    I'm sure a lot of people will start talking about shens and optimizing only for benchmarks, but I don't think they're naive enough to send the 60% improved driver out and tout it, knowing full well that the gain is limited to the benchmark portion of the game. Not when there are a number of sites testing outside of that who were just waiting for the opportunity for some reaming :)

    Maybe it's something that has to do with what Crysis the game does that it doesn't do in that particular benchmark sequence? Something not necessarily graphics related, or related in some other than obvious way? This turn of events is interesting, that's certain.
     
  19. Richard

    Richard Mord's imaginary friend
    Veteran

    Joined:
    Jan 22, 2004
    Messages:
    3,508
    Likes Received:
    40
    Location:
    PT, EU
    I've always disliked [H]'s move to the "real-world" benchmark methodology because of two factors:

    First, it fails at the goal it sets out to accomplish, "real-world performance", because every setup is different: even when the hardware and driver versions are the same, games will still behave differently depending on the person's Windows setup. So characterising this method as better for determining real-world performance is misleading.

    Secondly, by using a subjective sweet spot for comparing between video cards they're only testing Brent's or Kyle's views on what makes a game playable, whether they prefer higher resolutions versus more AA or whether they feel HDR is more important than AA for instance. So they're not testing "real-world gameplay" either, only their view on what "real-world gameplay" should be.

    Both of these tell me their method does not test video cards but the games instead. "With the same video card I can play game A at 1600x1200 at 40fps while game B only managed 1280x1024 at 35fps".

    My problem is not with playing the game versus running timedemos to get raw data (even though not all games' timedemos are divorced from real gameplay performance). My problem is with how they choose to interpret and present that data.

    Having said that, no method is perfect and the default "see which apple is fastest" does have problems too. But I'd rather get more objective data and reach my own conclusions than try to interpret subjective results that happen to be quite disparate from my own needs. Their method is quite valid, and I'm sure people whose gameplay fps thresholds and IQ decisions are similar to those of Brent and Kyle will appreciate it, but I don't fall into that group.
     
  20. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    I think many of us have been very clear over time that we see some value in the way you do it, particularly as part of the larger mosaic that gets built up across all the sites. In the last couple years it's really felt like you are the one who has become intolerant of others hearing a different drummer.
     


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.