On TechReport's frame latency measurement and why gamers should care

Discussion in '3D Hardware, Software & Output Devices' started by Andrew Lauritzen, Jan 1, 2013.

  1. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    2,462
    Location:
    Funny, It Worked Last Time...
  2. caveman-jim

    Regular

    Joined:
    Sep 19, 2005
    Messages:
    305
    Location:
    Austin, TX
    Yes, I concur. What the 99% time doesn't show is uneven frame render times hiding underneath that target. If you're aiming for 10ms, a swing of roughly 40% (4ms) might not be perceptible (e.g. a section of the game runs at 10, 10, 10, 6, 6, 10, 6, 8, 10ms but still reports a 99% time of 10ms), but at 22ms a swing of the same proportion might be (a section runs at 22, 22, 13, 13, 13, 22, 22, 22, 13, 13, 17, 22, 22, 13, 13, 22, 22ms yet the 99% time is still 22ms). If you were looking for a 45fps performance baseline, that 99% time of 22ms would satisfy you, but the experience wouldn't, because of the variation inside it.
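
    Just to illustrate with those numbers (my own rough sketch, nothing from the article): a rolling standard deviation computed alongside the percentile catches exactly this kind of in-window swing.

        // Rough sketch: the 99% (99th-percentile) frame time vs. a rolling standard
        // deviation over the same data. The 8-frame window is an arbitrary choice.
        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        double percentile(std::vector<double> times, double p) {
            std::sort(times.begin(), times.end());
            return times[static_cast<size_t>(p * (times.size() - 1))];
        }

        double worst_rolling_stddev(const std::vector<double>& times, size_t window) {
            double worst = 0.0;
            for (size_t i = 0; i + window <= times.size(); ++i) {
                double mean = 0.0, var = 0.0;
                for (size_t j = i; j < i + window; ++j) mean += times[j];
                mean /= window;
                for (size_t j = i; j < i + window; ++j) var += (times[j] - mean) * (times[j] - mean);
                worst = std::max(worst, std::sqrt(var / window));
            }
            return worst;
        }

        int main() {
            // The 22ms example above: the percentile says "45 fps", but the
            // frame-to-frame swings of ~9ms stay invisible to that number.
            std::vector<double> t = {22, 22, 13, 13, 13, 22, 22, 22, 13, 13, 17, 22, 22, 13, 13, 22, 22};
            std::printf("99%% time: %.0f ms\n", percentile(t, 0.99));
            std::printf("worst rolling stddev: %.1f ms\n", worst_rolling_stddev(t, 8));
        }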



    I wonder how power-limiting technology will affect gaming performance. Right now GPUs clock down under full load to stay within TDP (yeah, I know, the message is that they turbo up when the load is light, but it's the same thing). Frame rate limiting and vsync leave TDP headroom on the table, so I wonder if the next step is power-aware geometry/AA/compute :D ... this may be off topic for this discussion though.
     
  3. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    I don't know what is going on with Borderlands 2; I work on OpenCL. However, I think you are being a bit too judgmental here. First, when a new effect is used, the driver must compile the shaders involved. Yes, this means optimizing the shaders too. You can experience that as a "hitch" during gameplay, but once the effects are compiled, you won't experience that hitch again unless the shaders are recreated (i.e. on the next level :p). Second, some features are emulated now (think fog), so if an API feature were enabled that required recompilation of shaders, you might experience a "hitch". Third, consoles avoid all of this easily, as applications can ship precompiled shaders for every effect used.

    OpenCL also allows for precompiled kernels to be saved/loaded. This is an important feature as some kernels are huge and can take several minutes to compile. This also allows for IP protection as you can just ship a stripped binary and not your source code. This would be a nice feature for Direct3D, but I don't imagine game developers would want to ship precompiled shaders for every possible GPU out there. So it would be better if the game could compile the shaders as they are used and save them to the hard drive. Then the game would just reload the binaries as they are used/needed. If you happened to change GPUs in your machine, then the game would have to recompile the shaders again, but that wouldn't be a huge deal unless you were changing GPUs constantly. Of course, this wouldn't necessarily catch cases where state changes caused recompilation. Perhaps you could create a query to check if there was recompilation.
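
    For reference, the OpenCL save/load path looks roughly like this (a minimal single-device sketch with error handling omitted; the file plumbing around it is just illustration):

        // Save a built program's device binary, then reload it on a later run
        // instead of recompiling from source.
        #include <CL/cl.h>
        #include <cstdio>
        #include <vector>

        void save_binary(cl_program prog, const char* path) {
            size_t size = 0;
            clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, nullptr);
            std::vector<unsigned char> blob(size);
            unsigned char* ptr = blob.data();
            clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, nullptr);
            FILE* f = std::fopen(path, "wb");
            std::fwrite(blob.data(), 1, blob.size(), f);
            std::fclose(f);
        }

        cl_program load_binary(cl_context ctx, cl_device_id dev, const char* path) {
            FILE* f = std::fopen(path, "rb");
            std::fseek(f, 0, SEEK_END);
            size_t size = static_cast<size_t>(std::ftell(f));
            std::fseek(f, 0, SEEK_SET);
            std::vector<unsigned char> blob(size);
            std::fread(blob.data(), 1, size, f);
            std::fclose(f);
            const unsigned char* ptr = blob.data();
            cl_int status, err;
            cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size, &ptr, &status, &err);
            clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr); // near-instant vs. a full compile
            return prog;
        }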

    PCs and consoles give different experiences, but some of this is not really under the control of the driver/GPU.
     
  4. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,517
    Location:
    British Columbia, Canada
    Haha, agreed! And ideally vtune traces with driver symbols too ;)

    Not really, no. As Scott mentions, you basically just check off an option in FRAPS that dumps the raw data to a file, and FRAPS works in basically everything.

    Definitely true, and there are potentially some other metrics that would capture this better, such as some sort of running variance/deviation metric. I think there are definitely other interesting ways to analyze the data beyond what Scott has done, for instance, but I just want to get over the first hurdle to start :)

    It's definitely going to get really interesting, no doubt...

    As I've mentioned in previous responses, I wasn't referencing the AMD issue there specifically, just using examples of why FPS is a bad metric.

    Right, i.e. "state-based recompiles", but these get nastier than just fixed-function features. Stuff like changing certain rasterizer/blend state, bound texture types (2D/3D/Cube/etc.) and so on can also cause recompiles, and which state is "dynamic" vs. "compiled in" is implementation-dependent and completely opaque. ... and it definitely does happen in the middle of gameplay. This sort of thing has to stop in the long run, but it implies more general hardware, and people don't want to pay the price.

    But yeah, to reiterate, I'm not claiming that's what's happening here (it's probably more memory-related stuff, as usual), but it's yet another "hitch/stutter" that is ignored when measuring using FPS. Also, I think I'm allowed to be "judgmental" about the state of the industry here, both as a gamer and a developer at an IHV (although I don't specifically do drivers) :) I just want us all to concentrate on improving the gaming experience in this area a little bit... I don't think it'll be a ton of work, but it requires redefining our performance metrics.
     
    #24 Andrew Lauritzen, Jan 2, 2013
    Last edited by a moderator: Jan 2, 2013
  5. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Yep, I know. I was just saying that "avoiding all hitches" isn't practical in some cases, at least with current APIs.
    More general hardware implies more recompilation, not less; it's the removal of fixed-function bits that triggers some of these things.
    This is one issue with benchmarking. Getting a hitch on the first instance of a new effect might be annoying, but if you're using those same effects for the next 30 minutes, who will remember the hitch that happened 29 minutes ago?
     
  6. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,517
    Location:
    British Columbia, Canada
    What I mean by "more general" here is that stuff like texture units should be able to handle all of the formats and modes exposed by the APIs totally dynamically (ideally per-lane once you go bindless). It's allowed to slow down on more complex requests, of course, but it shouldn't require software statically looking at the bound state and patching little tweaks into the shader when it changes.

    Having been part of several API and hardware iterations, I'm well aware of the trade-offs here, but ultimately there really are just two solutions... either APIs go lower level and expose implementation details, or hardware actually obeys the higher-level commands. The current status quo is not really great.

    But "new effects" come up all the time in games, so it's not really okay to hitch when you see something you don't recognize. When I walk around a corner and oh my gosh there's a new shader, it's not okay to take 100ms to work it out. We only accept it because that's how it has always been, not because it really needs to be that way.

    And like I said, I'm well versed in the problems as both a gamer and a developer, and I'm not trying to trivialize them; rather, I'm trying to get us focused on metrics and hardware/software improvements that actually better model the gamer experience, and to stop just mindlessly cramming more ALUs onto GPUs so that I can render with 6 30" monitors instead of 5 (slightly kidding here, but you get my point). And I'm preaching to myself and my own employer as much as anyone else :)
     
    #26 Andrew Lauritzen, Jan 2, 2013
    Last edited by a moderator: Jan 2, 2013
  7. CNCAddict

    Regular

    Joined:
    Aug 14, 2005
    Messages:
    288
  8. I.S.T.

    Veteran

    Joined:
    Feb 21, 2004
    Messages:
    2,928
  9. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    12,914
    Can't the game pre-compile shaders during level load (a bit like how UT pre-caches textures in the D3D renderer, though it doesn't pre-cache in OGL)?
     
  10. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Yes, they can. But some games only do this for benchmark runs. Here's how some games (and benchmarks) "warm" the system for benchmarking:
    - Load critical assets
    - Draw frame
    - Prior to calling Present(), clear screen to black
    - Draw "Loading..." or some other progress meter
    - Call Present()
    - etc.
    - Once all assets are "warm", run normal benchmark

    Obviously, you don't want to double the benchmark time by running through all the frames, so shortcuts are taken ("time" could be sped up to reduce the number of frames generated, for example).
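
    In pseudocode, that warm-up pass amounts to something like this (Scene/Renderer are made-up placeholders, not any real engine API):

        // Sketch of a benchmark "warming" pass: fast-forward through the timeline,
        // drawing every effect once so the driver compiles everything up front,
        // while showing the user a loading screen instead of the warm-up frames.
        struct Scene {
            void LoadCriticalAssets();
            void AdvanceTo(float seconds);     // jump game time forward
            float BenchmarkLength() const;
        };
        struct Renderer {
            void DrawSceneFrame(const Scene&); // triggers any lazy shader compiles
            void ClearBackBufferToBlack();     // hide the warm-up frame
            void DrawLoadingOverlay(float progress);
            void Present();
        };

        void WarmUpForBenchmark(Scene& scene, Renderer& r) {
            constexpr float kStep = 0.5f;      // sped-up "game time" per warm-up frame
            scene.LoadCriticalAssets();
            for (float t = 0.0f; t < scene.BenchmarkLength(); t += kStep) {
                scene.AdvanceTo(t);
                r.DrawSceneFrame(scene);
                r.ClearBackBufferToBlack();
                r.DrawLoadingOverlay(t / scene.BenchmarkLength());
                r.Present();
            }
        }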
     
  11. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,517
    Location:
    British Columbia, Canada
    Right but the issue is which state gets "compiled in" with a shader and which is dynamic is opaque. It's unreasonable for the application to assume that *all* state is compiled in and go through all of it (hey, maybe an implementation has to recompile a shader when a bound constant buffer size changes, who knows!), so there's not really a reasonable solution on PC.

    Pretty much all games do create shaders at level load time, but many drivers compile them lazily due to pulling in additional compiled state at draw call time. The idea with DX10's state structures was to try and eliminate some of that by grouping up relevant state and declaring it immutable up front, but of course it doesn't map perfectly to any one implementation and thus it ends up being fairly useless to that end as well.

    One interesting question, though, is why graphics drivers don't at least cache compiled shaders across runs (i.e. to disk or similar). Is it purely concerns over reverse engineering (which would seem odd these days)?
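
    Something as simple as this would be the idea (purely illustrative, obviously not how any actual driver works; assume the key already covers the bytecode plus whatever state gets compiled in):

        // On-disk cache of compiled GPU binaries, keyed by a 64-bit hash.
        // Miss -> compile and store; hit -> skip the compile on the next run.
        #include <cstdint>
        #include <filesystem>
        #include <fstream>
        #include <iterator>
        #include <optional>
        #include <string>
        #include <vector>

        using Blob = std::vector<char>;

        std::filesystem::path CachePath(uint64_t key) {
            return std::filesystem::path("shader_cache") / (std::to_string(key) + ".bin");
        }

        std::optional<Blob> LoadCachedBinary(uint64_t key) {
            std::ifstream f(CachePath(key), std::ios::binary);
            if (!f) return std::nullopt;  // miss: caller compiles and calls StoreCachedBinary
            return Blob(std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>());
        }

        void StoreCachedBinary(uint64_t key, const Blob& gpuBinary) {
            std::filesystem::create_directories("shader_cache");
            std::ofstream f(CachePath(key), std::ios::binary);
            f.write(gpuBinary.data(), static_cast<std::streamsize>(gpuBinary.size()));
        }

    Changing GPUs or driver versions just changes the key (or invalidates the directory), which matches the earlier point that recompiling after a hardware swap is acceptable.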
     
    #31 Andrew Lauritzen, Jan 2, 2013
    Last edited by a moderator: Jan 2, 2013
  12. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,009
    Location:
    O Canada!
    And note, this has been done for a long time - remember people complaining about BF3 load times? This was why...
     
  13. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Nvidia does cache compute kernels, but not Direct3D shaders. I presume the issue is the thousands of shaders a single game creates.
     
  14. Homeles

    Newcomer

    Joined:
    May 25, 2012
    Messages:
    234
  15. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,517
    Location:
    British Columbia, Canada
    But I mean they already ship the bytecode for those shaders. Certainly it can get out of hand with permutations, as in the original Far Cry (IIRC), where the patches were massive because small changes forced the full cross product of shaders to be recompiled/redistributed, but I can't imagine the overhead being massively high for the number of shaders that typical games use. Also, with the recent popularity of deferred shading and "ubershaders", the issue isn't as bad as it was a few years ago. Of course the "cache" can have a maximum size and an eviction policy as well.
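
    To make that last point concrete, a byte-budgeted LRU is only a handful of lines (illustrative only, keyed on some shader hash, nothing driver-specific):

        // Cache of compiled shader binaries with a byte budget; least recently
        // used entries are evicted when the budget is exceeded.
        #include <cstddef>
        #include <cstdint>
        #include <list>
        #include <unordered_map>
        #include <vector>

        class ShaderBinaryCache {
        public:
            explicit ShaderBinaryCache(size_t budgetBytes) : budget_(budgetBytes) {}

            void Put(uint64_t key, std::vector<char> blob) {
                Erase(key);                                      // replace any existing entry
                used_ += blob.size();
                lru_.push_front(key);
                entries_[key] = Entry{std::move(blob), lru_.begin()};
                while (used_ > budget_ && !lru_.empty())         // evict least recently used
                    Erase(lru_.back());
            }

            const std::vector<char>* Get(uint64_t key) {
                auto it = entries_.find(key);
                if (it == entries_.end()) return nullptr;        // miss: caller compiles
                lru_.splice(lru_.begin(), lru_, it->second.pos); // mark most recently used
                return &it->second.blob;
            }

        private:
            struct Entry { std::vector<char> blob; std::list<uint64_t>::iterator pos; };

            void Erase(uint64_t key) {
                auto it = entries_.find(key);
                if (it == entries_.end()) return;
                used_ -= it->second.blob.size();
                lru_.erase(it->second.pos);
                entries_.erase(it);
            }

            size_t budget_;
            size_t used_ = 0;
            std::list<uint64_t> lru_;
            std::unordered_map<uint64_t, Entry> entries_;
        };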

    Anyways you may be right about size concerns, but for some games I could see it being a benefit.

    And if the driver is adding too large a cross product of its own on top of what the application is requesting in terms of shaders, that's a problem too :)
     
    #35 Andrew Lauritzen, Jan 3, 2013
    Last edited by a moderator: Jan 3, 2013
  16. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    7,906
    Location:
    WI, USA
    Going way back - Far Cry has a big shader cache directory AFAIR.
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    6,829
    Location:
    Well within 3d
    Just for clarity's sake, do you mean it is objectionable that a driver have its own thread, or that its thread has a disproportionate share of active CPU cycles?
    It seems reasonable to give a program tasked with arbitrating between two systems of arbitrary composition, running at user-interactive rates over a long(ish)-latency interface, at least some freedom from the blockage it could suffer if it shared room in the same loop with other functionality.

    Copyright?
    Hasn't this come up with earlier attempts at binary translators for different CPU ISAs?
    Copyright lawsuits have been brought to bear over mere in-memory copies of copyrighted data, never mind copies of software stored on disk in a translated form.
    If there's a possible advantage of a console versus an open PC, it's that the implied locked-down nature of the console and licensing to the console company may provide consent for such an action, whereas storing a transformed copy of a work by one unknown party for the use of someone similarly unknown provides nothing to assuage content creators.
     
  18. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    688
    I wouldn't think the size (in bytes) is critical here so much as the number of permutations. You have to implement a search-tree solution which guarantees that you can access cached code faster than you can compile it, and hash-keying the multidimensional space of shader code + externals isn't exactly free (see the sketch below).
    Graphics IHVs are not Oracle; they don't have DB-performance groups, and they shouldn't need them.
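
    To put some flesh on the "hash-keying" part, the key computation itself might look something like this (illustrative only; the function names and the choice of state words are made up):

        // FNV-1a over the shader bytecode plus the "external" state words that
        // get compiled into the shader; the hash itself is cheap, the real cost
        // is in collision handling and the lookup structure behind it.
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        uint64_t Fnv1a(const void* data, size_t size, uint64_t h = 0xcbf29ce484222325ull) {
            const auto* p = static_cast<const unsigned char*>(data);
            for (size_t i = 0; i < size; ++i) {
                h ^= p[i];
                h *= 0x100000001b3ull;  // 64-bit FNV prime
            }
            return h;
        }

        uint64_t ShaderCacheKey(const std::vector<char>& bytecode, uint32_t blendState,
                                uint32_t rasterState, uint32_t boundTextureTypes) {
            uint64_t h = Fnv1a(bytecode.data(), bytecode.size());
            h = Fnv1a(&blendState, sizeof(blendState), h);
            h = Fnv1a(&rasterState, sizeof(rasterState), h);
            h = Fnv1a(&boundTextureTypes, sizeof(boundTextureTypes), h);
            return h;
        }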

    As far as I can see, no issue has been raised yet that couldn't be dealt with on the developer side. You did say, though, "maybe we should start caring about worst case instead of throughput" (freely paraphrased :smile:), and that's what I think as well, but that has to sink into the programming patterns of engine programmers. How do you do that? Get more embedded/realtime-systems programmers into the game industry?
     
  19. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    2,462
    Location:
    Funny, It Worked Last Time...
    BF2 no?
     
  20. Billy Idol

    Legend Veteran

    Joined:
    Mar 17, 2009
    Messages:
    5,613
    Location:
    Europe
    Thanks Andrew, great post, great topic. In Far Cry 3, FRAPS showed good fps for me, but it all felt jittery (in the village), and I scratched my head and couldn't understand what was going on. You've given me good insight and a possible explanation. So in short, I fully agree with you that (averaged) fps should not be the only measurement/benchmark. Digital Foundry's analysis, for example, also includes controller input latency for certain games.

    Would a framerate-smoothing-by-interpolation approach, as presented by The Force Unleashed developers, help decrease the impact of such performance spikes (or maybe make it even worse) and reduce the jittery feeling?
     
