I've been a big proponent of TechReport's frame latency measurement and analysis. Scott describes the method and why they started using it pretty well here (http://techreport.com/review/21516/i...e-benchmarking), but after reading the comments following recent articles, it's clear to me that some people don't understand why measuring frame latencies is important, and really why measuring frame rates is borderline useless if your goal is to understand the user experience. Since Twitter's 140 characters are insufficient for this sort of thing and I don't currently have a forum account at TR, I thought I'd write down my thoughts here. I don't imagine any of this will be news to the industry folks here, but it seemed as reasonable a place to post as any.
My goal here is really to convince people of the following: the GPU rendering pipeline actually spans the CPU and GPU, and a spike anywhere will cause issues with fluidity. Thus for the purposes of gamers, measuring frame latencies is far superior to any sort of throughput measurement like frame rates.
Let's consider first the goal: smooth motion. Our eyes are actually very well adapted to tracking predictable motion and thus being able to focus and see detail even on moving objects (and/or when we are moving). This is kind of basic and intuitive, but the point is that for our eyes to perceive smooth motion in a game (or video), objects have to move consistently with respect to when frames are displayed. For instance, if an object moves across the screen in one second and frames are displayed every 1/10 second, it's important that the first frame shows the object 1/10 of the way across the screen, the second frame 2/10 of the way, etc. for the motion to appear smooth and trackable by our eyes.
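To make the arithmetic concrete, here's a tiny sketch (illustrative numbers only, not from any real engine) of where the object should be on each displayed frame:

```python
# Hypothetical example: an object crosses the screen in one second while
# frames are displayed every 1/10 second. For the motion to look smooth,
# each frame must show the object at a position proportional to the time
# at which that frame is displayed.

def object_position(frame_index, frame_interval=0.1, speed=1.0):
    """Fraction of the screen crossed when frame `frame_index` is displayed."""
    return speed * frame_index * frame_interval

positions = [object_position(i) for i in range(1, 4)]
# Frame 1 shows the object 1/10 of the way across, frame 2 at 2/10, and so on.
```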
That's the ideal, but how do we accomplish it? To understand that, one really needs to understand how modern graphics APIs and pipelines work at a high level. For our purposes, we're mostly just interested in how the CPU, GPU and display interact so we'll focus on those portions.
Generally, one can think of the graphics pipeline as a series of feed-forward queues connecting various stages (some on the CPU, some on the GPU). It's basically like an assembly line where the "pipeline stages" are the stations and the queues are bins that buffer the inputs to the next station. At one end we feed in rendering commands based on the state of the game world at a given time, and at the other end we produce rendered frames on the display. The question is, how much should we advance that game world time between frames?
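As a rough sketch (the queue depth and stage names here are my own illustrative choices), the assembly-line structure looks something like this:

```python
from collections import deque

# Toy model of the feed-forward pipeline: the game pushes rendering commands
# into a bounded queue, and the GPU pulls from it and pushes rendered frames
# toward the display. Queue depth and names are illustrative.

MAX_QUEUED = 3                 # frames the pipeline will buffer ahead

submit_queue = deque()         # game -> driver/GPU
display_queue = deque()        # GPU -> display

def try_submit(frame):
    """The game may only feed the pipeline while the queue has space."""
    if len(submit_queue) < MAX_QUEUED:
        submit_queue.append(frame)
        return True
    return False               # no space: the game has to wait

def gpu_step():
    """The GPU consumes one queued frame and emits it toward the display."""
    if submit_queue:
        display_queue.append(submit_queue.popleft())

for f in range(5):
    try_submit(f)              # only the first MAX_QUEUED submissions succeed
```

Once the bin between stations fills up, the earlier station simply cannot hand off more work until the later one drains a frame, which is the mechanism everything below hinges on.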
Recall that we want to have things coming out of the end of the pipeline (i.e. display) at a consistent rate. If we knew that rate, we could use it as our time step for the game systems. For instance, if we know that frames are displayed every 1/10 of a second, we can always just move our objects that far for each frame that we submit. Unfortunately, the rate of output - aka throughput - of the pipeline depends on the slowest stage, which will vary depending on the hardware, workload, and many other factors (even from frame to frame).
The simplest solution to this problem is to make sure that we know the worst case throughput (in this case, the slowest frame we'll ever render) and just slow down the entire pipeline to run at that rate. This is basically the strategy used on consoles, where a fixed set of hardware makes it feasible. Pick a target frame rate - usually 30fps - then make sure that you only submit a frame for rendering every 1/30th of a second. As long as the pipeline can consume and spit out every frame before the next one arrives, there is no buffering of frames at the front of the pipeline and you get consistent, smooth frames. Denying work like this is called "starving" the pipeline, and GPU folks typically don't like it because it means the GPU will be idle for a portion of non-worst-case frames (because we're intentionally slowing it down in those cases), and that makes average frames per second numbers look worse.
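A minimal sketch of that fixed-rate strategy (in Python purely for illustration; a real console engine obviously looks nothing like this):

```python
import time

TARGET_DT = 1.0 / 30.0   # worst-case frame time we promise to stay under

def run_fixed_rate(num_frames, render_frame):
    """Submit one frame every TARGET_DT, even when we finish early."""
    next_submit = time.perf_counter()
    for i in range(num_frames):
        render_frame(i, TARGET_DT)            # simulation always steps 1/30 s
        next_submit += TARGET_DT
        sleep_for = next_submit - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)             # intentionally starve the GPU

start = time.perf_counter()
run_fixed_rate(6, lambda i, dt: None)         # a trivially cheap "frame"
elapsed = time.perf_counter() - start
# Even though each frame costs almost nothing, six frames take ~6/30 s,
# because we deliberately slow the pipeline to its worst-case rate.
```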
Unfortunately, this strategy is unworkable when you don't have a fixed platform. It's not really possible to scale an engine so that it can perfectly balance fast GPU/slow CPU, fast CPU/slow GPU and everything in between, so where the bottleneck is going to lie on a user's system is not known in advance. Furthermore, even on fixed platforms like consoles you need a fallback strategy, since it isn't possible to guarantee that every single frame will be done in the allotted time. We need some way to respond to the actual realized throughput of the pipeline at a given time.
The way this is usually done is by observing "back-pressure" from the pipeline. If you pick a static number of frames that you are willing to buffer up at the start of the rendering pipeline (usually 1-3), when there's no more space in the queue you simply halt the game and wait until space is available. Then, assuming that the pipeline is running in a relatively steady state, the game can time how long it takes from the point that it fills the queue to the point where another slot opens up to get the rate of output of the pipeline. This is what pretty much every game does, and this is the same timing that FRAPS is measuring. GPU vendors like this strategy because it ensures that the GPU usually has more work waiting in the queue and thus will run at full throughput and show high frames per second numbers.
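Here's a rough simulation of that timing loop. The "present" call below is a stand-in of my own: it just sleeps for the GPU's frame time when the queue is full, whereas a real driver blocks inside Present/SwapBuffers.

```python
import time
from collections import deque

GPU_FRAME_TIME = 0.016   # pretend the GPU takes 16 ms per frame
MAX_QUEUED = 3

queue = deque()

def present():
    """Stand-in for Present(): blocks only when the queue is full."""
    if len(queue) >= MAX_QUEUED:
        time.sleep(GPU_FRAME_TIME)   # wait for the GPU to drain one frame
        queue.popleft()
    queue.append("frame")

timesteps = []                       # the dt the game would feed its simulation
last = time.perf_counter()
for _ in range(6):
    present()
    now = time.perf_counter()
    timesteps.append(now - last)     # this interval is also what FRAPS logs
    last = now
# The first MAX_QUEUED submits return almost instantly (the queue is filling);
# once it's full, each interval settles near the GPU's 16 ms frame time.
```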
But note the emphasis in the previous paragraph, as this is the key point I'm trying to make: the back-pressure strategy that basically all games use to adjust to the throughput of a given hardware/software configuration is based on the assumption that the rendering pipeline is running in a consistent, steady state. Any spikes throw off that assumption and will produce jittery output.
Let's consider a practical example, shamelessly stolen from one of Scott's articles (http://techreport.com/review/24022/d...in-windows-8/2) and marked up by yours truly:
In section A, the game is humming along in a relatively steady state. It's reasonable to assume that the game is GPU bound here, and each frame is taking about 16ms or so to render (i.e. ~63fps). But then at B, something happens to make the CPU wait far longer than usual to submit a frame to the pipeline. It's possible that the work submitted 1-3 frames ago took an exceptionally long time to run on the GPU and this is a legitimate stall on the back-pressure from the rendering pipeline. However, given the frames that follow, that's unlikely.
In section C something really interesting happens... frames all of a sudden are being submitted faster than the previous steady state throughput. What is likely happening here is that frame B hit some sort of big CPU spike and while the CPU was grinding away on whatever that was, the GPU was still happily reading frames from its input queue. Note that during this time, the GPU is still consuming frames at 16ms/frame and displaying smooth output... recall that these frames were queued earlier, when the system was still in a steady state.
Now eventually the GPU mostly or entirely empties its input queue and may even go idle waiting for more work. Finally the CPU submits frame B and goes on to the next frame, but when it finishes the subsequent frame, the GPU's queue still has space so it does not block. This continues for a few frames until the queue is filled again. In section C we see that the latency of the CPU work to simulate and render a frame appears to be around 10ms, so the game runs at that rate until it is blocked by back-pressure again (frame 1625) and the system returns to a steady, GPU-bound state.
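The whole sequence can be sketched with a small discrete-event model. The numbers below are my own simplification (fixed 16ms GPU frames, a 3-deep queue, steady 10ms CPU frames with one 38ms spike), but they reproduce the B/C shape:

```python
def simulate(cpu_times, gpu_time=0.016, max_queued=3):
    """Model a CPU -> bounded queue -> GPU pipeline.

    cpu_times[i] is the CPU cost of frame i. Returns the time each frame
    was submitted (what FRAPS would log) and the time it left the GPU.
    """
    submit, display = [], []
    gpu_free = 0.0
    t = 0.0                                       # CPU clock
    for i, cpu in enumerate(cpu_times):
        t += cpu                                  # CPU work for this frame
        if i >= max_queued:                       # queue full: blocked until
            t = max(t, display[i - max_queued])   # an older frame displays
        submit.append(t)
        start = max(t, gpu_free)                  # GPU starts when frame arrives
        gpu_free = start + gpu_time
        display.append(gpu_free)
    return submit, display

# Steady 10 ms CPU frames, one 38 ms spike (frame "B"), then 10 ms again.
cpu = [0.010] * 8 + [0.038] + [0.010] * 8
submit, display = simulate(cpu)
submit_dts = [b - a for a, b in zip(submit, submit[1:])]
display_dts = [b - a for a, b in zip(display, display[1:])]
# The FRAPS-style submit intervals show a 38 ms spike followed by ~10 ms
# catch-up frames, yet the display receives a frame every 16 ms throughout:
# the queue absorbed the spike, exactly as in sections B and C above.
```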
The other reason to assume that this is a CPU spike is because GPU spikes are a lot more rare. GPUs are typically fed a simple instruction stream and operate on it with fairly predictable latencies. It's certainly possible for a game to feed one very expensive frame to the pipeline causing a GPU spike, but that would be visible on all GPUs, not just a single vendor. Usually if some sort of bizarre event happens that requires a pile of irregular work to restructure rendering, that's a job for the CPU/driver. So while GPU hardware design can definitely affect how much work needs to be done on the CPU in the driver, it's unlikely that these sorts of spikes are actually occuring on the GPU. Usually its either game code, which would show up regardless of which GPU is used, or the graphics driver, which is almost certainly the case here.
So during sections B and C the GPU may well be happily delivering frames to the display at a consistent rate, so where's the problem? Remember, the CPU is timing the rate at which it is allowed to fill the input queue and using that as the rate at which it updates the game simulation. Thus a long frame like the one at B causes it to conclude that the rendering pipeline has slowed all of a sudden, and to start updating the game simulation by 38ms each frame instead of 16ms; i.e. onscreen objects will move more than twice as far on the frame following B as they did previously. In this case though, it was just a spike and not a change in the underlying steady state rate, so the following frames (C) effectively have to move less (10ms each) to resynchronize the game with the proper wall clock time. This sort of "jump ahead, then slow down" jitter is extremely visible to our eyes, and demonstrated well by Scott's follow-up video using a high speed camera. Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the effect of the game adjusting how far it steps the simulation in time each frame.
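To see why that is so visible, consider an object moving at constant speed while the game steps it by the measured intervals (the dt values below are illustrative, in the spirit of the graph above):

```python
SPEED = 1.0                          # screen-widths per second (arbitrary)

# FRAPS-style intervals: steady 16 ms frames, one 38 ms spike, then the
# ~10 ms catch-up frames. The game advances the object by each measured dt.
measured_dts = [0.016] * 4 + [0.038] + [0.010] * 3

per_frame_motion = []
for dt in measured_dts:
    per_frame_motion.append(SPEED * dt)   # distance moved on this frame
# On the frame after the spike the object jumps more than twice as far as
# usual (0.038 vs 0.016), then under-moves on the catch-up frames, even if
# the display is refreshing at a perfectly even cadence.
```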
The astute reader might then ask: "If the problem occurs because the CPU sees a spike and assumes it is a change in throughput, why not just smooth out or ignore spikes?" To some extent this does indeed work... see for instance Emil Persson's clever microstuttering fix (http://forum.beyond3d.com/showthread.php?t=49514). However, it's basically guesswork; if you're wrong and it was a true change in throughput, then you have to speed up the next bunch of frames to "catch up" instead of slowing them down. There's no way to know for sure how long a frame is going to take before it gets to the display, so any guess that a game makes might be wrong and the simulation will need to adjust for the error over the next few frames.
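A sketch of that guesswork (the median filter and the 25%-per-frame error payback are my own illustrative choices, not Emil Persson's exact method):

```python
from statistics import median

def smoothed_steps(raw_dts, window=5):
    """Step the simulation by a filtered dt, repaying the error gradually.

    `debt` is wall-clock time not yet reflected in the simulation; when a
    guess is wrong, it gets paid back (or clawed back) over later frames.
    """
    history, steps = [], []
    debt = 0.0
    for dt in raw_dts:
        history = (history + [dt])[-window:]
        est = median(history)            # guess the steady-state frame time
        step = est + 0.25 * debt         # repay a quarter of the error
        debt += dt - step
        steps.append(step)
    return steps

raw = [0.016] * 5 + [0.038] + [0.016] * 10
steps = smoothed_steps(raw)
# The 38 ms spike never reaches the simulation as a 38 ms step; it is spread
# over several slightly longer frames, and total simulated time still tracks
# total wall-clock time.
```

The cost of the guess is visible in the `debt` term: if the spike had actually been a real throughput change, the filter would keep under-stepping and the catch-up would stretch over many frames instead.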
Ok, I've thrown a lot of information at you and I don't expect too many people to have read it word for word, so I'll summarize and draw a few conclusions. If you feel yourself getting angry, please do me the courtesy of actually reading the above explanation and pointing out where it's wrong before responding though.
1) Smooth motion is achieved by having a consistent throughput of frames all the way from the game to the display.
2) Games measure the throughput of the pipeline via timing the back-pressure on the submission queue. The number they use to update their simulations is effectively what FRAPS measures as well.
3) A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. i.e. it is never okay to see spiky output in frame latency graphs.
4) The converse is actually not true: seeing smooth FRAPS numbers does not guarantee you will see smooth display, as the pipeline could be producing output to the display at jittery intervals even if the input is consistent. This is far less likely though since GPUs typically do relatively simple, predictable work.
5) Measuring pipeline throughput (frames/second) over any multi-frame interval (entire runs, or even over a second like HardOCP) misses these issues because - as we saw above - throughput balances to the output of the pipeline when there are CPU spikes, so you'll have shorter frames after a long one. However, frames that have had widely differing simulation time updates do not look smooth to the eye, even if they are delivered at a consistent rate to the display.
6) The problem that Scott most recently identified on AMD is likely a CPU driver problem, not a GPU issue (although the two can be related in some cases).
7) This is precisely why it is important to measure frame latencies instead of frame throughput. We need to keep GPU vendors, driver writers and game developers honest and focused on the task of delivering smooth, consistent frames. For example (unrelated to the above case), allowing things like a driver causing a spike on one frame so that it can optimize the shaders and get a higher FPS number for the benchmarks is not okay.
8) If what we ultimately care about is smooth gameplay, gamers should be demanding frame latency measurements instead of throughput from all benchmarking sites.
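As a quick numeric check of points 3 and 5 (all numbers made up for illustration): two frame-time traces can have identical average frame rates while one of them stutters badly.

```python
# Two traces with the same average frame rate but very different smoothness:
# averaging over the whole interval can't tell them apart, per-frame
# latencies can. Numbers are illustrative.

smooth = [0.016] * 10                          # ten even 16 ms frames
spiky = [0.016] * 5 + [0.040] + [0.010] * 4    # same total time, one spike

def avg_fps(frame_times):
    return len(frame_times) / sum(frame_times)

def worst_frame(frame_times):
    return max(frame_times)
# avg_fps is identical (62.5 fps) for both traces, but the spiky trace has a
# 40 ms worst frame that the average completely hides.
```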
Anyways, enough soapboxing for one day.
Hopefully I've at least given people an idea of how games, drivers and GPUs interact and why it's critical to measure performance differently if we want to end up with better game experiences.