Old 01-Jan-2013, 20:40   #1
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default On TechReport's frame latency measurement and why gamers should care

I've been a big proponent of TechReport's frame latency measurement and analysis. Scott describes the method and why they started using it pretty well here (http://techreport.com/review/21516/i...e-benchmarking), but after reading the comments following recent articles, it's clear to me that some people don't understand why measuring frame latencies is important, and really why measuring frame rates is borderline useless if your goal is to understand the user experience. Since Twitter's 140 characters are insufficient for this sort of thing and I don't currently have a forum account at TR, I thought I'd write down my thoughts here. I don't imagine any of this will be news to the industry folks here, but it seemed as reasonable a place to post as any.

My goal here is really to convince people of the following: the GPU rendering pipeline actually spans the CPU and GPU, and a spike anywhere will cause issues with fluidity. Thus for the purposes of gamers, measuring frame latencies is far superior to any sort of throughput measurement like frame rates.

Let's consider first the goal: smooth motion. Our eyes are actually very well adapted to tracking predictable motion and thus being able to focus and see detail even on moving objects (and/or when we are moving). This is kind of basic and intuitive, but the point is that for our eyes to perceive smooth motion in a game (or video), objects have to move consistently with respect to when frames are displayed. For instance, if an object moves across the screen in one second and frames are displayed every 1/10 second, it's important that the first frame shows the object 1/10 of the way across the screen, the second 2/10 way, etc. for the motion to appear smooth and trackable by our eyes.

That's the ideal, but how do we accomplish it? To understand that, one really needs to understand how modern graphics APIs and pipelines work at a high level. For our purposes, we're mostly just interested in how the CPU, GPU and display interact so we'll focus on those portions.

Generally, one can think of the graphics pipeline as a series of feed-forward queues connecting various stages (some on the CPU, some on the GPU). It's basically like an assembly line where the "pipeline stages" are the stations and the queues are bins that buffer the inputs to the next station. At one end we feed in rendering commands based on the state of the game world at a given time, and at the other end we produce rendered frames on the display. The question is, how much should we advance that game world time between frames?

Recall that we want to have things coming out of the end of the pipeline (i.e. display) at a consistent rate. If we knew that rate, we could use it as our time step for the game systems. For instance, if we know that frames are displayed every 1/10 of a second, we can always just move our objects that far for each frame that we submit. Unfortunately, the rate of output - aka throughput - of the pipeline depends on the slowest stage, which will vary depending on the hardware, workload, and many other factors (even from frame to frame).

The simplest solution to this problem is to make sure that we know the worst case throughput (in this case, the slowest frame we'll ever render) and just slow down the entire pipeline to run at that rate. This is basically the strategy used on consoles, where a fixed set of hardware makes it feasible. Pick a target frame rate - usually 30fps - then make sure that you only submit a frame for rendering every 1/30th of a second. As long as the pipeline can consume and spit out every frame before the next one arrives, there is no buffering of frames at the front of the pipeline and you get consistent, smooth frames. Denying work like this is called "starving" the pipeline, and GPU folks typically don't like it because it means the GPU will be idle for a portion of non-worst-case frames (because we're intentionally slowing it down in these cases), and that makes average frames per second numbers look worse.
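A minimal sketch of that console-style loop (update_game and submit_frame are hypothetical placeholders, not any real engine's API):

Code:
#include <chrono>
#include <thread>

void update_game(double dt) { /* advance the simulation by dt seconds */ }
void submit_frame()         { /* hand this frame's rendering commands to the API/driver */ }

// Always step the simulation by the same 1/30 s and never submit faster than
// that, intentionally starving the pipeline on frames that finish early.
void fixed_step_loop() {
    using clock = std::chrono::steady_clock;
    const auto frame_period = std::chrono::microseconds(33333); // ~1/30 s
    auto next_frame = clock::now();
    for (;;) {
        update_game(1.0 / 30.0);                   // fixed time step, always
        submit_frame();
        next_frame += frame_period;
        std::this_thread::sleep_until(next_frame); // wait out the rest of the 1/30 s slot
    }
}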

Unfortunately, this strategy is unworkable when you don't have a fixed platform. It's not really possible to scale an engine so that it can perfectly balance fast GPU/slow CPU, fast CPU/slow GPU and everything in between, so where the bottleneck is going to lie on a user's system is not known. Furthermore, even on fixed platforms like consoles you need a fallback strategy, since it isn't possible to guarantee that every single frame will be done in the allotted time. We need some way to respond to the actual realized throughput of the pipeline at a given time.

The way this is usually done is by observing "back-pressure" from the pipeline. If you pick a static number of frames that you are willing to buffer up at the start of the rendering pipeline (usually 1-3), when there's no more space in the queue you simply halt the game and wait until space is available. Then, assuming that the pipeline is running in a relatively steady state, the game can time how long it takes from the point that it fills the queue to the point where another slot opens up to get the rate of output of the pipeline. This is what pretty much every game does, and this is the same timing that FRAPS is measuring. GPU vendors like this strategy because it ensures that the GPU usually has more work waiting in the queue and thus will run at full throughput and show high frames per second numbers.
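In code, that pattern looks roughly like this minimal sketch (update_game and render_and_present are placeholders; the present/swap call inside the latter is what blocks when the queue is full):

Code:
#include <chrono>

void update_game(double dt) { /* move objects etc. by dt seconds' worth of simulation */ }
void render_and_present()   { /* submit draw calls; present/swap blocks when 1-3 frames are already queued */ }

// The measured wall-clock time of the previous iteration - including any time
// spent blocked on the full submission queue - becomes the simulation step for
// the next frame. This per-frame elapsed time is essentially what FRAPS logs.
void game_loop() {
    using clock = std::chrono::steady_clock;
    auto last = clock::now();
    for (;;) {
        auto now = clock::now();
        double dt = std::chrono::duration<double>(now - last).count();
        last = now;
        update_game(dt);       // the game assumes the pipeline will keep delivering at this rate
        render_and_present();
    }
}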

But note the assumption highlighted above, as this is the key point I'm trying to make: the back-pressure strategy that basically all games use to adjust to the throughput of a given hardware/software configuration assumes that the rendering pipeline is running in a consistent, steady state. Any spikes throw off that assumption and will produce jittery output.

Let's consider a practical example, shamelessly stolen from one of Scott's articles (http://techreport.com/review/24022/d...in-windows-8/2) and marked up by yours truly:

[Frame time graph from Scott's article, annotated with sections A, B and C]
In section A, the game is humming along in a relatively steady state. It's reasonable to assume that the game is GPU bound here, and each frame is taking about 16ms or so to render (i.e. ~63fps). But then at B, something happens to make the CPU wait far longer than usual to submit a frame to the pipeline. It's possible that the work submitted 1-3 frames ago took an exceptionally long time to run on the GPU and this is a legitimate stall on the back-pressure from the rendering pipeline. However, given the frames that follow, that's unlikely.

In section C something really interesting happens... frames all of a sudden are being submitted faster than the previous steady state throughput. What is likely happening here is that frame B hit some sort of big CPU spike and while the CPU was grinding away on whatever that was, the GPU was still happily reading frames from its input queue. Note that during this time, the GPU is still consuming frames at 16ms/frame and displaying smooth output... recall that these frames were queued earlier, when the system was still in a steady state.

Now eventually the GPU mostly or entirely empties its input queue and may even go idle waiting for more work. Finally the CPU submits frame B and goes on to the next frame, but when it finishes the subsequent frame, the GPU's queue still has space so it does not block. This continues for a few frames until the queue is filled again. In section C we see that the latency of the CPU work to simulate and render a frame appears to be around 10ms, so the game runs at that rate until it is blocked by back-pressure again (frame 1625) and the system returns to a steady, GPU-bound state.

The other reason to assume that this is a CPU spike is because GPU spikes are a lot rarer. GPUs are typically fed a simple instruction stream and operate on it with fairly predictable latencies. It's certainly possible for a game to feed one very expensive frame to the pipeline causing a GPU spike, but that would be visible on all GPUs, not just a single vendor's. Usually if some sort of bizarre event happens that requires a pile of irregular work to restructure rendering, that's a job for the CPU/driver. So while GPU hardware design can definitely affect how much work needs to be done on the CPU in the driver, it's unlikely that these sorts of spikes are actually occurring on the GPU. Usually it's either game code, which would show up regardless of which GPU is used, or the graphics driver, which is almost certainly the case here.

So during sections B and C the GPU may well be happily delivering frames to the display at a consistent rate, so where's the problem? Remember, the CPU is timing the rate at which it is allowed to fill the input queue and using that as the rate at which it updates the game simulation. Thus a long frame like the one at B causes it to conclude that the rendering pipeline has slowed all-of-a-sudden, and to start updating the game simulation by 38ms each frame instead of 16ms. i.e. onscreen objects will move more than twice as far on the frame following B as they did previously. In this case though, it was just a spike and not a change in the underlying steady state rate, so the following frames (C) effectively have to move less (10ms each) to resynchronize the game with the proper wall clock time. This sort of "jump ahead, then slow down" jitter is extremely visible to our eyes, and demonstrated well by Scott's follow-up video using a high speed camera. Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the effect of the game adjusting how far it steps the simulation in time each frame.

The astute reader might then ask: "if the problem occurs because the CPU sees a spike and assumes it is a change in throughput, why not just smooth out or ignore spikes?". To some extent this does indeed work... see for instance Emil Persson's clever microstuttering fix (http://forum.beyond3d.com/showthread.php?t=49514). However it's basically guesswork; if you're wrong and it was a true change in throughput then you have to speed up the next bunch of frames to "catch up" instead of slowing them down. There's no way to know for sure how long a frame is going to take before it gets to the display, so any guess that a game makes might be wrong and the simulation will need to adjust for the error over the next few frames.
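For illustration, a naive version of that kind of guesswork might filter the measured delta before handing it to the simulation. This is only a sketch of the general idea, not Emil's actual fix:

Code:
#include <algorithm>

// Feed the simulation a smoothed time step instead of the raw measured one,
// and keep an error accumulator so the simulation still catches up to
// wall-clock time over the following frames if the guess was wrong.
struct SmoothedTimestep {
    double smoothed = 1.0 / 60.0;  // running estimate of the steady-state frame time
    double debt = 0.0;             // wall-clock time not yet handed to the simulation

    double step(double measured_dt) {
        smoothed += 0.1 * (measured_dt - smoothed);           // slow exponential average
        // Don't let a single spike more than double the step; remember the remainder.
        double dt = std::min(measured_dt + debt, 2.0 * smoothed);
        debt = (measured_dt + debt) - dt;                     // repaid over later frames
        return dt;                                            // pass this to update_game()
    }
};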

Ok, I've thrown a lot of information at you and I don't expect too many people to have read it word for word, so I'll summarize and draw a few conclusions. If you feel yourself getting angry, please do me the courtesy of actually reading the above explanation and pointing out where it's wrong before responding though.

1) Smooth motion is achieved by having a consistent throughput of frames all the way from the game to the display.

2) Games measure the throughput of the pipeline via timing the back-pressure on the submission queue. The number they use to update their simulations is effectively what FRAPS measures as well.

3) A spike anywhere in the pipeline will cause the game to adjust the simulation time, which is pretty much guaranteed to produce jittery output. This is true even if frame delivery to the display (i.e. rendering pipeline output) remains buffered and consistent. I.e. it is never okay to see spiky output in frame latency graphs.

4) The converse is actually not true: seeing smooth FRAPS numbers does not guarantee you will see smooth display, as the pipeline could be producing output to the display at jittery intervals even if the input is consistent. This is far less likely though since GPUs typically do relatively simple, predictable work.

5) Measuring pipeline throughput (frames/second) over any multi-frame interval (entire runs, or even over a second like HardOCP) misses these issues because - as we saw above - throughput balances back out against the output of the pipeline when there are CPU spikes, so you'll have shorter frames after a long one. However, frames that have had widely differing simulation time updates do not look smooth to the eye, even if they are delivered at a consistent rate to the display.

6) The problem that Scott most recently identified on AMD is likely a CPU driver problem, not a GPU issue (although the two can be related in some cases).

7) This is precisely why it is important to measure frame latencies instead of frame throughput. We need to keep GPU vendors, driver writers and game developers honest and focused on the task of delivering smooth, consistent frames. For example (unrelated to the above case), allowing things like a driver to cause a spike one frame so that it can optimize the shaders and get a higher FPS number for the benchmarks is not okay!

8) If what we ultimately care about is smooth gameplay, gamers should be demanding frame latency measurements instead of throughput from all benchmarking sites.

Anyways, enough soapboxing for one day. Hopefully I've at least given people an idea of how games, drivers and GPUs interact and why it's critical to measure performance differently if we want to end up with better game experiences.
__________________
The content of this message is my personal opinion only.

Last edited by Andrew Lauritzen; 03-Mar-2013 at 08:20.
Andrew Lauritzen is offline   Reply With Quote
Old 01-Jan-2013, 21:10   #2
Davros
Senior Member
 
Join Date: Jun 2004
Posts: 11,075
Default

Quote:
Allowing things like a driver to cause a spike one frame so that it can optimize the shaders for the next few and get a higher FPS number for the benchmarks is not okay!
Is this what you think amd is doing ?
__________________
Guardian of the Bodacious Three Terabytes of Gaming Goodness™
Davros is offline   Reply With Quote
Old 01-Jan-2013, 21:34   #3
Lightman
Senior Member
 
Join Date: Jun 2008
Location: Torquay, UK
Posts: 1,159
Default

I like your point no. 7 as it is what I thought as well.
It was very visible in Crysis when looking in one direction: after a fraction of a second the FPS would rise, but a sudden change of scenery (e.g. looking away from the sea towards a built-up area or forest) would initially drop the FPS by quite some margin (again only for a fraction of a second, but for more than a few frames).

This also suggests that by increasing CPU speed we can expect these spikes to get smaller and eventually disappear.
I wonder about another possibility: when the CPU is so far ahead of the GPU that it idles in between frames and drops into C-states, can this affect frame rendering enough to matter? I know Intel's SpeedStep is fairly quick at changing between states, but with an aggressive power-saving profile it still adds latency to every task. AMD's K8-K10 implementation was pretty horrid, and for gaming it was always best to disable C'n'Q altogether.
Lightman is online now   Reply With Quote
Old 01-Jan-2013, 21:47   #4
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by Davros View Post
Is this what you think amd is doing ?
I won't speculate much on that... it's hard to say without vtune-ing the relevant situation with driver symbols. I think there's a strong likelihood that the spikes are coming from the graphics driver, but beyond that I don't really have enough information to say.

Quote:
Originally Posted by Lightman View Post
It was very visible in Crysis when looking into one direction after fraction of second FPS would rise, but sudden change of scenery (eg. looking away from sea towards build up area or forest) would initially drop FPS by quite some margin (only for fraction of second again but more than few frames).
Yeah, it's definitely possible that this is due to JIT-style optimization. There are also cache-warming effects, though, and sometimes the game has to do some streaming of its own when new stuff becomes visible.

Quote:
Originally Posted by Lightman View Post
This also suggest that by increasing CPU speed we can expect these spikes to get smaller and eventually disappear.
The trouble is that a lot of this stuff is very single-threaded right now, so while increased frequency and IPC can help (which is why Intel CPUs tend to do better at games right now), more cores really don't. It's not trivial to multithread a lot of the runtime until GPUs can accept significantly more interesting command streams, ideally submitted in parallel. But of course there are some slightly skewed economics with discrete GPUs that discourage this, since when GPUs are benchmarked against one another the tests are typically done with the fastest CPUs available. And of course they are benchmarked with average FPS...

Power-constrained integrated CPU/GPUs should change this equation somewhat, as lightening the CPU workload (driver, etc) can directly affect the power/frequency available to the GPU.

Quote:
Originally Posted by Lightman View Post
I wonder about another possibility when CPU is so much ahead of GPU that it idles in between frames and goes into C states, can this affect frame rendering enough to matter?
I doubt this has a large effect to be honest, but I don't remember the details of the latencies involved in each solution. But again, anything affecting power definitely gets interesting on integrated SoCs in the future.
__________________
The content of this message is my personal opinion only.

Last edited by Andrew Lauritzen; 01-Jan-2013 at 22:00.
Andrew Lauritzen is offline   Reply With Quote
Old 01-Jan-2013, 21:51   #5
caveman-jim
Member
 
Join Date: Sep 2005
Location: Austin, TX
Posts: 305
Default

Disregarding point 7 as baseless, the post is excellent. Thanks for taking the time.
__________________
http://twitter.com/cavemanjim

I work for AMD. The opinions posted here are my own and may not represent those of my employer.
caveman-jim is offline   Reply With Quote
Old 01-Jan-2013, 22:10   #6
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by caveman-jim View Post
Disregarding point 7 as baseless, the post is excellent. Thanks for taking the time.
I'm not claiming that's what's happening in this case, just that you definitely can do stuff like that if people are only measuring frame rates. I edited the point slightly to hopefully make that clear.

I'm familiar enough with the industry (being a part of it ) to know that drivers are optimized to the benchmarks primarily, so what reviewers are measuring is critical. If we ultimately want smooth games, we should be measuring frame latencies, like the games themselves do to update their simulations.

Sorry if the example there came off as me claiming to understand this particular case with AMD - that was not my intention. The post is merely using Scott's recent results as an example of why we need to measure this way; I'm not really trying to speak directly to the issue he saw, which I imagine is something fixable in AMD's driver. But that's the point, let's make sure this stuff gets caught and fixed!
__________________
The content of this message is my personal opinion only.
Andrew Lauritzen is offline   Reply With Quote
Old 01-Jan-2013, 22:30   #7
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,445
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
7) This is precisely why it is important to measure frame latencies instead of frame throughput. We need to keep GPU vendors, driver writers and game developers honest and focused on the task of delivering smooth, consistent frames. Allowing things like a driver to cause a spike one frame so that it can optimize the shaders for the next few and get a higher FPS number for the benchmarks is not okay!
I'm not sure if the frame graph example is strong support of this claim.
If this is an example of the driver running a shader compilation in the middle of a frame trying to use the shaders, it is either not in the service of improving successive frames or it's doing a very poor job of it.
The next few "fast" frames as you've noted are probably ready and processed, just waiting on the final submission of the frame. The rest of the graph shows zero change in the optimality of the shaders the driver allegedly optimized at the cost of jacking one frame's latency.

The otherwise steady-state nature of the graph and zero real improvement beyond frames whose submission is held up by the engine makes me think that this isn't a case of trying to optimize the next few frames for benchmark numbers, but some kind of glass jaw in whatever the system is doing to maintain the average performance numbers.

When talking about tens of milliseconds, I was wondering whether there is some kind of driver problem not directly attributable to the GPU or CPU silicon, for which the timescales here are glacial (that doesn't rule out a software or system issue at a low level having caused it).
Is something being juggled over the PCIe bus, or some kind of memory buffer issue?
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is offline   Reply With Quote
Old 01-Jan-2013, 22:58   #8
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,569
Default

There is no one single thing; it's all over the place - the app, the driver, allocations of memory, CPU thread priorities, etc., etc. I believe some of the latency with BL2 was, in fact, simply due to the size of one of the buffers; a tweak to it has improved it significantly (a CAP is in the works).
__________________
Radeon is Gaming
Tweet Tweet!
Dave Baumann is offline   Reply With Quote
Old 01-Jan-2013, 23:44   #9
caveman-jim
Member
 
Join Date: Sep 2005
Location: Austin, TX
Posts: 305
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
I'm not claiming that's what's happening in this case, just that you definitely can do stuff like that if people are only measuring frame rates. I edited the point slightly to hopefully make that clear.

I'm familiar enough with the industry (being a part of it ) to know that drivers are optimized to the benchmarks primarily, so what reviewers are measuring is critical. If we ultimately want smooth games, we should be measuring frame latencies, like the games themselves do to update their simulations.

Sorry if the example there came off as me claiming to understand this particular case with AMD - that was not my intention. The post is merely using Scott's recent results as an example of why we need to measure this way; I'm not really trying to speak directly to the issue he saw, which I imagine is something fixable in AMD's driver. But that's the point, let's make sure this stuff gets caught and fixed!
I understand and agree, although FPS still has an important place in reviewing. Smoothness at specific settings and FPS rate needs to be known for consumers to get a balanced idea of the performance.
__________________
http://twitter.com/cavemanjim

I work for AMD. The opinions posted here are my own and may not represent those of my employer.
caveman-jim is offline   Reply With Quote
Old 02-Jan-2013, 00:21   #10
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by 3dilettante View Post
The otherwise steady-state nature of the graph and zero real improvement beyond frames whose submission is held up by the engine makes me think that this isn't a case of trying to optimize the next few frames for benchmark numbers, but some kind of glass jaw in whatever the system is doing to maintain the average performance numbers.
Completely agreed, see my clarification to caveman-jim above. I didn't mean to imply that or attempt to explain what's going on in this specific case... it was just another example of why measuring and optimizing for frame rates is a bad plan.

Quote:
Originally Posted by 3dilettante View Post
Is something being juggled over the PCIe bus, or some kind of memory buffer issue?
Could be any number of things for that one specific case really, and it should be easy for AMD to track down as it's almost certainly a CPU-side issue. But my point is really not about this specific case, but rather that stuff like this shouldn't ever get through QA/review, yet it has in the past and still does, simply because of bad performance metrics like FPS.

Quote:
Originally Posted by Dave Baumann View Post
There is no one single thing for, its all over the place - the app, the driver, allocations of memory, CPU thread priorities, etc., etc. I believe some of the latency with BL2 was, in fact, simply due to the size of one of the buffers; a tweak to is has improved it significantly (a CAP is in the works).
Great, and that's exactly the desirable outcome for everyone really. No need to freak out about stuff like many commenters have, but we definitely want to catch and correct issues like this, and I'm sure there are many more and lots of blame to spread around; let's just get to fixing it all. To be clear, I'm not picking on AMD or anyone else here... there are lots of issues to go around. This is more a call to action for the whole industry to improve how we measure and optimize for gamers.

Quote:
Originally Posted by caveman-jim View Post
I understand and agree, although FPS still has an important place in reviewing. Smoothness at specific settings and FPS rate needs to be known for consumers to get a balanced idea of the performance.
"Smoothness at particular settings" sure, and frame latencies are a good place to start to measure that. Certainly you can argue over whether 99% percentile, time beyond some threshold, quartile graphs, etc. are the best way to present the data, but averaging successive frames is pretty clearly not a good way to present it. FPS is really only meaningful and interesting if it's metered like on many console games (i.e. a "30fps" or "60fps" game), and even then you still want to measure "dropped" frames and such. FPS really does not add any interesting information over frame times, and I don't think there's a compelling reason that it needs to be used in reviews other than legacy. I'm convinced that the enthusiasts who read these sorts of reviews (don't kid yourself, regular people do not...) can understand and adapt.

But hey, more data is all good in my books. I just don't ever want to see only FPS numbers for any review that claims to tell me how smooth my gameplay experience is going to be with a specific game, set of hardware, etc.
__________________
The content of this message is my personal opinion only.

Last edited by Andrew Lauritzen; 02-Jan-2013 at 00:34.
Andrew Lauritzen is offline   Reply With Quote
Old 02-Jan-2013, 00:42   #11
Zaphod
Remember
 
Join Date: Aug 2003
Posts: 2,108
Default

So, how clever are current games at determining what now() to sample, render and present to the user?

Obviously, just smoothing out the display rate could exacerbate issues if the display time of a frame is shifted from the game time, while on the other hand wildly varying frame times would exacerbate stuttering if the game world runs at a (fixed or) smoothed interval.

IIRC, there was a discussion about this early this year, and some (Nvidia?) quotes from TR back then seemed to indicate that both cases would have to be accounted for.
Zaphod is offline   Reply With Quote
Old 02-Jan-2013, 01:07   #12
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by Zaphod View Post
So, how clever are current games at determining what now() to sample, render and present to the user?
They're not super-clever at all... as I noted, every one that I know of just times the frame loop and uses that as a delta to update stuff. Even for games with fixed physics ticks they still use the frame loop timer to decide how many to do (i.e. it's the same thing, just discretized).
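I.e. something along these lines (just a sketch; physics_tick and the 60Hz tick rate are illustrative placeholders):

Code:
void physics_tick() { /* advance physics by exactly one fixed step */ }

// The frame-loop delta still decides how many fixed steps to run, so a spike
// in the measured frame time still turns into a burst of extra ticks.
void advance_physics(double measured_frame_dt) {
    static double accumulator = 0.0;
    const double tick = 1.0 / 60.0;     // fixed physics step
    accumulator += measured_frame_dt;   // same back-pressure-derived timing as before
    while (accumulator >= tick) {
        physics_tick();
        accumulator -= tick;
    }
}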

Quote:
Originally Posted by Zaphod View Post
Obviously, just smoothing out the display rate could just exacerbate issues if the display time of a frame is shifted from the game time. While on the other hand, wildly varying frame times would exacerbate stuttering if the game world runs at a (fixed or) smoothed interval.
Indeed, I don't think there's necessarily a solution for the "big spike" issue other than removing the spikes and providing low, consistent latencies for all operations. A lot of the problem today really stems from the fact that GPU optimization is really finicky... getting something slightly wrong can cut your performance in half or more, so a lot of driver effort is expended to not fall off those cliffs. Ultimately I think we need more robust performance, and generally just more concentration on worst case performance, not best or average case.

For stuff like crossfire/SLI microstuttering due to submission to multiple queues there are potential fixes though like Emil's stuff.
__________________
The content of this message is my personal opinion only.
Andrew Lauritzen is offline   Reply With Quote
Old 02-Jan-2013, 01:34   #13
caveman-jim
Member
 
Join Date: Sep 2005
Location: Austin, TX
Posts: 305
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
"Smoothness at particular settings" sure, and frame latencies are a good place to start to measure that. Certainly you can argue over whether 99% percentile, time beyond some threshold, quartile graphs, etc. are the best way to present the data, but averaging successive frames is pretty clearly not a good way to present it. FPS is really only meaningful and interesting if it's metered like on many console games (i.e. a "30fps" or "60fps" game), and even then you still want to measure "dropped" frames and such. FPS really does not add any interesting information over frame times, and I don't think there's a compelling reason that it needs to be used in reviews other than legacy. I'm convinced that the enthusiasts who read these sorts of reviews (don't kid yourself, regular people do not...) can understand and adapt.

But hey, more data is all good in my books. I just don't ever want to see only FPS numbers for any review that claims to tell me how smooth my gameplay experience is going to be with a specific game, set of hardware, etc.

I disagree that FPS averages have no value; if you're gonna show the data then I don't see any value in not showing the computed average, especially if you're gonna contrast it with the perceived 'smoothness' computed by another methodology.

60fps is not the be-all and end-all; some gamers want 90, some want 120, some want 45 - they want to know what trade-offs are necessary to get there. Reviewers are ultimately limited in the permutations they can test, so it's not possible to cover every combination, but if you can simply turn off the FPS limiter in Skyrim and get rid of the stutter on the AMD card, shouldn't that be noted? These are enthusiasts who aren't afraid of tweaking .ini files, I postulate, if they're willing to go down the rabbit hole of statistical frame time analysis.
__________________
http://twitter.com/cavemanjim

I work for AMD. The opinions posted here are my own and may not represent those of my employer.
caveman-jim is offline   Reply With Quote
Old 02-Jan-2013, 01:43   #14
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by caveman-jim View Post
60fps is not the be-all and end-all; some gamers want 90, some want 120, some want 45 - they want to know what trade-offs are necessary to get there. Reviewers are ultimately limited in the permutations they can test, so it's not possible to cover every combination, but if you can simply turn off the FPS limiter in Skyrim and get rid of the stutter on the AMD card, shouldn't that be noted? These are enthusiasts who aren't afraid of tweaking .ini files, I postulate, if they're willing to go down the rabbit hole of statistical frame time analysis.
No argument with any of that, but I don't see how it relates to whether you report frame latency-based measurements or FPS. I'm saying that those gamers who 'want 90' (and I personally always aim for 100/10ms with my stuff) would be just as well or better served by saying they want 99% of the frames faster than 11ms or similar. This is mostly orthogonal to vsync.
__________________
The content of this message is my personal opinion only.

Last edited by Andrew Lauritzen; 02-Jan-2013 at 02:32.
Andrew Lauritzen is offline   Reply With Quote
Old 02-Jan-2013, 04:17   #15
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,264
Send a message via Skype™ to rpg.314
Default

Great post throughout with some great ideas thrown in.

Quote:
Originally Posted by Andrew Lauritzen View Post
So during sections B and C the GPU may well be happily delivering frames to the display at a consistent rate, so where's the problem? Remember, the CPU is timing the rate at which it is allowed to fill the input queue and using that as the rate at which it updates the game simulation. Thus a long frame like the one at B causes it to conclude that the rendering pipeline has slowed all-of-a-sudden, and to start updating the game simulation by 38ms each frame instead of 16ms. i.e. onscreen objects will move more than twice as far on the frame following B as they did previously. In this case though, it was just a spike and not a change in the underlying steady state rate, so the following frames (C) effectively have to move less (10ms each) to resynchronize the game with the proper wall clock time. This sort of "jump ahead, then slow down" jitter is extremely visible to our eyes, and demonstrated well by Scott's follow-up video using a high speed camera. Note that what you are seeing are likely not changes in frame delivery to the display, but precisely the effect of the game adjusting how far it steps the simulation in time each frame.
This seems to be a good ID of the problem involved. And I am skeptical, as 3d already pointed out, that the driver is optimizing the shaders between frames. My hunch is that it is almost certainly due to variations in CPU load alone. However, there might be other issues of a similar nature involved. There seems to be a rather simple solution to this problem of object-level stutter. The game is using too simple a heuristic to calculate the time step of the game physics: ALL of the volatility is immediately reflected in the time steps, which is the cause of the stutter.

The embedded world has to deal with such problems all the time. They have a fairly well-standardized solution to this: PID controllers. These are a robust way to smooth out variations in the input. The existing systems for finding the time step are too simple, hence too volatile.

PID systems use a goal signal and act to minimize the error against it. Games could use the average delay over the last 10 frames as the goal signal to smooth out the variations in frame time.
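Something like this minimal sketch, assuming the trailing 10-frame average as the setpoint (the gains are made-up illustrative values, not tuned ones):

Code:
#include <deque>
#include <numeric>

// Instead of feeding the raw measured frame time to the simulation, track the
// error against a setpoint (here the trailing 10-frame average) and output a
// corrected time step.
class PidTimestep {
public:
    double step(double measured_dt) {
        history.push_back(measured_dt);
        if (history.size() > 10) history.pop_front();
        double setpoint = std::accumulate(history.begin(), history.end(), 0.0)
                          / history.size();
        double error = setpoint - measured_dt;
        integral += error;
        double derivative = error - prev_error;
        prev_error = error;
        // Nudge the measured value toward the setpoint by the P, I and D terms.
        return measured_dt + kp * error + ki * integral + kd * derivative;
    }
private:
    std::deque<double> history;
    double integral = 0.0, prev_error = 0.0;
    double kp = 0.5, ki = 0.05, kd = 0.1;  // illustrative gains only
};
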
rpg.314 is offline   Reply With Quote
Old 02-Jan-2013, 06:06   #16
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by rpg.314 View Post
And I am skeptical, as 3d already pointed out, that the driver is optimizing the shaders between frames.
Yes, totally agreed in this case and I didn't mean to imply that. But trust me, such things have been done and are routinely done on separate threads (which is far less harmful thankfully, although graphics drivers eating a whole thread for themselves is another topic...). More common are shader recompiles due to state changes, which can cause nasty spikes too, although normally they subside after a little gameplay. For instance, I think it was NV30 that had to recompile the shader if you changed a uniform/constant from or to zero.

I also agree that games should consider using some sort of smoothing function on the raw time deltas (gonna take a closer look at PID controllers, thanks for the link), although it's obviously important to eliminate the spikes as well. In fact, making the critical single-threaded rendering path as predictable and lightweight as possible is important in general.

Thanks for the comments and insight so far guys!
__________________
The content of this message is my personal opinion only.

Last edited by Andrew Lauritzen; 02-Jan-2013 at 06:14.
Andrew Lauritzen is offline   Reply With Quote
Old 02-Jan-2013, 06:15   #17
MJP
Member
 
Join Date: Feb 2007
Location: Irvine, CA
Posts: 523
Default

I think that reviewers should just post a GPUView capture for each benchmark, and then we'll really know what's going on.
__________________
The Blog | The Book
MJP is offline   Reply With Quote
Old 02-Jan-2013, 06:36   #18
Davros
Senior Member
 
Join Date: Jun 2004
Posts: 11,075
Default

That GPUView looks damn cool, but installing an SDK scares me. Shame, otherwise I'd run Skyrim with it.

Quote:
if you changed a uniform/constant
If you can change it, it's not a constant
__________________
Guardian of the Bodacious Three Terabytes of Gaming Goodness™
Davros is offline   Reply With Quote
Old 02-Jan-2013, 07:14   #19
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,264
Send a message via Skype™ to rpg.314
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
Yes, totally agreed in this case and I didn't mean to imply that. But trust me, such things have been done and are routinely done on separate threads (which is far less harmful thankfully, although graphics drivers eating a whole thread for themselves is another topic...). More common are shader recompiles due to state changes, which can cause nasty spikes too, although normally they subside after a little gameplay. For instance, I think it was NV30 that had to recompile the shader if you changed a uniform/constant from or to zero.

I also agree that games should consider using some sort of smoothing function on the raw time deltas (gonna take a closer look at PID controllers, thanks for the link), although it's obviously important to eliminate the spikes as well. In fact, making the critical single-threaded rendering path as predictable and lightweight as possible is important in general.

Thanks for the comments and insight so far guys!
We need parallel command streams now. The current multithreading model doesn't go far enough. Let's hope MS puts some effort towards that for the nextbox, otherwise PC DX isn't going to advance by itself. Too busy with all the tablet shiny.
rpg.314 is offline   Reply With Quote
Old 02-Jan-2013, 07:40   #20
silent_guy
Senior Member
 
Join Date: Mar 2006
Posts: 2,284
Default

Are there technical reasons why review sites can't post raw fraps numbers for all the benchmarks they run?

E.g. is fraps something that doesn't work for all games (makes them crash?), or does the act of running fraps create Heisenberg issues?

It's obvious that some don't like the kind of analysis Scott is doing, but it'd be nice for others to be able to do that analysis themselves. So the sites would only do what they always do - run a canned benchmark, report a number, done - but with fraps running in the background...

(I realize very well that non-technical reasons are a much bigger impediment for this to ever become reality.)
silent_guy is offline   Reply With Quote
Old 02-Jan-2013, 11:33   #21
Bludd
Eric the Half-a-bee
 
Join Date: Oct 2003
Location: The cat detector van from the Ministry of Housinge
Posts: 2,133
Default

I think this is a very interesting metric and it coincides with Anand taking an interest in performance consistency in SSDs.
Bludd is offline   Reply With Quote
Old 02-Jan-2013, 16:14   #22
caveman-jim
Member
 
Join Date: Sep 2005
Location: Austin, TX
Posts: 305
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
No argument with any of that, but I don't see how it relates to whether you report frame latency-based measurements or FPS. I'm saying that those gamers who 'want 90' (and I personally always aim for 100/10ms with my stuff) would be just as well or better served by saying they want 99% of the frames faster than 11ms or similar. This is mostly orthogonal to vsync.
Yes, I concur. What the 99% number doesn't show is uneven frame render times inside that target time. If you're aiming for 10ms, a variance of 40% / 4ms (e.g. a section of the game performs as 10ms, 10ms, 10ms, 6ms, 6ms, 10ms, 6ms, 8ms, 10ms but overall reports a 99% time of 10ms) might not be perceptible, but at 22ms a change like that might be (a section performs as 22, 22, 13, 13, 13, 22, 22, 22, 13, 13, 17, 22, 22, 13, 13, 22, 22 but overall the 99% time is 22ms). If you were looking for a 45fps performance baseline the 99% time of 22 would satisfy you, but the experience wouldn't, because of the variation inside the 99% time.
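As a toy illustration of that (just a sketch using the two example sequences above), a crude frame-to-frame jitter number shows the difference even when the 99% figures look acceptable:

Code:
#include <cmath>
#include <cstdio>
#include <vector>

// Mean absolute change between adjacent frame times, in ms - a crude "jitter"
// number that a single 99th-percentile figure does not capture.
double mean_adjacent_delta(const std::vector<double>& t) {
    double sum = 0.0;
    for (size_t i = 1; i < t.size(); ++i) sum += std::fabs(t[i] - t[i - 1]);
    return sum / (t.size() - 1);
}

int main() {
    std::vector<double> a = {10, 10, 10, 6, 6, 10, 6, 8, 10};
    std::vector<double> b = {22, 22, 13, 13, 13, 22, 22, 22, 13, 13, 17, 22, 22, 13, 13, 22, 22};
    std::printf("jitter a: %.1f ms, jitter b: %.1f ms\n",
                mean_adjacent_delta(a), mean_adjacent_delta(b));
    return 0;
}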



I wonder how power-limiting technology will affect gaming performance; right now GPUs clock down under full load to stay within TDP (yeah I know, the message is they turbo when load is light, same difference). Frame rate limiting and vsync leave TDP on the table, so I wonder if the next step is power-aware geometry/AA/compute... this may be off topic for this discussion though.
__________________
http://twitter.com/cavemanjim

I work for AMD. The opinions posted here are my own and may not represent those of my employer.
caveman-jim is offline   Reply With Quote
Old 02-Jan-2013, 17:30   #23
OpenGL guy
Senior Member
 
Join Date: Feb 2002
Posts: 2,334
Send a message via ICQ to OpenGL guy
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
7) This is precisely why it is important to measure frame latencies instead of frame throughput. We need to keep GPU vendors, driver writers and game developers honest and focused on the task of delivering smooth, consistent frames. For example (unrelated to the above case), allowing things like a driver to cause a spike one frame so that it can optimize the shaders and get a higher FPS number for the benchmarks is not okay!

8) If what we ultimately care about is smooth gameplay, gamers should be demanding frame latency measurements instead of throughput from all benchmarking sites.
I don't know what is going on with Borderlands 2, I work on OpenCL. However, I think you are being a bit too judgmental here. First, when a new effect is used, the driver must compile the shaders involved. Yes, this means optimizing the shaders too. You can experience that as a "hitch" during gameplay, but once the effects are compiled, you won't experience that hitch again, unless the shaders are recreated (i.e. next level :P). Second, some features are emulated now (think fog). So if an API feature were enabled that required recompilation of shaders, then you might experience a "hitch". Third, consoles can avoid all of this easily as applications can ship precompiled shaders for every effect used.

OpenCL also allows for precompiled kernels to be saved/loaded. This is an important feature as some kernels are huge and can take several minutes to compile. This also allows for IP protection as you can just ship a stripped binary and not your source code. This would be a nice feature for Direct3D, but I don't imagine game developers would want to ship precompiled shaders for every possible GPU out there. So it would be better if the game could compile the shaders as they are used and save them to the hard drive. Then the game would just reload the binaries as they are used/needed. If you happened to change GPUs in your machine, then the game would have to recompile the shaders again, but that wouldn't be a huge deal unless you were changing GPUs constantly. Of course, this wouldn't necessarily catch cases where state changes caused recompilation. Perhaps you could create a query to check if there was recompilation.
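For reference, the OpenCL save/load path looks roughly like this (a sketch for the single-device case, with error handling, the cache key and the fall-back-to-source-compile logic left out):

Code:
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Save the device binary of an already-built program to disk.
bool save_program_binary(cl_program program, const char* path) {
    size_t binary_size = 0;
    // Single-device program: CL_PROGRAM_BINARY_SIZES returns one size_t.
    if (clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                         sizeof(binary_size), &binary_size, nullptr) != CL_SUCCESS)
        return false;
    std::vector<unsigned char> binary(binary_size);
    unsigned char* ptr = binary.data();
    // CL_PROGRAM_BINARIES expects an array of pointers, one per device.
    if (clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                         sizeof(ptr), &ptr, nullptr) != CL_SUCCESS)
        return false;
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::fwrite(binary.data(), 1, binary_size, f);
    std::fclose(f);
    return true;
}

// Recreate the program from the cached binary; the caller falls back to a
// source compile if this returns null (e.g. after a GPU or driver change).
cl_program load_program_binary(cl_context ctx, cl_device_id dev, const char* path) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return nullptr;
    std::fseek(f, 0, SEEK_END);
    size_t size = static_cast<size_t>(std::ftell(f));
    std::fseek(f, 0, SEEK_SET);
    std::vector<unsigned char> binary(size);
    std::fread(binary.data(), 1, size, f);
    std::fclose(f);
    const unsigned char* ptr = binary.data();
    cl_int status = CL_SUCCESS, err = CL_SUCCESS;
    cl_program program = clCreateProgramWithBinary(ctx, 1, &dev, &size, &ptr,
                                                   &status, &err);
    if (err != CL_SUCCESS || status != CL_SUCCESS) return nullptr;
    // A program created from a binary still needs clBuildProgram before use.
    if (clBuildProgram(program, 1, &dev, "", nullptr, nullptr) != CL_SUCCESS)
        return nullptr;
    return program;
}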

PCs and consoles give different experiences, but some of this is not really under the control of the driver/GPU.
__________________
I speak only for myself.
OpenGL guy is offline   Reply With Quote
Old 02-Jan-2013, 17:46   #24
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 2,271
Default

Quote:
Originally Posted by MJP View Post
I think that reviewers should just post a GPUView capture for each benchmark,and then we'll really know what's going on.
Haha, agreed! And ideally vtune traces with driver symbols too.

Quote:
Originally Posted by silent_guy View Post
Are there technical reasons why review sites can't post raw fraps numbers for all the benchmarks they run?
Not really, no. As Scott mentions, you basically just check off an option in FRAPS that dumps the raw data to a file, and FRAPS works in basically everything.

Quote:
Originally Posted by caveman-jim View Post
What 99% doesn't show is uneven frame render times inside that target time.
Definitely true, and there are potentially some other metrics that would capture this better, such as some sort of running variance/deviation metric. I think there are definitely other interesting ways to analyze the data than what Scott has done for instance, but I just want to get over the first hurdle to start.

Quote:
Originally Posted by caveman-jim View Post
I wonder how power limiting technology will affect gaming performance, right now GPU's clock down under full load to stay in TDP (yeah I know, the message is they turbo when load is light, same difference). Frame rate limiting and vsync leave TDP on the table, I wonder if the next step is power aware geometry/AA/compute ... this may be off topic for this discussion though.
It's definitely going to get really interesting, no doubt...

Quote:
Originally Posted by OpenGL guy View Post
I don't know what is going on with Borderlands 2, I work on OpenCL. However, I think you are being a bit too judgmental here.
As I've mentioned in previous responses, I wasn't referencing the AMD issue there specifically, just using examples of why FPS is a bad metric.

Quote:
Originally Posted by OpenGL guy View Post
Second, some features are emulated now (think fog). So if an API feature were enabled that required recompilation of shaders, then you might experience a "hitch".
Right, i.e. "state-based recompiles", but these get nastier than just fixed-function features. Still like changing certain rasterizer/blend state, bound texture types (2D/3D/Cube/etc) and so on can also cause recompiles and the state that is "dynamic" vs. "compiled" is implementation dependent and completely opaque. ... and it definitely does happen in the middle of gameplay. This sort of stuff has to stop in the long run, but it implies more general hardware and people don't want to pay the price.

But yeah, to reiterate, I'm not claiming that's what's happening here (it's probably more memory-related stuff, as usual), but it's yet another "hitch/stutter" that is ignored when measuring using FPS. Also, I think I'm allowed to be "judgmental" about the state of the industry here, both as a gamer and a developer at an IHV (although I don't specifically do drivers). I just want us all to concentrate on improving the gaming experience in this area a little bit... I don't think it'll be a ton of work, but it requires redefining our performance metrics.
__________________
The content of this message is my personal opinion only.

Last edited by Andrew Lauritzen; 02-Jan-2013 at 18:20.
Andrew Lauritzen is offline   Reply With Quote
Old 02-Jan-2013, 17:55   #25
OpenGL guy
Senior Member
 
Join Date: Feb 2002
Posts: 2,334
Send a message via ICQ to OpenGL guy
Default

Quote:
Originally Posted by Andrew Lauritzen View Post
As I've mentioned in previous responses, I wasn't referencing the AMD issue there specifically, just using examples of why FPS is a bad metric.
Yep, I know. I was just saying that "avoiding all hitches" isn't practical in some cases, at least with current APIs.
Quote:
Originally Posted by Andrew Lauritzen
Right, i.e. "state-based recompiles", but these get nastier than just fixed-function features. Still like changing certain rasterizer/blend state, bound texture types (2D/3D/Cube/etc) and so on can also cause recompiles and the state that is "dynamic" vs. "compiled" is implementation dependent and completely opaque. ... and it definitely does happen in the middle of gameplay.

This sort of stuff has to stop in the long run, but it implies more general hardware and people don't want to pay the price.
More general hardware implies more recompilation, not less. It's the removal of fixed-function bits that triggers some of these things.
Quote:
Originally Posted by Andrew Lauritzen
But yeah, to reiterate, I'm not claiming that's what's happening here (it's probably more memory-related stuff, as usual), but it's yet another "hitch/stutter" that is ignored when measuring using FPS.
This is one issue with benchmarking. Getting a hitch on the first instance of a new effect might be annoying, but if you're using those same effects for the next 30 minutes, who will remember the hitch that happened 29 minutes ago?
__________________
I speak only for myself.
OpenGL guy is offline   Reply With Quote
