Triple Buffering, Nvidia Fast Sync, Display Chains

Nvidia Fast Sync
Nvidia explains what Fast Sync is: a new sync technology, different from G-Sync, implemented in the GPU as a buffer.
 
Fast Sync enables the GPU to cache finished frames and send only the freshest one when the display is ready to show the next frame. The game engine is not throttled during any part of the rendering work. It works best when the rendering frame rate is much higher than the display refresh rate.

It will work on Maxwell and Pascal cards with any monitor.
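
In pseudo-code, the mechanism described above might look roughly like this; a minimal sketch with made-up names, not Nvidia's actual driver logic:

Code:
// Sketch of "keep only the freshest finished frame" (all names hypothetical).
#include <atomic>

struct Frame { int id; };                        // stand-in for a rendered image

Frame frames[3];
std::atomic<int> latestReady{-1};                // index of the newest finished frame

void renderInto(Frame& f, int id) { f.id = id; } // stub: the game renders here
void scanOut(const Frame&) {}                    // stub: the display reads here

// Render side: never blocks on the display, so the game is not throttled.
void renderLoop() {
    for (int id = 0, write = 0; ; ++id) {
        renderInto(frames[write], id);
        latestReady.store(write, std::memory_order_release);
        write = (write + 1) % 3;   // a real driver must also skip the scan-out buffer
    }
}

// Display side: once per refresh, pick the freshest complete frame.
// Older finished frames are simply dropped instead of queued.
void onVBlank() {
    int idx = latestReady.load(std::memory_order_acquire);
    if (idx >= 0) scanOut(frames[idx]);
}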
 
And what's the difference from plain old triple buffering with Vsync on? Either I don't understand triple buffering, or Nvidia just announced a 20-year-old technique as their own invention...
 
I believe the buffers fill up and backpressure causes latency again.
 
There is no backpressure with triple buffering. Only with plain double buffering.

I know, the Nvidia guy presenting "Fast Sync" claimed that with triple buffering the buffers could "fill up", but that's just plain wrong. The two back buffers are flipped every time one is filled, the outdated one is then overwritten, and the last fully rendered one may be flipped to the front. (Unless Nvidia's implementation of triple buffering was really screwed up and their engineers actually didn't know you had to do a back buffer flip, wrongly abusing the triple buffer as a chain instead. Not saying there aren't scenarios where you want that, but that's not what you call triple buffering.)
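
For reference, proper page-flip triple buffering boils down to two pointer swaps; a sketch with hypothetical names:

Code:
// Classic page-flip triple buffering: the renderer can never be stalled by
// a "full" buffer, because the outdated back buffer is simply overwritten.
#include <utility>

struct Buffer { /* pixels */ };

Buffer a, b, c;
Buffer* front    = &a;   // currently being scanned out
Buffer* complete = &b;   // last fully rendered frame
Buffer* drawing  = &c;   // the renderer writes here

void backBufferFlip() {             // whenever the renderer finishes a frame
    std::swap(complete, drawing);   // the outdated frame gets overwritten next
}

void frontBufferFlip() {            // at V-blank
    std::swap(front, complete);     // the newest complete frame goes on screen
}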

But the guy also claimed that V-Sync would be defined as every rendered frame also being displayed (hint: that's not what it means, only that the front buffer flip is synchronized with the refresh blank), so I wouldn't put too much weight on what he says.

Buffer fills are a legitimate objection (at least if we consider the historic implementations), but the buffer flip should actually happen via memory mapping nowadays.
 
SimBy already answered. I think with triple buffering you can still have backpressure if the card is fast enough. With Fast Sync it will drop any extra frames, so there's less concern about latency.

If triple buffering also discards extra frames, then there's no difference.
 
The GPU could render ahead, creating a sequence that the flip chain has to follow (latency), even with triple buffering.
Yes, for Nvidia cards this is controlled/overridden by the "maximum pre-rendered frames" setting in the application profile. Unfortunately it has an unreasonably high default, and it only becomes active when V-Sync is also activated, leading to the common misattribution of the resulting input lag to the V-Sync option.

I strongly believe that all "Fast Sync" actually is, is forced triple buffering + V-Sync, plus "maximum pre-rendered frames" set to 0.
Only the latter two options were exposed in the control panel before, but for <=DX11 titles you could already try to enforce triple buffering with tools like D3DOverrider.
 
However, this should be weighed against the TechReport article about the AMD Crimson Edition drivers, where they say the following regarding triple buffering:
Another change—and potential improvement—for e-sports players is an optimized flip queue size. The Crimson drivers can make use of only a single frame buffer in games where the additional input lag generated by triple-buffering doesn't make sense, like League of Legends or Dota 2.
For an idea of where this optimization takes place, have a look at our handy, if oversimplified, diagram of the frame production process:

Moving to a single buffer, as AMD's example above shows, can reduce input lag to 16.7 ms on a 60Hz display, versus 50 ms with triple-buffering enabled. AMD says this improvement makes mouse and keyboard input more responsive. We don't see a per-application setting for the flip queue size in Radeon Settings, so we're guessing that Crimson manages it automatically.
http://techreport.com/review/29357/amd-radeon-software-crimson-edition-an-overview
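
For reference, the quoted numbers are just multiples of the 60Hz refresh interval:

Code:
// Sanity check of the article's latency figures at 60 Hz.
#include <cstdio>

int main() {
    const double refreshMs = 1000.0 / 60.0;                  // ~16.7 ms per refresh
    std::printf("1-deep queue: %.1f ms\n", 1 * refreshMs);   // ~16.7 ms
    std::printf("3-deep queue: %.1f ms\n", 3 * refreshMs);   // ~50.0 ms
    return 0;
}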

TBH I think triple buffering is great in theory, but it is not applicable to all situations; I'm not even sure one can say it is applicable to most.
Cheers
 
Eh... what? Why are they using "triple buffering" synonymously with a swap chain that is three buffers long?
Microsoft's documentation calls a Direct3D swap chain of three buffers "triple buffering".
Oh. I love it when terms are reused like that...

So, no, that article refers to the "other" type of triple buffering following Microsoft's terminology (render ahead), not the original one (page flip).
Equivalent to "maximum pre-rendered frames", except that the Crimson driver doesn't expose this option.

Got to agree with Anandtech here:
If you are implementing render ahead (aka a flip queue), please don't call it "triple buffering," as that should be reserved for the technique we've described here in order to cut down on the confusion. There are games out there that list triple buffering as an option when the technique used is actually a short render queue. We do realize that this can cause confusion, and we very much hope that this article and discussion help to alleviate this problem.
http://www.anandtech.com/show/2794/4
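
To make the distinction concrete, here is a sketch (hypothetical names) of the two data flows that share the name "triple buffering":

Code:
// Render ahead (Microsoft's "triple buffering") vs. page flip (the original).
#include <deque>

std::deque<int> flipQueue;    // render ahead: a FIFO of finished frames
int newest = -1;              // page flip: only the newest finished frame survives

void frameDone(int frame) {
    flipQueue.push_back(frame);   // every queued entry adds a refresh of latency
    newest = frame;               // older undisplayed frames are simply replaced
}

int presentQueued() {             // swap-chain behaviour (assumes a non-empty queue)
    int f = flipQueue.front();
    flipQueue.pop_front();
    return f;                     // may be several refreshes old
}

int presentNewest() {             // page-flip behaviour
    return newest;                // always the freshest complete frame
}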
 
I really should have looked at the diagrams, but as they said they were simplified, I skipped them, lol.
I assumed that since they referenced AMD, they were not talking about the historical flip queue / render ahead (the name I think Nvidia used).
Just curious: if triple buffering was enabled, how did the Catalyst driver handle the flip queue in the past?
Just wondering if this is where the historical debate about input lag comes from.

My bad :)
Edit:
NVM.
I see that AMD in the past recommended forcing the flip queue to 1 (the default is 3, which is why I was curious how it worked, or whether it's dynamic) when enabling triple buffering.
This is done through RadeonPro.

Cheers
 
There is no triple buffering with Direct3D. The swap chain is a pure queue.

And yes, that's most likely the reason why most gamers associate triple buffering, and V-sync in general, with high latency, as the swap chain is only used together with V-sync.
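
For illustration, this is all it takes to get "triple buffering" in D3D11 terms; a minimal sketch with most fields omitted:

Code:
// A "triple buffered" Direct3D 11 swap chain is just a 3-deep FIFO,
// not a latest-frame-wins page flip.
#include <dxgi.h>

DXGI_SWAP_CHAIN_DESC describeSwapChain(HWND window) {
    DXGI_SWAP_CHAIN_DESC desc = {};
    desc.BufferCount       = 3;                        // the "triple" part
    desc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.OutputWindow      = window;
    desc.SampleDesc.Count  = 1;
    desc.Windowed          = TRUE;
    desc.SwapEffect        = DXGI_SWAP_EFFECT_DISCARD;
    return desc;
}
// swapChain->Present(1, 0) then enqueues the frame and waits for V-blank;
// frames leave the queue strictly in submission order, nothing skips ahead.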

Even e.g. CS:GO has a "triple buffering" option, which would - following that logic - refer to an extra long flip queue, and not to actual triple buffering. Now it makes sense why there are so many complaints about the poor performance of that option, and why users claim they could boost its performance by enforcing "maximum pre-rendered frames", which essentially means they just overrode the option they had activated.
I couldn't make any sense of this before.

But we have seen at least one example of real triple buffering on AMD hardware before, and that was the DX12 version of AotS, except that the buffer flip was apparently implemented on the application side.

Btw.: Not even Nvidia's speaker knew that there are two features with the same name: https://www.youtube.com /watch?v=WpUX8ZNkn2U&t=15m20s (Breaking it to preserve the timestamp.)
 
Just to add: Direct3D tool-wise, there was RadeonPro, and there still is RivaTuner (D3DOverrider), as a more general approach to triple buffering, although I have never seen any analysis of them in this specific setup.
Cheers
 
Coming back to the input lag with Triple Buffering.

OK, I found a site (http://www.displaylag.com/) that measures input lag and also uses these Direct3D tools to enable triple buffering; they used a fighting "arcade game", as these are probably the most sensitive to input lag due to the requirements of combos/competitions/etc.
At first I was not sure about the guy doing the tests, but looking further, his results seem pretty accurate for G-Sync and FreeSync; he noticed a slight lag with G-Sync compared to V-Sync off, and that FreeSync is slightly faster than V-Sync off.

Anyway, there does seem to be input lag caused by triple buffering ONLY when compared to pure V-Sync-off performance - which is where most of us were coming from when comparing triple buffering performance.
In essence, triple buffering adds virtually no input lag compared to V-Sync on alone.
Measurements were based on a 60Hz test.
D3DOverrider I think involved Hilbert Hagedoorn, who is also involved in MSI Afterburner.

Nvidia:
V-Sync OFF 59ms 3.5 frames
V-Sync ON (D3DOverrider) 115ms 6.9 frames
V-Sync ON + Triple Buffering (D3DOverrider) 120ms 7.2 frames

AMD:
V-Sync OFF 61ms 3.7 frames
V-Sync ON (D3DOverrider) 106ms 6.4 frames
V-Sync ON + Triple Buffering (D3DOverrider) 109ms 6.5 frames
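
For reference, the "frames" figures are just the measured times divided by the 16.7 ms refresh interval of the 60Hz test:

Code:
// Converting the measured latencies to frames at 60 Hz.
#include <cstdio>

int main() {
    const double refreshMs = 1000.0 / 60.0;   // 16.7 ms per refresh
    const double measuredMs[] = {59, 115, 120, 61, 106, 109};
    for (double ms : measuredMs)
        std::printf("%5.0f ms = %.1f frames\n", ms, ms / refreshMs);
    // 59 -> 3.5, 115 -> 6.9, 120 -> 7.2, 61 -> 3.7, 106 -> 6.4, 109 -> 6.5
    return 0;
}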

The guy understands and has also analysed the flip queue and pre-rendered frames, and he sets them correctly when forcing triple buffering.
http://www.displaylag.com/reduce-input-lag-in-pc-games-the-definitive-guide/

The tables for AMD and Nvidia are two-thirds of the way down, titled "Nvidia GeForce Input Lag Results (60hz):" and "AMD Radeon Input Lag Results (60hz):".
Each result in the tables also has a YouTube video showing the 10 captured iterations used to compute the average.

Looks like he also based his approach on the B3D topic: https://forum.beyond3d.com/threads/...or-available-games-read-the-first-post.44653/
"Targeted resolution was compiled based on developer statements, as well as analysis from Beyond3D"

Cheers
 
Another consideration beyond my post above that can influence this discussion: fullscreen vs. windowed vs. borderless.

The presentation mode will have further implications for the discussion and its context.
Cheers
 
One thing I don't get: this is, AFAICS, just better than Vsync in every way. So why keep Vsync at all going forward? Just make this the new Vsync and do away with the old...
 
When comparing this directly to how Vsync was implemented in DX applications, there are actually a couple of differences. What typically makes Vsync so bad isn't the delay from synchronization, it's the render-ahead queue stacked on top of it to preserve GPU utilization despite Vsync. It's not wise to get rid of that though, as it also masks micro stutters (due to e.g. texture loads) quite well.

And it's not free of downsides either, as rendering frames that are never going to be displayed is just a pure waste of energy. Plain double-buffered Vsync with only 1-2 frames rendered ahead is a pretty efficient way to conserve energy, as it also doubles as a (desired) frame rate limit - without the downside of a frame rate limit without Vsync, which may still cause tearing. With real triple-buffered Vsync, you waste just as much energy as without Vsync.

If possible, the optimal solution is actually Vsync + frame pacing, as that gets you minimal latency for the whole frame. Put differently, if the frame pacing works correctly, turning Vsync on or off shouldn't make a difference (in theory!), as the present should be timed properly with the V-blank. It's pretty difficult to tune that correctly though, as every micro stutter in the render path will also throw the pacing off.
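
A rough sketch of that pacing idea (made-up names and margins, not any particular driver's implementation):

Code:
// Vsync + frame pacing: start rendering as late as possible so the frame
// completes just before V-blank, minimizing input-to-display latency.
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

void pacedLoop(Clock::time_point nextVBlank,
               Clock::duration refresh,        // e.g. ~16.7 ms at 60 Hz
               Clock::duration renderBudget) { // estimated render time + margin
    for (;;) {
        // Sleep so that input is sampled as close to scan-out as possible.
        std::this_thread::sleep_until(nextVBlank - renderBudget);
        // renderFrame();  // hypothetical: sample input now, then draw
        // present();      // with correct pacing this should not block on V-sync
        nextVBlank += refresh;
        // Any micro stutter longer than the budget misses the V-blank and
        // throws the pacing off - which is exactly why this is hard to tune.
    }
}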
 