Synchronization issues with SLI and CrossFire

It controls how frames are synchronized. If the GPU workload balance shifts, there's a change to the way they sync, often in CPU limited situations, and SLI will constantly re-predict the synchronization of the frames (it can happen in purely GPU limited scenarios too). This issue is actually far more prominent in areas where you end up CPU limited. There are tweaks you can make to the profile bits for how this prediction occurs, but generally it's "smart": if your frames lose synchronization, the drivers will re-predict the GPU load and resync them.

Say you render 10 frames at 20 ms intervals, and then your frame intervals spike to 50 or 100 ms. Usually the SLI drivers will resync the frames back down into the lower intervals. It's not always 100% successful, and some games will have more problems than others. Typically, in my experience, the stronger your Z-fill/pixel bottleneck, the less likely you are to have uneven frame distribution or poor GPU scaling.
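To make those numbers concrete, here's a tiny sketch (the timestamps are invented for illustration, and FRAPS is just one example of where a real frametime log could come from) that turns a list of frame timestamps into intervals and flags the kind of 50/100 ms spikes described above:

```python
# Sketch: spot frame-interval spikes in a frametime log.
# The timestamps below are made up for illustration; a real capture would
# come from a frametime logger such as FRAPS.

timestamps_ms = [0, 20, 40, 60, 80, 100, 150, 170, 270, 290, 310]

intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
average = sum(intervals) / len(intervals)

for i, gap in enumerate(intervals, start=1):
    flag = "  <-- spike" if gap > 1.5 * average else ""
    print(f"frame {i:2d}: {gap:5.1f} ms{flag}")

print(f"average interval: {average:.1f} ms (~{1000 / average:.1f} fps)")
```

The average looks fine on paper; it's the individual 50 and 100 ms gaps that show up as stutter.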

The NWN2 example I gave above was an area where I forced the bottleneck to shift by rapidly changing the GPU load, and you'll see more "spikes" of frame syncing in that example. Nvidia has a limited load visual indicator which helps illustrate the GPU load and the syncing that occurs; the more it fluctuates, the more likely you are seeing frame sync issues.

Chris


Below is an example of perfect frame distribution, with little to no fluctuation. But of course not all titles behave this ideally.

 
I appreciate the pissed off, passionate ChrisRay, but I like these posts from ya a lot better! :)
 
Controls how frames are synchronized
You keep using that word. I do not think it means what you think it means ;) Do you mean it controls the consistency of inter-frame delays? That's neither synchronization nor really a load-balancing issue; you can have inter-frame delays of 1-100-1-100... with identical loads on the GPUs (both near 100%, since this is clearly a wildly GPU limited situation).
If the GPU workload balance shifts, there's a change to the way they sync.
Actual consistent load imbalance with AFR is almost impossible unless the game engine has some special work it does consistently only every 2n frames. Inter-frame delays can obviously get unbalanced because of a slow frame in a GPU limited situation, but that doesn't represent a shift in load balance. I understand what you are saying, but you are pretty much redefining the meaning of the words "synchronization" and "load balancing" to do it, which is not a great idea.
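To spell out why that 1-100-1-100 pattern is so nasty even though the fps counter looks fine, a quick worked example (the "perceived" figure assumes the 1 ms frame is visually indistinguishable from the frame just before it):

```python
# The 1-100-1-100 ms pattern: average fps looks fine, but every other frame
# arrives almost immediately after the previous one, so the visible cadence
# is set by the long gap.

intervals_ms = [1, 100] * 5                            # ten frames

avg_interval = sum(intervals_ms) / len(intervals_ms)   # 50.5 ms
avg_fps = 1000 / avg_interval                          # ~19.8 fps on the counter

effective_fps = 1000 / (1 + 100)                       # ~9.9 distinct updates per second

print(f"reported average fps:  {avg_fps:.1f}")
print(f"perceived update rate: {effective_fps:.1f}")
```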

Anyway, it seems NVIDIA already adds delays to frame output to get some consistency in certain situations (predictably, it's turned off completely for benchmarks).
 
I'll be the first to admit my vocabulary is not as wide as I'd like it to be. I'm perfectly willing to accept a new term or definition ;)

The visual indicator is actually pretty good at showing "ms spikes" and inconsistencies. Even without noticing a "stutter", I can look at the load visual indicator and see where and when the spikes occur. It's really interesting to watch if you actually pay close attention to it, which I do in pretty much all my benchmarking; it's one of the ways I determine software scaling, among other things. It's a really handy tool, but it can be misinterpreted as well. It's also great for finding weak spots, such as Crysis on "High" settings at higher resolutions, which shows lots of fluctuations due to running out of memory.

One of the interesting things I've noticed is that the more GPUs you add (something aaronspink pointed out), the less the inconsistency occurs. I've seen it more prominently in 2-way than I have in 3-way or 4-way. While Quad 9800GX2 certainly has its fair share of limitations which really prevent it from shining (512 MB being a big killer and artificial limiter), it does allow you to really experiment with the different AFR modes and how they work.

Anyway, it seems NVIDIA already adds delays to frame output to get some consistency in certain situations (predictably, it's turned off completely for benchmarks).

Actually, the 3DMark06 benchmark uses a generic AFR2 profile (0x02400005), which is the identical profile UT3 uses. So there's really no difference in the profile bits for how these two pieces of software identify their AFR rendering mechanism, believe it or not. But 3DMark06 is a weird benchmark to begin with, since it's vastly CPU limited in spots, especially the second game test, which seems strongly tied to vertex setup limitations: an area where I have found SLI to be quite weak in scaling.

There's no other app detection mechanism for SLI compatibility that I'm aware of, with the exception of Vantage, which detects SLI and tends to respond accordingly; that's the only software I know of that lacks that transparency. 0x00400005 is actually the most common profile bit in AFR rendering. I'll let some people here guess what it does. Also keep in mind that "C" shares the same hex value. :)
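For anyone curious what a value like that looks like taken apart, here's a trivial sketch (nothing NVIDIA-specific, just hex arithmetic) that splits a 32-bit compatibility value into its nibbles; what each nibble actually controls is undocumented, so this shows only the raw layout and nothing more:

```python
# Split a 32-bit SLI compatibility value into its hex nibbles.
# 0x00400005 is the value quoted above; what each nibble controls is
# undocumented, so this only shows how the digits are laid out.

value = 0x00400005

nibbles = [(value >> shift) & 0xF for shift in range(28, -1, -4)]
print("value:", f"0x{value:08X}")
print("nibbles (high to low):", " ".join(f"{n:X}" for n in nibbles))
```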

Chris

 
How is a layman or an average person supposed to know what an inter-frame delay is?

That's the thing: some may be calling it load balancing, and it may be mythical from some points of view based on its exact definition, but the point is AFR is not as smooth as a single GPU.
 
[Image: profilebit004tq3.png - SLI profile bits]



For those of you who are really interested in hacking Nvidia's profile bits, nHancer does have an easy-to-manage tool for doing it, and I really suggest it to those who are into that. Not all of Grestorn's information on the profile bits is accurate, but he does have a general idea of how to recognise the hex values, and for the beginning SLI tweaker this is a good piece of software to toy with. While I can't explain in detail exactly what these profile bits do specifically, I think you will find that many of them behave differently under different circumstances, and some do behave better in more CPU limited titles. The big problem is that what is CPU limited now may not be CPU limited with tomorrow's architecture, and it's hard for Nvidia at times to set up the "perfect" profile for all types of hardware.

Chris
 

Damn, you are giving your secrets away! =) NOW, are you really serious about us guessing what the most common bit in AFR does?
Wouldn't you just say some people are really bothered by MS and others aren't?

I certainly *can see* it; but then for me it's a case of "so what" - almost all of the time when I use AFR in CrossFire it is to get smoother and faster with more detail and better IQ [even IF I don't use CFAA]. It just works better for me, practically, to have a 2nd 2900 in CrossFire than a single one - if the game I am playing scales OK.

Now, usually, I much prefer to run with my single 8800GTX over my 2900XT CrossFire, IF the minimum FPS is OK on the Nvidia card. I get a lot more pleasure playing with it overall, and I am really looking forward to a single GT280 and possibly a 2nd one if I get the opportunity - IF I also upgrade from 16x10 to 25x16, which is my hope, for a 24" display or something smaller than 30".
Finally, where was your image taken from? [Excuse my MMORPG ignorance; I'm an FPSer/RPGer and only just now - this month - getting into LotRO] ... and what was it supposed to show in a single image?

And really finally, I really do like nHancer ... and it was Grestorn who suggested adding the 2 ms delay to the 2nd card. Too bad there are no profiles for AMD HW and nothing comparable to nHancer
[ahancer =P]
 
The sacrifice is higher than 10% FPS. You don't dynamically balance and sync the frames. Your framerate will eventually drop below that of a single card.
WHY? Prove it ;)

Actually, there was a user at 3dcenter.de who coded a small tool (a DirectX hook) that FORCED the framerate to 30 fps by adding a small delay of (30 ms - frametime) to each frame. That way no frame could be put out in less than 30 ms.

And guess what: IT WORKED. Perfectly distributed frametimes WITH SLI ENABLED ;)
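The hook itself isn't available, but the mechanism as described is simple enough to sketch. Here it is in plain Python rather than as a Direct3D hook, using the 30 ms figure from above; render_frame is just a stand-in, and a real hook would add the delay around Present:

```python
import time

TARGET_MS = 30.0  # figure quoted above (~33 fps, not exactly 30)

def run_capped(render_frame, num_frames):
    """Render num_frames frames, padding each one out to TARGET_MS."""
    for _ in range(num_frames):
        start = time.perf_counter()
        render_frame()                                     # stand-in for the game's frame
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms < TARGET_MS:
            time.sleep((TARGET_MS - elapsed_ms) / 1000.0)  # delay = target - frametime

# Example: a "frame" that only takes ~5 ms still comes out every ~30 ms.
run_capped(lambda: time.sleep(0.005), num_frames=10)
```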

I have the tool here on my pc, but i won't share it.
That user is said to work for ATI and suddenly he disappeared from our forum and was never seen again...

That tool was an EXTERNAL tool, not even working inside the driver, so don't tell me Nvidia doesn't have the skill to solve this little problem with full access to the driver and super highly skilled coders ;)

They just fear losing a little bit of average fps, that's all.
 
Btw, all examples of good distribution of frametimes with games at >60fps, mainly low graphics quality games, are totally useless, because we all know that "uncomplicated" games are running very smoothly with SLI.

New games show higher and higher shader complexity, which means shifting the load from CPU to GPU.
Vantage at the Extreme preset doesn't even need more than 4% of my quad core @ 4 GHz (I can see the CPU load in real time on my G15 display); it all runs almost entirely on the GPUs!

And of course, Vantage shows the WORST distribution of frametimes ever seen - but this is the future; this is what we will see with SLI!

People don't buy SLI to get a jump in fps from 100 to 200. They buy SLI because they want to reach playable framerates where a single card would fail.
This means: the magical 30 fps!

Verdict:

Future games + SLI AFR + 30 fps = Hell ;)
 

Yeah, as far as I see it, there's no reason for NV/AMD to recommend such a frame limiter or build it into drivers. At least, not as long as most people buy cards according to benchmark results.
 
Anything that makes the situation less GPU limited diminishes the effect.

See, that's where my tests have shown me otherwise. I pretty much work with highly GPU limited scenarios, 1920x1080+ with 16xQ/16xAA with transparency SS enabled etc., as I deal with high end SLI on a regular basis (or, if I need to, 16xS/32xS for older titles). The more established your pixel/Z-fill bottleneck, the better SLI performs for me. For instance, my BF2142 testing (obviously a CPU limited title) was running @ 1920x1080 @ 16xS AA (which is 2x2 OGSS * 4x RGMS), which is heavily pixel fillrate bound. Anyway, my point is that when working with SLI I pretty much always aim to make myself as GPU limited as possible, even in my reviews, which I aim at SLI enthusiasts.
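To put a rough number on why a mode like 16xS leans so hard on the pixel/Z-fill side, here's a back-of-the-envelope sketch that just takes the "2x2 OGSS * 4x RGMS" description above at face value (it ignores bandwidth, texturing, and everything else):

```python
# Back-of-the-envelope sample count for 16xS at 1920x1080,
# taking the "2x2 OGSS * 4x RGMS" description above at face value.

width, height = 1920, 1080

ogss_pixels = (width * 2) * (height * 2)   # 2x2 supersampling: 4x the pixels shaded
msaa_samples = 4                           # 4x multisampling on each of those pixels

total_samples = ogss_pixels * msaa_samples
per_output_pixel = total_samples / (width * height)

print(f"output pixels:       {width * height:,}")
print(f"supersampled pixels: {ogss_pixels:,}")
print(f"total AA samples:    {total_samples:,} (~{per_output_pixel:.0f}x per output pixel)")
```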

It's when titles start becoming CPU limited that things change. EverQuest 2 is a good example and a big title where I change my AFR rendering bit from 0x0040000D to 0x024000D, just because of how CPU limited the title is and the massive stuttering that causes. This is the profile I give out to all users who complain about EQ2 performance, just for that reason.

In the case of games like HL2, I just tell users to shut SLI off or enable SLI AA (32xQ or 16x), because it simply causes more problems than it's worth in that game due to it being so highly CPU bound.


On the other subject:

As far as the "frame limiter" goes: yes, you can limit your framerate and produce a more harmonious sync between frames. This even works with single GPUs... I'm talking about maximized performance with vsync off, in which case making a perfect 15-30-45-60 would not work.

Chris
 
Who really games with v-sync off though, if you're going for eye-candy at least? I only turn off v-sync for benching, never for gaming anymore.
 
And what do you think would look better: 30fps synchronized or 35fps unsynchronized? ;)

Of course 30fps in sync!

You can choose your settings for a ~35fps average framerate and then lock it to 30fps -> perfect gaming with SLI ;)
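Putting rough numbers on that trade, assuming the cap simply pads each frame out to the 30 fps cadence like the limiter described earlier:

```python
# Rough cost of the "aim for ~35 fps, then cap at 30 fps" approach.

avg_fps_uncapped = 35.0
raw_frame_ms = 1000.0 / avg_fps_uncapped     # ~28.6 ms per frame on average
capped_frame_ms = 1000.0 / 30.0              # ~33.3 ms cadence with the cap

padding_ms = capped_frame_ms - raw_frame_ms  # ~4.8 ms of deliberate idle per frame
fps_given_up = (avg_fps_uncapped - 30.0) / avg_fps_uncapped

print(f"padding per frame:    {padding_ms:.1f} ms")
print(f"average fps given up: {fps_given_up:.0%}")   # ~14%
```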

But that is just an expedient - there are better ways within the driver, as I mentioned before - when Nvidia/ATI are willing to sacrifice a small amount of avg fps.

I have a feeling that this is never going to happen as long as all the major review sites just emphasize the "holy avg fps"...
 
I wouldn't mind that option if it offered superior smoothness, because for me it's always the lows that are paramount; with the highs, I can't tell the difference anyway.

Sure, it may not look sexy for selling a product, but it's not like you're forced to use an option like this; it could be a higher quality setting for superior smoothness while keeping lower frame rates nice and steady.

I suppose the idea is to sell product after all, and I understand that, but I still feel that offering more quality options for the gaming experience would help sell product as well. If a gamer desires or believes he needs more frame rate, he could still have it... but why leave the ones that desire more consistency in the lows whistling into the wind, so to speak?

Still, I appreciate AFR and multi-GPUs - I'd just like to see it improve and evolve and become more seamless. It's not like they don't know, and no doubt they have plans and ideas to improve this somehow... just can't wait for those days to come to fruition.
 