CPU-Limited Games: When Are They Going to End?

Certain tasks just don't scale past a certain number of cores, because the added latency grows with every core added until the latency penalty outweighs the performance gain.

Cores don't inherently add latency. Whether they do depends on the cache topology, how much state the threads share, and how often they have to synchronize.
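To make the scaling argument concrete, here is a toy model, assuming Amdahl-style scaling of the parallel part plus a linear per-thread coordination cost. The function names and all numbers are hypothetical, not measurements from any game. `sync_ms_per_thread` stands in for whatever the cache topology and shared state make synchronization cost: when it is zero, more threads always help; when it is positive, frame time bottoms out and then rises again.

```python
def frame_time_ms(n_threads, serial_ms, parallel_ms, sync_ms_per_thread):
    """Toy model: serial work + parallel work split n ways + coordination cost."""
    # The coordination term is 0 for a single thread and grows with each extra one.
    return serial_ms + parallel_ms / n_threads + sync_ms_per_thread * (n_threads - 1)


def best_thread_count(max_threads, serial_ms, parallel_ms, sync_ms_per_thread):
    """Return the thread count (1..max_threads) with the lowest modeled frame time."""
    return min(
        range(1, max_threads + 1),
        key=lambda n: frame_time_ms(n, serial_ms, parallel_ms, sync_ms_per_thread),
    )
```

With the made-up inputs `serial_ms=4.0`, `parallel_ms=12.0` and `sync_ms_per_thread=1.2`, `best_thread_count(8, ...)` picks 3 threads; with `sync_ms_per_thread=0.0` it picks all 8. The extra cores are not the problem in this model; the coordination cost is.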
 
Here's a very good blog post from Durante (game developer and former modder who worked on fixing the Dark Souls PC port) detailing how they used CPU optimizations to essentially triple fps.

Here are some interesting tidbits on multi-core optimization and high-fps frametime stabilization.

Parallelization Step:

This is by far the most exciting step, and also the one that is really a rather questionable idea. When we started to work on the port, the game had a main update thread, a graphics thread, and several background threads for audio and asynchronous tasks. Of those, only the graphics and main update threads did any real CPU work, and only the main thread was limiting performance.

Looking into that main update thread further, we discovered that a large chunk of time is spent updating all the actors in the scene individually (characters, monsters, pickups, and basically everything else you can see, as well as some things you can't see, such as event triggers). So the basic idea is rather obvious: perform these updates in parallel.
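A minimal sketch of that basic idea, assuming each actor's update only touches its own state. `Actor`, `update`, and `update_actors_parallel` are hypothetical names, and Python's `ThreadPoolExecutor` stands in for an engine's native thread pool or job system. In CPython the GIL keeps pure-Python updates like this from actually running faster, so the point here is the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Actor:
    """Hypothetical actor with only a 1D position and velocity."""
    x: float
    vx: float

    def update(self, dt: float) -> None:
        # Reads and writes only this actor's own fields, so updates of
        # different actors can safely run concurrently.
        self.x += self.vx * dt


def update_actors_parallel(actors, dt, workers=3):
    """Run every actor's update on a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator waits for all updates and re-raises
        # any exception thrown inside a worker.
        list(pool.map(lambda actor: actor.update(dt), actors))
```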

Of course, in reality it's not quite that simple, since each of these update steps can and will interact arbitrarily with some global system or state that was developed under the assumption that everything is sequential. The surprising thing is really that we got it to work, though a lot of development time went into debugging problems that arose from this parallelization. Ultimately, due to the level of synchronization required, the actor updates only really scale up to 3 threads, but the improvement is still substantial, especially since it is most pronounced in the hardest-to-run areas (those with more individual actors).
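The hard part is the shared state. A minimal sketch, assuming a hypothetical `GlobalEventSystem` that was written for single-threaded use: the lock makes concurrent calls safe, but every actor update that reports an event contends on it, which serializes part of the "parallel" phase and is one plausible reason scaling stops at a few threads. Events are sorted at the end of the frame on the assumption that older sequential code relied on a fixed ordering; this is an illustration, not a description of the actual port.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class GlobalEventSystem:
    """Hypothetical global system originally written for single-threaded access."""

    def __init__(self):
        self._lock = threading.Lock()
        self.kills = 0
        self.events = []

    def report_kill(self, actor_id):
        # Every actor update that reports a kill contends on this one lock,
        # so this part of the "parallel" phase effectively runs one at a time.
        with self._lock:
            self.kills += 1
            self.events.append(("kill", actor_id))

    def end_frame(self):
        # Threads append in a nondeterministic order; sort so code that
        # assumed sequential (actor-index) order still sees it.
        with self._lock:
            self.events.sort()


def update_actor(actor_id, hp, damage, events):
    # Per-actor work is independent; only the kill report touches shared state.
    remaining = hp - damage
    if remaining <= 0:
        events.report_kill(actor_id)
    return remaining


def run_frame(hps, damage, events, workers=3):
    """Update all actors in parallel, then restore a deterministic event order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        new_hps = list(pool.map(
            lambda item: update_actor(item[0], item[1], damage, events),
            enumerate(hps),
        ))
    events.end_frame()
    return new_hps
```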

GPU Query Step:

The final step on the GPU optimization journey was more about improving frametime stability at high framerates than about pure performance. The image shows a frametime chart comparison, and you can see that the release version is not only much faster but also a lot more stable. It is important to note here that this is without a frame limiter or V-sync, because you usually only see such a flat frametime chart with one of those engaged.
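Average fps alone can't show the stability difference that a flat frametime chart does. A small sketch, with a hypothetical function name and made-up sample data, that computes mean frame time, average fps, frametime standard deviation, and the worst frame: two runs can have identical average fps while one of them stutters.

```python
import statistics


def frametime_stats(frametimes_ms):
    """Summarize a list of per-frame times in milliseconds."""
    mean_ms = statistics.fmean(frametimes_ms)
    return {
        "mean_ms": mean_ms,
        # Frames per second over the whole run = 1000 / mean frame time.
        "avg_fps": 1000.0 / mean_ms,
        # Population standard deviation: how far frames stray from the mean.
        "stdev_ms": statistics.pstdev(frametimes_ms),
        "worst_ms": max(frametimes_ms),
    }


# Hypothetical sample data, not measurements from the game.
flat = [10.0] * 6
spiky = [8.0] * 5 + [20.0]
```

Both hypothetical runs average 100 fps, but the spiky one has a standard deviation of about 4.47 ms and a 20 ms worst frame, which is exactly what a frametime chart exposes and an fps counter hides.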

What we did here is purposefully introduce a one-frame-off synchronization point between GPU and CPU progress. My initial thought was that this would improve stability but reduce framerate, so it would have to be a setting, but in actual testing across several configurations it turned out to improve both. I'm not 100% sure why, but my theory (after investigating with the Tracy profiler) is that it leads the OS to make better thread scheduling decisions.
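A minimal sketch of what a one-frame-off CPU/GPU synchronization point can look like, assuming a simple fence that records the last frame the "GPU" finished. The CPU is free to simulate frame n while the GPU works on frame n - 1, but before submitting frame n it waits for frame n - 1 to complete, so at most one frame is ever in flight. `FrameFence`, `simulate_frames`, and the queue-based "GPU" thread are hypothetical; the quoted post doesn't describe the port's actual mechanism in this detail.

```python
import queue
import threading


class FrameFence:
    """Hypothetical fence recording the last frame index the GPU finished."""

    def __init__(self):
        self._cond = threading.Condition()
        self._completed = -1

    def signal(self, frame):
        with self._cond:
            self._completed = frame
            self._cond.notify_all()

    def wait_for(self, frame):
        # Block until the GPU has finished `frame` (or a later one).
        with self._cond:
            self._cond.wait_for(lambda: self._completed >= frame)

    def completed(self):
        with self._cond:
            return self._completed


def simulate_frames(num_frames):
    submit_q = queue.Queue()
    fence = FrameFence()
    rendered = []

    def gpu_worker():
        # Stand-in for the GPU: "renders" frames in submission order.
        while True:
            frame = submit_q.get()
            if frame is None:
                break
            rendered.append(frame)
            fence.signal(frame)

    gpu = threading.Thread(target=gpu_worker)
    gpu.start()
    max_in_flight = 0
    for n in range(num_frames):
        # CPU-side simulation of frame n would run here, overlapping GPU work on n - 1.
        fence.wait_for(n - 1)  # one-frame-off sync point
        # Frames submitted but not yet finished once frame n is submitted.
        max_in_flight = max(max_in_flight, n - fence.completed())
        submit_q.put(n)
    submit_q.put(None)  # tell the GPU thread to stop
    gpu.join()
    return rendered, max_in_flight
```

Removing the `wait_for` call lets the CPU queue arbitrarily many frames ahead of the GPU, which is the unbounded lead this kind of sync point is meant to prevent.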

https://steamcommunity.com/games/2731870/announcements/detail/4666382742870026336


[Image: frametime chart comparison]