A new PSO precaching mechanism was introduced as Experimental in 5.1 to improve PSO hitching in DX12 titles. Improvements to this system in 5.2 include:
- We've improved the performance and stability of the system. There were various corner cases that needed to be addressed.
- We now skip drawing objects if their PSOs aren't ready yet. The system aims to have the PSO ready in time for drawing, but it will never be able to guarantee this. When it's late, it is now possible to skip drawing the object instead of waiting for compilation to finish (and hitching).
- The number of PSOs to precache has been reduced due to improved logic that omits ones that will never be used.
- We've improved the old (manual) PSO cache system so that it can be used alongside precaching.
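To make the "skip the draw instead of hitching" point concrete, here is a minimal sketch of the idea in C++. It is not Unreal's implementation: `PsoCache`, `Precache`, `Tick`, `IsReady` and `DrawOrSkip` are hypothetical names, and background compilation is simulated by counting frames rather than running on worker threads.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical cache. A real engine would compile on worker threads; here each
// PSO simply needs a fixed number of frames before it counts as compiled.
class PsoCache {
public:
    // Request compilation ahead of time; does nothing if already requested.
    void Precache(const std::string& key, int framesToCompile) {
        framesLeft_.emplace(key, framesToCompile);
    }

    // Advance the simulated background compilation by one frame.
    void Tick() {
        for (auto& entry : framesLeft_) {
            if (entry.second > 0) --entry.second;
        }
    }

    // Non-blocking query: true only if the PSO has finished compiling.
    bool IsReady(const std::string& key) const {
        auto it = framesLeft_.find(key);
        return it != framesLeft_.end() && it->second == 0;
    }

private:
    std::unordered_map<std::string, int> framesLeft_;
};

// Returns true if the object was drawn, false if it was skipped this frame.
// The frame never waits on compilation, so a late PSO costs a missing object
// for a few frames instead of a hitch.
bool DrawOrSkip(const PsoCache& cache, const std::string& key) {
    if (!cache.IsReady(key)) return false;
    // ... bind the PSO and issue the draw call here ...
    return true;
}
```

With `Precache("rock", 2)`, the object is skipped until `Tick()` has run twice and is drawn from then on, which is the "pop in a few frames late" behaviour discussed in the replies.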
So instead of hitching we'll have missing objects like an LOD pop in?
Pretty much. I think that's how I'd prefer it to be.
In my mind ideally devs would create as full of a PSO cache as possible and precompile up front, and then hopefully
Yep. I'd rather have an object in the scene pop in some milliseconds late rather than have the entire thing stutter for that amount of time.
Yeah, I think that's a better compromise. The effect causing the stutter is likely just one small part of the screen, so it not loading, or loading slower, is likely to be preferable to the whole image stuttering and potentially impacting gameplay.
Developer Tools
DANIEL TUTINO-GALLETTI
Posted on March 2
General updates and quality-of-life improvements for developer iteration in UE 5.2 include:
- Visual Studio 2022 is now default
- All enabled warnings for the Clang Static Analyzer have been addressed
- C++ UnrealHeaderTool is deprecated and replaced with C# UnrealHeaderTool
- Optimizations to Garbage Collection performance and memory use
- UI and usability improvements to Unreal Insights components
Another interesting thing I noticed when looking at the roadmap was that with UE5.2 they have improved Garbage Collection performance and memory use.
I know that garbage collection/memory management is an often cited reason for hitching in PC games alongside compilation stuttering.. so I wonder about this point. Would be nice if it helps reduce traversal stuttering while loading in new areas/assets.
Anyone have any good links that can help me understand more about this? I understand that the memory setup between PC and console is very different.. but I find it odd that many PC games have these traversal stutters where consoles don't, considering we have faster CPUs and far more memory. I suppose the game is just coded in a more general way which favors the console design.. but I've always wondered why PC games can't just load things in with a more fine grained approach and get more of that data into RAM further ahead of time?
I actually wonder if it has more to do with the decompression happening on the CPU than anything else. DirectStorage would also help alleviate this particular issue.
I’m guessing garbage collection is related to the engine tools and not the real-time engine? I’d be very surprised if there were any gc, unless there’s a scripting language or blueprints that use gc.
Yea, that update is seemingly for the engine tools.. but if that doesn't apply to retail games, then why do some developers blame stuttering in games on garbage collection?
Performance
One natural question to ask at this point is whether all this new flexibility comes at some performance cost. After all, if pipelines as they were originally conceived needed so many more restrictions, how can those restrictions be rolled back without negative consequences?
On some implementations, there is no downside. On these implementations, unless your application calls every state setter before every draw, shader objects outperform pipelines on the CPU and perform no worse than pipelines on the GPU. Unlocking the full potential of these implementations has been one of the biggest motivating factors driving the development of this extension.
On other implementations, CPU performance improvements from simpler application code using shader object APIs can outperform equivalent application code redesigned to use pipelines by enough that the cost of extra implementation overhead is outweighed by the performance improvements in the application.
In either case, all conformant VK_EXT_shader_object implementations are tested to meet specific performance requirements:
- Draw calls using shader objects must not take more than 150% of the CPU time of draw calls using fully static graphics pipelines
- Draw calls using shader objects must not take more than 120% of the CPU time of draw calls using maximally dynamic graphics pipelines
- Dispatch calls using compute shader objects must not be measurably slower than dispatch calls using compute pipelines
- Creating a shader object from binary shader code must not take more than 150% of the CPU time of the cost of copying an equivalent amount of data into device local memory
These tests are intended to establish a minimum performance bar for VK_EXT_shader_object implementations that developers can rely on. This means that if a driver advertises support for VK_EXT_shader_object, you can depend on it to perform well.
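For anyone who wants to sanity-check their own measurements against the two draw-call thresholds listed above, the arithmetic can be sketched like this. The struct and function names are made up for illustration, the timings are assumed to be CPU microseconds per draw call that you measured yourself, and this is not the conformance test itself.

```cpp
// Maximum allowed CPU-time ratios, taken from the draw-call bullets above.
constexpr double kMaxVsStaticPipeline = 1.50;   // 150% of fully static pipelines
constexpr double kMaxVsDynamicPipeline = 1.20;  // 120% of maximally dynamic pipelines

// Hypothetical container for per-draw CPU timings (microseconds).
struct DrawTimings {
    double shaderObjectUs;
    double staticPipelineUs;
    double dynamicPipelineUs;
};

// True only if the shader-object draw cost stays within both ratios.
bool MeetsDrawCallBar(const DrawTimings& t) {
    return t.shaderObjectUs <= kMaxVsStaticPipeline * t.staticPipelineUs &&
           t.shaderObjectUs <= kMaxVsDynamicPipeline * t.dynamicPipelineUs;
}
```

For example, 1.4 us per shader-object draw against 1.0 us (static) and 1.2 us (dynamic) stays inside both limits, while 1.3 us against 1.0 us for both baselines passes the 150% bar but fails the 120% one.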
If you’re interested in the details of this extension’s performance goals and design criteria, or just more information about some of the motivations that drove the development of this extension, please see the formal extension proposal.
In summary, shader objects impose substantially fewer restrictions on applications compared to pipelines, and enable dynamism-heavy applications like games and game engines to avoid the explosive pipeline permutation combinatorics, which until now might have been seen as a cost of admission for access to modern graphics APIs.