Shader Compilation on PC: About to become a bigger bottleneck?

One of the developers speaks about the messy state of Vulkan/DX12.

There is only one problem, which is that with all this fine-grained complexity, Vulkan winds up being basically impossible for humans to write. Actually, that's not really fair. DX12 and Metal offer more or less the same degree of fine-grained complexity, and by all accounts they're not so bad to write. The actual problem is that Vulkan is not designed for humans to write.

Literally. Khronos does not want you to write Vulkan, or rather, they don't want you to write it directly.

I was in the room when Vulkan was announced, across the street from GDC in 2015, and what they explained to our faces was that game developers were increasingly not actually targeting the gaming API itself, but rather targeting high-level middleware, Unity or Unreal or whatever, and so Vulkan was an API designed for writing middleware. The middleware developers were also in the room at the time, the Unity and Epic and Valve guys. They were beaming as the Khronos guy explained this. Their lives were about to get much, much easier.

Vulkan is weird— but it's weird in a way that makes a certain sort of horrifying machine sense. Every Vulkan call involves passing in one or two huge structures which are themselves a forest of other huge structures, and every structure and sub-structure begins with a little protocol header explaining what it is and how big it is. Before you allocate memory you have to fill out a structure to get back a structure that tells you what structure you're supposed to structure your memory allocation request in. None of it makes any sense!
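As a toy illustration of the pattern being described, here is a self-contained mimic of Vulkan's convention: every struct opens with a little tag/chain header, and you fill out one struct just to receive another struct that tells you how to fill out the real request. All names here are invented for illustration; real Vulkan uses `VkStructureType sType`, `const void *pNext`, `vkGetBufferMemoryRequirements`, and `VkMemoryAllocateInfo`.

```c
#include <stddef.h>

/* Mimic of Vulkan's protocol header: what the struct is, plus a
   chain pointer for extension structs. */
typedef enum { TAG_MEM_REQUIREMENTS, TAG_ALLOC_INFO } StructTag;

typedef struct {
    StructTag tag;      /* what this struct is            */
    const void *next;   /* chain of extension structs     */
} Header;

/* Step 1: a struct you receive, telling you the requirements... */
typedef struct {
    Header hdr;
    size_t size;        /* bytes the resource needs       */
    size_t alignment;   /* required alignment             */
} MemRequirements;

/* Step 2: ...which dictates how to fill the actual request. */
typedef struct {
    Header hdr;
    size_t allocationSize;
} AllocInfo;

/* Driver-side stand-in: reports requirements for a resource,
   rounding the size up to a 256-byte alignment. */
static void get_mem_requirements(size_t resource_bytes, MemRequirements *out) {
    out->hdr.tag = TAG_MEM_REQUIREMENTS;
    out->hdr.next = NULL;
    out->alignment = 256;
    out->size = (resource_bytes + 255) & ~(size_t)255;
}

/* The two-step dance: query a struct to learn how to build the
   struct for the real allocation request. */
static size_t plan_allocation(size_t resource_bytes) {
    MemRequirements req;
    get_mem_requirements(resource_bytes, &req);
    AllocInfo info = { { TAG_ALLOC_INFO, NULL }, req.size };
    return info.allocationSize;
}
```

Verbose for a human, but trivially parseable and extensible for middleware generating these calls by the thousand.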

In short, Vulkan is not for you. It is a byzantine contract between hardware manufacturers and middleware providers, and people like… well, me, are just not part of the transaction.

Khronos did not forget about you and me. They just made a judgement, and this actually does make a sort of sense, that they were never going to design the perfectly ergonomic developer API anyway, so it would be better to not even try and instead make it as easy as possible for the perfectly ergonomic API to be written on top, as a library.

Khronos thought that within a few years of Vulkan being released there would be a bunch of high-quality open source wrapper libraries that people would use instead of Vulkan directly. These libraries basically did not materialize.

It turns out writing software is work and open source projects do not materialize just because people would like them to.

 
So, what's the problem here? Why does unloading and loading data freeze everything? Drivers? API? Engine?

Even PCIe 4.0 x16 has only 32GB/s of bandwidth, so loading 2GB of data takes at least 1/16 of a second; if that lands in a single frame, the frame rate drops below 16 FPS. That'd certainly cause a noticeable stutter.
It'd be even slower if you have to load the data from storage.
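The arithmetic above can be sketched directly (numbers are illustrative; real transfers add protocol and driver overhead on top):

```c
/* Time to move a payload over a link, and the frame-rate ceiling
   if the whole transfer blocks a single frame. */
static double transfer_seconds(double gigabytes, double link_gb_per_s) {
    return gigabytes / link_gb_per_s;
}

static double fps_ceiling(double gigabytes, double link_gb_per_s) {
    /* If one frame waits for the entire transfer, FPS <= 1 / stall. */
    return 1.0 / transfer_seconds(gigabytes, link_gb_per_s);
}
```

For 2GB over a 32GB/s link: `transfer_seconds(2.0, 32.0)` is 0.0625s, so `fps_ceiling` is 16 — hence the sub-16 FPS figure.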
 
@pcchen I think the issue is probably that they're not loading the data before it's needed, and it impacts their critical path. Basically you get a stall because the data isn't ready: they're loading it as needed, or trying to load it just in time.

Yes, it could be a pacing issue. For example, it'd be impossible to load all the assets of an open world game at once, so some sort of streaming will be required. Generally you'll want to keep distant assets at lower LOD and near assets at higher LOD, and stream according to the current player location. However, it can be difficult to schedule the loading, and I can imagine in some (maybe rare) cases it's scheduled too late, so the game has to wait for the assets to load before continuing.
It can be difficult to maintain a good schedule, especially in an open world game where players can move freely (and especially if players can fly or move quickly). I don't know what the best way to deal with this is, but I hope something like DirectStorage and the recent NVIDIA paper on AI-assisted texture compression might help. Also, that's why I think a video card with more memory can be a good insurance policy :)
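The distance-driven LOD scheme described above can be sketched roughly like this (thresholds and the lookahead heuristic are made up for illustration):

```c
/* Nearer assets get higher detail; LOD 0 is full detail. */
static int lod_for_distance(double meters) {
    if (meters < 50.0)  return 0;  /* full detail      */
    if (meters < 200.0) return 1;
    if (meters < 800.0) return 2;
    return 3;                      /* distant imposter */
}

/* Prefetch hint: which LOD will be needed if the player keeps
   closing on the asset at this speed for `lookahead_s` seconds?
   A fast-moving (or flying) player makes `closing_speed` large,
   which forces the streamer to fetch high LODs much earlier. */
static int lod_needed_soon(double meters, double closing_speed,
                           double lookahead_s) {
    double predicted = meters - closing_speed * lookahead_s;
    return lod_for_distance(predicted < 0.0 ? 0.0 : predicted);
}
```

The scheduling difficulty is visible in the second function: the faster the player can move, the further ahead the predicted distance jumps, and the less time the streamer has to get the higher LOD resident.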
 

The Spider-man GDC presentation had really good information about how they handled streaming and that was with an ultra-slow HDD :)

 

Well, I get that if you're loading the assets just in time, but if you stream ahead, it shouldn't affect the FPS. Or are GPUs inefficient at loading assets into VRAM? Like, can they work on the current frame while loading/unloading at the same time? I suppose they have DMA engines to manage this.
 

There's no problem with that. The problem is that the game tries to load too much data within one frame (probably because that particular frame needs that data; see the discussion above).
 
It could be that the game/developer knows that "these" assets will be used soon, so they want to start filling VRAM with them. However, they don't have proper code in place to limit how quickly those assets are loaded, so they're loaded in as fast as the host system can transfer them. If this involves lots of small files (lots of IO overhead), that will starve the game of system resources that might be critical (for example, transferring many small files, or sustained high-speed transfers, can eat significant CPU time).
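The missing throttle being described here can be sketched as a per-frame byte budget: queue everything you want resident, but only issue a bounded amount each frame so a streaming burst can't starve the rest of the frame. The structure and numbers are illustrative, not any engine's actual API.

```c
#include <stddef.h>

/* Bytes still waiting to be streamed into VRAM. */
typedef struct {
    size_t remaining;
} StreamQueue;

/* Called once per frame: issues at most `frame_budget` bytes of IO,
   returning how many were actually issued. Spreading a large load
   across frames trades one giant stall for many tiny ones. */
static size_t issue_this_frame(StreamQueue *q, size_t frame_budget) {
    size_t n = q->remaining < frame_budget ? q->remaining : frame_budget;
    q->remaining -= n;
    return n;
}
```

With, say, 2GB queued and a 64MB/frame budget, the transfer takes 32 frames instead of landing as one uncontrolled burst in a single frame.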

I could certainly see this happening with developers who haven't implemented a robust streaming system in the past, for a variety of reasons. They may figure that the host system is best suited to handling any and all IO tasks, and not specifically code around making certain that their own IO demands aren't causing a bottleneck.

Host IO systems aren't there to predict how file transfers will impact your app code and execution; they are there generally just to deliver data as quickly as possible, as safely as possible, or some combination of both. PS5, XBS, DirectStorage, the Windows file system, etc. aren't going to ensure that the developer doesn't screw things up by initiating an uncontrolled large burst of IO traffic that might impact system resources their currently running code needs.

Regards,
SB
 
DX12 PSO Precaching

A new PSO precaching mechanism was introduced as experimental in 5.1 to improve PSO hitching in DX12 titles. Improvements to this system in 5.2 include:

  • We improved the performance and stability of the system. There were various corner cases which we needed to address.
  • We now skip drawing objects if their PSOs aren't ready yet. The system aims to have the PSO ready in time for drawing, but it will never be able to guarantee this. When it's late, it is now possible to skip drawing the object instead of waiting for compilation to finish (and hitching).
  • We reduced the number of PSOs to precache due to improved logic that omits ones which will never be used.
  • We improved the old (manual) PSO cache system so that it can be used alongside precaching.
 
Looks like some really good improvements there. Hopefully UE5.2 will spell the end of shader compilation stutter in the engine once and for all. And if it no longer exists in UE5, games built on other engines will hopefully put a lot more effort into resolving their own problems too.
It won't. But less is always better.
 
This seems similar to the stages the Dolphin emulator went through. The GameCube had ridiculous pipeline permutations before it was cool, so Dolphin was a stuttery mess. They also added skipping rendering in lieu of stuttering.

Their eventual solution was to add a shader interpreter as a fallback while the shader compiled in the background.
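That fallback pattern can be sketched as a per-frame path selection: render with the always-available generic/interpreter path until the background compile finishes, then swap to the specialized shader. The frame-count stand-in for compile latency is purely illustrative.

```c
#include <stdbool.h>

typedef struct {
    bool specialized_ready;
    int  frames_until_ready;   /* stand-in for async compile latency */
} ShaderSlot;

/* Called once per frame. Returns 1 when the fast specialized shader
   is usable, 0 while the slow-but-always-available fallback must
   carry the rendering. Either way, the frame never blocks. */
static int select_path(ShaderSlot *s) {
    if (!s->specialized_ready && --s->frames_until_ready <= 0)
        s->specialized_ready = true;   /* background compile finished */
    return s->specialized_ready ? 1 : 0;
}
```

The key property is that `select_path` always returns immediately: the cost of a not-yet-compiled shader is a few frames at lower shading performance rather than a stall.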
 