Before we get *too* far into the weeds - it's great to see folks getting some actual data, but for people not familiar with low-level GPU optimization, please try to resist your initial reactions to things. It might seem crazy to see something like "this pass only uses 20% of the theoretical ALU throughput", but that's actually completely normal, and always has been for GPUs. Each pass tends to hit different parts of the GPU harder, and there will always be a bottleneck somewhere.
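As the simplest possible illustration of why a pass can legitimately sit at 20% of ALU peak (and as the next paragraph notes, real bottlenecks are rarely this clean), here's a toy roofline-style calculation. All of the numbers are made up for illustration, not measured from any real GPU or pass:

```cpp
#include <algorithm>
#include <cstdio>

// Toy roofline estimate: illustrative numbers only, not from a real GPU or pass.
int main() {
    const double peakFlops     = 20e12;  // 20 TFLOP/s theoretical ALU peak (assumed)
    const double peakBandwidth = 500e9;  // 500 GB/s DRAM bandwidth (assumed)
    const double flopsPerByte  = 8.0;    // arithmetic intensity of the pass (assumed)

    // If the pass is limited by memory traffic, the best ALU rate it can sustain
    // is bandwidth * arithmetic intensity, capped at the ALU peak.
    const double achievableFlops = std::min(peakFlops, peakBandwidth * flopsPerByte);
    std::printf("Best-case ALU utilization: %.0f%% of peak\n",
                100.0 * achievableFlops / peakFlops);
    // Prints 20% here: the ALUs sit mostly idle even though the pass is running
    // as fast as the memory system allows.
    return 0;
}
```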
Moreover, bottlenecks are not as simple as "memory" vs. "ALU". Even in cases where things are stuck waiting on memory, the fix is usually not as simple as adding more memory bandwidth. As GPU rendering becomes more complex, there are more and more cases where performance is a complex function of cache hierarchies, latency-hiding mechanisms, register file sizes and banking, and so on. There's a reason why huge parts of these chips are devoted to register files and, increasingly, caches rather than just laying down more raw compute to pump up those marketing numbers.
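To make the register file / latency-hiding interaction concrete, here's a heavily simplified occupancy sketch: the more registers a shader needs per thread, the fewer waves can be resident on a SIMD to hide memory latency. The sizes and limits below are assumptions for illustration, not any specific architecture:

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

// Simplified occupancy math: all sizes are illustrative assumptions,
// not taken from any particular GPU architecture.
int main() {
    const int registerFileBytes = 64 * 1024; // register file per SIMD (assumed)
    const int bytesPerRegister  = 4;
    const int threadsPerWave    = 32;
    const int maxWavesPerSimd   = 16;        // scheduler limit (assumed)

    // A shader that needs more registers per thread leaves room for fewer
    // concurrent waves, which means less work available to hide memory latency.
    for (int regsPerThread : {32, 64, 128}) {
        const int bytesPerWave = regsPerThread * bytesPerRegister * threadsPerWave;
        const int waves = std::min(maxWavesPerSimd, registerFileBytes / bytesPerWave);
        std::printf("%3d regs/thread -> %2d resident waves\n", regsPerThread, waves);
    }
    return 0;
}
```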
So yes, it's all well and good to look at some profiles, but unless you are experienced in looking at these things frequently, please avoid generalizing what you're seeing into statements about what constitutes "normal" or "efficient" use of a GPU for a given task.
> Regarding that: How much does rendering VSMs consume, and how much does tracing the shadow map consume? A cheaper soft shadow map solution like the current CoD one might save a ms or two on lower-end HW. But for rendering VSMs themselves, all I can think of is that "constant time VSM" trick, where the VSM gets a constant time budget and only renders the highest mips it can within that budget. Sure, the next frame could see a pop to higher res in some tiles, but it seemed to work well enough from what I saw of it.

You're going to hate this but... "it depends" a lot. There are a lot of factors that affect VSM performance, from how much cache invalidation there is to how much non-Nanite geometry there is and so on. There are quite a lot of console variables and tools to adjust how they function for a given game, some of which do help you target specific budgets. The "constant time" thing is not really feasible in a strict sense since 1) it's not possible to perfectly predict how long something will take to render beforehand, especially for non-Nanite geometry, and 2) the pops you note can be pretty significant if it is not handled carefully. There are hysteresis tools, though, that can adjust resolution smoothly as you approach the page pool allocation size and similar. More importantly, as VSMs try to match their resolution to the sampling resolution of the screen, they actually scale better with the primary dynamic-res adjustments, unlike conventional shadow maps. Again, non-Nanite geometry is a bit of a wildcard as it's impossible to control its cost on the backend, but that's yet more reason to make everything Nanite.
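To make the hysteresis idea concrete, here's a minimal sketch of the kind of feedback loop that can trade resolution for page pool headroom smoothly instead of chasing an exact time budget. This is a toy illustration of the concept, not the actual engine code: the struct name, thresholds, and step sizes are all invented.

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

// Toy hysteresis controller for a virtual shadow map resolution bias.
// Not the actual engine implementation: names, thresholds, and step sizes
// are invented purely to illustrate the feedback-loop idea described above.
struct VsmResolutionController {
    float resolutionBias = 0.0f;                   // extra mip bias applied to page selection
    static constexpr float kMaxBias      = 4.0f;
    static constexpr float kRaiseAt      = 0.90f;  // start dropping resolution above 90% pool usage
    static constexpr float kLowerAt      = 0.70f;  // only restore resolution below 70% usage
    static constexpr float kStepPerFrame = 0.05f;  // small steps -> smooth transitions, no pops

    void Update(float pagePoolUsage /* 0..1, fraction of physical pages allocated */) {
        if (pagePoolUsage > kRaiseAt) {
            // Approaching the page pool limit: trade resolution for headroom.
            resolutionBias = std::min(kMaxBias, resolutionBias + kStepPerFrame);
        } else if (pagePoolUsage < kLowerAt) {
            // Plenty of headroom: gradually restore resolution.
            resolutionBias = std::max(0.0f, resolutionBias - kStepPerFrame);
        }
        // The gap between the two thresholds (the hysteresis band) keeps the
        // bias from oscillating when usage hovers around a single threshold.
    }
};

int main() {
    VsmResolutionController ctrl;
    // Feed in a made-up usage curve to show the bias ramping up and back down.
    for (float usage : {0.5f, 0.95f, 0.95f, 0.95f, 0.6f, 0.6f}) {
        ctrl.Update(usage);
        std::printf("usage %.2f -> bias %.2f\n", usage, ctrl.resolutionBias);
    }
    return 0;
}
```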