dx9 shadow maps (drawcall limitations)

Infinisearch · Sep 26, 2020

I was wondering how shadow maps were done in the DX9 era. I am referring to draw call limitations, BTW: how does one use shadow maps if the draw call budget is only 3 to 6 thousand draw calls per frame? Just light maps with clever use of shadow maps on some dynamic objects? I also noticed that some modern games were still using DX9 a few years ago, so I was wondering about that as well.

corysama · Mar 4, 2021

Infinisearch said:
I was wondering how shadow maps were done in the DX9 era. I am referring to draw call limitations, BTW: how does one use shadow maps if the draw call budget is only 3 to 6 thousand draw calls per frame? Just light maps with clever use of shadow maps on some dynamic objects? I also noticed that some modern games were still using DX9 a few years ago, so I was wondering about that as well.

I did the shadow map implementation for Star Wars: The Force Unleashed for the Xbox 360 and PS3. The engine was a mostly straightforward DX9 forward renderer.

We experimented with perspective shadow maps and dual-paraboloid maps, but settled on parallel cascade directional shadows and straightforward frustum shadows for spot lights. As a half-measure, point lights could cast shadows, but only in a single frustum like spot lights. This was fine for point lights up in a corner of the room. The main issue was that we were allocated a fixed time and memory budget of 3 shadow maps per frame. In a given frame, this could be used by 3 cascades, 3 point/spot lights, or a 2:1 or 1:2 mix of the two styles. Each frame, a priority system would weigh each light's brightness, size, distance and an artist-assigned priority to allocate maps to lights. Every object had two bits to indicate whether it was a shadow caster and/or receiver. Lights and objects also each had a 32-bit mask controlled by the artists. A light and an object had to share at least one bit for the light to affect the object (default for both was 0x0001).
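To make the budget and mask rules above concrete, here is a minimal C++ sketch. It is not the shipped code: the `Light` and `Object` fields, the `score` formula and the greedy `allocateShadowMaps` are all assumptions chosen for illustration. Only the 3-map budget, the inputs to the priority, the caster/receiver bits and the "share at least one mask bit" rule come from the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kDefaultMask = 0x0001;  // default mask from the post

// Hypothetical light record; field names are illustrative only.
struct Light {
    int id;
    float brightness;
    float size;
    float distance;        // distance from the camera
    float artistPriority;  // artist-assigned weight
    int mapsWanted;        // e.g. cascades for a sun, 1 for a spot/point frustum
    std::uint32_t mask;
};

struct Object {
    bool castsShadow;
    bool receivesShadow;
    std::uint32_t mask;
};

// A light affects an object only if their masks share at least one bit.
bool lightAffects(const Light& l, const Object& o) {
    return (l.mask & o.mask) != 0;
}

// Assumed scoring: brighter, larger, closer, higher-priority lights win.
float score(const Light& l) {
    return l.artistPriority * l.brightness * l.size / (1.0f + l.distance);
}

struct Grant { int lightId; int maps; };

// Greedy allocation of a fixed per-frame budget, highest score first.
std::vector<Grant> allocateShadowMaps(std::vector<Light> lights, int budget = 3) {
    assert(budget >= 0);
    std::stable_sort(lights.begin(), lights.end(),
                     [](const Light& a, const Light& b) { return score(a) > score(b); });
    std::vector<Grant> grants;
    for (const Light& l : lights) {
        if (budget == 0) break;
        int n = std::min(l.mapsWanted, budget);
        if (n > 0) {
            grants.push_back({l.id, n});
            budget -= n;
        }
    }
    return grants;
}
```

With a sun asking for 2 cascades and two local lights asking for 1 map each, this produces the 2:1 mix described above and leaves the lowest-scoring light unshadowed for that frame.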

In a frame, the game would render the 3 shadow maps, then the depth pre-pass. With that we could render the 3 shadows to a 4-channel, full-screen, 8-bit-per-channel map using a single PCF sample. These 3 channels were then blurred with a depth-bilateral Gaussian blur of constant screen-space size that used the 4th channel as intermediate storage for some extra quality hacks. The screen-constant blur size had the nice effect of giving close-up shadows a "small" penumbra while distant shadows got a "large area" penumbra in world space.
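The penumbra effect follows from simple projection arithmetic. The sketch below assumes a standard symmetric perspective projection; `blurWidthWorld` and its parameters are hypothetical names, not anything from the engine. It shows that a blur radius fixed in pixels covers a world-space width that grows linearly with view depth.

```cpp
#include <cassert>
#include <cmath>

// World-space width covered by a blur of `radiusPixels` at a given view depth,
// assuming a symmetric perspective projection with vertical field of view
// `fovYRadians` and a render target `screenHeightPixels` tall.
float blurWidthWorld(float radiusPixels, float viewDepth,
                     float fovYRadians, float screenHeightPixels) {
    assert(viewDepth > 0.0f && screenHeightPixels > 0.0f);
    // Height of the view frustum at this depth, divided by the pixel count.
    float worldPerPixel =
        2.0f * viewDepth * std::tan(0.5f * fovYRadians) / screenHeightPixels;
    return radiusPixels * worldPerPixel;
}
```

For example, with a 90 degree vertical field of view on a 720-pixel-tall target, a 4-pixel blur spans about 0.011 world units at depth 1 and about 0.11 units at depth 10.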

The forward renderer would process 4 or 8 lights in a single pass. There was a single code path where all lights did the full math for a spot light regardless of their type. Point lights would be configured in data as if they were spotlights with 360 degree cones. Directional lights would be point lights moved "far away" along their direction and with zero falloff. The branchless pipelining of 4 lights was faster than special-casing minimal work for two lights. There was also a tetrahedral mesh of spherical harmonic light probes that objects would interpolate as they moved through. Artists could create lights that exist only to be baked into the probes.
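The "every light is a spot light" idea can be sketched on the CPU as follows. This is an illustration under assumptions, not the game's shader: the `SpotLight` layout, the `1 / (1 + falloff * d^2)` attenuation, the hard cone edge and the `kFarAway` distance are all made up for the example. What it takes from the post is that point lights are spots with a 360 degree cone and directional lights are far-away point lights with zero falloff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical unified light record.
struct SpotLight {
    Vec3 position;
    Vec3 direction;      // unit vector along the cone axis
    float cosHalfAngle;  // -1 corresponds to a full 360 degree cone
    float falloff;       // 0 disables distance attenuation
    float intensity;
};

constexpr float kFarAway = 1.0e6f;  // assumed "far away" distance

SpotLight makePoint(Vec3 pos, float falloff, float intensity) {
    return {pos, {0.0f, -1.0f, 0.0f}, -1.0f, falloff, intensity};
}

// `travelDir` is the unit direction the light travels in.
SpotLight makeDirectional(Vec3 travelDir, float intensity) {
    return {scale(travelDir, -kFarAway), travelDir, -1.0f, 0.0f, intensity};
}

// One code path for all light types: full spot-light math, no type switch.
float evaluate(const SpotLight& l, Vec3 surfacePos, Vec3 unitNormal) {
    Vec3 toLight = sub(l.position, surfacePos);
    float d = std::sqrt(dot(toLight, toLight));
    assert(d > 0.0f);
    Vec3 L = scale(toLight, 1.0f / d);
    float nDotL = std::max(0.0f, dot(unitNormal, L));
    // Cosine of the angle between the cone axis and the light-to-surface ray,
    // clamped so rounding cannot push it below -1.
    float cosAngle = std::max(-1.0f, std::min(1.0f, dot(l.direction, scale(L, -1.0f))));
    float inCone = cosAngle >= l.cosHalfAngle ? 1.0f : 0.0f;  // a select, like step() in a shader
    float atten = 1.0f / (1.0f + l.falloff * d * d);
    return l.intensity * nDotL * inCone * atten;
}
```

Because the light type lives entirely in the data, a batch of 4 lights can run the same instruction stream, which is the property the post credits for the speedup.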

Objects that receive shadows would read the 4-channel screen-space shadow map and put the 3 useful channels through a 3x4 matrix to multiplex the 3 shadows vs. 4 lights. We had a "shadow intensity" parameter on each light. The default of 1.0 gave normal shadow behavior. Less than 1.0 would allow light to leak through the shadow. This was useful for avoiding pure black shadows when you only have a single light. Greater than 1.0 would have normal shadow behavior, but would additionally cause the shadow from the over-intense light to block other lights by 1.0 - shadow intensity. This was useful for preventing the shadow from being washed out when there are many non-shadow-casting lights.
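One possible reading of the multiplexing step is sketched below. The matrix layout, `buildMux`, `lightVisibility` and the convention that channels store occlusion (0 = lit, 1 = shadowed) are assumptions, and the amount by which an over-intense shadow blocks other lights is interpreted here as the excess of the intensity over 1.0. The post does not spell out those details, so treat this as an illustration of the idea rather than the shipped math.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Shadow3 = std::array<float, 3>;             // occlusion per shadow map
using Mux = std::array<std::array<float, 3>, 4>;  // row = light, column = shadow map

// ownerLight[s] is the index (0..3) of the light that casts shadow map s,
// or -1 if the map is unused. shadowIntensity[s] is that light's parameter.
Mux buildMux(const std::array<int, 3>& ownerLight,
             const std::array<float, 3>& shadowIntensity) {
    Mux m{};  // all zeros: no shadow affects any light
    for (int s = 0; s < 3; ++s) {
        int owner = ownerLight[s];
        if (owner < 0) continue;
        assert(owner < 4);
        float k = shadowIntensity[s];
        for (int l = 0; l < 4; ++l) {
            // The owner is shadowed by up to 1.0; any excess spills onto other lights.
            m[l][s] = (l == owner) ? std::min(k, 1.0f) : std::max(k - 1.0f, 0.0f);
        }
    }
    return m;
}

// Visibility factor per light for one pixel: 1 = unshadowed, 0 = fully blocked.
std::array<float, 4> lightVisibility(const Mux& m, const Shadow3& occlusion) {
    std::array<float, 4> vis{};
    for (int l = 0; l < 4; ++l) {
        float blocked = 0.0f;
        for (int s = 0; s < 3; ++s) blocked += m[l][s] * occlusion[s];
        vis[l] = std::min(1.0f, std::max(0.0f, 1.0f - blocked));
    }
    return vis;
}
```

In this sketch an intensity of 0.5 lets half of the owning light through its own shadow, while an intensity of 1.5 fully shadows the owner and also dims every other light by half inside that shadow.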

We did quite a lot of research and implementation for baked lighting, but the game actually shipped with very little of it. Almost everything was completely dynamically lit. Even with distributed, GPU-based baking, we couldn't get the workflow for baked lighting to be faster than the instant feedback of dynamic lighting.

So, how was it done? Very carefully, to extract the best results from a tiny budget while also keeping the human effort minimal.

Ethatron · Mar 5, 2021

My observations stem from Oblivion. It's a DX9 renderer and uses dual-paraboloid shadow maps. In general I only observed the sun as a "shadow" source. As with most state-based APIs, you cannot lump normal draws and shadow draws into the same class. There is very little state change going on between individual shadow draws: no textures, the same shader and so on. This makes both CPU and GPU eat through those draws at a much higher rate. As there's almost no state change besides vertex streams anyway, it was easy to simply merge the geometry into larger batches, removing even the intrinsic overhead of the draw command itself. In some games this is the reason why only dynamic light(s) + static object(s) produce shadows. You can surely program a bit more, and mix large static blobs with some small dynamic stuff, if it's affordable.
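The batching idea can be sketched without touching the graphics API at all. The code below is a hypothetical CPU-side merge, not Oblivion's implementation: `Mesh`, `worldOffset` (a translation standing in for a full world transform) and `mergeStaticCasters` are illustrative names. Since depth-only shadow draws need no per-object textures or shaders, static casters can be baked into world space once and concatenated into a single vertex/index buffer that is drawn with one call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Position-only mesh; shadow passes need nothing else.
struct Mesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;
    Vec3 worldOffset;  // simplified stand-in for a world matrix
};

// Bake each static caster into world space and append it to one big mesh.
Mesh mergeStaticCasters(const std::vector<Mesh>& meshes) {
    Mesh merged{{}, {}, {0.0f, 0.0f, 0.0f}};
    for (const Mesh& m : meshes) {
        // Indices of this mesh must be shifted past the vertices already merged.
        std::uint32_t base = static_cast<std::uint32_t>(merged.positions.size());
        for (const Vec3& p : m.positions) {
            merged.positions.push_back(
                {p.x + m.worldOffset.x, p.y + m.worldOffset.y, p.z + m.worldOffset.z});
        }
        for (std::uint32_t i : m.indices) {
            assert(i < m.positions.size());
            merged.indices.push_back(base + i);
        }
    }
    return merged;
}
```

The trade-off is that a merged batch can no longer be culled or animated per object, which fits the observation that such games restrict shadow casting to static geometry.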
