My indexed deferred lighting system suffers from texture precision loss.

cubrman

Newcomer
Hi,

I am implementing an indexed deferred rendering engine based on Damian Trebilco's article, available here:

http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredLighting1.1.pdf

In my case I don't want to make a geometry prepass, so I will capture every single light source along a view ray. This will allow me to draw lit transparent objects. We are building a top-down game, so we only need to support up to 8 light sources. To implement this I am using the bit-shifting method of storing light IDs described in the aforementioned article. I will explain it, as it is the main point of the question.

In order to store every light that falls on the view ray for a given pixel, each light's index (from 1 to 255 max) is sent into the shader as four 8-bit variables, where only the top 2 bits of each variable hold a piece of the index. I split the value at the CPU level and divide the results by 254.5f before sending them to the shader (I tried 256, but it produced more precision loss; after switching to 254.5 I managed to store up to 4 lights on SOME hardware without data loss instead of 3). The shader simply outputs the four variables, but the magic happens when alpha blending kicks in. Here is the formula:

source * ONE + destination * 0.25

Multiplying by 0.25 shifts the bits in the destination texture down by two places in order to free space for the new value. The new value is then stored via addition.
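In XNA 4 terms (the framework I am using), the split and the blend look roughly like the sketch below. The names are mine and the channel ordering is my assumption; note also that XNA stores BlendFactor as an 8-bit Color, so 64/255 is only approximately 0.25.

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

static class LightIndexPacking
{
    // Split the 8-bit index into four 2-bit pieces (here R holds the lowest
    // pair and A the highest; the ordering is an assumption), place each in
    // the top two bits of its channel and normalize by 254.5f.
    public static Vector4 PackLightIndex(int index) // index in 1..255
    {
        return new Vector4(
            ((index >> 0) & 0x3) * 64 / 254.5f,  // bits 1..0 -> R
            ((index >> 2) & 0x3) * 64 / 254.5f,  // bits 3..2 -> G
            ((index >> 4) & 0x3) * 64 / 254.5f,  // bits 5..4 -> B
            ((index >> 6) & 0x3) * 64 / 254.5f); // bits 7..6 -> A
    }

    // The packing blend: source * ONE + destination * 0.25.
    public static readonly BlendState BitShiftPackBlend = new BlendState
    {
        ColorBlendFunction = BlendFunction.Add,
        AlphaBlendFunction = BlendFunction.Add,
        ColorSourceBlend = Blend.One,
        AlphaSourceBlend = Blend.One,
        ColorDestinationBlend = Blend.BlendFactor,
        AlphaDestinationBlend = Blend.BlendFactor,
        BlendFactor = new Color(64, 64, 64, 64) // 64/255, approximately 0.25
    };
}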

In the subsequent forward shader the indices are decoded with the following code:

float4 packedLight = tex2D(LIndexSampl, TC);
#define NUM_LIGHTS 256

// Unpack each lighting channel
float4 unpackConst = float4(4.0, 16.0, 64.0, 256.0) / NUM_LIGHTS;

// Expand the packed light values to the 0..255 range
float4 floorValues = ceil(packedLight * 254.5f);

for (int i = 0; i < 4; i++)
{
    packedLight = floorValues * 0.25f; // Shift two bits down
    floorValues = floor(packedLight);  // Remove the shifted bits
    float texCoord = dot(packedLight - floorValues, unpackConst);
    // ... look up this light's parameters at texCoord and accumulate lighting ...
}

The most recently written light gets retrieved first. The loop then proceeds to light the object by reading light parameters from a 256x1 texture, using the retrieved texCoord as the lookup coordinate.
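The 256x1 table itself is an ordinary lookup texture; in XNA it might be filled like the hypothetical sketch below (the 'lights' list and its fields are placeholders, and a real layout would need several texels per light for position, color and attenuation).

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical 256x1 lookup table: slot i holds the parameters of light i.
Texture2D lightParams = new Texture2D(graphicsDevice, 256, 1, false, SurfaceFormat.Vector4);
Vector4[] slots = new Vector4[256];
foreach (var light in lights)                 // 'lights' is your own light list
    slots[light.Index] = new Vector4(light.Position, light.Radius);
lightParams.SetData(slots);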

But the problem is, I can only store up to 4 (and on some hardware only 3) lights before the data in the texture (an r8g8b8a8 surface) starts deteriorating. That means that if I draw another light into the texture, the bits get messed up, and in the forward shader I retrieve a completely different value, not even close to the original; I can't fix it by simply reserving 3 (or even 5) slots per light in the light-parameter texture. However, according to the article, this technique should be able to store up to 16 unique light indices (as I understand it, the author tried this and succeeded), and moreover it should support any number of overdraws, since older light indices are simply pushed out by the math to free space for newer ones. I did try the HdrBlendable texture format, which in theory should easily support up to 8 lights, but it showed precisely the same problem.

What am I missing? Why is my data getting messed up? Does anyone have any ideas?

For anyone interested, here is my XNA 4 project, which supports up to 4 lights (sometimes :/). Please try it and let me know whether 4 lights work for you:

https://drive.google.com/file/d/0B8dysKzeYPBQM1JmbGdIa0NjQmc/edit?usp=sharing
 
I have found a solution that allows an unlimited (arbitrary) number of overlapping lights on any view ray. It is based on the stencil index packing method described in the original article.

To implement it you will need one 32-bit texture for every four overlapping lights, plus one 32-bit depth-stencil buffer.


1. The light volumes are drawn into these textures in several passes; you will need one pass for every two overlapping lights you need to support.

2. The lights drawn should be sorted from the lowest index to the highest.

3. You use the stencil buffer to limit the number of lights drawn in any one pass: only pixels whose stencil value is less than 2 are allowed through, and the stencil is incremented on success (see the depth-stencil sketch after this list).

4. You clear the stencil buffer after each pass.

5. The depth buffer is needed to specify which lights to skip. With shader model 3 you can output the light index into the depth buffer via the DEPTH semantic, and at any given moment you only let through those pixels whose value is greater than the value stored in the depth buffer. That is why the list of lights being drawn has to be sorted.

6. The contents of the depth buffer should be preserved until all the passes are drawn. If you use up to 4 textures, you can start by drawing into COLOR3 (the furthest one) and then move up to COLOR0, gradually changing the pixel shader output structure to output into only 4, 3, 2 and then 1 texture, which preserves the data in the other textures (in XNA, where I am working, the buffer is attached to a RenderTarget). If you use more than 4 textures, you can either have more than one depth-stencil buffer or set the PreserveContents flag for the buffer. In the latter case you can call SetRenderTargets after every two passes and only draw into the "distant" one.

7. The two BlendStates should be the same as in Damian's article: both should use the MAX BlendFunction for colors and alphas, with every blend factor set to ONE, but the first should write only to the Red and Green channels and the second only to Blue and Alpha (see the blend-state sketch after this list).

8. In the actual shader you output float4(Index, 1 - Index, 0, 0) or float4(0, 0, Index, 1 - Index), depending on the BlendState.
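For concreteness, steps 3-5 could be configured in XNA 4 roughly as follows. This is only a sketch under my own assumptions; in particular it assumes the depth buffer is cleared to 0 before the first pass so that the first (lowest) index passes the GREATER test.

using Microsoft.Xna.Framework.Graphics;

// In your render setup; the variable name is mine.
DepthStencilState lightIndexDepthStencil = new DepthStencilState
{
    // Step 3: pass only while fewer than 2 lights have been written to this
    // pixel in the current pass, i.e. while reference (2) > stored stencil.
    StencilEnable = true,
    ReferenceStencil = 2,
    StencilFunction = CompareFunction.Greater,
    StencilPass = StencilOperation.Increment, // count this light on success
    StencilFail = StencilOperation.Keep,

    // Step 5: the pixel shader outputs the normalized light index through the
    // DEPTH semantic; only indices greater than the stored one get through.
    DepthBufferEnable = true,
    DepthBufferFunction = CompareFunction.Greater,
    DepthBufferWriteEnable = true
};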
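And step 7 as two XNA 4 BlendStates (again just a sketch; with MAX the source/destination factors are ignored by the hardware, but I set them to ONE to match the article):

using Microsoft.Xna.Framework.Graphics;

// First pass of each pair: write the index pair into Red and Green.
BlendState maxBlendRedGreen = new BlendState
{
    ColorBlendFunction = BlendFunction.Max,
    AlphaBlendFunction = BlendFunction.Max,
    ColorSourceBlend = Blend.One,
    ColorDestinationBlend = Blend.One,
    AlphaSourceBlend = Blend.One,
    AlphaDestinationBlend = Blend.One,
    ColorWriteChannels = ColorWriteChannels.Red | ColorWriteChannels.Green
};

// Second pass of each pair: write the index pair into Blue and Alpha.
BlendState maxBlendBlueAlpha = new BlendState
{
    ColorBlendFunction = BlendFunction.Max,
    AlphaBlendFunction = BlendFunction.Max,
    ColorSourceBlend = Blend.One,
    ColorDestinationBlend = Blend.One,
    AlphaSourceBlend = Blend.One,
    AlphaDestinationBlend = Blend.One,
    ColorWriteChannels = ColorWriteChannels.Blue | ColorWriteChannels.Alpha
};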

Afterwards I use an intermediate shader to decode the maps with the light indices, so that they hold clean indices in their channels by the time I send them into the forward shader. Please refer to the original article for the decoding algorithm.

Now there is one more detail :). To make it all run at a reasonable FPS, you need to shrink the textures where you store the indices. My textures are clientWidth/8 by clientHeight/8, i.e. 1/64 of the original resolution. I tried 1/128 but it provides almost no additional benefit; the speed is already lightning fast. The reason the shrinking does not affect visual quality is that I manually shrink the attenuation in the forward shader, so there are merely some pixels that compute unnecessary zero lighting.
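For reference, one of these shrunken index targets might be created like this in XNA 4 (variable names and format choices are mine):

using Microsoft.Xna.Framework.Graphics;

// Hypothetical 1/8-resolution light-index target with an attached
// depth-stencil surface that survives render-target switches (step 6).
RenderTarget2D lightIndexTarget = new RenderTarget2D(
    graphicsDevice,
    graphicsDevice.Viewport.Width / 8,
    graphicsDevice.Viewport.Height / 8,
    false,                               // no mipmaps
    SurfaceFormat.Color,                 // r8g8b8a8
    DepthFormat.Depth24Stencil8,         // depth for step 5, stencil for step 3
    0,                                   // no multisampling
    RenderTargetUsage.PreserveContents);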
I have tested the system on four machines with very different GPUs (some 5 years old, one an Nvidia GTX 560 Ti), and it works perfectly.
We are building a top-down shooter (http://www.Z-H-I.com) and I am going to implement this lighting system there with up to 8 lights (maybe 12 :)). Theoretically this will allow us to draw lit transparent objects, have perfect alpha cutoffs in our sprites, use an unlimited number of materials and apply MSAA antialiasing, just like the guys from the Radeon team did in their Leo demo with DirectX 11 :). The alpha version of the game is coming in the next two months. It will be free, and I will post the link here so that you can check how viable the system is.
 
Don't give up, one of the clever people will get round to giving you an answer.
PS: it's not a driver problem, is it?
If you could provide an exe, people could test it for you.
 
I actually posted a solution this morning. The website said: "Your message will not be seen until it has been moderated." If it has been lost by the website's software, please contact me.
 