We have rather complex per-pixel material shading in our engine. In addition to the basic color texture and normal map, every material parameter can be adjusted at pixel precision. All this data needs to be sampled for each rendered pixel, and this has to happen in the deferred rendering geometry pass, when the bandwidth is already eaten up by the multiple g-buffer writes.
Our artists create the following texture layers for each material:
- Color (rgb)
- Opacity (greyscale)
- Ambient multiplier (greyscale)
- Diffuse multiplier (greyscale)
- Specular multiplier (greyscale)
- Glossiness multiplier (greyscale)
- Self illumination (greyscale)
- Heightmap (greyscale)
- Normalmap (rgb)
The normal map (tangent space) is usually generated automatically from the heightmap. The ambient multiplier map is usually generated automatically by an ambient occlusion tool from the object geometry and the heightmap. The other maps are usually hand drawn by the artist.
Currently our texture management toolchain packs these 9 maps into 3 textures like this:
color.r, color.g, color.b, opacity (DXT5)
normal.x, normal.y, height, ambient (8888)
diffuse, specular, glossiness, illumination (8888)
In the pixel shader, the normal vector z component is calculated as z=sqrt(1-x^2-y^2). The sign of z does not need to be stored, as the map is in tangent space and z always points away from the surface. All texture map material property values are multipliers for the material value and are normalized to the [0,1] range by our texture management tools.
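The z reconstruction above can be sketched in a few lines. This is only an illustration written in Python rather than shader code, and it assumes that the tools bias x and y from [-1,1] into [0,1] for storage, which the text does not state explicitly. The function name `decode_normal` is made up for this example.

```python
import math

def decode_normal(nx_texel, ny_texel):
    """Rebuild a tangent-space normal from two stored channels in [0, 1]."""
    # Assumption: x and y were biased from [-1, 1] into [0, 1] for storage.
    x = nx_texel * 2.0 - 1.0
    y = ny_texel * 2.0 - 1.0
    # Clamp so that quantization error cannot push the radicand below zero.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

# A texel of (0.5, 0.5) decodes to the unperturbed "straight up" normal.
print(decode_normal(0.5, 0.5))
```

The clamp matters once the x and y channels are compressed, because the decoded xy vector can then end up slightly longer than 1.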
With this setup the quality is excellent, but each pixel takes 9 bytes of memory (1 byte for the DXT5 texture and 4 bytes for each of the two 8888 textures).
Current and planned optimizations
Our current texture setup stores most of the data at a much higher precision than we need. Most of the data can be compressed (at least slightly) without any loss of image quality.
By going through our material content I noticed the following things:
- RGB color is optimally stored in the DXT5 RGB channels. No change is needed.
- Opacity is 1.0 for all pixels in over 95% of our textures.
- Opacity does not need the high precision of the separate DXT5 alpha channel, as we only use it for alpha testing. We need more than one bit to get smooth, curvy clipping edges, but 2-3 bits should give a result identical to the full 8 bits.
- Self illumination is 0.0 for all pixels in over 95% of our textures.
- In textures with self illumination, the self illumination affects only limited areas, and in those areas the illumination is so strong that it hides other material properties.
- Diffuse and specular are usually connected (both are often the same texture with different brightness and contrast and sometimes smoothing). Often glossiness is also connected to the specular.
- Material value multipliers (specular, diffuse, glossiness) do not need full 8-bit precision. These maps rarely contain any smooth gradients.
- Ambient needs less precision than the other material channels, because our ambient cubemap lighting is rather dim. An ambient value of 1.0 can contribute at most 0.15 to the final pixel value (without tone mapping).
- Height is important for parallax mapping to look good, and artifacts in it will look pretty bad.
First stage of optimization:
color.r, color.g, color.b, normal.x (DXT5)
opacity, height, illumination, normal.y (DXT5)
diffuse, specular, glossiness, ambient (8888)
The first step was to decide that I wanted at least two DXT5 textures. This way I can store the color nicely in the first one, and the normal vector (xy) in the alpha channels of the two DXT5 textures. This gives me normal map quality identical to the 3Dc 'ATI2' format (as it's simply two DXT5 alpha channels glued together). As opacity is usually 1 and illumination is usually 0, I can put them in the DXT5 r and b channels and the height in the DXT5 g channel (the g endpoints are 6 bits while r and b are 5). Because either opacity or illumination (or both) is almost always constant over the whole block, all the interpolation values can usually be spent on the height. This gives me enough quality for the height. For materials with strongly varying opacity and/or self illumination, the artist can choose the full 8888 format for this combination texture. Less than 5% of materials should need this.
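To illustrate why constant opacity and illumination leave the whole block palette for the height, here is a rough sketch of how the four colors of one 4x4 DXT5 color block come out of its two 5:6:5 endpoints. The helper names are invented for this example, and the integer rounding is a simplification, as real decoders may round the interpolated colors slightly differently.

```python
def to_565(rgb):
    """Quantize 8-bit RGB to the 5:6:5 bit depths of a DXT endpoint."""
    r, g, b = rgb
    return (r >> 3, g >> 2, b >> 3)

def from_565(c):
    """Expand a 5:6:5 endpoint back to 8 bits per channel (bit replication)."""
    r5, g6, b5 = c
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))

def color_block_palette(c0, c1):
    """Four colors available to every texel of one 4x4 block."""
    e0 = from_565(to_565(c0))
    e1 = from_565(to_565(c1))

    def mix(a, b):
        # Two thirds of a plus one third of b (simplified integer rounding).
        return tuple((2 * x + y) // 3 for x, y in zip(a, b))

    return [e0, e1, mix(e0, e1), mix(e1, e0)]

# Channel order assumed here: r = opacity, g = height, b = illumination.
# Opacity fixed at 255 and illumination at 0, height ranging from 40 to 200.
for opacity, height, illumination in color_block_palette((255, 40, 0), (255, 200, 0)):
    print(opacity, height, illumination)
```

With r and b identical in both endpoints, all four palette entries differ only in g, so each block gets four distinct height levels between its own local minimum and maximum.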
So far the memory and bandwidth usage per pixel has dropped from 9 to 6 bytes without any visible sacrifice in image quality. Less than 5% of materials (by the artist's choice) still use the 8888 format to store (opacity, height, illumination, normal.y) instead of the DXT5 format. The good thing is that this does not affect our code or shaders at all, as both formats have the same channels and the same [0,1] range.
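The byte counts can be checked with a small calculation. It counts only the top mip level and uses the standard sizes of the two formats (DXT5 is 16 bytes per 4x4 block, 8888 is 4 bytes per texel). The 5% fallback share is used as an upper bound taken from the estimate above, not as a measured figure.

```python
# Bytes per texel: DXT5 stores 16 bytes per 4x4 block, 8888 stores 4 bytes.
BYTES_PER_PIXEL = {"DXT5": 16 / 16, "8888": 4.0}

def bytes_per_pixel(formats):
    """Total bytes read per pixel when sampling one texel from each texture."""
    return sum(BYTES_PER_PIXEL[f] for f in formats)

original = bytes_per_pixel(["DXT5", "8888", "8888"])
first_stage = bytes_per_pixel(["DXT5", "DXT5", "8888"])

def average_cost(fallback_share):
    """Average cost when some materials keep the second texture as 8888."""
    fallback = bytes_per_pixel(["DXT5", "8888", "8888"])
    return (1.0 - fallback_share) * first_stage + fallback_share * fallback

print(original, first_stage, average_cost(0.05))
```

Even if the full 5% of materials fall back to 8888, the average stays close to 6 bytes per pixel.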
How to get rid of the last uncompressed 8888 texture?
This is why I actually wrote this post. I'd like to have some feedback from game developers who have written games for high-end console platforms with proper texture compression support, as my own experience of packing textures comes from platforms like NDS and PS2/PSP. Palette texture tricks are not something that cuts it on the high end.
ATI1/ATI2:
One way to store these four channels is to use two ATI2 textures. This results in quality almost identical to the uncompressed 8888 texture and saves 50% of the data size and bandwidth. However, I need to sample two textures instead of one, so there is a performance hit. With the Radeon 1X-series ATI also added support for the one-channel ATI1 format, so I was kind of hoping that we would get a four-channel ATI4 with the HD-series (or an official format with four DXT5 alpha channels in DX10.1). That would have been a perfect format for my needs.
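As a reminder of why the ATI1/ATI2 quality is so close to uncompressed, here is a sketch of the eight values a single DXT5-alpha style block (the building block of ATI1 and ATI2) can represent from its two 8-bit endpoints. The function name is made up, and the values are left as floats instead of being rounded the way a particular decoder would.

```python
def alpha_block_palette(a0, a1):
    """Eight values one 4x4 DXT5-alpha / ATI1 block can represent."""
    if a0 > a1:
        # Eight-value mode: six values interpolated between the endpoints.
        interp = [((7 - i) * a0 + i * a1) / 7.0 for i in range(1, 7)]
        return [a0, a1] + interp
    # Six-value mode: four interpolated values plus explicit 0 and 255.
    interp = [((5 - i) * a0 + i * a1) / 5.0 for i in range(1, 5)]
    return [a0, a1] + interp + [0, 255]

# A block whose values span 60..200 gets steps of (200 - 60) / 7 = 20.
print(sorted(alpha_block_palette(200, 60)))
```

Each channel gets its own pair of full 8-bit endpoints per block, so channels do not disturb each other the way they do in the shared DXT5 color endpoints.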
DXT5:
This would actually work for some textures and provide a nice 4x compression ratio. I'd store diffuse, specular and glossiness in the rgb components, as they are usually connected (have similar gradients). The ambient data is completely separate, so the independent DXT5 alpha channel would suit it well. This only works for rather simple materials, but it is surely worth testing.
4444:
Each of the diffuse, specular, glossiness and ambient multipliers gets 16 values (2x compression ratio). For ambient the precision is enough (as it cannot affect more than 0.15 of the pixel value). This should also be OK for diffuse, but I fear it will cause banding in the specular highlights (a smooth glossiness ramp is the worst case). Normalizing each channel to the [0,1] range (and using the stored value to interpolate between the channel's low and high values in the pixel shader) would improve the quality. This should be enough for many materials, but not for the most demanding ones.
Currently I am leaning towards a mix of 8888 and 4444 for the last combined texture, depending on how demanding the material is. DXT5 should also be used for the simplest materials (with the channels ordered so that the DXT5 quality loss is manageable). Any advice on improvements is highly welcome!
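The range normalization idea could look roughly like the sketch below. It assumes that the low and high values are stored once per texture and per channel and passed to the shader as constants; the function names are hypothetical, and the decode step stands in for what the pixel shader would do.

```python
def encode_4bit(values):
    """Quantize one channel to 4 bits after stretching it to its own range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    codes = [round((v - lo) / span * 15) if span > 0 else 0 for v in values]
    return codes, lo, hi

def decode_4bit(code, lo, hi):
    """Shader-side step: interpolate between lo and hi by the stored value."""
    return lo + (hi - lo) * (code / 15.0)

# Example channel whose values only cover 0.4 .. 0.7 of the full range.
channel = [0.4 + 0.3 * i / 99 for i in range(100)]
codes, lo, hi = encode_4bit(channel)
worst = max(abs(decode_4bit(c, lo, hi) - v) for c, v in zip(codes, channel))
print(worst, "vs plain 4-bit worst case", 1 / 30)
```

A plain 4-bit channel has a worst-case error of 1/30 of the full range, while the normalized channel has at most 1/30 of its own (smaller) range. For ambient even the plain case is harmless: 1/30 scaled by the 0.15 ceiling is 0.005 of the final pixel value.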