Basic,
<thinking stream-of-consciously> Here's an idea.
Assume 4x4 block, 128-bit output.
Store 2 base normals "losslessly" at 16 bits each (in quotes because what I really mean is quantized on the unit sphere at roughly 0.01-0.02 radian resolution).
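To make the 16-bit base normal concrete, here is a minimal sketch using octahedral mapping with 8 bits per coordinate. That mapping is my own pick for illustration; the post does not say which quantization to use, and the names `oct_encode` and `oct_decode` are made up. A back-of-envelope bound puts its worst-case angular error around 0.017 radians, which is in the range quoted above.

```python
import math

def _sign(x):
    # Sign that treats 0 as positive, so folding never collapses to 0.
    return 1.0 if x >= 0.0 else -1.0

def oct_encode(n):
    """Pack a normal into 16 bits (assumed scheme: octahedral, 8+8 bits)."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)          # project onto the octahedron |x|+|y|+|z| = 1
    u, v = x / s, y / s
    if z < 0.0:                           # fold the lower hemisphere outward
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    qu = round((u * 0.5 + 0.5) * 255)     # [-1, 1] -> 0..255
    qv = round((v * 0.5 + 0.5) * 255)
    return (qu << 8) | qv

def oct_decode(code):
    """Unpack a 16-bit code back into a unit normal."""
    u = (code >> 8) / 255.0 * 2.0 - 1.0
    v = (code & 0xFF) / 255.0 * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:                           # undo the fold
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)
```

Any other near-uniform 16-bit sphere quantization would slot in the same way; the rest of the block layout only needs "16 bits in, unit normal out".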
That leaves 96 bits for 16 texels, or 6 bits per texel. Use spherical interpolation between the two end-points, mapping 3 bits to each major sphere axis. This yields 64 different normal directions per block, uniformly distributed over a cross section of the sphere, hopefully a small one.
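One way to read "3 bits per major sphere axis" is to interpolate the polar angle and the azimuth independently between the two end-points, each in 8 steps. The sketch below assumes exactly that reading; `decode_texel` and the bit order (high 3 bits for polar, low 3 bits for azimuth) are hypothetical choices, not something the scheme above pins down.

```python
import math

def to_spherical(n):
    """Unit normal -> (polar angle, azimuth)."""
    x, y, z = n
    theta = math.acos(max(-1.0, min(1.0, z)))
    phi = math.atan2(y, x)
    return theta, phi

def from_spherical(theta, phi):
    """(polar angle, azimuth) -> unit normal."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))

def decode_texel(n0, n1, index6):
    """Decode one 6-bit texel index against the two base normals."""
    t0, p0 = to_spherical(n0)
    t1, p1 = to_spherical(n1)
    i_theta = (index6 >> 3) & 7           # assumed: high 3 bits step the polar angle
    i_phi = index6 & 7                    # assumed: low 3 bits step the azimuth
    # Take the short way around in azimuth so the patch stays small.
    dphi = (p1 - p0 + math.pi) % (2.0 * math.pi) - math.pi
    theta = t0 + (t1 - t0) * i_theta / 7.0
    phi = p0 + dphi * i_phi / 7.0
    return from_spherical(theta, phi)

# Bit budget check: 2 base normals * 16 bits + 16 texels * 6 bits.
BLOCK_BITS = 2 * 16 + 16 * 6
```

Index 0 reproduces the first end-point and index 63 the second. Azimuth is ill-defined near the poles, so a real encoder would probably want to interpolate in a frame local to the block instead.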
If you're willing to make assumptions about frequency, you can do better and support a 64-bit method.
Store 1 16-bit normal. That leaves 48 bits for 16 texels, or 3 bits per texel. Define a neighborhood around the central normal over which you expect 7 different normals to fall, and use the 3-bit index to look up into an implicit table around this center.
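A rough sketch of the implicit table, under assumptions of my own: entry 0 is the central normal itself, and entries 1-7 sit evenly spaced on a ring at a fixed cone angle around it. The 10 degree default, the ring layout, and the names `neighborhood_table` and `decode_block64` are all placeholders for illustration.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def tangent_frame(n):
    """Two unit vectors perpendicular to n and to each other."""
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    b = cross(n, t)
    return t, b

def neighborhood_table(center, cone_angle):
    """8 entries: the center, plus 7 directions on a ring around it (assumed layout)."""
    n = normalize(center)
    t, b = tangent_frame(n)
    c, s = math.cos(cone_angle), math.sin(cone_angle)
    table = [n]
    for k in range(7):
        a = 2.0 * math.pi * k / 7.0
        table.append(tuple(c * n[i] + s * (math.cos(a) * t[i] + math.sin(a) * b[i])
                           for i in range(3)))
    return table

def decode_block64(center, indices, cone_angle=math.radians(10.0)):
    """Decode 16 three-bit indices against the single stored normal."""
    table = neighborhood_table(center, cone_angle)
    return [table[i & 7] for i in indices]

# Bit budget check: 1 base normal * 16 bits + 16 texels * 3 bits.
BLOCK_BITS = 16 + 16 * 3
```

Because the table is rebuilt from the center at decode time, nothing beyond the 16-bit normal and the indices needs to be stored.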
The latter is only useful if you have good tools to detect when it can and can't be used to produce good results.
Using the 'neighborhood' approach in the 128-bit format, you could also store 1 16-bit normal, which leaves 112 bits, or 7 bits per texel: 128 normals in your defined neighborhood (11-22.5 degrees?).
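For the 128-entry variant, something has to decide how the directions are spread over the neighborhood. The sketch below uses a golden-angle spiral over a spherical cap, which is a swapped-in technique of my choosing and not part of the idea above; the 22.5 degree half-angle is just the upper end of the guess in the text, and `cap_table` is a hypothetical name.

```python
import math

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def tangent_frame(n):
    """Two unit vectors perpendicular to n and to each other."""
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    b = cross(n, t)
    return t, b

def cap_table(center, half_angle=math.radians(22.5), count=128):
    """Spread `count` directions over the cap around `center` (assumed spiral layout)."""
    n = normalize(center)
    t, b = tangent_frame(n)
    cos_max = math.cos(half_angle)
    table = []
    for k in range(count):
        # Equal steps in cos(angle) give roughly equal area per sample.
        cos_t = 1.0 - (1.0 - cos_max) * (k + 0.5) / count
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        a = k * GOLDEN_ANGLE
        table.append(tuple(cos_t * n[i] + sin_t * (math.cos(a) * t[i] + math.sin(a) * b[i])
                           for i in range(3)))
    return table

# Bit budget check: (128 - 16) bits over 16 texels.
BITS_PER_TEXEL = (128 - 16) // 16
```

A 7-bit texel index then selects directly into the returned list.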
Of course, all of these methods overlook redundancy in the data topology, which might be better served by using VQ-like techniques.