High Rez Textures and Texture Compression

Having played FEAR for a while now, I think it's worth pointing out that so far I've not seen a single hint of "repeated" textures on wall surfaces, e.g. office walls. I'm sure there are repeats, but they're not obvious.

Jawed
 
Regarding texture compression, well, I might be a snob because of my CG background, but I find the DXTC artifacting horrible already. Anything more would be unbearable.

About MegaTexture and sizes, well, I'd like to know what id's content creation tools are, as Photoshop tends to go crazy slow with 8K*8K images with multiple layers, and 3D texture painting programs like BodyPaint get seriously unstable with just a dozen 4K maps (i.e. 3 parts with color, bump and specular maps, that also have a few layers). I can't imagine working with a 16K*16K texture on a current PC... is there a 64-bit version of Photoshop, or what?
 
Decoding of entropy-coded data can AFAIK be done in parallel in about O((log N)^2) time for N bits with a variant of the technique called 'parallel prefix computation'. I think the hardware cost is about O(N*S + N^2*log M), where S is the size of the symbol table and M is the maximum number of symbols that can be decoded in one go (if you use different symbol tables for different symbols, you will additionally need to multiply this hardware cost by the number of symbol tables that you use).

The idea: first, for every bit position, assume that a symbol starts there and compute how long that symbol would be. Then, for each bit position, look up the length of the 'next' symbol, so that for each bit position we get the combined length of the next 2 symbols from that position. Repeat the procedure (adding two sums each time) to get the length of the next 4, then 8, then 16 etc. symbols. Now, at bit position 0 you will have generated pointers to symbols 0, 1, 2, 4, 8, 16 etc.; from each of these pointers, you can collect pointers to symbols n+1, n+2, n+4, n+8 etc., recursively, until you have covered all symbols; this way you can collect pointers to ALL symbols in log(M) stages.
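To make that concrete, here's a rough software sketch of the scheme. The prefix code is a toy one I made up purely for illustration (not any real format), and in software this obviously buys nothing; the point is only that every stage touches each bit position independently, so in hardware each loop over positions collapses into one parallel stage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prefix code, purely for illustration (not any real format):
//   0xx       -> 3-bit symbol
//   10xxx     -> 5-bit symbol
//   11xxxxxx  -> 8-bit symbol
static std::size_t symbol_length(const std::vector<uint8_t>& bits, std::size_t pos) {
    if (bits[pos] == 0) return 3;
    if (pos + 1 < bits.size() && bits[pos + 1] == 0) return 5;
    return 8;
}

// Build jump tables: jumps[k][pos] = bit position reached after skipping
// 2^k symbols, assuming a symbol starts at 'pos'.  Each round of pointer
// doubling is written as a loop here, but every iteration is independent,
// so hardware could evaluate each round in one parallel stage (log N rounds).
static std::vector<std::vector<std::size_t>>
build_jump_tables(const std::vector<uint8_t>& bits) {
    const std::size_t n = bits.size();

    // Stage 1: for every bit position, the length of the symbol that
    // *would* start there, expressed as "position of the next symbol".
    std::vector<std::size_t> jump(n + 1, n);            // position n = end of stream
    for (std::size_t pos = 0; pos < n; ++pos)
        jump[pos] = std::min(n, pos + symbol_length(bits, pos));

    // Stage 2: pointer doubling, so level k jumps 2^k symbols at once.
    std::vector<std::vector<std::size_t>> jumps{jump};
    for (std::size_t span = 1; span < n; span *= 2) {
        std::vector<std::size_t> next(n + 1);
        for (std::size_t pos = 0; pos <= n; ++pos)
            next[pos] = jumps.back()[jumps.back()[pos]];
        jumps.push_back(next);
    }
    return jumps;
}

// Stage 3: the start of symbol i is found by composing jump tables
// according to the binary representation of i (e.g. symbol 5 = jump-by-4,
// then jump-by-1).  At most log(M) hops per symbol, and every symbol can
// be resolved independently, which is the parallel gather step.
static std::size_t symbol_start(const std::vector<std::vector<std::size_t>>& jumps,
                                std::size_t i) {
    std::size_t pos = 0;
    for (std::size_t k = 0; (i >> k) != 0; ++k)
        if ((i >> k) & 1) pos = jumps[k][pos];
    return pos;
}
```

The caller still needs to know how many symbols the block contains (in practice that would come from a block header); everything else falls out of the jump tables.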

For texture compression, this approach is WAAY too heavy to allow data to be stored compressed in an L1 texture cache; it may be usable for an L2 cache.

I think some x86 CPUs use algorithms similar to this to enable fast parallel decoding of the instruction set's obnoxious variable-length instruction format.
 
If you go for decorrelation with quantization and per-sample entropy coding, then the entropy coding is not the problem, if you are smart about it. The real problems are the cycles and overhead for the decorrelation (transforms have the edge over simple prediction there) and the extra indirection needed for the lookup in the map of index pointers.
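To spell out that indirection: with a hypothetical layout like the one below (names and structure are mine, purely illustrative), every block is variable-length after entropy coding, so you need a dependent lookup through a table of offsets before you can even fetch the compressed bits.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout, only to make the indirection explicit: every 4x4
// texel block is variable-length after entropy coding, so the texture
// carries a side table of byte offsets (the "map of index pointers").
struct CompressedTexture {
    std::vector<uint32_t> block_offset;   // one entry per 4x4 block
    std::vector<uint8_t>  bitstream;      // concatenated entropy-coded blocks
    uint32_t              blocks_per_row; // texture width / 4
};

// Fetching one block is two dependent memory accesses: first the offset,
// then the compressed bits; that dependent lookup is the extra latency
// being worried about here.
inline const uint8_t* block_bits(const CompressedTexture& tex,
                                 uint32_t block_x, uint32_t block_y) {
    const uint32_t index = block_y * tex.blocks_per_row + block_x;
    return tex.bitstream.data() + tex.block_offset[index];
}
```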
 
Let's make some guesses assuming cache hits ... reading the index map makes for 1 cycle of latency. Entropy decoding should probably be doable in around 4 (guess what type of entropy coding is needed to be able to do it this fast). Finally, the 4x4 integer transform and dequantization take 3 cycles. Total latency for decoding: 8 cycles, and if I were making an optimistic guess I'd say you could cut this down to 5.
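For scale, the transform step I have in mind is roughly what an H.264-style 4x4 inverse integer transform does. The sketch below (with a single uniform quantization step, which is a simplification, and not necessarily what real hardware would use) is only meant to show that it is a handful of adds and shifts per row and column, which is why a ~3 cycle estimate doesn't sound crazy.

```cpp
#include <cstdint>

// One butterfly of an H.264-style 4x4 inverse integer transform, used
// here only as a stand-in for "4x4 integer transform": a few adds and
// shifts per row or column.
static inline void idct4_butterfly(int32_t& a, int32_t& b, int32_t& c, int32_t& d) {
    const int32_t e0 = a + c;
    const int32_t e1 = a - c;
    const int32_t e2 = (b >> 1) - d;
    const int32_t e3 = b + (d >> 1);
    a = e0 + e3;
    b = e1 + e2;
    c = e1 - e2;
    d = e0 - e3;
}

// Dequantize (here: one uniform step size, a simplification) and apply
// the inverse transform to a 4x4 block of coefficients, in place.
void dequant_itransform_4x4(int32_t blk[4][4], int32_t qstep) {
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            blk[y][x] *= qstep;                          // dequantization

    for (int y = 0; y < 4; ++y)                          // horizontal pass
        idct4_butterfly(blk[y][0], blk[y][1], blk[y][2], blk[y][3]);
    for (int x = 0; x < 4; ++x)                          // vertical pass
        idct4_butterfly(blk[0][x], blk[1][x], blk[2][x], blk[3][x]);

    for (int y = 0; y < 4; ++y)                          // final rounding shift
        for (int x = 0; x < 4; ++x)
            blk[y][x] = (blk[y][x] + 32) >> 6;
}
```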

How practical would something like this be?
 