R9700 Pro: Q about Framebuffer and Z-Buffer compression

demalion said:
Even simple delta compression (from zbuffer pixel to zbuffer pixel) would guarantee some compression of a scene that was generated from multipixel polygons. Then there are other schemes that might come into the picture.

Just a quick addition -

This method is implicitly lossy (see ADPCM) unless your format allows for the fact that some working sets will generate a _larger_ z-buffer than simply storing the information uncompressed.

You have to take into account the possibility that the data fluctuates from extreme to extreme; since the delta can then be as large as the maximum representable value, you need extra bits to mark the blocks where this occurs. If that happens in a large enough proportion of blocks, the compressed data will naturally be bigger than the input data. Of course this extreme case is monstrously unlikely, but we're looking for guarantees here, and unlike audio ADPCM, where the artifacts are minor, Z-buffer artifacts caused by incorrectly coded data would be horrendously visible.
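To make the worst case concrete, here is a rough sketch of a per-tile delta coder with a raw-storage escape (the tile size, flag byte and layout are purely illustrative assumptions, not any real hardware scheme):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative delta coder for a tile of 16 32-bit Z values.
 * Output: 1 flag byte, then either base + 8-bit deltas or the raw values.
 * The escape path makes the worst case 1 byte LARGER than storing the
 * tile uncompressed - hence no guaranteed compression ratio. */
size_t encode_tile(const uint32_t z[16], uint8_t *out)
{
    int deltas_fit = 1;
    for (int i = 1; i < 16; i++) {
        int64_t d = (int64_t)z[i] - (int64_t)z[i - 1];
        if (d < -128 || d > 127) { deltas_fit = 0; break; }
    }

    if (!deltas_fit) {
        out[0] = 0;                               /* escape flag: raw tile */
        memcpy(out + 1, z, 16 * sizeof(uint32_t));
        return 1 + 64;                            /* 65 bytes vs. 64 bytes raw */
    }

    out[0] = 1;                                   /* flag: delta-coded tile */
    memcpy(out + 1, &z[0], sizeof(uint32_t));     /* first value in full */
    for (int i = 1; i < 16; i++)
        out[4 + i] = (uint8_t)(z[i] - z[i - 1]);  /* low byte = signed delta */
    return 1 + 4 + 15;                            /* 20 bytes */
}
```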

- Andy.
 
demalion said:
I think my original comment has gotten lost in the shuffle, and arjan's mention of the counting argument/theorem has gotten me confused with one of the crackpots with 50:1 compression of random data. ;)

To repeat my statements:

With enough rules about the generation of the larger set, yes you can. Whether there are enough rules for the compression scheme(s) used is another question (and ATi seems to have determined there isn't).

I will point out that a minimum resolution is one such rule that exists already...it may be possible that some future scheme may be able to guarantee a certain level of compression.

Now re-read my post and you'll see how I'm not invalidating anything I said, I think.

Sorry - my fingers aren't fast enough to keep us from overlapping posts ;)

I understand the point you make, and you are right. Applying this effort to a Z or colour buffer, you would be looking for a set of rules where, in essence, none exists.

There are no hard and fast rules about how data is provided to the 3D hardware - the possible set of Z/colour buffers is inherently and explicitly the full set of (2^bitdepth)^(width*height) possible buffer contents, unless you allow lossy compression methods.

[Edit] - quick edit to make things clearer.

Cheers
- Andy.
 
demalion said:
I think my original comment has gotten lost in the shuffle, and arjan's mention of the counting argument/theorem has gotten me confused with one of the crackpots with 50:1 compression of random data. ;)

To repeat my statements:

With enough rules about the generation of the larger set, yes you can. Whether there are enough rules for the compression scheme(s) used is another question (and ATi seems to have determined there isn't).

I will point out that a minimum resolution is one such rule that exists already...it may be possible that some future scheme may be able to guarantee a certain level of compression.

Now re-read my post and you'll see how I'm not invalidating anything I said, I think.

And I would still say that a minimum framebuffer resolution (I assume that is what you mean) by itself is NOT a valid rule - no matter how large the resolution is, I can fill the framebuffer with incompressible garbage. The example that Humus gave, where all the Z-buffer values are known to be below 0.5, is however a valid rule, since in that case we can always compress the data by shaving a bit off every value in the Z-buffer.

If you can *prove* that e.g. 80% or more of all dZ/dx slope values are below a given (small) value, then you have a valid rule usable for deterministic lossless compression - the problem lies in proving that such a rule applies with 100% certainty, as opposed to just 99.95% (in which case the 0.05% that failed will eventually blow up in your face).
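To put one concrete number on that: with a rule like Humus's (a minimal sketch, assuming 24-bit fixed-point Z values all known to be below 0.5), the top bit of every value is zero and can simply be dropped, giving a guaranteed, deterministic 24:23 lossless reduction:

```c
#include <assert.h>
#include <stdint.h>

/* Pack n 24-bit Z values, all known to be < 0.5 (top bit clear), into
 * 23 bits each. The rule guarantees the assert never fires, so the
 * compression is deterministic and lossless (illustrative sketch only). */
void pack23(const uint32_t *z24, int n, uint8_t *out)
{
    uint64_t acc = 0;
    int bits = 0, pos = 0;
    for (int i = 0; i < n; i++) {
        assert((z24[i] & 0x800000u) == 0);    /* the rule: value < 0.5 */
        acc |= (uint64_t)(z24[i] & 0x7FFFFFu) << bits;
        bits += 23;
        while (bits >= 8) {                   /* flush whole bytes */
            out[pos++] = (uint8_t)acc;
            acc >>= 8;
            bits -= 8;
        }
    }
    if (bits > 0)
        out[pos] = (uint8_t)acc;              /* flush the tail */
}
```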
 
arjan, I said "enough rules" in the first place, and said minimum resolution was one such rule. I'm not claiming it is enough by itself, but that it is one necessary rule to move forward.
 
Such a minimum resolution could be extremely small - a framebuffer of 4x4 pixels is too small to be usable for anything I can think of, but still easily compressible if it contains the right data. That's probably why it looked to me like you were suggesting "minimum resolution" as a 'sufficient' rather than just a 'necessary' condition :oops: .
 
demalion said:
With enough rules about the generation of the larger set, yes you can.
I would say that you aren't mapping a larger set onto a smaller one then, but rather a set with "larger" elements (i.e. with higher memory requirements).
 
Ok, I think we've ironed out the guaranteed compression ratio issue by now.
I think, though, that it's possible to cut down on memory requirements with framebuffer compression, even though we still must be able to handle incompressible data. As long as we get a good solution to the addressing problem, and are able to expand the framebuffer storage space if necessary, it should be possible.
 
Humus said:
As long as we get a good solution to the addressing problem, and are able to expand the framebuffer storage space if necessary, it should be possible.
How do you handle the addressing problem? I.e. where do you keep the information required to find the appropriate address? Sounds like this will negate any space savings. Remember that you have to be able to access this database of values randomly.
 
One way to handle the addressing would be as follows:

Subdivide the framebuffer into tiles of e.g. 8x8 pixels, and reserve a fixed block of memory for each framebuffer tile, e.g. a 128 byte block. Also keep a pool of free 128 byte blocks. Now, when compressing the framebuffer, we will sometimes get data that fits into the 128 byte fixed block. If we don't, then we can allocate one or more blocks from the pool until we have enough blocks to store our data. This, of course, requires that we store pointers to these additional blocks somewhere. One place to store them would be at the beginning of the initial 128 byte block. There, you specify how many pointers you need (~1 byte), followed by the actual pointers (~2-3 bytes each), then followed by the compressed data.

If we assume 4x multisampling with 2.5x compression on average for the actual data, we need 8*8*4*2*4/2.5 = 819 bytes per tile on average. In addition to this memory, we waste on average ~64 bytes because our compressed data are forced into 128-byte blocks, plus about 15-20 bytes for the pointers, for a total of about 80 bytes, reducing the compression ratio from 2.5:1 to about 2.25:1. Also, we need a list of free 128-byte blocks somewhere, which increases memory usage by an additional ~2%, so we're down to about 2.20:1. Which doesn't seem that bad to me.

The cost: you get a 2-level memory access on each framebuffer read, which requires some latency tolerance. AFAIK, present-day frame/Z-buffer compressors already require 2 levels of accesses to determine how much data needs to be read in order to avoid reading excessive data, so you get little additional penalty there. This leaves the free block list. If arranged as a stack, most accesses to that list can be cached quite effectively, so it has little additional effect other than just taking a little memory.

So addressing of a dynamically allocated compressed framebuffer isn't that hard to solve, even without killing off the effect of compression.
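A rough sketch of the data structures described above (block sizes, field widths and names are assumptions for illustration, not an actual hardware design):

```c
#include <stdint.h>

#define BLOCK_BYTES 128              /* fixed block size per tile, as above */
#define MAX_EXTRA   8                /* max overflow blocks per tile (assumed) */

/* The fixed block reserved for each 8x8 tile: a small header listing any
 * overflow blocks, followed by the compressed data itself. */
typedef struct {
    uint8_t num_extra;                   /* ~1 byte: pointer count */
    uint8_t extra[MAX_EXTRA][3];         /* ~3-byte indices of overflow blocks */
    uint8_t data[BLOCK_BYTES - 1 - MAX_EXTRA * 3];
} tile_block;

/* Pool of free 128-byte blocks, kept as a stack so accesses to it cache
 * well (the top of the stack is the hot data). */
typedef struct {
    uint32_t top;
    uint32_t free_idx[65536];
} block_pool;

static uint32_t pool_alloc(block_pool *p)       /* caller ensures top > 0 */
{
    return p->free_idx[--p->top];
}

static void pool_free(block_pool *p, uint32_t idx)
{
    p->free_idx[p->top++] = idx;
}

/* Reading a tile is a two-level access: fetch its fixed block first, then,
 * only if num_extra > 0, fetch the overflow blocks named in the header. */
```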
 
arjan de lumens said:
This, of course, requires that we store pointers to these additional blocks somewhere. One place to store them would be at the beginning of the initial 128 byte block. There, you specify how many pointers you need (~1 byte), followed by the actual pointers (~2-3 bytes each), then followed by the compressed data.
So how do I access the next adjacent block? Since the size of the previous block varies, I need to know how to find the next one.
 
If I'm reading that right, the size of each block doesn't vary, just the amount of data that's placed into each block (same memory allocated, but not always used).
 
I'd just like to add one additional little thought. Given the nature of 3D scenes, the most obvious compression technique would be to use planar compression (i.e. store three values for the position and slope of a plane, and calculate the z values off of those). The efficiency of this form of compression depends mainly on how many blocks are covered entirely by a single triangle, which in turn depends on the block size chosen for compression. This form of compression makes the most sense when coupled with higher-sample FSAA, where there is more likely to be a large pixel/triangle ratio.
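A minimal sketch of that idea (the field names and the 8x8 tile size are assumptions): a tile covered by a single triangle stores one reference depth and two slopes instead of 64 depth values, and per-pixel Z is recomputed from the plane equation.

```c
/* Illustrative planar Z storage for one 8x8 tile. */
typedef struct {
    float z0;      /* depth at the tile's top-left pixel */
    float dzdx;    /* depth slope per pixel in x */
    float dzdy;    /* depth slope per pixel in y */
} plane_z;

/* Reconstruct the tile's depth values from the plane. Only tiles covered
 * entirely by one triangle can be stored this way; mixed tiles must fall
 * back to storing raw (or otherwise compressed) Z values. */
static void decode_plane(const plane_z *p, float z[8][8])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            z[y][x] = p->z0 + p->dzdx * (float)x + p->dzdy * (float)y;
}
```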

That said, what I'd be really interested in is a good z-buffer compression that also works well in scenes where there is a very high triangle/pixel ratio. Given the geometric nature of 3D graphics, it makes sense that curve-based techniques may work very well. It may also be worth investigating for 3D architecture designers whether or not some lossy compression is worth it. The amount of lossy compression would obviously have to be very tightly controlled, as z-buffer errors are indeed extremely ugly. It still is worth investigating, however. After all, if you can, with some form of slightly-lossy compression, ensure an absolute minimum compression ratio, then it may be worth it to implement such a method in conjunction with an overall more accurate z-buffer.

As an analogy, which do you think would be better:

1. 30GB DVD2's with the same MPEG2 encoding at higher resolution.
2. 30GB DVD2's with MPEG4 encoding at even higher resolution.

If you've ever watched a high-quality MPEG4 movie, you'll most definitely choose the second, as MPEG4 can achieve equivalent image quality at much lower bitrates, so even though the compression is inherently more lossy, it's still overall better if the same bitrate is used.
 
Chalnoth said:
Given the geometric nature of 3D graphics, it makes sense that curve-based techniques may work very well. It may also be worth investigating for 3D architecture designers whether or not some lossy compression is worth it. The amount of lossy compression would obviously have to be very tightly controlled, as z-buffer errors are indeed extremely ugly. It still is worth investigating, however. After all, if you can, with some form of slightly-lossy compression, ensure an absolute minimum compression ratio, then it may be worth it to implement such a method in conjunction with an overall more accurate z-buffer.

As an analogy, which do you think would be better:

1. 30GB DVD2's with the same MPEG2 encoding at higher resolution.
2. 30GB DVD2's with MPEG4 encoding at even higher resolution.

If you've ever watched a high-quality MPEG4 movie, you'll most definitely choose the second, as MPEG4 can achieve equivalent image quality at much lower bitrates, so even though the compression is inherently more lossy, it's still overall better if the same bitrate is used.

I think your analogy is somewhat flawed -

You are neglecting the fact that the compression used in MPEG2 or MPEG4 was carefully selected for certain properties. Those properties are basically dictated by the desired result - that the errors introduced by the loss of data in the final image are of a nature that is generally tolerated and compensated for by the eye.

In a Z buffer, no matter what information you choose to throw away in the compression, you cannot significantly control the appearance of the visual errors that are introduced, because those errors are dictated separately by the data in the colour buffer. As a result, any inaccuracy in Z buffering can and will produce some maximum-magnitude errors when the result is viewed.

As an extreme case you can already see the effects of this by using a 16-bit Z buffer (this is effectively simple lossy compression of the Z buffer by reduction of its precision). Parts of scenes where the data is well suited to this form of compression look fine, while elsewhere you get very visible artifacts, with objects 'popping' through each other, sawtooth patterns on object edges, etc. Doubling or quadrupling the size of the Z buffer in both dimensions would not fix this problem.

If you perform lossy compression of a Z buffer you will get all sorts of fighting and sparkling artifacts, and they are likely to be wholly unpredictable in appearance - you could easily get sparkling errors popping in randomly all over the screen as the content changes.

- Andy.
 
When using a lossy compression, would it help to store the data in a specific way, so that the Z data is stored on the left and the RGB data on the right, like:

zrgb :: for one pixel

zzzzzzzzrgbrgbrgbrgbrgbrgbrgbrgb :: for 8 pixels

and then use a compression like Huffman encoding (or wavelets, which I think allow lossless and lossy compression in one algorithm), where the fine detail of the data - which is the RGB data here - degrades first? With a compression ratio of 2 the RGB data would only degrade slightly, or not at all (like an excellent JPEG at a quality setting of 80-100). If the compression ratio is capped at 2 (or 4 with MSAA), the storage requirements should be predictable too, right?
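A sketch of the reordering being suggested (purely illustrative, with an assumed pixel layout): de-interleave the per-pixel zrgb records so the smooth Z values form one contiguous run and the noisier RGB values another, before handing the group to the entropy coder.

```c
#include <stdint.h>

typedef struct { uint32_t z; uint8_t r, g, b; } pixel;   /* assumed layout */

/* Turn 8 interleaved pixels (z r g b, z r g b, ...) into the
 * "zzzzzzzz rgbrgb..." layout described above, so a coder that degrades
 * fine detail first hits the RGB run before it touches the Z run. */
static void split_group(const pixel in[8], uint32_t z_out[8], uint8_t rgb_out[24])
{
    for (int i = 0; i < 8; i++) {
        z_out[i] = in[i].z;
        rgb_out[3 * i + 0] = in[i].r;
        rgb_out[3 * i + 1] = in[i].g;
        rgb_out[3 * i + 2] = in[i].b;
    }
}
```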
 
Chalnoth said:
If I'm reading that right, the size of each block doesn't vary, just the amount of data that's placed into each block (same memory allocated, but not always used).
I don't think you are reading it correctly:
Now, when compressing the framebuffer, we will sometimes get data that fits into the 128 byte fixed block. If we don't, then we can allocate one or more blocks from the pool until we have enough blocks to store our data.
 
arjan de lumens said:
we need 8*8*4*2*4/2.5 = 819 bytes per tile on average. In addition to this memory, we waste on average ~64 bytes because our compressed data are forced into 128-byte blocks, plus about 15-20 bytes for the pointers, for a total of about 80 bytes, reducing the compression ratio from 2.5:1 to about 2.25:1.
So when you have data that doesn't compress, then you are actually using more space than you would if the data wasn't compressed at all.
So addressing of a dynamically allocated compressed framebuffer isn't that hard to solve, even without killing off the effect of compression.
But since you have to plan for the worst case scenario, you have to reserve memory so you can add more blocks. In other words, you aren't saving any real space at all.
 
Chalnoth said:
As an analogy, which do you think would be better:

1. 30GB DVD2's with the same MPEG2 encoding at higher resolution.
2. 30GB DVD2's with MPEG4 encoding at even higher resolution.

If you've ever watched a high-quality MPEG4 movie, you'll most definitely choose the second, as MPEG4 can achieve equivalent image quality at much lower bitrates, so even though the compression is inherently more lossy, it's still overall better if the same bitrate is used.
I'm not sure that's the case. The last time I saw a comparison review of MPEG2 and MPEG4 at high bitrates, there was very little difference between the two, and in some scenes MPEG2 had a distinct advantage.

The hypothesis was that this was caused by the more mature compression methods of MPEG2 and the fact that the compression process is naturally 'easier'.

At low bitrates (anything under 2-3Mbps) MPEG4 is always superior, but at higher bitrates it is much less clear-cut.

Compression (the subject, not the act) does get hellishly complex!
 
I do understand demalion's point, and agree that by placing enough rules onto the set you could indeed 'compress further'.

But I kind of have the view that the effect of placing those rules would be that, instead of creating opportunities for 'compression', you would in fact just be defining a different way of storing the Z values that takes up less space.

So we get onto some fine semantic points about 'is it compression, or just an alternate representation'? Of course, this is a question for all lossless compression methods (because it IS just an alternate representation).

For example, a tile-render architecture such as PowerVR doesn't have to store a Z-buffer - because it can build the image inside the tile using the Z-information calculated during setup. But then you have the 'restriction' that you have to store the entire data for the previous frame - and if that's not acceptable, you have to go and do a partial render to empty the data bins, which will involve writing out a screen-sized Z-buffer (so it can be reread to regenerate the Z information on the next pass).

The point is that under current APIs we don't have any additional rules, so we have to handle any possible case, and if that includes 2-pixel wide quads that go from the nearplane to the farplane the hardware has to handle them, even if they are completely useless.

This is quite a silly discussion - I hope nobody's taking it too seriously. :)
 
Dio said:
.. and if that includes 2-pixel wide quads that go from the nearplane to the farplane the hardware has to handle them, even if they are completely useless.

Huh? What do you mean by 'completely useless'? Extreme view angles on large planes just happen! The fact that huge planes can come down to stripes of 2 (or even 1) pixels in width is the fault of the discrete math - don't blame coders!

This is quite a silly discussion - I hope nobody's taking it too seriously. :)

don't be ridiculous! :)
 
OpenGL guy said:
Humus said:
As long as we get a good solution to the addressing problem, and are able to expand the framebuffer storage space if necessary, it should be possible.
How do you handle the addressing problem? I.e. where do you keep the information required to find the appropriate address? Sounds like this will negate any space savings. Remember that you have to be able to access this database of values randomly.

I'll hand it over to the hardware guys to implement ;)

Seriously though, we don't want a full malloc() implementation in hardware. But by applying some restrictions we might get a good enough solution to cut down on storage space. You can keep a stack of free memory blocks. You may split it up into different buckets of differently sized blocks to simplify finding the best-suited memory area, for instance one 64-bits/block bucket, one 128-bits/block bucket, etc. Depending on how much space you need, you allocate from the right place. You might skip reusing old blocks which were later found to be too small; this will reduce complexity but also reduce efficiency. If one digs into the problem, I'm certain there's a good solution to be found.
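A very rough sketch of that kind of restricted, bucketed allocator (the bucket sizes, handle format and names are assumptions, not anything from real hardware):

```c
#include <stdint.h>

#define N_BUCKETS 3
static const int bucket_bits[N_BUCKETS] = { 64, 128, 256 };  /* assumed sizes */

/* Each bucket keeps its own stack of free block indices. */
typedef struct {
    uint32_t top;
    uint32_t free_idx[4096];
} bucket;

static bucket buckets[N_BUCKETS];

/* Pick the smallest bucket whose blocks are big enough and pop a free block
 * from its stack. Returns the bucket number in the top byte and the block
 * index in the rest, or 0xFFFFFFFF if nothing fits (caller falls back to
 * uncompressed storage). */
static uint32_t alloc_block(int needed_bits)
{
    for (int b = 0; b < N_BUCKETS; b++) {
        if (bucket_bits[b] >= needed_bits && buckets[b].top > 0) {
            uint32_t idx = buckets[b].free_idx[--buckets[b].top];
            return ((uint32_t)b << 24) | idx;
        }
    }
    return 0xFFFFFFFFu;
}

static void free_block(uint32_t handle)
{
    uint32_t b = handle >> 24, idx = handle & 0xFFFFFFu;
    buckets[b].free_idx[buckets[b].top++] = idx;
}
```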
 