R9700 Pro: Q about Framebuffer and Z-Buffer compression

mboeller

hi all;

I have one stupid question about the framebuffer and Z-buffer compression used
by the new R9700 Pro cards (and soon by the R9500 cards).

Does this so-called compression really compress the framebuffer and Z-buffer (up
to 24 times), or does it only lessen the bandwidth demand (up to 24 times)?

If it only lessens the bandwidth demand, then I think the R9500 with its 64MB
of memory will be a lot slower with AA, because the framebuffer and Z-buffer
would need a lot of space.

But if the framebuffer and Z-buffer are really compressed, then the R9500 / R9700
would need only a small amount of memory for the framebuffer and Z-buffer,
and so 64MB would be enough.

Example:

Without AA: 1600 x 1200 x 32 bit x 3 = 22500 KB (with 2x compression: 11250 KB).

With 4x MSAA (and a 2x higher compression ratio due to MSAA) the memory demand
for the framebuffer and Z-buffer would still be only 22500 KB.
Without compression the memory demand would be 4 x 22500 KB = 90000 KB !!

This would mean that AA would only be possible at up to 1024x768 due to the high
memory demand (1024x768 x 32 bit x 3 x 4 = 36MB, leaving only 28MB for textures), and slow due
to memory thrashing when most textures have to be loaded out of AGP memory (with
new games like UT2k3).
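
To make the arithmetic above easy to check, here is a small back-of-the-envelope sketch (Python). It simply follows the simplified model used in the example, i.e. three 32-bit buffers all multiplied by the sample count; that layout is an approximation of my own, not anything ATI has documented:

# Rough framebuffer/Z memory estimate following the simplified model above:
# three 32-bit buffers (front, back, Z), each multiplied by the MSAA sample count.

def buffer_kb(width, height, bytes_per_pixel=4, buffers=3, samples=1):
    """Uncompressed size in KB (1 KB = 1024 bytes)."""
    return width * height * bytes_per_pixel * buffers * samples / 1024

no_aa = buffer_kb(1600, 1200)                    # 22500 KB
print("1600x1200, no AA : %.0f KB (11250 KB at 2:1)" % no_aa)

msaa4 = buffer_kb(1600, 1200, samples=4)         # 90000 KB
print("1600x1200, 4xMSAA: %.0f KB (22500 KB at 4:1)" % msaa4)

msaa4_1024 = buffer_kb(1024, 768, samples=4)     # 36864 KB = 36 MB
print("1024x768, 4xMSAA : %.0f MB" % (msaa4_1024 / 1024))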
 
Based on the info ATI has released, I think your second statement is correct, i.e. that HyperZ III only reduces memory bandwidth rather than shrinking the size of the frame buffer. So it's a good bet the 64MB Radeon 9500 would have a reduced set of supported AA modes. However, if it has fewer rendering pipes than the 9700 Pro, then it probably won't have enough fill rate to make 1600x1200 with 4x AA usable anyway. But I guess they have to leave some reason for people to shell out the big bucks for the 9700 Pro!

Another question... is there any reason you couldn't build a 9500 board with 128MB of memory? It might be a bit more expensive, but it would probably provide a lot more performance in hi-res AA modes and in games that use large textures.
 
You're on to something about the Radeon 9500 if it only comes with 64 MB.

Sireric from ATI has explained before that in the case of 1600x1200 at 6xAA you will need ~107 MB. The tricky part is that you have to reserve all of those megabytes, because the compression algorithm is lossless and thus you might (in theory) be unlucky enough to get no compression at all. So you reserve ~107 MB in this case, but in practice you would never have to write that much = saved memory traffic. Hope I don't ramble on too much here..! ;)
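
For what it's worth, the ~107 MB figure can be reproduced with one plausible buffer layout: front and back display buffers plus a 6x multisampled colour buffer and a 6x multisampled Z buffer, all 32-bit. The exact breakdown is my guess, not something sireric spelled out:

# 1600x1200, 32-bit colour and Z, 6x multisampling; MB = 10^6 bytes here.
pixels = 1600 * 1200
plain  = pixels * 4          # one single-sampled 32-bit buffer
ms     = pixels * 4 * 6      # one 6x multisampled buffer
total  = 2 * plain + 2 * ms  # front + back + MS colour + MS Z
print(total / 1e6)           # ~107.5 MB, all of which has to be reserved up front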
 
However, if it has fewer rendering pipes than the 9700 Pro, then it probably won't have enough fill rate to make 1600x1200 with 4x AA usable anyway.

I don't have any reason to suspect that MSAA on R300 isn't fillrate free, a la NV2x.
 
The way I understand it, it does compress frame/Z data whenever writing to the framebuffers, thus saving bandwidth on the write and on the subsequent read. However, since the pool of framebuffer memory is fixed during rendering, it must allocate enough memory beforehand to be certain to function correctly under all scenarios - even when fed completely incompressible data. This will necessarily be a slightly larger amount of frame/Z memory than if compression were just turned off. (You need to store, for each potentially compressed block of the frame buffer, at least one additional bit indicating whether compression of that block failed or not.)
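
As a loose illustration of that last point (a toy scheme of my own, not ATI's actual layout): every tile keeps its full worst-case slot in memory, and a per-tile flag records whether the slot holds compressed or raw data, so only the bytes actually written cost bandwidth.

# Toy per-tile compression with one "compressed?" flag per tile. Every tile
# still owns a worst-case, full-size slot in memory; the flag only tells the
# reader how to interpret the slot, and only the bytes written cost bandwidth.

TILE_BYTES = 8 * 8 * 4            # an 8x8 tile of 32-bit Z values, uncompressed

def try_compress(tile):
    # Stand-in for a real compressor: succeed only if the tile is uniform,
    # in which case 4 bytes are enough to reproduce all 64 values.
    if tile == tile[:4] * (len(tile) // 4):
        return tile[:4]
    return None                   # incompressible -> caller stores it raw

def store_tile(slots, flags, index, tile):
    compressed = try_compress(tile)
    if compressed is not None:
        flags[index] = True       # slot holds compressed data
        payload = compressed
    else:
        flags[index] = False      # slot holds raw data
        payload = tile
    # The slot is always TILE_BYTES long, so addressing stays trivial; the
    # saving is that only len(payload) bytes need to cross the memory bus.
    slots[index] = payload + bytes(TILE_BYTES - len(payload))
    return len(payload)           # bytes actually transferred

slots = [bytes(TILE_BYTES)] * 2
flags = [False] * 2
print(store_tile(slots, flags, 0, bytes([0, 0, 0x80, 0x3F]) * 64))   # 4 bytes
print(store_tile(slots, flags, 1, bytes(range(256))))                # 256 bytes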

If memory space is tight, it should be possible to downscale the multisample buffer before passing it to the double-buffering chain - so that you don't need to keep 2 full multisampled color buffers in memory all the time.
 
The Z data is stored in a compressed way in the frame buffer, but the main goal is to save bandwidth. It's a lossless compression method (the only thing that would be acceptable), so we cannot guarantee a compression ratio (just a "not to exceed" ratio of 24:1). There are a few cases where we just can't compress and fall back to 1:1. However, we can't predict those cases, so we need to assume the worst and reserve the space for all the data to be stored uncompressed, even though, in general, we end up using only a small fraction of that.

Same thing for color compression (though with different compression ratios).

A 64MB card will have some performance issues at higher resolutions with some AA modes (basically, whenever the frame buffer is larger than 64MB): we would need to start rendering to AGP space, which can be done, but it's a serious bottleneck (it's got 1/20 the bandwidth of the local memory, so it could take your 40 fps down to 4 fps).
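
Plugging sireric's "reserve the worst case" rule into the same simplified layout as above gives a rough idea of which modes a 64MB board could keep entirely in local memory. Note that this ignores textures (which only makes things tighter) and that the layout itself remains my assumption:

# Which resolution / AA combinations fit a worst-case (fully uncompressed)
# reservation into 64 MB of local memory? Layout assumed: front + back
# display buffers plus multisampled colour and Z, all 32-bit.

BUDGET = 64 * 1024 * 1024

def worst_case_bytes(width, height, samples):
    plain = width * height * 4
    return 2 * plain + 2 * plain * samples

for width, height in [(1024, 768), (1280, 1024), (1600, 1200)]:
    for samples in (2, 4, 6):
        need = worst_case_bytes(width, height, samples)
        verdict = "fits" if need <= BUDGET else "spills to AGP"
        print("%dx%d %dxAA: %5.1f MB -> %s"
              % (width, height, samples, need / 2**20, verdict))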
 
sireric said:
The Z data is stored in a compressed way in the frame buffer, but the main goal is to save bandwidth. It's a lossless compression method (the only thing that would be acceptable), so we cannot guarantee a compression ratio (just a "not to exceed" ratio of 24:1). There are a few cases where we just can't compress and fall back to 1:1. However, we can't predict those cases, so we need to assume the worst and reserve the space for all the data to be stored uncompressed, even though, in general, we end up using only a small fraction of that.

Same thing for color compression (though with different compression ratios).

A 64MB card will have some performance issues at higher resolutions with some AA modes (basically, whenever the frame buffer is larger than 64MB): we would need to start rendering to AGP space, which can be done, but it's a serious bottleneck (it's got 1/20 the bandwidth of the local memory, so it could take your 40 fps down to 4 fps).

Damn... :)

I had hoped that you could at least "guarantee" that the compression is 2x (4x at the most) and 4x with 4x MSAA enabled (16x at the most). This would have been really nice. I understand that lossy compression is not possible because it would cause a lot of Z-buffer errors.

I think without this a lot of people will be disappointed with the AA performance of the R9500, because it is said that most boards will only have 64MB, and that amount of memory is not enough for new games using a lot of textures. With 64MB, the AA performance of the R9500 would be quite low at 1024x768 due to the memory overflow (much like the AA performance of the R9700 Pro at 1600x1200, I think). So it seems that 2x MSAA (only) will be the sweet spot of the R9500 with 64MB.
 
Yes, that's probably true: 2x AA may end up being the only reasonable mode for a 64MB Radeon 9500. However, 4x should be possible if the user only wants to run at, say, 1024x768. In other words, those with lower-quality monitors might not mind so much.

Regardless, it will still have better AA quality than a GeForce4 Ti (though not nearly as much of an advantage as the Radeon 9700... as you stated, if most boards do indeed ship with 64MB, that will severely impact the benefits of the new architecture where FSAA is concerned). I also feel that the reduced number of FSAA samples realistically available will limit how much better than a GeForce4 Ti the Radeon 9500 can be.

For ATI's sake, they should most certainly outfit most, if not all, Radeon 9500s with 128MB of RAM. They certainly look to be powerful enough to use that much.
 
mboeller said:
Damn... :)

I had hoped that you could at least "guarantee" that the compression is 2x (4x at the most) and 4x with 4x MSAA enabled (16x at the most).

It's simply not possible to guarantee any level of lossless compression. You cannot uniquely map a larger set onto a smaller set.
 
Humus said:
mboeller said:
Damn... :)

I had hoped that you could at least "guarantee" that the compression is 2x (4x at the most) and 4x with 4x MSAA enabled (16x at the most).

It's simply not possible to guarantee any level of lossless compression. You cannot uniquely map a larger set onto a smaller set.

With enough rules about the generation of the larger set, yes you can. Whether there are enough rules for the compression scheme(s) used is another question (and ATi seems to have determined there aren't).

I will point out that a minimum resolution is one such rule that exists already... it may be possible that some future scheme will be able to guarantee a certain level of compression.
 
demalion said:
I will point out that a minimum resolution is one such rule that exists already... it may be possible that some future scheme will be able to guarantee a certain level of compression.

Not at all. For every lossless compression algorithm that exists, no matter how good, there exists at least one data set that will be *expanded* by that algorithm. There is a simple mathematical proof of this called the "counting argument". You may argue, though, that for a given compression algorithm and a given use of it, the data sets that are expanded by that algorithm are unlikely to appear in practice.
 
demalion said:
With enough rules about the generation of the larger set, yes you can. Whether there are enough rules for the compression scheme(s) used is another question (and ATi seems to have determined there aren't).
I don't think you are correct. For example:
Show us how to map all hex numbers of the form 0xXY onto the set of hex numbers of the form 0xZ such that you can recover the original data for every possible X and Y. That would be a 2:1 compression ratio. As you can see, the set of values for 0xZ has 16 elements, while the set of values for 0xXY has 256 elements, so there is no way to get such a mapping.

If you could compress every piece of data, then you could repeatedly compress the data and achieve a smaller result each time. However, this isn't the case.
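
The 0xXY -> 0xZ example can even be checked mechanically: whatever fixed function you pick from 8-bit values to 4-bit values, the pigeonhole principle forces at least two different inputs onto the same output, so no decoder can recover both (a throwaway sketch, not tied to any real codec):

# Pigeonhole check: any mapping of 256 byte values (0xXY) onto 16 nibble
# values (0xZ) must collide, so it cannot be losslessly reversed.

def fake_compress(byte_value):
    # Any deterministic 8-bit -> 4-bit function will do for the argument.
    return byte_value & 0x0F

seen = {}
for value in range(256):
    code = fake_compress(value)
    if code in seen:
        print("collision: 0x%02X and 0x%02X both compress to 0x%X"
              % (seen[code], value, code))
        break
    seen[code] = value
# Only 16 distinct outputs exist for 256 distinct inputs, so a collision
# is unavoidable no matter how fake_compress is written.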
 
arjan de lumens/OpenGL guy, I don't think you understood demalion's argument fully. He has a point, really. If it can be proven that certain combinations of the full uncompressed data set will never occur, then it might be possible to guarantee a lossless compression ratio. This, however, means that there is redundancy in the original data, which can happen if a non-redundant version of it is more computationally expensive or unintuitive. I don't think this is really the case when talking about the framebuffer though, and I can't see any kind of combination that's never going to happen. But just as a silly example: say the Z value would always be between 0 and 0.5; then you could guarantee that you could always compress it by one bit.
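
Humus's "silly example" can be made concrete: if every Z value is known to be below 0.5 in, say, a 16-bit fixed-point buffer, its top bit is always 0, so it can be dropped on write and restored on read, a guaranteed lossless saving of exactly one bit per value. A toy illustration of my own:

# If Z is known to be < 0.5, the top bit of a 16-bit fixed-point Z value
# is always 0, so it can be dropped and restored losslessly.

def encode_z(z16):
    assert z16 < 0x8000, "rule violated: Z must stay below 0.5"
    return z16 & 0x7FFF          # 15 bits are enough

def decode_z(z15):
    return z15                   # the top bit is known to be 0

for z in (0x0000, 0x3FFF, 0x7FFF):
    assert decode_z(encode_z(z)) == z
print("15 bits round-trip exactly, because the input set was restricted")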
 
Hmmm, another way to reduce the requirements could be to look at the actual memory use in the prior frame rendered and then add some headroom to that for safety. This might work in practice because one rendered frame should never be that different from the next. As a safety precaution, if you encounter an overflow of the framebuffer, you could then drop the FSAA in the rest of the frame instead of crashing outright.

Nah, crazy idea – but great fun. 8)
 
Reducing the memory requirements for a compressed framebuffer may be possible with virtual/paged framebuffer memory - you start out allocating the amount of memory you think you need before rendering the frame (which may be estimated from the previous frame's actual usage + a safety margin), and then take a page fault if you were wrong. Dunno how much this would add to the complexity of present-day crossbar memory controllers, though - you may need to buffer 20+ page faults before you can actually process them.
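
A crude sketch of that "previous frame plus safety margin" idea (purely hypothetical; nothing like this is known to exist in any driver): allocate based on the last frame's real compressed usage plus some headroom, capped at the true worst case, and take a fallback path if the estimate turns out to be wrong.

# Hypothetical allocation heuristic: size the compressed framebuffer from
# the previous frame's real usage plus a margin, and handle overflow by
# falling back (e.g. dropping FSAA or paging in more memory) mid-frame.

SAFETY_MARGIN = 1.25   # 25% headroom over last frame's usage (a guess)

def allocate_for_frame(last_frame_bytes, worst_case_bytes):
    estimate = int(last_frame_bytes * SAFETY_MARGIN)
    return min(estimate, worst_case_bytes)   # never above the true worst case

def render_frame(allocated, actual_needed):
    if actual_needed > allocated:
        # Overflow: this is where you would take the page fault / drop AA.
        return "overflow -> fallback path", actual_needed
    return "ok", actual_needed

worst_case = 107_500_000          # e.g. 1600x1200 6xAA, fully uncompressed
last_used = 30_000_000            # what the compressed buffers really took
alloc = allocate_for_frame(last_used, worst_case)
print(render_frame(alloc, 33_000_000))   # within the margin
print(render_frame(alloc, 60_000_000))   # scene changed a lot -> fallback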
 
It's also a matter of being able to address a certain block of data quickly. A certain 8x8 (or so) block at one part of the screen might compress nicely while another might not, yet you still need to quickly find the address in memory where to write the compressed or non-compressed data. This would be hard if we didn't have a full-size framebuffer. I guess one could do some kind of cache-like table of addresses where certain tiles are stored, and be able to move a tile to another place if it requires more space later on in the rendering process. Not sure how feasible that would be in practice though.
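
A sketch of the indirection Humus describes (again just a toy of mine, not any shipping design): a per-tile table records where each tile's data currently lives and how big it is, so lookups stay cheap even though tiles differ in size, and a tile that later becomes incompressible is simply relocated.

# Toy tile address table: each 8x8 tile has an entry giving the offset and
# size of its (possibly compressed) data, so lookups stay O(1) even though
# tiles have different sizes. Growing a tile means relocating it.

TILE_WORST_CASE = 8 * 8 * 4   # bytes for an uncompressed 8x8 tile of 32-bit Z

class TileTable:
    def __init__(self, num_tiles):
        self.entries = [None] * num_tiles   # (offset, size) per tile
        self.next_free = 0                  # bump allocator over a big pool

    def write_tile(self, index, size):
        entry = self.entries[index]
        if entry is None or size > entry[1]:
            # First write, or the tile no longer fits its old slot:
            # relocate it to fresh space (the old slot becomes garbage
            # until some cleanup pass reclaims it).
            entry = (self.next_free, size)
            self.next_free += size
        else:
            entry = (entry[0], size)        # reuse the existing slot
        self.entries[index] = entry
        return entry

table = TileTable(num_tiles=4)
print(table.write_tile(0, 16))               # compresses nicely: 16 bytes
print(table.write_tile(0, TILE_WORST_CASE))  # later becomes incompressible
print(table.write_tile(1, 64))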
 
Humus has the essence of what I'm saying, I think.

For example, if you contend that a possible zbuffer set would consist totally of polygons the same size or smaller than a pixel at randomly determined "z" distances, then you could possibly invalidate what I say. The question then becomes: is there a way to guarantee or assume that this would not be the case? That is what I meant by "with enough rules", etc.

For just about any other method of generating z buffer data, it would not be invalidated.

Even simple delta compression (from z-buffer pixel to z-buffer pixel) would guarantee some compression for a scene generated from multi-pixel polygons. Then there are other schemes that might come into the picture.
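
To make the delta idea concrete (a minimal version of my own, not any vendor's scheme): across the interior of a planar, multi-pixel polygon the screen-space Z step from pixel to pixel is constant, so one row collapses to a base value, a delta and a count; data that doesn't behave that way simply falls back to raw storage, which is why a hard guarantee is still elusive.

# Minimal delta encoding of one row of Z values. For a planar surface the
# per-pixel delta is constant, so the row collapses to (base, delta, count).

def delta_encode(z_row):
    deltas = [b - a for a, b in zip(z_row, z_row[1:])]
    if len(set(deltas)) <= 1:                 # planar: one delta fits all
        return ("planar", z_row[0], deltas[0] if deltas else 0, len(z_row))
    return ("raw", list(z_row))               # fall back to storing raw values

def delta_decode(encoded):
    if encoded[0] == "planar":
        _, base, step, count = encoded
        return [base + i * step for i in range(count)]
    return list(encoded[1])

row = [1000 + 7 * i for i in range(8)]        # Z ramp across one polygon
enc = delta_encode(row)
assert delta_decode(enc) == row
print(enc)    # ('planar', 1000, 7, 8) -- three numbers instead of eight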

My comment on minimum resolution was related to the rule I had in mind of "a large enough set for data compression to be effective", i.e., an 8x5 pixel screen could easily be small enough to prevent z buffer compression from working. :p
 
demalion said:
For example, if you contend that a possible zbuffer set would consist totally of polygons the same size or smaller than a pixel at randomly determined "z" distances, then you could possibly invalidate what I say. The question then becomes: is there a way to guarantee or assume that this would not be the case? That is what I meant by "with enough rules", etc.

For just about any other method of generating z buffer data, it would not be invalidated.

You just invalidated what you said yourself, I think...

Of course I would contend that what you have described is a possible z buffer set - there is nothing in any 3D specification that I know of that makes it illegal. The fact that it is extremely unlikely is neither here nor there - the possibility of this set could make it impossible to guarantee that you will compress Z buffer data in all cases. You cannot guarantee anything unless you eliminate this as a possible input set, which implicitly involves placing artificial limits on submitted geometry.

Effectively you are advocating moving to a lossy compression system - not something that hardware manufacturers like to do when image quality is becoming more and more important.

The point of the Z buffer compression methods chosen by hardware manufacturers is that they already work on the assumption that the set that you describe is vanishingly unlikely, and therefore they can guarantee that they get good compression with most likely working sets.

So I don't see how you are proposing anything different here - if you guarantee me a good working set then I guarantee that existing Z compression will give you a good compression ratio. I can't guarantee exactly what that ratio is, though, so I still have to allocate the worst case buffer to cover all possible cases.

- Andy.
 
I think my original comment has gotten lost in the shuffle, and arjan's mention of the counting argument/theorem has gotten me confused with one of the crackpots claiming 50:1 compression of random data. ;)

To repeat my statements:

With enough rules about the generation of the larger set, yes you can. Whether there are enough rules for the compression scheme(s) used is another question (and ATi seems to have determined there aren't).

I will point out that a minimum resolution is one such rule that exists already... it may be possible that some future scheme will be able to guarantee a certain level of compression.

Now re-read my post and you'll see how I'm not invalidating anything I said, I think.
 