Memory requirements for SMOOTHVISION 2.0

tb

Newcomer
Hi!

I used 3D-Analyze 1.63 and took a simple D3D8 SDK sample (DolphinVS) and tested ATi's different SMOOTHVISION 2.0 settings:
(memory usage for FSAA is only correctly displayed when you enable it in the driver panel, don't know why)

1024x768x32
2xFSAA 9MB
4xFSAA 18MB
6xFSAA 27MB

1600x1200x32
6xFSAA 66MB

Could someone test the memory requirements on a GeForce 4?

Thomas
 
Are you using a 16-bit z buffer? Not counting double buffering I get 92MB for 1600x1200x32 with a 24/8 z buffer. I expect GeForce 4 will be the same.
 
tb said:
I used 3D-Analyze 1.63 and took a simple D3D8 SDK sample (DolphinVS) and tested ATi's different SMOOTHVISION 2.0 settings
What's the big mystery?

1600*1200*(4*6 (color) + 2*6 (16-bit Z)) = 66 MB.

How do you expect people to get 6x AA numbers for the GeForce 4?
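For reference, here is the arithmetic behind those figures as a small sketch (sample counts and byte sizes are the ones discussed in the thread; real drivers may add alignment, tiling or extra buffers, so treat it as an estimate only):

```cpp
#include <cstdio>

// Naive framebuffer size estimate for multisampled rendering:
//   width * height * samples * (color bytes + depth/stencil bytes)
// Alignment, tiling and any extra driver-internal buffers are ignored.
static double fsaaBytes(int w, int h, int samples, int colorBytes, int depthBytes)
{
    return double(w) * h * samples * (colorBytes + depthBytes);
}

int main()
{
    // 1600x1200, 6x AA, 32-bit color, 16-bit Z -> ~69.1 million bytes (~66 MiB, tb's figure)
    double z16 = fsaaBytes(1600, 1200, 6, 4, 2);
    // 1600x1200, 6x AA, 32-bit color, 24/8 Z   -> ~92.2 million bytes (the 92 MB figure above)
    double z24 = fsaaBytes(1600, 1200, 6, 4, 4);

    std::printf("16-bit Z: %.1f MB (%.1f MiB)\n", z16 / 1e6, z16 / (1024.0 * 1024.0));
    std::printf("24/8 Z:   %.1f MB (%.1f MiB)\n", z24 / 1e6, z24 / (1024.0 * 1024.0));
    return 0;
}
```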
 
the mystery....

Thanks, I know the formula. 2x and 4x FSAA would be enough on a GF4 card. The reason for my testing method was the following:

(attached image: hz3.gif)

If HZ3 achieves at least a 2:1 compression ratio, the amount of memory for the whole FSAA + z-buffer stuff must be smaller than on a non-HZ3 version.

So, is HZ3 not working as expected? Is it IDirect3DDevice8::GetAvailableTextureMem()'s fault?

Thomas
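One way to sanity-check whether GetAvailableTextureMem() is the weak link would be to query it before and after explicitly creating a multisampled color/depth surface pair. A rough, untested sketch, assuming an already-created IDirect3DDevice8* and omitting error handling; note that the method's return value is only a rounded approximation and can include AGP memory:

```cpp
#include <windows.h>
#include <d3d8.h>
#include <cstdio>

// Hypothetical helper: report how much the reported free memory drops when a
// 6x multisampled color + depth surface pair is created. The numbers are
// indicative at best, since drivers may allocate lazily or round the value.
void ReportMsaaFootprint(IDirect3DDevice8* dev, UINT width, UINT height)
{
    UINT before = dev->GetAvailableTextureMem();

    IDirect3DSurface8* color = nullptr;
    IDirect3DSurface8* depth = nullptr;
    dev->CreateRenderTarget(width, height, D3DFMT_A8R8G8B8,
                            D3DMULTISAMPLE_6_SAMPLES, FALSE, &color);
    dev->CreateDepthStencilSurface(width, height, D3DFMT_D24S8,
                                   D3DMULTISAMPLE_6_SAMPLES, &depth);

    UINT after = dev->GetAvailableTextureMem();
    std::printf("Reported drop: %u MB\n", (before - after) / (1024 * 1024));

    if (depth) depth->Release();
    if (color) color->Release();
}
```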
 
From what I've read on this forum, the 9700 always has to reserve the full space for the framebuffer: if it had to render a frame where only minimal compression was possible, it would otherwise run out of memory and have to stop and move some textures etc. out of the way.
 
Re: the mystery....

tb said:
If HZ3 achieves at least a 2:1 compression ratio, the amount of memory for the whole FSAA + z-buffer stuff must be smaller than on a non-HZ3 version.

So, is HZ3 not working as expected?
It's working fine, but you have some misconceptions. Because the compression is lossless, you can't guarantee a fixed compression amount. This means that you have to reserve a sufficient amount of memory in case you are unable to compress the data at all.

Memory is not conserved, what is conserved is bandwidth, which is usually more important.
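Concretely, this might be sketched as follows (a toy model of what is being described here, not ATI's actual implementation): the buffer is allocated at its worst-case size up front, and compression only changes how many bytes actually move across the bus for each block.

```cpp
#include <cstdint>
#include <vector>

// Toy model of a compressed Z buffer as described above (not ATI's design):
// storage is always worst-case sized, a small table records how many bytes
// of each block are currently valid, and only that many bytes hit the bus.
struct CompressedZBuffer {
    static const int kBlockBytes = 256;     // worst-case (uncompressed) block size

    std::vector<uint8_t> storage;           // allocated for the worst case -> no memory saved
    std::vector<uint16_t> compressedBytes;  // per-block size kept on chip   -> bandwidth saved

    explicit CompressedZBuffer(size_t numBlocks)
        : storage(numBlocks * kBlockBytes), // full footprint reserved up front
          compressedBytes(numBlocks, kBlockBytes) {}

    // Bytes that would cross the memory bus to fetch one block.
    size_t readCost(size_t block) const { return compressedBytes[block]; }

    // Memory footprint: independent of how well the blocks compress.
    size_t footprint() const { return storage.size(); }
};
```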
 
Re: the mystery....

OpenGL guy said:
tb said:
If HZ3 achieves at least a 2:1 compression ratio, the amount of memory for the whole FSAA + z-buffer stuff must be smaller than on a non-HZ3 version.

So, is HZ3 not working as expected?
It's working fine, but you have some misconceptions. Because the compression is lossless, you can't guarantee a fixed compression amount. This means that you have to reserve a sufficient amount of memory in case you are unable to compress the data at all.

Memory is not conserved, what is conserved is bandwidth, which is usually more important.

In there "advertising" they seem to be very optimistic:
"...HZ3 achieves a minimum 2:1 compression ratio..."
But you could be right, if they don't conserve memory...If they would conserve memory, efficency would be better(no need to compress the z-data when reading data from the zbuffer).

Thomas
 
Re: the mystery....

tb said:
In there "advertising" they seem to be very optimistic:
"...HZ3 achieves a minimum 2:1 compression ratio..."
Again, we're talking about saving bandwidth, not memory.
But you could be right, if they don't conserve memory...
Trust me on this one. ;)
If they did conserve memory, efficiency would be better (no need to compress the z-data when reading from the z-buffer).
Please explain how this could be done in a lossless manner and tell me how I can access a random piece of data with your method.
 
Lossless compression schemes: RLE, LZW (too complicated?)...
But you can't guarantee a minimum compression ratio!

Random access: Use the hierarchical Z blocks as the access pattern and store the z-data in those blocks, compressing each block.

I'm not a hardware engineer, so maybe I'm writing BS, just some quick ideas. Have to go to bed (9:45 in the morning, need some sleep till 13:00)...

Thomas
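tb's own caveat is the crux: any lossless scheme has inputs that don't compress, or even expand. A toy byte-wise RLE, for example (my own illustration, nothing ATI ships), doubles the size of data with no runs:

```cpp
#include <cstdint>
#include <vector>

// Toy byte-wise RLE: emits (count, value) pairs. On data with no repeats,
// the output is twice as large as the input - so no minimum compression
// ratio can ever be guaranteed by a lossless scheme.
std::vector<uint8_t> rleEncode(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    for (size_t i = 0; i < in.size(); ) {
        uint8_t value = in[i];
        uint8_t count = 0;
        while (i < in.size() && in[i] == value && count < 255) { ++i; ++count; }
        out.push_back(count);
        out.push_back(value);
    }
    return out;
}

// Example: {1,2,3,4} encodes to {1,1, 1,2, 1,3, 1,4} - 8 bytes from 4.
```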
 
Thomas, their main aim is to save bandwidth, not memory storage space. This compression most likely uses a block-based system: Z values are read and written in blocks (which makes sense given memory burst modes), and each block contains a certain number of depth values, say for 64 pixels (I pulled that number out of the air). The memory size of a block is dynamic, based on the actual depth contents. If the depth is constant, or under a constant slope in space, then you can compress the depth values quite easily, and the block takes up only a small amount of memory when written out. On the other hand, if I use a really nasty checkerboard-pattern alpha-tested polygon (constantly on/off per pixel) and multiple polygons at different depths, then the Z values in that area are very dynamic, very difficult to compress, and thus take up a lot of space. ATI's hardware cannot predict the future and hence needs to allocate enough storage space for the worst case, since otherwise the whole thing fails.

What they gain is that if a block compresses well, they might need only one burst of memory access to get the whole depth info; if the block compresses poorly, they probably need multiple reads to grab the whole block. Most likely they keep track of how big each block is on chip, so they know how much reading to do. So in the end it's a matter of bandwidth: do I pull X onto the chip, or XXX onto the chip?
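To put rough numbers on the burst point (block and burst sizes are invented for illustration, not ATI's real figures):

```cpp
#include <cstdio>

// How many memory bursts it takes to move one Z block, given how far it
// compressed. Block/burst sizes are made up for illustration only.
int burstsNeeded(int compressedBytes, int burstBytes = 32)
{
    return (compressedBytes + burstBytes - 1) / burstBytes;   // round up
}

int main()
{
    const int kBlockBytes = 256;                    // uncompressed block
    std::printf("4:1 compressed: %d bursts\n", burstsNeeded(kBlockBytes / 4)); // 2
    std::printf("2:1 compressed: %d bursts\n", burstsNeeded(kBlockBytes / 2)); // 4
    std::printf("uncompressed  : %d bursts\n", burstsNeeded(kBlockBytes));     // 8
    return 0;
}
```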

I am still surprised by their claim of a minimum of 2:1; for something that is absolutely always lossless that sounds impossible, unless they found some new way of compression... I am sure lots of companies would love a guaranteed 2:1 compression no matter what data is sent down.

Their claimed ratio goes up with AA because it increases the likelihood of easy-to-compress depth values (constant, constant slope, etc. - completely random values per pixel are unlikely in 6x AA mode).

K-
 
Kristof said:
I am still surprised by their claim of a minimum of 2:1; for something that is absolutely always lossless that sounds impossible, unless they found some new way of compression... I am sure lots of companies would love a guaranteed 2:1 compression no matter what data is sent down.
It's not possible, of course. They just use a DDPCM scheme.

ciao,
Marco
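For the curious: DDPCM (differential differential pulse code modulation) amounts to storing second-order differences of the values. The snippet below is only a toy illustration of that idea, not ATI's actual format: depth sampled off a flat polygon changes linearly across a scanline, so its second differences are all zero and the block compresses very well, while noisy depth leaves large residuals and would simply be stored raw.

```cpp
#include <cstdint>
#include <vector>

// Toy DDPCM on one scanline of depth values: keep the first value and the
// first delta, then store second-order differences. For depth sampled off a
// plane (constant slope) every second difference is zero, which is why such
// blocks compress so well; noisy depth produces large residuals and would
// simply be kept uncompressed.
std::vector<int64_t> ddpcmEncode(const std::vector<uint32_t>& z)
{
    std::vector<int64_t> out;
    if (z.empty()) return out;
    out.push_back(z[0]);                                  // anchor value
    if (z.size() == 1) return out;
    int64_t prevDelta = int64_t(z[1]) - int64_t(z[0]);
    out.push_back(prevDelta);                             // first-order delta
    for (size_t i = 2; i < z.size(); ++i) {
        int64_t delta = int64_t(z[i]) - int64_t(z[i - 1]);
        out.push_back(delta - prevDelta);                 // second-order difference
        prevDelta = delta;
    }
    return out;                                           // e.g. {1000, 7, 0, 0, ...} for a ramp
}
```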
 
tb said:
Lossless compression schemes: RLE, LZW (too complicated?)...
These are unacceptable because you can't easily find a random piece of data.
But you can't guarantee a minimum compression ratio!
Then how are they better than what ATi already has?
Random access: Use the hierarchical Z blocks as the access pattern and store the z-data in those blocks, compressing each block.
So how do I find the Z value at (523, 318)? If it's not at a fixed offset, then how do I get the value quickly?
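To make the fixed-offset point concrete, here is a sketch of the kind of addressing that does permit random access (tile size and worst-case block size are illustrative assumptions, not ATI's real numbers): because every block keeps a worst-case-sized slot, the block containing any (x, y) is found with pure arithmetic, and the compressed data inside it is decompressed on chip.

```cpp
#include <cstddef>

// Addressing into a block-based Z buffer where every block keeps its
// worst-case slot (illustrative numbers only). Because slots are fixed-size,
// the block holding (x, y) is found with pure arithmetic - no prefix sums
// over variable-size blocks are needed.
const int kBlockDim   = 8;                          // 8x8 pixels per block (assumed)
const int kBlockBytes = kBlockDim * kBlockDim * 4;  // worst case: 32-bit Z per pixel

size_t blockOffset(int x, int y, int widthInPixels)
{
    int blocksPerRow = widthInPixels / kBlockDim;
    int blockIndex   = (y / kBlockDim) * blocksPerRow + (x / kBlockDim);
    return size_t(blockIndex) * kBlockBytes;        // fixed offset: random access is cheap
}

// blockOffset(523, 318, 1600) locates the block covering pixel (523, 318);
// the block is then decompressed on chip and the wanted sample picked out.
```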
 
Just to follow on from OpenGL guy's post, this is why many compressed image formats, JPEG for example, are not used as compressed texture formats.
 
tb;

I had also hoped that the compression would save not only bandwidth but also memory. But as it now seems, this is not possible.

I had hoped they used some form of lossless compression like TREC which, according to the old .doc, had a lossless compression ratio of 3.5 (quite extreme!) and the compression/decompression looked like a "simple" pipe (no loopback etc.).

[edit] The lossy TREC compression had a compression ratio of around 11, but that was a second mode. TREC was based on JPEG, but worked only on small squares (8x8 pixels?).
 
mboeller said:
I had hoped they used some form of lossless compression like TREC which, according to the old .doc, had a lossless compression ratio of 3.5 (quite extreme!) and the compression/decompression looked like a "simple" pipe (no loopback etc.).
It's not possible to compress every pattern of bits by a factor of 3.5 with a lossless compression scheme. So you still have to be able to handle the case when your compression doesn't work well. Plus, without reading the specs on TREC, it sounds like each block will compress to a different size, meaning that the starting address of the next block will be unknown without first looking at the previous block. This is not acceptable for random access.
 
Okay, I would say the issue is cleared up. They only save bandwidth, period. I sent ATi a mail, no answer yet, but I'll wait. Thanks guys for the clarification on this issue.

Thomas
 
If the goal is to save memory footprint, then lossless compression is not the answer; a different memory structure is needed. ATI is obviously not too concerned with the memory footprint, with memory being so cheap. Performance and simplicity in hardware are what matter most, especially when you're trying to compete on a short product cycle.
 