Memory requirements for SMOOTHVISION 2.0

tb

Newcomer
Hi!

I used 3D-Analyze 1.63 and took a simple D3D8 SDK sample (DolphinVS) and tested ATi's different SMOOTHVISION 2.0 settings:
(memory usage for FSAA is only correctly displayed when you enable it in the driver panel, don't know why)

1024x768x32
2xFSAA 9MB
4xFSAA 18MB
6xFSAA 27MB

1600x1200x32
6xFSAA 66MB

Could someone test the memory requirements on a GeForce 4?

Thomas
 
Are you using a 16-bit z buffer? Not counting double buffering I get 92MB for 1600x1200x32 with a 24/8 z buffer. I expect GeForce 4 will be the same.
 
tb said:
I used 3D-Analyze 1.63 and took a simple D3D8 SDK sample (DolphinVS) and tested ATi's different SMOOTHVISION 2.0 settings
What's the big mystery?

1600*1200*(4*6 (color) + 2*6 (16-bit Z)) = 66 MB.

How do you expect people to get 6x AA numbers for the GeForce 4?
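For reference, here is the arithmetic behind those figures as a small sketch (sample counts and byte sizes are the ones discussed in the thread; real drivers may add alignment, tiling or extra buffers, so treat it as an estimate only):

```cpp
#include <cstdio>

// Naive framebuffer size estimate for multisampled rendering:
//   width * height * samples * (color bytes + depth/stencil bytes)
// Alignment, tiling and any extra driver-internal buffers are ignored.
static double fsaaBytes(int w, int h, int samples, int colorBytes, int depthBytes)
{
    return double(w) * h * samples * (colorBytes + depthBytes);
}

int main()
{
    // 1600x1200, 6x AA, 32-bit color, 16-bit Z -> ~69.1 million bytes (~66 MiB, tb's figure)
    double z16 = fsaaBytes(1600, 1200, 6, 4, 2);
    // 1600x1200, 6x AA, 32-bit color, 24/8 Z   -> ~92.2 million bytes (the 92 MB figure above)
    double z24 = fsaaBytes(1600, 1200, 6, 4, 4);

    std::printf("16-bit Z: %.1f MB (%.1f MiB)\n", z16 / 1e6, z16 / (1024.0 * 1024.0));
    std::printf("24/8 Z:   %.1f MB (%.1f MiB)\n", z24 / 1e6, z24 / (1024.0 * 1024.0));
    return 0;
}
```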
 
the mystery....

Thanks, I know the formula. 2x and 4x FSAA would be enough on a GF4 card. The reason for my testing method was the following:

(attached image: hz3.gif)

If HZ3 achieves at least a 2:1 compression ratio, the amount of memory for the whole FSAA + z-buffer stuff must be smaller than on a non-HZ3 version.

So, is HZ3 not working as expected? Is it IDirect3DDevice8::GetAvailableTextureMem()'s fault?

Thomas
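One way to sanity-check whether GetAvailableTextureMem() is the weak link would be to query it before and after explicitly creating a multisampled color/depth surface pair. A rough, untested sketch, assuming an already-created IDirect3DDevice8* and omitting error handling; note that the method's return value is only a rounded approximation and can include AGP memory:

```cpp
#include <windows.h>
#include <d3d8.h>
#include <cstdio>

// Hypothetical helper: report how much the reported free memory drops when a
// 6x multisampled color + depth surface pair is created. The numbers are
// indicative at best, since drivers may allocate lazily or round the value.
void ReportMsaaFootprint(IDirect3DDevice8* dev, UINT width, UINT height)
{
    UINT before = dev->GetAvailableTextureMem();

    IDirect3DSurface8* color = nullptr;
    IDirect3DSurface8* depth = nullptr;
    dev->CreateRenderTarget(width, height, D3DFMT_A8R8G8B8,
                            D3DMULTISAMPLE_6_SAMPLES, FALSE, &color);
    dev->CreateDepthStencilSurface(width, height, D3DFMT_D24S8,
                                   D3DMULTISAMPLE_6_SAMPLES, &depth);

    UINT after = dev->GetAvailableTextureMem();
    std::printf("Reported drop: %u MB\n", (before - after) / (1024 * 1024));

    if (depth) depth->Release();
    if (color) color->Release();
}
```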
 
From what I've read on this forum, the 9700 always has to reserve the full space for the framebuffer: if it had to render a frame where only minimal compression was possible, it would otherwise run out of memory and have to stop and move some textures etc. out of the way.
 
Re: the mystery....

tb said:
If HZ3 achieves at least a 2:1 compression ratio, the amount of memory for the whole FSAA + z-buffer stuff must be smaller than on a non-HZ3 version.

So, is HZ3 not working as expected?
It's working fine, but you have some misconceptions. Because the compression is lossless, you can't guarantee a fixed compression amount. This means that you have to reserve a sufficient amount of memory in case you are unable to compress the data at all.

Memory is not conserved, what is conserved is bandwidth, which is usually more important.
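Concretely, this might be sketched as follows (a toy model of what is being described here, not ATI's actual implementation): the buffer is allocated at its worst-case size up front, and compression only changes how many bytes actually move across the bus for each block.

```cpp
#include <cstdint>
#include <vector>

// Toy model of a compressed Z buffer as described above (not ATI's design):
// storage is always worst-case sized, a small table records how many bytes
// of each block are currently valid, and only that many bytes hit the bus.
struct CompressedZBuffer {
    static const int kBlockBytes = 256;     // worst-case (uncompressed) block size

    std::vector<uint8_t> storage;           // allocated for the worst case -> no memory saved
    std::vector<uint16_t> compressedBytes;  // per-block size kept on chip   -> bandwidth saved

    explicit CompressedZBuffer(size_t numBlocks)
        : storage(numBlocks * kBlockBytes), // full footprint reserved up front
          compressedBytes(numBlocks, kBlockBytes) {}

    // Bytes that would cross the memory bus to fetch one block.
    size_t readCost(size_t block) const { return compressedBytes[block]; }

    // Memory footprint: independent of how well the blocks compress.
    size_t footprint() const { return storage.size(); }
};
```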
 
Re: the mystery....

OpenGL guy said:
tb said:
If HZ3 achieves at least a 2:1 compression ratio, the amount of memory for the whole FSAA + z-buffer stuff must be smaller than on a non-HZ3 version.

So, is HZ3 not working as expected?
It's working fine, but you have some misconceptions. Because the compression is lossless, you can't guarantee a fixed compression amount. This means that you have to reserve a sufficient amount of memory in case you are unable to compress the data at all.

Memory is not conserved, what is conserved is bandwidth, which is usually more important.

In there "advertising" they seem to be very optimistic:
"...HZ3 achieves a minimum 2:1 compression ratio..."
But you could be right, if they don't conserve memory...If they would conserve memory, efficency would be better(no need to compress the z-data when reading data from the zbuffer).

Thomas
 
Re: the mystery....

tb said:
In there "advertising" they seem to be very optimistic:
"...HZ3 achieves a minimum 2:1 compression ratio..."
Again, we're talking about saving bandwidth, not memory.
But you could be right, if they don't conserve memory...
Trust me on this one. ;)
If they did conserve memory, efficiency would be better (no need to compress the z-data when reading from the z-buffer).
Please explain how this could be done in a lossless manner and tell me how I can access a random piece of data with your method.
 
Lossless compression schemes: RLE, LZW (too complicated?)...
But you can't guarantee a minimum compression ratio!

Random access: Use the hierarchical Z blocks as the access pattern and store the z-data in those blocks, compressing each block.

I'm not a hardware engineer, so maybe I'm writing BS, just some quick ideas. Have to go to bed (9:45 in the morning, need some sleep till 13:00)...

Thomas
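tb's own caveat is the crux: any lossless scheme has inputs that don't compress, or even expand. A toy byte-wise RLE, for example (my own illustration, nothing ATI ships), doubles the size of data with no runs:

```cpp
#include <cstdint>
#include <vector>

// Toy byte-wise RLE: emits (count, value) pairs. On data with no repeats,
// the output is twice as large as the input - so no minimum compression
// ratio can ever be guaranteed by a lossless scheme.
std::vector<uint8_t> rleEncode(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    for (size_t i = 0; i < in.size(); ) {
        uint8_t value = in[i];
        uint8_t count = 0;
        while (i < in.size() && in[i] == value && count < 255) { ++i; ++count; }
        out.push_back(count);
        out.push_back(value);
    }
    return out;
}

// Example: {1,2,3,4} encodes to {1,1, 1,2, 1,3, 1,4} - 8 bytes from 4.
```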
 
Thomas, their main aim is to save bandwidth, not memory storage space. This compression most likely uses a block-based system: Z values are read and written in blocks (which makes sense given memory burst modes), and each block contains a certain number of depth values, say for 64 pixels (I pulled that number out of the air). The memory size of a block is dynamic, based on the actual depth contents. If the depth is constant, or under a constant slope in space, then you can compress the depth values quite easily, and the block takes up only a small amount of memory when written out. On the other hand, if I use a really nasty checkerboard-pattern alpha-tested polygon (constantly on/off per pixel) and multiple polygons at different depths, then the Z values in that area are very dynamic, very difficult to compress, and thus take up a lot of space. ATI's hardware cannot predict the future and hence needs to allocate enough storage space for the worst case, since otherwise the whole thing fails.

What they gain is that if a block compresses well, they might need only one burst of memory access to get the whole depth info; if the block compresses poorly, they probably need multiple reads to grab the whole block. Most likely they keep track of how big each block is on chip, so they know how much reading to do. So in the end it's a matter of bandwidth: do I pull X onto the chip, or XXX onto the chip?
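To put rough numbers on the burst point (block and burst sizes are invented for illustration, not ATI's real figures):

```cpp
#include <cstdio>

// How many memory bursts it takes to move one Z block, given how far it
// compressed. Block/burst sizes are made up for illustration only.
int burstsNeeded(int compressedBytes, int burstBytes = 32)
{
    return (compressedBytes + burstBytes - 1) / burstBytes;   // round up
}

int main()
{
    const int kBlockBytes = 256;                    // uncompressed block
    std::printf("4:1 compressed: %d bursts\n", burstsNeeded(kBlockBytes / 4)); // 2
    std::printf("2:1 compressed: %d bursts\n", burstsNeeded(kBlockBytes / 2)); // 4
    std::printf("uncompressed  : %d bursts\n", burstsNeeded(kBlockBytes));     // 8
    return 0;
}
```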

I am still surprised by their claim of a minimum of 2:1; for something that is absolutely always lossless that sounds impossible, unless they found some new way of compression... I am sure lots of companies would love a guaranteed 2:1 compression no matter what data is sent down.

Their claimed ratio goes up with AA because it increases the likelihood of easy-to-compress depth values (constant, constant slope, etc. - completely random values per pixel are unlikely in 6x AA mode).

K-
 
Kristof said:
I am still surprised by their claim of a minimum of 2:1; for something that is absolutely always lossless that sounds impossible, unless they found some new way of compression... I am sure lots of companies would love a guaranteed 2:1 compression no matter what data is sent down.
It's not possible, of course. They just use a DDPCM scheme.

ciao,
Marco
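For the curious: DDPCM (differential differential pulse code modulation) amounts to storing second-order differences of the values. The snippet below is only a toy illustration of that idea, not ATI's actual format: depth sampled off a flat polygon changes linearly across a scanline, so its second differences are all zero and the block compresses very well, while noisy depth leaves large residuals and would simply be stored raw.

```cpp
#include <cstdint>
#include <vector>

// Toy DDPCM on one scanline of depth values: keep the first value and the
// first delta, then store second-order differences. For depth sampled off a
// plane (constant slope) every second difference is zero, which is why such
// blocks compress so well; noisy depth produces large residuals and would
// simply be kept uncompressed.
std::vector<int64_t> ddpcmEncode(const std::vector<uint32_t>& z)
{
    std::vector<int64_t> out;
    if (z.empty()) return out;
    out.push_back(z[0]);                                  // anchor value
    if (z.size() == 1) return out;
    int64_t prevDelta = int64_t(z[1]) - int64_t(z[0]);
    out.push_back(prevDelta);                             // first-order delta
    for (size_t i = 2; i < z.size(); ++i) {
        int64_t delta = int64_t(z[i]) - int64_t(z[i - 1]);
        out.push_back(delta - prevDelta);                 // second-order difference
        prevDelta = delta;
    }
    return out;                                           // e.g. {1000, 7, 0, 0, ...} for a ramp
}
```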
 
tb said:
Lossless compression schemes: RLE, LZW (too complicated?)...
These are unacceptable because you can't easily find a random piece of data.
But you can't guarantee a minimum compression ratio!
Then how are they better than what ATi already has?
Random access: Use the hierarchical Z blocks as the access pattern and store the z-data in those blocks, compressing each block.
So how do I find the Z value at (523, 318)? If it's not at a fixed offset, then how do I get the value quickly?
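To make the fixed-offset point concrete, here is a sketch of the kind of addressing that does permit random access (tile size and worst-case block size are illustrative assumptions, not ATI's real numbers): because every block keeps a worst-case-sized slot, the block containing any (x, y) is found with pure arithmetic, and the compressed data inside it is decompressed on chip.

```cpp
#include <cstddef>

// Addressing into a block-based Z buffer where every block keeps its
// worst-case slot (illustrative numbers only). Because slots are fixed-size,
// the block holding (x, y) is found with pure arithmetic - no prefix sums
// over variable-size blocks are needed.
const int kBlockDim   = 8;                          // 8x8 pixels per block (assumed)
const int kBlockBytes = kBlockDim * kBlockDim * 4;  // worst case: 32-bit Z per pixel

size_t blockOffset(int x, int y, int widthInPixels)
{
    int blocksPerRow = widthInPixels / kBlockDim;
    int blockIndex   = (y / kBlockDim) * blocksPerRow + (x / kBlockDim);
    return size_t(blockIndex) * kBlockBytes;        // fixed offset: random access is cheap
}

// blockOffset(523, 318, 1600) locates the block covering pixel (523, 318);
// the block is then decompressed on chip and the wanted sample picked out.
```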
 
Just to follow on from OpenGL guy's post, this is why many compressed image formats, JPEG for example, are not used as compressed texture formats.
 
tb;

I had also hoped that the compression would save not only bandwidth but also memory. But as it now seems, this is not possible.

I had hoped they used some form of lossless compression like TREC which, according to the old .doc, had a lossless compression ratio of 3.5 (quite extreme!) and the compression/decompression looked like a "simple" pipe (no loopback etc.).

[edit] The lossy TREC compression had a compression ratio of around 11, but that was a second mode. TREC was based on JPEG, but worked only on small squares (8x8 pixels?).
 
mboeller said:
I had hoped they used some form of lossless compression like TREC which, according to the old .doc, had a lossless compression ratio of 3.5 (quite extreme!) and the compression/decompression looked like a "simple" pipe (no loopback etc.).
It's not possible to compress every pattern of bits by a factor of 3.5 with a lossless compression scheme. So you still have to be able to handle the case when your compression doesn't work well. Plus, without reading the specs on TREC, it sounds like each block will compress to a different size, meaning that the starting address of the next block will be unknown without first looking at the previous block. This is not acceptable for random access.
 
Okay, I would say the issue is cleared up. They only save bandwidth, period. I sent ATi a mail, no answer yet, but I'll wait. Thanks guys for the clarification on this issue.

Thomas
 
If the goal is to save memory footprint, then lossless compression is not the answer; a different memory structure is needed. ATI is obviously not too concerned with the memory footprint, with memory being so cheap. Performance and simplicity in hardware are what matter most, especially when you're trying to compete on a short product cycle.
 