24-bit Z-Buffer or 32-bit Z-Buffer?

Chalnoth said:
Wow, only FP16? Seems like that'd be very inaccurate for a w-buffer...
The characteristics of a W-buffer make it both more suitable for floating point (see the Blinn paper "W Pleasure, W Fun" for more details) and more amenable to small floating point formats.

It is arguable that W-buffering is usually superior to Z-buffering (not least because it cuts out an interpolator). However, it is less flexible: because there was never a standard implementation it was never really compatible across hardware, and for FPS-type games items very close to the camera frequently have accuracy problems.
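To illustrate the precision argument, here's a quick sketch (assuming a standard perspective projection with hypothetical near/far planes of 1 and 1000; not tied to any particular card):

Code:
#include <stdio.h>

/* Compare how post-projective Z and view-space W spread depth values over
 * the visible range, assuming a standard perspective projection. Z crowds
 * most of the scene into values near 1.0, while W stays linear in distance,
 * which is why a floating point format (constant *relative* error) suits W. */
int main(void)
{
    const float n = 1.0f, f = 1000.0f;   /* hypothetical near/far planes */
    const float dist[] = { 1.0f, 2.0f, 10.0f, 100.0f, 500.0f, 1000.0f };

    for (int i = 0; i < (int)(sizeof(dist) / sizeof(dist[0])); ++i) {
        float w = dist[i];                        /* view-space depth        */
        float z = (f / (f - n)) * (1.0f - n / w); /* post-projective Z, 0..1 */
        printf("dist %7.1f  ->  z = %.6f   w/far = %.6f\n", w, z, w / f);
    }
    return 0;
}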
 
Ostsol said:
The only floating point depth buffering that I know of would be the w-buffer -- which, of course, is no longer supported by new video cards.
There is no requirement that W buffers be float, or that Z buffers be fixed.
 
Dio said:
and for FPS-type games items very close to the camera frequently have accuracy problems.
Then it has been done incorrectly.

If done correctly then, for an "Object" at N units from the camera, the 'real world' Z error will always be within A*N units, where A is quite a small fraction.
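As a back-of-envelope example of that bound (assuming, purely for illustration, that the stored value is view-space W in an FP16 format with 10 mantissa bits, so A is about 2^-11):

Code:
#include <stdio.h>
#include <math.h>

/* Worst-case 'real world' depth error A*N for a few distances, with
 * A = 2^-11 (half a unit in the last place of a 10-bit mantissa).
 * The format choice here is an assumption for illustration only. */
int main(void)
{
    const double A = ldexp(1.0, -11);   /* ~0.000488 */
    const double dist[] = { 0.5, 10.0, 100.0, 1000.0 };

    for (int i = 0; i < 4; ++i)
        printf("object at %7.1f units -> error within ~%.4f units\n",
               dist[i], A * dist[i]);
    return 0;
}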
 
One thing that's made me curious for a long time now is, why use 24 or 32 bits for Z, but only 8 for stencil? Is there no need for stencils to have more than 8, or is it just because it was convenient?

Would games benefit from, say, a separate 16-bit stencil buffer (i.e. not shared with Z), or would that cause incompatibilities?

How is the stencil buffer used anyway? That's rather a mystery to me. :)

Thanks, guys. You're being most helpful answering these questions! ;)
 
Just for the record (after refreshing my memory): the Matrox G400 had 24/8 and 32/0. And I seem to remember at least one other card, but the memory is really vague; if I find anything about it I'll post it.
 
Simon F said:
Dio said:
and for FPS-type games items very close to the camera frequently have accuracy problems.
Then it has been done incorrectly.

If done correctly then, for an "Object" at N units from the camera, the 'real world' Z error will always be within A*N units, where A is quite a small fraction.
I think the W-buffers exported by various cards were too dissimilar and 'unpredictable' to make it work properly.
 
Guden Oden said:
Would games benefit from, say, a separate 16-bit stencil buffer (i.e. not shared with Z), or would that cause incompatibilities?

Quick answer: No.

Stencil contains integer values, so increasing the number of bits doesn't improve precision (it always has a precision of 1), only range.

The improved range could only have an advantage if the application was specifically written to take advantage of it, or if the application is unaware of the range limitation (but then it will have bugs on all existing hardware...).
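To make the range-vs-precision point concrete, here's roughly what a stencil counting setup looks like (plain OpenGL calls; context creation and the rest of the state are omitted):

Code:
#include <GL/gl.h>

/* Each covering fragment adds exactly 1 to the stencil value, so extra bits
 * would only let the count go higher (range); each step is still 1 (precision). */
static void setup_stencil_counting(void)
{
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, 0xFF);       /* always pass the stencil test */
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);  /* +1 on depth pass; an 8-bit
                                                stencil clamps at 255        */
}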
 
Well, the stencil buffer can in theory be split into many different buffers by use of the mask, etc., so a 16-bit stencil buffer might be useful to someone as two 8-bit stencil buffers...

It would be 'more useful' - but it's a lot less 'more useful' than having a stencil buffer vs. not having one.
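For what it's worth, the mask trick mentioned above looks roughly like this in OpenGL (just a sketch; the nibble assignment is arbitrary):

Code:
#include <GL/gl.h>

/* Treat one 8-bit stencil as two independent 4-bit 'buffers' by restricting
 * both writes and the test to one nibble at a time. */
static void use_low_nibble(void)
{
    glStencilMask(0x0F);                 /* writes only touch bits 0-3 */
    glStencilFunc(GL_EQUAL, 0x01, 0x0F); /* test ignores bits 4-7      */
}

static void use_high_nibble(void)
{
    glStencilMask(0xF0);                 /* writes only touch bits 4-7 */
    glStencilFunc(GL_EQUAL, 0x10, 0xF0); /* test ignores bits 0-3      */
}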
 
I was wondering about stencil as well. I have been playing a lot of Unreal 2/XMP lately and messing with the ini files; if I enable stencil I see no fps drop on my 9600, but I can't really tell if there's a difference other than that.

What does the stencil buffer do for games like XMP/Unreal 2?

thanks :)
 
Wouldn't separating the z and stencil buffers allow for much faster z/stencil clears (when you don't want to clear both at the same time, i.e. stencil shadows with multiple light sources)?

Also, isn't it more efficient to store stencil values in a different buffer from color values (stencil values are contiguous in memory, not spread across 32-bit words)?

Regards,
Serge
 
psurge said:
Wouldn't separating the z and stencil buffers allow for much faster z/stencil clears (when you don't want to clear both at the same time, i.e. stencil shadows with multiple light sources)?
With compression, and other bandwidth reducing measures, clears can be pretty much free.
Also, isn't it more efficient to store stencil values in a different buffer from color values (stencil values are contiguous in memory, not spread across 32-bit words)?
Since you are often accessing the Z data at the same time as stencil data, it can make sense to have them packed together. Separate buffers means you need more FIFOs and other things to keep the pipeline efficient.
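As a rough illustration of the packing (the bit layout here is an assumption for illustration; real hardware may arrange and compress the data differently):

Code:
#include <stdint.h>

/* One 32-bit word holding 24 bits of depth and 8 bits of stencil, so a single
 * access serves both the Z test and the stencil test. */
static inline uint32_t pack_d24s8(uint32_t depth24, uint8_t stencil)
{
    return ((depth24 & 0xFFFFFFu) << 8) | stencil;  /* depth: bits 8-31, stencil: bits 0-7 */
}

static inline uint32_t depth_of(uint32_t ds)   { return ds >> 8; }
static inline uint8_t  stencil_of(uint32_t ds) { return (uint8_t)(ds & 0xFFu); }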
 
Another problem with storing the stencil buffer separately is its small size. Memory controllers these days want to deal with large chunks of data so they can hide page breaks between transfers.
 
OpenGL guy said:
Since you are often accessing the Z data at the same time as stencil data, it can make sense to have them packed together. Separate buffers means you need more FIFOs and other things to keep the pipeline efficient.
Well, if you're going to be doing shadow volumes with multiple lights, don't you need to clear the stencil buffer multiple times (between rendering of each light) before clearing the z-buffer?
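Roughly the loop I have in mind (a sketch only; draw_shadow_volumes and draw_lit_geometry are made-up helpers, and the two-sided increment/decrement setup for the volumes is omitted):

Code:
#include <GL/gl.h>

void draw_shadow_volumes(int light);   /* hypothetical helpers */
void draw_lit_geometry(int light);

void render_stencil_shadows(int num_lights)
{
    /* the depth buffer is already filled by an ambient/depth pass and is kept
       for the whole frame; only the stencil is cleared per light */
    for (int i = 0; i < num_lights; ++i) {
        glClear(GL_STENCIL_BUFFER_BIT);                     /* stencil only, Z stays */

        glEnable(GL_STENCIL_TEST);
        glDepthMask(GL_FALSE);
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        draw_shadow_volumes(i);                             /* mark shadowed pixels  */

        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glStencilFunc(GL_EQUAL, 0, 0xFF);                   /* lit where count == 0  */
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
        draw_lit_geometry(i);                               /* additive lighting     */
        glDepthMask(GL_TRUE);
    }
}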

Anyway, I'm hopeful for a shift to some sort of "super buffers" or something wherein the developer just assigns data however they choose to, say, a 64-bit or 128-bit space in system RAM. It may even be possible to label certain pieces of this space as having certain properties, in order to tell the video card it can use specific compression/optimization techniques (ex. this 16-bit space is frequently linear from pixel to pixel, and so it would be useful to compress it like you would a z-buffer).
 
Chalnoth said:
Anyway, I'm hopeful for a shift to some sort of "super buffers" or something wherein the developer just assigns data however they choose to, say, a 64-bit or 128-bit space in system RAM. It may even be possible to label certain pieces of this space as having certain properties, in order to tell the video card it can use specific compression/optimization techniques (ex. this 16-bit space is frequently linear from pixel to pixel, and so it would be useful to compress it like you would a z-buffer).
Sounds like an extremely complex feature to implement. Remember that complexity is what is keeping a lot of features from coming into being. Framebuffer reads in the fragment shader, for example, were at one point in the GLslang spec, but they were pulled due to the complexity of such a feature.
 
Well, it's not something I would expect soon. I expect a basic implementation first (well, nVidia already does have such an implementation, and ATI allows for multiple output buffers of the same data type). Performance optimizations come later, if the feature is used at all.
 
3dcgi - AFAICS the fact that stencil data is small would actually reduce the number of page breaks, since the stencil buffer will span a smaller number of pages than z/color buffers. It seems to me that you lose efficiency if the word size you can read from memory is significantly greater than the average number of stencil bits you actually need to read.

Either way, I think it comes down to whether you usually need lots of stencil values in one go or not... but it seems to me that you would read a whole bunch of them at a time, given increasing levels of AA and shaders running on 2x2 pixel stamps.

(Please excuse/correct any stupidity on my part, I'm not a hardware expert.)


Regards,
Serge
 
OK, question: does using stencils in a game like Unreal 2/XMP cause either a quality loss or gain, and/or a performance hit or gain?

Also, using an ATI Radeon 9600: I know it's got 24-bit Z, but does that change when stencils are used?

Please don't go too deep into the specs; I'm a computer tech geek but not a programmer. :)
 
psurge said:
3dcgi - AFAICS the fact that stencil data is small would actually reduce the number of page breaks, since the stencil buffer will span a smaller number of pages than z/color buffers.
The idea is that while writing to/reading from the stencil buffer, you'll also do quite a bit of accessing in other buffers. If you were just accessing the stencil buffer, your logic might hold, but that just doesn't happen.

Also remember that modern memory busses are 64-128 bits wide. You would want to be able to read a good amount of contiguous data to maximize performance. I'd say somewhere in the range of 256-512 bits of contiguous data would be decent. That would be 32-64 pixels for optimal accessing for 8 bits per pixel worth of data. So, sure, if you had 16x FSAA on a high-end card, I suppose it might not be too bad. I'm just not sure we'll be up to 16x FSAA for a little while yet, and by the time we do, memory busses may be wider.
 
psurge said:
3dcgi - AFAICS the fact that stencil data is small would actually reduce the number of page breaks, since the stencil buffer will span a smaller number of pages than z/color buffers. It seems to me that you lose efficiency if the word size you can read from memory is significantly greater than the average number of stencil bits you actually need to read.

Either way, I think it comes down to whether you usually need lots of stencil values in one go or not... but it seems to me that you would read a whole bunch of them at a time, given increasing levels of AA and shaders running on 2x2 pixel stamps.
The problem is that stencil data is too small. In order to achieve high speeds, memory controllers want to transfer large chunks of data at a time. Maybe 256 or 512 bits. A 256-bit transfer means 32 8-bit stencil values. If all of those stencil values are used then the efficiency is great, but if only a few of those pixels are touched the efficiency isn't so great. Long, skinny shadow volumes would only touch a few pixels in an area, and those stencil values might not be around in the cache when they're touched again. That of course depends on the size of the cache and the length of the triangle.

Today's memory controllers have 512-bit memory interfaces internally, and these are likely divided into 4 mini controllers. I believe Nvidia has at least said this. That's 128-bit transfers from each mini controller, and I'm sure each transfer is for more than one clock. I'd guess they want 4 or more clocks, which is a 512-bit transfer. An 8x8 block of stencils from a 512-bit transfer might be OK, but if you get larger than that I think the rendering efficiency will start to drop.
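Just to spell out the arithmetic behind those tile sizes (nothing beyond the numbers already above):

Code:
#include <stdio.h>

/* How many 8-bit stencil values fit in one burst:
 *   256-bit burst -> 32 values (e.g. an 8x4 tile)
 *   512-bit burst -> 64 values (e.g. an 8x8 tile) */
int main(void)
{
    const int burst_bits[] = { 256, 512 };
    for (int i = 0; i < 2; ++i)
        printf("%d-bit burst -> %d 8-bit stencil values\n",
               burst_bits[i], burst_bits[i] / 8);
    return 0;
}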

I'm not a memory controller expert, but these are some observations from talking with some memory controller people in the past.
 
darkblu said:
Just for the record (after refreshing my memory): the Matrox G400 had 24/8 and 32/0. And I seem to remember at least one other card, but the memory is really vague; if I find anything about it I'll post it.

IIRC all ATI cards before the Radeon 9700 (Rage 128 and up) had the option of 32/0, although the only one I can say this about for certain is the Radeon 7500.
 