Interesting R200 HyperZ notes

Sharkfood

On the R300 NDA thread, demalion noticed an interesting quote from one of the sites reporting news/information concerning the R300:

AA modes on this chip can take advantage of HyperZ, which, it turns out, wasn't the case on the Radeon 8500. Also, the R300 sidesteps one of the big drawbacks of multisampling AA because it can handle "edges" inside of alpha-blended textures properly. Existing multisampling implementations, like NVIDIA's, don't touch those jaggies.

Rather than hijack that thread with this R200 interesting tidbit, I've moved this to here.

From some very limited testing, it seems that at least 4xQ SmoothVision does indeed negate any benefit from HyperZ II. I have a crappy P3 system ready to image, chucked a retail 8500 in it, popped on the latest Catalyst drivers, and this is what I found:

(again, please forgive the extremely CPU-bound no-AA scores; the margin will be significantly higher on a more powerful rig)

Villagemark D3D V1.19, 1024x768x32, no AF
No AA (HyperZ toggled via the registry value)
disableHyperZ "0" : 93 fps
disableHyperZ "1" : 63 fps

2xQuality
disableHyperZ "0" : 44 fps
disableHyperZ "1" : 31 fps

4xQuality
disableHyperZ "0" : 16 fps
disableHyperZ "1" : 16 fps

---------------
Quake3 V1.27, 1024x768x32, demo127, no sound, trilinear, no AF
No AA (HyperZ toggled via the registry value)
disableHyperZ 0x0 : 127.6 fps
disableHyperZ 0x1 : 120.9 fps

2xQuality
disableHyperZ 0x0 : 74.5 fps
disableHyperZ 0x1 : 53.9 fps

4xQuality
disableHyperZ 0x0 : 34.1 fps
disableHyperZ 0x1 : 34.1 fps

------------

More modes/testing might yield some insight. I'm not entirely sure whether this is Catalyst-specific, some sort of registry prerequisite, or a mismatch with 4xQ (i.e. it relies upon other dependent settings to truly disable, etc.), or what have you.
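
For reference, here's roughly how I'm flipping the value between runs. This is only a minimal sketch: the value name is the one used above, but the key path is an assumption on my part and it varies between driver releases, so don't take it as gospel.

Code:
# Minimal sketch of how the disableHyperZ value might be flipped between runs.
# The key path below is an assumption (display class key, adapter 0000) and
# differs between driver releases; the value name comes from the tweak itself.
import sys
import winreg

ASSUMED_KEY = (r"SYSTEM\CurrentControlSet\Control\Class"
               r"\{4D36E968-E325-11CE-BFC1-08002BE10318}\0000")

def set_disable_hyperz(value, key_path=ASSUMED_KEY):
    # The D3D tweak takes a string "0"/"1"; the OpenGL one appears to be a
    # DWORD (0x0/0x1) instead, so adjust the value type if that's the target.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "disableHyperZ", 0, winreg.REG_SZ, value)

if __name__ == "__main__":
    set_disable_hyperz(sys.argv[1] if len(sys.argv) > 1 else "0")
    print("disableHyperZ written; restart the app (or reboot) before re-testing.")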

Perhaps a closer look into this might yield answers concerning the odd resolution caps found with SV, Z-buffer compression and the other optimizations.
 
It also answers why performance with 4x still suffered in older games, far more than you would expect.
 
Why the odd cap of 800x600 with 3xq?

With the different settings it seems that it won't allow the supersample resolution to go above 2048 in either direction. Seems odd considering what ATI says SV1 is in the 9700 Flash tech demo. Bending the truth a bit?
 
That's because the R300, using multisampling AA, stores the samples in the Z-buffer, and therefore HyperZ can be applied. This isn't the case for the R200.
 
Isn't it interesting that 2xQ SV seemed to degrade a bit around the time HyperZ started working properly? Is there any driver version with both fully working HyperZ II (specifically Hierarchical Z, I'd think...perhaps Z compression as well) and 2xQ SV handling near-vertical edges well?

As for the 800x600 limit...is it possible to find out the size of the Z-buffer on the card? What if Quality SV increases the Z-buffer resolution as well as the frame buffer resolution? My thinking is that the odd-numbered SV methods might have an increased Z-buffer size (for Quality mode), with the framebuffer size of the previous even-numbered SV. It seems possible that this would explain some things...perhaps the way fog is implemented using the Z-buffer would preclude the effective use of the Z-buffer data in whatever method they depend upon for Quality mode when HyperZ optimizations are in effect. Also, I noticed slightly different light blending between no AA/Performance AA and Quality AA on Quake 3 (I'm too lazy to go link the images at the moment...it was a thread Uriel started over on Rage3D).

A tool to dump info about memory usage on the card would be handy about now. ;)
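
Until such a tool turns up, here's a back-of-the-envelope sketch of the sort of numbers it would show. The per-axis factors, and the idea of a Z-buffer scaled differently from the colour buffer, are purely my speculation for illustration.

Code:
# Rough, purely illustrative buffer-size estimate. The notion that Quality SV
# might scale the Z-buffer by a different factor than the colour buffer is
# speculation on my part, not anything confirmed about the R200.
def buffer_estimate_mb(width, height, colour_factor, z_factor,
                       colour_bpp=4, z_bpp=4):      # 32-bit colour, 24+8 Z/stencil
    colour = width * colour_factor[0] * height * colour_factor[1] * colour_bpp
    zbuf   = width * z_factor[0]      * height * z_factor[1]      * z_bpp
    front  = width * height * colour_bpp            # displayed front buffer
    return (colour + zbuf + front) / (1024 * 1024)

# Example: 1024x768 with the colour buffer supersampled 2x1. First with the
# Z-buffer at the same 2x1, then with a hypothetical Z-buffer expanded to 2x2.
print(buffer_estimate_mb(1024, 768, (2, 1), (2, 1)))   # 15.0 MB
print(buffer_estimate_mb(1024, 768, (2, 1), (2, 2)))   # 21.0 MB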
 
Hmm...look at this.

ExtremeTech said:
In SmoothVision 2.0, ATI took lessons learned from the first iteration of SmoothVision, and delivered several notable improvements. For starters, the awkward control panel for controlling the feature has been replaced by a sleeker UI that makes setting an AA level much easier. Gone is the distinction between "performance" (bilinear) and "quality" (trilinear) modes. Instead, there are three levels of SmoothVision available to end-users: 2X, 4X and 6X. This second incarnation of SmoothVision retains its predecessor's programmable sample patterns, although the default sample pattern was actually decided upon by ATI 3D guru Mark Leather, who garnered a technical Oscar for work he did while at Pixar. At a driver level, SmoothVision's sampling pattern is completely programmable, and DirectX 9 apparently allows applications to designate what the multi-sampling pattern will be. ATI was at press time undecided as to whether this feature would be exposed to D3D apps as a menu of pre-determined sampling patterns, or whether to expose the ability for an app to create its own sampling pattern.

OK, so does this mean that the sampling for the Quality SV modes was based on adjacent pixels in the third dimension as well? Is that perhaps why the odd-valued SV modes have such odd resolution restrictions...perhaps it offers some odd enhancement to texture filtering? Notice the odd modes are gone from the new list. Either it's a misunderstanding, or all sorts of interesting info about the 8500 is coming out of the 9700 launch.
 
2xQ in D3D with no fog still antialiases vertical edges in the current drivers.

The performance modes are pretty much the same as quality at the moment, in that they appear to be just plain supersampling. If you take a screenshot in game (e.g. Q3) you will get what looks to be the top-left part of the supersample buffer. In the 7206 drivers, where the pattern was non-ordered, you got a normal, complete screenshot. This was using Q3's in-game screenshot command. Using Print Screen on the keyboard got a complete shot in both drivers, although the overbright bits do not show up when using this.

Using the FSAA view program, I think the SV modes are supersampling with these ratios (see the sketch after the list):
2xq - 2x1
3xq - 3x1
4xq - 2x2
6xq - 3x2
2xp - 1.5x1
3xp - 1.5x1.5
4xp - 2x1.5
5xp - 2x2 but with blurring vertically
6xp - something like 2x2.5
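
To put numbers on that 2048 limit, here's a quick sketch using the Quality ratios above. The cap itself, and which axis each factor applies to, are just my reading of the behaviour, nothing documented.

Code:
# Check which Quality SV modes would push the supersample buffer past a
# 2048-texel limit in either direction at 1024x768, using the ratios observed
# above. The 2048 figure is inferred from the resolution caps, not documented.
QUALITY_RATIOS = {"2xq": (2, 1), "3xq": (3, 1), "4xq": (2, 2), "6xq": (3, 2)}

def supersample_size(width, height, mode):
    sx, sy = QUALITY_RATIOS[mode]
    return int(width * sx), int(height * sy)

for mode in QUALITY_RATIOS:
    w, h = supersample_size(1024, 768, mode)
    note = (" -> over 2048, so presumably capped to a lower resolution"
            if (w > 2048 or h > 2048) else "")
    print(f"{mode}: {w}x{h}{note}")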

5xq however is odd. Using the FSAA view program I got this:

[Attached image: 5xq.gif]


The AA used for the top-left half is different to that used for the bottom-right :-?. You might need to change the gamma to see all the colour gradations.
 
How representative would that FSAA view program be if the Z-buffer is used as a sampling criterion for SV? I need to go do a search for some info on it, I guess.

With more and more previews referring to how the Z-buffer is used for SV in the R300, and to how this is similar to what was done in the R200, I'm still curious as to how exactly it is used...

It seems that a good approach would be for the highest (assuming this means "farthest") of the Z-buffer values in a blend to have the most weight, proportional to how much higher it is. With this occurring both on pixels that would normally be more strongly weighted toward the near samples with a simple blend, and on pixels that would normally be more strongly weighted toward the far sample, it seems this should result in an overall blend that softens the near object's edge more than a simple blend would.
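
To make that concrete, here's one possible reading of that weighting as a tiny sketch; the specific weighting function is pure guesswork on my part, just to show the direction of the effect.

Code:
# One hypothetical form of "the farthest sample gets more weight": a base
# weight of 1 per sample, plus extra weight proportional to how much farther
# the sample is than the nearest one in the blend. Purely illustrative.
def z_weighted_blend(samples):
    # samples: list of (colour, z) pairs, z in [0, 1] with 1 = far plane.
    z_min = min(z for _, z in samples)
    weights = [1.0 + (z - z_min) for _, z in samples]
    return sum(c * w for (c, _), w in zip(samples, weights)) / sum(weights)

# Bright near edge (colour 1.0, z 0.2) against a dark far background
# (colour 0.2, z 0.9): a plain average gives 0.60, while this z-weighted
# blend gives about 0.50, leaning toward the background and softening the
# near object's edge, as described above.
print(z_weighted_blend([(1.0, 0.2), (0.2, 0.9)]))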

My theory as stated above about larger Z-buffers for this...could it be that the odd-valued SV options expand in a different direction than the frame buffer? With a higher Z-buffer resolution, differing weighting factors could cause behaviour similar to higher amounts of sampling in the framebuffer, and (based purely on my theoretical understanding and no actual backup) perhaps offer similar image quality to a higher amount of frame buffer sampling. I am aware that this might seem to contradict the xbit article, but its sample application is not an actual test program, just an illustration of the author's theories...the suppositions are not conclusive enough to exclude my theory (as far as I understand).

My question: is this perhaps what the R200 tries to do? If not, could someone offer an alternate explanation? My explanation, depending on how the weighting was accomplished, seems to me to possibly fit the criteria for an "adaptive pseudo-random jittered sample pattern" function, since all the elements seem to be there to satisfy all the catch phrases with the right weighting algorithm. ;)

Also, how would this affect an application like the FSAA view program?

I ask again for corrections and analysis of what is wrong with my theory (and I really mean it since I find it educational, though people don't seem to take me up on it :( ).

EDIT: I also had the thought that some HyperZ functionality could perhaps conflict with the most effective application of this approach, and that is why earlier drivers had better-looking 2xQ SV than later drivers with fully functioning HyperZ...no one has clarified whether this correlation is valid (Sharkfood?). I've tried disabling HyperZ functions, but haven't noticed a difference in my limited testing...one explanation could be that if this approach indeed had problems with HyperZ, it was simply removed/modified to accommodate HyperZ in all cases, whether HyperZ is actually on or not.
 
I'm not sure, but I thought HyperZ was working fine in the 7206 drivers with the better AA; I'll have to reinstall them to check.

The better 2xQ did create a weird diagonal tearing in those drivers in OpenGL. There were two parts to it: the first was a straight tear from top right to bottom left, just like a vsync tear; the second was a tear in the same direction, but it made it look as if the frames were rendered in small blocks (say 6x4 pixels). It's very hard to describe, so when I install the drivers I'll see if I can get some pictures with my camera.
 
I'd just like to quote on something I found interesting:

Also, the R300 sidesteps one of the big drawbacks of multisampling AA because it can handle "edges" inside of alpha-blended textures properly.

I hope that was a typo, because alpha-blended textures do not need any edge AA. All multisampling AA handles alpha-blended texture edges just fine. It's the alpha test that can become a problem.
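
To spell that out, here's a little schematic 1D sketch (just the arithmetic, no real rasterizer): with alpha blending the fractional alpha survives into the final colour, so the edge already fades smoothly, while the alpha test snaps each pixel fully on or off, and multisampling's single texture lookup per pixel can't smooth that step.

Code:
# Schematic 1D illustration (not real rasterizer behaviour): a texture alpha
# ramp crossing a foliage edge. Alpha blend keeps the fractional alpha, so the
# edge fades smoothly; alpha test turns it into a hard 0/1 step at whatever
# pixel the threshold falls in, which multisampling cannot smooth because it
# still does only one texture lookup per pixel.
def shade_pixel(alpha, fg=1.0, bg=0.0, alpha_test=False, threshold=0.5):
    if alpha_test:
        # Alpha test: the whole pixel is either kept or discarded.
        return fg if alpha >= threshold else bg
    # Alpha blend: fractional coverage survives into the final colour.
    return alpha * fg + (1.0 - alpha) * bg

# Texture alpha sampled once per pixel across 8 pixels spanning the edge.
alphas = [1.0, 0.9, 0.7, 0.55, 0.45, 0.3, 0.1, 0.0]

print("blend:", [round(shade_pixel(a), 2) for a in alphas])          # smooth ramp
print("test: ", [shade_pixel(a, alpha_test=True) for a in alphas])   # hard 1/0 step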
 
Chalnoth said:
I'd just like to quote on something I found interesting:

Also, the R300 sidesteps one of the big drawbacks of multisampling AA because it can handle "edges" inside of alpha-blended textures properly.

I hope that was a typo, because alpha-blended textures do not need any edge AA. All multisampling AA handles alpha-blended texture edges just fine. It's the alpha test that can become a problem.
You sure love them alpha-blended textures, huh?
But doesn't it require sorting the alpha textures?
 
Althornin said:
You sure love them alpha-blended textures, huh?
But doesn't it require sorting the alpha textures?

Yes, they do require sorting of the alpha textures. But, that's often not a problem. It does mean that it's not completely trivial to just enable alpha blends in all situations, but I was able to do it just fine in UT...
 
Presumably they meant alpha-tested textures.

This is the tearing in the 7206 drivers, 2xq 1024x768:

[Attached image: 2xqtearings.jpg]


Original size is here: http://www.jamesbambury.pwp.blueyonder.co.uk/2xqtearing.jpg

The straight tear is fixed from top right to bottom left; the jagged one can be anywhere (although I've only noticed it under the first tear), and the size and shape of the blocks vary depending on the resolution (in 640x480 they are thinner but wider). Any idea what's going on here :-?
 
Bambers said:
The straight tear is fixed from top right to bottom left; the jagged one can be anywhere (although I've only noticed it under the first tear), and the size and shape of the blocks vary depending on the resolution (in 640x480 they are thinner but wider). Any idea what's going on here :-?

Well, that jagged line is almost certainly due to z-buffer errors. The error could be "moved" by switching between a z-buffer and a w-buffer (This won't eliminate the error, just change where it occurs...I think the w-buffer is more or less independent of depth, while the z-buffer shows more errors further from the viewpoint).

The only way to wholly eliminate these errors is to use a higher-depth z-buffer. Hopefully the NV30 and R300 both have support for higher bit depths than 24-bit z's. Moving to 32-bit z-buffers should decrease those errors by a factor of 256, pretty much eliminating them. I'm not certain that 32-bit z is all we need, though...we'd have to see what 32-bit z can do, and whether or not everything is eliminated, before knowing that for sure.
 
Chalnoth said:
Bambers said:
The straight tear is fixed from top right to bottom left; the jagged one can be anywhere (although I've only noticed it under the first tear), and the size and shape of the blocks vary depending on the resolution (in 640x480 they are thinner but wider). Any idea what's going on here :-?

Well, that jagged line is almost certainly due to z-buffer errors. The error could be "moved" by switching between a z-buffer and a w-buffer (This won't eliminate the error, just change where it occurs...I think the w-buffer is more or less independent of depth, while the z-buffer shows more errors further from the viewpoint).

The only way to wholly eliminate these errors is to use a higher-depth z-buffer. Hopefully the NV30 and R300 both have support for higher bit depths than 24-bit z's. Moving to 32-bit z-buffers should decrease those errors by a factor of 256, pretty much eliminating them. I'm not certain that 32-bit z is all we need, though...we'd have to see what 32-bit z can do, and whether or not everything is eliminated, before knowing that for sure.

ATi has been supporting a 32-bit Z-buffer in OGL and D3D since the R100 (dunno about the Rage series).
I notice that my GeForce 4 is prone to showing z-buffer errors (SS:SE is a horrific example) way more than my Radeon 8500, which I think sucks. The GF4 is capable of supporting 32 bits, but it's disabled in newer drivers if I'm not completely mistaken. (Yeah, I know: they support 24+8=32, but that's not what I'm talking about; for games which don't use the stencil buffer, those 8 bits mean jack.)

Maybe I'm misunderstanding you...but aren't 32-bit Z-buffers sorta "old news"? (I.e. they've been around for a couple of years.)

But then again, with today's 32-bit Z-buffers we lose the 8-bit stencil buffer...
 
Chalnoth said:
Well, that jagged line is almost certainly due to z-buffer errors. The error could be "moved" by switching between a z-buffer and a w-buffer (This won't eliminate the error, just change where it occurs...I think the w-buffer is more or less independent of depth, while the z-buffer shows more errors further from the viewpoint).

snip

I believe it's a bit premature to draw such conclusions, as Quake is the type of game that as a rule does not exhibit z-buffer problems on a 24-bit z-buffer. AAMOF, I haven't seen any in Quake even on a 16-bit z-buffer. Generally, z-buffer precision problems are typical for titles where there are considerably greater variances in scene depth than getting from one room to the next, which is the typical scenario in closed-area FPSs; i.e. it takes greater scene-depth dynamics, combined with greater object density in those scenes.

Visually, on that screenshot the problem does manifest itself like a z-buffer precision issue, but I'm very sceptical about that. Apropos, what happens when HyperZ gets enabled/disabled?

Otherwise, yes, you're right about the z-buffer being hyperbolic and the w-buffer being linear. BTW, you can do the math and see what happens where in which case.
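
Here's that math as a small sketch: the depth resolution of a hyperbolic z-buffer versus a linear w-buffer at a few distances, assuming a typical D3D-style projection with near=1 and far=5000 (example numbers only).

Code:
# Depth resolution of a hyperbolic z-buffer vs. a linear w-buffer.
# near/far are arbitrary example values; the z mapping used is the usual
# D3D-style convention z_buf = far/(far-near) * (1 - near/z).
NEAR, FAR = 1.0, 5000.0

def depth_step(z, bits, linear=False):
    # Approximate world-space depth difference that still changes the stored value.
    lsb = 1.0 / (2 ** bits - 1)
    if linear:
        # w-buffer: resolution is constant across the depth range.
        return (FAR - NEAR) * lsb
    # Hyperbolic z: d(z_buf)/dz = FAR*NEAR / ((FAR-NEAR) * z^2), so one LSB
    # corresponds to roughly lsb * (FAR-NEAR) * z^2 / (FAR*NEAR) in world space.
    return lsb * (FAR - NEAR) * z * z / (FAR * NEAR)

for z in (10.0, 100.0, 1000.0, 4000.0):
    print(f"z={z:6.0f}   16-bit z: {depth_step(z, 16):12.5f}"
          f"   24-bit z: {depth_step(z, 24):12.7f}"
          f"   24-bit w: {depth_step(z, 24, linear=True):12.7f}")

The hyperbolic step grows with the square of the distance while the w-buffer's step stays constant, and each extra 8 bits of z shrinks the step by a factor of 256.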
 
As far as I know, z-buffer accuracy is solely dependent upon the transform matrix (given the same hardware settings...i.e. same z-buffer depth, method).

Specifically, the ratio of far to near determines the z-buffer accuracy. While it is certainly true that the layout of the scene will affect how much accuracy is needed for good rendering, I don't believe it has any effect on the actual accuracy.
 
Bambers, is that with HyperZ on? Is there any combination of tweaks that doesn't exhibit the tearing? Maybe I can do a search on Rage3D and find out...btw, I still can't find the FSAA view program or info on what method it uses...do you have either available? (I've PMed pcchen, who I think was the author, but no response yet.)

It does look like some sort of calculation error that kicks in after a certain distance...with your description of how one tear is always fixed in position while another randomly shows up, it seems to me that it fits a calculation error. I don't see what calculation error would be fixed in location for the bottom-right diagonal half, but I do see that the lightening doesn't occur on the stairs that are nearer. This would seem to confirm that colour values are being determined from the Z-buffer.

Is this unique to the Quake 3 game? The Quake 3 engine? Did you have W-buffer support enabled in the drivers? Stencil supported in the drivers? Stencil shadows in Quake 3? If not, what happens when you turn them on?
I'm wondering if the algorithm perhaps breaks down with stencil usage...i.e., it isn't smart enough to handle a stencil buffer...and the artifact may be an interaction of Quake 3's stencil buffer handling with whatever formula is used for the blending. I still don't get how that works out to always affecting the lower right that way without more info on stencil interaction...maybe it is a constant accumulated error from a flawed handling of the stencil buffer, where it offsets the Z-buffer value it reads because the mechanism that does this calculation fails to properly distinguish between a 32-bit Z-buffer and a 24-bit Z-buffer plus 8-bit stencil...though I'd need an education as to what difference in storage alignment, if any, there is between the two to understand this properly. Say, how do the HyperZ algorithms handle the 8-bit stencil buffer? Perhaps that is the key.
 
Chalnoth said:
As far as I know, z-buffer accuracy is solely dependent upon the transform matrix (given the same hardware settings...i.e. same z-buffer depth, method).

Specifically, the ratio of far to near determines the z-buffer accuracy. While it is certainly true that the layout of the scene will affect how much accuracy is needed for good rendering, I don't believe it has any effect on the actual accuracy.

My extremely casual understanding would say that a Z-buffer is always accurate for near items and progressively less accurate as distance increases, such that the near accuracy is in excess of what is necessary, and thus wasted. A W-buffer reduces near accuracy and leaves more accuracy for far objects by including consideration for perspective in how pixel depth is represented.

In other words, I don't see how your text really contradicts what darkblu said.
 