Is UltraShadow/gl_nv_depth_bounds possible on R420?

SirkoZ

Greetings!

I was searching Google and this fine forum and didn't find anything.
Would anyone know whether we can expect ATI to support NVIDIA's UltraShadow technology on its R4xx boards, which now have two-sided stencil support and are presumably (not yet proven) capable of 32 stencil tests per clock under certain conditions? (the way GL_NV_occlusion_query got into Catalyst 3.1)

If yes, would it be possible on the R3xx series with the improved OpenGL ICD that is supposedly being rewritten?

Regards

SirkoZ
 
Nope. Ultrashadow is completely different from two-sided stencil.

And by the way, the R3xx has been able to do two samples per clock with FSAA all this time, so that capability has nothing to do with Ultrashadow, either.
 
Chalnoth said:
Nope. Ultrashadow is completely different from two-sided stencil.
Actually, I'm pretty sure it does include it - though the OpenGL extensions for UltraShadow have nothing to do with two-sided stencil.
 
I know that UltraShadow is different from two-sided stencil; two-sided stencil is shading (in the real meaning of the word :)) and UltraShadow is a software "stencil HSR". Why would it not be possible to implement on the Radeon? Which part/engine/processor of the GPU performs it on the GeForce (FX)?
 
Actually, UltraShadow was a marketing term which covers all sorts of things that make shadows faster:

two-sided stencil
depth clamp
scissor test
2x as fast depth/stencil-only writes

Two-sided stencil and scissor are of course on ATI, too; depth clamp is not. As for the 2x speed for stencil-only... I'm unsure, but ATI does at least have stencil-only optimisations, too.
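
For reference, a minimal GL sketch of how those pieces fit into a stencil shadow pass, assuming the EXT_stencil_two_side, EXT_stencil_wrap and NV_depth_clamp extensions are present (ATI would go through GL_ATI_separate_stencil for the two-sided part, and light_x/light_y/light_w/light_h are hypothetical application-side values):

/* Stencil-only pass: no colour or depth writes. */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);

/* Two-sided stencil (z-fail style): increment on back-face depth fails,
   decrement on front-face depth fails, in one pass over the volume. */
glEnable(GL_STENCIL_TEST);
glEnable(GL_STENCIL_TEST_TWO_SIDE_EXT);
glActiveStencilFaceEXT(GL_BACK);
glStencilOp(GL_KEEP, GL_INCR_WRAP_EXT, GL_KEEP);
glActiveStencilFaceEXT(GL_FRONT);
glStencilOp(GL_KEEP, GL_DECR_WRAP_EXT, GL_KEEP);

/* Depth clamp (NV20+ only): avoids far-plane clipping of the volume caps. */
glEnable(GL_DEPTH_CLAMP_NV);

/* Scissor: restrict fill to the light's screen-space rectangle. */
glEnable(GL_SCISSOR_TEST);
glScissor(light_x, light_y, light_w, light_h);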
 
I think the term UltraShadow was introduced with NV35, and the only new thing NV35 brought was the depth bounds test. Two-sided stencil and double stencil performance were introduced with NV30, and depth clamp came with NV20.
 
No, at the NV30 launch/in its specs there isn't any UltraShadow mentioned...
It has two-sided stenciling though, and 8 z/stencil updates per clock - UltraShadow wasn't introduced until NV35.

And I think that UltraShadow is "just" the GL depth_bounds extension and nothing else - so ATI would just have to have the hardware ready for it. Weren't the specifications made available at the NV35 launch? I remember NVIDIA saying they like other graphics firms adopting their extensions...

And without it the R4xx would/will really waste performance - I mean, UltraShadow is nowhere near as perfect as a tile-based renderer (PowerVR ;)), but it's a very nice improvement when taken advantage of...
 
nvOpenGLspecs.pdf

EXT_depth_bounds_test
NV1x: Em
NV2x: Em
NV3x: supported (min. driver R50)
NV4x: supported
Notes: NV35, NV36, NV4x in hw only

So it is supported on every NV3x and NV4x card; however, it is only hw-accelerated on NV35, NV36 and NV4x, and it's just a driver feature on all other NV3x GPUs.

Thomas
 
Ah so... ;)

So how would/will this affect performance on - oh, let's say - a GeForce FX 5800? Isn't the whole extension/depth_bounds (= stencil HSR) software already? I mean, a programmer must tell it which parts are to be lit.
 
All the depth bounds check does is potentially stop you from rendering a lot of stencil pixels that have no effect on the output.

In hardware it's a clip against some Zmin/Zmax; I have no idea what the software emulation does. It could well be a no-op.

The savings are potentially dramatic: most stencil rendering is fill-bound, so rendering fewer pixels is a big win.
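
In API terms that's the whole extension - a minimal sketch of how a stencil shadow pass might use it, assuming EXT_depth_bounds_test is advertised (zmin/zmax are the light's extent mapped into window-space depth, and draw_shadow_volume() is a hypothetical helper):

/* Reject stencil fill wherever the stored depth lies outside the
   light's [zmin, zmax] window-space depth range. */
glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
glDepthBoundsEXT(zmin, zmax);   /* both clamped to [0, 1] */

draw_shadow_volume();           /* stencil-only pass */

glDisable(GL_DEPTH_BOUNDS_TEST_EXT);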
 
tb said:
nvOpenGLspecs.pdf

EXT_depth_bounds_test
NV1x: Em
NV2x: Em
NV3x: supported (min. driver R50)
NV4x: supported
Notes: NV35, NV36, NV4x in hw only

So it is supported on every NV3x and NV4x card; however, it is only hw-accelerated on NV35, NV36 and NV4x, and it's just a driver feature on all other NV3x GPUs.

Thomas

Maybe I made a mistake. "in hw only" could mean that only the NV35, NV36 and NV4x support it in hw, and the other VPUs do not support it at all.

http://www.delphi3d.net/hardware/extsupport.php?extension=GL_EXT_depth_bounds_test

Thomas
 
Thanks, Thomas - it seems that either they can't or don't want to support it on the whole NV3x line.

ERP - if it's "just" a clip against some Zmin/Zmax in hardware, wouldn't it be possible to support it on any hardware?
 
SirkoZ said:
ERP - if it's "just" a clip against some Zmin/Zmax in hardware, wouldn't it be possible to support it on any hardware?

It compares the z value against a min/max and rejects the fragment based on that. If the chip doesn't have the per-fragment circuitry, it can't do it. It occurs early (before the pixel shaders) to save fill-rate.

Unless ATI have this hardware without telling anybody (and I'm pretty certain they don't), a driver couldn't add support.
 
I see. It's really a pity - I mean, a "new" generation of chip and still 4-bit subpixel precision, cheating at even bilinear filtering (as 3Dcenter.de showed), no software stencil-HSR. It's just pointless... :rolleyes:
OK, no PS 3.0, no problem, but all the other "shortcuts"... :p
 
Depth bounds can be emulated on any hardware that can
a) transfer interpolated z into the fragment stage
b) reject a fragment based on a min/max comparison

I.e. everything that supports some sort of "KIL" instruction with an interpolated attribute input. If you have some free resources to spend on this emulation (a little help from the vertex shader, a dependent read and/or alpha test), you can do it on NV20 and R200.

It would behave correctly, but it wouldn't perform like the real thing, because you absolutely have to start a pixel shader of sorts to do the rejection test. As this whole thing is all about saving fill, this is a rather pointless endeavour IMO.
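
For illustration, a rough sketch of that KIL-based rejection as an ARB_fragment_program string (i.e. NV30/R300-class hardware; NV20/R200 would need the dependent-read/alpha-test tricks mentioned above). It assumes the vertex stage copies window-space depth into texcoord[0].x and the application loads (zmin, zmax) into program.local[0]:

static const char *depth_bounds_emu_fp =
    "!!ARBFP1.0\n"
    "PARAM bounds = program.local[0];            # (zmin, zmax, -, -)\n"
    "TEMP r;\n"
    "SUB r.x, fragment.texcoord[0].x, bounds.x;  # z - zmin\n"
    "SUB r.y, bounds.y, fragment.texcoord[0].x;  # zmax - z\n"
    "KIL r.xyxy;                                 # kill if any component < 0\n"
    "MOV result.color, fragment.color;\n"
    "END\n";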
 
Yeah, emulation could destroy other z-buffer optimizations, and could well be more detrimental to performance than beneficial.
 
zeckensack said:
Depth bounds can be emulated on any hardware that can
a) transfer interpolated z into the fragment stage
b) reject a fragment based on a min/max comparison

I.e. everything that supports some sort of "KIL" instruction with an interpolated attribute input. If you have some free resources to spend on this emulation (a little help from the vertex shader, a dependent read and/or alpha test), you can do it on NV20 and R200.

It would behave correctly, but it wouldn't perform like the real thing, because you absolutely have to start a pixel shader of sorts to do the rejection test. As this whole thing is all about saving fill, this is a rather pointless endeavour IMO.

I think you should have read what the depth bounds test actually does...

The depth bounds
test compares the depth value stored at the location given by the
incoming fragment's (xw,yw) coordinates to a user-defined minimum
and maximum depth value. If the stored depth value is outside the
user-defined range (exclusive), the incoming fragment is discarded.

Unlike the depth test, the depth bounds test has NO dependency on
the fragment's window-space depth value.
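
In C terms, the spec language boils down to the per-fragment predicate below - note that the input is the depth value already stored in the depth buffer at the fragment's (xw,yw), not the fragment's own interpolated z, which is exactly where the emulation sketched above differs (the names stored_z/zmin/zmax are mine):

#include <stdbool.h>

/* Depth bounds test as specified: reads the STORED depth at the fragment's
   window position; the incoming fragment's own depth plays no part. */
static bool depth_bounds_pass(float stored_z, float zmin, float zmax)
{
    return stored_z >= zmin && stored_z <= zmax;  /* outside => discard */
}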
 