// Wishlist example: a hypothetical multi-fetch intrinsic that returns a
// whole NxN block of cube-map samples around one direction in a single
// call (texCUBEMultiFetched does not exist in any current HLSL version).
void myPS()
{
    float multifetchedValues[17][17] = texCUBEMultiFetched(cubeSampler, myVec3, 17, 17);
}
- Full support for stereo rendering, not just a stereo backbuffer.
- 64-bit FP precision support (that includes Z-buffer and stencil)
> Why would you want 64-bit Z-buffers? Well, DX10 has 64-bit depth-stencil surfaces, it's just that 24 bits are not used. Are you hoping for something like 56-bit depth and 8-bit stencil?

Nope, really! I want more than 8 bits for stencil... let's say 32-bit Z + 32-bit stencil. I think the stencil should basically be able to work with 32-bit object IDs... 256 IDs are definitely not much, hehe.
> I think basically the stencil should be able to work with 32-bit object IDs... 256 IDs are definitely not much, hehe.

Unless you also get the ability to output stencil from the pixel shader (which is likely to come with a performance impact), supporting that many stencil bits for object IDs would imply rendering each of those objects in its own separate draw call, which isn't good for batch performance.
> Also, I could use a 64-bit Z-buffer (double precision) with no stencil.

Why not, although currently the lack of depth precision more often comes from poor use of projection matrices than from the "limited" bit precision of depth buffers. I suppose a space rendering engine (with planets, spaceships, etc.) might benefit from 64-bit depth without the hassle of having to partition the depth range.
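To put rough numbers on that (assuming a standard D3D-style projection with near plane $n$ and far plane $f$; the engines discussed here may of course differ), the post-projection depth is

$$ z_{ndc} = \frac{f}{f-n}\left(1 - \frac{n}{z_{eye}}\right) $$

With $n = 0.1$ and $f = 1000$, a pixel at $z_{eye} = 10$ (only 1% of the way to the far plane) already maps to $z_{ndc} \approx 0.990$, so about 99% of the representable depth values are spent on the nearest 1% of the view distance. Pushing the near plane out (or partitioning the depth range) usually buys far more precision than adding bits.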
> I want more than 8 bits for stencil... let's say 32-bit Z + 32-bit stencil. I think basically the stencil should be able to work with 32-bit object IDs... 256 IDs are definitely not much, hehe. DX10 supports it, but I don't think current HW can actually use it.

Just use an int32 texture and dynamic branching for "early out". I seriously doubt a hardware-implemented stencil buffer would be any faster than that on modern hardware, particularly if you're outputting stencil values from the shader. Hell, I can't even get early-stencil to work properly in many *normal* cases!
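A minimal sketch of that idea in D3D10-style HLSL (gObjectID, gWantedID and the shader name are made up for illustration, not code from this thread):

    Texture2D<uint> gObjectID;   // one 32-bit object ID per pixel, replacing stencil
    uint gWantedID;              // ID this pass is allowed to touch

    float4 EarlyOutPS(float4 pos : SV_Position) : SV_Target
    {
        // Dynamic branch in place of the stencil test: reject pixels whose
        // stored ID doesn't match before doing any expensive work.
        uint id = gObjectID.Load(int3(pos.xy, 0));
        if (id != gWantedID)
            discard;

        // ...expensive shading only runs for the surviving pixels...
        return float4(1.0, 1.0, 1.0, 1.0);
    }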
> Unless you also get the ability to output stencil from the pixel shader (which is likely to come with a performance impact)

> Just use an int32 texture and dynamic branching for "early out". I seriously doubt a hardware-implemented stencil buffer would be any faster than that on modern hardware, particularly if you're outputting stencil values from the shader.

OK, what about something like a "blend shader" stage with the ability to read and write? That could be nice!
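Purely as illustration of the idea (none of this syntax exists in any current API; DST_COLOR and the stage itself are invented here, in the same spirit as the texCUBEMultiFetched wish above):

    // Hypothetical "blend shader": runs after the pixel shader and can READ
    // the current framebuffer value before writing, replacing fixed-function
    // blend states with arbitrary read-modify-write logic.
    float4 MyBlendShader(float4 src : SRC_COLOR,   // pixel shader output
                         float4 dst : DST_COLOR)   // current render target value
                         : SV_Target
    {
        return dst * (1.0 - src.a) + src * src.a;  // e.g. classic alpha blend
    }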
> Hell, I can't even get early-stencil to work properly in many *normal* cases!

Can you give a little more info on that? I'm very curious about what doesn't work there, and whether you have any idea why!
> Can you give a little more info on that? I'm very curious about what doesn't work there, and whether you have any idea why!

I was using stencil for a while with deferred shading, to stencil out light volumes (works nicely with z-buffering). However, after some benchmarking I realized that while the stencil was *working*, it wasn't actually making anything faster than just shading the whole screen. I spoke to NVIDIA about it and they jokingly suggested that I rename my app to "Doom3.exe". Basically, early-stencil seems to work for exactly the case that Doom3's rendering path uses and pretty much nothing else, even in cases where it should be possible because there are no data dependencies.
> I was using stencil for a while with deferred shading, to stencil out light volumes (works nicely with z-buffering). However, after some benchmarking I realized that while the stencil was *working*, it wasn't actually making anything faster than just shading the whole screen.

I believe what you're telling us, but that makes no sense at all! If you have a lot of volume lights to apply, then shading your scene with a fullscreen pass per light should cost much more than marking the volume areas in stencil and shading only those for each light. Of course this depends on your shader complexity, but overall it should hold (even if you use dynamic branching to reject out-of-range pixels during the shading passes). You're not using insanely tessellated volumes (spheres?) for the volume lights, are you? (On a unified architecture that may eat some of the power you wanted for pixel shading.)
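For reference, the dynamic-branching rejection mentioned above could look something like this in D3D10-style HLSL (all names here are assumptions, not code from either poster):

    Texture2D gPositionRT;   // G-buffer render target holding world-space position
    float3 gLightPos;        // point light position
    float  gLightRadius;     // light range

    float4 LightPassPS(float4 pos : SV_Position) : SV_Target
    {
        float3 P = gPositionRT.Load(int3(pos.xy, 0)).xyz;
        float3 L = gLightPos - P;

        // Dynamic branch: pixels outside the light's range exit before
        // any of the expensive lighting math runs.
        if (dot(L, L) > gLightRadius * gLightRadius)
            return 0;

        // ...full BRDF evaluation would go here; simple falloff for the sketch...
        float atten = 1.0 - sqrt(dot(L, L)) / gLightRadius;
        return float4(atten, atten, atten, 1.0);
    }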
> I believe what you're telling us, but that makes no sense at all! If you have a lot of volume lights to apply, then shading your scene with a fullscreen pass per light should cost much more than marking the volume areas in stencil and shading only those for each light.

Oh, it definitely should have been faster; that's why I was using it. In that implementation I ended up just projecting the light volumes' bounding boxes and using a scissor test on the GPU, which was fast enough. Note that stencil was *entirely* broken on ATI/OpenGL at the time, and I still don't think early stencil works properly in that demo.
> You're not using insanely tessellated volumes (spheres?) for the volume lights, are you? (On a unified architecture that may eat some of the power you wanted for pixel shading.)

Oh, of course not. Seriously, early stencil reject was just not working... it was happening *after* the shader.
> With regard to your comment that textures "can" do the same thing as stencil: yes, they probably can (especially on D3D10), but you cannot expect the same level of performance as stencil buffering.

I dunno, it seems to me that hardware is getting pretty general, and most of the API-specific functionality is implemented in a general way in the driver anyway. That's particularly true when you look at the design and flexibility you get with something like CTM (particularly) or even CUDA. Maybe not this generation, but I don't see a need for a fixed-function stencil buffer in the long run.
> I was using stencil for a while with deferred shading, to stencil out light volumes (works nicely with z-buffering). However, after some benchmarking I realized that while the stencil was *working*, it wasn't actually making anything faster than just shading the whole screen. [...]

Nvidia hardware seems to be a lot more sensitive with stencil. On ATI hardware you should not have any trouble with early-out stencil. In fact, in many cases you'll probably see better performance that way than with dynamic branching.

> On ATI hardware you should not have any trouble with early-out stencil. In fact, in many cases you'll probably see better performance that way than with dynamic branching.

Yeah, but that's all entirely beside the point, since ATI has quite possibly the most terrible MRT implementation in OpenGL (which this app was using) that I've ever worked with. Because of that simple fact, no ATI hardware could even touch NVIDIA's 6- and 7-series, let alone the 8-series.

On R600 it should be even better, as it has hierarchical stencil as well, unlike previous generations that could only reject at the EarlyZ stage. I haven't revisited this topic with R600, but my gut feeling is that early-out with stencil should be better than ever.

> On R600 it should be even better, as it has hierarchical stencil as well [...] my gut feeling is that early-out with stencil should be better than ever.

Cool, although like I said, I don't care that much about stencil. It can be useful for a few algorithms, but IMHO it's a bit of a holdover from fixed-function days that's only still in hardware because of shadow volumes, which I also don't care for.
> I spoke to NVIDIA about it and they jokingly suggested that I rename my app to "Doom3.exe". Basically, early-stencil seems to work for exactly the case that Doom3's rendering path uses and pretty much nothing else, even in cases where it should be possible because there are no data dependencies.

In DX you need to set the stencil state to D3DSTENCILOP_KEEP to allow early stencil; see http://forum.beyond3d.com/showthread.php?p=286194#post286194 for details. Could it be the same in OGL?

> In DX you need to set the stencil state to D3DSTENCILOP_KEEP to allow early stencil. Could it be the same in OGL?

Yes, it is the same in OGL, but I did that and every other thing they asked, and still no early stencil.

> Yes, it is the same in OGL, but I did that and every other thing they asked, and still no early stencil.

It's important to clear the stencil buffer every frame, not just once, even if you completely fill it again and again.