Looking for a good approach to copy the depth-buffer in the middle of a frame

Ethatron

Hi; yeah, I'm still pushing the limits... :smile:

Okay, we need a depth-buffer "resolve" in the middle of a render pass; no z-prepass is possible at the moment. This is what I tried:

- Using INTZ depth-buffer texture [CreateTexture,GetSurfaceLevel]
- render first part of the scene
- EndScene()
- create a clone surface (same parameters as for the initial allocation, incl. D3DUSAGE_DEPTHSTENCIL usage)
- StretchRect()
- BeginScene()

It doesn't work; the reason: "The source and destination surfaces must be plain depth stencil surfaces (not textures)".
Then I read up on really everything that has been written about it, and ATI supposedly lets me copy ("resolve") the INTZ buffer via the RESZ hack (it doesn't need to be multisampled). That's too hackish and too restricted (it only works on some specific cards).

So I'm back to thinking about copying the depth-buffer with a shader. The problem is that I basically must not change any render state, which makes it necessary to fake a full-screen pass in projection space. I'm not quite sure how to do that.

- read the world & projection transform
- position a quad at the z-near frustum corners and multiply it by the inverse world transform as well as the inverse projection transform? Does this give me the correct full-screen quad?
- make a backup of the current vertex- and pixel-shader, the first bound texture, the rendertarget and the z-write/check states
- set pass-through vertex- and pixel-shader, bind depth-buffer to sampler 0, bind depth-buffer target-copy to rendertarget, disable z-fail and disable z-write
- DrawPrimitive of my calculated quad
- push back the backup-values
- continue as usual
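The backup/restore steps in this list can be wrapped in a small scope guard, so no state is forgotten on an early exit. A minimal C++ sketch; `ScopedState` is a hypothetical helper, not part of D3D9:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not part of D3D9): reads one piece of device state when
// constructed and writes it back when it goes out of scope, so every exit path
// restores what was there before.
template <typename T>
class ScopedState {
public:
    ScopedState(std::function<T()> get, std::function<void(const T&)> set)
        : set_(std::move(set)), saved_(get()) {
        assert(static_cast<bool>(set_));  // a restore function is required
    }
    ~ScopedState() { set_(saved_); }

    ScopedState(const ScopedState&) = delete;
    ScopedState& operator=(const ScopedState&) = delete;

    const T& saved() const { return saved_; }

private:
    std::function<void(const T&)> set_;  // declared first, so initialised before saved_
    T saved_;
};
```

For a render state the getter and setter would wrap `GetRenderState`/`SetRenderState`, e.g. `ScopedState<DWORD> cull([&] { DWORD v; dev->GetRenderState(D3DRS_CULLMODE, &v); return v; }, [&](const DWORD& v) { dev->SetRenderState(D3DRS_CULLMODE, v); });`. Objects returned by `GetTexture`, `GetVertexShader` and friends come back AddRef'd, so for those the setter would also have to `Release()` the saved pointer.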

Is that a sound concept? It works by applying minimal state-changes. All DX9. I know it's horrible, thanks to nVidia ... ;)

Thanks

Edit: I think it's faster to generate the quad in the vertex shader, isn't it?

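Quad-VertexArray with index (x, y = clip-space corner, z = corner index):
static const float4 FullScreenQuad[4] = {
        { 1,  1, 0, 0},
        {-1,  1, 1, 0},
        {-1, -1, 2, 0},
        { 1, -1, 3, 0}
};

Code:
sampler2D DepthTexture : register(s0);
float2 TexelSize;   // (1/width, 1/height) of the render target, set by the application

struct VSInput
{
        float Index : TEXCOORD0;   // corner index 0..3; DX9 has no SV_VertexID
};

struct VSOutput
{
        float4 Position : POSITION;
        float2 TexCoord : TEXCOORD0;
};

// The corners walk around the quad, so draw the 4 vertices as D3DPT_TRIANGLEFAN.
VSOutput VSMain(in VSInput input)
{
        VSOutput output;
        float2 corner = FullScreenQuad[(int)input.Index].xy;

        // DX9 half-pixel offset: half a pixel is one TexelSize step in clip space
        output.Position = float4(corner + float2(-TexelSize.x, TexelSize.y), 0, 1);

        // clip space [-1,1] (y up) -> texture space [0,1] (v down)
        output.TexCoord = corner * float2(0.5, -0.5) + 0.5;
        return output;
}

float4 PSMain(float2 texCoord : TEXCOORD0) : COLOR0
{
        return tex2D(DepthTexture, texCoord);
}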
DX9 doesn't support a vertex ID (SV_VertexID is Direct3D 10 and later), so the index has to come in as vertex data instead, e.g. as a vertex color or texture coordinate.
 
Quote:
The problem is I must not change any renderstate basically.
Why can't you just change it and change it back? Presumably you need a shader for this anyway, so assuming you're changing shaders, why not just write the trivial full-screen-quad VS instead of the matrix projection one?

Quote:
- position a quad at the z-near frustum-corners and multiply it by the inverse world-transform as well as inverse projection-transform?
You can get the view-space frustum planes by adding/differencing columns/rows of the projection matrix IIRC, and the corners follow from those. Just google it, it's pretty easy. With those you can project back to world space if you so desire.
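For a plain perspective matrix the near-plane corners can also be read straight off the matrix entries, without going through the planes. A sketch under the assumption that the game uses a standard left-handed D3D projection (the layout `D3DXMatrixPerspectiveFovLH` produces) with row vectors; `NearPlaneFromProjection` is a hypothetical name:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Assumes a standard left-handed D3D perspective matrix, row-vector convention,
// m[row][col] (off-center and oblique projections are not handled):
//   [xs 0  0               0]
//   [0  ys 0               0]
//   [0  0  zf/(zf-zn)      1]
//   [0  0  -zn*zf/(zf-zn)  0]
struct Float3 { float x, y, z; };

struct NearPlane {
    float zn, zf;
    std::array<Float3, 4> corners;  // view space: TR, TL, BL, BR
};

inline NearPlane NearPlaneFromProjection(const float m[4][4]) {
    assert(m[2][2] != 0.0f && m[2][2] != 1.0f && m[0][0] != 0.0f && m[1][1] != 0.0f);
    NearPlane r{};
    r.zn = -m[3][2] / m[2][2];          // near plane distance
    r.zf = m[3][2] / (1.0f - m[2][2]);  // far plane distance
    const float hx = r.zn / m[0][0];    // half width of the near plane
    const float hy = r.zn / m[1][1];    // half height of the near plane
    r.corners = {{ { hx,  hy, r.zn}, {-hx,  hy, r.zn},
                   {-hx, -hy, r.zn}, { hx, -hy, r.zn} }};
    return r;
}
```

World-space corners would then follow by multiplying each corner with the inverse view matrix.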

Quote:
- set pass-through vertex- and pixel-shader, bind depth-buffer to sampler 0, bind depth-buffer target-copy to rendertarget, disable z-fail and disable z-write
Now I'm confused... if you're changing the vertex shader anyway, why not just bind one that does what you want (i.e. doesn't multiply by the projection matrix)?

You're saving state and restoring it (which I don't generally recommend as an engine model, but whatever), so I don't see why you think there's any constraint on rendering normally. Just save/restore any relevant state.

Quote:
matrix FullScreenQuad[] = {
{1,1,0,0},
{-1,1,1,0},
{-1,-1,2,0},
{1,-1,3,0}
};
This array is unnecessary; it's pretty easy to compute with math in the shader. But if you need a fallback for the lack of a vertex ID, just send the vertices (not an index). Vertex transform is never your bottleneck in a full-screen pass (nor is passing the vertices in through the input assembler significantly slower).
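The "math in the shader" version of that array: the corner follows from the index with two comparisons, and the texture coordinate follows from the corner. Written as plain C++ here so it can be checked; the same expressions carry over to HLSL on `float2`. `QuadCorner` and `ClipToTexcoord` are illustrative names:

```cpp
#include <cassert>

// CPU mirror of the shader math, so the mapping can be tested.
struct Float2 { float x, y; };

// Corner for vertex index 0..3, in the same order as the FullScreenQuad array:
// (1,1), (-1,1), (-1,-1), (1,-1). That order walks around the quad, so it must be
// drawn as D3DPT_TRIANGLEFAN; as a triangle strip it would leave the triangle
// between the centre and the right edge uncovered.
inline Float2 QuadCorner(unsigned index) {
    assert(index < 4);
    const float x = (index == 0 || index == 3) ? 1.0f : -1.0f;
    const float y = (index < 2) ? 1.0f : -1.0f;
    return {x, y};
}

// Clip space [-1,1] (y up) -> texture space [0,1] (v down).
inline Float2 ClipToTexcoord(Float2 p) {
    return {p.x * 0.5f + 0.5f, p.y * -0.5f + 0.5f};
}
```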
 
Thanks for the answers, I'm sure I'll come back to them.
Just a quick note: I'm not writing an engine, I'm hacking into the graphics core of a Gamebryo 2.1 game; it's an untouchable binary.

I tried, a lot, and such simple things as turning z-write off and back on for the lifetime of one of the native shaders corrupt the renderer (not only that one shader's output, as would be expected). It's against what's supposed to happen and against intuition, but it's happening. The logical explanation is that we haven't understood enough of Gamebryo's inner workings. What do you do when you don't know what exactly happens when you do something?

I just try to be conservative. If I can get the depth-buffer without touching anything at all I'd be happy. Sadly I have to bend reality just to get the depth-buffer copy at all. ... :-|

It's frustrating...
 
Okay, I nailed it, the missing hint was that it's possible to feed pre-transformed geometry to DX9.
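Pre-transformed vertices (FVF `D3DFVF_XYZRHW`) are given directly in screen pixels, so no world or projection matrices are involved at all. The sketch below mirrors the quad built in the segment and checks that the -0.5 shift makes each pixel sample the centre of its texel. The `CameraQuad` layout (x, y, z, rhw, u, v) is inferred from its initializer; that it pairs with `D3DFVF_XYZRHW | D3DFVF_TEX1` is an assumption, since the post doesn't show `CAMERAQUADFORMAT`:

```cpp
#include <array>
#include <cassert>

// Assumed mirror of the CameraQuad vertex: x, y, z, rhw, u, v
// (D3DFVF_XYZRHW | D3DFVF_TEX1 layout; the real definition isn't shown).
struct CameraQuad { float x, y, z, rhw, u, v; };
static_assert(sizeof(CameraQuad) == 6 * sizeof(float), "tightly packed vertex");

// DX9 puts pixel centres at integer screen coordinates, so the quad is shifted
// by -0.5 to land each texel centre exactly on a pixel centre.
// Vertex order TL, TR, BL, BR suits D3DPT_TRIANGLESTRIP.
inline std::array<CameraQuad, 4> MakeScreenQuad(unsigned w, unsigned h) {
    assert(w > 0 && h > 0);
    const float right  = static_cast<float>(w) - 0.5f;
    const float bottom = static_cast<float>(h) - 0.5f;
    return {{ {-0.5f,  -0.5f,  0.5f, 1.0f, 0.0f, 0.0f},
              {right,  -0.5f,  0.5f, 1.0f, 1.0f, 0.0f},
              {-0.5f,  bottom, 0.5f, 1.0f, 0.0f, 1.0f},
              {right,  bottom, 0.5f, 1.0f, 1.0f, 1.0f} }};
}

// u interpolated linearly along the top edge, evaluated at pixel column px
// (rhw == 1, so perspective-correct and linear interpolation agree).
inline float UAtPixel(const std::array<CameraQuad, 4>& q, unsigned px) {
    const float t = (static_cast<float>(px) - q[0].x) / (q[1].x - q[0].x);
    return q[0].u + t * (q[1].u - q[0].u);
}
```

For a 4-pixel-wide target, pixel 0 samples u = 0.125 and pixel 3 samples u = 0.875, i.e. (px + 0.5) / width, the texel centres.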

Here is the segment (more or less drop-in, no calcs):

Code:
  IDirect3DTexture9 *pCurrDT;
  if ((pCurrDT = passDepthT[currentPass])) {
    D3DSURFACE_DESC CurrD;              pCurrDT->GetLevelDesc(0, &CurrD);
    D3DSURFACE_DESC GrabD; if (pTextDS) pTextDS->GetLevelDesc(0, &GrabD);

    /* first use, or the depth-buffer size changed: (re)create copy target and quad */
    if (!pTextDS || ((CurrD.Width != GrabD.Width) || (CurrD.Height != GrabD.Height))) {
      if (pGrabDS) pGrabDS->Release();
      if (pTextDS) pTextDS->Release();
      if (pGrabVX) pGrabVX->Release();

      pGrabDS = NULL; pTextDS = NULL; pGrabVX = NULL;
      /* render targets must live in D3DPOOL_DEFAULT; R32F keeps one float depth per pixel */
      if (StateDevice->CreateTexture(CurrD.Width, CurrD.Height, 1, D3DUSAGE_RENDERTARGET, D3DFMT_R32F, D3DPOOL_DEFAULT, &pTextDS, NULL) == D3D_OK) {
        if (pTextDS->GetSurfaceLevel(0, &pGrabDS) == D3D_OK) {
          void *VertexPointer;

          /* pixel-exact screen-quad: pre-transformed (XYZRHW) coordinates are in pixels,
             shifted by -0.5 so texel centres land on DX9 pixel centres */
          const float width  = CurrD.Width  - 0.5f;
          const float height = CurrD.Height - 0.5f;

          /* x, y, z, rhw, u, v */
          CameraQuad ShaderVertices[] = {
            {-0.5f,  -0.5f,  0.5f,  1.0f,    0.0f, 0.0f},
            {width,  -0.5f,  0.5f,  1.0f,    1.0f, 0.0f},
            {-0.5f, height,  0.5f,  1.0f,    0.0f, 1.0f},
            {width, height,  0.5f,  1.0f,    1.0f, 1.0f}
          };

          if (SceneDevice->CreateVertexBuffer(4 * sizeof(CameraQuad), D3DUSAGE_WRITEONLY, CAMERAQUADFORMAT, D3DPOOL_DEFAULT, &pGrabVX, NULL) == D3D_OK) {
            if (pGrabVX->Lock(0, 0, &VertexPointer, 0) == D3D_OK) {
              CopyMemory(VertexPointer, ShaderVertices, sizeof(ShaderVertices));
              pGrabVX->Unlock();
            }
            else {
              pGrabVX->Release();
              pGrabVX = NULL;
            }
          }
        }
      }
    }

    if (pGrabDS && pGrabVX) {
      ShaderManager *sm = ShaderManager::GetSingleton();

      /* the Get*() calls AddRef() what they return; released again after restoring */
      IDirect3DBaseTexture9 *pCurrTX = NULL;
      IDirect3DVertexShader9 *pCurrVS = NULL;
      IDirect3DPixelShader9 *pCurrPS = NULL;
      IDirect3DVertexDeclaration9 *pCurrVD = NULL;
      IDirect3DVertexBuffer9 *pCurrVB = NULL;
      UINT uCurrOF = 0, uCurrST = 0;
      D3DVIEWPORT9 CurrVP;
      DWORD dCurrCL;
      DWORD dCurrAB;

      SceneDevice->GetTexture(0, &pCurrTX);
      SceneDevice->GetVertexShader(&pCurrVS);
      SceneDevice->GetPixelShader(&pCurrPS);
      SceneDevice->GetVertexDeclaration(&pCurrVD);  /* SetFVF() below replaces it */
      SceneDevice->GetStreamSource(0, &pCurrVB, &uCurrOF, &uCurrST);
      SceneDevice->GetViewport(&CurrVP);            /* SetRenderTarget(0, ...) resets it */
      SceneDevice->GetRenderState(D3DRS_CULLMODE, &dCurrCL);
      SceneDevice->GetRenderState(D3DRS_ALPHABLENDENABLE, &dCurrAB);

      /* assumes this hook runs outside the game's own BeginScene/EndScene pair */
      SceneDevice->BeginScene();
      SceneDevice->SetRenderTarget(0, pGrabDS);
      SceneDevice->SetDepthStencilSurface(NULL);
      SceneDevice->SetTexture(0, pCurrDT);
      SceneDevice->SetVertexShader((IDirect3DVertexShader9 *)sm->cqv->pDX9ShaderClss);
      SceneDevice->SetPixelShader ((IDirect3DPixelShader9  *)sm->cqp->pDX9ShaderClss);
      /* c0..c4 are not restored; assumes the game re-uploads its constants before its next draw */
      SceneDevice->SetPixelShaderConstantF(0, (const float *)&sm->ShaderConst.ZRange, 1);
      SceneDevice->SetPixelShaderConstantF(1, (const float *)&sm->ShaderConst.proj, 4);
      SceneDevice->SetRenderState(D3DRS_CULLMODE, D3DCULL_NONE);
      SceneDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);

      SceneDevice->SetStreamSource(0, pGrabVX, 0, sizeof(CameraQuad));
      SceneDevice->SetFVF(CAMERAQUADFORMAT);
      SceneDevice->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2);
      SceneDevice->EndScene();

      SceneDevice->SetRenderTarget(0, passSurface[currentPass]);
      SceneDevice->SetDepthStencilSurface(passDepth[currentPass]);
      SceneDevice->SetViewport(&CurrVP);
      SceneDevice->SetTexture(0, pCurrTX);
      SceneDevice->SetVertexShader(pCurrVS);
      SceneDevice->SetPixelShader(pCurrPS);
      if (pCurrVD) SceneDevice->SetVertexDeclaration(pCurrVD);
      SceneDevice->SetStreamSource(0, pCurrVB, uCurrOF, uCurrST);
      SceneDevice->SetRenderState(D3DRS_CULLMODE, dCurrCL);
      SceneDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, dCurrAB);

      /* drop the references taken by the Get*() calls */
      if (pCurrTX) pCurrTX->Release();
      if (pCurrVS) pCurrVS->Release();
      if (pCurrPS) pCurrPS->Release();
      if (pCurrVD) pCurrVD->Release();
      if (pCurrVB) pCurrVB->Release();

      bDFilled = true;
    }
  }
Maybe it can be made shorter, I don't know. I read that the overhead of the device functions is somewhere between 2000 and 8000 cycles per call. I wish there were fewer calls, but... all that matters is that I finally got the depth-buffer:

[Screenshot: OBGEv3-zbuffer.jpg]
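The copy shader itself isn't shown; the segment only uploads `ZRange` and `proj` as pixel-shader constants. If that shader turns the stored hardware depth into linear view-space depth (an assumption about what it does), the usual inversion for a left-handed D3D perspective projection with near plane zn and far plane zf looks like this. Plain C++ for checking; `LinearizeDepth` and `HardwareDepth` are illustrative names:

```cpp
#include <cassert>

// Assumption: depth was written with a standard left-handed D3D perspective
// projection, so the stored value is
//   d = zf/(zf-zn) * (1 - zn/z_view),   with d in [0,1].
// Solving for z_view gives linear view-space depth.
inline float LinearizeDepth(float d, float zn, float zf) {
    assert(zf > zn && zn > 0.0f);
    return zn * zf / (zf - d * (zf - zn));
}

// Forward mapping (view-space z -> stored depth), handy for checking the inverse.
inline float HardwareDepth(float zView, float zn, float zf) {
    assert(zf > zn && zn > 0.0f && zView > 0.0f);
    return zf / (zf - zn) * (1.0f - zn / zView);
}
```

d = 0 maps back to the near plane and d = 1 to the far plane.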