Delta resource updates

Discussion in 'Rendering Technology and APIs' started by Ext3h, Jul 15, 2020.

  1. Ext3h

    Regular Newcomer

    Sep 4, 2015
    This is about the old topic of resource updates.

    If you go through the published material from the past 5 years or so, you essentially only see techniques that always upload whole buffers to the GPU.

    What does differ is how the buffers are allocated. In rare cases you see ID3D11DeviceContext::UpdateSubresource / glBufferSubData and the like used for updating sub-allocations, but ultimately it always boils down to a direct write to the final buffer, either by a discard strategy or by synchronous mapping.

    What I personally found to work quite well is to record only deltas as updates are signaled by the engine (actually: record dirty entities, serialize them into a delta in bulk, and copy that to a buffer) and to patch the persistent / resident buffers with compute shaders. While at it, that step can also switch the data layout, from the interleaved / packed form used for delta recording and upload to formats better suited for further processing on the GPU.
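
    A minimal host-side sketch of the recording step described above, not the poster's actual engine code. `DeltaRecorder`, `markDirty` and `serialize` are hypothetical names. The `alignas(16)` is an assumption made so that the C++ `DataUpdate` matches the std430 layout of the shader struct (offset at byte 0, payload at byte 16, 32 bytes per record).

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// CPU-side mirror of the attributes that change together.
// alignas(16) mimics the vec4 alignment that std430 gives the GLSL struct.
struct alignas(16) Data {
    float position[4];
};

// One delta record: destination slot in the resident buffer plus the payload.
struct DataUpdate {
    std::uint32_t offset;
    Data data;
};
static_assert(sizeof(DataUpdate) == 32, "expected to match the std430 array stride");

// Hypothetical recorder: collects dirty entity indices while the engine runs
// and serializes them into one contiguous delta in a single pass.
class DeltaRecorder {
public:
    // Called whenever the engine signals a change; repeated marks collapse.
    void markDirty(std::uint32_t entity) { dirty_.insert(entity); }

    // Builds the delta from the current entity state and clears the dirty set.
    // The returned array is what would be copied into the upload buffer.
    std::vector<DataUpdate> serialize(const std::vector<Data>& entities) {
        std::vector<DataUpdate> delta;
        delta.reserve(dirty_.size());
        for (std::uint32_t entity : dirty_) {
            delta.push_back(DataUpdate{entity, entities[entity]});
        }
        dirty_.clear();
        return delta;
    }

private:
    std::unordered_set<std::uint32_t> dirty_;
};
```

    Because only the set of dirty indices is kept, an entity touched several times in a frame produces a single record, and the records come out in no particular order.
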

    The resident buffers have been pre-allocated with sufficient spare room to grow, and even the initial content is streamed by this technique.
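    struct Data {
        // Long list of attributes where updates are correlated
        vec4 position;
    };
    // One delta record: destination index plus the new attribute values
    struct DataUpdate {
        uint offset;
        Data data;
    };
    // Packed delta stream uploaded for this frame
    layout(std430, binding = 0) readonly restrict buffer src0
    {
        DataUpdate data[];
    } dataUpdate;
    // Resident buffer that gets patched in place
    layout(std430, binding = 1) writeonly restrict buffer dst0
    {
        vec4 data[];
    } position;
    uniform int uUpdateSize;
    layout(local_size_x = 64) in;
    void main()
    {
        uint id = gl_GlobalInvocationID.x;
        // Skip the padding invocations of the last work group
        if(id < uUpdateSize)
        {
            DataUpdate update = dataUpdate.data[id];
            position.data[update.offset] = update.data.position;
        }
    }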
    // As you may have realized, this is OpenGL 4.3 syntax, so yes, this approach works perfectly with the older APIs too
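
    For checking a delta stream without a GPU, the scatter pass above can be mirrored on the CPU. This is only a reference sketch under assumptions: `groupCount`, `applyDelta` and `Vec4` are hypothetical names, and the loop just walks the same invocation IDs a dispatch with `local_size_x = 64` would produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec4 { float x, y, z, w; };
struct alignas(16) Data { Vec4 position; };
struct DataUpdate { std::uint32_t offset; Data data; };

// Must match layout(local_size_x = 64) in the shader.
constexpr std::uint32_t kLocalSizeX = 64;

// Work groups needed to cover updateCount records: ceil(updateCount / 64).
constexpr std::uint32_t groupCount(std::uint32_t updateCount) {
    return (updateCount + kLocalSizeX - 1) / kLocalSizeX;
}

// CPU reference of the compute pass: every invocation ID that a dispatch
// would launch is visited, and IDs past the end of the delta do nothing.
void applyDelta(const std::vector<DataUpdate>& delta, std::vector<Vec4>& positions) {
    const std::uint32_t groups = groupCount(static_cast<std::uint32_t>(delta.size()));
    for (std::uint32_t id = 0; id < groups * kLocalSizeX; ++id) {
        if (id < delta.size()) {  // same bounds check as in the shader
            const DataUpdate& update = delta[id];
            positions[update.offset] = update.data.position;
        }
    }
}
```

    On the real path, `groupCount(n)` would be the `num_groups_x` argument of `glDispatchCompute`, with `uUpdateSize` set to `n`.
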

    In naive performance tests, this proved to outperform every other option by far. Specifically:
    • the combination of delta buffer + compute shader outperformed any form of Map, be it a discard strategy with full re-upload or sparse writes in a synchronous map. That is just in terms of raw update time, not even accounting for pipeline stalls.
    • significantly lower overhead compared to any sparse update method provided by the individual graphics APIs.
    • lower VRAM footprint compared to any buffer rotation strategy. That benefit even holds if you start to pre-record partial deltas for future frames!
    • trivial to buffer / split updates if the amount of data exceeds the available transfer volume for a frame.
    What came as quite a surprise was how robust this approach was with regard to recording deltas out of order, that is, in whatever order was most efficient from the engine's perspective (with regard to host-side cache hit rate). The samples may not have been representative, but there was no significant slowdown if the destination offsets were sparse or even "randomly" sorted during the initial, dense update.
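
    The last point can be sketched as a queue that hands out at most one frame's worth of records. This is a hypothetical illustration, not the poster's code: `DeltaQueue`, `takeFrameBatch` and the byte budget parameter are assumed names, and the budget itself would come from whatever transfer limit the engine sets.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct alignas(16) Data { float position[4]; };
struct DataUpdate { std::uint32_t offset; Data data; };

// Hypothetical queue of pending delta records. Each frame takes as many
// whole records as fit into the byte budget; the rest waits for later frames.
class DeltaQueue {
public:
    void push(const DataUpdate& update) { pending_.push_back(update); }

    // Removes and returns the oldest records that fit into budgetBytes.
    std::vector<DataUpdate> takeFrameBatch(std::size_t budgetBytes) {
        const std::size_t maxRecords = budgetBytes / sizeof(DataUpdate);
        const std::size_t count = std::min(maxRecords, pending_.size());
        const auto last = pending_.begin() + static_cast<std::ptrdiff_t>(count);
        std::vector<DataUpdate> batch(pending_.begin(), last);
        pending_.erase(pending_.begin(), last);
        return batch;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<DataUpdate> pending_;
};
```

    Draining in FIFO order means a newer record for a slot is never uploaded in an earlier frame than an older one. Two records for the same slot inside one batch would still race in the shader, so this sketch assumes duplicates were already collapsed when the delta was recorded.
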

    Of course this isn't a novel invention (see e.g. page 31), but it's somewhat surprising to see so little awareness of something that works so well.
  2. JoeJ

    Regular Newcomer

    Apr 1, 2018
    Well, I do it. But I never made a comparison with alternatives - thanks for the confirmation :)
