Ext3h
Regular
This is about the old topic of resource updates.
If you go through the published material from the past five years or so, you essentially only see techniques that upload buffers to the GPU in whole.
What differs is how the buffers are allocated; in rare cases you see ID3D11DeviceContext::UpdateSubresource / glBufferSubData and the like used for updating sub-allocations, but ultimately it always boils down to a direct write to the final buffer, either via a discard strategy or a synchronous map.
What I personally found to work quite well is to record only deltas as updates are signaled by the engine (concretely: record dirty entities, serialize them into a delta buffer in bulk, and copy that to the GPU) and to patch the persistent / resident buffers with compute shaders. While at it, that step also switches the data layout, from an interleaved / packed form suited for delta recording / upload to formats better suited for further processing on the GPU.
The resident buffers are pre-allocated with sufficient spare room to grow, and even the initial content is streamed in by this same technique.
Code:
struct Data {
// Long list of attributes where updates are correlated
vec4 position;
...
};
struct DataUpdate {
uint offset;
Data data;
};
layout(std430, binding = 0) readonly restrict buffer src0
{
DataUpdate data[];
} dataUpdate;
layout(std430, binding = 1) writeonly restrict buffer dst0
{
vec4 data[];
} position;
...
uniform uint uUpdateSize;
layout(local_size_x = 64) in;
void main()
{
uint id = gl_GlobalInvocationID.x;
if(id < uUpdateSize)
{
DataUpdate update = dataUpdate.data[id];
position.data[update.offset] = update.data.position;
...
}
}
// As you may have realized, this is OpenGL 4.3 syntax, so yes, this approach works perfectly with the older APIs too
From naive performance tests, this proved to outperform every other option by far. Specifically:
- the combination of delta buffer + compute shader outperformed any form of Map, be it a discard strategy with full re-upload or sparse writes through a synchronous map. That is just in terms of raw update time, not even accounting for pipeline stalls.
- significantly lower overhead compared to any sparse update method provided by the individual graphics APIs.
- lower VRAM footprint compared to any buffer rotation strategy. That benefit even holds if you start pre-recording partial deltas for future frames!
- trivial to buffer / split updates if the amount of data exceeds the available transfer volume for a frame.
Of course this isn't a novel invention (see e.g. https://on-demand.gputechconf.com/g...bisch-pierre-boudier-gpu-driven-rendering.pdf page 31), but it's somewhat of a surprise to see so little awareness of something that works so well.