D3D11 stream output & compute shader

Novum

Is there a way to process the data from the stream output stage with a compute shader?

Until now I couldn't find one. Is this a black spot in the API?
 
You can't use DrawAuto, but you can consume streamout data with a compute shader if you know how much data was streamed out.

I don't know the API commands, but I don't see why it wouldn't be possible.
 
I had hoped that this could stay entirely on the GPU. The only way to get the size of the streamed out data is to do a query :(
 
As far as I know, you can't determine the size of the stream output in a compute shader. Therefore DispatchIndirect is not useful.

DrawIndirect doesn't help me at all, because it's a draw call.
 
Why not write the size out to an indirect buffer during the stream out? Then call the compute shader with an indirect dispatch.

During a stream out you can easily add an atomic add to compute the size yourself and output [N, 1, 0, 0] to the indirect instance buffer (or the compute version of it).

I have added a debug line drawer to compute shaders and CUDA this way: an atomic updates the number of lines whilst streaming out vertex data (note that I don't do it using the stream output API, so maybe I've overlooked something?)
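
A minimal CPU-side sketch of the counting idea above, not actual shader code: `std::atomic` stands in for the GPU atomic add, and the names `AppendTarget`, `appendElement` and `makeDrawArgs` are made up for illustration. Each writer reserves a slot with an atomic add, and the final count is turned into the [N, 1, 0, 0] argument layout mentioned above.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GPU output buffer plus an element counter.
struct AppendTarget {
    std::vector<std::uint32_t> data;       // preallocated storage
    std::atomic<std::uint32_t> count{0};   // number of slots reserved so far

    explicit AppendTarget(std::size_t capacity) : data(capacity) {}
};

// Reserve one slot with an atomic add, then write into it.
// Safe to call from several threads, because each caller gets a unique slot.
// Returns false (and drops the element) when the buffer is full.
inline bool appendElement(AppendTarget& target, std::uint32_t value) {
    const std::uint32_t slot =
        target.count.fetch_add(1, std::memory_order_relaxed);
    if (slot >= target.data.size()) {
        return false;
    }
    target.data[slot] = value;
    return true;
}

// Build the [N, 1, 0, 0] argument block. The counter may have run past the
// capacity if elements were dropped, so it is clamped to what was stored.
inline std::array<std::uint32_t, 4> makeDrawArgs(const AppendTarget& target) {
    std::uint32_t n = target.count.load(std::memory_order_relaxed);
    const std::uint32_t capacity =
        static_cast<std::uint32_t>(target.data.size());
    if (n > capacity) {
        n = capacity;
    }
    return std::array<std::uint32_t, 4>{{n, 1u, 0u, 0u}};
}
```

Appending five values to an `AppendTarget` of capacity 8 and then calling `makeDrawArgs` gives {5, 1, 0, 0}. The clamp matters because the counter keeps incrementing even when an element is dropped.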

Deano
 
However, can you use another stream or a pixel shader to output the count? Maybe a prior CS to convert it into an indirect buffer?

Read that from the CS for length.
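
A rough sketch of what such a conversion step would compute, written as plain C++ rather than as a compute shader. `makeDispatchArgs` and its parameters are hypothetical names. The only API fact assumed is that DispatchIndirect reads three 32-bit unsigned thread group counts (X, Y, Z) from the argument buffer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Turn an element count into thread group counts for an indirect dispatch.
// groupSize is the number of elements one thread group processes
// (an assumption of this sketch; it would match the shader's numthreads).
inline std::array<std::uint32_t, 3> makeDispatchArgs(std::uint32_t elementCount,
                                                     std::uint32_t groupSize) {
    assert(groupSize > 0);
    // Round up so a partially filled last group is still dispatched.
    const std::uint32_t groups =
        elementCount / groupSize + (elementCount % groupSize != 0 ? 1u : 0u);
    return std::array<std::uint32_t, 3>{{groups, 1u, 1u}};
}
```

For 130 elements and a group size of 64 this yields {3, 1, 1}. Because the last group is only partly filled, the consuming shader still needs the raw element count to skip the unused threads.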
 
I can't count primitives in the pixel shader, because the pixel shader runs for each pixel generated by the rasterizer stage.

Also I want to avoid using the pixel shader, because just binding it and rendering nothing slows down things considerably.
 
Hmm tough problem. Here's a crazy question... is this output something that you could easily generate in a compute shader yourself, or are you using tessellation or something in the fixed-function pipeline to generate the triangle stream?

There are of course some hacks you could do, with various performance impacts. For instance, instead of emitting the primitive, you could probably have your GS write the primitive data to attributes and emit a single point primitive somewhere on the screen, then have the pixel shader append the triangle to an Append buffer (rather than a stream out buffer) and disable color writes. Your render target would be irrelevant and could probably just be 1x1. This definitely isn't 100% efficient but might work...
 
Hmm tough problem. Here's a crazy question... is this output something that you could easily generate in a compute shader yourself, or are you using tessellation or something in the fixed-function pipeline to generate the triangle stream
I would like to use the tessellation unit. Emulating that in a compute shader would certainly be possible, but not very performant. Using the pixel shader is costly as I said, but I haven't yet tried to disable color writes. I doubt that will make much of a difference, though.

One thing I could try is to use D3D11_QUERY_SO_STATISTICS and double or triple buffering to hide the latency.
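
A small sketch of the buffering part of that idea, assuming one result slot per frame in flight. `DelayedCounter` is a hypothetical helper, and plain integers stand in for the query results. In real D3D11 code each slot would hold its own `ID3D11Query`, and the value would come from `ID3D11DeviceContext::GetData` as the `NumPrimitivesWritten` field of `D3D11_QUERY_DATA_SO_STATISTICS`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ring of per-frame results that are read back kFrames frames late, so the
// CPU never waits on a result the GPU may still be producing.
template <std::size_t kFrames>
class DelayedCounter {
public:
    // Call once per frame with the value produced this frame.
    // Returns true and fills oldValue with the value stored kFrames frames
    // ago once the ring has filled up; returns false during warm-up.
    bool advance(std::uint64_t newValue, std::uint64_t& oldValue) {
        const std::size_t slot = static_cast<std::size_t>(frame_ % kFrames);
        const bool ready = frame_ >= kFrames;
        if (ready) {
            oldValue = slots_[slot];  // oldest entry, about to be overwritten
        }
        slots_[slot] = newValue;
        ++frame_;
        return ready;
    }

private:
    std::array<std::uint64_t, kFrames> slots_{};
    std::uint64_t frame_ = 0;
};
```

With `DelayedCounter<3>`, the first three calls to `advance` return false, and the fourth returns the value from the first frame. The count that comes back describes data that is several frames old, so the matching stream output buffers would have to be kept alive just as long.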
 
Using the pixel shader is costly as I said, but I haven't yet tried to disable color writes. I doubt that will make much of a difference, though.
Well, make sure to render point primitives - one per polygon that you want to output - and disable color writes. The key here is to just invoke a pixel shader 1:1 with the geometry shader (and try to avoid the rasterizer as much as possible) so you can use UAVs. Silly, and not really the best performance, but it should work.
 
As I said, just by enabling the pixel shader I get a performance drop from 33 fps to 25 fps. But perhaps I have to live with that.
 