What is it polling? Something that resides in 300ns memory, something in the FIFO or in the post-transform path that feeds into the primitive assembly pipeline?
It wouldn't retroactively decide, but poll input from prior geometry as one possible test.
The specific reference in the presentation linked earlier was the performance penalty of an indirect draw that was empty. Just the initial read was a memory operation, and the single command processor's limited concurrency means the system at large can feel the effects of more than a few such requests.
There would be no primitive to evaluate in order to poll anything, and even in the conventional pipeline the system wouldn't get far enough to consider polling anything.
That scenario is not served well by a primitive shader, which runs in the middle of the vertex processing stages. The empty draw's primitive shader cannot poll anything since that initial step on the part of the command processor occurs before the primitive shader's instance comes into existence--and it won't because there's nothing to invoke it for. The ballot process that removes empty draws happens in a prior separate shader call, and while I've seen AMD discussing culling possibilities for triangles, I have not seen it promise overriding or replacing developer-coded draw call submission.
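To make the ordering concrete, here is a minimal CPU-side sketch of what a separate pass that drops empty draws amounts to. It is an illustration under assumptions, not the Frostbite code or anything AMD has documented: `DrawArgs` and `compact_draws` are hypothetical names, and "empty" is assumed to mean zero indices or zero instances.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DrawArgs:
    """Hypothetical indirect draw record; field names are illustrative only."""
    index_count: int
    instance_count: int
    first_index: int = 0


def compact_draws(draws: List[DrawArgs]) -> List[DrawArgs]:
    """Keep only draws that would produce at least one vertex invocation.

    This stands in for the earlier, separate pass: it runs before any
    per-draw shader exists, so the surviving list is all the front end sees.
    """
    return [d for d in draws if d.index_count > 0 and d.instance_count > 0]


draws = [
    DrawArgs(index_count=36, instance_count=1, first_index=0),
    DrawArgs(index_count=0, instance_count=1, first_index=36),   # empty: no indices
    DrawArgs(index_count=12, instance_count=0, first_index=36),  # empty: no instances
    DrawArgs(index_count=6, instance_count=2, first_index=48),
]
kept = compact_draws(draws)
```

The point of the sketch is only the ordering: the filtering happens on the argument list itself, before there is any primitive for a later shader stage to look at.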
For an engine that already does much the same thing as promised by primitive shaders, this may be a case where primitive shaders do not help as much or could be detrimental if the duplication of work or contention with the ongoing compute shaders is significant.
Is the inspiration for your statement concerning 300 instructions per triangle from the Frostbite presentation, or was it pulled from something else?
What exactly would that entail, for a general programmable shader in the GCN ISA? If this starts invoking reads to caches or buffers, the prospect of misses (latency), conservative hierarchies (inaccuracy), or timing issues comes into play.
Move Z culling into the primitive testing at a per bin resolution.
My understanding of the claim is that this is all a big programmable shader--the mechanism is presumably "code something".
There would also be some mechanism to analyze or reduce the bins.
Is this what you're suggesting would be a good idea, or what you are suggesting AMD is doing?
That's roughly what I'm suggesting. All of that running on a CU until satisfied with whatever bins were created. A single draw call possibly instantiating only one wavefront for establishing bins.
I think a quick skim of the primitive shader section of the Vega whitepaper would serve to counter the latter.
As for the former, I would have questions about one big shader tracking work across at least three points where the architecture hands off between concurrent hardware units, with amplification/decimation possibilities and a FIFO.
Interesting patents found:
https://www.google.de/patents/US20140362102?dq=ininventor:"Michael+Mantor"&hl=en&sa=X&ved=0ahUKEwiRw9np97jXAhUM9YMKHYs2CjkQ6AEITDAF
https://www.google.de/patents/US20160371873?dq=ininventor:"Michael+Mantor"&hl=en&sa=X&ved=0ahUKEwiRw9np97jXAhUM9YMKHYs2CjkQ6AEILjAB
Interesting:
https://www.google.de/patents/EP3008701A1?cl=en&dq=ininventor:"David+Simpson"&hl=en&sa=X&ved=0ahUKEwim0rv_-bjXAhVMkJAKHXnPCScQ6AEIvQIwJg
Maybe this is the primitive shader?
The patents not covering the binning rasterizer cover using compute as a way to submit geometry to the front end, which in recent times has also shown up as a graphics API extension.
One discussed variation that includes culling and compute+vertex pairs linked by ring buffers is quite close to what Mark Cerny publicly described for the PS4's triangle sieve customization (note: that culling was not a guaranteed benefit).
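As a rough picture of what the culling half of such a compute+vertex pair does, here is a small CPU-side sketch. It is an assumption-laden illustration, not the PS4 path or the patent's method: `signed_area2` and `sieve_triangles` are hypothetical names, positions are assumed to be already projected to 2D, and counter-clockwise winding is assumed to be front-facing.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle (a, b, c) in 2D screen space."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def sieve_triangles(positions, indices):
    """Return a compacted index list with back-facing and zero-area triangles removed.

    Assumes a triangle list and counter-clockwise front faces. The position-only
    pass writes the surviving indices; a later full vertex pass would read them.
    """
    kept = []
    for i in range(0, len(indices) - 2, 3):
        i0, i1, i2 = indices[i:i + 3]
        if signed_area2(positions[i0], positions[i1], positions[i2]) > 0:
            kept.extend((i0, i1, i2))
    return kept


positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
indices = [
    0, 1, 2,  # counter-clockwise: kept
    2, 1, 0,  # clockwise: culled
    0, 0, 1,  # degenerate: culled
    1, 3, 2,  # counter-clockwise: kept
]
surviving = sieve_triangles(positions, indices)
```

Note the cost side of this: every triangle pays for the position transform and the test whether or not it is culled, which is consistent with the caveat that the culling is not a guaranteed win.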
What AMD is promising for primitive shaders is not something new or necessarily limited to AMD's architectures. That other patents from other companies have been cited makes it probable that a few other parties have similar patents or concepts based on them.
Primitive shaders currently are more of a tweak in where AMD intends to place the culling in the abstract graphics pipeline and whether formerly separate shaders are combined into one shader.
Perhaps some of the difficulty getting primitive shaders rolled out is baked into the age of some of this discussion. Some of Vega's features have a documented lineage that goes back to the early days before some of the more advanced techniques had the chance to be developed and deployed in various engines, and the apparent lag in Vega's IP features getting to market may have meant the workloads it was meant for have moved to a point that its gains are less impressive.