Jawed
Maybe there's some data available on the performance of Star Swarm with various NVidia drivers, so we could see if there's been a change in performance. Either NVidia is very good at doing what Oxide is doing, or its driver's inherent overhead is low enough to leave that much leeway for the analysis, or some combination of the two.
I have no idea why Star Swarm uses so many draw calls. It may be separating work into draw calls for no good reason, and NVidia has simply tuned for that. It could also be that there are certain kinds of parallelism in NVidia's GPU state handling (e.g. a simple ping-pong state-change model, where a state change can be set up in hardware across the chip while work for an existing state is still under way, and a simple flip then cuts over "instantaneously" to the new state) that enable the GPU to move the bottleneck deeper, beyond the CP.
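Just to make the ping-pong idea concrete, here's a minimal software sketch of what I mean, not a claim about how NVidia's hardware actually works; all the names and fields are made up. The command processor fills a shadow copy of the state block while draws against the active copy are still in flight, and a single index flip makes the new state visible to all new work:

```cpp
#include <array>
#include <atomic>
#include <cstdint>

struct GpuState {           // hypothetical bundle of pipeline registers
    uint32_t blendMode;
    uint32_t rasterFlags;
    uint32_t shaderId;
};

class PingPongState {
    std::array<GpuState, 2> slots_{};   // two full copies of the state block
    std::atomic<uint32_t>   active_{0}; // index of the copy draws are using

public:
    // Command processor: write the *inactive* copy while work using the
    // active copy is still under way.
    GpuState& shadow() {
        return slots_[active_.load(std::memory_order_relaxed) ^ 1u];
    }

    // Front end / shader array: read whichever copy is currently active.
    const GpuState& current() const {
        return slots_[active_.load(std::memory_order_acquire)];
    }

    // The "flip": one cheap update cuts all new work over to the new state,
    // so the state change costs almost nothing on the critical path.
    void flip() { active_.fetch_xor(1u, std::memory_order_release); }
};
```

If something like this exists per state block, the cost of a state change mostly disappears from the CP's point of view, which would fit the behaviour being speculated about here.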
Perhaps NVidia has a near-stateless architecture, such that most work is distributed with piecemeal "state" solely for its own use? Not just bindless resources, but "bindless state".
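In the same spirit, a hedged sketch of what "bindless state" could look like, again purely illustrative and with invented names: instead of mutating a single bound context, each draw packet carries a handle to an immutable, self-contained state record, so any unit can pick up work together with exactly the state it needs.

```cpp
#include <cstdint>
#include <vector>

struct StateRecord {          // immutable once written, lives in GPU memory
    uint64_t psoAddress;      // pipeline/shader configuration
    uint64_t descriptorBase;  // bindless resource table
    uint32_t rasterFlags;
};

struct DrawPacket {           // each draw references its own state record,
    uint32_t stateIndex;      // so no global "bind" step serializes the queue
    uint32_t vertexCount;
    uint32_t instanceCount;
};

// Any execution unit can process a packet independently, fetching its state
// piecemeal, with no dependence on previously "bound" context.
void execute(const std::vector<StateRecord>& states, const DrawPacket& p) {
    const StateRecord& s = states[p.stateIndex];
    // ... configure the local pipeline from s and issue the draw
    (void)s;
}
```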