RedditUserB
Newcomer
That would be the case for the results we're getting with MDolenc's tests, yes. (BTW that's a "highway lanes" analogy, not a cars analogy.)
Basically, if (graphics+compute) time = (graphics time) + (compute time), then at least with this code the hardware isn't running Async Compute.
And that's what we're seeing with both Kepler and Maxwell 1 (which do not support Async Compute by nVidia's own spec) and with Maxwell 2.
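To make that interpretation rule concrete, here's a minimal C++ sketch of how the three measured timings could be classified. This is not part of MDolenc's benchmark; the function name, the numbers and the 10% tolerance are my own, purely to illustrate the sum-vs-max comparison:

```cpp
#include <cstdio>
#include <algorithm>

// Classify whether graphics and compute overlapped, given three measured
// timings in milliseconds. The 10% tolerance is arbitrary, for illustration only.
const char* ClassifyOverlap(double graphicsMs, double computeMs, double combinedMs)
{
    const double serialSum = graphicsMs + computeMs;          // no overlap at all
    const double idealMax  = std::max(graphicsMs, computeMs); // perfect overlap

    if (combinedMs >= 0.9 * serialSum)
        return "serialized: no Async Compute observed";
    if (combinedMs <= 1.1 * idealMax)
        return "overlapped: consistent with Async Compute";
    return "partial overlap";
}

int main()
{
    // Hypothetical numbers in the spirit of the results being discussed.
    std::printf("%s\n", ClassifyOverlap(10.0, 8.0, 18.1)); // ~= sum -> serialized
    std::printf("%s\n", ClassifyOverlap(10.0, 8.0, 10.4)); // ~= max -> overlapped
    return 0;
}
```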
As far as I can see, there are 3 very odd things with the results so far:
1 - Maxwell 2 isn't doing Async Compute in this test. Pretty much all results are showing that.
Razor1 pointed to someone with two Titan Xs seemingly being able to do Async, but it seems the driver is just cleverly sending the render to one card and the compute to the other (which, for PhysX, is something you could toggle in the driver since G80, so that capability has been there for many years). Of course, if you're using two Maxwell cards in SLI in the typical Alternate Frame Rendering mode, this "feature" will be useless because both cards are rendering. The same goes for a VR implementation where each card renders one eye.
2 - Forcing "no Async" in the test (single command queue) is making nVidia chips serialize everything. This means the last test with rendering + 512 kernels will take the render time + 512 x (compute time of one kernel). That's why the test times end up ballooning, which eventually crashes the display driver (see the sketch after this list).
3 - Forcing "no Async" is making GCN 1.1 chips do some very weird stuff (perhaps the driver is recognizing a pattern and skipping some calculations, as suggested before?). GCN 1.0, like Tahiti in the 7950, is behaving like it "should": (compute[n] + render) time = compute[n] time + render time.
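For points 2 and 3, here is a rough back-of-the-envelope model (all timings are made up for illustration, not measured) of why the forced single-queue numbers balloon on nVidia while GCN 1.0 stays close to render time + compute time:

```cpp
#include <cstdio>

int main()
{
    const double renderMs           = 10.0; // hypothetical render time per frame
    const double computeBatchMs     = 5.0;  // hypothetical stand-in for compute[n] time, kept flat purely for illustration
    const double computePerKernelMs = 0.5;  // hypothetical time of one kernel when fully serialized

    for (int kernels = 64; kernels <= 512; kernels *= 2)
    {
        // Point 3: GCN 1.0 behaving as it "should": render time + compute[n] time.
        const double gcnMs        = renderMs + computeBatchMs;
        // Point 2: nVidia forced to a single queue, serializing per kernel:
        // render time + n * (compute time of one kernel).
        const double serializedMs = renderMs + kernels * computePerKernelMs;

        std::printf("%3d kernels: GCN-style %.1f ms, per-kernel serialized %.1f ms\n",
                    kernels, gcnMs, serializedMs);
    }
    return 0;
}
```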
MDolenc, any thoughts? Are we interpreting functional Async Compute correctly?