It isn't bad, but it might not be great.
A later post stated that garnering the benefits would require software written to take advantage of it. Perhaps the first statement would have been better worded as saying that current or existing game engines would not benefit, which in terms of timing is the same as saying "modern", but without the connotation that what current engines do is the optimal solution.
Code targeting existing hardware would be paranoid about any method whose synchronization might occur out of step with the SIMD hardware. As noted by Nvidia, one way to prevent problems is to use very coarse locking over the whole structure, even if contention is likely to be rare. Transforming the structures or algorithms so that they bulk up to match the SIMD width can constrain storage or bandwidth, and may be impractical to implement.
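As a rough sketch of the coarse approach (my own CUDA illustration; g_lock, coarse_push, and the queue layout are hypothetical, not anything Nvidia published), the single lock covers the whole structure, and the acquire is written in the retry form that avoids the classic intra-warp spinlock hang on pre-Volta parts:

```cuda
// One lock for the entire queue: serializes everything, but the
// divergence hazard is confined to this single well-known spot.
__device__ int g_lock = 0;  // 0 = free, 1 = held (hypothetical)

__device__ void coarse_push(int *queue, int *count, int value)
{
    bool done = false;
    while (!done) {
        // Only the lane that wins the CAS enters the critical section,
        // and it releases the lock before the paths reconverge. The
        // tempting alternative -- spinning in "while (atomicCAS(...) != 0) {}"
        // with the critical section after the loop -- can hang a pre-Volta
        // warp: the winner waits at reconvergence while the losers spin.
        if (atomicCAS(&g_lock, 0, 1) == 0) {
            queue[*count] = value;   // critical section over the whole structure
            *count += 1;
            __threadfence();         // make the writes visible before release
            atomicExch(&g_lock, 0);  // release while still inside this path
            done = true;
        }
    }
}
```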
Depending on how Volta's selection logic works, there may also be some side benefits in a divergence case, if each path is allowed to issue long-latency instructions like memory accesses. That could overlap some memory latency, reducing some of the cost of occasional divergence at the price of the paths contending for cache capacity and cache bandwidth (which Nvidia significantly increased).
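As a minimal sketch of that speculation (a hypothetical kernel of my own, not documented Volta behavior): if the scheduler can interleave the two sides of the branch, the gather issued by one path could still be in flight while the other path issues its own, with both paths' cache lines then live at the same time.

```cuda
__global__ void divergent_gather(const int *idx_a, const int *idx_b,
                                 const float *data, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v;
    if (i & 1) {
        // Dependent gather: a long-latency load on this path...
        v = data[idx_a[i]] * 2.0f;
    } else {
        // ...which could overlap with this path's load if the scheduler
        // interleaves the paths; pre-Volta, the second path's load waits
        // until the first path finishes.
        v = data[idx_b[i]] + 1.0f;
    }
    out[i] = v;
}
```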
I recall synchronization in the presence of divergent control flow within a wavefront being cautioned against. The specific choices each architecture makes about which path is selected to be active can turn certain synchronization patterns into deadlocks, so long as the active path never explicitly reaches a point where the execution mask is switched over to the other lanes whose data it needs to make meaningful progress.

In regards to the deadlock, to my understanding, a branch would run ahead and create an infinite loop. On GCN, for example, the scheduler would be selecting waves based on some metric.
An infinite loop should bias that selection rather quickly. So it may be slow, but it shouldn't deadlock.
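For reference, the cautioned-against pattern is roughly the following (a minimal CUDA sketch of my own, assuming a single 32-thread block; the same hazard applies within a GCN wavefront). Whether it hangs depends entirely on which side of the branch the hardware chooses to keep active:

```cuda
__global__ void intra_warp_handoff()
{
    __shared__ volatile int flag;
    if (threadIdx.x == 0) flag = 0;
    __syncthreads();

    if (threadIdx.x == 0) {
        flag = 1;                 // producer path: one lane sets the flag
    } else {
        // Consumer path: spins until the flag is set. If this is the
        // path selected to run ahead, the execution mask never switches
        // back to lane 0, the flag is never set, and the loop spins
        // forever -- the deadlock described above.
        while (flag == 0) { }
    }
}
```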
If this is positing that the existing scalar unit can do this automatically, ascribing intelligence of that sort to existing hardware seems overly optimistic.

Excluding a condition where nothing could advance, which really is a software issue. The scalar unit might be able to detect and resolve those issues as well during runtime.
If this is some hypothetical:
If done in software, that goes to the point of what kind of abstraction the vendors supply. Nvidia is attempting to seal some of the leaks in the SIMT abstraction. GCN at a low level has backed away from that abstraction somewhat, but AMD doesn't seem to have forsworn the convenience+glass jaw at all levels.
Even if done in software, the exact methods may be impractical to apply throughout loops, or may require more complex analysis of the code, since there may be built-in assumptions about whether the other paths do or do not make forward progress while the current path is active.
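To make that concrete, here is the earlier retry sketch once it has to live inside a loop (again my own hypothetical CUDA, not a vendor-documented method). The retry loop now nests inside the algorithm's own divergent loop, so the transform must be applied at every locking site, and the analysis must confirm that no path holds the lock while implicitly assuming the other side of a branch is making forward progress:

```cuda
__device__ int g_lock2 = 0;  // hypothetical per-structure lock

__global__ void per_iteration_update(int *counts, const int *work, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Divergent trip count: each lane loops a different number of times,
    // so the set of lanes contending for the lock changes per iteration.
    for (int k = 0; k < work[i]; ++k) {
        bool done = false;
        while (!done) {                          // retry loop nested inside
            if (atomicCAS(&g_lock2, 0, 1) == 0) // the algorithm's own loop
            {
                counts[k % n] += 1;              // critical section
                __threadfence();
                atomicExch(&g_lock2, 0);         // release before reconvergence
                done = true;
            }
        }
    }
}
```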