Significantly indeed: at ×64, Tonga goes from ~10 FPS to ~40 FPS, which isn't very obvious since this is on a log scale. I wonder what's going on in the chip.
If I recall correctly, there was a possible bottleneck past modest factors, once the number of domain shader wavefronts becomes large relative to the wavefronts needed at the vertex or hull stages. Jawed posited that DS wavefronts were being locked to the same CUs as the HS wavefronts feeding them input, and the Xbox One SDK mentions this scenario as well. Past that point, even though the workload would strongly benefit from more DS wavefronts, the occupancy of those CUs puts a ceiling on the domain shader portion.
The indicated alternative was an off-chip mode that streamed to memory, which could break the restriction that domain shaders had to run on the increasingly cramped CUs the originating hull and vertex shaders ran on, at the cost of bandwidth and latency.
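To put rough numbers on the on-chip ceiling versus the off-chip alternative described above, here is a toy occupancy model. Everything in it is an assumption for illustration: the function names, the CU and slot counts, the HS wave count, and the idea that a uniform integer factor f on a quad domain yields (f + 1)^2 domain points per patch. It ignores LDS, registers, bandwidth and latency entirely, so it only shows the shape of the argument, not how the hardware actually schedules.

```python
import math

WAVE_SIZE = 64  # threads per GCN wavefront


def ds_waves_needed(patches, factor):
    """DS wavefronts demanded, assuming (factor + 1)^2 domain points per quad patch."""
    verts_per_patch = (factor + 1) ** 2
    return math.ceil(patches * verts_per_patch / WAVE_SIZE)


def ds_wave_ceiling(num_cus, slots_per_cu, hs_waves, on_chip):
    """Upper bound on resident DS wavefronts in this toy model.

    Assumes HS waves land one per CU until CUs run out. On-chip, DS waves
    may only use leftover slots on CUs that host an HS wave; off-chip,
    they may use leftover slots on any CU.
    """
    cus_with_hs = min(num_cus, hs_waves)
    usable_cus = cus_with_hs if on_chip else num_cus
    return max(0, usable_cus * slots_per_cu - hs_waves)


if __name__ == "__main__":
    # Hypothetical configuration, not taken from any real part or benchmark.
    num_cus, slots_per_cu, hs_waves, patches = 32, 10, 4, 64
    for factor in (1, 4, 16, 64):
        need = ds_waves_needed(patches, factor)
        on = ds_wave_ceiling(num_cus, slots_per_cu, hs_waves, on_chip=True)
        off = ds_wave_ceiling(num_cus, slots_per_cu, hs_waves, on_chip=False)
        print(f"x{factor}: need {need} DS waves, on-chip cap {on}, off-chip cap {off}")
```

With these made-up numbers the on-chip cap stays fixed while DS demand grows roughly with the square of the factor, which is the kind of ceiling being described. The off-chip cap is higher, but the model says nothing about the bandwidth and latency paid to get there.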
The lower factors in the graph may reflect the scenario where GCN's preferred on-chip method can still be used, or is most likely to be effective.
I'm not sure I would say the driver changes just flipped the switch between on-chip and off-chip, since there are gaps in performance at the lower factors that could be where bandwidth is affecting performance.
If the off-chip mode is on, then the 16x mark might be the threshold where the SDK said devs should benchmark that mode, since it might not help.
It doesn't seem to be a purely memory or CU-based problem, since Tonga without the beta drivers degrades somewhat like Hawaii.
That hump at the later part of the Hawaii curve might be where the GPU gets out of the range where the memory subsystem's latency or thrashing is a factor, due to the amount of data or the GPU/driver giving up on its optimization efforts. Then its raw bandwidth and CU count advantage over Tonga take over, at a markedly lower level.
Perhaps Tonga at 5.15 has changed its algorithm. I'm drawn to the idea that an architectural change in Carrizo for preemption might mean that a similar feature could be in newer IP like Tonga. Rather than fiddle with buffering at high factors, maybe Tonga has synched the stages better, or can pause the stages feeding DS launch before they can spill or thrash.