The vector destination cache mentioned in the second sentence is potentially related to the destination register cache in the LLVM changes. A similar destination cache at the output of the ALUs is mentioned in the so-called super-SIMD concept (http://www.freepatentsonline.com/y2018/0121386.html).
What's curious to me is that both have diagrams of a representative SIMD based on the "old" architecture, which has multiple register banks. So why GFX10 would be labelled as having a banked register file even if it implements something like those patents is unclear, unless something else can somehow alter the throughput of those banks in a way that a shader can detect. (Also unclear is how different the "old" operand network is from the new one.)
The super-SIMD patent labels each bank as belonging to one of the rows of a wavefront, and those rows correspond to different cycles in the cadence. That patent lists the register file as being multiple banks, with registers 0 through N in each bank.
The register file patent, on the other hand, numbers the registers more as a global count (V0,V4,V8 in bank 0, V1,V5,V9 in bank 1, and so on).
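To make the two numbering schemes concrete, here is a rough sketch of how a global register number could map onto a bank and a per-bank index. The four-bank count is my assumption, inferred only from the V0/V4/V8 and V1/V5/V9 stride, and the function names are made up for illustration; none of this is taken from the patents or the LLVM changes.

```python
NUM_BANKS = 4  # assumption: inferred from the V0, V4, V8 / V1, V5, V9 stride


def global_to_banked(v, num_banks=NUM_BANKS):
    """Map a global register number (V<v>) to (bank, index within bank)."""
    return v % num_banks, v // num_banks


def banked_to_global(bank, index, num_banks=NUM_BANKS):
    """Inverse mapping: per-bank index back to the global register number."""
    return index * num_banks + bank


if __name__ == "__main__":
    # List the first three registers of each bank under the global numbering.
    for bank in range(NUM_BANKS):
        regs = [banked_to_global(bank, i) for i in range(3)]
        print(f"bank {bank}: " + ", ".join(f"V{r}" for r in regs))
```

Under this reading, the "registers 0 through N in each bank" labelling would simply be the per-bank index, and the two patents would differ only in which numbering they choose to show.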
That can come about when designers who specialize in specific subsections of the architecture see things in terms of their chosen specialty, so the patents could be using different language for the same thing. It's also possible that the different emphases lose correctness for the parts outside the scope of the individual design element, or that the two are not describing exactly the same embodiments. How the register cache is banked, and how it is connected to the operand network, is not rendered to the same depth in both.
Even so, I'm not sure what in GFX10 would make this worthy of an external target flag unless there's some specific combination of claims or omissions from one or both that makes GFX10 act differently in practice.
late edit:
Also, the register file patent does indicate the destination cache is banked, but going by my understanding it is banked in a way that should line up with the established cadence.
There is a new source of stalls possible with the cache: the ALU may be unable to allocate an entry in the cycle its output is to be written out. That is new, and might be part of the +1 latency mentioned in the GFX10 changes, but what it takes to get that kind of stall and whether it is affected by banking is not clear.