There's been a modest stream of LLVM changes coming out, and a few curious benchmark database entries.
I think the acronym is RDNA, at least officially. In the code changes themselves, I haven't run into any mention of RDNA, despite many GCN references and flags shared with GCN GPUs, including many that line GFX10 up with older GCN architectures.
Perhaps the omission is for secrecy, or the RDNA label simply isn't used by some of the staff responsible for supporting the architecture, whether because it wasn't communicated to them or for other reasons.
I'm not 100% certain I'm reading the autotranslated text totally right, and I'm not ruling out a possible change like dual-issue or a difference in issue latency.
However, based on my (non-authoritative) interpretation of what's being said, I think the changes are being misconstrued.
There is a potential difference in how workgroups allocate their wavefronts: in one mode, a workgroup can have wavefronts on more than one CU. That has implications for barriers, which only had to be supported within a CU when workgroups were limited to one CU each. The memory comments seem concerned with visibility/ordering of workgroup memory accesses in the event that wavefronts are no longer reading/writing through one CU's local cache. This seems of higher importance given that the code those changes were made to deals with synchronization and writes to possibly shared global memory.
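As a loose CPU-side sketch of why that matters (an analogy only, none of these names come from the LLVM changes), two threads can stand in for wavefronts of one workgroup that land on different CUs: the barrier orders execution, and the extra memory work the comments describe corresponds to making the pre-barrier write visible through a level both sides can see.

```python
import threading

# Hypothetical analogy, not actual GPU code: two threads stand in for
# wavefronts of one workgroup that may end up on different CUs. The
# barrier stands in for s_barrier; in hardware, the visibility concern
# in the comments is that the write below must also be observable
# through a cache level shared by both CUs, not just the writer's own.

shared = {"data": None}          # stands in for a global-memory location
barrier = threading.Barrier(2)   # workgroup-wide barrier, two "wavefronts"

def producer():
    shared["data"] = 42          # write to "global" memory
    barrier.wait()               # workgroup barrier after the write

def consumer(out):
    barrier.wait()               # wait for the producing wavefront
    out.append(shared["data"])   # safe only if the write is visible here

result = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(result,))
t1.start(); t2.start()
t1.join(); t2.join()
print(result[0])   # 42: the barrier made the write visible
```

On a CPU the barrier alone gives this guarantee; the point of the LLVM memory comments, as I read them, is that split-CU workgroups would need equivalent cache maintenance to get it.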
There may be something new about this L0: there is a new bit for coherence, more active discussion of invalidations versus the write-through GCN L1, and a new memory counter. What specifically the L0 is versus the L1 in prior generations isn't clear, since the vector memory path in current GCN already orders accesses within a wavefront, at least.
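To illustrate that distinction in a toy model (entirely hypothetical, generic cache behavior rather than the actual hardware): a write-through cache pushes every write to the shared level immediately, so readers only need to miss to see new data, while a cache that holds lines until explicitly invalidated can keep serving stale data until that invalidate happens.

```python
# Toy model, not the real hardware: contrast a write-through L1 with a
# cache whose lines persist until an explicit invalidate, which is the
# kind of operation the new invalidation discussion would matter for.

class SharedL2:
    """Shared level visible to all 'CUs'."""
    def __init__(self):
        self.mem = {}

class WriteThroughL1:
    """GCN-style behavior: every write passes straight to the shared level."""
    def __init__(self, l2):
        self.l2 = l2
        self.lines = {}
    def write(self, addr, val):
        self.lines[addr] = val
        self.l2.mem[addr] = val          # visible at the shared level right away

class InvalidatingCache:
    """Hypothetical behavior: filled lines stay until explicitly invalidated."""
    def __init__(self, l2):
        self.l2 = l2
        self.lines = {}
    def read(self, addr):
        if addr not in self.lines:       # miss: fill from the shared level
            self.lines[addr] = self.l2.mem.get(addr)
        return self.lines[addr]
    def invalidate(self):
        self.lines.clear()               # what an explicit invalidate op would do

l2 = SharedL2()
writer = WriteThroughL1(l2)              # one "CU" writing
reader = InvalidatingCache(l2)           # another "CU" reading

reader.read(0x10)            # caches the (empty) line
writer.write(0x10, 7)        # write-through: L2 is updated immediately
print(reader.read(0x10))    # stale hit: None, the old cached line
reader.invalidate()
print(reader.read(0x10))    # refill from L2: 7
```

The stale read in the middle is the failure mode that explicit invalidates (and, presumably, the new memory counter for tracking outstanding operations) would exist to prevent.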
I don't think it's the same as the patents' register cache, which is local to a SIMD/cluster and on the wrong end of the memory pipeline to be of any concern for other CUs or wavefronts.
There is a single reference to a register destination cache in https://github.com/llvm-mirror/llvm...3380939#diff-ad4812397731e1d4ff6992207b4d38fa, which is a different file with a different purpose.
There's some discussion of code comments for the buggy prefetch instruction, and some discussion of the size of either the vector cache or L0, both of which may be red herrings. For one thing, the prefetch and I$ comments deal with instruction fetch, which is not subject to synchronization operations or atomic writes; ordering concerns between CUs for static code seem unnecessary. Claims that the size of the destination cache somehow matches a workgroup don't seem supported by what I have read, and I don't think they would be consistent with the patents--I may be misreading the translation, though.