This VOPD-related revision looks juicy:
reviews.llvm.org
PostRA sounds like code that's specific to handling ray accelerator query results, i.e. hits and misses, but I can't actually find any code that's specifically related to that. More likely it's just LLVM's usual shorthand for "post register allocation". Work-in-progress code...
I get the sense that this is a well-known technique:
Code:
/// Adapts design from MacroFusion
/// Puts valid candidate instructions back-to-back so they can easily
/// be turned into VOPD instructions
/// Greedily pairs instruction candidates. O(n^2) algorithm.
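To make the "greedily pairs instruction candidates, O(n^2)" comment concrete, here is a minimal sketch of that kind of pairing loop. It is not the LLVM implementation: `Inst`, `canBeX`, `canBeY`, `greedyPair` and the `compatible` callback are all hypothetical names invented for illustration, and the sketch ignores the data dependencies that a real scheduler mutation has to respect.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct Inst {
  int id;       // position in the original instruction order
  bool canBeX;  // assumed: eligible for the first slot of a pair
  bool canBeY;  // assumed: eligible for the second slot of a pair
};

// Greedy O(n^2) pairing: each unpaired first-slot candidate takes the first
// later unpaired second-slot candidate that the caller's predicate accepts.
// No backtracking, so the result is not guaranteed to be a maximum matching.
std::vector<std::pair<int, int>>
greedyPair(const std::vector<Inst> &insts,
           const std::function<bool(const Inst &, const Inst &)> &compatible) {
  std::vector<bool> used(insts.size(), false);
  std::vector<std::pair<int, int>> pairs;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    if (used[i] || !insts[i].canBeX)
      continue;
    for (std::size_t j = i + 1; j < insts.size(); ++j) {
      if (used[j] || !insts[j].canBeY)
        continue;
      if (!compatible(insts[i], insts[j]))
        continue;
      used[i] = true;
      used[j] = true;
      pairs.emplace_back(insts[i].id, insts[j].id);
      break; // greedy: accept the first match and move on
    }
  }
  return pairs;
}
```

The nested loops are where the quadratic cost comes from. Accepting the first match keeps the pass simple, at the price of possibly missing pairings that a smarter matcher would find.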
The other notable thing about this revision is that it has no concept of the destination operand cache (Do$) that we've talked about as intrinsic to Super SIMD. The checks that sources come from different register banks (GCNVOPDUtils.cpp) enforce those constraints and nothing more. Again, work-in-progress code... Or maybe Do$ and VOPD are simply barred from interacting with each other in the hardware. That would seem strange, because VOPD is a scenario requiring high register read bandwidth.
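For anyone who wants a picture of what a "sources must come from different register banks" check amounts to, here is a rough sketch. Everything in it is an assumption for illustration rather than the logic in GCNVOPDUtils.cpp: the bank count of 4, the "bank = register index modulo bank count" mapping, the slot-by-slot comparison, and the names `Srcs`, `bankOf` and `sourcesAvoidBankConflicts` are all hypothetical.

```cpp
#include <array>
#include <cstddef>

// Assumption: registers are striped across banks by index.
constexpr unsigned kNumBanks = 4;
static_assert(kNumBanks > 0, "bank count must be non-zero");

unsigned bankOf(unsigned vgpr) { return vgpr % kNumBanks; }

// Hypothetical operand record: one entry per source slot.
// A negative value means the slot does not read a vector register.
struct Srcs {
  std::array<int, 3> vgpr;
};

// Returns true when no source slot of x reads the same bank as the
// corresponding source slot of y, so both halves of the pair could fetch
// their operands in the same cycle under this simplified model.
bool sourcesAvoidBankConflicts(const Srcs &x, const Srcs &y) {
  for (std::size_t s = 0; s < x.vgpr.size(); ++s) {
    if (x.vgpr[s] < 0 || y.vgpr[s] < 0)
      continue; // at most one side reads a register in this slot
    if (bankOf(static_cast<unsigned>(x.vgpr[s])) ==
        bankOf(static_cast<unsigned>(y.vgpr[s])))
      return false;
  }
  return true;
}
```

Note that a predicate like this only looks at register file reads. A model that included something like Do$ would also need to know which operands were recently written, which is exactly the state this sketch (and, as far as I can tell, the revision) does not track.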
Also, it seems puzzling to me that this coding effort is happening now and not years ago, when hardware design options were being evaluated. Naively, I'd expect AMD to have a hardware simulator that runs code against various design options. That code can be either hand-written or compiled. The timing of this compiler work implies that if AMD does have a simulator, the options were evaluated with hand-written code... I'd say that's a pretty serious design-cycle gap, putting a very low ceiling on the complexity of design-option evaluation. OK, this is tricky stuff (compiler evolution seems glacial), but we are talking about the fundamental internal practices of a megacorp...