I noticed this story about some commits related to GFX10 changes:
https://www.phoronix.com/scan.php?page=news_item&px=Open-Source-Navi-GFX1010.
Subsequent checks of GitHub show additional changes are wending their way through the process.
An unordered list of things I ran across while browsing:
Primarily from
https://github.com/llvm-mirror/llvm...3380939#diff-ad4812397731e1d4ff6992207b4d38fa with references to some other recent commits.
*Note that some areas seem more preliminary than others and could change.
There are a few feature flags for GFX10 that are interesting:
HasNoSdstCMPX - a change in how the cmpx variants of the comparison instructions work. In prior ISAs they wrote both the execution mask (EXEC) and the vector condition code (VCC, an aliased register mapped to 2 scalar registers but also buffered in some way on the vector side). If I'm interpreting the ISA document correctly, the non-x variant writes only VCC, and the x variant now writes only EXEC rather than both. This may be part of the last feature flag in this group.
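To make my reading of HasNoSdstCMPX concrete, here's a toy Python model of a cmpx-style compare. It only tracks which masks get written. The WaveState class and v_cmpx_lt function are hypothetical names, not anything from LLVM or the ISA doc, and the "EXEC and VCC" versus "EXEC only" split is just my interpretation above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WaveState:
    exec_mask: List[bool]             # one entry per lane; True = lane active
    vcc: Optional[List[bool]] = None  # None = VCC not written by this op


def v_cmpx_lt(state: WaveState, a: List[int], b: List[int],
              no_sdst_cmpx: bool) -> WaveState:
    """Per-lane a < b, modeling only where the cmpx result lands.

    no_sdst_cmpx=False: pre-GFX10 style, the result goes to EXEC and VCC.
    no_sdst_cmpx=True:  my reading of HasNoSdstCMPX, EXEC only.
    """
    # Lanes that were already inactive report False (assumed GCN behaviour).
    result = [active and x < y
              for active, x, y in zip(state.exec_mask, a, b)]
    vcc = state.vcc if no_sdst_cmpx else result
    return WaveState(exec_mask=result, vcc=vcc)


before = WaveState(exec_mask=[True, True, False, True])
old = v_cmpx_lt(before, [1, 5, 0, 2], [3, 3, 9, 4], no_sdst_cmpx=False)
new = v_cmpx_lt(before, [1, 5, 0, 2], [3, 3, 9, 4], no_sdst_cmpx=True)
print(old)  # both masks updated
print(new)  # EXEC updated, VCC left alone
```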
HasVscnt - there's a separate counter for stores. This may go with one of the ordered memory flags, or could be related to some of the new instruction encodings. I didn't see exactly where or how this was used, although it may be related to why GFX10 has quadrupled the count for lgkmcnt. There may be a split between how writes and reads to memory are tracked, and the writes may be linked more closely to the export/system message path.
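A rough illustration of why a separate store counter could matter, assuming loads and stores previously shared one counter that retires in issue order: a shader that issues a few stores and then a load also has to wait on those stores before it can use the load result. The function name and the string-based op trace are made up for illustration; this is not how LLVM's waitcnt insertion is written.

```python
from typing import Sequence


def ops_to_wait_for_last_load(ops: Sequence[str], has_vscnt: bool) -> int:
    """How many memory ops must retire before the last load's data is usable.

    Assumes each counter retires in issue order, so waiting on the last load
    also means waiting on every older op that shares its counter.
    """
    # Index of the most recent load (ops must contain at least one "load").
    last_load = max(i for i, op in enumerate(ops) if op == "load")
    older = ops[: last_load + 1]
    if has_vscnt:
        # Stores count against a separate counter and don't hold up the load.
        return sum(1 for op in older if op == "load")
    # Loads and stores share one counter, so older stores must retire too.
    return len(older)


trace = ["store", "store", "store", "load"]
print(ops_to_wait_for_last_load(trace, has_vscnt=False))  # 4
print(ops_to_wait_for_last_load(trace, has_vscnt=True))   # 1
```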
HasRegisterBanking - unclear what this may entail. It may mean GCN's traditional indifference to which register addresses are accessed at the same time has changed, and that certain patterns may cause stalls. Whether this is in some way related to any of the patents mentioned before isn't clear. In many of them, the register file already had banks of some sort, and the addressing logic and wavefront cadence meant accesses didn't conflict regardless of the physical banking.
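If HasRegisterBanking does mean operand addresses now matter, the kind of check a compiler might start caring about looks something like this. Everything here is an assumption for illustration: the bank count, the modulo mapping, and the one-read-per-bank-per-cycle rule are not taken from the commits.

```python
from collections import Counter
from typing import Sequence

NUM_BANKS = 4  # assumed bank count; the commits don't say


def bank_of(vgpr: int) -> int:
    # Assumed interleaving: consecutive registers land in consecutive banks.
    return vgpr % NUM_BANKS


def extra_read_cycles(src_vgprs: Sequence[int]) -> int:
    """Extra cycles if each bank can deliver one operand per cycle (assumed)."""
    # A register named twice is only read once.
    per_bank = Counter(bank_of(r) for r in set(src_vgprs))
    return max(per_bank.values(), default=1) - 1


print(extra_read_cycles([0, 1, 2]))  # 0: three different banks
print(extra_read_cycles([0, 4, 8]))  # 2: all three in bank 0
```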
HasVOP3Literal - GCN's ISA allowed a 32-bit literal after the shorter vector encodings, but not after the longer 3-operand VOP3 instructions. Presumably, GFX10 has altered the instruction+immediate pattern so that the 64-bit VOP3 encodings can take a literal, though I didn't see its length specified.
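Size-wise, the change would look roughly like this. I'm assuming the GCN convention where source-operand code 255 means "a literal follows" carries over, and that the VOP3 literal is 32 bits like the one after the short encodings; the latter is exactly the part I didn't see specified. The other operand codes in the example are placeholders.

```python
LITERAL_OPERAND = 255  # GCN source-operand code that selects a trailing literal

# Base sizes in bytes: the short vector encodings are 32-bit, VOP3 is 64-bit.
BASE_BYTES = {"VOP1": 4, "VOP2": 4, "VOPC": 4, "VOP3": 8}


def encoded_size(encoding: str, src_operands, has_vop3_literal: bool) -> int:
    """Instruction size in bytes, assuming a 32-bit literal dword.

    The 32-bit literal length for GFX10 VOP3 is an assumption, not confirmed.
    """
    size = BASE_BYTES[encoding]
    if LITERAL_OPERAND not in src_operands:
        return size
    if encoding == "VOP3" and not has_vop3_literal:
        raise ValueError("VOP3 has no literal form without HasVOP3Literal")
    return size + 4  # one extra dword carries the literal


print(encoded_size("VOP2", [LITERAL_OPERAND, 1], has_vop3_literal=False))    # 8
print(encoded_size("VOP3", [LITERAL_OPERAND, 1, 2], has_vop3_literal=True))  # 12
```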
HasNoDataDepHazard - there's a string stating "Does not need SW waitstates". There are still wait counts, probably even more of them. This may refer to the ISA doc section on required NOPs for the various places where the pipeline doesn't check whether a result has been written back to a register, or doesn't forward results immediately. It seems to indicate that many of these special cases are now being caught or have been made unnecessary, whether through better interlocking in the pipeline or reduced forwarding latency in some places. That doesn't mean wait states are gone, as there are new architectural hazard flags for non-data dependencies that appear to map to some of the existing set. As for why, it might come from a revamped pipeline implementation, or it could have been precipitated by some other concern, like new instructions or the growing size of the hazard set. Another random thought: the required wait states and their lengths aren't quite in sync between Sea Islands (used by the consoles) and later GCN architectures. A backwards-compatible architecture might try to mold its latencies to match, or find a way to forward or stall where hazards could arise.
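For readers who haven't looked at that section of the ISA doc, here's a toy version of what software wait states mean: the compiler pads with NOPs when a consumer would read a register too soon after its producer, and a flag like HasNoDataDepHazard would let that step be skipped. The instruction tuple format, the single `required` distance, and the function name are all hypothetical; real hazards have their own per-case distances, and the mnemonics are just labels.

```python
from typing import Dict, List, Tuple

# (mnemonic, registers written, registers read); a made-up format.
Instr = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def insert_wait_states(program: List[Instr], required: int,
                       has_no_data_dep_hazard: bool) -> List[Instr]:
    """Pad with s_nop so each reader sits `required` slots after its writer.

    `required` is a placeholder distance, not a real GCN hazard value, and
    each s_nop here stands for a single wait state.
    """
    if has_no_data_dep_hazard:
        return list(program)  # hardware handles it; no software padding
    out: List[Instr] = []
    last_write: Dict[str, int] = {}  # register -> index of its latest writer
    for name, dsts, srcs in program:
        needed = 0
        for reg in srcs:
            if reg in last_write:
                gap = len(out) - last_write[reg] - 1  # instructions in between
                needed = max(needed, required - gap)
        out.extend([("s_nop", (), ())] * needed)
        for reg in dsts:
            last_write[reg] = len(out)
        out.append((name, dsts, srcs))
    return out


prog = [("v_readlane_b32", ("s0",), ("v1",)), ("s_add_u32", ("s1",), ("s0",))]
print(len(insert_wait_states(prog, required=2, has_no_data_dep_hazard=False)))  # 4
print(len(insert_wait_states(prog, required=2, has_no_data_dep_hazard=True)))   # 2
```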
A few omissions or reappearances of features are also interesting.
GFX10 marks the return of a flag indicating there's no SRAM ECC, which seems expected for a gaming architecture.
A minor reintroduction is FeatureMIMG_R128, which is a bit used for texture resource formats that Vega removed.
A potentially larger omission is the lack of FeatureGCN3Encoding for GFX10. I have seen discussion in various fora that Navi is a repudiation of Vega, and a return to Polaris or something like that. However, the missing GCN3 encoding flag (and I reviewed some of the opcodes listed in later updates) makes it seem like a significant number of opcodes have been changed to match the console-generation instructions, where those instructions existed at the time. That means the encodings from before Polaris, Tonga, and Fiji. Architectural advances since Sea Islands appear to still be present, such as the various parallel and packed extensions and scalar memory operations. Other scalar ISA commits also reference Primitive Order Pixel Shading (POPS), which was in the Vega ISA doc, along with message types from Vega not supported by other GPUs and things like the DLI instructions from Vega 20.*
*One possible caveat: I'm not sure whether there's more to read into the decision to put the scalar operation flags and others into a separate sub-version, 10.1, rather than the overall GFX10 set. It may mean some variant of Navi could be missing one or more of these operations, and lacking the scalar ops would be more like the consoles, though that might be more of a regression than missing some of the more niche flags.
So GFX10 appears to bring back some operations in a way that might align it with the consoles, while keeping more recent features from GFX8 and GFX9 and adding new ones. Some changes, like HasNoSdstCMPX, might be places where Navi deviates from both the console and PC space.
Other items:
There's some sort of NSA encoding for image (texture) instructions. It's mentioned alongside MIMG instructions in a bug flag for GFX10.
There's apparently an S_INST_PREFETCH instruction that will freeze a shader, which seems odd to document as an available instruction if it's that bugged.
There's a speed model for GFX10 that seems to point to some extra latency in the pipeline. The model doesn't divide cycle counts by 4 to account for the 4-cycle cadence, although many of the high numbers would be more consistent with prior GCN if they were divided. There's a comment about an extra cycle for vector register reads that might be throwing off a clean division.
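The arithmetic I have in mind is simple enough to sketch. On prior GCN a wave64 vector op issues over 4 cycles on a 16-lane SIMD, so latencies in cycles tend to land on multiples of 4; one extra VGPR-read cycle is enough to spoil that. The 9-cycle figure below is purely illustrative, not a value from the GFX10 model, and split_latency is a hypothetical helper.

```python
CADENCE = 4  # wave64 vector op issued over 4 cycles on a 16-lane GCN SIMD


def split_latency(cycles: int, extra_read_cycles: int = 0):
    """Split a latency into whole cadence steps plus any remainder.

    extra_read_cycles subtracts the extra VGPR-read cycle the model's
    comment mentions; the inputs are illustrative, not values from the model.
    """
    return divmod(cycles - extra_read_cycles, CADENCE)


print(split_latency(9))                       # (2, 1): not a clean multiple of 4
print(split_latency(9, extra_read_cycles=1))  # (2, 0): clean once the read cycle is removed
```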