Stuff like that I'd have thought would be better off handled in customised hardware, such as having a PS4 execution mode that maps a whole load of stuff differently. Even if so, the wavefronts are executed completely differently on RDNA than GCN so the code flow will be different. If the wrong number of CUs is enough to break stuff, wouldn't the change to RDNA be even more impactful with code working at that low a level? And what are the GCN-compatibility features of RDNA, and are they removed in RDNA2?
Wave64 mode looks like it provides a partial foundation for handling the change in width.
It's not sufficient to carry the architecture all the way to transparent backwards compatibility, since RDNA issues two instructions per single original GCN op.
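As a rough illustration of the two-issue point, here is a toy Python model of one 64-lane operation being carried out as two 32-lane passes under a 64-bit execution mask. The function names and the register layout are made up for the sketch and don't correspond to any real ISA or driver interface.

```python
WAVE32_LANES = 32

def issue_wave32(op, regs, exec_mask, lane_base):
    """Apply op to the 32 lanes starting at lane_base, honoring the exec mask."""
    for lane in range(lane_base, lane_base + WAVE32_LANES):
        if (exec_mask >> lane) & 1:
            regs[lane] = op(regs[lane])
    return 1  # counts as one instruction issue in this toy model

def issue_wave64(op, regs, exec_mask):
    """Model a 64-wide op as a low-half pass followed by a high-half pass."""
    issues = issue_wave32(op, regs, exec_mask, 0)
    issues += issue_wave32(op, regs, exec_mask, WAVE32_LANES)
    return issues

def run_example():
    regs = list(range(64))                     # one value per lane
    exec_mask = ((1 << 64) - 1) & ~(1 << 40)   # every lane active except lane 40
    issues = issue_wave64(lambda x: x + 1, regs, exec_mask)
    return regs, issues
```

The result is the same as a single 64-wide op, but the issue count is two rather than one, which is where timing can diverge from what GCN-tuned code expects.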
However, some operations in RDNA had small changes made to allow them to behave differently in 32 and 64 lane scenarios. The read and write lane operations can apparently be given larger register ID values to cover for the larger number of lanes.
Other cross-lane operations introduced in later GCN generations are more restricted in how they can read from lanes in RDNA, making the change in width explicit. Sea Islands compatibility isn't affected since it predates those operations. A theoretical enhanced compatibility mode might be able to create a decode method that would extend instruction behavior automatically to wave64 mode without requiring the second instruction issue. There's a direct path for the hardware's decision making at the point of instruction decode in that mode.
There are other issues with RDNA that would require workarounds, such as missing instructions and some bugs that will likely need fixing.
A change to CU count, if PS4 code was allowed lower-level access, might leave the hardware hamstrung if the code can set or read back CU counts or CU bit masks. The hardware wouldn't be able to analyze the overall shader to know whether it can safely run past the 18-CU limit, treat it as if a barrier of some kind existed there, or if it needs to map a given command to a given CU or a second CU in the PS4 Pro. A query of CU values (activity mask, hardware ID) leaves the hardware stuck with no means of communicating higher CU numbers to code that isn't expecting higher values.
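To make the query problem concrete, here is a hypothetical sketch of what a compatibility layer would have to do if legacy code only understands an 18-CU machine. Nothing here is a known Sony or AMD mechanism; the helper names and the modulo aliasing are assumptions chosen only to show the ambiguity.

```python
LEGACY_CU_COUNT = 18  # the CU limit the legacy code is assumed to expect

def legacy_visible_mask(physical_active_mask, legacy_cu_count=LEGACY_CU_COUNT):
    """Hide any CU bits above what legacy code knows how to interpret."""
    return physical_active_mask & ((1 << legacy_cu_count) - 1)

def legacy_visible_cu_id(physical_cu_id, legacy_cu_count=LEGACY_CU_COUNT):
    """Fold a physical CU ID into the legacy range (hypothetical aliasing)."""
    return physical_cu_id % legacy_cu_count

# With 36 physical CUs, physical CU 2 and physical CU 20 both report ID 2,
# so code that keys per-CU data off the ID cannot tell them apart.
```

Clamping the mask keeps the readback legal for old code, but the aliasing shows why the hardware can't simply run wider: two CUs end up indistinguishable to any shader that reads its hardware ID.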
Desktop RDNA on Navi 10 may drop instructions from previous GCN, but PS5's RDNAx definitely does not.
Otherwise the 9 WGP @ 800MHz and 18 WGP @ 911MHz tests wouldn't make any sense.
Unless you're assuming the github leak might be false.
There are ways of catching unsupported operations, or ways of implementing extra paths to support them. That's not the same thing as supporting them in a cycle-exact manner.
One difference I made note of for RDNA is its 25% longer VALU latency, which doesn't affect instruction support while having a realistic chance of changing performance. Re-engineering the SIMDs to have a register file and ALU pipeline of equal latency to GCN is one approach, although more difficult to implement if the idea is the hardware can also support RDNA execution at RDNA+ clocks.
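A back-of-envelope sketch of why the latency change matters for timing even with full instruction support. The 4-cycle baseline is an assumption for illustration (only the 25% ratio comes from the discussion), and the model ignores any latency hiding from other waves, so it is a worst case for a single chain of dependent ops.

```python
def dependent_chain_ns(n_ops, latency_cycles, clock_mhz):
    """Time for n_ops back-to-back dependent VALU ops, in nanoseconds."""
    return n_ops * latency_cycles / clock_mhz * 1000.0

def break_even_clock_mhz(base_clock_mhz, latency_ratio):
    """Clock needed for the longer-latency pipeline to match the baseline chain time."""
    return base_clock_mhz * latency_ratio

BASE_LATENCY = 4                    # assumed baseline, in cycles
LONG_LATENCY = BASE_LATENCY * 1.25  # 25% longer, per the discussion

baseline_ns = dependent_chain_ns(100, BASE_LATENCY, 800)  # 800 MHz figure from the thread
slower_ns = dependent_chain_ns(100, LONG_LATENCY, 800)
needed_mhz = break_even_clock_mhz(800, 1.25)
```

Under these assumptions the longer pipeline would need a 1.25x clock just to break even on a fully dependent chain, so matching the legacy clock alone would not reproduce legacy timing.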
Some removed instructions/modes include VSKIP and the fork/join instructions for complex control flow. The loss of the former seems to indicate that this part of RDNA is quite different from GCN, and the loss of the latter seems to make sense since the branch-handling stack has baked-in assumptions about how deep it needs to be with 64-wide waves. RDNA's caches are different and may have corner cases concerning changes in alignment and preferred granularity, which can affect timings.
I'm not discounting some undisclosed way to make RDNA behave that differently from its new pipeline arrangement in hardware, but there are other methods that aren't slavishly beholden to cycle-exact behavior. Sony has patents for being able to cleanly support changing parameters like clock and timings on different hardware. So in that regard, I think there is some kind of context missing for the backwards compatibility modes under discussion that have numbers that don't make allowances for such things.