To finish a thought I started earlier about clauses, RDNA introduces an instruction that defines a mode where long streams of certain instruction types can monopolize the wavefront scheduling for that type. The CU will no longer let another wavefront issue instructions of that type until the current wavefront reaches some kind of exit condition.
The instruction, S_CLAUSE, gives a wavefront exclusive instruction issue in subsequent cycles for whatever instruction type comes immediately after it (one of: VALU, SMEM, LDS, FLAT, texture, buffer, global, and scratch). Aside from VALU, these are all memory-access types. The clause continues until an instruction of a different type is encountered, at which point it automatically ends. There may be other exit conditions as well, since there seems to be a mention of a numerical limit for scalar memory.
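To picture the mechanics, here's a minimal sketch in shader assembly (the mnemonics exist in AMD's public ISA docs, but the operand and the comments are my reading of the behavior described above, so treat it as illustrative rather than authoritative):

```
s_clause 0x1                          ; request a clause for whatever type issues next
s_load_dwordx4 s[8:11], s[0:1], 0x0   ; SMEM: clause begins; other waves can't issue SMEM now
s_load_dwordx4 s[12:15], s[0:1], 0x10 ; SMEM: still inside the clause
v_mov_b32 v0, 0                       ; VALU: a different type, so the clause ends here
```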
One possible source of confusion here is that AMD has used the word clause in different ways. VLIW GPUs had clause types similar to this, with ALU clauses and a variety of memory-access clauses. The details of what makes up those memory types have changed. For example, LDS has been promoted to its own clause type, and vertex fetch is no longer its own type.
VLIW clause instructions are heavier-weight: they have explicit types rather than taking the type of the next instruction, and explicit counts of how many instructions they contain rather than ending automatically when an instruction of a different type is fetched.
The monopolization of instruction issue is somewhat akin to VLIW, although the wording in the RDNA doc seems less clear on whether it's monopolizing all instruction issue or just the issue of the specific type within a CU.
For GCN GPUs, the clause term showed up on occasion, but this seemed to be more a description of many instructions of the same type occurring in a row than a hard-defined clause. For RDNA, these seem to have been renamed instruction groups, while the term clause has been promoted to mean a run of instructions of the same type whose issue is architecturally enforced with S_CLAUSE.
The penalties and benefits aren't wholly spelled out. VLIW had large ~40-cycle penalties for changing clauses, but RDNA's GCN heritage should give much lower overhead, since it switches wavefronts far more often. Whether that means zero switching overhead isn't clear; I think there is some loss if switching happens too often.
On the other hand, a clause monopolizing instruction issue would presumably hurt CU throughput, although it may allow for faster run-through of phases of execution like setup or writeback that might benefit from not having interference from other wavefronts.
(edit: Clauses also showed up in ARM's GPU architectures for arranging execution (not type-based?). They were ISA elements in Bifrost that were then dropped with Valhall.)
It used to be far easier to reverse-engineer silicon with the semiconductor production processes of the past; here is the ARM1 die from 1985, produced at 1 µm. Today's 7 nm should be something like 150 times smaller; even the best die shots by Fritzchens Fritz are unable to resolve such fine details.
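As a sanity check on that ratio (node names are marketing labels, so this is only a back-of-the-envelope figure):

```python
# Rough scaling between the process nodes quoted above.
arm1_node_nm = 1000   # ARM1: 1 µm = 1000 nm
modern_node_nm = 7    # a "7 nm"-class process

linear_ratio = arm1_node_nm / modern_node_nm
area_ratio = linear_ratio ** 2  # features shrink in two dimensions

print(f"linear: ~{linear_ratio:.0f}x, area: ~{area_ratio:.0f}x")
# → linear: ~143x, area: ~20408x
```

So "like 150 times smaller" holds linearly; by area the gap is on the order of 20,000x, which is why optical die shots stop resolving individual features.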
OK, I'm officially baffled as to why only 10-20% of the die surface is seemingly used for the actual logic, and why a thick copper layer covers the entire remaining surface (it has to be sanded off to reveal the die).
I'm pretty sure it's some kind of flip-chip integration, so the upper surface should be the closest to the transistor layer. What's above it is the silicon substrate, which should be uninvolved with internal interconnects. All of the metal layers would be below that surface (hence flip-chip), and I assume what is being scraped off isn't those layers since I didn't see any solder balls or other elements on the surface.
The now-top of the die has some layer of silicon substrate, which could be thinned if desired or left for mechanical stiffness. I'm not sure if Polaris had something plated onto it or deposited. Zen did have an alloy plated onto that part of the die to allow for soldering to the heat-spreader.
The description is that it's sanding through the IHS, if that's what is on top of the die.
If so, why are these sparsely placed blocks of silicon even visible, and why is the usable die area so small, with the rest being just a copper layer?
If the blocks you mean are those on the perimeter, those would mostly be PHY and analog devices. Those are physically larger since they operate at different and frequently higher voltages, and analog properties are more closely aligned with physical dimensions. I think their implementation can lead to them etching much deeper into the silicon, and so in the reverse situation scraping from below would reach elements of them sooner.
I thought it could be some fancy multi-layer 2.5D package where SRAM is attached with TSVs to the actual logic below and the substrate wafer serves as an interposer for the SRAM layer - but then those fixed-function 0.18-micron video chips from the 2000s look pretty much the same:
Perhaps that's a wire-bonded chip rather than flip-chip? An earlier picture showing the plastic enclosure being broken seemed to show wire bonding, and in that case we wouldn't have the flipped order of transistor and metal layers of modern chips. Then the PHY and non-transistor layers would be sanded through first.