I guess in RDNA4 the WMMA op's are handled by an actual dedicated ALU, but very tightly integrated with the vector units i.e. sharing the same data path and issue port so concurrent execution is not possible, just like on RDNA3, but the capabilities are significantly enhanced -- extended type...