The new SIMD configuration in each Compute Unit could be 4 x 32 instead of 4 x 16 or 2 x 32. Of course, this is only speculation on my part.
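Just to make the lane counts behind those configurations concrete (pure arithmetic, nothing architecture-specific):

```python
# Lanes per CU under each candidate SIMD configuration.
# "N x W" = N SIMD units, each W lanes wide.
configs = {"4 x 32": (4, 32), "4 x 16": (4, 16), "2 x 32": (2, 32)}

for name, (simds, width) in configs.items():
    print(f"{name} -> {simds * width} lanes per CU")

# 4 x 32 -> 128 lanes per CU
# 4 x 16 -> 64 lanes per CU
# 2 x 32 -> 64 lanes per CU
```

So 4 x 32 would double the lanes per CU relative to either GCN's 4 x 16 or RDNA's 2 x 32, which is why it reads as a bigger change than a mere reshuffle.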
Unlike a "CU" in the GCN lineage, which resembled a complete "core", a "CU" in the RDNA lineage has become more of an abstract box around 2 "SIMD"s sharing the memory pipeline (incl. texture and RT) and the L0 cache.
In RDNA, the "SIMD" is the complete core, with almost all CU-level blocks in GCN having become SIMD-dedicated resources. Each SIMD lane also now has 2x the L0 cache capacity & bandwidth (as does CDNA 3, by the way). I can't see them walking back any of these changes, and they all make "more SIMDs in a CU" seem more far-fetched than ever IMO: that would reduce the L0 capacity & bandwidth per SIMD lane, undoing the bump.
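The dilution argument is easy to see with numbers. A quick sketch (the cache size and lane counts here are illustrative round numbers I picked, not vendor figures):

```python
# Illustrative sketch of the per-lane L0 dilution argument.
# The 16 KiB size and SIMD counts are assumed round numbers,
# NOT quoted RDNA specifications.

def l0_bytes_per_lane(l0_bytes: int, simds_per_l0: int, lanes_per_simd: int) -> float:
    """L0 capacity available per SIMD lane when SIMDs share one L0."""
    return l0_bytes / (simds_per_l0 * lanes_per_simd)

L0 = 16 * 1024  # assume a 16 KiB L0 slice (illustrative)

# RDNA-style arrangement: 2 SIMD32s share one L0.
baseline = l0_bytes_per_lane(L0, simds_per_l0=2, lanes_per_simd=32)

# Hypothetical "more SIMDs in a CU": 4 SIMD32s on the same L0.
packed = l0_bytes_per_lane(L0, simds_per_l0=4, lanes_per_simd=32)

print(baseline)  # 256.0 bytes per lane
print(packed)    # 128.0 bytes per lane -- halved, undoing the bump
```

Whatever the real L0 size is, doubling the SIMDs behind a fixed-size L0 halves the per-lane capacity (and bandwidth share), which is exactly the regression argued against above.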
What most likely would happen IMO is:
1. A new but still 32-wide "SIMD" architecture with, e.g., the CDNA-style Matrix Core and CDNA 3's (presumably) proper dual-issue.
2. Wave64 mode stays for graphics (?) and to enable easier porting of existing GCN kernels.
3. Stack more CUs/WGPs. Heck, they introduced the middle-level cache (L1) in RDNA to help simplify the data fabric... which is a strong indicator that "more WGPs in an SE, then more SEs" are the intended scaling dimensions.
4. Don't bother stripping out the texture and RT units for big compute chips. Leave them in as dark silicon.
Voila, you get the one unified IP block to rule all GPU products.
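On point 2: wave64 on 32-wide hardware works by issuing the same instruction twice, once per 32-lane half. Here's a toy model of that double-pumping (the model is mine for illustration; only the "two passes over a 32-wide SIMD" idea comes from how RDNA is described):

```python
# Toy model: executing a wave64 instruction on a 32-wide SIMD
# by double-pumping, i.e. one issue per 32-lane half.

SIMD_WIDTH = 32

def exec_wave(op, srcs, wave_size):
    """Apply `op` across `wave_size` lanes, in passes of SIMD_WIDTH lanes."""
    result = [0] * wave_size
    for base in range(0, wave_size, SIMD_WIDTH):  # wave64 -> two passes
        for lane in range(base, base + SIMD_WIDTH):
            result[lane] = op(*(src[lane] for src in srcs))
    return result

# A wave64 add: 64 lanes, executed as two 32-lane passes.
a = list(range(64))
b = [1] * 64
out = exec_wave(lambda x, y: x + y, [a, b], wave_size=64)
print(out[:4])  # [1, 2, 3, 4]
```

The same loop runs wave32 kernels in a single pass (`wave_size=32`), which is why keeping wave64 mode around costs little while easing GCN kernel ports.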
I would be wary of anything else that sounds spectacular or novel. Well... unless you are very keen on some previous community-favourite speculations, like Super-SIMD.