I would look at the allocation granularity for hints (which has grown by a factor of 2 from RDNA1 to RDNA2 and will go up by a further 50% [to 3 times that of RDNA1] with N31/N32). What I meant is: those are the two choices, but the patch did not suggest which.
If anything, an earlier VOPD patch on the bundling algorithm suggested it might be sticking to 4 VGPR banks still, though it could still be updated at any time.
How do you do register allocation that is both relatively easy to do and also has the property of maintaining an equal distribution of the allocated registers over all banks? For sure you don't want to allocate different numbers of registers in different banks. That means the allocation granularity will always be a multiple of the number of register banks, right?
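To make that argument concrete, here is a minimal sketch. It assumes registers are interleaved across banks by a simple modulo mapping (bank = register index mod number of banks); that mapping is my assumption for illustration, not something confirmed for any RDNA generation, and the function names and example numbers are made up.

```python
from collections import Counter

def round_up_allocation(requested, num_banks, regs_per_bank_step=1):
    """Round a VGPR request up to the allocation granularity.

    The granularity is assumed to be a multiple of the bank count, so
    every bank always receives the same number of registers.
    """
    granularity = num_banks * regs_per_bank_step
    return -(-requested // granularity) * granularity  # ceiling division

def bank_histogram(base, count, num_banks):
    """Count registers per bank for a contiguous allocation, assuming
    the hypothetical mapping bank = register_index % num_banks."""
    return Counter((base + i) % num_banks for i in range(count))

# Example: a request for 37 registers with 4 banks and a granularity of 8.
alloc = round_up_allocation(37, num_banks=4, regs_per_bank_step=2)
hist = bank_histogram(base=alloc, count=alloc, num_banks=4)
print(alloc, dict(hist))
```

Under these assumptions any allocation that is a multiple of the bank count lands evenly on all banks, whatever its base address. A granularity that is not a multiple of the bank count would leave some banks with one register more than others.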
By the way, the number of register banks is not necessarily the same as the number of ports to the register file (in GCN and RDNA1 it may have been), as you usually have some kind of crossbar between the register file banks and the register file ports (the actual SRAM for the GPRs in GPUs is likely single ported, so we have a pseudo-multiported setup, as that is way cheaper to implement). Keeping the number of ports constant while increasing the number of banks just reduces the probability of bank conflicts but does not increase the register file bandwidth.
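A rough back-of-the-envelope illustration of that last point, assuming each source operand lands in a uniformly random bank, independently of the others. Real compilers use bank hints to do better than random, so treat the numbers as an upper bound on conflicts under a toy model, not as a description of the hardware.

```python
from math import perm

def conflict_probability(num_banks, num_operands):
    """Probability that at least two operands hit the same bank.

    Toy model: every operand picks its bank uniformly at random and
    independently. Each single-ported bank can serve one read per clock.
    """
    if num_operands > num_banks:
        return 1.0  # pigeonhole: a conflict is unavoidable
    all_distinct = perm(num_banks, num_operands) / num_banks ** num_operands
    return 1.0 - all_distinct

# Three source operands read in the same clock, with the port count held fixed.
for banks in (4, 8):
    print(banks, conflict_probability(banks, 3))
```

In this model, going from 4 to 8 banks drops the conflict probability for three operands from 0.625 to about 0.34. The peak number of operands delivered per clock is still capped by the number of ports, so the bandwidth ceiling does not move.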
TLDR: The increased allocation granularity tells us in pretty certain terms that the number of banks went up, but it does not necessarily mean an increased register bandwidth. More bandwidth would definitely help for the dual/co-issue stuff, though, as we need a total of 6 operands (sources and destinations) per clock for the VOPD FMACs, for instance. And having some reserve for the other stuff going on may be good.