If you look at the block diagrams of GPUs (Nvidia and AMD alike), you'll see that the memory controllers are independent: one MC per 64 bits.
There are good reasons for this: it allows redundancy for cut-down parts, and it's also better for performance.
DDR has a minimum transaction size that is larger than the width of the bus: say 8 cycles of 32 bits, or 32 bytes. If you need to read or write only 1 byte, that's already quite a bit of overhead. If you simply shared the address bus across all the parallel DDRs, you'd multiply your minimum transaction size accordingly, and with it the inefficiency of accessing just one byte.
Now, accessing 1 byte is probably not very common, and it's a contrived example, but the more you increase your minimum transaction size, the harder it becomes to ensure that only useful data is being read or written, in the same way that a VLIW CPU becomes less efficient as its width goes up.
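
To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes the figures used above (a 32-bit device and a burst of 8 cycles) and models "sharing the address bus" as ganging several such devices so they always burst together. The function names and the choice of 1, 2, 4 and 8 ganged devices are made up for illustration, not taken from any real part.

```python
def min_transaction_bytes(bus_width_bits, burst_length, ganged_devices=1):
    """Smallest amount of data one access can move, in bytes.

    Assumption: all ganged devices share one address bus, so every
    access bursts on all of them at once.
    """
    return bus_width_bits * burst_length * ganged_devices // 8


def efficiency(useful_bytes, transaction_bytes):
    """Fraction of the transferred data that was actually wanted."""
    # Round up to a whole number of minimum-size transactions.
    transactions = -(-useful_bytes // transaction_bytes)
    return useful_bytes / (transactions * transaction_bytes)


if __name__ == "__main__":
    for ganged in (1, 2, 4, 8):
        size = min_transaction_bytes(32, 8, ganged)
        print(f"{ganged} x 32-bit on a shared address bus: "
              f"{size} B minimum, {efficiency(1, size):.2%} useful "
              f"for a 1-byte access")
```

With one 32-bit device the minimum transaction is 32 bytes; gang eight of them behind one address bus and it becomes 256 bytes, so the same 1-byte access wastes eight times as much bandwidth. With independent controllers, each one keeps the 32-byte minimum.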