The problem would arise if they implemented the flat addressing model and were using a fabric with a large dataset: the virtual space could exceed 48 bits, while the physical space is even larger. Like you said, it is strange to add only a single bit, but maybe that was sufficient for a partner, or all they could manage, or part of some conceptual design?
You seem to have misunderstood what exactly the flat virtual address space is.
It is just a flattened, segmented view of the process virtual address space, the workgroup memory and the work-item private memory. There are also a few utility segments, but they can be collapsed into the global/private segment.
If in your mind "flat" means flat "across agents", that's not the case at all. Agents interoperate within the global segment (i.e. the process virtual address space), and that holds even for agents that accept coarse-grained allocations to their non-coherent local memory.
Take GCN as an example. A flat address would either:
1. lie within the 48-bit platform virtual address space (global/kernarg/readonly);
2. lie within the workgroup memory aperture (group); or
3. lie within the private aperture (private/arg/spill).
The first case doesn't need translation. The second and third cases are handled by subtracting the aperture base and redirecting the access somewhere else. Group segment addresses apparently go to the LDS. Private segment addresses would be computed from a base address that AFAIK lies within the 48-bit GPUVM address space, at least for the Linux implementation.
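To make the three cases concrete, here is a rough sketch of the decode in Python. The aperture bases, the 4 GiB window size, and the names `decode_flat` and `scratch_base` are all made up for illustration; the real values are set up by the driver, and the real private address computation is more involved than a plain rebase.

```python
# Hypothetical aperture layout, for illustration only.
VA_BITS = 48
GROUP_BASE = 0x0001_0000_0000_0000    # assumed value, not a real GCN base
PRIVATE_BASE = 0x0002_0000_0000_0000  # assumed value, not a real GCN base
APERTURE_SIZE = 1 << 32               # assumed 4 GiB window


def decode_flat(addr, scratch_base=0):
    """Classify a flat address and return (segment, translated address)."""
    if GROUP_BASE <= addr < GROUP_BASE + APERTURE_SIZE:
        # Group segment: subtract the aperture base; the offset goes to the LDS.
        return "group", addr - GROUP_BASE
    if PRIVATE_BASE <= addr < PRIVATE_BASE + APERTURE_SIZE:
        # Private segment: subtract the aperture base, then rebase onto a
        # scratch buffer that itself lies within the 48-bit space.
        return "private", scratch_base + (addr - PRIVATE_BASE)
    if addr < (1 << VA_BITS):
        # Global segment: already a process virtual address, no translation.
        return "global", addr
    raise ValueError(f"address {addr:#x} is outside every segment")
```

Whatever the aperture bases look like, what comes out on the memory side is either an LDS offset or an address within the 48-bit space.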
So nope. Addresses that point to system memory would never exceed 48 bits.