I would say the main problem with AFR is not the copying that may be necessary, or the bandwidth it needs, but the synchronization. Take a simple exposure implementation, for instance. GPU0 renders its frame, then averages the pixels down to a 1x1 render target to compute the overall exposure. The next frame uses that render target as input to adjust its brightness. But the next frame is rendered on GPU1, so GPU1 has to wait until GPU0 has finished rendering to that target. Although the data copied amounts to just one pixel, each GPU ends up idle for most of the frame because it doesn't yet have the data it needs from the other GPU. Even a shared memory pool between the GPUs wouldn't help; you'd still see scaling of, say, less than 10%.
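To put rough numbers on that, here is a toy timing model of the stall. It is only a sketch, not a measurement of any real driver or GPU. The function names and the `need_at` parameter are made up for illustration: `need_at` is the fraction of a frame's work that can run before the previous frame's exposure target is needed, and I'm assuming every frame costs the same and the copy itself is free.

```python
def total_time(num_gpus, num_frames, frame_time=1.0, need_at=0.05, dependent=True):
    """Time at which the last frame finishes under AFR (toy model).

    need_at is a hypothetical knob: the fraction of a frame that can be
    rendered before the previous frame's 1x1 exposure target is required.
    """
    gpu_free = [0.0] * num_gpus  # when each GPU can start its next frame
    prev_done = 0.0              # when the previous frame's exposure is ready
    for frame in range(num_frames):
        gpu = frame % num_gpus   # AFR: frames alternate between GPUs
        # Work that does not depend on the previous frame's exposure.
        reached = gpu_free[gpu] + need_at * frame_time
        # Stall until the previous frame has finished and written its target.
        if dependent and frame > 0:
            reached = max(reached, prev_done)
        done = reached + (1.0 - need_at) * frame_time
        gpu_free[gpu] = done
        prev_done = done
    return prev_done


def speedup(num_gpus, num_frames, **kwargs):
    """Speedup of num_gpus over a single GPU for the same frame sequence."""
    return total_time(1, num_frames, **kwargs) / total_time(num_gpus, num_frames, **kwargs)


if __name__ == "__main__":
    for label, kwargs in [
        ("no inter-frame dependency", dict(dependent=False)),
        ("exposure needed 5% into the frame", dict(need_at=0.05)),
        ("exposure needed at frame start", dict(need_at=0.0)),
    ]:
        print(f"{label}: {speedup(2, 100, **kwargs):.3f}x")
```

In this model the two-GPU speedup approaches 1 / (1 - need_at) whenever need_at is at most 0.5, so a dependency that bites 5% into the frame leaves only about 5% scaling, however little data is actually transferred. Pushing the read of the exposure target later in the frame, or using a value that is a frame or two older, is what buys the scaling back.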