There's nothing at all true about that. CELL has a full 64-bit address space. The limit is a misconception that comes from the fact that CELL has just two 32-bit XDR controllers. With x16 (16-bit bus) XDR DRAM devices, each controller takes two devices, so you can only hang 4 DRAMs off those two controllers (4 DRAMs * 512 Mbit = 2048 Mbit = 256 MB). XDR DRAMs can scale all the way down to x1, so you could actually put 36 devices on a single controller (32 data + 4 ECC). It just so happens that nobody is manufacturing x1 or even x8 XDR DRAMs, and besides, keeping the number of DRAMs down does let you make the unit small and compact. If 1 Gbit XDR DRAMs were in mass production, they'd use them, if only to show up with the biggest specs and stick their tongues out at MS.
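The device-count arithmetic above can be sketched in a few lines of Python. The function name and parameters are mine, purely for illustration; ECC devices are left out, so this counts data devices only.

```python
def max_capacity(controllers, controller_width_bits, device_width_bits, device_density_mbit):
    """Return (data_devices, capacity_in_MB) for a given memory configuration.

    Illustrative helper only: ignores ECC devices and assumes every
    controller is fully populated with identical DRAMs.
    """
    # Each device fills device_width_bits of the controller's data bus.
    devices_per_controller = controller_width_bits // device_width_bits
    total_devices = controllers * devices_per_controller
    total_mbit = total_devices * device_density_mbit
    return total_devices, total_mbit // 8  # 8 Mbit per MB


# Two 32-bit controllers, x16 devices, 512 Mbit each
print(max_capacity(2, 32, 16, 512))   # (4, 256)
# Same controllers with hypothetical 1 Gbit x16 devices
print(max_capacity(2, 32, 16, 1024))  # (4, 512)
# Same controllers with hypothetical x1 512 Mbit devices
print(max_capacity(2, 32, 1, 512))    # (64, 4096)
```

So the 256 MB figure falls out of the device width and density that are actually being manufactured, not out of anything in the address space.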
Access to the other chip's RAM is not direct. The FlexIO bus connects CPU<-->GPU. The CPU can queue requests to the GPU's memory controller, giving its device ID, so that whatever it requests is transferred directly. The GPU does likewise. It's not a direct link, so there's added latency, all right, but from the software standpoint it's supposedly transparent (the OS direct-maps all the memories in the system to specific ranges in virtual memory space -- at least, that's what I got from the presentation slides).
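To make the "transparent to software" idea concrete, here is a toy Python model of a direct-mapped address space. Everything in it is an assumption for illustration: the region names, base addresses, and sizes are invented, and the real memory map and request mechanism are not something I have from the slides.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str     # hypothetical label for the memory pool
    base: int     # hypothetical base address of the mapped range
    size: int     # size of the range in bytes
    remote: bool  # True if reached over the CPU<-->GPU link


MIB = 2**20

# Invented layout: the CPU's own RAM first, the GPU's RAM mapped right after it.
MEMORY_MAP = [
    Region("cpu_xdr", 0x0000_0000, 256 * MIB, remote=False),
    Region("gpu_local", 0x1000_0000, 256 * MIB, remote=True),
]


def route(addr):
    """Return (region name, offset within region, is_remote) for an address."""
    for region in MEMORY_MAP:
        if region.base <= addr < region.base + region.size:
            return region.name, addr - region.base, region.remote
    raise ValueError(f"address {addr:#x} is not mapped")


print(route(0x0000_1000))  # ('cpu_xdr', 4096, False)
print(route(0x1000_0040))  # ('gpu_local', 64, True)
```

The caller just uses an address in both cases; the only difference is which controller ends up servicing it and how long that takes.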