Most of the latency of main memory comes from the fact that it's DRAM rather than SRAM. If standard DRAM weren't so much slower than SRAM in latency, I'm sure CPU/GPU manufacturers would use it as cache RAM, considering it needs about a sixth of the transistors per bit (one versus six in a typical SRAM cell). As I said, the GameCube's 1T-SRAM was regularly praised by developers as an extremely efficient and forgiving main memory. Some described it as being almost like having one massive cache.
That's basically due to the refresh cycle, isn't it? When the memory is on-die that becomes a big deal, but when it's off-die, the memory controller and signalling time take over. I doubt the benefits of 1T-SRAM are that dramatic. You simply aren't going to get 20-cycle main memory accesses like you would from L2 cache. It's still probably a hundred cycles or more on average, and I can't see that ever improving dramatically.
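
To put rough numbers on that, here is a back-of-the-envelope sketch using the standard average memory access time formula (hit time plus miss rate times miss penalty). Only the 20-cycle L2 figure and the roughly 100-cycle main memory figure come from this post; the 5% miss rate and the 60-cycle "low-latency" memory are made-up assumptions for illustration, not measurements of any real console.

```python
def amat(hit_cycles, misses_per_100, miss_penalty_cycles):
    """Average memory access time in cycles.

    hit_cycles: latency of the cache level being modelled
    misses_per_100: misses per 100 accesses at that level (assumed value)
    miss_penalty_cycles: extra cycles to fetch from main memory on a miss
    """
    return hit_cycles + misses_per_100 * miss_penalty_cycles / 100


L2_HIT_CYCLES = 20        # figure quoted in the post
L2_MISSES_PER_100 = 5     # assumption: 5% of L2 accesses miss

DRAM_CYCLES = 100         # "a hundred cycles or more" from the post
LOW_LATENCY_CYCLES = 60   # hypothetical lower-latency main memory

conventional = amat(L2_HIT_CYCLES, L2_MISSES_PER_100, DRAM_CYCLES)
low_latency = amat(L2_HIT_CYCLES, L2_MISSES_PER_100, LOW_LATENCY_CYCLES)

print(f"conventional DRAM:  {conventional:.1f} cycles per access")
print(f"low-latency memory: {low_latency:.1f} cycles per access")
print(f"difference:         {conventional - low_latency:.1f} cycles")
```

Under these assumptions the average access only moves by a couple of cycles, because most accesses never leave the cache. The gap grows with the miss rate, so code with poor locality would notice faster main memory much more than this example suggests.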