Already done. 780G and later motherboards with sideport memory give you the option in the BIOS to switch between Sideport only, UMA only, and Sideport+UMA.
I don't think it would be much harder to implement a driver-enabled "high performance mode" with the Sideport enabled and the rest of the time just use the UMA.
You need seamless switching: off at idle or light load, on otherwise. (Also, I'm not sure those boards actually powered the memory down. Given it was DDR2/3, it probably didn't draw much power, and on the desktop no one cared anyway, while on notebooks it saved power, since otherwise you had constant HT/MC fetches even for display scanout.)
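To make "off at idle or light load, on otherwise" a bit more concrete, here is a minimal sketch of a load-based switch with hysteresis, so the memory doesn't flap on and off around a single threshold. Everything in it is hypothetical: the `SideportGovernor` name, the load metric and both thresholds are made up for illustration and don't describe any real driver.

```python
class SideportGovernor:
    """Toy policy: enable sideport memory under load, disable it when idle.

    Hypothetical sketch; the thresholds are arbitrary illustration values.
    """

    def __init__(self, on_threshold=0.60, off_threshold=0.30):
        if off_threshold >= on_threshold:
            raise ValueError("off_threshold must be below on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.enabled = False  # start with sideport off (idle)

    def update(self, gpu_load):
        """Feed the current GPU load (0.0 to 1.0); returns the new sideport state."""
        if not self.enabled and gpu_load >= self.on_threshold:
            self.enabled = True
        elif self.enabled and gpu_load <= self.off_threshold:
            self.enabled = False
        # Loads between the two thresholds keep the previous state.
        return self.enabled


governor = SideportGovernor()
states = [governor.update(load) for load in (0.10, 0.70, 0.50, 0.20)]
```

The gap between the two thresholds is the point: a load hovering around 50% leaves the state alone instead of toggling the memory every sample. The hard part (migrating surfaces without a visible hitch) is not modelled here at all.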
Is it that much more?
Bloomfield (3-channel) has 200 more "pins" than Lynnfield (2-channel), and Lynnfield actually has 40M transistors more because of integrated PCI-Express and DMA.
It might not be that much more, but it's still a budget CPU, after all. There is significantly more room for such things on the high end.
I don't really understand what you mean by that. AMD has had motherboards with Sideport+UMA combinations for several years, increasing the IGP's performance. What's so different here?
That was rather primitive and it didn't really help performance all that much (because sideport was very low bandwidth). But if both main memory and sideport have similar memory bandwidth (as they would with 64-bit GDDR5), I'm not sure that scheme would be sufficient. You could think about putting framebuffers in the GDDR5 sideport and textures in main memory or something, but the needs might also be dictated by which parts of the memory you still want the CPU to be able to access (with reasonable performance). Not saying it's impossible, just that it probably gets a bit complex.
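For a rough feel of "similar memory bandwidth", peak theoretical bandwidth is just bus width in bytes times transfer rate. The sketch below assumes a 64-bit GDDR5 sideport at 4000 MT/s and dual-channel (2 x 64-bit) DDR3-1600 for main memory; both data rates are assumptions picked for illustration, not figures from this thread.

```python
def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_sec):
    """Peak theoretical bandwidth in GB/s: bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


# Assumed configurations (illustrative only):
gddr5_sideport = peak_bandwidth_gbs(64, 4000)   # 64-bit GDDR5 at 4000 MT/s
ddr3_dual_channel = peak_bandwidth_gbs(128, 1600)  # 2 x 64-bit DDR3-1600

print(f"64-bit GDDR5 sideport: {gddr5_sideport:.1f} GB/s")
print(f"dual-channel DDR3:     {ddr3_dual_channel:.1f} GB/s")
```

Under those assumptions the two pools land in the same ballpark, which is why simply bolting a small fast pool onto UMA stops being an obvious win and placement starts to matter.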
That's the thing. How much performance would the GPU get for using L3 cache, if at all? Isn't there a good reason why there haven't been any mid-to-high end GPUs using eDRAM, for example?
Increased memory bandwidth has been shown to drastically change Llano's results (25% more gaming performance with 33% higher bandwidth).
No doubt. I think if you're only looking at discrete GPUs, it's probably just not worth it, because increasing overall bandwidth doesn't really add much complexity: it's still one interface, just faster (of course this still increases I/O and stuff). I just think the balance shifts quite a bit when you have an APU.
I don't know how much performance you can really gain with L3, but I find the Sandy Bridge results with 1 memory channel (also in that techreport article) quite amazing on that front: it only loses about 20% of the performance for half the memory bandwidth. Sure, part of that is because the GPU isn't all that fast compared to Llano (hence it needs less memory bandwidth), but I still think part of it is the GPU's use of the L3 cache. I don't have any proof for that though (some comparisons with Arrandale could be interesting maybe; unfortunately you can't switch off the L3 cache AFAIK...).
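One way to put the two data points quoted in this thread on the same scale is a bandwidth "elasticity": the exponent e in perf ~ bandwidth^e. The snippet below is only a back-of-the-envelope sketch using the numbers quoted above (Llano: +25% performance for +33% bandwidth; Sandy Bridge: about 20% loss at half the bandwidth). The power-law model is my assumption, and two points per chip obviously prove nothing.

```python
import math


def bandwidth_elasticity(perf_ratio, bandwidth_ratio):
    """Exponent e such that perf_ratio == bandwidth_ratio ** e.

    e near 1 means performance tracks bandwidth almost linearly;
    e near 0 means performance barely depends on bandwidth.
    """
    return math.log(perf_ratio) / math.log(bandwidth_ratio)


# Figures as quoted in the thread:
llano = bandwidth_elasticity(1.25, 1.33)         # +25% perf for +33% bandwidth
sandy_bridge = bandwidth_elasticity(0.80, 0.50)  # -20% perf for half the bandwidth

print(f"Llano:        {llano:.2f}")
print(f"Sandy Bridge: {sandy_bridge:.2f}")
```

Llano comes out at roughly 0.8 and Sandy Bridge at roughly 0.3, so by this crude measure Llano is much more bandwidth-bound. It can't separate "slower GPU" from "L3 helps", which is exactly the open question.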
Of course, UMA is the future. Given Llano's results, I think a high-performance Sideport could be a good temporary option, until DDR4 is ready for market.
That would be quite a long-standing temporary solution, since DDR4 isn't predicted before 2014 (and really 2015 for volume) according to the latest report. I don't think it would help all that much anyway, since by then the GPUs will surely be a lot faster too (assuming DDR4 is twice as fast, GPUs will certainly be faster by more than that in 2015).