I've been wondering lately about Charlie's claims on the next box, which mostly boil down to it being an SoC.
People don't seem thrilled by the idea of a 128-bit bus connecting 2GB of GDDR5, offering 60+ GB/s of bandwidth. I've also read that people (including devs) want EDRAM again. So let's assume Charlie's claims are true and that the SoC is produced on a 32nm IBM/GF process.
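For reference, a rough back-of-the-envelope sketch of where a 60+ GB/s figure would come from (the 4.0 GT/s effective data rate is my own assumption, not anything from Charlie):

```python
# Rough peak-bandwidth estimate for a GDDR5 interface.
# bus_width_bits: interface width in bits; data_rate_gts: effective
# transfer rate in GT/s (4.0 GT/s is an assumed, conservative GDDR5 speed).
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gts: float) -> float:
    # bytes transferred per cycle times transfers per second
    return bus_width_bits / 8 * data_rate_gts

print(peak_bandwidth_gbs(128, 4.0))  # 128-bit bus -> 64.0 GB/s
```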
I believe that 8 PowerPC A2 cores, some L2, and a low-clocked Barts-like part is the best-case scenario.
So such a SoC with 2GB of RAM would be competitive from an economic POV (cost), but once you add the mobo, flash memory, various chips, HDD (depending on the SKU), peripherals like the pad, Kinect, etc., even a $499 high-end SKU is likely to lose money. No matter the inflation, I don't think they can go higher in price.
I don't believe that the cost of adding EDRAM somewhere in the design would be offset by switching to DDR3 RAM, but let's assume they are willing to lose a bit more to extend the system's potential and potentially (it's not automatic) the system's lifetime. How to make the most of the investment in EDRAM?
Honestly, I don't believe that a "360-like" implementation will cut it: you get neat benefits, but you lose a lot of the performance when you resolve/copy your render targets to main RAM. Most likely the amount of data we're speaking about will double; using DDR3, the bandwidth to main RAM won't, which implies that the link between the SoC and the EDRAM will have to be faster. Either way, you'd want to allow the SoC to read directly from the EDRAM; I assume this will cost both silicon and even more bandwidth, but it still looks like the most efficient and convenient option to me. So the EDRAM would mostly be a limited amount of really fast VRAM (which it is not now; it doesn't act the way VRAM does). If you have efficiency in mind, you may want enough EDRAM to fit various render targets, your frame buffer, a G-buffer, etc.
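To give an idea of how much EDRAM that implies, here's a hypothetical 1080p deferred-rendering footprint (the four-target, 32-bit-per-target G-buffer layout is my own assumption, purely for illustration):

```python
# Hypothetical 1080p G-buffer footprint: how much EDRAM would be needed
# to keep a deferred renderer's targets entirely on-chip.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4        # assuming 32-bit render targets
GBUFFER_TARGETS = 4        # e.g. albedo, normals, depth, misc (assumed layout)

gbuffer_mb = WIDTH * HEIGHT * BYTES_PER_PIXEL * GBUFFER_TARGETS / (1024 ** 2)
print(round(gbuffer_mb, 1))  # ~31.6 MB, versus the 360's 10 MB of EDRAM
```

Even under these modest assumptions, you're already at three times the 360's EDRAM before counting MSAA or extra buffers.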
You may also want to send the framebuffer to the RAMDAC straight from the EDRAM (why copy to main RAM and lose performance on your investment in EDRAM?).
Overall, EDRAM can be an option, but if a manufacturer wants to make the most of its investment, it had better spend a significant amount of its silicon on this piece of hardware. Say the SoC I describe above is +/- 300mm²: the manufacturer removes the RBEs (moved to the EDRAM), but still has to add a fast link to the EDRAM chip, so overall they would have to downgrade the chip, imho. To give a picture: move from 8 cores and 12 SIMD arrays to 6 cores and 8 SIMD arrays, possibly with higher clocks. Still, to have a convenient amount of "really smart" EDRAM, they will need to invest in a big piece of silicon (by big I mean 200+ mm²), plus other costs such as the communication link between the chips and possibly some cooling for the second chip. A second chip, as well as the bus connecting it to the SoC, will have an impact on the mobo layout. Compared to my initial system this will be costlier; even a "360-like" implementation would be costlier than the first offering, I believe significantly.
Honestly, I've thought a bit about it since it looks like a real "most wanted feature" for some, but assuming deferred renderers keep catching on, along with 1080p rendering and the use of more and more render targets, I can't see the investment in EDRAM being a tiny one. I've reached the conclusion (many may disagree) that the cheapest way to solve the bandwidth problem is a wider bus. It clearly has a cost, but it's a known quantity and has benefits on many accounts, from a software POV as well as a mobo-layout one. I hear the wolves howling, but bus widths have been increasing with every generation of hardware, and we're in an age where the size of the data involved in rendering is starting to rule EDRAM out as an option.
Overall, keeping cost in mind, I would favor 2GB of cheap GDDR5 on a 256-bit-wide bus over any option including more RAM but offering less bandwidth, or including some form of EDRAM; I'm not sure the trade-offs are worth it. A 256-bit-wide bus sounds like an absolute taboo here, but look at the system as a whole (even including software) and I'm not sure the taboo is legit. It's the simplest and most elegant solution, imho. It has a "fixed cost", but the same is true for the connection between the hypothetical SoC and the hypothetical "smart EDRAM" chip. The smart EDRAM may end up needing some cooling, even if passive, etc. Then you have the cost of supporting the platform as a software platform, the cost for publishers.
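Just to put numbers on the options being discussed (the data rates here are illustrative assumptions on my part, not leaked specs):

```python
# Peak bandwidth for the memory options discussed, assuming illustrative
# data rates (4.0 GT/s GDDR5 and 1.6 GT/s DDR3 are assumptions).
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gts: float) -> float:
    return bus_width_bits / 8 * data_rate_gts

options = {
    "128-bit DDR3 @ 1.6 GT/s":  peak_bandwidth_gbs(128, 1.6),  # ~25.6 GB/s
    "128-bit GDDR5 @ 4.0 GT/s": peak_bandwidth_gbs(128, 4.0),  # 64.0 GB/s
    "256-bit GDDR5 @ 4.0 GT/s": peak_bandwidth_gbs(256, 4.0),  # 128.0 GB/s
}
for name, bw in options.items():
    print(f"{name}: {bw:.1f} GB/s")
```

Under these assumptions, the 256-bit GDDR5 option doubles the bandwidth of the rumored setup without any second chip or EDRAM link to pay for.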
I'm starting to really question this "a 256-bit-wide bus is not an option" mantra; it could actually be a "win-win" situation.
And for those who are scared of shrinking the SoC and no longer being able to fit a 256-bit bus: I believe it won't happen anytime soon. Assuming a chip north of 300mm² @ 32nm, it would still be beefy enough @ 22nm; as for 16nm, I wonder when IBM/GF will get there and, more importantly, when it will become more interesting from an economic POV than a well-"worn" 22nm process. Imho next-gen systems may see only one shrink; 16nm should be when IBM/GF catch up with Intel in regard to tri-gate transistors, and once that process matures, it could be a perfect time to launch a new system (I feel like it will be a long while). By the way, I also find the argument "because this gen lasted X years, next gen will have to last X years" a stretch, let's say. It depends on many things, the same many things that made this gen long, and those many things are in no way constants.
EDIT
Actually, with a system enjoying 115+ GB/s worth of bandwidth, and assuming much more bandwidth-efficient RBEs, 360 emulation might actually become possible.