I was wondering about something: how close are console manufacturers to backing away from the "general purpose everything", "one size fits all" approach and moving to more heterogeneous hardware?
I think of the old running argument Nick, from Softshader, used to have with other members about software rendering; I also think of the POV Tim Sweeney was defending on the matter years ago. There is one point Nick was right about: the unification of the graphics pipeline did not come for free, even if the cost was well hidden by the steady process improvements of the time.
* I would state the same thing with regard to Intel CPUs. Intel pushes some really wide SIMD units and it has a huge impact on their designs; those things are huge. It may make sense for Xeon-class CPUs, as you can only add so many cores, but for the consumer offering... it is balls. Look at the Core i3: I wonder who would not trade those 2 cores with massive AVX2 SIMD units for a 3-core design with lesser SIMD.
* Another example of things gone too far is GPGPU. It is nice, but we are actually finding out that FP32, for example, is not really mandatory for pixel calculations (IIRC some of Sebbbi's and other members' posts). It may not cost much, but still. Manufacturers became aware of the change in the technological environment: years ago they started to push FP64 in their designs, then, as Moore's law started to show its age and power concerns grew, manufacturers like Nvidia quit providing the option on their mainstream GPUs. Had Moore's law continued to provide cheaper, faster, more energy-efficient silicon, they would not have.
* Focusing on GPUs a little longer, it is interesting to notice that ARM still thinks it is worth it to keep its Utgard architecture around. It is also interesting to look at the Tegra 4 SoC and the performance it delivered relative to its silicon footprint. I remember that not many months ago I was told on these forums that we would see a 14/16nm GPU this year, that it was a given. Well, we have to bow to the evidence: it did not happen. Actually the only devices that used advanced silicon (be it 20nm or 14/16nm) were mobile SoCs. It is definitely coming our way, but people have to face the evidence of the disruptive impact of the end of Moore's law (NB: not the end of the silicon roadmap). Long story short, I would bet ARM thinks those Mali 470 are the last renditions of their Utgard architecture, yet it would not surprise me (the odds are lower, though, so I might not bet... or bet less) if they have to reconsider their POV a couple of extra times.
* Back to CPUs: SMP has been the design of choice for many years; it is the easiest, but it is also costly. AMP (asymmetric multiprocessing) allows for more processing within the same power profile and silicon footprint. Soon, maybe, it will no longer be seen as a power optimization but as a way to get the most out of a slab of silicon whose price is no longer going down (it is slightly up, actually).
* Not exactly the same argument, but the breaking of Moore's law also affects storage. Stacking is nice but it will not resurrect Moore's law, and HDD capacity no longer grows that much. I suspect it is the same with optical media.
It might be time to consider less general-purpose units than the ones usually found in PCs and the last round of consoles. We may see units already at play in PC/console architectures take on greater importance; it seems to me that mobile computing is pushing technology in that direction, power and cost being great drivers of efficiency. Actually I expect nothing really new, no physics or dedicated compute units; but the specialized units found in mobile SoCs for image processing and for compression/decompression could definitely prove useful. I wish we could see a sound processing unit, but the market does not seem to care, so without volume...
Overall, the next-gen systems will be asked to do more with a relatively tiny increase in available resources, and I believe it is doable. I expect the next round of consoles to have both a smaller silicon and power budget, and it would not surprise me if the amount of RAM does not increase.
It would be interesting to have the image processing units do preprocessing of textures ahead of the GPU; overall it could be interesting to give up some quality and storage (HDD, media) in exchange for extra computation done efficiently.
I read some posts about the hypothetical disappearance of ROPs, and whereas I sort of get the point, I do not believe in it; again, I believe mobile designs will influence some aspects of the designs we see in consoles/PCs. I wonder if it would not make more sense to move to proper GPU cores, i.e. having the ROPs and a number of what are now called shader cores much more tightly packed together, moving to more "self-reliant" building blocks. I have this feeling that it should help with the design of datapaths, with making the most of data locality, etc. Speaking of datapaths, it would be nice if GPU manufacturers went back to designing for FP16 instead of FP32.
It is perfectly fine to be able to process FP32 at half speed, but that is different altogether from designing for FP32. Nvidia designed its GPUs around warps of 32 FP32 elements (forgive the mangled wording), AMD around wavefronts of 64 FP32 elements. Even if you can process FP16 at twice the speed on such a design, it carries overhead: you have twice the elements in flight, which has an impact on many things. Speaking of Nvidia, if 32 elements is the right width for the SIMD, then by designing for FP16 their SIMDs would be half as wide as they are now; the register files, and in turn the datapaths, could be half what they are, along with memory bandwidth usage, etc.
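To put rough numbers on that intuition, here is a back-of-the-envelope sketch. The 32- and 64-lane widths come from the post (Nvidia warps, AMD wavefronts); treating "storage per vector register = lanes × element size" is my own simplification of a real register file, not a hardware spec.

```python
# Back-of-the-envelope: storage for one vector register across all
# SIMD lanes, comparing an FP32-native design with an FP16-native one
# at the same lane count. Lane counts are from the discussion above;
# the model itself is a deliberate simplification.

def register_bytes(lanes: int, element_bytes: int) -> int:
    """Bytes of storage for one vector register spanning all lanes."""
    return lanes * element_bytes

fp32_reg = register_bytes(32, 4)  # 32 lanes x 4 bytes = 128 bytes
fp16_reg = register_bytes(32, 2)  # 32 lanes x 2 bytes = 64 bytes

print(fp32_reg, fp16_reg, fp32_reg // fp16_reg)  # prints: 128 64 2
```

Under this toy model, an FP16-native SIMD of the same width halves the per-register storage, and the same factor of two carries over to the datapaths feeding it.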
People will say "but ultimately it pushes half the FLOPS", but the thing is, the result on screen would have nothing to do with, say, halving the resolution; it is a lot more subtle, and for the few calculations that do need FP32 you can still run them at half speed, which on a modern GPU is still fast. It is a personal belief from an outsider to the industry, but with the end of Moore's law, manufacturers might have to reconsider what was considered "free", as it may in fact amount to a lot of silicon unnecessary for what is, by a giant extent, the main usage of a modern GPU: 3D graphics computations.
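As a small illustration of why FP16 can be enough for colour work (a toy sketch of numeric precision, not a claim about any particular GPU): every 8-bit colour level, normalised to [0, 1], survives a round trip through IEEE half precision unchanged, because an FP16 rounding step is far smaller than the spacing between adjacent 8-bit levels.

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE binary16 (struct's 'e' format)."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Check all 256 levels of an 8-bit channel: normalise to [0, 1],
# squeeze through half precision, and map back to 0..255.
ok = all(round(to_fp16(v / 255) * 255) == v for v in range(256))
print(ok)  # prints: True
```

So for plain 8-bit colour output, the extra precision of FP32 buys nothing visible; it only starts to matter for long chains of intermediate calculations, which is exactly the subtlety the paragraph above is pointing at.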