And the most sensible thing to fill that dark silicon with is cache.
Aren't caches fairly large energy consumers?
It's important to note that dark silicon will basically never mean areas of the die left literally blank. It just means you have to design your system so that not all of it can be switching at the same time. Pretty much the archetypical not-often-switching large structure is a block of cache.
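To make the "not all switching at once" point concrete, here's a rough back-of-the-envelope sketch of the utilization wall. All the numbers are invented purely for illustration: a fixed power budget, a fixed die size, and a per-area switching power that no longer drops much per process node, which is the post-Dennard situation that creates dark silicon in the first place.

```python
# Back-of-the-envelope illustration of the dark-silicon "utilization wall".
# All numbers below are invented for illustration only.

def active_fraction(power_budget_w, die_area_mm2, active_power_w_per_mm2):
    """Fraction of the die that can switch at full frequency
    without exceeding the chip's power budget."""
    full_chip_power = die_area_mm2 * active_power_w_per_mm2
    return min(1.0, power_budget_w / full_chip_power)

POWER_BUDGET_W = 150.0   # assumed fixed TDP across generations
DIE_AREA_MM2   = 400.0   # assumed fixed die size across generations

# Post-Dennard: per-area power of *active* logic creeps up each node,
# because density keeps rising while supply voltage barely falls.
for node, w_per_mm2 in [("node A", 0.375), ("node B", 0.55), ("node C", 0.80)]:
    frac = active_fraction(POWER_BUDGET_W, DIE_AREA_MM2, w_per_mm2)
    print(f"{node}: ~{frac:.0%} of the die can be active at once; "
          f"the rest must stay 'dark' (idle or rarely switching, e.g. cache).")
```

With these made-up figures the active fraction falls from 100% to roughly 47% over three generations, which is why large, rarely-switching structures like cache are the natural thing to spend the remaining area on.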
I've been trying to familiarize myself with the creative ideas that have been put forward regarding dark silicon. This one's an interesting overview:
A landscape of the new dark silicon design regime
Spatial and temporal switching seems like a straightforward idea. I assume there are severe placement-solving problems for spatially active regions, so a local optimum may still be (say) 10% larger than a globally optimal solution, which in turn is still 25% larger than an unachievable ideal solution. 3D stacking would allow further exploitation, of course.
What I find super interesting is the suggestion of more FF blocks (or C-cores) to fill up the space: they are much easier to lay out spatially, temporal exclusivity is almost a given since GPUs are so far not truly super-scalar, and spatially these regions are neighbours as they share the data paths.
Since you mentioned caches, I tried to understand their energy profile (I didn't really find much beyond figures of roughly 6% D-cache and 21% I-cache contribution to the energy of instruction execution, which sounds like a lot, but I guess the alternative is much worse). There are also these switchable cache configurations:
Switchable cache: Utilising dark silicon for application specific cache optimisations
I've only read the abstract, but I find this tempting for a GP-GPU, as the different workloads and utilization types certainly have different data-access characteristics (especially the difference between a BVH-data request and a swizzled texture-data access). See the toy sketch below for what I mean.
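Purely to illustrate the idea (this does not follow the paper, which I've only read the abstract of): the driver or firmware could, at kernel launch, pick one of a few pre-laid-out cache configurations based on the workload's access pattern and power up only that one, leaving the others dark. The configurations, workload classes, and numbers here are all invented.

```python
# Toy sketch of per-workload cache-configuration selection.
# Everything here is hypothetical; it only illustrates "switchable" cache
# blocks where exactly one configuration is powered and the rest stay dark.

from dataclasses import dataclass

@dataclass(frozen=True)
class CacheConfig:
    name: str
    line_size_bytes: int   # longer lines favour streaming/spatial locality
    ways: int              # higher associativity favours irregular access

CONFIGS = [
    CacheConfig("texture-streaming", line_size_bytes=256, ways=4),
    CacheConfig("bvh-pointer-chasing", line_size_bytes=64, ways=16),
]

def pick_config(workload: str) -> CacheConfig:
    """Pick which cache block to power up for a kernel launch.

    Swizzled texture fetches have strong spatial locality, so long lines
    help; BVH traversal is pointer chasing over small nodes, so higher
    associativity and shorter lines reduce thrashing.
    """
    if workload == "texture_sampling":
        return CONFIGS[0]
    if workload == "bvh_traversal":
        return CONFIGS[1]
    return CONFIGS[0]  # fall back to the safest all-rounder

for wl in ("texture_sampling", "bvh_traversal"):
    cfg = pick_config(wl)
    print(f"{wl}: power up '{cfg.name}' ({cfg.line_size_bytes} B lines, "
          f"{cfg.ways}-way); the other block stays dark")
```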
But then, intuitively I believe all of these proposals describe ideas and solutions that are too risky and too complex. I personally believe the answer is a very simple one.