It's hard to know exactly what they mean by this. I'm guessing it's a combination of some real transistor-level optimizations and using that term for low-level clock gating optimizations.
When you look at real transistor optimizations, there's not that much you can do unless you're willing to move major parts from a standard cell based flow to full custom. I don't think that's a very likely option. What's left then are minor improvements that can be deployed widely. Think RAM building blocks that are used in generators, or a few custom standard cells that expand the default library for specific cases that happen to be unusually common in a particular design. In any case, these are the kind of optimizations that gain you a few percentage points, and since low-level power optimization is a long slog for small gains, they're in the same league as low-level clock gating: you just need to find enough of those cases.
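To put some (made-up, purely illustrative) numbers on that "long slog for small gains" point: each individual optimization shaves only a percent or two off dynamic power, and they compound multiplicatively, so you need a lot of them before the total becomes noticeable. A minimal sketch, with assumed saving figures:

```python
# Hypothetical illustration: many small, independent power optimizations
# (custom RAM cells, extra standard cells, local clock gating) compound
# multiplicatively. The per-optimization savings below are invented numbers.
savings = [0.02, 0.015, 0.03, 0.01, 0.025]  # each saves 1-3% of dynamic power

power = 1.0
for s in savings:
    power *= (1.0 - s)  # remaining power after applying this optimization

print(f"Remaining power: {power:.3f} (total saving: {1 - power:.1%})")
```

Five separate optimizations here net you less than 10% total, which is why this kind of work only pays off if you can keep finding more such cases.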
But all of that pales in comparison to what you can do architecturally. Low-level optimizations are a way to do things that have to be done a bit more efficiently; architectural changes are about not doing things at all, or doing them in a completely different way. Low-level optimizations have been used extensively for over a decade now, so whatever was done additionally for Maxwell wouldn't be low-hanging fruit anymore. No chance of huge gains there.
Finally: your suggestion that they might be exploiting some process improvements along the way. How would that work?
Now the real question is just how much of that can be attributed to the clock speed increase from Kepler to Maxwell?
Clock speed increases on the same process are pretty much guaranteed to be architectural. How could they not be?
Other questions might be, as FreneticPony suggested: will FinFETs top out at similar frequencies anyway? There are certainly reasons to believe that Nvidia may not increase Maxwell's clock speeds, or not by much. 1.2GHz is the magic number, I believe.
Yeah, I don't buy Mr. Pony's theories at all.
He's basing it on a dimensionless chart that says "chart for illustrative purposes only".
Nvidia didn't have a power problem at 28nm, and 16nm will be much better no matter what. There is no justification whatsoever not to explore the high-speed/relatively-lower-perf/W corner for their next designs.
Obviously Nvidia can push the clock speeds higher but after a certain point the gains won't be worth the extra power.
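The reason the gains stop being worth it: dynamic power scales roughly as C·V²·f, and hitting a higher clock generally requires a higher supply voltage too, so power grows much faster than performance. A sketch with an assumed (made-up) linear voltage/frequency relation:

```python
# Sketch of why clock increases become power-inefficient past a point.
# Dynamic CMOS power is roughly P ~ C * V^2 * f; reaching a higher f also
# requires a higher supply voltage V, so power grows superlinearly with f.
# The linear V(f) relation and all constants below are invented for illustration.

def dynamic_power(f_ghz, c=1.0, v0=0.8, k=0.25):
    v = v0 + k * f_ghz  # assumption: supply voltage rises ~linearly with target clock
    return c * v * v * f_ghz

base = dynamic_power(1.0)
boosted = dynamic_power(1.3)
print(f"+30% clock -> {boosted / base:.2f}x power")
```

Under these assumed numbers, a 30% clock bump costs roughly 50% more power, and the ratio only gets worse as voltage climbs further.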
If AMD is willing to trade away performance for reduced power, they'll lose (don't worry, they won't do that), and Nvidia will simply play the absolute performance card. And rightfully so. Maxwell was great not because perf/W was excellent, but because perf/W and absolute performance were both excellent.
The extra gains they got from Maxwell due to knowledge of the 28nm node may not carry forward to 16FF.
That's the biggest issue: your whole argument rests on this broken premise.
What exactly is that secret magic? Why reach for something mysterious when there are logical explanations: major architectural changes that are well known and visible to anybody who's willing to look for them?