Jensen mentioned nvidia's focus on clockspeeds with Pascal and AMD seem to have followed suit with Vega but are still woefully short.
I believe he stated that they were somewhat surprised by how effective their optimizations were, which can happen. Sometimes a good prediction pays higher dividends than expected.
Not really, more like 'why is that Rottweiler eating so much if it can't run as fast'.
I'm more out of my depth with canine physiology than I am with car analogies, but from the following it seems like Greyhound breeding has made them very optimized for what they do.
https://en.wikipedia.org/wiki/Greyhound
"The key to the speed of a Greyhound can be found in its light but muscular build, large heart, highest percentage of fast-twitch muscle of any breed, double suspension gallop, and extreme flexibility of its spine. "Double suspension rotary gallop" describes the fastest running gait of the Greyhound in which all four feet are free from the ground in two phases, contracted and extended, during each full stride."
Additionally, there are a number of traits that are minimized, such as body fat, undercoat, platelet count, and liver capacity. Racing emphasizes their prey-chasing instincts, apparently to the point that it can seriously endanger them in environments with car traffic.
From https://en.wikipedia.org/wiki/Rottweiler, we see a gait, physiology, and temperament that will not motivate them to the extremes of a Greyhound, or that will physically work against them at the operating point of forcing a canine body to 70 km/h.
They would lose out in acceleration and energy loss per stride, and devote resources to maintaining bulk and strength for purposes other than running. A thin-skinned 70 km/h lightweight with lesser clotting capacity and a tendency to bolt doesn't provide utility for herding or holding its own in a drag-out fight.
To make the analogy fit GPU design more closely, it would probably be more like AMD and Nvidia each getting the chance to design a dog. They would have to guess in general terms what sorts of muscle fibers they'd have on hand and how far they could push the spine and legs, then weigh that against how many races versus fights the dog would be in.
Then there's a question of how often they could iterate on their guesses, and what unexpected stumbling blocks each would hit and the timing of them.
As for it using more diespace, I think nvidia had to go that route as well, they didn't mention transistor figures however.
Circuits face a trade-off between transistor performance and wire delay, with the latter becoming significantly more limiting with every node. There are a number of choices that increase the capability of the transistor portion to drive signals down wires by sacrificing area and power, such as repeaters or extra pipeline stages with their corresponding latches. Since the delay of an unbuffered wire is roughly quadratic in its length, cutting a path in half can significantly speed up signal propagation at the cost of the switching delay of the added transistors, and transistor switching speed has improved far faster than wire delay has.
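To put rough numbers on the repeater trade-off, here is a toy Python sketch. It assumes a simple lumped model where an unbuffered wire's delay is a constant times length squared and each repeater adds a fixed switching delay. The function names and the figures (100 ps/mm^2, 25 ps per repeater, a 4 mm path) are made up for illustration and are not from any real process.

```python
import math

def unrepeated_delay(length_mm, rc_per_mm2):
    """Delay of one unbuffered wire: grows with the square of its length."""
    return rc_per_mm2 * length_mm ** 2

def repeated_delay(length_mm, rc_per_mm2, repeater_delay, segments):
    """Delay when the wire is split into equal segments, each driven by a repeater."""
    seg = length_mm / segments
    return segments * (rc_per_mm2 * seg ** 2 + repeater_delay)

def best_segment_count(length_mm, rc_per_mm2, repeater_delay):
    """Integer segment count that minimizes total delay in this toy model."""
    # The continuous optimum is where wire delay per segment equals repeater delay.
    ideal = length_mm * math.sqrt(rc_per_mm2 / repeater_delay)
    candidates = sorted({max(1, math.floor(ideal)), max(1, math.ceil(ideal))})
    return min(
        candidates,
        key=lambda n: repeated_delay(length_mm, rc_per_mm2, repeater_delay, n),
    )

if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the arithmetic easy to follow.
    length, rc, t_rep = 4.0, 100.0, 25.0
    print("no repeaters :", unrepeated_delay(length, rc), "ps")
    print("cut in half  :", repeated_delay(length, rc, t_rep, 2), "ps")
    n = best_segment_count(length, rc, t_rep)
    print("best split   :", n, "segments,", repeated_delay(length, rc, t_rep, n), "ps")
```

In this model the wire term shrinks as 1/N while the repeater term grows as N, so the payoff from adding repeaters flattens out and every extra one still costs area and power.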
If the overall mix of wire versus transistor is mostly locked in, then driving voltages up will provide more drive strength to the transistors, but only to a point. Eventually the transistors' ability to carry current saturates or the limits of their connections to the wires are reached, and there is still the quadratic power cost of greater voltage to pay.
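As a quick illustration of that quadratic voltage cost, here is a minimal sketch using the usual first-order approximation that dynamic switching power scales with capacitance times voltage squared times frequency. It ignores leakage entirely, and the 10% and 5% figures are hypothetical, not measurements of any GPU.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to baseline, using P ~ C * V^2 * f with C held fixed."""
    return voltage_scale ** 2 * frequency_scale

if __name__ == "__main__":
    # Hypothetical case: 10% more voltage buys 5% more clock.
    ratio = relative_dynamic_power(1.10, 1.05)
    print(f"power rises to {ratio:.2f}x for a 1.05x clock")
```

Under those assumed numbers the chip pays roughly 27% more dynamic power for 5% more clock, which is why voltage is a last resort once the design is fixed.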
In the nodes since 28nm, the foundries have been more conservative with wire scaling than with transistors, and the emphasis on density can lead to worse wire delay because thinner wires have higher resistance. GF's 7nm presentation shows that its upcoming FinFET process has significantly more refined transistors, with better materials and taller fins. The high-performance variant of the process explicitly goes for more metal layers and less dense cells+wires, versus the single density-optimized option with relatively stubby fins available for GF's 14nm.
Why is AMD behind in clockspeed? Is it the architecture, as in the shaders/TMU/ROPs/schedulers layout, or the transistors themselves?
It's more of a holistic question at this point. Architectures are designed with certain predictions about their workloads and the physical and electrical realities they will face at the time of manufacturing.
Decisions such as the overall balance of per-cycle work and working transistors versus on-die connectivity are made long before the chips are taped out and have to face those realities.
I think it's impossible for us to tell.
There is something of a hint in AMD starting to talk about wire delay with Vega, when the concept hadn't come up before.