I'm still baffled, though, that they didn't do proofs of concept for something this complicated.
A similar question arises with GDDR5, in theory. In practice, though, NVidia's first GDDR5 was supposedly planned to be ready by the end of 2008, on its first 40nm chips.
But with those chips ending up rather late, NVidia didn't get much time to sort out GDDR5.
I don't know how similar the GDDR5 issues are to the fabric issues. Fabric is very wide (as well as being many<->many), whereas GDDR5 channels are fairly tightly constrained and point-to-point on the ultra-high-speed side - though the clocks there are very high.
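To put rough numbers on that contrast, here's a minimal back-of-envelope sketch. The pin counts, per-pin rates and fabric width are made up for illustration, not GF100's or any real chip's figures:

```python
# Purely illustrative figures, not any real chip's.
# GDDR5 gets bandwidth from very high per-pin rates over a narrow,
# point-to-point channel; an on-chip fabric gets it from sheer width
# at much lower clocks, replicated across many<->many paths.

def gddr5_channel_gbytes(pins: int, gbit_per_pin: float) -> float:
    # point-to-point memory channel: pins x per-pin data rate
    return pins * gbit_per_pin / 8

def fabric_port_gbytes(wires: int, clock_ghz: float) -> float:
    # wide on-chip bus: one bit per wire per clock
    return wires * clock_ghz / 8

# e.g. a 32-bit GDDR5 channel at 4 Gbit/s/pin vs a 512-bit fabric port at 0.7 GHz
print(gddr5_channel_gbytes(32, 4.0))  # 16.0 GB/s over ~32 very fast data pins
print(fabric_port_gbytes(512, 0.7))   # 44.8 GB/s over 512 slower wires
```

The point being: the fabric's difficulty is in the sheer number of wires and endpoints, while GDDR5's is in the signal integrity of each individual very fast link.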
I had assumed all designs went through some sort of physical prototyping phase. Is it all just based on simulations and claims made by TSMC?
I interpret the fabric in GF100 as an increased-complexity version of what's seen in GT200. GT200 doesn't require clusters to talk to each other, but there is a wide crossbar between the clusters and the ROPs/MCs. GF100 requires triangle data to be exchanged amongst the GPCs, which adds a new dimension of complexity. Not sure what other data is inter-GPC. (The ring bus in ATI was a solution to this kind of everyone-speaks-to-everyone problem...)
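A quick sketch of why a ring is attractive for that everyone-speaks-to-everyone case - the node counts here are hypothetical, not ATI's or NVidia's actual topologies. A full crossbar buys single-hop latency at the cost of quadratic wiring; a ring keeps wiring linear but pays in hops:

```python
# Hypothetical node counts; just comparing topology costs.

def crossbar_links(n: int) -> int:
    # dedicated link for every pair of distinct nodes: n*(n-1)/2
    return n * (n - 1) // 2

def ring_links(n: int) -> int:
    # each node connects only to its two neighbours
    return n

def ring_worst_hops(n: int) -> int:
    # bidirectional ring: worst case is halfway around
    return n // 2

for n in (4, 8, 16):
    print(f"{n:2d} nodes: crossbar {crossbar_links(n):3d} links (1 hop), "
          f"ring {ring_links(n):2d} links (up to {ring_worst_hops(n)} hops)")
```

GT200's crossbar only has to connect clusters to ROP/MC partitions, a constrained bipartite version of this; requiring full inter-GPC exchange in GF100 pushes things toward the quadratic end.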
Conceptually it is "just wires", as he says. Obviously the tricky bit is the physical environment. And everyone who's using 40nm at TSMC has struggled with the mismatch between the specification and reality.
But as far as I can tell, everyone expects there to be a mismatch, for any process - and for it to be worse when the process is new.
So then you get into questions of managing the transition to a new process. The ATI guys seem to have a better handle on that, though R600 and 110nm were their low points. And 90nm was some kind of third-party problem.