From what I recall, Tegra2 went into mass production in the same quarter as GF100 (Q1 2010), for reasons only NVIDIA knows. Ironically, rumor has it that it taped out in late 2008, so either NV needed a couple of spins, or yields or capacity were still crappy in late 2009, or possibly both. Either way, NVIDIA went into production with a 49mm2 SoC and a ~530mm2 high-end GPU chip in the same quarter.
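To put the two die sizes in perspective, here is a rough back-of-the-envelope sketch in Python. It uses a common dies-per-wafer approximation and a simple Poisson yield model. The 300mm wafer and the defect density are illustrative assumptions only, not real TSMC 40nm figures, and real yield also depends on redundancy, binning and parametric losses.

```python
import math

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius * radius / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: Y = exp(-A * D0), with the die area A in cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

# ASSUMPTION: hypothetical defect density, chosen only to show the trend.
D0 = 0.5  # defects per cm^2

for name, area in (("49mm2 SoC", 49.0), ("~530mm2 GPU", 530.0)):
    gross = gross_dies_per_wafer(area)
    y = poisson_yield(area, D0)
    print(f"{name}: {gross} gross dies, {y:.1%} modelled yield, "
          f"~{int(gross * y)} good dies per wafer")
```

Whatever defect density you plug in, the small die is far less sensitive to it than the big one, so the same immature process would hurt the two chips very differently.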
NVIDIA definitely had Tegra2 samples back at MWC09 (although they didn't have them at the show), so they definitely taped out in late 2008. However, they only started sampling to tablet manufacturers in July 2009 (and even later for smartphones), so I would expect at least one respin was necessary before they could sample it even to lead partners (unlike Kal-El).
Also, you can definitely expect a chip taped out so early in the process's life to be using suboptimal Design for Manufacturing rules (they improve over time), so I'd expect yields even today to be slightly lower than for a more recent 40nm chip. I think what delayed them more than anything is Android's maturity for tablets though, which turned out pretty well in the end when they became the exclusive initial provider for Honeycomb.
Texas Instruments, on the other hand, sounds like they'll manufacture OMAP5 on 28HP, primarily at UMC; according to their own claims things are going well so far, but you never know.
OMAP5 is 28LP at UMC and very likely dual-sourced at GlobalFoundries. That's SiON rather than High-K; however, remember Cortex-A15 clocks significantly higher than Cortex-A9 on the same process, and it's possible they are using something like 28LPG (à la Tegra's 40LPG) rather than only 28LP.
For NV's hotclocked GPU ALUs, I wouldn't suggest that they don't also come at an area cost; but compared to running only at core clock with more ALUs instead, there should definitely be an overall gain, otherwise they'd be foolish to opt for it.
Unlike on the desktop where the area cost should be obvious, in the handheld world they could presumably simply use significantly higher leakage (->higher performance) transistors (like they already need for the CPU). So the area cost would be smaller but the static power cost would be higher. I'd expect that to be a good trade-off but it's hard to be certain.
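A toy model of that trade-off, sketched in Python. Every number here is a made-up relative unit (including the 2x hotclock ratio, the per-ALU area penalty and the leakage multiplier); it only shows the shape of the argument, not real Tegra or GeForce figures.

```python
def alu_config(n_alus, clock_mhz, area_per_alu, leakage_per_alu):
    """Return throughput, area and static power for a bank of identical ALUs.

    All inputs are hypothetical relative units, not measured silicon data.
    """
    return {
        "throughput": n_alus * clock_mhz,            # ALU operations per microsecond
        "area": n_alus * area_per_alu,               # relative die area
        "static_power": n_alus * leakage_per_alu,    # relative leakage power
    }

# Option A: more ALUs, all running at the core clock on low-leakage transistors.
wide = alu_config(n_alus=8, clock_mhz=300, area_per_alu=1.0, leakage_per_alu=1.0)

# Option B: half the ALUs at a 2x hotclock, built from leakier, faster transistors.
# ASSUMPTION: each hotclocked ALU is 1.3x the area and leaks 3x as much.
hot = alu_config(n_alus=4, clock_mhz=600, area_per_alu=1.3, leakage_per_alu=3.0)

for name, cfg in (("wide/core clock", wide), ("narrow/hotclock", hot)):
    print(name, cfg)
```

With these placeholder numbers both options deliver the same throughput, the hotclocked one is smaller, and it pays for that in static power. Whether that is a win depends entirely on the real multipliers, which is why it's hard to be certain.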
Well, my own reasoning for HPL being a better candidate is that if NV manages to get a quad A9 to 1.5GHz on 40nm, 2.0GHz for the same CPU config doesn't sound like much of a problem, hypothetically, on HPL.
Hmm, I'm a bit skeptical. The lack of SiGe strain on 28HPL is a pretty big deal, and remember the CPU is really using 40G Standard Vt transistors, which are nothing to sneeze at. I'd expect 28HPL Low Vt transistors to be competitive with that, but 33% faster (1.5GHz to 2.0GHz) might be pushing it. We'll see.
Also, 28HPL isn't ready before 28HP on paper, although maybe yields will be good enough sooner. And 28HPM is a 'jack of both trades' between 28HP and 28HPL; it's probably the ideal process for something like Tegra. But since metafor said the models aren't even ready for it yet, that would mean either that T4 will easily be one of the first chips on the process (not impossible given T2 was easily one of the first on 40LP) or that it'll be released slightly later than we expect. Or a combination of the two!
Unless I'm terribly wrong again, I don't expect them for Tegra before 20nm.
I'd be very surprised if Logan (late 28nm generation and first Tegra with Cortex-A15) didn't have the next-generation GPU architecture but who knows.