Not only that, if what Fudzilla are reporting is true (mainstream GF100 parts in June) then Nvidia are clearly a lot further along with their fab process than Charlie likes to make out.
Compared to Cypress, that would be 9 months late. Compared to Juniper, still 8 months late.
Besides, even if GF100 exceeds expectations and ends up being on average ~50% faster than the 5870 and roughly equal to the 5970 (which, based on what has been leaked/shown so far, is rather optimistic), that would still mean they needed ~550mm² (let's just assume that number is very close to reality for the time being) @ 40nm to accomplish it.
And here's the catch, and Nvidia's real problem:
Half a GF100 @ 40nm would then be ~275mm² if you could cut everything in half, but you can't (and for the ROPs/memory interface it also doesn't make sense, see *). Several things, like the GigaThread engine, display outputs and the video processor, obviously can't be scaled down. Maybe you can cut the L2 cache down to 256KB instead of 384KB, but I'm not sure that even makes sense. So at a minimum, a 256 CUDA cores/32 texture units/32* ROPs/256*-bit derivative would be close to 300mm² (probably bigger), and therefore very close to Cypress in size (or even the same/bigger, who knows?). Do you think it would be able to compete with Cypress with those specs? I highly doubt it, not even with the full 256SP/32TMU and higher clock speeds than GF100.
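To make the estimate concrete, here's a quick back-of-the-envelope sketch in Python. The full-die area and the size of the non-scaling blocks are my own assumptions for illustration, not official figures:

```python
# Speculative die-size model: halve the scalable parts (CUDA cores, TMUs,
# ROPs, memory interface), keep fixed-function blocks constant.
GF100_AREA = 550.0   # mm^2, assumed full GF100 die area at 40nm
FIXED = 25.0         # mm^2, assumed non-scaling overhead (GigaThread
                     # engine, display outputs, video processor)

def derivative_area(scale):
    """Area of a derivative that scales only the non-fixed portion."""
    return (GF100_AREA - FIXED) * scale + FIXED

print(derivative_area(0.5))  # 287.5 mm^2 -> "close to 300" once padding
                             # and routing inefficiencies are added
```

The exact overhead figure doesn't matter much; any plausible value for the fixed blocks pushes a half-GF100 well past the naive 275mm².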
Sure, the fastest part -might- be able to compete with a 5830/5850, but after 9 months of reaping nice margins, AMD can easily afford to lower prices into regions that make this a low-margin part from the beginning.
*A 24 ROP/192-bit config is unlikely IMO: 768 MB is bad for marketing and high-res performance, 1.5 GB instead of 1 GB negates the cost advantage of the chip itself, and of course performance would be lower too (less pixel fillrate, less memory bandwidth).
And that problem of course trickles down to the mainstream/low-end parts as well. If the above chip is ~300mm², a 128SP/16TMU/16ROP/128-bit part would likely end up somewhere around ~160-170mm². That is roughly the size of Juniper, for a part that, based on its specs, would maybe land in the ballpark of a 4850/GTS 250/5750, performance-wise. And I think that's again a rather optimistic assumption.
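Under the same speculative assumptions (a ~550mm² full die with ~25mm² of non-scaling blocks; my numbers, not official ones), a quarter-GF100 lands right around the quoted range:

```python
# Quarter-GF100 (128SP/16TMU/16ROP/128-bit) under the same crude model:
# scale everything except the assumed fixed-function overhead.
GF100_AREA = 550.0   # mm^2, assumed
FIXED = 25.0         # mm^2, assumed non-scaling blocks

quarter = (GF100_AREA - FIXED) * 0.25 + FIXED
print(quarter)       # 156.25 mm^2 -> ~160-170 mm^2 with padding/routing,
                     # i.e. roughly Juniper-sized
```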
To cut a long story short, it would be better (or less bad...) for Nvidia to do what AMD did with RV610/630 and go for 28nm right away. @ 40nm, the combination of being much later to market and most likely losing on performance per mm² and per watt would make these parts obsolete from day one. The performance-mainstream part @ 40nm -might- still happen, simply because they're in desperate need of a GTX260/275 replacement in the $150-250 segment, but mainstream and low-end DX11 parts from Nvidia @ 40nm? I'd be highly surprised to see that.