Extremely efficient? Not in the least when compared to RV770. It seems like you have an opinion and are trying to convince yourself of its veracity.
Efficiency is not that bad if we consider CPU limitation in the benchmarks.
Crysis Warhead, for example, was 27% faster on an HD5870 than on a GTX285 at 1920 with 4x AA in "gamer" quality with 8.66 RC6, but with "enthusiast" quality at the same resolution the lead increased to 38%. The same trend shows at 2560, with a 33% lead in "gamer" quality.
Many games show this, and I strongly doubt NV could show much higher numbers. In fact, many games seem to be so limited that HD5770 CF performs better than a single HD5870, because AFR hides the latency, even though the games are actually less playable that way.
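To put a rough number on that CPU limitation, here is a toy model in Python. It is only a sketch under assumptions that are mine, not measured: that the 38% "enthusiast" lead is the pure GPU-bound lead, and that in "gamer" quality some share of the GTX285's run time is CPU-bound, where both cards take the same time. The function names are made up for the illustration.

```python
def observed_lead(gpu_ratio, cpu_bound_share):
    """Average speedup of card A over card B across a benchmark run.

    gpu_ratio: A's speedup when purely GPU-bound (1.38 means 38% faster).
    cpu_bound_share: share of card B's run time that is CPU-bound,
                     where both cards need the same time (assumption).
    """
    time_b = 1.0
    time_a = cpu_bound_share + (1.0 - cpu_bound_share) / gpu_ratio
    return time_b / time_a


def cpu_bound_share_for(lead, gpu_ratio):
    """Invert observed_lead: CPU-bound share that compresses gpu_ratio to lead."""
    return (1.0 / lead - 1.0 / gpu_ratio) / (1.0 - 1.0 / gpu_ratio)


# Assumption: 38% ("enthusiast") is the unconstrained lead, 27% ("gamer") the observed one.
share = cpu_bound_share_for(1.27, 1.38)
print(f"CPU-bound share needed: {share:.2f}")  # about 0.23
```

Under this model, a bit under a quarter of CPU-bound run time would be enough to shrink a 38% lead to 27%, so the effect does not need to be large to show up in reviews.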
There are still some issues with Cypress scaling. For example, highly tessellated scenes in Heaven do not render much faster than on Juniper at stock clocks (only about 40% faster despite almost everything being doubled, which points to a triangle setup limitation), yet they are not that much slower at half clocks (only ~40% slower, but already ~10% slower without heavy tessellation), whereas half-clocked Juniper shows perfect scaling, which nullifies the first assumption. There are counterexamples too: STALKER CoP, for example, shows nearly perfect scaling.
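The arithmetic behind those scaling figures can be written down as a small Python sketch. The two helper names are hypothetical, and the inputs are just the rounded percentages quoted above, so treat the results as ballpark values rather than measurements.

```python
def scaling_efficiency(observed_speedup, resource_ratio):
    """Fraction of the ideal speedup that was actually achieved."""
    return observed_speedup / resource_ratio


def clock_sensitivity(relative_perf, relative_clock):
    """Share of a clock reduction that shows up as lost performance.

    1.0 means performance follows the clock perfectly,
    0.0 means the clock does not matter at all.
    """
    return (1.0 - relative_perf) / (1.0 - relative_clock)


# Cypress vs Juniper in tessellated Heaven: ~40% faster with ~2x the units.
print(scaling_efficiency(1.40, 2.0))

# Cypress at half clocks: ~40% slower, i.e. 0.60x performance at 0.5x clock.
print(clock_sensitivity(0.60, 0.5))

# Juniper at half clocks: perfect scaling, i.e. 0.50x performance at 0.5x clock.
print(clock_sensitivity(0.50, 0.5))
```

With these inputs Cypress gets about 70% of the ideal doubling over Juniper, and about 80% of the clock cut shows up as lost performance, against 100% on Juniper.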
So, Cypress seems to be limited internally, perhaps by its shared memory, which could have the same bandwidth as Juniper's. But we have proof this can be as significant as Fermi's high DP throughput, and we'll have to wait to see whether it has any effect on DX11 games, for which the only hardware currently available to experiment on is AMD's. By the way, does Fermi have much higher throughput for this shared memory? As far as I know, its combined L1+shared throughput is at most 50% higher.