There is an either-or fallacy going on in these threads. 3dfx went with SLI in lieu of using bleeding-edge memory, process, and bus width.
I don't think anyone is arguing that 2 mid-range cards in SLI are more efficient than 1 high-end card. But given that Nvidia and ATI are using the same memory, similar processes, similar bus widths, and a similar number of pipelines, once these factors are maxed out, the only way to scale further is to go SLI.
Once ATI and Nvidia hit the limits of the memory currently available and of current transistor densities, how else do you scale performance? A 512-bit bus? I don't think so. Tweak the IMR further? The IMR and shaders are already fairly efficient. There's not much left to do except for hax (dropping filtering adaptively, etc.).
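To put rough numbers on that, here's a back-of-the-envelope sketch of peak memory bandwidth. The 256-bit bus and 1000 MT/s effective data rate are made-up illustrative figures, not the specs of any actual card, and the SLI line is an ideal aggregate that assumes each card uses its own memory and ignores any SLI overhead.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_rate_mt_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mt_s * 1e6 / 1e9


# Hypothetical high-end card: 256-bit bus, 1000 MT/s effective (assumed values).
single_card = memory_bandwidth_gb_s(256, 1000.0)

# Option 1: double the bus width on one card (the 512-bit bus question).
wider_bus = memory_bandwidth_gb_s(512, 1000.0)

# Option 2: two of the same card in SLI, ideal aggregate with no overhead.
sli_pair = 2 * single_card

print(f"single card: {single_card:.1f} GB/s")
print(f"512-bit bus: {wider_bus:.1f} GB/s")
print(f"2x SLI:      {sli_pair:.1f} GB/s")
```

On paper the two options land at the same peak figure. The difference is that SLI gets there with the memory and bus widths that already exist.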
If you're not satisfied with the highest-end card each IHV offers, then SLI is the only alternative for more GPU performance.