Even with improved load/store and other tweaks, I have a hard time believing they can make up for the loss of the third pipe. It might be close, though.
Multithreaded performance, however, should certainly be much better than K8's. Without making the chip too big, multithreaded performance per transistor (or per die area) should be higher, which means more cores at the same die size (or in the same power envelope).
Intel's architecture also has only two issue ports to ALUs, and I'm sure we can agree it does quite well.
Bulldozer brings AMD on par with Intel, microarchitecture-wise. The question now is how fast it clocks.
Cheers