There seems to have been an A and B version of Bulldozer, going by a demarcation of sorts in the patents. Bulldozer wasn't anything new, from a time perspective. There were patent filings for a similar multi-threading architecture in 2003 or 2004, AFAIK. My guess is that Bulldozer was in fact the "B-plan" for AMD, kicked off in a hurry some time after the K10h fiasco.
I think that was different from an even more speed-oriented design, described on comp.arch (by Mitch Alsup, if my fuzzy memory is a guide) as one that removed result forwarding in order to save gate delays.
Less certain were rumors of an aborted wide architecture that was another failed attempt at replacing K8.
It still boggles my mind why AMD decided to invest in a NetBurst redux after it was quite clear that the future was definitely not in power-sucking high-speed monsters. Not that they had a choice back then.
An interesting aspect of CMT, in one direction put forward by Glew, was splitting things up in order to slim down the critical execution loop for a fireball-type design that could really crank up the clock. Bulldozer does not quite reach for that, given its modest reduction in per-stage complexity.
That goal and/or the implementation of enhanced speculation is what should have been in mind if CMT was selected at the beginning of the design process. Perhaps the weirdness now comes from one or both hitting dead ends, so that the design we have is an awkward fattening of a too-skinny pipeline and/or an awkward grafting of whatever could be salvaged from interesting directions that AMD abandoned because they wouldn't work or could not be made to work with the resources available.