I have a hard time believing they all could have been that naive. There are, or at least were, many engineers doing in-depth analysis of the workloads BD would face. The case for such a regression is even weaker since the design is badly delayed and would have come out facing even more serial workloads if it were on time. It should have been out on the 45nm node.
I sincerely doubt they couldn't. I think they simply expected software to easily scale to many threads by now and not suffer from Amdahl's Law. In such a world Bulldozer would have made a lot of sense, if it was well executed.
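To put rough numbers on the Amdahl's Law point, here is a minimal sketch of the ideal speedup formula, 1 / ((1 - p) + p / n). The parallel fractions and the thread count of eight are illustrative assumptions, not measurements of any real workload or of Bulldozer itself, and the function name is made up for this example.

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Ideal speedup under Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # Hypothetical workloads: the fractions below are assumptions for illustration.
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 8):.2f}x on 8 threads")
```

Even under this idealized model, which ignores synchronization and memory contention entirely, the serial portion dominates quickly. That is why a design that trades per-thread speed for thread count only pays off if the software is almost entirely parallel.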
A wide OoO x86 running in the 3-4 GHz range is a significant undertaking to convert to SMT.
It took Intel quite some time to get it right, and let's note that it had to do this twice, possibly three times if we consider that SB probably rearchitected key parts of the execution engine significantly compared to Nehalem.
Aside from Intel, the other significant high-performance wide OoO SMT design is POWER.
AMD may have decided that CMT gave them the best return on their engineering buck, possibly because of the difficulty in validating the design and making it run optimally while staying within power and manufacturing constraints.
Intel has managed all of this with far more resources, very good engineering, and far superior manufacturing.
IBM has managed this with a massive subsidy from its system and software side, some nice engineering, control over its hardware and software stack, and very relaxed power and yield requirements thanks to the previously mentioned subsidy and control over the platform.
In AMD's case, it needs to match all of this without the money, engineering, or process. Faced with uncertainty about doing a wide OoO SMT design with the resources on hand, it may have hoped it could min
I think they knew how hard it would be. The chip that they produced did not clock high enough and there are niggling issues with its memory performance throughout the hierarchy. They made the TLP situation worse with their evolutionary take on the uncore, which is still unimpressive when it comes to inter-core communication and still lags in memory utilization.
Unfortunately they miscalculated how hard it is to exploit TLP.
They went in the wrong direction more than once. BD is not the first planned successor to K8. At least one SMT design flamed out, and there may have been more than one phase of BD.
That's one interesting theory. I have little doubt that somewhere along the development of Bulldozer they realized they were moving in the wrong direction, resulting in resources being cut and valuable time lost figuring out what to do.
The design seems like it is missing something. AMD seems to have architected the cores to handle a memory pipeline that is apparently very paranoid about synchronization and write combining, prone to burdening the L/S units, but also primed for long queues of ops and high straight line speed.
Minimizing the impact of redoing a failed transaction may have fit in that philosophy.