K7, K8, K10 and K10.5 were a great evolutionary leap forward that brought a lot of nice features (x86-64, an integrated memory controller (IMC), SSE2/SSE3, virtualization, L3 cache). But AMD evidently saw that it needed a huge amount of time to rework that architecture for every "new instruction" implementation and every facility improvement it wanted. The x86 front-end decoder/dispatcher had simply aged and become a non-scalable hot spot in these chips. Still, it worked pretty well if you consider that the same design was deployed on a 350nm process and carried the architecture faithfully all those years, reaching its apex above 4GHz in overclocking sessions.
Instructions in and of themselves are comparatively cheap to implement, as long as they don't wildly change the behavior of the processor. A few extra media instructions are not what AMD was spending all its time trying to implement.
The more fundamental problem is that K7 was architected to match the properties of silicon at nodes that existed a decade ago. The assumptions and tradeoffs of that time do not match the realities of now. Wire scaling has fallen off badly, voltage scaling is severely constrained, variability is one of the greatest threats to manufacturability, and the engineering effort to get anything working at a new node is much higher.
AMD showed the massive disparity in switching power and clock gating between an old design and BD. At some point, the decisions and legacy of a reused pipeline become prohibitive, and it could be argued AMD hit that limit several nodes ago.
Llano was supposed to be a power-efficient version of Phenom at 32nm.
While there were complications that muddy the waters a bit, it should be noted that this power-efficient quad-core had SKUs that, even without a GPU, could smack into a 100W TDP barrier with only the barest turbo hop above 3 GHz.
This is a turbo that Llano could barely maintain for a few milliseconds.
AMD still runs into some kind of clock or power wall with Bulldozer, though it sits 30% higher.
If they had managed to release BD in 2009 supporting only SSE5 (alongside SSE4) and started a new evolution from there, they might be in a lot less trouble.
I don't find this argument compelling. AMD could never bet any kind of success on ISA changes when the only real party in the driver's seat is Intel. Once, and only once, did that change.
That, and SSE5 is far uglier and in many or most aspects inferior to AVX.
Why? Because AVX was Intel's extension, and Intel is allowed to make more fundamental changes to semantics, since whatever direction Intel chooses is the direction x86 goes. SSE5 is uglier and hackier because it had to work within already established operand and instruction behaviors that Intel then decided to remove.
Post-PC?!? What is that supposed to mean?
A huge portion of consumer spending is heading towards consumption devices that serve as portals to vertically integrated hardware and media platforms.
The money isn't made on the hardware, but on everything else that follows from making the consumer dependent on the consumption device.
The PC used to be this portal, but it is comparatively clunky and difficult to monetize because it isn't controlled. Components cost more because they have to have their own profit margins, and the machines they go into can connect out and buy products from just about anybody.
The thing is that most users are consumers, and for their use model almost everything about a PC is extraneous and adds expense. That doesn't mean the other uses the PC is still needed for, such as content creation or business, suddenly stop being relevant, but a lot of the money that went to PCs because no other device offered quite the same consumption options no longer goes there.
The question is whether AMD is struggling to remain afloat and somehow relevant long enough for some large conglomerate to make it part of a vertically integrated empire.
The threat to Intel's x86-based dominance in so many consumer-facing fields in future computing is actually business-related, not technical. This harkens back to how x86 won out over many demonstrably superior solutions in its history. Can even Intel maintain the treadmill when the growth curve in selling components of media portals doesn't provide the kind of war chest that owning the consumers' viewing and consumption portals does? There's still the massive growth in server-side computing it can rely on, so it's not like Intel can't benefit from this change, unless other vertically integrated media device/service players decide to become more vertical.
It's also a question I think any silicon design or manufacturing company must ask itself, since two big players in this future don't mind pulling the silicon portion in-house in the pursuit of more influence in monetizing consumers.
GK110 (or whatever it's called) might outperform the HD 7970 in single precision (SP), but when it comes to double precision (DP) it won't surpass Tahiti. They might tie, or it could actually underperform a design that is 1-1.5 years old.
Which variation on GK110 or Tahiti are you using?
If not against the GHz Edition, then at least some of the clock/SMX combinations for GK110 have it winning DP and losing SP.