AMD Bulldozer Core Patent Diagrams

3dilettante · Oct 10, 2011

The one part of the cache hierarchy that did not appear to change much from a high-level view is hurting BD. However, it looks like there are problems with the shared module cache architecture when it comes to write combining, so maybe it's just a constant.

The write combining problem looks like a bug since a revision is coming to fix it.
I don't know how else to describe the cross-invalidation problem.
AMD stuck with a big low-associativity cache and then they ran two threads through it.
By way of comparison, SB's Icache is half the size and four times the associativity (takes three bits out of the index), which I assume is why it doesn't need something like this.
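
A toy model may help picture the cross-invalidation problem. It is a sketch under stated assumptions, not AMD's actual design: it assumes the shared 64KB/2-way L1I (64-byte lines, so set index = virtual-address bits 6-14) keeps at most one copy of each physical line, and it ignores capacity and associativity entirely. The class name, `run` helper and addresses are hypothetical. Both cores of a module fetch the same physical code line through virtual addresses that either disagree or agree in bits 12-14.

```python
LINE_SHIFT = 6   # assumed 64-byte lines
NUM_SETS = 512   # assumed 64KB / 2 ways / 64B = 512 sets (index = VA bits 6-14)


class AliasingICacheModel:
    """Toy shared L1I: at most one copy per physical line, set picked from the VA.

    Capacity and associativity are ignored; only virtual-alias (synonym)
    effects are modelled.
    """

    def __init__(self) -> None:
        self.where = {}              # physical line number -> set currently holding it
        self.misses = 0
        self.cross_invalidations = 0

    def fetch(self, vaddr: int, paddr: int) -> None:
        set_index = (vaddr >> LINE_SHIFT) % NUM_SETS
        pline = paddr >> LINE_SHIFT
        current = self.where.get(pline)
        if current == set_index:
            return                   # hit: line already sits in the set this VA indexes
        self.misses += 1
        if current is not None:
            self.cross_invalidations += 1  # drop the copy held under the other alias
        self.where[pline] = set_index


def run(vaddr_core0: int, vaddr_core1: int, paddr: int, rounds: int = 100):
    """Both cores of a module alternately fetch the same physical code line."""
    cache = AliasingICacheModel()
    for _ in range(rounds):
        cache.fetch(vaddr_core0, paddr)
        cache.fetch(vaddr_core1, paddr)
    return cache.misses, cache.cross_invalidations


# Same shared-library line, mapped at VAs that differ in bits 12-14 vs. agree in them.
print(run(0x10001000, 0x20003000, paddr=0x5000))  # (200, 199)
print(run(0x10001000, 0x20001000, paddr=0x5000))  # (1, 0)
```

With misaligned aliases every fetch misses and evicts the other core's copy; once the alias bits agree, only the first fetch misses.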

AMD must have seen this coming, or at least should have. Did they think it was an acceptable trade-off? Did they think the cost would be lower? It's a few percent off overall. Going by the comments from Linus Torvalds and others about this fix for Linux, there is some cosmic irony in this server architecture slightly lowering the level of security, because the fix keeps the affected parts of the virtual address from being randomized.
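
The security cost is simple arithmetic. A minimal sketch, assuming the "affected parts" are the 3 alias bits of a 64KB/2-way cache with 4 KiB pages (VA bits 12-14) and that the fix forces them to zero in every randomized mapping base. The helper names and the 8-bit toy randomization range are hypothetical; this is not the kernel patch's code.

```python
import math

PAGE_SHIFT = 12
ALIAS_MASK = 0b111 << PAGE_SHIFT  # VA bits 12-14, assumed alias bits of a 64KB/2-way L1I


def align_for_icache(addr: int) -> int:
    """Clear the alias bits so every mapping of a library lands in the same sets."""
    return addr & ~ALIAS_MASK


def entropy_bits(addresses) -> float:
    """log2 of the number of distinct values (entropy if all are equally likely)."""
    return math.log2(len(set(addresses)))


# Toy ASLR: the loader may pick any of 256 page-aligned bases (8 bits of randomness).
candidates = [page << PAGE_SHIFT for page in range(256)]
aligned = [align_for_icache(a) for a in candidates]

print(entropy_bits(candidates))  # 8.0
print(entropy_bits(aligned))     # 5.0
```

Whatever the real randomization range is, pinning 3 bits removes 3 bits of entropy from it.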

snarfbot · Oct 10, 2011

I don't understand why L1 writes are so slow, then. It should be at least equal to Phenom II, right?

L2 has twice the latency, and L3 is somewhat worse than Phenom's in that regard as well.

It doesn't make sense.

Albuquerque · Oct 10, 2011

Linus responds to the patch submission by AMD:

Linus said:
Argh. This is a small disaster, you know that, right? Suddenly we have user-visible allocation changes depending on which CPU you are running on...

Selective quotation to build the drama

Go read his full post here...

Rootax · Oct 10, 2011

Why would they put a CPU on the market if it needs patches like these to work at its best?

rpg.314 · Oct 10, 2011

OS level patches are fine.

mczak · Oct 10, 2011

3dilettante said:
I don't know how else to describe the cross-invalidation problem.
AMD stuck with a big low-associativity cache and then they ran two threads through it.
By way of comparison, SB's Icache is half the size and four times the associativity (takes three bits out of the index), which I assume is why it doesn't need something like this.

Yes, I think that's pretty much the answer. AMD is essentially "missing 3 bits", whereas SNB's 32KB/8-way has all bits covered. That said, the Nehalem/Westmere L1I was 32KB/4-way and I never heard of this being a problem. Probably it wasn't really a problem because, "missing only 1 bit", you'd still have some associativity left with 2 threads, whereas it pretty much turns the BD cache into a direct-mapped one (and the faster L2 on Nehalem probably helps there too).
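
The "missing bits" count can be checked with a few lines. A sketch assuming 4 KiB pages: one way of the cache spans `cache_bytes / ways` bytes, so index plus line offset cover log2 of that many address bits, and anything above the 12-bit page offset comes from the virtual page number and can differ between aliases. The function and table names are hypothetical.

```python
import math

PAGE_SIZE = 4096  # assumed 4 KiB pages


def bits_above_page_offset(cache_bytes: int, ways: int, page_bytes: int = PAGE_SIZE) -> int:
    """Set-index bits taken from the virtual page number rather than the page offset."""
    way_bytes = cache_bytes // ways               # bytes covered by one way
    index_plus_offset = int(math.log2(way_bytes))
    return max(0, index_plus_offset - int(math.log2(page_bytes)))


L1I_GEOMETRIES = {
    "Bulldozer (64KB/2-way)": (64 * 1024, 2),
    "Sandy Bridge (32KB/8-way)": (32 * 1024, 8),
    "Nehalem/Westmere (32KB/4-way)": (32 * 1024, 4),
}

for name, (size, ways) in L1I_GEOMETRIES.items():
    print(f"{name}: {bits_above_page_offset(size, ways)} bit(s) above the page offset")
```

This gives 3, 0 and 1 bits respectively, matching the "missing 3 bits" and "missing only 1 bit" figures.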
The L1 cache design just seems a bit too simple (carried over from previous CPUs). I think that even in the single-thread case, a 32KB/8-way cache probably has a higher hit rate than a 64KB/2-way one in typical code (though I'm not sure it's really cheaper to implement in hardware; possibly not).
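
The associativity side of that argument can be illustrated with a small LRU set-associative model. This is a deliberately adversarial trace (three code lines exactly 32 KiB apart, looped), not evidence about typical hit rates, which would need real instruction traces. The class and trace are hypothetical; the geometries are the two from the post, with 64-byte lines assumed.

```python
from collections import OrderedDict


class SetAssocCache:
    """Minimal set-associative cache with per-set LRU replacement."""

    def __init__(self, size_bytes: int, ways: int, line_bytes: int = 64) -> None:
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.line_bytes
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)      # hit: mark as most recently used
            return
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)    # set full: evict the least recently used line
        s[line] = True


def misses_for(size_bytes: int, ways: int, trace) -> int:
    cache = SetAssocCache(size_bytes, ways)
    for addr in trace:
        cache.access(addr)
    return cache.misses


# Three lines 32 KiB apart, looped 10 times (30 fetches). 32 KiB is a multiple of
# both caches' way size, so all three lines map to the same set in either cache.
trace = [i * 32 * 1024 for _ in range(10) for i in range(3)]

print(misses_for(64 * 1024, 2, trace))  # 30: 2 ways thrash on 3 lines
print(misses_for(32 * 1024, 8, trace))  # 3: only the cold misses
```

Halving the size while quadrupling the ways trades capacity for tolerance of exactly this kind of set conflict.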
I wonder if Piledriver is still a family 15h CPU, because according to the comments, family 15h CPUs will always behave like that, while future ones should not have that problem.

fellix · Oct 11, 2011

That's really sad, considering that instruction fetch in BD is (finally) no longer blocked by the predictor; such a cheap move of carrying over old tech was bound to backfire in some way. It looks like AMD really didn't have time to revamp the front-end for the BD launch, considering that this is their first practical experience with shared SMT-style logic in the field. Just remember, Intel also didn't fare well with NetBurst in the beginning, with all of its weird uarch concepts; the irony is that its main bottleneck was the front-end too!

mczak · Oct 11, 2011

fellix said:
Just remember, Intel also didn't fare well with NetBurst in the beginning, with all of its weird uarch concepts; the irony is that its main bottleneck was the front-end too!

Did not do well with NetBurst in the beginning? Considering the complexity of the design, I would say it was a disaster from start to finish... Yes, some interesting concepts and ideas "got ported over" to newer chips, but considering the transistor count, power draw, etc., it never reached good performance. The front-end contributed to the lackluster performance, but the problems were different.

fellix · Oct 11, 2011

Northwood fared quite well for its lifetime. With Prescott, things went downhill again until the last iteration of the architecture on the 65nm process. The dual-core Prescotts had a short but very explosive market invasion, given AMD's outrageous pricing for dual-core Athlons at the time.

Gubbi · Oct 11, 2011

mczak said:
Did not do well with NetBurst in the beginning? Considering the complexity of the design I would say it was a disaster from start to finish...

Willamette's performance was lackluster, but Northwood was king of the hill for almost two years, only dethroned by Athlon 64.

The whole Netburst architecture was also the most profitable architecture family Intel ever made up to that point.

Cheers

Mendel · Oct 11, 2011

Bulldozer review by lab501, posted by king-dubs on the Guru3D forums: http://forums.guru3d.com/showthread.php?t=352045

Albuquerque · Oct 11, 2011

Wow Mendel, that's painful.

AMD loses every benchmark, sometimes by as little as 3% and sometimes by as much as 45%. The only place AMD FX pulls ahead of the i7-2600K is its 30% higher power consumption. Yikes.

The FX-8150's MSRP puts it a bit above the current i5-2500K but below the i7-2600K. But given these benchmarks, the i7-2600K is the clear winner. Sad.

swaaye · Oct 11, 2011

Wow it does look like Phenom II might beat it sometimes. Oh dear. Sort of like a new Willamette.

rpg.314 · Oct 11, 2011

AMD blew it.

At best, Piledriver will be able to recover about 10% of the deficit, which Ivy will easily gain.

Bottom line, AMD is hopeless. Look to ARM for any sense of competition.

Albuquerque · Oct 11, 2011

swaaye said:
Wow it does look like Phenom II might beat it sometimes. Oh dear. Sort of like a new Willamette.

I didn't think about that until you posted it, but you're exactly right. The initial P4 units were actually slower than the P3s they were replacing, at least until they hit around the 1.6GHz mark. It does seem like AMD is relying on clock speed more and more, akin to Intel with their NetBurst foray, and likely for similar reasons: if you cannot make the architecture faster, you're left with clock speed and some seriously fancy cooling.

Hell, with BD pushing down prices on the 'old' PII chips, maybe we can find some ridiculous deals on PII processors for the ultimate price/perf rig. In a few more months, when AMD rolls out the midrange GCN parts, you might be able to put together an incredible gaming rig for less than $500, even including the LCD and OS license.

fellix · Oct 11, 2011

mczak · Oct 11, 2011

Gubbi said:
Willamette's performance was lackluster, but Northwood was king of the hill for almost two years, only dethroned by Athlon 64.

Oh, certainly it was king of the hill, because the competition was lacking. As they say, in the land of the blind the one-eyed man is king...
But the performance was never where it should have been considering the complexity (transistor count).
A hypothetical competitor in the form of a tweaked PIII with a faster bus interface (let's call it Pentium M for desktops...) had about the same performance at half the die size and power draw, but of course you couldn't buy it...

The whole Netburst architecture was also the most profitable architecture family Intel ever made up to that point.

But certainly not due to technical merit.

Albuquerque · Oct 11, 2011

mczak said:
But certainly not due to technical merit.

While you're certainly correct, it is also quite true that technical merit rarely dictates the best-selling equipment.

trinibwoy · Oct 12, 2011

I genuinely feel sorry for AMD and the guys who worked on Bulldozer. These reviews are painful to read. So many engineers worked for so many years on this... they must feel terrible.

Rangers · Oct 12, 2011

What's most disappointing to me is gaming performance. It's now a non-starter in the enthusiast market.

I've often thought recently that AMD (or any manufacturer really, but AMD as a niche filler would be a more obvious choice given its market position) would do well to position itself as the gamers' choice, and even design its CPUs to excel in gaming at the expense of some other things at times. I really suspect this strategy would lead to a sales bonanza, because gaming is pretty much the only area where consumers crave high performance. It's the one reason you actually want a really high-performance CPU (provided you don't do some sort of specialized audio/video work) instead of just "good enough", which is fine for general-purpose desktop use.

Instead they do the exact opposite with Bulldozer. Facepalm. In the benchmarks I saw at Anand, Bulldozer is objectively awful in gaming. That alone means nobody who posts on any gaming or gaming-related forum will ever buy one of these. Unbelievable.

Perhaps even more stinging, there was a supposed pre-NDA-lift reviewer quote floating around about how "Bulldozer will be the choice for gamers" or something like that. Naturally everybody got excited, because that's all most people care about.

Combine that with the fact that it's much bigger and hotter than Intel's chips, and it's almost an unmitigated disaster.
