AMD Bulldozer Core Patent Diagrams


The one thing that is for sure here is that every hardware review website rushed to be the first to publish an AMD FX-8150 review; they all used the same generic benchmarks and NONE did any real-world computing. The game is fixed: the big-dog spreads around the most ad dollars.
Who is the "big-dog" supposed to be? Intel? I think they're too busy laughing to be fixing much of anything right now. :oops:
 
New B3 revision listed:

revb31.gif


Source
 
I wonder what difference this will make. Any clues?
Since the revision guide only mentions the B2 stepping (not older, not newer) and all the bugs there are tagged with "No Fix planned" anyway, it's hard to tell, but I'd guess nothing earth-shattering. Maybe a slightly optimized design here and there to raise the achievable frequency a bit at the same voltage?
 
Performance is terribly depressing.

Why such a slow L3/northbridge??? It's big, but not that big, & I'd expected 32nm to allow a faster cache. I'd also expected they'd have tweaked it for better performance, given the different core architecture & all the years since they launched Phenom I.

I saw reference to there being something like 900 million transistors 'missing' somewhere in the uncore/northbridge. It's a huge number & they don't even have an onboard PCIE controller like Intel has.

Main core clocks are far below my expectation.

I can't understand the poor per-clock performance.
My understanding was that Bobcat cores were performing well per-clock on a similar architecture, which should have meant good things for Bulldozer.

Perhaps they could just stick 8 Bobcat cores on a die :p
 
They could try to make a 2-module desktop chip without the L3 and giant uncore. 2 modules with 2 MB L2 cache each, vertically aligned, would make just 2*30.9 mm². That's just 62 mm² + IO and memory controller under the L2 cache. Improve the cache (probably just leaving out the slow L3 and uncore would help a lot with latency; L1 associativity too).

And with a 95W TDP they could bump base clocks up to 5 GHz with that tiny die size, which in turn would increase cache bandwidth too and help a lot with single-threaded performance.

The fact is the 30.9 mm² module with 2MB L2 cache looks good, while the 2 billion transistor serverdozer does not.
 
The fact is the 30.9 mm² module with 2MB L2 cache looks good, while the 2 billion transistor serverdozer does not.
It would look ok but not really good. Without any L2 cache a module appears to be slightly larger than a SNB core. Now it's hard to judge performance without considering uncore and even L2, but on a per-area basis it's difficult to imagine it would be more efficient than a SNB core. Still, the size would be manageable (though Llano's Husky core is only half the size again without L2, so there doesn't seem to be that much savings from a CMT module all things considered).

I saw reference to there being something like 900 million transistors 'missing' somewhere in the uncore/northbridge. It's a huge number & they don't even have an onboard PCIE controller like Intel has.
There are not 900 million transistors missing. However, AMD is telling us a module is just 215 million transistors, which would make everything else 1.1 billion transistors if the chip has 2 billion transistors. L3 cache is already ~400 million transistors, which leaves 700 million for HT links, MC, etc. So while not 900 million transistors are missing, that number definitely looks way too large. Maybe the transistors are counted differently for the modules but it still looks like an awful lot.
Also, saying it doesn't even have onboard PCIE is a bit unfair. Even just one HT link will use about the same die area, and this thing has 4 of them, 3 of them unused in desktops. After all, Westmere-EP doesn't have PCIE either.
(Of course there's no IGP either, and that's a fair chunk of SNB's die size and transistors. But this die would have space for an IGP if you left out the unneeded HT links and could use the unused areas. I'd bet Trinity will make far more efficient use of the available die area for desktop use.)
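For anyone who wants to check the budget arithmetic in the post above, here is a minimal sketch. The 2 billion total, 215 million per module and ~400 million for L3 are the figures quoted in the post; the module count of 4 is my assumption (an eight-core FX-8150), and the function name is just for illustration.

```python
# Rough transistor budget using the figures quoted in the post above.
TOTAL_TRANSISTORS = 2_000_000_000   # quoted chip total
MODULE_TRANSISTORS = 215_000_000    # quoted per-module count
MODULES = 4                         # assumption: 8-core / 4-module part
L3_TRANSISTORS = 400_000_000        # the post's rough L3 estimate


def transistor_budget(total, per_module, modules, l3):
    """Return (transistors outside the modules, what is left after L3)."""
    non_module = total - per_module * modules
    remainder = non_module - l3
    return non_module, remainder


non_module, remainder = transistor_budget(
    TOTAL_TRANSISTORS, MODULE_TRANSISTORS, MODULES, L3_TRANSISTORS
)
print(f"outside the modules: {non_module / 1e6:.0f}M")
print(f"left for HT links, MC, etc.: {remainder / 1e6:.0f}M")
```

The exact results round to the post's "1.1 billion" and "700 million".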
 
I think AMD uses 8T SRAM cells for all major memory arrays in BD -- they already do for Llano's L1 caches at least. Factoring in the parity/ECC bits, the L3 cache alone should be ~600M transistors, and that's without considering the bunch of SRAM tags. There are hardly any transistors "missing" in there.
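A quick sketch of how the ~400M and ~600M L3 estimates in this thread can both come out, depending on the cell type. The assumptions here are mine, not from the posts: an 8 MiB L3, and ECC modeled as 8 check bits per 64 data bits. Tags and peripheral logic are ignored.

```python
# Estimate transistors in the data array of an SRAM cache.
def sram_data_transistors(capacity_mib, transistors_per_cell, ecc_bits_per_64=0):
    """Transistors in the data array only (no tags, no peripheral logic)."""
    data_bits = capacity_mib * 1024 * 1024 * 8
    # Every 64 data bits carry ecc_bits_per_64 extra check bits.
    total_bits = data_bits * (64 + ecc_bits_per_64) // 64
    return total_bits * transistors_per_cell


L3_MIB = 8  # assumption: 8 MiB L3

plain_6t = sram_data_transistors(L3_MIB, 6)
ecc_8t = sram_data_transistors(L3_MIB, 8, ecc_bits_per_64=8)
print(f"6T, no ECC:   {plain_6t / 1e6:.0f}M")
print(f"8T, with ECC: {ecc_8t / 1e6:.0f}M")
```

Under these assumptions the 6T case lands near the ~400M figure and the 8T-plus-ECC case near the ~600M figure.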
 
It would look ok but not really good. Without any L2 cache a module appears to be slightly larger than a SNB core.

Ok, but if they targeted high frequency then it could change things. They pay for wafers, so a single module at 5 GHz (and if the shared cores would reach +50% performance) could reach first-class performance/die-area numbers. Even if just a single module would eat up half of the TDP budget at 5 GHz, in the end they could fit more of them on a single wafer.

For AMD, the same performance on a smaller area would be crucial these days. They have sold much bigger chips for less than Intel for several years now.
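To put a rough number on the wafer argument, here is a sketch using a common first-order dies-per-wafer approximation (gross area divided by die area, minus an edge-loss term). It ignores yield, scribe lines and edge exclusion. The 30.9 mm² module area is from the thread; the 300 mm wafer and the 300 mm² "big die" are assumptions picked purely for comparison, and the two-module figure leaves out the IO and memory controller area.

```python
import math

WAFER_DIAMETER_MM = 300.0         # assumption: 300 mm wafers
MODULE_AREA_MM2 = 30.9            # from the thread: one module with 2 MB L2
HYPOTHETICAL_BIG_DIE_MM2 = 300.0  # hypothetical large die, not a quoted figure


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """First-order estimate of whole dies per wafer (no yield model)."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    # Partial dies lost around the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


two_module_area = 2 * MODULE_AREA_MM2  # modules only, no IO or memory controller
small = dies_per_wafer(two_module_area)
big = dies_per_wafer(HYPOTHETICAL_BIG_DIE_MM2)
print(f"{two_module_area:.1f} mm² die: about {small} per wafer")
print(f"{HYPOTHETICAL_BIG_DIE_MM2:.0f} mm² die: about {big} per wafer")
```

A real two-module chip would be larger than the modules alone, so treat the small-die count as an upper bound.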
 
Even just one HT link will use about the same die area, and this thing has 4 of them
Thuban/Istanbul has 4 HT links too & is only 900m transistors with 9MB of cache.

I think AMD uses 8T SRAM cells for all major memory arrays in BD
Thuban uses 6T? That would certainly make up some of the gap.
 
There are not 900 million transistors missing. However, AMD is telling us a module is just 215 million transistors, which would make everything else 1.1 billion transistors if the chip has 2 billion transistors. L3 cache is already ~400 million transistors, which leaves 700 million for HT links, MC, etc. So while not 900 million transistors are missing, that number definitely looks way too large.

Is it certain there isn't a problem similar to the Sandy Bridge 995M/1.16B mixup?
Depending on when the gate count is made, the totals can be different.
There was a margin of error of 165M transistors for SB, which is a chip close to 1/2 the transistor count of BD.

330M could be taken off if a proportionate mixup occurred relative to what happened with SB. Given that this is a marketing number, 100M either way could have been rounded in. There goes over half of the supposed disparity.
A less optimized circuit implementation may have an even larger inflation than SB.

Then we have the remainder for the expanded uncore and connectivity features.
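The scaling in the post above can be sketched like this. The 995M and 1.16B Sandy Bridge figures and the 2 billion BD figure are the ones quoted in the thread; treating the counting gap as proportional to chip size is the post's assumption, and the variable names are only illustrative.

```python
# Scale the Sandy Bridge counting gap up to a Bulldozer-sized chip.
SB_LOW = 995_000_000      # quoted lower SB count
SB_HIGH = 1_160_000_000   # quoted higher SB count
BD_QUOTED = 2_000_000_000  # quoted BD total

gap = SB_HIGH - SB_LOW

# The post's shortcut: BD is roughly twice SB, so double the gap.
doubled = 2 * gap

# Proportional scaling, using either SB count as the base.
scaled_low = gap * BD_QUOTED / SB_HIGH
scaled_high = gap * BD_QUOTED / SB_LOW

print(f"SB gap: {gap / 1e6:.0f}M")
print(f"doubled for BD: {doubled / 1e6:.0f}M")
print(f"proportional range: {scaled_low / 1e6:.0f}M to {scaled_high / 1e6:.0f}M")
```

Either way the possible overcount is a few hundred million transistors, before any rounding in the marketing number.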
 
I think AMD uses 8T SRAM cells for all major memory arrays in BD -- they already do for Llano's L1 caches at least. Factoring in the parity/ECC bits, the L3 cache alone should be ~600M transistors, and that's without considering the bunch of SRAM tags. There are hardly any transistors "missing" in there.
Do you have any source for the 8T SRAM for L2/L3? That's the first I've heard of it (I haven't even seen rumors hinting at it). Last time I checked, AMD was using plain-jane 6T SRAM cells with no particular advantage over Intel's, except that they were 30% larger...

Ok, but if they targeted high frequency then it could change things. They pay for wafers, so a single module at 5 GHz (and if the shared cores would reach +50% performance) could reach first-class performance/die-area numbers. Even if just a single module would eat up half of the TDP budget at 5 GHz, in the end they could fit more of them on a single wafer.
Oh yes, if the design target really is higher than for SNB then the area wouldn't be that big. Though assuming the design target for SNB was ~4 GHz, it would really need to be something like 5 GHz for BD to look good. That is possible, but I wouldn't take it for granted.

Is it certain there isn't a problem similar to the Sandy Bridge 995M/1.16B mixup?
Depending on when the gate count is made, the totals can be different.
There was a margin of error of 165M transistors for SB, which is a chip close to 1/2 the transistor count of BD.
That's possible indeed. AMD just said 900 million for Thuban and 2 billion for BD but they could have counted them differently (as well as have counted the BD modules the other way around).
Then we have the remainder for the expanded uncore and connectivity features.
Well, uncore and connectivity remain largely the same as Thuban. Granted, I'm sure there are a few more transistors there (improved HT frequency, larger SRQ etc. won't be quite free), but it's still the same number of HT links, so apart from the larger cache I just don't see where the big increase would come from. Yet Thuban had 900 million transistors in total, whereas if the transistors were counted the same way, BD would have more than that for the uncore alone...
 
The general themes seem consistent with an AMD that is trying to build an architecture competitive with Intel, but with more severe constraints in resources and process technology. Potentially, the company's organization and leadership are also inferior.

The passages lionizing Dirk Meyer I could do without, especially since BD is an architecture that was very much a part of his tenure at AMD. At best, the article could congratulate Dirk on owning up to a screwup he had a huge hand in bringing about instead of humiliating himself further.

There's some ranting about how Dirk's honesty about screwing the pooch caused him to be punished by the financial community, as opposed to them rewarding him for failing to compete or something.

I'm not sure about Charlie's understanding of the cache hierarchy of BD. The text becomes increasingly muddled at the end, where he starts having problems distinguishing between the front end and the Icache path and the subdivided data cache path. He does not justify why splitting the L2 cache would massively reduce latency for this design.

I do not think the quality of Charlie's sources at AMD has improved, or it has, and the quality of AMD is what has gone down.

A lot of the article and its sequel is supposition with little in-depth analysis, and honestly I think this thread offers better insight in total, and definitely per word expended, and I do not claim that this thread has any great epiphanies in it.

I wonder how much of that article is compensating for Charlie's hinting about secret improvements that would surprise all the doubters in the leadup to the release. If they surprised anyone, they did so in the wrong direction.
 