AMD Bulldozer Core Patent Diagrams

3dilettante, you really seem to know your stuff, but it seems you're coming from the point of view of looking for reasons that BD won't be able to compete. How about putting the glass-half-full cap on and thinking from that perspective: given the information that hasn't been released yet, what do you think BD would have to do to equal or exceed SB performance?
The half-full is the multithreaded server situation, BD's primary target.

The reasons why BD could have problems competing in the client space are public. For whatever reason, AMD has decided clients need not know any details on why it will win against SB, much less a 22nm shrink to Ivy Bridge that will likely be the actual competitor.

Clocks: the most critical element in a design that targets higher clock speeds at the expense of execution width and latency, both of which are inferior to competition that will itself likely be replaced by the time BD reaches the client space. The disclosed clocks for the competition are very good.

Die size: while not something users care about directly, it will go a long way in vindicating AMD's strategy. Core die size efficiency will help, though cores are but one component of the overall chip. It will be interesting to see how it compares to SB, and then in the client space to Ivy Bridge, which will be a full node ahead.
That aside, SB is ~225 mm². Zambezi, the only BD we're seeing at all in the non-server market in 2011, is very likely to be larger than Westmere at ~248 mm².

FP throughput: the FPU's issue capability is closer to that of a single core, and it has read capability no better than SB in the best case. The best-case is 128-bit SSE in a Zambezi 8-core versus an SB 4-core. Hopefully the FMAC units can be split to offer separate ADD and MUL capability.
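The 128-bit SSE best case above can be put into back-of-envelope numbers. A hypothetical sketch only: the unit counts, per-cycle throughput, and clocks below are illustrative assumptions chosen to show the shape of the comparison, not disclosed figures for either chip.

```python
# Hypothetical peak-FLOPS sketch for the 128-bit SSE best case.
# All unit counts, throughput figures, and clocks are assumptions,
# not official specs.

def peak_sp_gflops(units, flops_per_unit_per_cycle, ghz):
    """Peak single-precision GFLOPS = units * flops/unit/cycle * GHz."""
    return units * flops_per_unit_per_cycle * ghz

# Assumed BD 8-core: 4 modules x 2 FMACs; an FMA on 4 SP lanes
# counts as 8 flops per FMAC per cycle.
bd = peak_sp_gflops(units=4 * 2, flops_per_unit_per_cycle=8, ghz=3.5)

# Assumed SB 4-core: separate 128-bit ADD and MUL pipes, i.e.
# 8 flops per core per cycle at 128-bit width.
sb = peak_sp_gflops(units=4, flops_per_unit_per_cycle=8, ghz=3.4)

print(f"BD 8-core ~{bd:.0f} GFLOPS vs SB 4-core ~{sb:.0f} GFLOPS")
```

On these assumed numbers, each two-core BD module's FPU peaks roughly where a single SB core does, which is exactly the "issue capability closer to that of a single core" concern.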

Cache subsystem: The disclosed latencies put a ceiling on what can be done by the undisclosed parts of the L3 and uncore. The L2's capacity is the good part, its long latency is not. No matter how awesome the L3, its contribution is additive to the L2. The tiny L1 Dcache is on a per-cycle basis measurably worse than what preceded it, and its fallback to the L2 is worse on that basis. It comes back to clocks.
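The "additive" point can be made concrete with the usual average-memory-access-time formula. A minimal sketch, where every latency and hit rate is a made-up illustrative number rather than a disclosed BD figure: no matter how fast the L3, a long L2 latency shows up directly in the average.

```python
# Hypothetical AMAT sketch: why a long-latency L2 hurts regardless of
# how good the L3 is. All cycle counts and hit rates below are
# illustrative assumptions, not disclosed Bulldozer numbers.

def amat(l1_lat, l1_hit, l2_lat, l2_hit, l3_lat):
    """AMAT in cycles: L1 + miss1 * (L2 + miss2 * L3)."""
    return l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * l3_lat)

# Small L1 with a long-latency L2 behind it:
slow_l2 = amat(l1_lat=4, l1_hit=0.90, l2_lat=20, l2_hit=0.95, l3_lat=40)
# Same hierarchy with a shorter L2 latency, everything else equal:
fast_l2 = amat(l1_lat=4, l1_hit=0.90, l2_lat=12, l2_hit=0.95, l3_lat=40)

print(f"slow L2: {slow_l2:.2f} cycles, fast L2: {fast_l2:.2f} cycles")
```

The L3 term is scaled by both miss rates before it reaches the average, which is why it can only ever be additive on top of whatever the L1/L2 pair costs per cycle.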
BD has been touted as being effective for server loads, with L3 and uncore optimizations for multisocket and high-bandwidth situations. The desktop market does not prioritize these.

John Fruehe has stressed several times that there is some stuff in the BD design that has not been disclosed yet, specifically designed around single-thread performance. How about, given what you know of the design so far, taking a guess at what it might be?
His primary domain is the server market, and there is a limited amount he can say about the client side.
The bulk of AMD's promises are that it will be better than its predecessor, not relative to its competition.
Fruehe has not promised that it will match or beat SB in client applications.
I was more bullish on BD in the desktop market before some of these details came to light: the confirmation of Zambezi being the best AMD can offer on desktop in 2011, the cramped FPU, the ridiculous AVX/FMA/WTF fiasco, continued process delays, the server focus, etc.

Then there was the disclosure of details on SB, which has very good features, clocks and a 3-4 quarters head-start in the client space.
I don't care much about the IGP, but the rest of the architecture is very solid and it has no need for "secret sauce improvements" to potentially impress me in the future.

I am allowed to be more down on BD with the appearance of more information.
SB looks better than I expected, and BD looks worse than I had expected.
 
Well, at least their Bobcat-based parts should be quite good against their competition.
*sigh*

Since Bobcat 1.0 is on bulk, I am assuming the roadmap is to fab it on bulk processes only. That should help with the process disparity.

AMD has a very good chance with clients without a discrete GPU, which is where the volume lies. Clients using a discrete GPU are likely to go with an Intel CPU. Their best hope is to increase the lower bound on GPU prices over time, increasing the market opportunity for their Fusion parts.
 
Well, AMD have said sampling to partners in Q4 (I wonder if early or late in Q4). I'm not that familiar with this stuff, but how long in general from sampling to actual release?
 
Barcelona taped out in 8/06, and started sampling months later. It was launched in 9/07.
Sandy Bridge went into volume sampling 4/10, and will launch 1/11.

So three quarters to a year doesn't sound too far out there.
 
Deneb was a November sample to a January release.
I can't find an exact sample date for Istanbul, but it was being sampled in April and released in June.

Magny-Cours sampled in January and was "released" very late in March.

Not having a new chipset must help with the time-to-market aspect a little bit, granted this is a new arch on a new process.

We all know the disaster that Barcelona was in just about every way. I also remember C2D's ES being out for ages before release. So is it possible to assume that Barcelona was a complete fuck-up, and that Intel chose to take a longer time from sample to release? Could that have to do with product ramp, given that they need much higher volumes than AMD?

I'm being very half full :p

Edit: AMD have also said they had BD silicon in Q2.
 
Xbit reports a 'huge' 8MB L3, 16MB total.
Isn't SB going to have 16MB just in L3?
Only another 1MB total in the L2s, which makes it about even, I guess.

Edit: Hmm, seems that people are saying 8MB L3 for SB.

Also, I'm not seeing anything nifty about SB using a ring bus with over 8 stops o_O
ATI came to the conclusion that it's a pretty dumb idea and went back to a crossbar for RV770 and later.
 
Anand thought that the ring is mainly there for adding more clients in future iterations (i.e. Ivy Bridge). I support his view. You must agree that in the core-count++ era, the temptation to drop point-to-point connections will always be there.

I believe the main advantage of a ring bus is die-size reduction (fewer total wires, and wires don't achieve great densities). So it's the usual trade-off, as the connections probably now have bigger (or less predictable) latency. Intel decided to go for it, and I personally doubt it will prove a failure.
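The latency side of that trade-off can be sketched quickly: on a bidirectional ring, the average hop count grows with the number of stops, while a crossbar stays a single hop at the cost of far more point-to-point wiring. The stop counts below are hypothetical, not the actual SB topology.

```python
# Illustrative ring-vs-crossbar sketch. Stop counts are hypothetical;
# the point is only that ring latency scales with stops while crossbar
# wiring scales roughly with the square of the client count.

def avg_ring_hops(n_stops):
    """Mean hop count between two distinct stops on a bidirectional ring."""
    hops = [min(d, n_stops - d) for d in range(1, n_stops)]
    return sum(hops) / len(hops)

for n in (4, 8, 16):
    print(f"{n:2d} stops: avg {avg_ring_hops(n):.2f} hops, "
          f"crossbar ~{n * (n - 1)} directed point-to-point links")
```

Adding a stop to the ring adds one segment of wiring; adding a client to a crossbar adds a link to every existing client, which is the wiring cost the ring avoids.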
 
I believe the main advantage of a ring bus is die-size reduction (fewer total wires, and wires don't achieve great densities). So it's the usual trade-off, as the connections probably now have bigger (or less predictable) latency. Intel decided to go for it, and I personally doubt it will prove a failure.

AFAIK, the wires are implemented on the metal layers so they don't consume area. The delay/power they add is another matter though.
 
Am actually just reading the Anand article.
Hmm, L3 at core clock rate is a pretty nifty thing :cool:

Variable latency, slices running at different clocks, and contention for access with the GPU are iffy though.
 
AFAIK, the wires are implemented on the metal layers so they don't consume area. The delay/power they add is another matter though.
They consume area on the metal layers, and metal connects to the base, so more wires means less density in the base. As process technology shrinks, wires will have a bigger impact on area.
 
It's just a via. And IIRC, in a well-done design, the area consumed by vias can be minimized by careful placement.

PS: Multiple vias for redundancy are another matter though.
 
New Chipset 2011Q2

http://www.xbitlabs.com/news/mainbo...er_Compatible_Core_Logic_Sets_in_Q2_2011.html

Although there will be several minor differences between AMD 8-series and AMD 9-series core-logic sets, the main two features of the new Scorpius platform will be support for new Bulldozer micro-architecture processors as well as IOMMU[...]

While it is clear that IOMMU sports a number of advantages in server platforms and virtualized environments, its main advantage in desktop platforms will likely be improvements in heterogeneous computing.
It seems to imply better support for virtualisation of the graphics processor, I suppose. Don't understand this stuff.
 
http://www.xbitlabs.com/news/mainbo...er_Compatible_Core_Logic_Sets_in_Q2_2011.html


It seems to imply better support for virtualisation of the graphics processor, I suppose. Don't understand this stuff.

That article isn't quite correct. 890FX already supports IOMMU today, so these "new" chipsets bring... absolutely NOTHING new!

Good job AMD for rebranding the 770 Chipset twice in three years. :rolleyes:

At the time of Bulldozer's release that chip will be four years old; not even nVidia ever extended one chip for so long, apart from the MCP55 maybe.
 
WHY would they consume area on the silicon layer?
Good question. Supposedly Larrabee's ring bus runs under the L2 cache, so it supposedly doesn't consume extra area. Does the ring bus in Sandy Bridge have the same kind of layout? Isn't there another Intel processor already out there with a ring bus? Does that consume area?
 
That article isn't quite correct. 890FX already supports IOMMU today, so these "new" chipsets bring... absolutely NOTHING new!

Good job AMD for rebranding the 770 Chipset twice in three years. :rolleyes:

At the time of Bulldozer's release that chip will be four years old; not even nVidia ever extended one chip for so long, apart from the MCP55 maybe.

They didn't rebrand 790FX into 890FX.
790FX doesn't support IOMMU and 890FX does. Apart from that they are almost identical, but the 890 is more refined thanks to also being used in server platforms as the SR5690 chipset logic.
 
WHY would they consume area on the silicon layer?

Blocks on a chip can be either logic-limited or wire-limited. It's not unusual to have areas that are completely dominated by wiring, such that the logic density either isn't high or is incredibly sparse. The transistor layer isn't the only layer on a chip that matters for area. There are some real constraints on wires that can impact where and how you can actually route them, especially with things like wide cycle-aligned buses.
 