AMD Bulldozer Core Patent Diagrams

Our man Dresdenboy has found apparently leaked cache size info for Bulldozer:
http://citavia.blog.de/2010/01/21/bulldozer-s-cache-sizes-leaked-7846952/

64KB L1 and 2MB L2 shared between the two cores of a module. That's a bit surprising, because it would give them 8MB of L2 alone on a 4-module chip :oops:
Roadmaps say ">8MB cache"

The Open64 compiler only models 2 levels of the cache system. So you have the first level cache because it is the most important, and you have the last level cache because it is the second most important (its miss latency defines the main memory latency).

The last level cache would then be 2MB per core module, i.e. 8MB for a 4-module (8-context) device.
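As a quick check of that arithmetic, here is a minimal Python sketch. It assumes the leaked figure of 2MB of L2 per module and two cores (contexts) per module; neither is confirmed, and the function names are made up for illustration.

```python
# Assumed from the leaked figures quoted in this thread, not confirmed by AMD.
L2_PER_MODULE_MB = 2
CORES_PER_MODULE = 2


def total_l2_mb(modules: int) -> int:
    """Total L2 across the chip, given one shared L2 per module."""
    return modules * L2_PER_MODULE_MB


def total_contexts(modules: int) -> int:
    """Number of hardware contexts (cores), two per module."""
    return modules * CORES_PER_MODULE


if __name__ == "__main__":
    print(total_l2_mb(4), "MB L2,", total_contexts(4), "contexts")
```

For a 4-module part this gives 8MB of L2 and 8 contexts, which lines up with the ">8MB cache" on the roadmaps only if there is additional cache on top of the L2.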

Cheers
 
AM3 compatible?

11snj40.jpg
 
Maybe compatible with my AM2 mobo then (not even AM2+, but it houses an AM3 chip now)

I like that attitude, not changing the socket because there's no need to change it at all :)
DDR4 is still far away, and HyperTransport does the job. I expect LGA 1156 and 1366 to carry on as well.
 
Is JEDEC considering more extensions to the official DDR3 spec?

1600 is enough for a lot, but what about more ambitious stuff like Fusion?
 
Is AMD's Bulldozer the one that's supposedly going to have virtual cores/threads (similar to Intel Hyper-Threading, HT)? If so, that would be quite the upgrade: go from a possible 2-4 cores (dual to quad core), then simply update the BIOS, drop in the CPU... 16 threads!?
 
They're not virtual cores, they are two sets of resources (cores) that share the same front end. Each set of two "cores" is a Bulldozer module (or cluster, CMT), so a quad core will be two modules. There is no reason why AMD couldn't then run SMT on each cluster in the CMT to have something like 8 modules with 16 "cores" and 32 threads.


But we haven't seen anything pointing in the SMT direction, so it's not likely to happen.
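To make the module/core/thread counting concrete, here is a small Python sketch. The two-cores-per-module figure comes from the post above; SMT on top of CMT is purely hypothetical, and the function name and parameters are invented for the example.

```python
def cmt_counts(modules: int, cores_per_module: int = 2,
               smt_threads_per_core: int = 1) -> tuple[int, int]:
    """Return (cores, threads) for a CMT design.

    smt_threads_per_core=1 means no SMT, i.e. one thread per core.
    smt_threads_per_core=2 is the hypothetical SMT-on-CMT case.
    """
    cores = modules * cores_per_module
    threads = cores * smt_threads_per_core
    return cores, threads


if __name__ == "__main__":
    print(cmt_counts(2))                          # a "quad core": two modules
    print(cmt_counts(8, smt_threads_per_core=2))  # hypothetical SMT on each core
```

With two modules this gives 4 cores and 4 threads; the hypothetical 8-module part with two-way SMT gives 16 cores and 32 threads.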
 
Is AMD's Bulldozer the one that's supposedly going to have virtual cores/threads (similar to Intel Hyper-Threading, HT)? If so, that would be quite the upgrade: go from a possible 2-4 cores (dual to quad core), then simply update the BIOS, drop in the CPU... 16 threads!?

AMD seems to be doubling the execution units and retooling the other portions of the pipeline to handle multi-threading, instead of scheduling two threads around pipeline bubbles / stalls with one set of execution units as Intel is doing. AMD is claiming that the die space overhead is small and the performance increase in multi-threaded scenarios is large. So it's hoping that one of its clusters (i.e. 2 cores, or sets of execution units tightly coupled) will be comparable in size to a single Intel i7 / Sandy Bridge core and perform better in multi-threaded situations than Hyper-Threading does. Here's hoping we'll see something concrete this year at last.
 
There is no reason why AMD couldn't then run SMT on each cluster in the CMT to have something like 8 modules with 16 "cores" with 32 threads.

Even assuming that increasing the number of registers is cheap, they would still need to beef up the front end.


Edit: actually, it is more likely that AMD assumes it can reach decent utilization of its execution units in the current BD configuration. In that case SMT would bring only marginal improvements.
 
Oh, L4 & chip stacking...

The odd language & spelling mistakes could be dismissed as non-native English booboos, & a bunch of this is stuff Dresdenboy has pointed to or that came from AMD already, but L4 is :rolleyes:

Dynamic cache & buffer scaling for power management seems unlikely too, but then Fermi has adjustable cache partitioning <shrug>

Post by JF-AMD on Thu Apr 01, 2010 10:12 am
This is TOTALLY uncool.

I could lose my job over this prank.

TAKE THIS THING DOWN NOW, DO NOT SPREAD IT.

I don't know who had this idea, but this is not good.
 
If I'm reading the diagram correctly it looks like each BD module has the same FMAC throughput as FADD/FMUL. Great for future workloads, but not so hot for today's software if I'm not mistaken.
 
If I'm reading the diagram correctly it looks like each BD module has the same FMAC throughput as FADD/FMUL. Great for future workloads, but not so hot for today's software if I'm not mistaken.

http://citavia.blog.de/2009/11/23/some-additional-bits-of-information-7441398/

It is assumed each fmac unit can execute either (fadd AND fmul), OR one fmac.
So it's very fast also with today's software.

(that's why the fadd and fmul are separate boxes inside the fmac unit, with separate input busses drawn there)
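A rough Python sketch of that assumption, for anyone who wants to play with the numbers. It models only the issue rule stated above (per cycle, a unit does either one FMAC or one FADD plus one FMUL) and ignores dependencies, latencies and scheduling; the function name and the `units` parameter are hypothetical, not taken from any AMD document.

```python
import math


def ideal_cycles(fadd: int, fmul: int, fmac: int, units: int = 1) -> int:
    """Idealized cycle count for a mix of FP instructions.

    Assumed issue rule per unit per cycle: one FMAC, OR one FADD plus
    one FMUL. Unfused work is therefore bounded by the larger of the
    FADD and FMUL counts; dependencies and latency are ignored.
    """
    split_slots = max(fadd, fmul)  # FADD/FMUL pairs share a cycle
    fused_slots = fmac             # each FMAC takes the whole unit
    return math.ceil((split_slots + fused_slots) / units)


if __name__ == "__main__":
    print(ideal_cycles(fadd=100, fmul=100, fmac=0))  # today's code, balanced mix
    print(ideal_cycles(fadd=0, fmul=0, fmac=100))    # same math written as FMACs
```

Under this rule, 100 FADDs plus 100 FMULs take as many cycles as 100 FMACs, which is the point being made: a balanced add/multiply mix loses nothing. A mix dominated by only adds or only multiplies would leave half of the unit idle.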
 
I've got an irritating question: earlier the SIMD was on top of the FP, then it got its own backbone. Where is it in Bulldozer? Will we have integer SIMD in the integer modules, and floating-point SIMD in the FP module? Then all modules must share a single register file to be fast enough. Or is it all in the FP module, so that SIMD throughput mirrors the 2:1 fp:int? YMM use "appears" to need a mode switch, which means the processor knows when the 256-bit FP unit has to serve 2x 128-bit XMM SIMDs and could rewire the 256-bit block into 2x 128-bit blocks, if only the FP had two schedulers (micro ones, just enough to have two different instructions run). Most SIMD instructions already have 1/3 throughput. In the case of XMM, does this mean the 1/3 is preserved, and in case the FP can NOT be split in two, can it issue 2x 1/3 (1/6), as two independent instructions' data can fit into 256 bits? So much opportunity; if it's wasted (no improvements), I'm going to boycott. Ugh. Nasty thoughts. :smile:
 