AMD Bulldozer Core Patent Diagrams

hoom · Jan 22, 2010

Our man Dresdenboy has found apparently leaked cache size info for Bulldozer:
http://citavia.blog.de/2010/01/21/bulldozer-s-cache-sizes-leaked-7846952/

64KB L1, 2MB L2 shared between the two cores, a bit surprising because that would give them 8MB just in L2 for a 4 Module chip

Roadmaps say ">8MB cache"

Ninjaprime · Jan 22, 2010

hoom said:
Our man Dresdenboy has found apparently leaked cache size info for Bulldozer:
http://citavia.blog.de/2010/01/21/bulldozer-s-cache-sizes-leaked-7846952/

64KB L1, 2MB L2 shared between the two cores, a bit surprising because that would give them 8MB just in L2 for a 4 Module chip
Roadmaps say ">8MB cache"

Shouldn't they have T-RAM by then? Supposedly the same density as DRAM, could explain it.

entity279 · Jan 22, 2010

So apparently no trace cache?

Gubbi · Jan 22, 2010

hoom said:
Our man Dresdenboy has found apparently leaked cache size info for Bulldozer:
http://citavia.blog.de/2010/01/21/bulldozer-s-cache-sizes-leaked-7846952/

64KB L1, 2MB L2 shared between the two cores, a bit surprising because that would give them 8MB just in L2 for a 4 Module chip
Roadmaps say ">8MB cache"

The Open64 compiler only models 2 levels of the cache system. So you have the first level cache because it is the most important, and you have the last level cache because it is the second most important (defining main memory latency with the miss latency).

The last level cache would then be 2MB per core module, ie. 8MB for a 4 core module (8 context) device.
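
To spell out that arithmetic, here is a minimal sketch. It assumes the leaked 2MB figure really is per module; the function name and the module counts are made up for illustration.

```python
# Back-of-the-envelope check of the leaked figures (illustrative only).
L2_PER_MODULE_MB = 2  # from the leak: 2MB L2 shared by the two cores of a module


def l2_total_mb(modules):
    """Total L2 across the chip, assuming every module carries its own 2MB."""
    return modules * L2_PER_MODULE_MB


for modules in (2, 4, 8):
    print(f"{modules} modules -> {l2_total_mb(modules)}MB of L2 in total")
```

For 4 modules this comes to exactly 8MB, so a roadmap line of ">8MB cache" would have to count something beyond the L2, if both figures are right.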

Cheers

Squilliam · Jan 23, 2010

AM3 compatible?

itsmydamnation · Jan 23, 2010

Not sure, the 12 and 16 "core" parts aren't (G34, quad channel memory).

The 4 and 8 I don't know.

almighty · Jan 26, 2010

Squilliam said:
AM3 compatible?

Blazkowicz · Jan 26, 2010

maybe compatible with my AM2 mobo then (not even am2+ but houses an am3 now)

I like that attitude, not changing the socket because there's no need to change it at all

ddr4 still far away, hypertransport just does the job. I expect 1156 and 1366 to carry on as well

Tchock · Jan 28, 2010

Is JEDEC considering more extensions to official DDR3 spec?

1600 is enough for a lot, but what about more ambitious stuff like Fusion?

FrameBuffer · Jan 29, 2010

Is AMD's Bulldozer the one that's supposedly going to have virtual cores/threads (similar to Intel hyper-threading - HT)? If so then that would be quite the upgrade to go from possible 2-4 (dual to quad core) then simply update BIOS, drop in CPU.. 16 threads!?

itsmydamnation · Jan 29, 2010

They're not virtual cores; they are two sets of resources (cores) that share the same front end. Each set of two "cores" is a Bulldozer module (or cluster, CMT), so a quad core will be two modules. There is no reason why AMD couldn't then run SMT on each cluster in the CMT to have something like 8 modules with 16 "cores" with 32 threads.

But we haven't seen anything pointing in the SMT direction so it's not likely to happen.
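
The counting in the post above can be sketched as follows. The SMT factor is purely hypothetical (nothing points to AMD doing it), and the function and parameter names are invented for the example.

```python
def hardware_threads(modules, cores_per_module=2, smt_per_core=1):
    """Return (cores, threads) for a CMT chip.

    smt_per_core=1 is plain CMT; smt_per_core=2 is the hypothetical
    case of running SMT on top of each core in a module.
    """
    cores = modules * cores_per_module
    return cores, cores * smt_per_core


print(hardware_threads(2))                  # quad core built from two modules
print(hardware_threads(8, smt_per_core=2))  # 8 modules, 16 "cores", 32 threads
```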

Raqia · Jan 29, 2010

FrameBuffer said:
Is AMD's Bulldozer the one that's supposedly going to have virtual cores/threads (similar to Intel hyper-threading - HT)? If so then that would be quite the upgrade to go from possible 2-4 (dual to quad core) then simply update BIOS, drop in CPU.. 16 threads!?

AMD seems to be doubling execution units and retooling the other portions of the pipeline to handle multi-threading instead of scheduling two threads around pipeline bubbles / stalls w/ one set of execution units as Intel is doing. AMD is claiming that the die space overhead is small and the performance increase in multi-threaded scenarios is large, so it's hoping that one of its clusters (i.e. 2 cores, or sets of execution units tightly coupled) will be comparable in size to a single Intel i7 / Sandybridge core and have better performance in multi-threaded situations than hyper-threading. Here's hoping we'll see something concrete this year at last.

entity279 · Jan 29, 2010

itsmydamnation said:
. There is no reason why AMD couldn't then run SMT on each cluster in the CMT to have something like 8 modules with 16 "cores" with 32 threads.

Assuming increasing the register amount is cheap, they would need to beef up the front end.

Edit: actually, it is more likely that AMD is assuming they can reach decent utilization on their execution units in the current BD configuration. If so, SMT would bring only marginal improvements.

hoom · Apr 1, 2010

Gotta be an April fool?
http://www.amdzone.com/joomla/index...ldozer-exclusive&catid=60&Itemid=92&showall=1
Have only skimmed but the only blatant thing I have seen other than the 'too good to be true' factor is that images are in a /bulldozeraf/ directory.

rpg.314 · Apr 1, 2010

hoom said:
Gotta be an April fool?
http://www.amdzone.com/joomla/index...ldozer-exclusive&catid=60&Itemid=92&showall=1
Have only skimmed but the only blatant thing I have seen other than the 'too good to be true' factor is that images are in a /bulldozeraf/ directory.

Quite blatant and waay over the top. I liked the addition of the trace cache.

I guess the people laughing the hardest will be Intel employees who are working on Sandy Bridge.

hoom · Apr 1, 2010

Oh, L4 & chip stacking...

The odd language & spelling mistakes could be dismissed as non-native English booboos & a bunch of this is stuff Dresdenboy has pointed to or came from AMD already but L4 is

Dynamic cache & buffer scaling for power management seems unlikely too, but then Fermi has adjustable cache partitioning <shrug>

Post by JF-AMD on Thu Apr 01, 2010 10:12 am:
This is TOTALLY uncool.

I could lose my job over this prank.

TAKE THIS THING DOWN NOW, DO NOT SPREAD IT.

I don't know who had this idea, but this is not good.

fehu · Apr 23, 2010

Dresden boy on the bulldozer architecture

really
it's only this image

ShaidarHaran · Apr 24, 2010

If I'm reading the diagram correctly it looks like each BD module has the same FMAC throughput as FADD/FMUL. Great for future workloads, but not so hot for today's software if I'm not mistaken.

hkultala · Apr 25, 2010

ShaidarHaran said:
If I'm reading the diagram correctly it looks like each BD module has the same FMAC throughput as FADD/FMUL. Great for future workloads, but not so hot for today's software if I'm not mistaken.

http://citavia.blog.de/2009/11/23/some-additional-bits-of-information-7441398/

It is assumed each fmac unit can execute either (fadd AND fmul), OR one fmac.
So it's very fast also with today's software.

(that's why the fadd and fmul are separate boxes inside the fmac unit, with separate input busses drawn there)
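
A rough way to compare the two readings of the diagram is sketched below. The unit count and lane width are placeholder assumptions, not figures from this thread, and the mode names are made up for the example.

```python
# Flops per lane per cycle for one FMAC unit under each reading.
FLOPS_PER_LANE = {
    "fmac": 2,          # one fused multiply-add counts as a multiply plus an add
    "fadd+fmul": 2,     # assumed reading: one fadd AND one fmul issue together
    "fadd_or_fmul": 1,  # pessimistic reading: only one of the two per cycle
}


def peak_flops_per_cycle(mode, fmac_units=2, lanes=4):
    """Peak flops per cycle per module; fmac_units and lanes are assumptions."""
    return fmac_units * lanes * FLOPS_PER_LANE[mode]


for mode in FLOPS_PER_LANE:
    print(f"{mode}: {peak_flops_per_cycle(mode)} flops/cycle")
```

If the fadd-and-fmul assumption holds, existing non-FMA code reaches the same peak as FMA code; under the pessimistic reading it would get half.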

Ethatron · Apr 25, 2010

I've got an irritating question: earlier the SIMD was on top of the FP, then it got its own backbone. Where is it in Bulldozer? Will we have integer SIMD in the integer modules, and floating-point SIMD in the FP module? Then all modules must share a single register file to be fast enough. Is it all in the FP module? So SIMD throughput mirrors the 2:1 fp:int? YMM use "appears" to need a mode switch, which means the processor knows when the 256bit FP unit has to serve 2x 128bit XMM SIMDs and could rewire the 256bit block into 2x 128bit blocks, if only the FP had two schedulers (micro ones, just enough to have two different instructions run). Most SIMD instructions already have 1/3 throughput. In the case of XMM, does this mean the 1/3 is preserved, and if the FP can NOT be split in two, can it issue 2x 1/3 (1/6), as the data of two independent instructions can fit into 256bit? So much opportunity; if it's wasted (no improvements), I'm going to boycott. Ugh. Nasty thoughts. :smile:
