AMD Bulldozer Core Patent Diagrams

DarthShader · Feb 22, 2011

That guy got it wrong. It should be more like:

As Michael Golden, an AMD engineer, explained during its presentation, each dual-core module, when fully loaded, is capable of delivering 90% of the speed of a similar native dual-core processor, while featuring a lower power consumption and utilizing less die space.

http://news.softpedia.com/news/AMD-Talks-About-Bulldozer-Architecture-at-ISSCC-2011-185657.shtml

This has been known for about half a year now.

3dcgi · Feb 23, 2011

3dilettante said:
This sounds like a miswording or a misquote.
The targets for Bulldozer's core were not set at 90% of earlier cores, it was supposed to be higher-performance.

It could be a misplaced quote about Bobcat.

jakobx · Feb 23, 2011

I thought it meant that running two cores in one module achieves 90% of the performance compared to running only one core in a module. I could be completely wrong.

Triskaine · Feb 23, 2011

jakobx said:
I thought it meant that running two cores in one module achieves 90% of the performance compared to running only one core in a module. I could be completely wrong.

You are right. Mister Golden was simply misquoted. Here's what he said originally:

"...As Michael Golden, an AMD engineer, explained during its presentation, each dual-core module, when fully loaded, is capable of delivering 90% of the speed of a similar native dual-core processor, while featuring a lower power consumption and utilizing less die space.
This enables AMD to pack more cores inside the same die space and power budget..."
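Under jakobx's reading, the 90% figure is simple arithmetic. The sketch below assumes a hypothetical baseline in which one module core running alone is as fast as one core of the "similar native dual-core processor"; `module_throughput` and its parameters are illustrative names, not anything from AMD.

```python
def module_throughput(single_core: float = 1.0, fraction_of_native_dual: float = 0.90):
    """Return (module throughput, per-core throughput) with both cores loaded.

    Hypothetical assumption: one module core running alone is as fast as
    one core of the comparable native dual-core chip.
    """
    native_dual = 2 * single_core                    # two fully independent cores
    module = fraction_of_native_dual * native_dual   # "90% of a native dual-core"
    per_core_loaded = module / 2                     # assumes both cores share the module evenly
    return module, per_core_loaded


module, per_core = module_throughput()
print(f"module = {module:.2f}x one core, each core at {per_core:.0%} of standalone speed")
```

So "90% of a native dual-core" and "each core keeps about 90% of its standalone speed when its sibling is busy" are the same statement, which is why jakobx's reading and the article agree.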

hoom · Feb 26, 2011

There are a bunch of new blog posts regarding the ISSCC presentations, though there doesn't seem to be much in the way of detail in them:
http://blogs.amd.com/work/2011/02/18/what-to-expect-from-amd-at-isscc-2011/
http://blogs.amd.com/work/2011/02/21/amd-at-isscc-bulldozer-design-solutions/
http://blogs.amd.com/work/2011/02/21/amd-at-isscc-whats-in-a-box/
http://blogs.amd.com/work/2011/02/22/amd-at-isscc-bulldozer-innovations-target-energy-efficiency/
http://blogs.amd.com/work/2011/02/23/amd-at-isscc-the-cool-kids/

There is a link in one of them to the Hot Chips presentation, which I haven't seen before and which does include nice detail:
http://www.hotchips.org/uploads/archive22/HC22.24.720-Butler-AMD-Bulldozer.pdf

itsmydamnation · Feb 27, 2011

Here is the Hot Chips presentation video:
http://www.hotchips.org/archives/hc22video/session7.html

hoom · Feb 27, 2011

Oh, cool, thanks.

fellix · Mar 1, 2011

A pretty much "undoctored" die shot, methinks. Very modular design... too much, if you ask me.

Npl · Mar 1, 2011

What's to the left of the northbridge (across the crossbar)? It seems way too big to be simple filler space or traces. You could almost fit another 2MB of cache there.

fellix · Mar 1, 2011

You can see a lot of such "empty" space on the six-core Phenom die. Just traces, from what it looks like.

Blazkowicz · Mar 1, 2011

Lots of breathing room.

Also, cut it in half and imagine a next-gen GPU on the left side.

Triskaine · Mar 2, 2011

Blazkowicz said:
Also, cut in half and imagine a next-gen GPU on the left side.

That's what Trinity will be in 2012.

fellix · Mar 7, 2011

Leaked benchmarks of Interlagos in F@H:

F@H Benchmarks Interlagos (Without(!) Turbo Core 2.0):
ubuntu 10.10 server x64
512G DDR3-1333
P6901
Average time/frame: 00:03:52

[09:15:07] Completed 0 out of 250000 steps (0%)
[09:18:59] Completed 2500 out of 250000 steps (1%)
[09:22:42] Completed 5000 out of 250000 steps (2%)
[09:26:16] Completed 7500 out of 250000 steps (3%)
[09:30:08] Completed 10000 out of 250000 steps (4%)
[09:34:06] Completed 12500 out of 250000 steps (5%)
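The per-frame times can be read straight off the log timestamps. A minimal Python sketch; the `frame_times` helper is hypothetical, and it assumes the log never crosses midnight.

```python
from datetime import datetime

# Log lines copied from the leaked F@H run (project P6901), one per 2500-step frame.
LOG = """\
[09:15:07] Completed 0 out of 250000 steps (0%)
[09:18:59] Completed 2500 out of 250000 steps (1%)
[09:22:42] Completed 5000 out of 250000 steps (2%)
[09:26:16] Completed 7500 out of 250000 steps (3%)
[09:30:08] Completed 10000 out of 250000 steps (4%)
[09:34:06] Completed 12500 out of 250000 steps (5%)
"""


def frame_times(log: str) -> list[int]:
    """Seconds between consecutive 'Completed' lines (no midnight rollover handling)."""
    stamps = [datetime.strptime(line[1:9], "%H:%M:%S")  # "[HH:MM:SS]" -> "HH:MM:SS"
              for line in log.splitlines() if "Completed" in line]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


times = frame_times(LOG)
print(times)                    # [232, 223, 214, 232, 238]
print(sum(times) / len(times))  # 227.8
```

The five intervals shown run from 3:34 to 3:58 and average about 3:48, a little under the quoted 00:03:52; the quoted average may have been taken over more frames than appear here.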

For comparison:
Bulldozer "Interlagos" 16x4 @ 1.8GHz* = 00:03:52
Opteron "Magny Cours" 12x4@ 2.2GHz = 00:06:40

Source

~58% higher single-threaded performance compared to K10, clock for clock.
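The ~58% figure can be reproduced from the two frame times by normalizing for core count and clock. This sketch assumes "16x4" and "12x4" mean cores per socket × sockets, and that F@H throughput scales linearly with both cores and clock, which is the assumption any "clock for clock" comparison makes. What it normalizes is per-core throughput in a run that uses every core; the helper names are hypothetical.

```python
def work_per_core_per_ghz(frame_seconds: float, cores: int, ghz: float) -> float:
    """Frames per second, divided by core count and clock (arbitrary units)."""
    return 1.0 / frame_seconds / cores / ghz


def mmss(text: str) -> int:
    """Convert 'MM:SS' to seconds."""
    minutes, seconds = map(int, text.split(":"))
    return 60 * minutes + seconds


# Assumption: "16x4" / "12x4" = cores per socket x sockets.
interlagos = work_per_core_per_ghz(mmss("03:52"), 16 * 4, 1.8)
magny_cours = work_per_core_per_ghz(mmss("06:40"), 12 * 4, 2.2)

ratio = interlagos / magny_cours
print(f"Interlagos per core, per clock: {ratio:.3f}x Magny-Cours")  # 1.580x
```

Without the core and clock normalization the raw speedup is 400 / 232 ≈ 1.72×; dividing out 64 vs. 48 cores and 1.8 vs. 2.2 GHz gives the ~1.58×.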

3dilettante · Mar 7, 2011

Is that F@H benchmark single-threaded?

rpg.314 · Mar 7, 2011

It's highly unlikely that it is serial; it's a pretty old client, and I am sure that it is well threaded. It scores higher despite lower clocks, although AVX might have tipped the balance if it were serial.

1.8GHz is pretty low for a speed racer. I was expecting almost the same clocks as MC at launch, although speeds might increase with a more mature process.

rpg.314 · Mar 7, 2011

That page also has Llano benchmarks. I would have liked to see a comparison of Llano with a discrete gpu, especially the power benches.

entity279 · Mar 8, 2011

rpg.314 said:
1.8GHz is pretty low for a speed racer. I was expecting almost the same clocks as MC at launch, although speeds might increase with a more mature process.

Then again, there are still 3+ months till launch...

Miksu · Mar 9, 2011

Nice to see that some software vendors are already taking advantage of the new features provided by Bulldozer (and Sandy Bridge). From the Visual Studio 2010 Service Pack 1 readme:

Visual Studio 2010 SP1 adds intrinsic functions or intrinsics to enable the extensions on the AMD and Intel new microprocessors that will be released next year. The intrinsic functions allow highly efficient computing without the overhead of a function call. For more information about the intrinsics function, visit the following website:
Compiler Intrinsics

For more information about the extensions, visit the following third-party websites:
Intel AVX
AMD Bulldozer instruction sets

hoho · Mar 9, 2011

Miksu said:
Nice to see that some software vendors are already taking advantage of the new features provided by Bulldozer (and Sandy Bridge). From the Visual Studio 2010 Service Pack 1 readme:

I'm fairly certain GCC has included AVX support for at least a couple of years now, though I'm not sure how good it is.

itsmydamnation · Mar 11, 2011

An interesting couple of posts from JFAMD on AnandTech:

We have a 256b FP datapath (pipes 0 and 1) AND a 256b INT datapath (pipes 2 and 3), so

2 128b FP + 2 128b INT
or
1 256b FP + 2 128b INT
or
1 256b FP + 1 256b INT
or
2 128b FP + 1 256b INT

The INT here is an integer unit for doing the integer portion of math inside an SSE instruction; it is not the integer clusters that you would commonly call cores.
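JFAMD's list amounts to two independent datapaths, each a pair of 128b pipes that either issue two 128b ops or are ganged together for one 256b op. Enumerating the two modes per datapath reproduces his four combinations. This is a counting model only, with hypothetical names, and it says nothing about scheduling restrictions.

```python
from itertools import product

# Each datapath (FP = pipes 0/1, INT = pipes 2/3) runs in one of two modes per cycle.
PAIR_MODES = {"2x128b": 2 * 128, "1x256b": 256}


def issue_combinations() -> list[tuple[str, str, int]]:
    """All (FP mode, INT mode, total bits per cycle) combinations."""
    return [(fp, integer, PAIR_MODES[fp] + PAIR_MODES[integer])
            for fp, integer in product(PAIR_MODES, repeat=2)]


for fp, integer, bits in issue_combinations():
    print(f"FP {fp} + INT {integer} -> {bits} bits/cycle")
```

In this model every combination moves the same 512 bits per cycle: a 256b op does not raise peak width, it just occupies both pipes of a pair at once.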

Plus there is a really cool feature around moves. Technically, we can do 4 128b SSE moves per cycle with a ZERO cycle latency. This is known as “MOVE ELIMINATION”.
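Move elimination is generally done at register rename: a register-to-register move just makes the destination name point at the source's physical register, so nothing is sent to an execution pipe and the move adds no latency. The toy renamer below illustrates that general idea only. It is a hypothetical model, not a description of Bulldozer's rename hardware, and it skips the reference counting a real design needs before freeing physical registers.

```python
class RenameTable:
    """Toy register renamer with move elimination (hypothetical model)."""

    def __init__(self, arch_regs: list[str]):
        self.next_phys = 0
        self.map = {r: self._alloc() for r in arch_regs}  # architectural -> physical
        self.executed = []  # uops that actually reach an execution pipe

    def _alloc(self) -> int:
        p = self.next_phys
        self.next_phys += 1
        return p

    def mov(self, dst: str, src: str) -> None:
        # Eliminated move: dst now names the same physical register as src.
        # No uop is recorded, so it costs no pipe slot and no execution latency.
        self.map[dst] = self.map[src]

    def op(self, name: str, dst: str, *srcs: str) -> None:
        # A real operation reads the current mappings and writes a fresh register.
        sources = [self.map[s] for s in srcs]
        self.map[dst] = self._alloc()
        self.executed.append((name, self.map[dst], sources))


rt = RenameTable(["xmm0", "xmm1", "xmm2"])
rt.mov("xmm1", "xmm0")                 # eliminated: xmm1 aliases xmm0's register
rt.op("addps", "xmm2", "xmm1", "xmm2")
print(rt.executed)                     # only the addps reached a pipe
```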

And to further clarify directly:

Also, in my opinion, there are some features AMD has downplayed so far. Obviously AMD doesn't just have 2 FPU pipes and 2 MMX pipes: those MMX pipes don't do MMX, they are full 128-bit integer SSE pipelines
(true).

So all register moves and loads/stores can also be executed in those two pipelines
(not really; reg-reg moves for SSE and AVX-128 can be done with mov-elimination.

Load – doesn’t actually require an execution pipe in the FP at all – but is limited to 2 128b loads/cycle max throughput.
Store – does take an execution pipe, but can only execute down 1 of the pipes. That and load/store (LS) unit restrictions limit it to 1 128b store/cycle throughput)
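Those two limits are enough for a back-of-the-envelope bound on memory-heavy SSE loops. This sketch counts only 128b loads and stores against JFAMD's figures of 2 loads/cycle and 1 store/cycle, ignoring cache misses, ALU work and everything else, so it gives a lower bound on cycles, not a prediction; `min_cycles` is a hypothetical helper.

```python
import math

LOADS_PER_CYCLE = 2   # 128b loads per cycle (JFAMD's figure)
STORES_PER_CYCLE = 1  # 128b stores per cycle (JFAMD's figure)


def min_cycles(loads_128b: int, stores_128b: int) -> int:
    """Lower bound on cycles set by load/store throughput alone."""
    return max(math.ceil(loads_128b / LOADS_PER_CYCLE),
               math.ceil(stores_128b / STORES_PER_CYCLE))


vectors = 1024 // 16  # 1 KiB processed as 128b (16-byte) vectors
print("copy b -> a:  ", min_cycles(vectors, vectors))      # 64, store-bound
print("a = b + c:    ", min_cycles(2 * vectors, vectors))  # 64, loads and stores both saturated
print("sum reduction:", min_cycles(vectors, 0))            # 32, load-bound
```

Under these limits a plain copy is store-bound at 16 bytes per cycle, while a = b + c uses both the load and the store limit fully.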

I recently read a source saying that those two don't do 64-bit MMX but 128-bit SSE! I really don't know why AMD has been so quiet about that and obfuscated it by using the wrong term "MMX". Therefore AMD can do 4 * 128-bit SSE ops/cycle!

(yes, “MMX” is likely a bad name to use in describing the Bulldozer micro-architecture and is somewhat misleading. Yes, we can do 4 128b arithmetic operations/cycle: 2 “floating-point” and 2 “SSE/AVX-128 integer”. Or/instead/in-combination we can also do 2 x87 “floating point” and 2 mmx “integer” per cycle – and by mmx I really mean the architected “mmx”).
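Counting lanes makes the "4 128b arithmetic operations/cycle" figure concrete. The sketch assumes each 128b instruction performs one operation per lane and simply multiplies out JFAMD's 2 FP + 2 integer pipes; if the FP pipes execute fused multiply-adds, the floating-point FLOP counts would double. Names are illustrative.

```python
VECTOR_BITS = 128
FP_PIPES = 2   # "floating-point" pipes (per JFAMD)
INT_PIPES = 2  # "SSE/AVX-128 integer" pipes (per JFAMD)


def lanes(element_bits: int) -> int:
    """Elements that fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits


# Peak lane-operations per cycle, one op per lane per instruction.
peak_ops_per_cycle = {
    "fp32": FP_PIPES * lanes(32),
    "fp64": FP_PIPES * lanes(64),
    "int32": INT_PIPES * lanes(32),
    "int8": INT_PIPES * lanes(8),
}

for kind, ops in peak_ops_per_cycle.items():
    print(f"{kind}: {ops} lane-ops/cycle")
```

Since the FP and integer pipes run side by side, this model allows, for example, 8 fp32 and 8 int32 lane-ops in the same cycle. These are per-module numbers, because the two cores of a module share the FP unit.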

And that is the sound of me clapping my hands like a blackjack dealer and saying "all done", can't get any further into this topic.