AMD Bulldozer Core Patent Diagrams

Performance slides from AMD

Looks like the proprietary XOP and FMA4 ISA extensions are the new "3DNow!". :LOL:

At least this time it will find some use in HPC tasks, and most likely the x264 codec will get optimizations as well.

Of course I don't expect broad adoption any time soon, not until Intel jumps on the FMA4 bandwagon.

One area where AMD can and will utilize it is of course for their GPU drivers. OpenCL especially ... so not quite as bad as 3DNow!
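
For what it's worth, the adoption question mostly comes down to whether software bothers to check for and dispatch on these extensions at runtime. A minimal sketch of that check, assuming GCC/Clang's <cpuid.h> helper and the CPUID bit positions as I recall them (leaf 0x80000001, ECX bit 11 for XOP and bit 16 for FMA4; FMA3 in leaf 1, ECX bit 12):

```c
/* Minimal sketch: runtime detection of XOP/FMA4 (and FMA3) before
 * dispatching to an optimized code path.  Bit positions are the ones
 * I recall from the CPUID docs; treat them as an assumption. */
#include <cpuid.h>
#include <stdio.h>

static int has_xop(void)  { unsigned a, b, c, d;
    return __get_cpuid(0x80000001, &a, &b, &c, &d) && (c & (1u << 11)); }
static int has_fma4(void) { unsigned a, b, c, d;
    return __get_cpuid(0x80000001, &a, &b, &c, &d) && (c & (1u << 16)); }
static int has_fma3(void) { unsigned a, b, c, d;
    return __get_cpuid(1, &a, &b, &c, &d) && (c & (1u << 12)); }

int main(void)
{
    printf("XOP:  %s\n", has_xop()  ? "yes" : "no");
    printf("FMA4: %s\n", has_fma4() ? "yes" : "no");
    printf("FMA3: %s\n", has_fma3() ? "yes" : "no");
    return 0;
}
```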
 
Of course I don't expect broad adoption any time soon, not until Intel jumps on the FMA4 bandwagon.
Intel will use FMA3 for their next AVX ISA extension in Haswell, so AMD's implementation will be incompatible, at least in this first iteration of Bulldozer.
 
Intel will use FMA3 for their next AVX ISA extension in Haswell, so AMD's implementation will be incompatible, at least in this first iteration of Bulldozer.

True, Intel moved the goalposts mid-match with regard to FMA.
Anyway, AMD has already suggested they will introduce FMA3 in future revisions, as you say.

It will be interesting to see how this pans out and how it compares to the rate of adoption of SSE4.x, which also isn't great.
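
For what it's worth, at the source level the incompatibility is smaller than it sounds if you stay at the intrinsics level; the real difference is in the encoding (FMA3 overwrites one of its source registers, FMA4 writes a separate destination). A rough sketch of the same a*b + c, assuming GCC/Clang and the usual intrinsic names:

```c
/* Sketch: a*b + c with AMD's FMA4 intrinsic vs. the FMA3 one expected
 * with Haswell.  Assumes GCC/Clang; build with -mavx plus -mfma4 or
 * -mfma respectively. */
#include <immintrin.h>   /* AVX + FMA3 intrinsics */
#ifdef __FMA4__
#include <x86intrin.h>   /* FMA4 intrinsics */
#endif

__m256 muladd(__m256 a, __m256 b, __m256 c)
{
#if defined(__FMA4__)
    return _mm256_macc_ps(a, b, c);   /* 4-operand form: separate destination */
#elif defined(__FMA__)
    return _mm256_fmadd_ps(a, b, c);  /* 3-operand form: destination is also a source */
#else
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);  /* plain AVX fallback */
#endif
}
```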
 
I would be glad if this positioning of BD against the 980X had some real effect -- if this mini price war from AMD could slash any Gulftown SKU down to a more manageable purchase option, it would be a very nice upgrade point for many LGA1366 users, including me. :p
 
Looks like they really missed the clocks they wanted by a big notch, and despite the large turbo numbers it doesn't really help that much (?): less than 10% gains all round.

Right now it seems like a rather inefficient use of die area, but if they keep churning these out and later on push a 20-30% higher-clocked (and 140W, obviously :rolleyes:) "8190", that might actually sell the platform well (say what you may, but AM3+ is probably much cheaper to move to than a motherboard with a new socket).

But right now they're really in SB territory, and that's not a good place to be. SB GT2 is pretty much an amazing sweet-spot chip, to say the least; you really wonder what the GT1 is for...
 
It's a marketing fault that BD doesn't look good as an 8-core.
If you look at BD as a 4-core with 8 threads it does quite well, especially looking at the leaked prices.
One thing where it fails is obviously die area, but that has always been the case with AMD and their strategy of fitting one die to both servers and desktops.
 
While looking at those performance slides, I couldn't help but snicker and laugh at the content because it just reeked of desperation.
 
It's a marketing fault that BD doesn't look good as an 8-core.
If you look at BD as a 4-core with 8 threads it does quite well, especially looking at the leaked prices.
One thing where it fails is obviously die area, but that has always been the case with AMD and their strategy of fitting one die to both servers and desktops.
Wasn't the whole point of CMT that you would get nearly the same throughput as two full cores, as AMD constantly repeated again and again?

So far the leaks indicate that BD has decent throughput but extremely weak single-threaded performance, which supports the idea that BD behaves more like an 8-core CPU. Intel's 4C/8T processors like the 2600K have exceptional single-threaded performance, which combined with the ~20% boost from Hyperthreading gives them good throughput.

Based on the performance from the leaks, BD's key problem is that the cores are only about as fast as a K8 of the same clock speed.
 
At least this time it will find some use in HPC tasks, and most likely the x264 codec will get optimizations as well.

Of course I don't expect broad adoption any time soon, not until Intel jumps on the FMA4 bandwagon.

One area where AMD can and will utilize it is of course for their GPU drivers. OpenCL especially ... so not quite as bad as 3DNow!

Ugh, x264 is nice because the maintainers are awesome and support whatever's nice for them (IIRC they also included support for POPCNT from SSE4A, which makes them the...only? people to support that ISA extension). It means pretty much jack for general adoption/impact, though. And I'm seriously missing how this will help their GPU drivers in any significant form. Their CL stack should first figure out how to spew SSE code in any worthwhile manner, before moving on to FMAs and XOP, IMHO. Intel at least tries to do it!
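
For reference, the POPCNT case is about as cheap as ISA-specific support gets, which is probably why x264 bothered. A rough sketch of the usual pattern (the popcount32 name is just mine for illustration): use the instruction when the build enables it, fall back to a portable bit count otherwise:

```c
/* Sketch of the usual POPCNT dispatch pattern: hardware instruction when
 * the build enables it (-mpopcnt / -msse4.2 / -mabm on GCC/Clang),
 * portable fallback otherwise.  popcount32() is a made-up name. */
#include <stdint.h>
#ifdef __POPCNT__
#include <nmmintrin.h>
#endif

static inline uint32_t popcount32(uint32_t x)
{
#ifdef __POPCNT__
    return (uint32_t)_mm_popcnt_u32(x);      /* single POPCNT instruction */
#else
    /* Portable SWAR fallback */
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
#endif
}
```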
 
So, another disastrous CPU. The only upside seems to be that Piledriver is only ~6 months away, so this debacle shouldn't last much longer than their TLB fiasco.
 
So, another disastrous CPU. The only upside seems to be that Piledriver is only ~6 months away, so this debacle shouldn't last much longer than their TLB fiasco.

Pretty much depends on what we expect Piledriver to be, no? Also, this isn't as bad as Failcelona IMHO, at least they're not vastly underperforming compared to their prior offerings. Also no nonsense about "definitely in the double digits" this round, although some of the official on-forum noise was somewhat disturbing, to say the least.
 
Well, considering that AMD has stuffed BD with 16MB of caches (and not of the densest type), it's bound to be big, and for a host of other reasons, of course.
On the matter of whether BD is a 4- or 8-core design, I'm more inclined to accept it as a 4-core CPU... or an 8-core with shared front-end and FP/SIMD logic - meh. Whatever, just bring it on!
 
Well, considering that AMD has stuffed BD with 16MB of caches (and not of the densest type)

Maybe they'd have been better off stuffing it with less cache of a slightly faster type, as it appears their current cache hierarchy is quite ludicrous in terms of throughput, with the L1 being crippled AFAICT.
 
AMD could still claim their CPUs have moar cache than the other guys around. Oh wait, hasn't this been the case ever since T-bird came out some 10 years ago? A very naive reason to stick with the exclusive hierarchy, when Intel's Nehalem clearly demonstrated how you can get more for less in a very clean and streamlined cache architecture.
 
Pretty much depends on what we expect Piledriver to be, no? Also, this isn't as bad as Failcelona IMHO, at least they're not vastly underperforming compared to their prior offerings. Also no nonsense about "definitely in the double digits" this round, although some of the official on-forum noise was somewhat disturbing, to say the least.

I am hoping that Piledriver will be aimed at consumer markets, and hence might end up increasing its area efficiency.

I think it is clear that BD is a poor fit for client workloads.
 
I am hoping that Piledriver will be aimed at consumer markets, and hence might end up increasing its area efficiency.

I think it is clear that BD is a poor fit for client workloads.
I wonder how much it can improve, though; I don't see many possibilities without fundamentally changing the architecture:
1) increase clocks
2) improve cache subsystem

There are of course always other possibilities (like a 256-bit FP unit) and tweaks here and there, but I'm not sure they can change the overall picture.
If you look at BD, it's not terribly efficient for server loads either. One module is ~30mm², whereas one SNB core is only ~20mm² or so. Now, given the right loads, that BD module might be faster, but on a perf/area scale it'll lose pretty much no matter what. If AMD could get by with much less L2 cache (say 512kB per module instead of 2MB, but faster), it would look much better there (one module would then be only slightly larger than a SNB core), though they probably can't, because the L3 has neither the bandwidth nor the latency to make this really work. But even if it were possible, it would still lose (very badly) in lightly threaded loads. That's a tradeoff built right into the BD architecture with the 2-issue INT cores, unless it was really designed for MUCH higher clocks (which I rather doubt).

Still, I guess a changed cache architecture is something we'll see at least with Trinity: either ditch the L3 cache or make the L2 smaller (while improving L3 bandwidth/latency and also sharing it with the IGP).
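
If anyone wants to eyeball the cache latencies themselves once chips land, the usual tool is a pointer-chasing microbenchmark. A rough sketch below; the sizes and iteration counts are arbitrary, and real tools (lmbench's lat_mem_rd, for instance) are far more careful about prefetchers and TLBs:

```c
/* Rough sketch of a pointer-chasing microbenchmark for eyeballing
 * L1/L2/L3 load-to-use latency over increasing working sets. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 10000000UL

static double chase_ns(size_t bytes)
{
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(n * sizeof(size_t));

    /* Sattolo's algorithm: one big random cycle, so every load depends on
     * the previous one and simple streaming prefetch doesn't help. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* j < i => single cycle */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    volatile size_t p = 0;                      /* volatile keeps the chain alive */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long k = 0; k < ITERS; k++)
        p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(next);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / ITERS;                          /* average ns per dependent load */
}

int main(void)
{
    /* Sweep from well inside L1 to past typical L3 sizes. */
    for (size_t kb = 16; kb <= 32 * 1024; kb *= 2)
        printf("%6zu KiB: %.2f ns/load\n", kb, chase_ns(kb * 1024));
    return 0;
}
```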
 