AMD Bulldozer Core Patent Diagrams

Doomtrooper · Sep 26, 2011

LunchBox said:
While looking at those performance slides, I couldn't help but snicker and laugh at the content because it just reeked of desperation.

I would not snicker unless you want to pay a small mortgage to Intel without competition like back in the Pentium 60 days...but be my guest.

AlexV · Sep 26, 2011

mczak said:
2) improve cache subsystem

This may happen in terms of fixing the currently crippling bug they have with writes to the L1, and other apparent gimpyness. So it will be improved in terms of being less weak, but it'll still be far too weak, IMHO.

leoneazzurro · Sep 26, 2011

Or strongly lowering latency. For desktop workloads it would probably be more useful to have a smaller L2 cache (512 KB-1 MB) with much lower latency than to have 2 MB of cache with those pesky timings (L1 bug aside, which would be very useful to analyze in detail anyway: if BD performs on par with a 2600K with a "crippled" L1, it would be interesting to know how well it could have performed without that problem).

fehu · Sep 26, 2011

so all the problems came from faulty cache design in your opinion?
and something that can be easily addressed in Piledriver?

leoneazzurro · Sep 26, 2011

fehu said:
so all the problems came from faulty cache design in your opinion?
and something that can be easily addressed in Piledriver?

If you are asking me, not all the possible issues can come from the cache design (lower execution unit count per "core" and the need for high clocks are examples). But having a better cache system always helps.

Never said that this could be easy. But 18-20 cycles compared to SB's 10, with frequency being higher but not enough to compensate for the difference, could hurt performance. This is of course the result of cache size and frequency targets, but larger caches use more die area, and that lowers the performance/area ratio.
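As a rough check of the cycles-versus-clock point, the quoted latencies can be converted to nanoseconds. This is only a sketch: the cycle counts are the ones mentioned above, while the clock speeds (3.6 GHz for BD, 3.4 GHz for SB) are assumptions picked for illustration, not figures from this thread.

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz

# Cycle counts come from the post above; both clock speeds are assumed.
BD_CLOCK_GHZ = 3.6  # assumption
SB_CLOCK_GHZ = 3.4  # assumption

bd_low = latency_ns(18, BD_CLOCK_GHZ)
bd_high = latency_ns(20, BD_CLOCK_GHZ)
sb = latency_ns(10, SB_CLOCK_GHZ)

print(f"BD: {bd_low:.2f}-{bd_high:.2f} ns")
print(f"SB: {sb:.2f} ns")

# Clock BD would need so that 18 cycles take as long as SB's 10 cycles.
breakeven_ghz = 18 / sb
print(f"break-even clock for 18 cycles: {breakeven_ghz:.2f} GHz")
```

Under those assumed clocks, the small frequency advantage is nowhere near enough to hide an 8-10 cycle gap, which is the point being made.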

AlexV · Sep 26, 2011

fehu said:
so all the problems came from faulty cache design in your opinion?
and something that can be easily addressed in Piledriver?

No and no. Their problems seem to be quite uArch related. I'm also not seeing the 2600K parity happening all that much (I'm pretty sure 2500K parity is not as clear-cut either, but hey, I do hope I'm wrong!). Also, why are we pleased that a much larger, 125W TDP CPU sort of almost matches a smaller, 95W part?

leoneazzurro · Sep 26, 2011

AlexV said:
No and no. Their problems seem to be quite uArch related. I'm also not seeing the 2600K parity happening all that much (I'm pretty sure 2500K parity is not as clear-cut either, but hey, I do hope I'm wrong!). Also, why are we pleased that a much larger, 125W TDP CPU sort of almost matches a smaller, 95W part?

Because before that we had an even larger 125W TDP CPU that didn't even match a smaller 95W part, maybe? So at least until Ivy Bridge we could have a little more competition.
Which µarchitecture problems are you referring to? Narrow execution units, problems with the front-end or whatever else?

fellix · Sep 26, 2011

The Core 2 line proved that a big and fast L2 (shared between 2 cores) is a very good fit for desktop and some WS applications, but due to the inadequate system architecture it scaled poorly in SMP configurations. In Nehalem, Intel literally pushed the L3 cache as "the new L2" -- a sort of backbone for the whole memory sub-system, keeping copies of the contents of every cache above it, so the coherency traffic is kept out of the "real" L2s, while still being fast and large enough to rapidly serve any misses from the upper levels. That turned the L2 cache into a more specific role: a truly private, small but very low latency (only 10 cycles) piece of memory. The L1D cache (32KB) is 4 cycles in comparison, but 8 times smaller in size.
That way, Intel killed two birds with one stone - a new scalable and very efficient server architecture for heavily threaded loads, and at the same time a potent performer for desktop applications, all thanks to the versatile "backbone" concept. Westmere-EX and SNB naturally developed this philosophy even further. All this is on top of the already first class HW prefetch mechanism and memory disambiguation.
It all proves that a sheer size advantage is simply not enough anymore. AMD didn't invest in a more sophisticated and elegant solution to this problem - they just scaled up the same old concept to a new level of ridiculousness in a gamble to save the day.
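The small-and-fast versus big-and-slow L2 argument can be sketched with the textbook average memory access time (AMAT) formula. Only the 4-cycle L1 and the 10-cycle L2 come from the post above; the 20-cycle L2, every miss rate, and the 40-cycle next-level latency are made-up values chosen purely to show the shape of the trade-off, not measurements of any real CPU.

```python
def amat(l1_lat, l1_miss, l2_lat, l2_miss, next_lat):
    """Average memory access time in cycles for a two-level cache
    backed by a next level (L3 or memory) with a fixed latency."""
    return l1_lat + l1_miss * (l2_lat + l2_miss * next_lat)

# Hypothetical parameters (assumptions, not measured data):
L1_MISS = 0.05        # fraction of accesses that miss L1
NEXT_LEVEL_LAT = 40   # cycles to the level behind L2

# Small, fast L2: lower latency, but it misses more often.
small_fast = amat(4, L1_MISS, 10, 0.40, NEXT_LEVEL_LAT)

# Large, slow L2: fewer misses, but every L2 hit costs more.
large_slow = amat(4, L1_MISS, 20, 0.30, NEXT_LEVEL_LAT)

print(f"small/fast L2: {small_fast:.2f} cycles")
print(f"large/slow L2: {large_slow:.2f} cycles")
```

With these invented numbers the smaller L2 comes out ahead, but the result flips if the larger cache cuts the miss rate enough or the next level is slow enough, so the sketch illustrates the trade-off rather than settling it.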

AlexV · Sep 26, 2011

leoneazzurro said:
Because before that we had an even larger 125W TDP CPU that didn't even match a smaller 95W part, maybe? So at least until Ivy Bridge we could have a little more competition.
Which µarchitecture problems are you referring to? Narrow execution units, problems with the front-end or whatever else?

BD, if those numbers are even remotely accurate, doesn't change the competitive landscape at all, AMD is still left fighting for the same scraps it was fighting before, without impacting Intel in any way shape or form. Front-end should be fine on paper, just like many other things (hard to gauge how that pans out in practice though), but the Execution engine seems bonkers in practice.

Their cache hierarchy is pretty much bonkers too, with its exclusivism and other contortionisms, even excluding the apparent bug (this is uArch, not something that's trivially fixed or validated IMHO). Their Turbo implementation seems rather limited too, but maybe that can be tweaked. It looked pretty decent on paper mind you, but the paper was pretty vague and the implementation is anything but.

leoneazzurro · Sep 26, 2011

AlexV said:
BD, if those numbers are even remotely accurate, doesn't change the competitive landscape at all, AMD is still left fighting for the same scraps it was fighting before, without impacting Intel in any way shape or form.

Of course Intel has the advantage. But at least the situation improved.

AlexV said:
Front-end should be fine on paper, just like many other things (hard to gauge how that pans out in practice though), but the Execution engine seems bonkers in practice.

Their cache hierarchy is pretty much bonkers too, with its exclusivism and other contortionisms, even excluding the apparent bug (this is uArch, not something that's trivially fixed or validated IMHO). Their Turbo implementation seems rather limited too, but maybe that can be tweaked. It looked pretty decent on paper mind you, but the paper was pretty vague and the implementation is anything but.

Cache hierarchy cannot be changed or overhauled in Piledriver? Execution units cannot be improved (maybe dedicating less space to caches)?
If you know that Piledriver will be only a slightly tweaked Bulldozer, OK.

LunchBox · Sep 28, 2011

Doomtrooper said:
I would not snicker unless you want to pay a small mortgage to Intel without competition like back in the Pentium 60 days...but be my guest.

I'll put my money where the performance is. If you like to support a company just for the sake of "competition" even if the item in question is underwhelming, then be my guest.

Sxotty · Sep 28, 2011

LunchBox said:
I'll put my money where the performance is. If you like to support a company just for the sake of "competition" even if the item in question is underwhelming, then be my guest.

That has nothing to do with snickering about it. I am just tired of AMD failing. It isn't funny, it is sad. Intel is already slowing down b/c they have no reason to put anything better out. They are charging $1k for a processor. It is ridiculous.

hoho · Sep 28, 2011

Sxotty said:
They are charging $1k for a processor. It is ridiculous.

They also charged 1k when their highest-end CPU was miles behind AMD's midrange. Since around Core2 you've been able to get a decently performing CPU from Intel for around $200-300. Sure, it's not the highest-end but I wouldn't say paying 4x higher price for a few hundred MHz extra is worth it. Going from 4 -> 6 cores is another thing of course but their prices start from around €500, not sure how much cheaper they could be in US.

Doomtrooper · Sep 28, 2011

LunchBox said:
I'll put my money where the performance is. If you like to support a company just for the sake of "competition" even if the item in question is underwhelming, then be my guest.

Never said that, just said laughing at a competitor trying to take on a giant when as consumers we NEED AMD is not the smartest move....wait for full benchmarks before snickering.

swaaye · Sep 28, 2011

AMD too had a bit of fun with selling desktop processors for $1000 back during the Athlon 64 era. Those FX chips on 939 come to mind. Then they got bulldozed (harhar) by Core 2 and cut most of their product prices in half. Thanks Intel.

Sxotty · Sep 29, 2011

hoho said:
They also charged 1k when their highest-end CPU was miles behind AMD's midrange. Since around Core2 you've been able to get a decently performing CPU from Intel for around $200-300. Sure, it's not the highest-end but I wouldn't say paying 4x higher price for a few hundred MHz extra is worth it. Going from 4 -> 6 cores is another thing of course but their prices start from around €500, not sure how much cheaper they could be in US.

They are not cheap, and that is the problem. I have a 6 core (AMD) and don't want to go to 4 cores.

Rootax · Sep 29, 2011

Sxotty said:
They are not cheap, and that is the problem. I have a 6 core (AMD) and don't want to go to 4 cores.

If 4 cores can beat your 6 cores even in heavily multithreaded apps, what's the problem?

Malo · Sep 29, 2011

Because 6 > 4!

Sxotty · Sep 29, 2011

Rootax said:
If 4 cores can beat your 6 cores even in heavily multithreaded apps, what's the problem?

The problem is also that I run multiple instances, one for each core. The app doesn't need multithreading since the instances are completely independent.

hoho · Sep 29, 2011

In that case higher single-threaded performance should provide even higher overall throughput due to less cache thrashing.
