AMD Bulldozer Core Patent Diagrams

Wow, just wow. I'd expect such clueless posts on OCN or other benchmark wanking places, but here??

Worse than Phenom II? Don't stop there, say it's worse than a K6 too. :rolleyes:

So ... once you stop looking at how well the FX processors actually perform, then they become great? I doubt most people would see things this way.

Phenom II vs FX in games:

http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/8

Power consumption?

http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/9

Piledriver needs to pull something special out of the bag because this kind of performance will hurt future Fusion products. And that would be a bad thing for everyone.
 
So ... once you stop looking at how well the FX processors actually perform, then they become great? I doubt most people would see things this way.
Yeah, most people without a clue, like so-called "enthusiasts" or benchmark wankers; in short, those who will judge the processor based on desktop workloads and benchmarks of dubious neutrality. And those are not the kind of people and applications the processor was built for.

You mention games; do you think this was actually designed to run games? Please. Check this out: http://www.anandtech.com/show/5058/amds-opteron-interlagos-6200/2 . It explains a lot of the design decisions made. No, it still doesn't do great in that review, but all things considered it manages to be competitive.
 
Well isn't that a nice and friendly post.

So then AMD designed their new architecture to be inept at desktop workloads and has no other architecture to address that market, right? That's brilliant. Why do I doubt that was the intention of this project? AMD will tout whatever strengths it has, so yeah, you can expect them to say it was designed for servers when that's the only place it isn't terrible.
 
I think the scheduler tweaks are AMD skirting around the real truth of the CPU needing special attention due to bugs in the cache and such. I think the design is supposed to be less sensitive to scheduler problems otherwise they would have used the logical/physical CPU designations that the OS knows about for Intel HT.
They could still have done this even if they had noticed those issues very late; this should be under the complete control of the BIOS.
In any case, even without the cache issues, consider 4 FP-heavy threads: it is obvious that scheduling these onto only two modules can't always be optimal, even if that allows higher clocks. Maybe it was done more for power-draw reasons or something.
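To make the 4-thread case concrete, here is a rough Python sketch that just counts how many FP-heavy threads end up on each module's shared FPU. It assumes logical CPUs 2m and 2m+1 sit in module m, which is a guess at the numbering and not something confirmed here; the function name is made up too.

```python
from collections import Counter

CORES_PER_MODULE = 2  # each Bulldozer module pairs two integer cores with one shared FPU


def fp_threads_per_module(cores):
    """Count FP-heavy threads per module.

    Assumption (hypothetical numbering): logical CPUs 2m and 2m+1 belong to module m.
    """
    return Counter(core // CORES_PER_MODULE for core in cores)


packed = fp_threads_per_module([0, 1, 2, 3])  # 4 threads squeezed onto 2 modules
spread = fp_threads_per_module([0, 2, 4, 6])  # 4 threads, one per module

print("packed:", dict(packed))
print("spread:", dict(spread))
```

In the packed layout every FPU is shared by two threads, while the spread layout gives each thread an FPU to itself, at the cost of keeping all four modules powered up.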
 
Yeah, most people without a clue, like so-called "enthusiasts" or benchmark wankers; in short, those who will judge the processor based on desktop workloads and benchmarks of dubious neutrality. And those are not the kind of people and applications the processor was built for.

So the problem is the customer and the software they want to run.

You mention games; do you think this was actually designed to run games? Please. Check this out: http://www.anandtech.com/show/5058/amds-opteron-interlagos-6200/2 . It explains a lot of the design decisions made. No, it still doesn't do great in that review, but all things considered it manages to be competitive.

Who do you think AMD want to sell their FX processors to? What do you think those people want to do?

http://www.amd.com/uk/products/desktop/processors/amdfx/Pages/amdfx.aspx

Uh oh. You can tantrum about clueless wankers all you like but if you're expecting people to disregard that the FX line sucks for what they want and satisfy themselves with a "doesn't do great" Opteron review then it's you with the problem. Well, you and AMD. And everyone else if Piledriver Fusion sucks as badly as the new FX chips.

Anyway this is all horribly OT so I'm out of this line of conversation.
 
It might be worse than Phenom II when you consider how inconsistent its game performance apparently is.
http://www.hardocp.com/article/2011/11/03/amd_fx8150_multigpu_gameplay_performance_review/2

The unstable frame rate looks really poor.

The Dragon Age 2 dual SLI section looks just as bad:

http://www.hardocp.com/article/2011/11/03/amd_fx8150_multigpu_gameplay_performance_review/5

So what specifically is causing this? It's like the FX is having a fit. The spikes in performance are way above the 2500K average line, so does that mean it could be an SLI driver bug or some issue with measuring framerate?
 
AMD expect to add 10% extra performance with every 'tock' they release.

So the next release will bring performance back up to Phenom II level. Nice one, AMD :rolleyes:
 
AMD expect to add 10% extra performance with every 'tock' they release.
That's perf per watt from what I can tell from their graphs. If it were just 10% then it would basically boil down to improving production technology, not architecture.
 
That's perf per watt from what I can tell from their graphs. If it were just 10% then it would basically boil down to improving production technology, not architecture.

A 10-15% increase per year means that it won't be until 2013 that they can offer anything remotely faster than what a Phenom II X6 offers.
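The compounding behind that estimate can be sketched in a few lines of Python. The 20% deficit below is a made-up figure purely for illustration, not a measured gap, and the function name is hypothetical.

```python
import math


def years_to_close(gap, yearly_gain):
    """Smallest whole number of years n with (1 + yearly_gain)**n >= 1 + gap."""
    return math.ceil(math.log(1 + gap) / math.log(1 + yearly_gain))


# The 20% deficit is an assumed number for illustration only.
for gain in (0.10, 0.15):
    print(f"{gain:.0%} per year: {years_to_close(0.20, gain)} year(s)")
```

With that assumed gap, both the 10% and the 15% rate need two yearly steps, which is how a late-2011 launch ends up at 2013.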
 
Yeah, most people without a clue, like so-called "enthusiasts" or benchmark wankers; in short, those who will judge the processor based on desktop workloads and benchmarks of dubious neutrality. And those are not the kind of people and applications the processor was built for.

You mention games; do you think this was actually designed to run games? Please. Check this out: http://www.anandtech.com/show/5058/amds-opteron-interlagos-6200/2 . It explains a lot of the design decisions made. No, it still doesn't do great in that review, but all things considered it manages to be competitive.

Mind your manners please. I'm sure that line of argument is very popular elsewhere, so keep it to those other places and far away from B3D.
 
Who do you think AMD want to sell their FX processors to? What do you think those people want to do?

http://www.amd.com/uk/products/desktop/processors/amdfx/Pages/amdfx.aspx

Uh oh. You can tantrum about clueless wankers all you like but if you're expecting people to disregard that the FX line sucks for what they want and satisfy themselves with a "doesn't do great" Opteron review then it's you with the problem.
The marketing team got fired recently. Good riddance.

And no, I don't want people to pretend BD is an awesome chip when it's not, or more precisely not for them. It's perfectly fine when they buy what fits their needs better. If someone is in the market for a CPU, he can buy an SB and be happy. I am myself waiting for Ivy Bridge, since it will fit my needs very well (Tinn-R is mostly single-threaded) and has a hardware RNG (for Monte Carlo simulations and other various statistical stuff). But please, stop the endless digs and complaining. For example:

So the next release will bring performance back up to Phenom II level. Nice one, AMD :rolleyes:
What does that "line of argument" achieve? How does it add to discussion?

Most people who read the news had already made up their minds within a week of BD's release, mostly unfavourably. Me too: echoing Charlie's "SB being biggest disappointment of the year", I said BD is the biggest disappointment of the decade. Can we move on now? Repeated digs won't increase IPC or make AMD work faster on updates.

In fact, people themselves are partially responsible for the whole situation in the CPU market, voting with their wallets and posting their opinions on forums over the years. I bet some of them will later complain that CPU technology isn't moving forward in terms of performance because Intel has no meaningful competition. It's the shortsightedness that annoys me, even though I can't really blame people for caring only about themselves and not seeing the whole picture. I already got used to it in other places and stopped reading those, but seeing similar "arguments" here was unexpected for me. Hence my (too) strong choice of words.

For which I apologize.

Mind your manners please. I'm sure that line of argument is very popular elsewhere, so keep it to those other places and far away from B3D.
Noted, and removing myself from the discussion until there is some news to report or discuss, hopefully good news.
 
Yeah, my comment about the FX processors being "turds" didn't help either. FWIW Sandy Bridge is my first Intel / non-AMD chip since the 1990s. Bulldozer is clearly much better suited to servers than desktops, but with Trinity AMD's entire consumer lineup is at risk of being further weakened. It would be really bad to see Trinity lose to Llano in anything CPU constrained.

I'd like to know what clock speeds AMD were originally envisioning Bulldozer running at, and just how much GloFo's process has underdelivered. It must be pretty bad for Llano to be capped at 2.9 GHz and for there to be no turbo on the 2.6 and 2.9 GHz models.
 
BD seems particularly vulnerable to a lack of process affinity settings. This is inherited from earlier AMD processors, but exacerbated by the increase in L2/L3 latency.

Without process affinity, a process can bounce from one core to another. When a process is pre-empted and then re-scheduled on a different core, it will produce a mass of cache misses on the new core. All the data needs to be fetched from the data cache of the core that ran the process previously.
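For anyone who wants to try pinning by hand, here is a minimal sketch in Python. It assumes a Linux box, where `os.sched_getaffinity` and `os.sched_setaffinity` are available; the helper name `pin_to_one_cpu` is made up for this example, and on other platforms it simply does nothing.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict `pid` (0 means the calling process) to one CPU it may already use.

    Returns the chosen CPU number, or None where the Linux-only
    os.sched_setaffinity call is not available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(pid))
    target = allowed[0]  # arbitrary choice: the lowest-numbered allowed CPU
    os.sched_setaffinity(pid, {target})
    return target


chosen = pin_to_one_cpu()
print("pinned to CPU:", chosen)
```

Once pinned, the scheduler can no longer migrate the process, so its working set stays warm in that core's caches between time slices.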

On Athlon X2 this could result in a 10-15 % hit running a single process on a dual core.

There was great improvement in Windows scheduling going from XP to Vista to remedy this problem.

Intel's Core 2 Duo queries the neighbouring cores' L1 cache in parallel with the L2 on an L1 miss, greatly reducing this penalty. I don't know about newer Intel CPUs.

It appears that the scheduling mess is back with BD, possibly compounded with the page aliasing issues of the L1 I-caches. Remedying these issues would require:

1. Fix Windows/Linux scheduling
2. Fix page aliasing in hardware, or implement page colouring in Windows/Linux.
3. Implement parallel queries of neighbouring L1s.

None of them easy.

Cheers
 
1. Fix Windows/Linux scheduling
Back when I was still using a P4 HT around 2003 under Linux, it had no problems keeping a single-threaded application running on one specific core without my having to specify it manually. I'd dare to say that scheduling ping-pong has never really existed under Linux. Though I haven't really looked at how threads are handled when you have >1 physical cores with HT or BD-like modules.
 
BD seems particularly vulnerable to a lack of process affinity settings. This is inherited from earlier AMD processors, but exacerbated by the increase in L2/L3 latency.

Without process affinity, a process can bounce from one core to another. When a process is pre-empted and then re-scheduled on a different core, it will produce a mass of cache misses on the new core. All the data needs to be fetched from the data cache of the core that ran the process previously.

On Athlon X2 this could result in a 10-15 % hit running a single process on a dual core.

There was great improvement in Windows scheduling going from XP to Vista to remedy this problem.

Intel's Core 2 Duo queries the neighbouring cores' L1 cache in parallel with the L2 on an L1 miss, greatly reducing this penalty. I don't know about newer Intel CPUs.

It appears that the scheduling mess is back with BD, possibly compounded with the page aliasing issues of the L1 I-caches. Remedying these issues would require:

1. Fix Windows/Linux scheduling
2. Fix page aliasing in hardware, or implement page colouring in Windows/Linux.
3. Implement parallel queries of neighbouring L1s.

None of them easy.

Cheers
Intel uses an inclusive cache design now. I think it serves a similar function to what you described as "Intel's Core 2 Duo queries the neighbouring cores' L1 cache in parallel with the L2 on an L1 miss".



The scheduling mess is a problem, and some benchmarks have proven it, but not all of them.
Bulldozer still performs unsatisfactorily on some multi-threaded INT tasks on Linux, based on the benchmarks at Openbenchmarking.org.

AMD still hasn't changed its cache structure to any real extent.
The scheduling within the dispatch stage may be the cause of some problems too.
 
Back when I was still using a P4 HT around 2003 under Linux, it had no problems keeping a single-threaded application running on one specific core without my having to specify it manually. I'd dare to say that scheduling ping-pong has never really existed under Linux. Though I haven't really looked at how threads are handled when you have >1 physical cores with HT or BD-like modules.
I'd like to know more about the thread scheduling too.
As was said before, it sounds like it wouldn't be a problem if the OS had a way to treat a BD cluster as an SMT core.
 
Back when I was still using a P4 HT around 2003 under Linux, it had no problems keeping a single-threaded application running on one specific core without my having to specify it manually. I'd dare to say that scheduling ping-pong has never really existed under Linux. Though I haven't really looked at how threads are handled when you have >1 physical cores with HT or BD-like modules.

A process ping-ponging back and forth between two logical threads on the same core doesn't produce any cache misses.

Linux used to have the same problem as Windows. The scheduler tried to average the length of each run queue, resulting in processes bouncing around. Instead you should only re-schedule a process to another core if the run queue for a core is larger than one.
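The difference between the two policies shows up even in a toy model. The Python sketch below is not how any real kernel is implemented; it only tracks run-queue lengths, and the policy names `equalise` and `only_if_waiting` are made up for illustration.

```python
def count_migrations(queue_lengths, ticks, policy):
    """Run a toy load balancer for `ticks` steps and count task migrations.

    queue_lengths: runnable tasks per core, e.g. [1, 0] for one task on a dual core.
    policy: "equalise" or "only_if_waiting" (hypothetical names for this sketch).
    """
    queues = list(queue_lengths)
    migrations = 0
    for _ in range(ticks):
        busiest = max(range(len(queues)), key=queues.__getitem__)
        idlest = min(range(len(queues)), key=queues.__getitem__)
        if policy == "equalise":
            # Naive balancing: shift a task whenever one queue is longer than another.
            move = queues[busiest] > queues[idlest]
        else:
            # Move only when a task is actually waiting behind another one
            # and some core has nothing to run.
            move = queues[busiest] > 1 and queues[idlest] == 0
        if move:
            queues[busiest] -= 1
            queues[idlest] += 1
            migrations += 1
    return migrations


print(count_migrations([1, 0], 10, "equalise"))         # 10
print(count_migrations([1, 0], 10, "only_if_waiting"))  # 0
```

A single task on a dual core gets moved on every tick under the naive rule and never under the second one, which is the ping-pong effect in miniature.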

Cheers
 
A process ping-ponging back and forth between two logical threads on the same core doesn't produce any cache misses.
I know, but my point was that it simply wasn't happening. Similarly, load stayed on one core (or 2, 3, 4 when there were more threads, obviously) when I upgraded to a Q6600 years later.

Though I was using Gentoo, it might have been that I turned on some weird kernel flag or something that made it behave like that :)
 
I know, but my point was that it simply wasn't happening. Similarly, load stayed on one core (or 2, 3, 4 when there were more threads, obviously) when I upgraded to a Q6600 years later.

Though I was using Gentoo, it might have been that I turned on some weird kernel flag or something that made it behave like that :)

Interesting. I had a Redhat installation and it had each core of my X2 loaded at 50% running a single big-ass query in MySQL.

I guess I should have updated the kernel more often :)

Cheers
 
Speaking of thread management, if someone here has access to a Bulldozer system, I would be grateful for a test run with the little console application attached to this post. It measures the sync latency between all the CPUs and cores (physical/virtual/local/remote) that are present in the system.

Here are some reference numbers:
Code:
Core 2 Q6600 -- 3825MHz
 
CPU0<->CPU1:       24.6nS per ping-pong
CPU0<->CPU2:      106.8nS per ping-pong
CPU0<->CPU3:      106.4nS per ping-pong
CPU1<->CPU2:      107.0nS per ping-pong
CPU1<->CPU3:      105.9nS per ping-pong
CPU2<->CPU3:       24.6nS per ping-pong
Code:
Core i7-920 -- 3995MHz
 
CPU0<->CPU1:       13.3nS per ping-pong
CPU0<->CPU2:       44.0nS per ping-pong
CPU0<->CPU3:       44.1nS per ping-pong
CPU0<->CPU4:       44.2nS per ping-pong
CPU0<->CPU5:       44.1nS per ping-pong
CPU0<->CPU6:       44.3nS per ping-pong
CPU0<->CPU7:       44.2nS per ping-pong
CPU1<->CPU2:       44.1nS per ping-pong
CPU1<->CPU3:       44.0nS per ping-pong
CPU1<->CPU4:       44.2nS per ping-pong
CPU1<->CPU5:       44.1nS per ping-pong
CPU1<->CPU6:       44.3nS per ping-pong
CPU1<->CPU7:       44.1nS per ping-pong
CPU2<->CPU3:       13.0nS per ping-pong
CPU2<->CPU4:       44.1nS per ping-pong
CPU2<->CPU5:       44.1nS per ping-pong
CPU2<->CPU6:       44.3nS per ping-pong
CPU2<->CPU7:       44.1nS per ping-pong
CPU3<->CPU4:       44.1nS per ping-pong
CPU3<->CPU5:       44.1nS per ping-pong
CPU3<->CPU6:       44.3nS per ping-pong
CPU3<->CPU7:       44.1nS per ping-pong
CPU4<->CPU5:       13.1nS per ping-pong
CPU4<->CPU6:       44.4nS per ping-pong
CPU4<->CPU7:       44.7nS per ping-pong
CPU5<->CPU6:       44.3nS per ping-pong
CPU5<->CPU7:       44.2nS per ping-pong
CPU6<->CPU7:       13.0nS per ping-pong
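To read a topology out of numbers like these, a small Python sketch can pick out the CPU pairs that are much faster than the rest. The regular expression assumes the exact line format shown above, and the factor of 2 used as the threshold is an arbitrary choice for illustration; the function name is made up.

```python
import re

# Matches lines such as "CPU0<->CPU1:       24.6nS per ping-pong".
LINE = re.compile(r"CPU(\d+)<->CPU(\d+):\s+([\d.]+)nS")


def fast_pairs(report, ratio=2.0):
    """Return CPU pairs whose latency is at least `ratio` times lower than the slowest pair."""
    rows = [(int(a), int(b), float(ns)) for a, b, ns in LINE.findall(report)]
    slowest = max(ns for _, _, ns in rows)
    return [(a, b) for a, b, ns in rows if ns * ratio <= slowest]


q6600 = """
CPU0<->CPU1:       24.6nS per ping-pong
CPU0<->CPU2:      106.8nS per ping-pong
CPU0<->CPU3:      106.4nS per ping-pong
CPU1<->CPU2:      107.0nS per ping-pong
CPU1<->CPU3:      105.9nS per ping-pong
CPU2<->CPU3:       24.6nS per ping-pong
"""

print(fast_pairs(q6600))  # [(0, 1), (2, 3)]
```

On the Q6600 numbers the fast pairs line up with the two dual-core dies that each share an L2. On the i7-920 the 13 ns pairs are presumably the two hardware threads of each physical core.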
 

Attachments

  • cache2cache_latency.rar
    4.7 KB