Time for AMD to pull out of the desktop market?

3dilettante · Jul 25, 2012

They should be priced roughly where their performance lies. If a given chip is in your price range and your usage model isn't an extreme outlier, I don't think there's a reason to regret anything.

The nature of AMD's situation is harsher, but that's not the buyer's problem.

eastmen · Jul 25, 2012

RudeCurve said:
I have no regrets with my two new FX8120 and FX4100 builds. They were relatively cheap too. Didn't see a need to go with Intel.

I have the FX-8150; I should have gone with Sandy Bridge. Bulldozer is fast, sure, but it's using a ton more power than what's really needed.

Hopefully AMD gets its act together; otherwise I'm going to hop over to Intel.

Exophase · Jul 25, 2012

Frank said:
I just bought new innards for my main pc, with an AMD octo-core FX-8120. Ok, I know it's only roughly a 6-core and the upper Intel i5s and i7s are faster for applications that use ~2 cores, out of the box.

However, it is faster than any Intel less than 150% as expensive. And it suffers mostly from bad scheduling by the Windows task scheduler. Because it requires just about the opposite approach of an Intel. Let me explain.

If you run applications that use multiple cores, you'll see that the first, third, and (when available) fifth and seventh core are all pushed to the max, while the even cores are mostly idle. Because the Windows scheduler expects those cores to be virtual, hyperthreaded ones. And it makes more sense to use a "full" core than a "virtual" one, to balance the load.

Further, it tries to keep processes on that same core as much as possible, to prevent cache misses and rescheduling demands.

Then again, if you use an AMD CPU, the opposite makes more sense: try and schedule multiple threads of the same processes on the same core pair. That increases the overall speed and prevents stalling.

The octo-cores do have a full set of 8 integer pipelines, only the floating point and special units (which are faster than the Intel ones) are shared. That means, that if a thread is waiting on the result of another one, it will stall if it isn't running on the same core-pair while using the special units. Which you can easily see in the Windows task manager: while the Intel cores tend to have an average load that fluctuates mildly, the AMD load tends to consist of spikes.

Or, in other words: they stall all the time.

So, the problem isn't so much bad/slow AMD processors, as it is bad scheduling in Windows that greatly favors Intel CPU's.

And as soon as software can really make use of 8 cores or if you like virtualizing, those octo-cores will outperform the more than twice as expensive Intel ones.

That's not how things work...

The advantage you get from keeping inter-thread communication on two cores of the same module is small and I doubt there are a lot of applications with such tight communications requirements where you'd even start to measure such a benefit. Instead running two threads on the same module penalizes you because both cores share the same frontend (including L1 instruction cache, fetch, and decode) as well as L2 cache, not just the FPU. Several reviews demonstrated this property.

AMD did present some slides that suggested that running two threads on the same module instead of separate modules could improve performance, but not for the reasons you gave. The idea is that the single module uses less power if the other module is turned off. This raises the available TDP, which allows the remaining module to turbo to a higher clock speed. Unfortunately, the turbo headroom for Bulldozer isn't that high, so in practice this didn't really result in enough of an improvement to offset the sharing overhead, let alone overtake it. Hence why AMD's scheduling changes are doing the opposite of what you're saying.

One thing to keep in mind is that in situations where all cores are utilized, like those that benchmark the best for Bulldozer, the scheduling changes don't do anything. In reality, the impact of the scheduling changes is on average small, and minor compared to the shortcomings of the uarch. Piledriver does some work to improve things, but I'm skeptical of the 15+% better IPC number given by THG, since so few tests were done. There really need to be reviews, and I'm alarmed that none have surfaced given that you could buy OEM desktop Trinity systems for a while.
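To make the two placement policies concrete, here is a minimal sketch in Python. It is a toy model, not how the Windows scheduler or AMD's patch actually works. The core numbering (logical cores 2m and 2m+1 sharing module m) and the names `place_threads` and `modules_in_use` are assumptions made up for illustration.

```python
# Toy model of thread placement on a Bulldozer-style CPU.
# Assumption (not taken from any real scheduler): logical cores 2*m and
# 2*m + 1 belong to module m and share that module's front end and L2.

def place_threads(num_threads, num_modules=4, policy="spread"):
    """Return the logical core IDs chosen for num_threads runnable threads."""
    num_cores = 2 * num_modules
    if not 0 <= num_threads <= num_cores:
        raise ValueError("thread count must fit the available cores")
    if policy == "pack":
        # Fill both cores of a module before touching the next module.
        order = list(range(num_cores))
    elif policy == "spread":
        # Use one core per module first, then fall back to sibling cores.
        order = list(range(0, num_cores, 2)) + list(range(1, num_cores, 2))
    else:
        raise ValueError(f"unknown policy: {policy}")
    return sorted(order[:num_threads])


def modules_in_use(cores):
    """Return the set of module indices that have at least one busy core."""
    return {core // 2 for core in cores}


for policy in ("spread", "pack"):
    cores = place_threads(4, policy=policy)
    print(policy, cores, "modules busy:", len(modules_in_use(cores)))
```

With four threads, "spread" keeps every thread on its own module (no front-end or L2 sharing), while "pack" leaves two modules idle, which is the case where the turbo argument would apply. With eight threads both policies pick the same eight cores, which is why the scheduling changes do nothing once all cores are busy.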

sebbbi · Jul 25, 2012

Frank said:
Then again, if you use an AMD CPU, the opposite makes more sense: try and schedule multiple threads of the same processes on the same core pair. That increases the overall speed and prevents stalling.

It's easy to overestimate the advantages of sharing a data cache between two cores. First of all, data often gets flushed out of the caches very fast (in less than a millisecond). So two cores must be using the same data at (almost) exactly the same time to gain the benefits. The first thing you get taught at school about multithreaded programming is: protect your data so that multiple threads do not access it simultaneously. This is of course only true for write & modify cases.

If one thread is modifying the data that the other reads, there are a lot of potential problems. You have no control over the ordering (a read might return old data), or the data might be partially updated. So you need locks. Writing a multithreaded program that does lots of simultaneous updates & reads to the same locked data objects is just really bad design. It kills the performance completely (lots of stalls). So programmers go out of their way to avoid this kind of data access, and it doesn't happen a lot in real software.
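As a rough illustration of that design point, here is a small Python sketch contrasting the two structures. The function names and the sample data are made up for this example. CPython's global interpreter lock means this will not reproduce the hardware stalls being discussed; it only shows the shape of the code, not the performance.

```python
import threading


def count_shared(chunks):
    """Every update takes the same lock: correct, but all threads queue on it."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # contended on every single element
                total += value

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_private(chunks):
    """Each thread sums privately; results are merged once at the end."""
    partial = [0] * len(chunks)

    def worker(index, chunk):
        local = 0  # thread-private accumulator, no lock needed
        for value in chunk:
            local += value
        partial[index] = local  # a single write per thread, to its own slot

    threads = [
        threading.Thread(target=worker, args=(i, c)) for i, c in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


# Hypothetical workload: four threads, 1000 integers each.
chunks = [range(i * 1000, (i + 1) * 1000) for i in range(4)]
print(count_shared(chunks) == count_private(chunks))
```

Both versions compute the same sum. The second is the pattern most real multithreaded code aims for, which is why tightly shared writable data between two threads is rare enough that co-locating them on one module buys little.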

Atomic (read & modify) operations are good for some algorithms, and cost less than locks, but purposefully doing atomic operations to the same cache lines from multiple threads is really asking for trouble. You will usually get huge penalties from this (coherency stalls), so any competent programmer is going to avoid this. This use scenario would actually be improved a lot by grouping two threads on the same module, but as it practically never happens in real software, it doesn't matter much.

Cache sharing improves performance when both threads are only doing reads (no modify at all) to exactly the same data at almost the same time. The question here becomes: why are two threads reading exactly the same data at the same time? If it is a random occurrence of two objects referring to the same object then it's fine, but in order for it to affect cache performance it must be a really common pattern. And a common pattern of multiple threads reading and processing the same data could well be a sign of inefficient code (why do you need to read the data so many times?). Of course there are good cases where the same data is read by multiple threads at the same time, but it's not a really common case. Not likely something that affects cache performance a lot.

---

Intel's hyperthreading (and IBM's SMT) is better for hiding memory latency (cache misses) than AMD's module-based architecture. With HT, when a thread hits a cache miss, the CPU immediately starts executing instructions from the other thread instead. All the execution units are fully utilized by the other thread during the cache miss (a cache miss can take up to 200 cycles). With AMD's architecture, a cache miss freezes the core until it gets the data to continue. There's nothing you can run on the core until the data is ready. Of course AMD has twice as many cores, but these cores are not as high performance as Intel's big fat cores. So if one of these two cores stalls for 200 cycles, only the other keeps running. On Intel's architecture the big fat core is crunching numbers all the time (unless of course both threads hit a stall at the same time). And it can crunch much more per cycle than a simpler, smaller AMD core.

So basically AMD's cores are stalling more because of memory latencies. Intel's core can get instructions from two threads and thus memory latency can often be hidden. AMD has higher peak performance (in multithreaded loads), but it drops down more because of cache misses (and other stall cases). AMD's relatively weak caches make this situation even worse. Intel has excellent low latency caches with high associativity.
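A back-of-the-envelope version of that argument, as a Python sketch. This is a deliberately idealized model: each thread alternates a burst of compute with a memory stall, and a second thread on the same core is assumed to fill the stall perfectly. It ignores out-of-order execution, prefetching, and cache contention between the threads, and `core_utilization` and its inputs are hypothetical.

```python
def core_utilization(compute_cycles, stall_cycles, threads_per_core=1):
    """Best-case fraction of cycles in which the core has work to execute.

    Each thread alternates compute_cycles of work with stall_cycles of
    waiting on memory. With several threads on one core, the model assumes
    their stalls interleave perfectly, so this is an upper bound.
    """
    if compute_cycles <= 0 or stall_cycles < 0 or threads_per_core < 1:
        raise ValueError("expected positive compute, non-negative stall, >= 1 thread")
    demand = threads_per_core * compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, demand)


# Hypothetical workload: 100 cycles of work per 200-cycle cache miss.
one_thread = core_utilization(100, 200, threads_per_core=1)
two_threads = core_utilization(100, 200, threads_per_core=2)
print(f"one thread per core:  {one_thread:.2f}")
print(f"two threads per core: {two_threads:.2f}")
```

In this model a core running a single thread sits idle for the whole miss, while a core fed by two threads can cover part or all of it. How close real hardware gets to that bound depends on the workload.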

almighty · Jul 25, 2012

Bulldozer is 20-25% slower than Phenom II per clock and Phenom II is a good 40%+ slower than Sandy Bridge.

AMD bumping up IPC by 10-15% with each revision is not enough; it'll take them 2 revisions before they can truly offer a chip that offers better performance than Phenom II in every way.
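The arithmetic behind that estimate can be checked with a few lines of Python. This sketch assumes "X% slower per clock" means Bulldozer delivers (1 - X) of Phenom II's per-clock performance and that each revision's gain compounds. The function name is made up for illustration.

```python
def revisions_needed(deficit, gain_per_revision):
    """Smallest number of compounding revisions to reach per-clock parity.

    deficit: fraction slower than the reference (0.20 means 20% slower).
    gain_per_revision: per-clock improvement per revision (0.10 means +10%).
    """
    if not 0.0 <= deficit < 1.0 or gain_per_revision <= 0.0:
        raise ValueError("deficit must be in [0, 1) and gain must be positive")
    relative = 1.0 - deficit  # per-clock performance relative to the reference
    revisions = 0
    while relative < 1.0:
        relative *= 1.0 + gain_per_revision
        revisions += 1
    return revisions


# The four corners of the ranges quoted in the post.
for deficit in (0.20, 0.25):
    for gain in (0.10, 0.15):
        print(f"{deficit:.0%} slower, +{gain:.0%} per revision:",
              revisions_needed(deficit, gain))
```

Under these assumptions, two revisions only reach parity in the most favorable corner (20% behind, 15% per revision); the other corners take three or four.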

Meanwhile Intel is steamrolling their chips out; Ivy Bridge is a good 8-10% faster per clock than Sandy Bridge.

I think people need to wake up and see how far behind AMD really are.

AlphaWolf · Jul 25, 2012

almighty said:
Meanwhile Intel is steamrolling their chips out; Ivy Bridge is a good 8-10% faster per clock than Sandy Bridge.

Er... at what? Cuz I'm not seeing anything close to that.

entity279 · Jul 25, 2012

I guess it's best you ignore that post.

almighty · Jul 25, 2012

AlphaWolf said:
Er... at what? Cuz I'm not seeing anything close to that.

That would be on average; game performance is much improved in CPU-limited games.

http://hardocp.com/article/2012/04/23/intel_ivy_bridge_processor_ipc_overclocking_review/1

AlphaWolf · Jul 26, 2012

almighty said:
That would be on average; game performance is much improved in CPU-limited games.

http://hardocp.com/article/2012/04/23/intel_ivy_bridge_processor_ipc_overclocking_review/1

You've got two gaming benchmarks with 6% or over. And a pile of tests showing virtually no difference. I really don't know how that comes out to 8-10% improvement.
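For a sense of scale, here is how an average works out when only a couple of tests move. The numbers below are hypothetical and are not the results from the linked review: two tests at +6% and eight at no change. The sketch uses a geometric mean, a common choice for averaging speedup ratios.

```python
from statistics import geometric_mean


def average_speedup(ratios):
    """Geometric mean of per-benchmark speedup ratios (1.06 means 6% faster)."""
    return geometric_mean(ratios)


# Hypothetical suite: two tests gain 6%, eight show no difference.
ratios = [1.06, 1.06] + [1.00] * 8
gain_percent = (average_speedup(ratios) - 1.0) * 100
print(f"average gain: {gain_percent:.1f}%")
```

With a mix like that the average lands a little over 1%, nowhere near 8-10%. An average that high would need most of the suite to show large gains.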

Albuquerque · Jul 26, 2012

Yeah, I'm not sure that Ivy Bridge can claim a whole lot of improvement over Sandy Bridge in pure CPU throughput, unless you want to talk about the enhancements to QuickSync or the iGPU. There are a few percentage points difference to be sure, but nothing that stands out at 10% across the board.

Most of the gains in IVB were power-consumption related; Intel cut full-load power consumption by something like ~15% while keeping the same performance or slightly better. Not a lot to complain about on that front...

almighty · Jul 26, 2012

AlphaWolf said:
You've got two gaming benchmarks with 6% or over. And a pile of tests showing virtually no difference. I really don't know how that comes out to 8-10% improvement.

And that was one review with only a handful of programs.... I really CBA to go search Google on your behalf.

Grall · Jul 26, 2012

It was your original claim, so really, onus is on you to back it up with sufficient evidence...

In any case, IVB was more about FinFETs and power savings than performance increases, I believe. On the CPU side anyhow; the IGP was much improved of course and, from what I understand, shows a lot more than a 10% performance increase...

Haswell is supposed to bring further CPU performance increases is it not? Maybe we'll see some leaked details soonish. OTOH, it would seem Intel isn't working very hard on further improving CPU performance right now, perhaps due to greatly diminishing returns, and/or poor competition.

Most people don't even need a Sandy Bridge CPU for what they do, which is probably why tablets are starting to eat the PC's lunch.

Blazkowicz · Jul 26, 2012

RudeCurve said:
I have no regrets with my two new FX8120 and FX4100 builds. They were relatively cheap too. Didn't see a need to go with Intel.

They are the most interesting FX offerings, and the incrementally better Vishera refreshes will be fine, good-enough CPUs.
I can understand that the 8150 feels less great; that one is the power hog. The whole range gets deadly power hungry, though, if you decide to up the clock.

Frank · Jul 26, 2012

Ok, thanks for the explanations. I stand corrected.

But I still think I got my money's worth out of it. And it's not as if there's anything I do (except compiling very large solutions from an SSD with the -stupidly- only single-threaded and only 32-bit Visual Studio 2010) that actually pushes them (or only a single one, with VS2010) up to 100% usage.

Blazkowicz · Jul 27, 2012

On a side note, I've thought again about a particular AMD CPU: it will be a low-end Trinity with a single module enabled, but a highish-clocked version with an unlocked multiplier.
It would be a fun CPU, and I'll probably be interested in it. I like stupidly fast low-end parts. Intel is currently better than AMD at that, actually, which is why AMD would have an incentive to offer a "Black Edition" dual core.

RudeCurve · Jul 27, 2012

Blazkowicz said:
They are the most interesting FX offerings, and the incrementally better Vishera refreshes will be fine, good-enough CPUs.
I can understand that the 8150 feels less great; that one is the power hog. The whole range gets deadly power hungry, though, if you decide to up the clock.

I OC'd the FX4100 to 4.3GHz from my previous conservative 4GHz OC...stock voltage/stock HSF...it's still very stable and doesn't run hot at all...couldn't be any happier. :smile:

Kaotik · Jul 31, 2012

swaaye said:
Intel surely could build up their capacity if they saw the need.

Surely they could. However that still wouldn't protect from the inevitable monopoly lawsuits.

swaaye · Aug 1, 2012

Yup. I wouldn't be surprised if they like having AMD around because it keeps the DOJ away. I also think Intel has interesting strategic opportunities by having a budget player that they can control rather easily in a variety of ways (mainly due to AMD's uncompetitive CPU technology).

CouldntResist · Aug 2, 2012

Kaotik said:
However that still wouldn't protect from the inevitable monopoly lawsuits.

Lawsuits, which tend to end in a wrist slap for the convicted. Check the track record of lawsuits against US companies operating worldwide monopolies...

If Intel indeed fears the inevitable little wrist slap, then it only emphasises how shitty the situation in the CPU market is. It's as if Intel has too little to gain from eating the breadcrumbs currently left to AMD to outweigh the discomfort of the wrist slap that would follow.
