Intel Broadwell for desktops

Cilk/Cilk++/Threading Building Blocks/Grand Central Dispatch... solved :p

I still want more cores, even though it's true I doubt I'm going to need them at first :(
 
I just do not see how asking tens of thousands of companies to modify their software in the hope that Intel will bless everyone with a higher lowest common denominator is more reasonable than Intel doing it first and then allowing the software companies to catch up.

I guess it's my fault for expecting Intel to be the leader instead of the follower.
 
I just do not see how asking tens of thousands of companies to modify their software in the hope that Intel will bless everyone with a higher lowest common denominator is more reasonable than Intel doing it first and then allowing the software companies to catch up.

I guess it's my fault for expecting Intel to be the leader instead of the follower.

For the last couple of decades we've seen remarkable and relentless improvement in single-threaded performance of x86 cores. This has been driven by extremely impressive advances in the sophistication of the core architectures, and in the manufacturing processes. Like them or not, Intel has been the leader here (on average). Accusing them of being a follower is disingenuous, verging on clueless.

Why have these improvements in the single-threaded performance been necessary? Why build Pentium Pro when they could have thrown half-a-dozen 486 cores on a chip? Because the software people claimed it was too hard to use multiple cores. Twenty years later they're saying the same thing, but look at the scale of increase in complexity that Intel have managed in the core in the same time-frame. Today Intel are parallelising your single-threaded code for you, in real-time, in hardware. Just so you don't have to think about it.

Why is it up to Intel to lead yet again? They've delivered multi-core CPUs - dual-thread cores are trivially cheap, quad-thread cores not much more. But still the software people are moaning that Intel isn't doing enough. Because obviously having 16-threads or more makes life easier for the software guys, than 2 or 4. Or something. Still the software people say it's too hard.

Which brings me back to the 4/8/16-core 80486. Why didn't we go down that path? Why did Intel have to go the route it did improving single-threaded performance, whilst the software people sat on their thumbs waiting for Intel to speed their code up for them? Why didn't the software guys get off their behinds and solve the hard problems, in the same way the Intel hardware guys did?
 
For the last couple of decades we've seen remarkable and relentless improvement in single-threaded performance of x86 cores. This has been driven by extremely impressive advances in the sophistication of the core architectures, and in the manufacturing processes. Like them or not, Intel has been the leader here (on average). Accusing them of being a follower is disingenuous, verging on clueless.

Why have these improvements in the single-threaded performance been necessary? Why build Pentium Pro when they could have thrown half-a-dozen 486 cores on a chip? Because the software people claimed it was too hard to use multiple cores. Twenty years later they're saying the same thing, but look at the scale of increase in complexity that Intel have managed in the core in the same time-frame. Today Intel are parallelising your single-threaded code for you, in real-time, in hardware. Just so you don't have to think about it.

Yeah, die space was still at a premium then, and at the time there were ways to spend the transistor budget with better performance per unit of die area than stamping out multiple cores. Materials were also relatively far from their limits; it wasn't unrealistic to expect clock speeds to double every 18 months along with transistor density, hence the Pentium 4.

However, I think Intel wouldn't have done much without a swift kick in the ass from RISC-based competitors and Cyrix/AMD in the early 90's and continued pressure through the 00's. Intel didn't give a damn about developers; you took what they gave you if you wanted to use x86. Barring process limitations and other considerations, had they decided to release a 4-core 386 instead of a 486, people would have had to swallow it. They took this attitude with Itanium, but it was a different era by then.

Before then, they were patting themselves on the back for releasing the i486 at all, a modest improvement about 4 years after the introduction of the i386, in the absence of serious competition (I remember someone in the industry saying he attended a talk where that was essentially the message). They also aggressively tried to stifle competing innovation with their rebate programs and litigation against companies that had reverse engineered their instruction set, which in the end cost them a piddling $1.25 billion. This was a very successful program for Intel, but terrible for the world of computing.
 
Before then, they were patting themselves on the back for releasing the i486 at all, a modest improvement about 4 years after the introduction of the i386, in the absence of serious competition (I remember someone in the industry saying he attended a talk where that was essentially the message). They also aggressively tried to stifle competing innovation with their rebate programs and litigation against companies that had reverse engineered their instruction set, which in the end cost them a piddling $1.25 billion. This was a very successful program for Intel, but terrible for the world of computing.

And what a beast it was! :LOL:
 
It's not really a chicken-and-egg problem per se. Software with sufficient expressed parallelism will not pay any penalty running on fewer cores. People just have to stop doing parallelism by "moving X into another thread" and quitting once they get adequate use of 2-4 cores...

Things become more complex when the more parallel software has a higher aggregate runtime, for example because of higher memory footprint putting more pressure on caches or doing some operations redundantly between threads. It really can become a case of making tradeoffs.
 
Cilk/Cilk++/Threading Building Blocks/Grand Central Dispatch... solved :p

No. Nothing in your list solves anything but the very easiest problems in parallel programming. Thinking that they solve anything that wasn't solved 30 years ago demonstrates that you understand absolutely nothing about why MT is hard.

MT is not hard because of the boilerplate required -- and boilerplate is basically all any of those libraries/languages address.

MT is hard because parallel algorithms in a system with unpredictable operation latencies are necessarily distributed algorithms, and distributed algorithms are fundamentally harder to reason about than serial ones. There are some technologies that help with this -- notably, functional programming methods and transactional memory -- but they are not silver bullets that suddenly make parallelization well understood.

I have written a significant amount of parallel code. My best estimate is that, using the best available methods and technologies, writing a non-trivial application to be well parallelizable is at least a full order of magnitude more expensive than writing it serially. Whenever you curse an application and wish it were parallel, ask yourself: would you pay ten times what you paid for it if it were well parallelized? And every single person who says no and gets the serial version from a competitor instead increases the price.

The exception to this is the class of "embarrassingly parallel" applications, where the work done by the application can easily be split into chunks and code working on one chunk doesn't need to communicate with code working on other chunks. An example would be a media encoder. These kinds of applications should be parallelized, and largely already are.
 
I just do not see how asking tens of thousands of companies to modify their software in hope of Intel agreeing to bless everyone with a higher-lowest common denominator is more reasonable than Intel doing it first and then allowing the software companies to catch up.
Intel and AMD have already done it - even quad core processors are barely utilized! If IHVs were "following" software you'd still just have dual cores. And 6/8 cores isn't really too ridiculous to get either... but most consumers rightly don't bother because it doesn't make anything they do faster!

Developers already have lots of Xeon cores to play with. I think you can probably figure out that if they're not even making good use of 4 cores, they're not doing much with 6 or 8. They are well aware that they need to do a much better job on parallelism. The lack of success is both due to the difficulty of the problem and the poor way that people have gone about multithreading to date.

I don't think I've spoken to a single developer who has said they have chosen not to do more parallelism because their customers don't have hardware that would see an advantage from it. The few that aren't working on it simply find current processors "fast enough" (especially related to consoles), so I guess in a roundabout way you could blame Intel and AMD for doing too good a job on IPC...

Things become more complex when the more parallel software has a higher aggregate runtime, for example because of higher memory footprint putting more pressure on caches or doing some operations redundantly between threads. It really can become a case of making tradeoffs.
I'm well aware - realize that I'm a software developer, not a hardware guy :) Making things parallel is what I do for a good chunk of my job today, and even more in the past.

That said, the fact that it's hard (fundamentally - i.e. there are tradeoffs like you say) is unrelated to complaints against IHVs. IHVs have to ship hardware that runs existing and near-future software well. In fact if it turns out to be untenable for certain algorithms to run efficiently in parallel, that's all the more reason to still have high IPC, high frequency CPUs vs. ones with more cores.

No. Nothing in your list solves anything but the very easiest problems in parallel programming. Thinking that they solve anything that wasn't solved 30 years ago demonstrates that you understand absolutely nothing about why MT is hard.
Dude... did you miss his emoticon? I hope you did, because if you really think that Ing doesn't know what he's talking about then I'm sort of uninterested in whatever else you have to say :p I'll assume a miscommunication here.
 
No. Nothing in your list solves anything but the very easiest problems in parallel programming. Thinking that they solve anything that wasn't solved 30 years ago demonstrates that you understand absolutely nothing about why MT is hard....

Sorry but you missed my smiley.
Those tools are a big help for now, but don't solve it at all as you said.
There's a lot of work to be done on parallel algorithms, but I find those abstractions to be already much better than doing multi-threading by hand, they obviously are a first (good) step forward.

When it comes to parallel programming I think we need a language like Chapel (chapel.cray.com), which maps rather well to many parallel/concurrent programming concepts.
I really think the first step is to agree that threads are not exactly the abstraction we are after before moving on, and cutting boilerplate is much needed anyway: the less code you type, the less chance of writing a bug...
(And yes, Erlang/Haskell and other functional languages also have the innate ability to run in parallel; I just have a hard time leaving so much performance on the table at the moment.)

I'm as excited as you about the TSX instruction set, and can't wait to see how well it will work in practice.

I'm also doing multithreading. I wrote my own Cilk-like scheduler that executes depth-first and steals breadth-first, I cut my work into nicely sized tasks which are neither too small (which would generate too much management overhead) nor too big (which would leave threads starving), and I'm always looking for better tools/solutions to help me move forward.
ATM I'm only investigating the Chapel programming language and the TSX instruction set, and of course checking out new algorithms used on GPUs. (I've also started learning Haskell and read about Erlang, but the performance cost is a bit hard to swallow for a game dev.)

If you have any books, languages, libs or algorithms to recommend/share, I'm all ears...


(Do I need to talk about false sharing through hitting the same cache line, atomics, LL/SC, CAS... before you take me seriously? But man, just don't when I put a smiley in ^^)
 
And what a beast it was! :LOL:

Hah, nice. This article seems to indicate that Ivy Bridge's power control unit is about as powerful as a 486: http://forwardthinking.pcmag.com/pc...idge-first-processor-with-tri-gate-transistor

It also includes a power control unit on board to control things like Intel's Turbo Boost that is itself equivalent to a 486 processor.

I guess it makes sense, since the rate of power-state changes is far lower than the clock speed of Ivy Bridge. You wouldn't need more than a few million polls per second.

Here's a relevant quote from Gelsinger about Intel's situation then from http://blog.smartbear.com/community/gelsinger-and-meyer-two-cpu-designers-who-changed-the-world/

There is an interesting dilemma and an opportunity. I worked with Intel for many years. We had an opportunity to bring out 486 series of chips — 386 was one of my children. The 386 series was working great, it was highly manufacturable and very cost-effective.

Then, we introduced the 486 — it was expensive and it was hard to manufacture. So what did we do? We got rid of the 386 as fast as we could and moved to 486. It was a lousy business decision, until a year or two later.

We had to eat our children — may be that is a bit too graphic — but if we didn't do it, there was the risk of somebody else doing it. That's how I see the dilemma that the IT services companies are facing.

The cloud is a radically more efficient model for delivering services and applications. They may see the revenue opportunity declining in the short term when they make the transition, as Intel did when they moved from 386 to 486.

Profit margins will be lousy, until you get to the other side. The transition will be painful. If they don't evolve and transition, they will increasingly become a boat anchor for the customer.

Great foresight on his part but it's clear that Intel made this decision largely in the absence of direct competition from other contemporary companies in those days. It's not too different from Apple's story in the last decade or so.
 
Here's a relevant quote from Gelsinger about Intel's situation then from http://blog.smartbear.com/community/gelsinger-and-meyer-two-cpu-designers-who-changed-the-world/

Great foresight on his part but it's clear that Intel made this decision largely in the absence of direct competition from other contemporary companies in those days. It's not too different from Apple's story in the last decade or so.

I don't understand how you get to that conclusion. Gelsinger is saying they pushed the 486 even though they made less money doing so. If not because of competition, then what?

Up until the 486, Motorola's 68K series was competitive. You also had a rich set of RISC competitors, MIPS, SPARC, PA-RISC, Power and Motorola's 88K.

The 486 was a *huge* improvement on the 386 with more than twice the IPC and integrated FPU. It marked the beginning of Intel entering the workstation market.

Cheers
 
I don't understand how you get to that conclusion. Gelsinger is saying they pushed the 486 even though they made less money doing so. If not because of competition, then what?

Up until the 486, Motorola's 68K series was competitive. You also had a rich set of RISC competitors, MIPS, SPARC, PA-RISC, Power and Motorola's 88K.

The 486 was a *huge* improvement on the 386 with more than twice the IPC and integrated FPU. It marked the beginning of Intel entering the workstation market.

Cheers
Those other CPUs you mentioned were sold in higher-priced, lower-volume computers compared to the ubiquitous x86 PC. It was only when AMD and Cyrix really emerged with their 486 clones that Intel was forced to dramatically drop their prices and started fighting for their lives. (RISC CPUs may have had less of a direct economic impact on Intel's bottom line at that time, but they were showing much better integer performance than Intel parts before the coup that was the Pentium Pro. I think the switch to internal uops by AMD and Intel clearly shows the impact of certain RISC philosophies on their designs.)

The lazy rollout of the 486 at a large premium over introductory 386 prices is a far cry from the premature release and subsequent recall of the 1.13 GHz P3, or the ham-fisted bid for the low end with the L2-cache-less Celeron in the late 90's. It's the difference between congratulating yourself for improving when you didn't have to and fighting to survive; I agree the reason is ultimately competition, but the pace of improvement when there's no credible threat versus when nimble competitors are nipping at your heels is quite different. Intel also resorted to dirty anti-competitive tactics when there was a credible threat to its dominance. I praise them for their substantial innovation, but at the same time I think the world would have been better off without their subterfuge and with more legitimate competition.

EDIT: http://processortimeline.info/proc1980.htm

indicates that the 386 intro'd at $299 whereas the 486 intro'd at $900. This is a rather huge premium even when taking inflation into account; compare this to how new models are introduced at about the same prices as prior ones nowadays. Prices dropped when clones went to market:

http://www.nytimes.com/1993/12/21/business/company-news-intel-battling-rivals-cuts-its-prices.html

Intel said the new 1,000-piece price for the 66 MHz Pentium processor would be $750, while the 60-MHz would be priced at $675 each, down 14 percent from the current prices. It said it would also cut the price on the 66-MHz Intel 486 DX2 processor to $360 each in 1,000-piece quantities, down 18 percent from current prices. And it plans to lower the prices of other 486 processors.

...

Although the 486 price cuts will most directly affect Advanced Micro Devices Inc., which sells 486 clones, Ben Anixter, the company's vice president for external affairs, also said the cuts were anticipated. "We're shipping 486 DX2 chips now, and our prices are essentially their prices," he said. "This does exemplify what competition will do."

These quoted prices, adjusted for inflation, would be about double today's. That we can get a high-end 4770K for ~$300 is a testament to what legitimate competition can bring us.
 
EDIT: http://processortimeline.info/proc1980.htm

indicates that the 386 intro'd at $299 whereas the 486 intro'd at $900. This is a rather huge premium even when taking inflation into account; compare this to how new models are introduced at about the same prices as prior ones nowadays. Prices dropped when clones went to market:

You're comparing apples to oranges.

The 386 wasn't faster than the fastest 286 running 16-bit code when it launched. The 486 was an instant doubling of performance. You also need to factor in the 387 co-processor to get a fair comparison, which doubles the cost of a 386 system.

If you look at competitors pricing (as per your link), the 33 MHz 68030, used in Apple Macs, was nearly $700 and very much inferior to the 486.

The price is a result of supply and demand, so yes, the high price is in part the result of lack of competition, but it is also a result of supply constraints; the die size of the 486 was initially more than three times that of the 386. It wasn't until the 486DX (a shrink and a redesign) that the 486 hit mainstream prices and volumes.

The story repeated itself with the Pentium, launched initially in 0.8um BiCMOS: big, hot, and hard to manufacture. It repeated itself again with the PPro, which didn't sell in large quantities until the P-II was released.

Cheers
 
You're comparing apples to oranges.

The 386 wasn't faster than the fastest 286 running 16-bit code when it launched. The 486 was an instant doubling of performance. You also need to factor in the 387 co-processor to get a fair comparison, which doubles the cost of a 386 system.

If you look at competitors pricing (as per your link), the 33 MHz 68030, used in Apple Macs, was nearly $700 and very much inferior to the 486.

The price is a result of supply and demand, so yes, the high price is in part the result of lack of competition, but it is also a result of supply constraints; the die size of the 486 was initially more than three times that of the 386. It wasn't until the 486DX (a shrink and a redesign) that the 486 hit mainstream prices and volumes.

The story repeated itself with the Pentium, launched initially in 0.8um BiCMOS: big, hot, and hard to manufacture. It repeated itself again with the PPro, which didn't sell in large quantities until the P-II was released.

Cheers

Your point about the 68030 shows that there just wasn't much pressure on Intel from outside the x86 space in the run-up to the i486's release; there wasn't much within it either. The 90's and early 00's were a real competitive environment, and Intel was being a bit smug and patting themselves on the back before then. Intel should have stuck to plowing cash into engineers instead of lawyers and illegal back-room marketing.

A four-year rollout for the i486 is quite long when you consider that in the subsequent six years, Intel pushed out both the Pentium and PPro lines and dropped introductory prices relative to the i486 once inflation is taken into account. The impressive doubling of performance indicates that there was quite a bit of low-hanging fruit at the time: high-impact choices like pipelining and integrating 8K of L1 cache and the FPU on die. Its absolute performance gain on existing code was great, but from an internal design perspective, more effort was put into the Pentium and the vastly more impactful PPro, whose dynasty ran all the way up to Nehalem. I think Intel only made their best designs when they started to feel the heat of competition.

There was another span of about 5 years, from Conroe to Sandy Bridge, when Intel's breathtaking performance gains, design improvements, and price drops left the 80386-to-i486 increment in the dust. (Granted, the absolute performance improvement percentage-wise wasn't as dramatic, but we were past the "knee of the curve" after Banias, and the effort Intel put into those designs was incredible.) Before then, there was also quite a bit of hubris on Intel's part in assuming a guaranteed industry transition to Itanium and plowing billions into the design. They could afford to keep pouring the entire R&D budget of smaller companies into it well after it was clear that it wasn't going anywhere. The move to x86-64 and the "tick-tock" philosophy came directly from changes made after the lashing that the P4 and Itanium got from the competition.
 
When do you guys think development started on Conroe? It's interesting to think about internal decisions that were happening during the years they were relying on Netburst. Banias, Dothan and Yonah didn't seem capable of really replacing the top P4 options even if they were stunningly great notebook chips.
 
The development cycle is usually 4 years for a new architecture. Work on Nehalem and Atom began in 2004, but Conroe might have been fast-tracked after the early Tejas cancellation. Given Intel's tradition of keeping a backup architecture in parallel, this would be perfectly in order for them.
 
Sounds about right. My cynical take is that the situation came to a head when their Japanese offices were raided in '04:

http://features.techworld.com/sme/3208919/a-history-of-intels-antitrust-woes/

At that time, Itanium was also going nowhere fast and drastically missing forecasts:

http://upload.wikimedia.org/wikiped...it.png/800px-Itanium_Sales_Forecasts_edit.png

AMD's Hammer was gaining lucrative server market share, and it wasn't a given that AMD was going to shoot itself in the foot this time. Netburst was going through its "Press-hot" revision, and the only bright spot was the Pentium-M line.

They realized they had to actually outmaneuver the competition with a focused effort in superior x86 engineering instead of relying on back-room stopgap measures with dealers and dead-end architectures, x86 or otherwise. Hence tick-tock and those amazing years of performance boosts.
 
You can't blame Intel for the lack of competition in the 80's, or the perceived lack of competition. No business would have done anything differently than they did. You only need to exceed your competition; doing more than that is potentially wasteful for no added return.

In other words, why spend money to potentially make less money? Hence, the 486 was priced to justify its existence -- not because Intel was smug, but because there was absolutely zero reason for it to be cheaper than it was. And it's not like consumer apps were pushing the 386 at the time, much less needing the 486 (with its included math co-processor).

It can be argued that for consumer and most business apps, you didn't need anything faster than a 286/386 until Windows started to seriously supplant DOS starting with 3.11 and really accelerating with Win95.

I really don't see why Intel gets dinged for acting like a responsible business. Yes their anti-competitive practices later on weren't good, but you certainly can't blame the evolution of the 486 on that.

And I'm not even a particular fan of Intel. I avoided their CPUs like the plague until AMD could no longer offer meaningful competition in the high end.

Regards,
SB
 
What you say is fair, and I admire Intel's engineering prowess and history. I'm just not especially impressed with the self-congratulatory tone they take about their voluntary self-cannibalization in the absence of competition when the i486 came around; they've gone on from there to do so many underhanded things that had nothing to do with good engineering and improving the state of the art. Sure, you have to watch the bottom line, but it's sad to know that Intel wasted so much money outside engineering, actively impeding the industry's progress, when it could have fruitfully spent those dollars improving the thing it invented and won that way. These "tick-tock" years are proof of that.

The engineering-dominance-centric Intel of the early 90's and recent times is much more impressive than that of the 80's or the late 90's, when they were a monolithic, abusive monopoly in the PC CPU market. They resorted to litigation (ultimately deemed meritless) to keep out competition, they weren't cooperative with the requests of smaller developers, and the way they reacted to competition was deemed nothing short of illegal and bad for consumers. (Ironically, the doldrums of the late 90's were presided over by an engineer CEO, and the recent resurgence by a business-school CEO.)

I think a good company realizes that business isn't purely a zero-sum or a negative-sum game, as Intel tried to make it, and that as a business, they're part of a more meaningful system than one that just maximizes its projection onto one-dimensional profit each quarter through any means possible. A company should realize that competition can be a good thing for everyone; I'd be much more proud of an exceptional product like the Pentium Pro, forged in the fire of competition, than of milking profits from second-rate products like the Pentium 4 after having hamstrung the competition. (The Pentium Pro's fantastic micro-architecture went on to make Intel much, much more money in the long run than their litigation and deals.) I can understand Intel probing the legality of AMD's reverse engineering, but it's another thing to strike unspoken deals with dealers that substantially undermine a competitive environment.

P.S. There was a great "American Experience" program on PBS about Intel, definitely worth a look if you guys haven't seen it:

http://www.pbs.org/wgbh/americanexperience/films/silicon/player/

I also definitely have more Intel CPUs in my past than I do AMD, my favorite being the Pentium-M.
 