Intel Nehalem on track

Per B

It seems Intel will show working versions of Nehalem at IDF, which starts today. Add to that the signs that Intel has good control over the 45 nm process, as Penryn runs on it at high clocks with overclocking headroom. So Nehalem in 2H2008 should be within reach, but what can we expect from it? Will it be as big a leap as when Core 2 was released?

Per
 
I'm curious to see, as I really don't know what to expect. Hopefully we get some good tidbits of info during the demonstrations today...
 
It would be incredible if they could make Nehalem as good an improvement over the Core micro-architecture as the Core micro-architecture was over NetBurst.
 
It would be incredible if they could make Nehalem as good an improvement over the Core micro-architecture as the Core micro-architecture was over NetBurst.

While I'm fully in agreement, I also wouldn't be holding my breath ;)
 
Heh. Does anyone think AMD is a little bit outclassed by this "new" Intel? Obviously the folks at Intel were anything but feeling miserable as they were refining Pentium 4. All of what has been released in the past year and is being touted now was in the works at the same time during those years.
 
I find it hard to feel sorry for AMD, when they HAD to know once the Pentium-M dropped. Sure, it didn't have a lot of floating point power at the time and was "only" single core and "only" 32-bit, but it was VERY competitive with then-top-end processors while being smaller, cooler, far more power efficient and requiring far less clock speed.

AMD's last big contribution to the general CPU space was 64-bit to the desktop. And it was good, and they deserve credit for that, but that's the very last innovative thing I can recall them doing. They've been "milking" it ever since, and now their lack of focus on true technological improvement is biting them in the ass BIGTIME.
 
It wasn't for lack of effort on the part of AMD.

AMD had two redesigns after K8.
They failed to come to fruition.

I can name at least two Intel designs that got canned in the same time frame:
Tejas and the original Tukwila.

There are more, doubtless.

The difference is that Intel had the engineering and cash resources to fail and still move forward on other designs and fabs.
 
It wasn't for lack of effort on the part of AMD.

AMD had two redesigns after K8.
They failed to come to fruition.

I can name at least two Intel designs that got canned in the same time frame:
Tejas and the original Tukwila.

There are more, doubtless.

The difference is that Intel had the engineering and cash resources to fail and still move forward on other designs and fabs.

I don't agree; AMD brought nothing to the table after K8 until now except for incremental speed differences and cache changes. Intel had a few similar offerings after the Willamette P4, such as Northwood and the first iteration of Prescott. However, Pentium-D, Pentium-M, Core and Core 2 are actually more than a simple speed bump and cache / process change...

Sure, more money helps. But even now AMD doesn't have anything honestly "bigger and better" than their X2 line -- just process changes, cache increases and core speed bumps. Oh, and now a 3rd core instead of just two...

I'm sure there could've been something going on other than just the incremental stuff...
 
There were at least two from scratch designs after K8.
One Netburst-type core and one very wide design.

Both were cancelled.

There were at least 2 versions of K8, going by the patents that came out around the same time, and one was a much greater leap than what eventually became Opteron.

That means that every code name after K7 was an attempt or multiple attempts to revamp or replace K7, all the way up to K10.

AMD failed at every non-incremental update to the K7 core, but not for a lack of effort.
 
Frankly, Intel seems pretty unstoppable at the moment.

Looking at the Anandtech preview here below, Penryn seems to be wiping the floor with Barcelona both in overall performance and performance/watt.

http://www.anandtech.com/IT/showdoc.aspx?i=3099

And Nehalem looks like it's going to make Penryn look like a 286!

In its biggest configurations it's going to sport 8 cores on a single die, an integrated memory controller, 2 threads per core (16 logical threads in total!!), a new point-to-point interface and well over a billion transistors on 45nm!

What has AMD got on its roadmap that could come close to competing with that? Fusion's certainly no answer, as Nehalem will also feature onboard graphics and potentially other processors in its smaller (4 core and below) configurations.

Then there's Larrabee and Sandy Bridge also peeking over the horizon.

I don't want Intel to dominate, it's very bad for the industry, but it's hard not to admire them right now.

Interestingly, the next-gen consoles might be due out around the same timeframe as Sandy Bridge. Anyone think they might include a version of it or Larrabee, or stick with the custom designs? And can IBM or someone else really compete with Intel in that time frame?
 
I don't want Intel to dominate, it's very bad for the industry, but it's hard not to admire them right now.
Yeah I agree. They must have a lot of extremely smart and creative people working for them.

Amazing how they can be so damn great at both CPU design and IC manufacturing.

Or maybe there are synergy effects between the two?
Peace.
 
Manufacturing prowess leads to larger transistor budgets, improved yields, and better circuit performance.

This in turn allows for larger designs and better economies of scale.

The better economy of scale and the benefits of greater complexity at a given target market at a given price drives improved profits that get plowed into manufacturing and hiring more talent that leads to more design teams and larger transistor budgets that leads to...
 
Frankly, Intel seems pretty unstoppable at the moment.

Looking at the Anandtech preview here below, Penryn seems to be wiping the floor with Barcelona both in overall performance and performance/watt.

http://www.anandtech.com/IT/showdoc.aspx?i=3099

I don't know about beating AMD out on perf/watt, unless you're talking about <4S and/or small memory configurations. FB-DIMMs are a huge hindrance to Intel's power consumption, and thus perf/watt figures. Overall performance seems to be a win for Intel in just about every scenario (excepting huge memory bandwidth bottlenecks).

And Nehalem looks like it's going to make Penryn look like a 286!

That's just silly. At least for the average user. 4 threads is more than enough to handle any common workload today outside of servers and HPC. What makes you think quadrupling that amount will have any impact?

In its biggest configurations it's going to sport 8 cores on a single die, an integrated memory controller, 2 threads per core (16 logical threads in total!!), a new point-to-point interface and well over a billion transistors on 45nm!

What has AMD got on its roadmap that could come close to competing with that? Fusion's certainly no answer, as Nehalem will also feature onboard graphics and potentially other processors in its smaller (4 core and below) configurations.

Shanghai (revised Barcelona w/larger L3 & minor architectural improvements, is to Barcelona what Penryn is to Conroe) & Montreal (based on Shanghai, dual-die 8-core design). Nothing about SMT AFAIK, will be interesting to see what SMT does for the 4-issue Core architecture.

Then there's Larrabee and Sandy Bridge also peeking over the horizon.

Larrabee will have less FLOPs than the GPUs it will go up against, most likely. Larrabee is really just a "foot-in-the-door" product. Sandy Bridge (aka Gesher) looks quite nice w/approximately 200GFLOP DP performance (i.e. Cell-level), but it won't be out until late 2009 at the earliest.

I don't want Intel to dominate, it's very bad for the industry, but it's hard not to admire them right now.

They're definitely executing very well ATM, and their roadmap looks very promising.

Interestingly, the next-gen consoles might be due out around the same timeframe as Sandy Bridge. Anyone think they might include a version of it or Larrabee, or stick with the custom designs? And can IBM or someone else really compete with Intel in that time frame?

:LOL: why not throw an Itanium or two in while you're at it?
 
That's just silly. At least for the average user. 4 threads is more than enough to handle any common workload today outside of servers and HPC. What makes you think quadrupling that amount will have any impact?

You're the one that was saying in another thread how the 6 logical threads on 3 cores of Xenon make it faster than 3 logical threads of Phenom on 3 cores. Now you're saying 16 logical threads on 8 cores will have no impact vs 4 logical threads on 4 cores? Which is it?

Of course doubling your cores is going to have an impact, and in some workloads so will adding SMT to those cores. Well-programmed code can scale with the number of cores now and will do so increasingly in the future, and the more cores that are available, the more common that scaling will become.

Besides, Nehalem should be much faster in single-threaded code as well, so it's a win-win compared to Penryn. My analogy was obviously an exaggeration, but there is no doubt that the total CPU power of a full Nehalem will blow Penryn out of the water. Intel themselves are estimating 3x the power, meaning even the "average user" gets 50% more power, completely discounting multithreading.
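
For what it's worth, the arithmetic behind that 50% figure can be sketched as below. It assumes (my assumption, not something Intel has stated) that the 3x estimate compares an 8-core Nehalem with a 4-core Penryn and that total throughput scales linearly with core count. The function name is made up for illustration.

```python
def per_core_gain(total_gain: float, old_cores: int, new_cores: int) -> float:
    """Split a whole-chip throughput gain into a per-core gain.

    Assumes throughput scales linearly with core count, which is an
    idealisation: real workloads rarely scale perfectly.
    """
    if old_cores < 1 or new_cores < 1:
        raise ValueError("core counts must be positive")
    core_ratio = new_cores / old_cores
    return total_gain / core_ratio


# Hypothetical inputs: 3x total estimate, 4 cores (Penryn) to 8 cores (Nehalem)
gain = per_core_gain(3.0, old_cores=4, new_cores=8)
print(f"implied per-core gain: {gain:.2f}x")
```

Under those assumptions, doubling the cores accounts for 2x of the 3x, leaving 1.5x per core. If the 3x figure also includes SMT gains, the per-core share would be smaller.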

Now ask yourself what kind of improvement games using the Havok physics engine will see.....

Shanghai (revised Barcelona w/larger L3 & minor architectural improvements, is to Barcelona what Penryn is to Conroe) & Montreal (based on Shanghai, dual-die 8-core design). Nothing about SMT AFAIK, will be interesting to see what SMT does for the 4-issue Core architecture.

Shanghai will be very lucky to match Penryn, and it's not out for months yet, while Montreal is no answer to Nehalem given that it's simply 2 quad-core dies packaged together. How ironic that Intel will have a native 8-core design around the same timeframe :LOL:

Larrabee will have less FLOPs than the GPUs it will go up against, most likely. Larrabee is really just a "foot-in-the-door" product. Sandy Bridge (aka Gesher) looks quite nice w/approximately 200GFLOP DP performance (i.e. Cell-level), but it won't be out until late 2009 at the earliest.

I don't expect Larrabee to be a competitor to GPUs, at least not at the high end, but it does give Intel a product that can compete with Cell and its derivatives in the scientific/supercomputer space. That's something AMD haven't even attempted.

:LOL: why not throw an Itanium or two in while you're at it?

I don't see why that's so outlandish. Those CPUs may sound like huge monsters today, but by the time the next-gen consoles launch they could be well-established architectures. They should also be pretty power efficient, so seeing Sandy Bridge or even Larrabee in a next-gen console of 2011 doesn't sound all that crazy to me.

Especially if it's conservative on clock speed and/or core numbers. The biggest hurdle to that would be the IP problem IMO.
 
To me it's actually pretty surprising that AMD managed to either keep up with or surpass Intel in performance for the length of time it did. Looking at the differing sizes of the two companies and especially the production capabilities of Intel, it really is a David vs Goliath situation.

Importantly, Intel had the marketing and financial clout to see them through the difficult Netburst years when they were being totally outperformed. I tend to agree with the assessment that we have AMD to thank for the current price/performance available from both companies.

Unfortunately, I feel that Intel's process advantage will give them an insurmountable lead for the foreseeable future which may ultimately spell the end of AMD.
 
You're the one that was saying in another thread how the 6 logical threads on 3 cores of Xenon make it faster than 3 logical threads of Phenom on 3 cores. Now you're saying 16 logical threads on 8 cores will have no impact vs 4 logical threads on 4 cores? Which is it?

First of all, I never said Xenon is faster than a tri-core Phenom. Re-read that thread, especially my latest post. As for 16 threads not being significantly better than 4 threads, in the PC space this is true. Do you know of any meaningful workloads outside of the server and HPC realm that scale that well?

Of course doubling your cores is going to have an impact, and in some workloads so will adding SMT to those cores. Well-programmed code can scale with the number of cores now and will do so increasingly in the future, and the more cores that are available, the more common that scaling will become.

I would expect SMT to make more of an impact than the additional cores (to utilize the idle execution resources of each core), and only if implemented well, and even then only in corner cases of extremely thread parallel workloads.

Besides, Nehalem should be much faster in single-threaded code as well, so it's a win-win compared to Penryn. My analogy was obviously an exaggeration, but there is no doubt that the total CPU power of a full Nehalem will blow Penryn out of the water. Intel themselves are estimating 3x the power, meaning even the "average user" gets 50% more power, completely discounting multithreading.

I figured we were discussing useful performance, but it's clear you're just discussing available execution resources, in which case there's no doubt Nehalem will be superior to Penryn by a wide margin.

Now ask yourself what kind of improvement games using the Havok physics engine will see.....

Physics, while very parallelizable, do not put a heavy strain on any modern performance-class processor, whether it be specialized hardware (PhysX), multi-core GP CPUs, GPUs, or multi-core in-order console processors. Unless Intel is able to get the Havok engine to auto-parallelize physics and scale effects with core count, I doubt we'll see much difference (short of SSEx optimizations and optimizations specifically for Core MPUs).

Shanghai will be very lucky to match Penryn, and it's not out for months yet, while Montreal is no answer to Nehalem given that it's simply 2 quad-core dies packaged together. How ironic that Intel will have a native 8-core design around the same timeframe :LOL:

First of all, where'd you get the crystal ball? You don't know how well Shanghai will compare to Penryn, nor Nehalem.
Secondly, Nehalem is not a native 8-core design, it is also 2x quad-core dies on a package.

I don't expect Larrabee to be a competitor to GPUs, at least not at the high end, but it does give Intel a product that can compete with Cell and its derivatives in the scientific/supercomputer space. That's something AMD haven't even attempted.

AMD hasn't even attempted to create a product for the HPC space? What do you call Opteron? When was the last time you checked the Top 500 supercomputer list? Do you not remember who pioneered stream computing in the first place? I'll give you a hint, it's not Nvidia.


I don't see why that's so outlandish. Those CPUs may sound like huge monsters today, but by the time the next-gen consoles launch they could be well-established architectures. They should also be pretty power efficient, so seeing Sandy Bridge or even Larrabee in a next-gen console of 2011 doesn't sound all that crazy to me.

Especially if it's conservative on clock speed and/or core numbers. The biggest hurdle to that would be the IP problem IMO.

Correction to my earlier post about Sandy Bridge offering "Cell-level" DP FLOP performance. What I meant to say was that Sandy Bridge will bring DP FLOP performance to the level of Cell's SP FLOP performance. That's quite a feat, especially when you take Cell's (by-comparison) meager DP FLOP performance into account.

I don't think anyone outside of MS/Nintendo/Sony know precisely when their next-gen consoles will launch, so it's tough to justify any statement about the maturity of any future PC product's viability as a console part when the availability of said PC part is also just a timeframe and not a hard number. I'm certainly not going to argue against using a Sandy Bridge or Larrabee derivative in a console from a performance standpoint, but I think it's fair to assume such a derivative would have to be rather cut-down in order to meet price/performance targets of a relatively low-cost consumer electronic product. Remember, we're talking about a part of the build cost of an entire machine that costs as much as only that component does in the PC space.
 
First of all, I never said Xenon is faster than a tri-core Phenom. Re-read that thread, especially my latest post. As for 16 threads not being significantly better than 4 threads, in the PC space this is true. Do you know of any meaningful workloads outside of the server and HPC realm that scale that well?

Compression/decompression and encoding/decoding scale well. So does pretty much any type of 3d rendering while gaming is starting to take advantage too.

But the fact that multicore scaling is limited today means little for what it will be like in a few years when Nehalem-level CPUs enter the mainstream, particularly in relation to games. Software is still playing catch-up, but it won't be long before software is generally catering for multiple cores.

I would expect SMT to make more of an impact than the additional cores (to utilize the idle execution resources of each core), and only if implemented well, and even then only in corner cases of extremely thread parallel workloads.

That doesn't make much sense. Code must be multithreaded to take advantage of extra threads, and you're always going to get more performance from running a thread on a dedicated core than from running that same thread on the second hardware thread of an already utilised core.

I figured we were discussing useful performance, but it's clear you're just discussing available execution resources, in which case there's no doubt Nehalem will be superior to Penryn by a wide margin.

Well, it's both. Useful performance, as you call it, should be a fair bit higher, but total available power is obviously much higher. It's not as though that power is locked away, though. Certain code types will use it today, more will use it when Nehalem is on the market, and many more will use it 2-4 years from then. Anyone planning to keep the CPU for a few years will see the benefit of those cores far more than what you're proposing they would today.

Anyway, it was you that was claiming 6 threads would give a significant performance boost over 3 in the Phenom thread. So if nothing else, an 8-core Nehalem can dedicate a core per Xenon thread to ported games, while Penryn does not have sufficient threads to do so, just like the Phenom. You were saying that's a big advantage, weren't you? The same could also be said for ported Cell code.

Physics, while very parallelizable, do not put a heavy strain on any modern performance-class processor, whether it be specialized hardware (PhysX), multi-core GP CPUs, GPUs, or multi-core in-order console processors.

That doesn't make much sense. Physics can scale to your available resources. PS3 will likely be doing things with physics that Xenon simply can't pull off. Similarly, there are feats claimed for the Ageia PPU that are not possible on current CPUs. Nehalem should be able to breeze through these kinds of things, making "PPU only" like physics and full PS3-level physics possible on the desktop (if it's not already so).

Unless Intel is able to get the Havok engine to auto-parallelize physics and scale effects with core count, I doubt we'll see much difference (short of SSEx optimizations and optimizations specifically for Core MPUs).

Why would Intel not be able to get Havok to scale with core count? Like you said, it's a pretty trivial task and it's obviously in their best interest to make it happen. In fact, they have already demonstrated their interest in doing so with the ice fighters demo, and that's before they bought Havok. Havok will scale with multicores and Intel will encourage devs to take advantage of that, à la Alan Wake. You can count on that.

First of all, where'd you get the crystal ball? You don't know how well Shanghai will compare to Penryn, nor Nehalem.
Secondly, Nehalem is not a native 8-core design, it is also 2x quad-core dies on a package.

No, Nehalem is a native 8 core design, check it out.

And Shanghai is little more than a die shrink of Barcelona with extra L3. If you think that's going to allow it to seriously compete with Penryn then you're optimistic IMO. By the time it's launched Penryn could easily be clocking over 3.6GHz, so Shanghai will have to be clocking at something similar to have a chance of competing, unless its IPC is significantly higher than Barcelona's, that is. And how can you expect two of them to compete with Nehalem, assuming that Nehalem is going to give pretty substantial improvements over Penryn on a per-core basis and be native octo-core?

AMD hasn't even attempted to create a product for the HPC space? What do you call Opteron? When was the last time you checked the Top 500 supercomputer list? Do you not remember who pioneered stream computing in the first place? I'll give you a hint, it's not Nvidia.

I call Opteron a server chip just like everyone else does. Opteron is clearly not aimed at the same markets as Cell and Larrabee. It might be offered to those markets as AMD's only option, but it's certainly not designed to be a serious competitor to them in their specialised fields, i.e., high-performance stream computing.

I don't think anyone outside of MS/Nintendo/Sony know precisely when their next-gen consoles will launch, so it's tough to justify any statement about the maturity of any future PC product's viability as a console part when the availability of said PC part is also just a timeframe and not a hard number. I'm certainly not going to argue against using a Sandy Bridge or Larrabee derivative in a console from a performance standpoint, but I think it's fair to assume such a derivative would have to be rather cut-down in order to meet price/performance targets of a relatively low-cost consumer electronic product. Remember, we're talking about a part of the build cost of an entire machine that costs as much as only that component does in the PC space.

I agree; however, as you say, the timeframes are important here. If Sandy Bridge is a year old when the next-gen consoles launch, then a low-end/low-power version of it would be conceivable. Not that it matters; it was just an offhand comment on my part, not something I want a detailed and ultimately fruitless debate over.
 
First of all, I never said Xenon is faster than a tri-core Phenom. Re-read that thread, especially my latest post. As for 16 threads not being significantly better than 4 threads, in the PC space this is true. Do you know of any meaningful workloads outside of the server and HPC realm that scale that well?

Compression/decompression and encoding/decoding scale well. So does pretty much any type of 3d rendering while gaming is starting to take advantage too.

And that's the entirety of the list (outside of synthetic benchmarks, of course). Not very useful IMHO. Other than for Folding :devilish:


But the fact that multicore scaling is limited today means little for what it will be like in a few years when Nehalem-level CPUs enter the mainstream, particularly in relation to games. Software is still playing catch-up, but it won't be long before software is generally catering for multiple cores.

I agree that the trend with new software development is towards thread parallelization, but there are some tasks that simply do not gain performance from spinning off new threads.
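
Amdahl's law gives a rough way to put numbers on this. Below is a minimal sketch, assuming a fixed fraction of the work parallelizes perfectly and ignoring SMT, memory bandwidth and synchronization overhead. The parallel fractions are illustrative guesses, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when only part of the work is parallel.

    parallel_fraction: share of the runtime that can be split across threads
                       (0.0 = fully serial, 1.0 = fully parallel).
    threads:           number of hardware threads, each treated as a full core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Illustrative fractions only: a mostly serial task vs. a render-like task
for fraction in (0.5, 0.95):
    for threads in (4, 16):
        speedup = amdahl_speedup(fraction, threads)
        print(f"parallel={fraction:.2f} threads={threads:2d} -> {speedup:.2f}x")
```

Under this model a task that is half serial can never reach 2x no matter how many threads it gets, while a task that is 95% parallel keeps gaining well past 4 threads. Which bucket a given desktop workload falls into is exactly what is being argued here.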

That doesn't make much sense. Code must be multithreaded to take advantage of extra threads, and you're always going to get more performance from running a thread on a dedicated core than from running that same thread on the second hardware thread of an already utilised core.

I'm just speaking from a per-core efficiency standpoint. Obviously Intel sees a reason to reintroduce SMT in an 8-core product, we have to assume it is for performance and resource utilization purposes.

Well, it's both. Useful performance, as you call it, should be a fair bit higher, but total available power is obviously much higher. It's not as though that power is locked away, though. Certain code types will use it today, more will use it when Nehalem is on the market, and many more will use it 2-4 years from then. Anyone planning to keep the CPU for a few years will see the benefit of those cores far more than what you're proposing they would today.

I'm an immediate gratification kind of guy, I want my performance now, not 3 years from now when I'll have upgraded several times already. I know most don't upgrade that often and we're just building an install base for multi-threaded software, but what good are extra cores if they can't be used by anything other than aforementioned thread-friendly apps?

Anyway, it was you that was claiming 6 threads would give a significant performance boost over 3 in the Phenom thread. So if nothing else, an 8-core Nehalem can dedicate a core per Xenon thread to ported games, while Penryn does not have sufficient threads to do so, just like the Phenom. You were saying that's a big advantage, weren't you? The same could also be said for ported Cell code.

I mis-spoke when I said that 6-thread Xenon was faster than 3-thread Phenom. My reasoning was that we were discussing a theoretical new XB360 which would have to emulate both Xenon's 6 hardware threads and PPC ISA including VMX. Since this is not what you were suggesting my response to you was based on a flawed premise.

That doesn't make much sense. Physics can scale to your available resources.

The keyword there being *can*

PS3 will likely be doing things with physics that Xenon simply can't pull off.

I would truly hope so. Xenon does its job well (I doubt there are many XB360 devs shouting for more CPU power) but Cell is an unabashed FLOP monster.

Similarly, there are feats claimed for the Ageia PPU that are not possible on current CPUs.

The keyword there being *claimed*. I'm all for advancement but PhysX has yet to deliver any, that I've seen.

Nehalem should be able to breeze through these kinds of things, making "PPU only" like physics and full PS3-level physics possible on the desktop (if it's not already so).

I agree. Assuming Intel gets all its ducks in a row and has Havok optimized for Nehalem by then.

Why would Intel not be able to get Havok to scale with core count? Like you said, it's a pretty trivial task and it's obviously in their best interest to make it happen. In fact, they have already demonstrated their interest in doing so with the ice fighters demo, and that's before they bought Havok. Havok will scale with multicores and Intel will encourage devs to take advantage of that, à la Alan Wake. You can count on that.

I wouldn't call anything trivial at this point in software development. I just hope Intel doesn't play games with Havok and only parallelize for their own chips.

No, Nehalem is a native 8 core design, check it out.

grumblegrumbledamnintelandtheirsuperiorprocesstechnologymumble

My mistake. That's what I get for not reading today's IDF coverage and going off speculation for the last few months.

And Shanghai is little more than a die shrink of Barcelona with extra L3. If you think that's going to allow it to seriously compete with Penryn then you're optimistic IMO. By the time it's launched Penryn could easily be clocking over 3.6GHz, so Shanghai will have to be clocking at something similar to have a chance of competing, unless its IPC is significantly higher than Barcelona's, that is. And how can you expect two of them to compete with Nehalem, assuming that Nehalem is going to give pretty substantial improvements over Penryn on a per-core basis and be native octo-core?

Intel only has SKU headroom for Penryn up to 3.67GHz on 1333FSB and 3.6GHz on 1600FSB.

Since AMD can hit 3GHz with K10 on 65nm they should be able to hit higher than that on a die-shrunken core with minor architectural changes & more cache. Shanghai should be very competitive with Penryn, and Montreal with Nehalem.

I call Opteron a server chip just like everyone else does. Opteron is clearly not aimed at the same markets as Cell and Larrabee. It might be offered to those markets as AMD's only option, but it's certainly not designed to be a serious competitor to them in their specialised fields, i.e., high-performance stream computing.

HPC may not be Opteron's primary target market nor its bread and butter, but you can't just brush aside its clout in this market.

I agree; however, as you say, the timeframes are important here. If Sandy Bridge is a year old when the next-gen consoles launch, then a low-end/low-power version of it would be conceivable. Not that it matters; it was just an offhand comment on my part, not something I want a detailed and ultimately fruitless debate over.

I think Silverthorne's descendant would be more interesting for a potential next-gen XBox or Nintendo console (Sony's obviously going to stick with Cell for PS4).
 