"What Nehalem is really about"

Wish the prices were in dollars rather than UK pounds...I would have upgraded! The appeal of having a motherboard which does not pledge allegiance to either Crossfire or SLI is just too good!
 
so can I run Crossfire on an nForce board?

CF is open for all to use. NV chooses not to allow CF to run on nForce chipsets, but it is possible to do so. Case in point: enthusiast-class systems from HP and Dell have used nForce chipsets to run Crossfire, so it most certainly can be done.
 
oi, there was nothing wrong with hyperthreading ;)

Actually, while technically true (HT worked just fine), there were performance implications in multiple cases, most of which were addressed in the Prescott core right before Intel dropped SMT and went to true multi-core SMP. And while the OS scheduler deserved part of the blame, it wasn't all the operating system's fault. Register pressure was one of the big ones, as were cache issues. Both of these hardware issues could cause measurable performance degradation under certain circumstances, although applications coded specifically for HT systems could certainly work around them.
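For what it's worth, the classic application-side workaround was affinity control: keep two cache-hungry threads off sibling logical processors so they don't fight over the same L1 and trace cache. A minimal sketch using the Linux pthread affinity API; the CPU numbers are assumptions, since sibling numbering varies by system (check /sys/devices/system/cpu/cpuN/topology/thread_siblings_list):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one logical CPU. On an HT system, pinning
   two heavy threads to different *physical* cores avoids the L1/trace
   cache contention described above. CPU numbers here are hypothetical. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

Calling pin_to_cpu(0) in one worker and pin_to_cpu(1) in the other only helps if those IDs really are separate physical cores on the machine at hand; on many systems 0 and 1 are HT siblings, which is exactly what you'd be trying to avoid.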

Now that the programming world has become accustomed to multiple physical cores rather than logical ones, it's even more important for a new SMT processor to be designed with those original bottlenecks in mind. I have a feeling Intel has done a lot more homework on these issues and is ready for a strong re-debut of this technology.

I'm glad to see it come back; I liked the original HT and most of my apps either didn't care if it was enabled or picked up a bit of extra performance because of it. I'm pretty excited for the improvements that Nehalem brings, even if it doesn't automatically make all my games go eleventy-brazillion percent faster.
 
Hyper-Threading in the P4 days was really two different things: HT on Northwood and HT on Prescott.

Northwood had a tiny 8KB, 2-way set-associative data cache, which was super fast (2-cycle load-to-use latency). Fantastic for single-context applications, but running two contexts would very likely result in thrashing.
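To make the thrashing concrete: with the figures in the post (8KB, 2-way, plus a 64-byte line size, which is my assumption), the cache has only 64 sets, so any two addresses 4KB apart compete for the same two ways. A toy C calculation of the set mapping:

#include <stdio.h>

/* Toy model of an 8KB, 2-way L1D with 64B lines:
   8192 / 64 = 128 lines, / 2 ways = 64 sets.
   Addresses 64 sets * 64 bytes = 4KB apart alias to the same set. */
int main(void)
{
    const unsigned long line = 64, ways = 2, size = 8 * 1024;
    const unsigned long sets = size / line / ways;      /* 64 */
    unsigned long a = 0x10000, b = a + sets * line;     /* 4KB apart */
    printf("sets=%lu set(a)=%lu set(b)=%lu\n",
           sets, (a / line) % sets, (b / line) % sets);
    return 0;
}

Two contexts each streaming through their own 4KB-aligned buffers therefore evict each other constantly, with only two ways per set to share.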

Same with the trace cache. Northwood had a single 12K-µop (96KB) trace cache. Thrashing in the trace cache was extra painful since only one instruction could be decoded per cycle on a trace-cache refill.

The 128-entry ROB was split into two 64-entry partitions when SMT was enabled, further reducing single-thread performance.

Prescott had a lot of changes to make HT viable. The data cache was increased to 16KB, 8-way. The load-to-use latency increased from 2 cycles to 4 cycles, so even though the L1 hit rate improved substantially, single-thread applications still saw a net reduction in performance in many cases. The larger cache and the higher associativity greatly helped multi-context workloads.
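That load-to-use latency is directly visible with a dependent pointer chase, since each load's address comes from the previous load and nothing can overlap. A rough sketch (the buffer size, iteration count, and clock()-based timing are my choices; it's a ballpark tool, not a precise one):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1024   /* 1024 * sizeof(size_t) = 8KB on 64-bit, L1-sized */

/* Each iteration's load depends on the previous one, so iterations per
   second approximate 1 / load-to-use latency once the ring fits in L1.
   The shuffle defeats stride prefetching. */
int main(void)
{
    size_t *ring = malloc(N * sizeof *ring);
    size_t perm[N];
    for (size_t i = 0; i < N; i++) perm[i] = i;
    for (size_t i = N - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < N; i++)              /* link into one big cycle */
        ring[perm[i]] = perm[(i + 1) % N];

    const long iters = 100000000L;
    size_t p = 0;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++) p = ring[p];
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("%.2f ns per dependent load (end=%zu)\n", sec * 1e9 / iters, p);
    free(ring);
    return 0;
}

On a 2GHz part, 2 cycles is 1ns and 4 cycles is 2ns, so the Northwood-to-Prescott change should roughly double the number this prints.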

Prescott also doubled the trace cache, so that there was one for each HT context, reducing thrashing.

The HT performance delta on Northwood was normally in the -10% to +10% range, depending on workload. Prescott was more like -3% to +30%.

Nehalem also statically partitions its ROB into two 64-entry chunks when SMT is enabled, which will have a small effect on single-thread performance.

Cheers
 
Some extremely impressive results there. There's no doubt that Nehalem is a beast for everything but gaming.

But it still manages to equal Penryn in games, and when you consider the huge gaming lead Penryn already has, it's easy to understand why Intel chose to beef up other areas of performance.

Especially since future games may well start taking better advantage of a Nehalem-like architecture, so its advantage in this area will probably grow over time anyway.
 
Nehalem also statically partitions its ROB into two 64-entry chunks when SMT is enabled, which will have a small effect on single-thread performance.

Not so.

I won't speculate on the degree to which this will impact Nehalem's single-threaded performance, because I don't know a) the threshold beyond which a decrease in ROB entries significantly impacts the core's ability to extract maximum instruction-level parallelism, or b) whether Intel has some way of mitigating this decrease. Update: Intel says they do option b, i.e., if only one thread is executing, that thread gets full use of the shared resources.
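On point a), Little's law gives a feel for the scale: the instruction window needed to keep the pipeline busy across a stall is roughly issue rate times stall latency. A back-of-envelope sketch with illustrative numbers (the width and latencies below are my assumptions, not Nehalem specifics):

#include <stdio.h>

/* Little's law: in-flight entries = issue_width (uops/cycle) * latency
   (cycles). All numbers below are illustrative guesses, not from Intel. */
int main(void)
{
    const int width = 4;                   /* assumed sustained uops/cycle */
    const int lat[] = { 4, 10, 40, 200 };  /* L1 / L2 / L3 / memory-ish */
    for (int i = 0; i < 4; i++)
        printf("to hide %3d cycles at width %d you want ~%3d in-flight uops"
               " (vs. a 64- or 128-entry ROB)\n",
               lat[i], width, width * lat[i]);
    return 0;
}

The point being that 64 entries stops covering anything much past an L2-class miss, which is why the "a lone thread gets the whole structure" mitigation matters.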
 