Intel's Conroe benchmarked!

3dilettante said:
Realworldtech has a very good summary of the latest Conroe info.

Yes, it is indeed pretty good. Check this out:

Realworldtech said:
Core has three execution dispatch ports, which feed a total of three 128 bit SSE units, two 128 bit floating point units, and three 64 bit integer units; the integer unit on dispatch 1 also handles 128 bit shifts and rotates and all of the ports can perform FP Moves. The execution subsystems of the P4 and Yonah are paltry in comparison.

http://www.realworldtech.com/page.cfm?ArticleID=RWT030906143144&p=6

I'm getting more and more impressed with the progress that Intel has made with this new architecture.
 
Other than the usual expansion of caches and increase in execution units, there are at least two ideas in Conroe that interest me quite a bit:
  • Macro-op fusion
  • Memory aliasing prediction
I kinda suspect that it is features like these that are responsible for the large per-clock lead Conroe holds over A64, rather than just the wider SSE unit or DDR2 support or the large cache. It also looks like there is still a lot of potential left in these features that Intel hasn't tapped yet in Conroe (much like branch predictors kept getting enhanced over and over again after their first appearance in Pentium).
 
arjan de lumens said:
Other than the usual expansion of caches and increase in execution units, there are at least two ideas in Conroe that interest me quite a bit:
  • Macro-op fusion
  • Memory aliasing prediction
I kinda suspect that it is features like these that are responsible for the large per-clock lead Conroe holds over A64, rather than just the wider SSE unit or DDR2 support or the large cache. It also looks like there is still a lot of potential left in these features that Intel hasn't tapped yet in Conroe (much like branch predictors kept getting enhanced over and over again after their first appearance in Pentium).
I suspect the large shared L2 cache gives a 12% performance improvement over the two independent L2 caches of the Athlon 64 X2:
probably 4% from a better hit ratio and 8% from less cache coherence protocol overhead in the L2.

But the overall good performance comes from many good design decisions.
I suspect some further performance increase could come from a smart compiler that optimizes for the way macro-op fusion works :)
 
swaaye said:
AMD's exclusive cache design isn't going to see much gain at 2MB. Look at the diff between 1MB and 512K. The L2 on AMD's chips is very slow compared to Intel's. L2 is critical for Intel for various reasons, like their tiny L1 cache. This is one of those areas where the performance curve slopes sharply. Diminishing returns.

I remember seeing an article comparing Semprons and Athlon 64s. In most applications, going even from 128KB to 1MB L2 cache did little for performance, but every once in a while there'd be an odd jump where the 1MB would perform something like 20% better, even when 512KB did almost nothing. Cache seems to be like memory: it does nothing for performance when you have an adequate amount, but when you don't, it makes a huge difference.

Very interesting to see that in the near future, the only place AMD may be able to approach Intel is in the mobile market. Maybe AMD can license that new low-cost water-cooled heatsink Intel unveiled and start selling 3.5GHz chips.
 
Pete said:

OK, more like a 10% jump, but it does show that performance doesn't scale at all linearly with an increase in cache, and that some workloads benefit from it more than others. Since there are some situations that barely benefit going from 128KB to 512KB but then jump at 1MB, perhaps Intel found a sweet spot with a 4MB cache that causes another major jump for nearly every application (large enough to avoid almost all memory access, maybe?)
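
To put a rough shape on that "nothing, nothing, then a jump" behaviour, here is a toy sketch, not a model of any real Sempron or Athlon 64. It assumes a fully associative LRU cache with 64-byte lines and a program that sweeps repeatedly over a 768KB working set; the working-set size, the function name `lru_hit_rate` and the access pattern are all made up for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size


def lru_hit_rate(cache_kb, working_set_kb, passes=4):
    """Hit rate of a fully associative LRU cache under a repeated linear sweep.

    Toy model only: real L2 caches are set associative and real programs
    do not access memory in a perfect cycle.
    """
    cache_lines = cache_kb * 1024 // LINE_BYTES
    working_lines = working_set_kb * 1024 // LINE_BYTES
    cache = OrderedDict()  # keys are line numbers, oldest entry first
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                if len(cache) >= cache_lines:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits / accesses


for size_kb in (128, 512, 1024):
    print(size_kb, "KB L2 ->", lru_hit_rate(size_kb, working_set_kb=768))
```

Under these assumptions the 128KB and 512KB caches never hit at all, because each line is evicted before the sweep comes back around to it, while the 1MB cache hits on every pass after the first. Real workloads are a mix of many working sets, which is presumably why the jump only shows up in the odd benchmark.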
 
Fox5 said:
Very interesting to see that in the near future, the only place AMD may be able to approach Intel is in the mobile market. Maybe AMD can license that new low-cost water-cooled heatsink Intel unveiled and start selling 3.5GHz chips.

I disagree. Thanks to Centrino branding and the quality of the Pentium M design, Intel will stay in the lead here. Turions are somewhat close in performance, and somewhat in the same league with power consumption, but the branding and platform quality are nowhere near what Intel has managed.

Intel's one-year lead in process technology will only make its chips cooler and faster than AMD's, which can't eke out much more speed. Worse, Conroe has better performance and is cooler. It's a complete reversal of Prescott vs A64.

AMD only has good positioning in the 4-socket and possibly the low-cost 2-socket server markets. If it weren't for its point-to-point topology and Intel's sticking to a bus for a few more years, even this would be gone.

In higher-end systems, the limitations of coherency and the IMC's limited memory capacity seriously hurt Opteron.

AMD's going to have to go back to being a bargain compared to Intel, which nearly killed it before when it had stuff to spin off.
 
I think that assumes AMD has nothing in the pipeline. Indeed they may not, but when I was last in the business and involved in Sematech (admittedly some time ago now), AMD had a quite formidable R&D staff at work in a number of areas...time will tell. I do know that we NEED a strong competitor to Intel to keep the gains going and prices reasonable.
 
Mize said:
I think that assumes AMD has nothing in the pipeline. Indeed they may not, but when I was last in the business and involved in Sematech (admittedly some time ago now), AMD had a quite formidable R&D staff at work in a number of areas...time will tell. I do know that we NEED a strong competitor to Intel to keep the gains going and prices reasonable.

It's possible AMD has been sandbagging. It's always been more paranoid about disclosing future plans. We have heard of the multiple cancellations of next-gen projects. AMD has had more K10s than socket changes.

Sadly, what has been disclosed in the next year and a half is not as promising as Conroe. Conroe is a new core, and there won't be a new AMD core until possibly late 2007 to early 2008.

AMD will probably entrench in the server market, where its interconnect and improved multiprocessing lines should keep it ahead of Intel until at least late 2007.

Hector Ruiz has already hinted at a de-emphasis of mobile and desktop. Whether he's just being sneaky or trying to soften a coming blow is fair to question.
 
3dilettante said:
It's possible AMD has been sandbagging. It's always been more paranoid about disclosing future plans. We have heard of the multiple cancellations of next-gen projects. AMD has had more K10s than socket changes.
AFAIK, K9, which was designed as a high-frequency CPU (like P4), was cancelled about a year ago.

BTW, what does "128-bit FPU" stand for? Did Intel make real 128-bit FPU ops?! weeeeeee
:super:
 
chavvdarrr said:
AFAIK, K9, which was designed as a high-frequency CPU (like P4), was cancelled about a year ago.

BTW, what does "128-bit FPU" stand for? Did Intel make real 128-bit FPU ops?! weeeeeee
:super:

No, it's a 128-bit SSE unit: it operates on two 64-bit FP values at once.
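
A minimal sketch of what "two 64-bit FP values at once" means for the data layout, emulated in plain Python rather than run on the actual SSE hardware. The helper name `addpd` is just borrowed from the SSE2 packed-double add instruction for illustration; this says nothing about how Core implements it internally.

```python
import struct


def addpd(a, b):
    """Emulate a packed-double add on two 16-byte (128-bit) values.

    Each operand holds two independent 64-bit doubles (a low and a high
    lane); the add is done lane by lane, with no carry between lanes.
    """
    a_lo, a_hi = struct.unpack("<2d", a)
    b_lo, b_hi = struct.unpack("<2d", b)
    return struct.pack("<2d", a_lo + b_lo, a_hi + b_hi)


x = struct.pack("<2d", 1.5, 2.0)   # one 128-bit operand: lanes (1.5, 2.0)
y = struct.pack("<2d", 0.25, 4.0)  # another: lanes (0.25, 4.0)

print(len(x) * 8)                        # 128
print(struct.unpack("<2d", addpd(x, y)))  # (1.75, 6.0)
```

So the register is 128 bits wide, but each floating point number in it is still an ordinary 64-bit double; nothing is computed at 128-bit precision.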
 
That's what I've been expecting all along. Bandwidth doesn't mean jack to A64. Overclockers have been spouting this for years. It's all about maintaining its latency advantage, and DDR2 doesn't gain you much there. A64's RAM interface is approaching the speed of something like an L3 cache, so it's not hurting on the RAM speed front. It needs more clock speed or execution units, or just more efficient execution. I think the chip is already extremely efficient, and it's obvious that Conroe simply outguns it in resources. And it should; it's something like a five-year-newer design. A64's execution core is a lot like K7's and was built for a 180nm process and its transistor budget limitations.

P4 is a different animal and loves to chew on lots of bandwidth. Its bus design can't achieve the latency that A64 enjoys so the relatively small changes in time to RAM don't affect it as much.
 
If Conroe's FSB is 1066, then what DDR2 memory would be best suited?

In the same price range, which would be better?
DDR2-533 with tight timings
DDR2-675 tightish
or DDR2-800 slackish
 
chavvdarrr said:
AFAIK, K9, which was designed as a high-frequency CPU (like P4), was cancelled about a year ago.

There have been a bunch of K8 replacements that have been thrown out. I think that K9 you mention was one of them.

There was also an alternate K8 that got tossed. Parts of it moved to a K9, which got pushed to K10 (when they reassigned the K9 codename to dual core K8s), which got cancelled.
There are other crossed-off projects in there somewhere.

Every time, AMD dumped the more ambitious designs for an evolution of the Athlon core.
There won't be a revamped core until 2008, if the roadmaps are to be believed.
 
borntosoul said:
If Conroe's FSB is 1066, then what DDR2 memory would be best suited?

In the same price range, which would be better?
DDR2-533 with tight timings
DDR2-675 tightish
or DDR2-800 slackish

Isn't DDR2-1066 out already? Wouldn't that be the best-suited memory then?
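
For what it's worth, here is a back-of-the-envelope sketch of the numbers involved. It assumes a 64-bit (8-byte) FSB and 64-bit DDR2 channels running in dual-channel mode, and the CAS latencies plugged in for "tight", "tightish" and "slackish" (CL3, CL4, CL5) are purely hypothetical picks, not taken from any specific modules. The function names are made up for this example.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_bytes=8, channels=1):
    """Theoretical peak bandwidth in MB/s: transfers/s x bus width x channels."""
    return mega_transfers_per_s * bus_bytes * channels


def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """CAS latency in nanoseconds for a DDR module.

    DDR moves two transfers per clock, so the clock in MHz is half the
    data rate; one cycle lasts 1000 / clock_mhz nanoseconds.
    """
    clock_mhz = data_rate_mt_s / 2
    return cas_cycles * 1000 / clock_mhz


fsb = peak_bandwidth_mb_s(1066)  # assumed 64-bit FSB at 1066 MT/s
print("FSB 1066:", fsb, "MB/s")

# (data rate, hypothetical CAS latency) for the three options in the question
for rate, cl in ((533, 3), (675, 4), (800, 5)):
    bw = peak_bandwidth_mb_s(rate, channels=2)  # assumed dual channel
    print(f"DDR2-{rate} CL{cl}: {bw} MB/s, CAS {cas_latency_ns(rate, cl):.2f} ns")
```

With those assumptions, dual-channel DDR2-533 already matches the peak bandwidth of a 1066 FSB, so faster memory can't push more data through the bus than that; what is left to gain is latency, and that depends on the timings as much as on the rated speed.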
 