Leaked Intel Nehalem performance projections over AMD Shanghai

That could be Montreal, which the slides indicate will have a full 1 MiB of L2 per core.

If Shanghai's L3 were to become approximately as dense as Intel's, that alone would free room for 1.5 MiB of extra L2 chip-wide. Further tweaks to L2 density, and perhaps other changes, might add up to enough savings to allow the cache increase with only a small increase in die size.

edit:
There's also the option of just bloating the die back to Barcelona's size. Going by 7.5 mm² per MiB, growing from 266 to 285 mm² adds 19 mm², or 19/7.5 ≈ 2.5 MiB worth of arrays, which would at least leave room for the cache arrays.
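A minimal sketch of that budget, using the post's 7.5 mm²/MiB density and 266/285 mm² die sizes. The per-core L2 sizes (512 KiB today, 1 MiB for Montreal) and four cores per die are assumptions brought in for illustration, and only raw cache arrays are counted, not tags or wiring.

```python
# Back-of-envelope L2 budget for growing Shanghai back to Barcelona's size.
# Die sizes and the 7.5 mm^2/MiB density are the post's figures; the
# per-core L2 sizes and the 4-cores-per-die count are assumptions.

MM2_PER_MIB = 7.5       # L2 array density quoted in the post
SHANGHAI_MM2 = 266.0    # Shanghai die size as used in the post
BARCELONA_MM2 = 285.0   # Barcelona die size as used in the post

CORES_PER_DIE = 4       # assumption: Montreal keeps four cores per die
L2_NOW_MIB = 0.5        # assumption: Shanghai's 512 KiB L2 per core
L2_TARGET_MIB = 1.0     # Montreal slides: 1 MiB L2 per core


def extra_cache_mib(old_mm2: float, new_mm2: float, mm2_per_mib: float) -> float:
    """MiB of cache arrays that fit in the die area added by growing the chip."""
    return (new_mm2 - old_mm2) / mm2_per_mib


room_mib = extra_cache_mib(SHANGHAI_MM2, BARCELONA_MM2, MM2_PER_MIB)
needed_mib = CORES_PER_DIE * (L2_TARGET_MIB - L2_NOW_MIB)
print(f"room for {room_mib:.2f} MiB of arrays, need {needed_mib:.2f} MiB")
```

Under these assumptions the budget comes to roughly 2.5 MiB against about 2 MiB needed, which fits the "at least for the arrays" hedge.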

The rumors that Montreal could be an MCM seem hard to justify with doing this. Actually they seem hard to justify even now, given Shanghai's current size.
 
Yeah, it does seem strange and now Fudzilla is actually claiming that it's native. If it only had 512KiB of L2/core and Z-RAM L3, I guess that'd make sense... otherwise, I'm not sure I get the strategy here.
 
I eyeballed the die shot, and I think Shanghai's non-L3 component is roughly 5/8 of the die. 5/8 of 266 mm² would be ~166 mm².
That leaves ~99 mm² for the L3.

Montreal sans L3 would be about 330 mm², roughly two of those 166 mm² blocks.
The question now becomes what kind of savings ZRAM could give.
I think the numbers were 3-4 times better density, which would give between 33 and 25 mm² for 6 MiB of cache, for a total of about 200 mm² for a Montreal die in the same configuration as Shanghai.
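The per-die estimate can be sketched as follows. The 5/8 non-L3 fraction and the 3-4x Z-RAM density gain are the thread's rough figures; attributing all of the remaining 3/8 of the die to L3 is a simplification assumed here.

```python
# Per-die estimate for a Shanghai-configured die with a denser (Z-RAM) L3.
# The 5/8 non-L3 fraction and the 3-4x density range are the post's rough
# figures; treating the whole non-core area as L3 is a simplification.

SHANGHAI_MM2 = 266.0
NON_L3_FRACTION = 5 / 8   # eyeballed from the die shot


def split_die(total_mm2: float, non_l3_fraction: float) -> tuple[float, float]:
    """Return (non-L3 area, L3 area) in mm^2."""
    non_l3 = total_mm2 * non_l3_fraction
    return non_l3, total_mm2 - non_l3


def die_with_denser_l3(total_mm2: float, non_l3_fraction: float, density_gain: float) -> float:
    """Die area if the same L3 capacity shrinks by density_gain."""
    non_l3, l3 = split_die(total_mm2, non_l3_fraction)
    return non_l3 + l3 / density_gain


non_l3, l3 = split_die(SHANGHAI_MM2, NON_L3_FRACTION)
print(f"non-L3 ~{non_l3:.1f} mm^2, L3 ~{l3:.1f} mm^2")
for gain in (3, 4):
    die = die_with_denser_l3(SHANGHAI_MM2, NON_L3_FRACTION, gain)
    print(f"{gain}x denser L3: {l3 / gain:.1f} mm^2 of cache, ~{die:.1f} mm^2 die")
```

With those inputs each die lands between about 191 and 200 mm², in line with the ~200 mm² figure.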

The slide said 6-12 MiB, which I interpret as 6 MiB per die of a two-chip MCM, so the cache could take 50-66 mm² in total.

Montreal's aggregate die area, if my rough math is close, would be 380-400 mm².
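A rough sketch of the aggregate figure, assuming (as the post does) two dies with 6 MiB of L3 each and the same eyeballed 5/8 split. A density gain of 1 stands for keeping plain SRAM at Shanghai's density.

```python
# Aggregate area of a hypothetical two-die Montreal MCM, 6 MiB of L3 per die.
# Inputs are the thread's rough figures; a density_gain of 1 means the
# L3 stays plain SRAM at Shanghai's density.

SHANGHAI_MM2 = 266.0
NON_L3_MM2 = SHANGHAI_MM2 * 5 / 8        # eyeballed non-L3 share per die
L3_6MIB_MM2 = SHANGHAI_MM2 - NON_L3_MM2  # area of a 6 MiB SRAM L3
DIES = 2


def mcm_area_mm2(density_gain: float, dies: int = DIES) -> float:
    """Total silicon across all dies for a given L3 density improvement."""
    return dies * (NON_L3_MM2 + L3_6MIB_MM2 / density_gain)


for gain in (1, 3, 4):
    print(f"L3 density x{gain}: ~{mcm_area_mm2(gain):.0f} mm^2 total")
```

The 3x and 4x cases give roughly 399 and 382 mm², and the no-Z-RAM case is simply two full Shanghai-sized dies, 532 mm².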
If true, it might beat Beckton on die area, but it's a league of die size AMD hasn't trodden before.

If it were native, we need only look at how AMD has fared with a die of less than 300 mm².

I'm not sure what the package for an MCM with two 200 mm² dies would look like. It would probably require the new socket just to fit.
 
Going without Z-RAM would leave Montreal at about 532 mm² of die area, either on one die or in aggregate.

The package substrate would have to be pretty wide to accommodate either.
 
I don't know how relevant 3DMark Vantage CPU scores are to anything, but apparently a 2.66GHz Bloomfield (4 cores, 8 threads) scores ~16k vs. ~13k for a 3.2 GHz QX9770, or ~20% faster with a ~20% slower clock.
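Spelled out, the quoted scores and clocks give the per-clock gap directly. The scores are the approximate figures above, so the result is rough, and because Bloomfield runs 8 threads against the QX9770's 4, this per-clock number mixes SMT with any single-thread gains.

```python
# Per-clock comparison of the quoted 3DMark Vantage CPU scores.
# Scores are the approximate (~) figures from the post, so treat the
# result as rough; thread counts differ (8 vs 4), so SMT is folded in.


def per_clock_ratio(score_a: float, ghz_a: float, score_b: float, ghz_b: float) -> float:
    """How much more score chip A gets per GHz than chip B."""
    return (score_a / ghz_a) / (score_b / ghz_b)


BLOOMFIELD = (16_000, 2.66)   # (score, GHz)
QX9770 = (13_000, 3.2)

score_gain = BLOOMFIELD[0] / QX9770[0]
clock_ratio = BLOOMFIELD[1] / QX9770[1]
ratio = per_clock_ratio(*BLOOMFIELD, *QX9770)
print(f"score {score_gain - 1:+.0%}, clock {clock_ratio - 1:+.0%}, per clock {ratio - 1:+.0%}")
```

That works out to about 23% more score on about 17% less clock, or roughly 48% more per clock.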
 

Wow! Now that's nice! I'm definitely planning to pick up a Nehalem for my next CPU. I'll probably wait for the 8-core version, though. Between this and GT2xx, PCs are going to be insanely powerful by the end of this year!

Now where are the games to use it!
 
I don't know how relevant 3DMark Vantage CPU scores are to anything, but apparently a 2.66GHz Bloomfield (4 cores, 8 threads) scores ~16k vs. ~13k for a 3.2 GHz QX9770, or ~20% faster with a ~20% slower clock.

Vantage is a heavily-threaded benchmark, though, is it not? How much of that gain is due to SMT and how much to single-thread IPC improvements? Wagering a guess, I'd say it's damn near half and half.
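To put a number on the guess: the ~16k/~13k scores at 2.66/3.2 GHz imply roughly a 1.48x per-clock advantage. Reading "half and half" as two equal multiplicative factors (an assumption; an additive split would give slightly different numbers) means about 1.22x each from SMT and IPC.

```python
# One reading of "half and half": split the per-clock gain equally in
# log space between SMT and single-thread IPC. The equal split is the
# poster's guess, not a measurement.


def split_gain(total_gain: float, smt_share: float) -> tuple[float, float]:
    """Split a multiplicative gain into (SMT factor, IPC factor).

    smt_share is the fraction of log(total_gain) credited to SMT, so the
    two factors always multiply back to total_gain.
    """
    smt = total_gain ** smt_share
    ipc = total_gain ** (1 - smt_share)
    return smt, ipc


per_clock_gain = (16_000 / 2.66) / (13_000 / 3.2)   # from the Vantage figures
smt, ipc = split_gain(per_clock_gain, 0.5)
print(f"SMT x{smt:.2f}, IPC x{ipc:.2f}, combined x{smt * ipc:.2f}")
```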
 
I don't know how relevant 3DMark Vantage CPU scores are to anything, but apparently a 2.66GHz Bloomfield (4 cores, 8 threads) scores ~16k vs. ~13k for a 3.2 GHz QX9770, or ~20% faster with a ~20% slower clock.
I just saw this as well. I was hoping to be able to resist Nehalem until 32nm, but now I'm not so sure ;).

Vantage is a heavily-threaded benchmark, though, is it not? How much of that gain is due to SMT and how much to single-thread IPC improvements? Wagering a guess, I'd say it's damn near half and half.
Very good point; we'll have to see how single-threaded clock-for-clock performance turns out.
 
Anandtech has an article with some benches available.

Truly impressive. Things don't look good for AMD.
 
Very good multithreaded results with good power numbers.

I'm waiting to see what happens with benchmarks that don't specifically target weak spots of Core 2, like unaligned accesses, and with less well-threaded programs.

Anand did get one thing wrong, most likely. He seems to still think that AMD's Bulldozer will arrive in 2009.
 
Wasn't Barcelona's L2 twice as slow as Conroe's, which gives Intel an advantage in games?

No doubt Conroe's L2 is faster than any L2 AMD has ever created, but it was the inclusion of an L3 cache optimized for power/heat consumption (rather than density or performance) that really hurt.
 
Anandtech has an article with some benches available.

Truly impressive. Things don't look good for AMD.

Extremely impressive and extremely worrying for AMD. As the article says, AMD still haven't caught up with Core 2 performance. Penryn makes Phenoms look like a bit of a joke, and now Nehalem is waiting in the wings, capable of utterly obliterating Penryn!

These things are gonna be expensive!
 
Very good multithreaded results with good power numbers.
They only had one, but there was a single-threaded Cinebench R10 result where Nehalem still whipped Penryn to the tune of about 25%.

And if these benches were done on the "broken" memory motherboard, I can only imagine what they'll look like on a working one. :oops:
 
I am flabbergasted. I am reading the piece now and I am amazed.
 
I can't believe the performance numbers I'm reading in that preview, with faulty mobos as well. Yikes! I'm still going to hold off on getting it; I'd rather wait for the 32 nm Westmere. Perhaps the HP Blackbird will have a system with that in it next year... but I'm getting off track.

Awesome job, Intel! I am rather worried, though, since CPU prices are going to be quite high IMHO, with AMD not around to provide a counterpunch.
 
They only had one, but there was a single-threaded Cinebench R10 result where Nehalem still whipped Penryn to the tune of about 25%.

And if these benches were done on the "broken" memory motherboard, I can only imagine what they'll look like on a working one. :oops:

As if on cue, Anand revised the single-threaded bench result. Nehalem is only slightly better than Penryn per clock now.
 