The Core i7 "Density-Disaster"

CarstenS

Moderator
Sorry for such a rude headline, but it obviously succeeded in attracting your attention. ;)

The focus of this thread should be a possible explanation of Nehalem/Bloomfield/Core i7's apparent lack of high transistor density.

I've learned a few things in general since I began to read up on semiconductor tech and this forum. ;)

a) Intel has very good manufacturing tech
b) cache packs better than logic
c) a smaller process node gives higher density

So, when I was reading through the plethora of Core i7 reviews this morning, I was kind of dumbstruck when I read about the die size and transistor count of Intel's latest and greatest:

Supposedly it has a 263mm² die and about 731M transistors, netting a density of about 2.779M/mm² on Intel's 45nm process. That is including the vast amount of 8 MiByte L3-, 2 MiByte L2- and some spare bytes of L1-cache.

Intel's Core 2 on the Penryn architecture has a density of about 3.832M/mm² (~410M transistors, ~107mm²) on the same process, and AMD's RV770 has about 3.68M/mm², on 55nm!
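
For anyone who wants to re-run the arithmetic, here is a minimal Python sketch of the density figures above. It uses only the transistor counts and die sizes quoted in this post; the helper name `density_m_per_mm2` is just an illustrative choice, and the RV770 is left out because its raw numbers are not given here.

```python
def density_m_per_mm2(transistors_millions, die_area_mm2):
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_millions / die_area_mm2

# (transistors in millions, die area in mm^2), as quoted in the post
chips = {
    "Core i7 (Bloomfield)": (731, 263),
    "Core 2 (Penryn)": (410, 107),
}

for name, (transistors, area) in chips.items():
    print(f"{name}: {density_m_per_mm2(transistors, area):.3f} M/mm^2")
```

Note that the result is in millions per square millimetre, so "2.779" means roughly 2.8 million transistors per mm².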

And I thought: WTF?
 
Penryn has 6 MB of L2 per two cores, ~50% more than Nehalem. Could that be part of the reason? Also, IIRC, Penryn had a much smaller logic area than Nehalem. The IMC with three memory channels and QPI will probably also take (much) more die space than the FSB.
 
Ok, but that doesn't explain the comparison to the GPU, which consists to a significantly higher degree of logic and is manufactured on a coarser process.
 
Not really sure this is any sort of "disaster"; more than a few people would be thrilled if AMD could get to that kind of density with this kind of feature set...

The i7 features a radically different CMOS design from any processor since the "Pentium" came to be; a different cache design from about the same era; and obviously more logic per core, as well as "un-core", than anything they've built prior to this.

Even if the design isn't optimal, it doesn't really have to be -- the true "shrink" comes with the next "tick" in their development chain.
 
Intel's cache density is still very good. Looking at Nehalem's die, the L3's area is dwarfed by the wide expanse of core logic. The L1 is less dense, since it has been implemented in 8T SRAM for better voltage scaling, but it's small enough to not massively skew things.

Nehalem's logic portion is larger this time around, and logic has not benefited as much from process scaling as cache. The IOs and other interfaces have not scaled as much, either.

Penryn is over 1/2 cache by area.
Nehalem, if we count all the extra space taken up by the new uncore, is closer to 2/3 logic.

RV770 is not an entirely clean comparison.
Aiming for transistor performance incurs a density penalty, whereas low clocks allow for more dense packing.
Different manufacturers may also have different methods for arriving at their transistor counts, so there may be some wiggle room there as well.
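
To illustrate how much the cache/logic mix alone can move the headline number, here is a rough Python sketch. It makes some strong assumptions that are not claims from the thread: that both chips share one density for cache and one for everything else, that Penryn is exactly 1/2 cache by area, and that Nehalem is exactly 1/3 cache by area. The overall densities come from the transistor counts and die sizes quoted earlier in the thread, and the function name is hypothetical.

```python
def solve_block_densities(cache_frac_a, density_a, cache_frac_b, density_b):
    """Solve f*cache + (1-f)*logic = overall for two chips (Cramer's rule).

    Assumes both chips share the same cache and logic densities,
    which is a simplification for illustration only.
    """
    det = cache_frac_a * (1 - cache_frac_b) - cache_frac_b * (1 - cache_frac_a)
    if det == 0:
        raise ValueError("the two cache area fractions must differ")
    cache = (density_a * (1 - cache_frac_b) - density_b * (1 - cache_frac_a)) / det
    logic = (cache_frac_a * density_b - cache_frac_b * density_a) / det
    return cache, logic

# Overall densities in M transistors/mm^2, from the figures quoted in the thread
penryn_density = 410 / 107
nehalem_density = 731 / 263

cache, logic = solve_block_densities(0.5, penryn_density, 1 / 3, nehalem_density)
print(f"implied cache density: {cache:.2f} M/mm^2")
print(f"implied logic+IO density: {logic:.2f} M/mm^2")
```

Under those assumptions the implied cache density comes out several times higher than the implied logic and I/O density, so shifting from 1/2 cache to 1/3 cache is enough to drag the chip-wide average down noticeably. Treat it as a back-of-the-envelope illustration, not a measurement.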
 
Supposedly it has a 263mm² die and about 731M transistors, netting a density of about 2.779M/mm² on Intel's 45nm process. That is including the vast amount of 8 MiByte L3-, 2 MiByte L2- and some spare bytes of L1-cache.
Small correction: that includes only 1 MiByte of L2 cache (though it uses 8T cells, just like the L1 cache).
I think the core die size (and transistor count) isn't too different from Penryn, actually (with the obvious difference that the L1/L2 cache is 8T instead of 6T, and the L2 cache is now really part of the core, whereas in a Penryn die shot you could easily separate it from the rest of the core). As others have mentioned, all the I/O logic (mostly QPI and the memory controller) probably consists of comparatively few transistors and definitely takes up a significant amount of area.
(I also wonder if the "low power optimized design" accounts for some of the difference. Intel talks a lot about how this is the first fully static CMOS design in ages, though judging by the numbers I'd conclude that Penryn was almost fully static too.)
And don't forget that AMD doesn't manage to pack the transistors any denser either, quite the contrary, at least for the L3 cache (if you look at Shanghai, that's probably the closest chip you can compare this to from a technology point of view).
 
Thanks a lot, guys, for clearing this up for me. I was seriously wondering why on earth Intel should have lost its manufacturing edge all of a sudden.
 
[Attached image: coresul3.png]


A small contribution to the discussion, from me. ;)

Now, those are core excerpts from Nehalem (left) and Shanghai (right), scaled properly.
Shanghai is about 14.3% smaller than Nehalem here, despite having twice the L2 (with very similar SRAM-cell density to Nehalem), and one should note that there are quite a few "empty" areas in AMD's 45nm shrink, apparently because the different functional blocks did not scale proportionally. Looks like it's time for AMD to retouch the layout of the good old K8 base. ;)

Of course, there are differences due to the manufacturing process peculiarities, despite both architectures being built on the same 45nm node.
 
Is that a Bloomfield die? The quad-core Nehalem die pictures available through Google don't look like that... ??

I think you've got something wrong, look here: http://chip-architect.com/news/Shanghai_Nehalem.jpg

That compares both dies at equal scales, and it's nothing like what you're showing. The die to the left doesn't even have any apparent "cores", which should be plainly obvious in a four-core system. I'm not sure what you've got there, but I don't think it's a Nehalem die.

The actual die sizes appear to be 246mm^2 for the Nehalem quad core versus 243mm^2 for the Shanghai quad core, which is about a 1% difference in size for about a 4% decrease in transistor count.
 
I think the pics fellix posted are core shots rather than die shots, aren't they? There appear to be four of each in the die shots from chip-architect.
 
Ahhhhh, I see it now -- you're exactly right.

Ok, that makes LOTS more sense now :) Thanks for the clarification...
 
Hmm... strangely, in the chip-architect shot, if the die dimensions are right, we have 257 mm^2 for the Nehalem (18.9x13.6 mm) and 244 mm^2 for the Shanghai (17.8x13.7 mm), which is about a 5% difference in size. This is clearly visible, because the Nehalem die picture is wider while the height is almost the same.
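
A quick sanity check of those areas in Python, as a minimal sketch that assumes the dimensions read off the chip-architect picture are accurate (the helper name `die_area_mm2` is only for illustration):

```python
def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die in mm^2."""
    return width_mm * height_mm

nehalem = die_area_mm2(18.9, 13.6)   # dimensions as read from the die shot
shanghai = die_area_mm2(17.8, 13.7)

# How much larger Nehalem is, relative to Shanghai
diff_percent = (nehalem / shanghai - 1) * 100

print(f"Nehalem:  {nehalem:.1f} mm^2")
print(f"Shanghai: {shanghai:.1f} mm^2")
print(f"Nehalem is {diff_percent:.1f}% larger")
```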
 