Fusion die-shot - 2009 Analyst Day

Maybe 25 mm2 per core; there's no way it can be 25 mm2 for both cores. Atom itself is 21 mm2 per core, and Bobcat is supposed to be far superior to Atom.

Where do you get this 21mm^2 number?

Atom + L2 (45nm) is about 25mm^2. The core itself takes less than half of that, maybe a third (8mm^2); the rest is taken by the L2 cache, bus interface, etc. (based on die images).

A K10 core (without L2) at 32nm is 10mm^2 per core (Llano).

Bobcat should be less than half of that, so something like 8mm^2 is a very reasonable assumption for the Bobcat core size at 40nm (the same core size as Atom, but on a better manufacturing process, meaning it would be about 20% bigger on the same process). Two cores make 16mm^2; add some 1MB of L2 cache at 10mm^2 and you have about 26mm^2 for the cores and the L2 cache.

Then we still have about 51mm^2 left for the GPU, memory controller, bus interfaces, etc. in the 77mm^2 part.
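The budget above can be written out as a quick sanity check. This is only a sketch: every input is one of the guesses from this post (8mm^2 per core, 10mm^2 for the L2, 77mm^2 for the whole die), not a measured value, and the function name is made up for illustration.

```python
# Rough area budget for the 77 mm^2 part, using the estimates from the post.
# All inputs are guesses from the thread, not measured values.
CORE_MM2 = 8.0    # estimated Bobcat core area at 40nm
NUM_CORES = 2
L2_MM2 = 10.0     # estimated area of about 1 MB of L2
DIE_MM2 = 77.0    # total die size discussed in the thread


def area_budget(core_mm2, num_cores, l2_mm2, die_mm2):
    """Return (cores_plus_l2, remaining) in mm^2."""
    cores_plus_l2 = core_mm2 * num_cores + l2_mm2
    return cores_plus_l2, die_mm2 - cores_plus_l2


cores_plus_l2, remaining = area_budget(CORE_MM2, NUM_CORES, L2_MM2, DIE_MM2)
print(f"cores + L2: {cores_plus_l2:.0f} mm^2")
print(f"left for GPU, memory controller, I/O: {remaining:.0f} mm^2")
```

Changing any one guess (say, 512KB of L2 instead of 1MB) shifts the remainder directly, which is why the GPU share is so sensitive to the core estimate.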
 
2u90q37.jpg


Bobcat should be about 10mm2 per core.

Based on this table, it should be 13mm^2 if manufactured at 65nm, when compared with Barcelona.

But Ontario will be 40nm, which means it will be less than half the size, about 6mm^2 based on this comparison.

If we compare to Penryn, we get that it should be about 11mm^2 at 45nm, or some 9mm^2 at 40nm.

So from this table we can conclude anything from 6 to 9mm^2.
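For comparison, here is what textbook "ideal" scaling would give for the same two starting points. It is a sketch that assumes area shrinks with the square of the node name, which real processes rarely achieve, so treat the results as a lower bound rather than a prediction. The function name is hypothetical.

```python
def ideal_scaled_area(area_mm2, from_nm, to_nm):
    """Area after a shrink, assuming area scales with the square of the node name."""
    return area_mm2 * (to_nm / from_nm) ** 2


# Starting figures quoted in the post (read off the table, so approximate).
from_barcelona = ideal_scaled_area(13.0, from_nm=65, to_nm=40)
from_penryn = ideal_scaled_area(11.0, from_nm=45, to_nm=40)

print(f"13 mm^2 at 65nm -> 40nm: {from_barcelona:.1f} mm^2")
print(f"11 mm^2 at 45nm -> 40nm: {from_penryn:.1f} mm^2")
```

Ideal scaling lands at roughly 4.9 and 8.7mm^2, so the 6mm^2 figure already allows for a less-than-perfect shrink from 65nm.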
 
2u90q37.jpg


Bobcat should be about 10mm2 per core.

By core size I meant everything, including cache and all associated logic. Bobcat is rumoured to have a 512KB L2 cache per core. And with an Atom core at about 21mm2, I stand by my speculation that Bobcat will be bigger than Atom (even with the die savings of 40nm vs. 45nm for Atom).

Where do you get this 21mm^2 number?

Atom + L2 (45nm) is about 25mm^2. The core itself takes less than half of that, maybe a third (8mm^2); the rest is taken by the L2 cache, bus interface, etc. (based on die images).

A K10 core (without L2) at 32nm is 10mm^2 per core (Llano).

Bobcat should be less than half of that, so something like 8mm^2 is a very reasonable assumption for the Bobcat core size at 40nm (the same core size as Atom, but on a better manufacturing process, meaning it would be about 20% bigger on the same process). Two cores make 16mm^2; add some 1MB of L2 cache at 10mm^2 and you have about 26mm^2 for the cores and the L2 cache.

Then we still have about 51mm^2 left for the GPU, memory controller, bus interfaces, etc. in the 77mm^2 part.

I subtracted the die size of single-core Pineview from that of dual-core Pineview, which gives us the size of an additional core (87 mm2 - 66 mm2 = 21 mm2). By core I'm referring to everything, including caches, bus interface, etc.
 
Bobcat is rumoured to have a 512KB L2 cache per core.
Where did you get it?

And with an Atom core at about 21mm2, I stand by my speculation that Bobcat will be bigger than Atom (even with the die savings of 40nm vs. 45nm for Atom).
A bulk chip's transistor density is significantly higher than an SOI chip's.

RV770: 956M transistors, 256mm2, 55nm bulk
Core i7: 731M transistors, 263mm2, 45nm SOI
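The density gap behind those two lines, worked out as a minimal sketch. It uses only the transistor counts and die sizes quoted above; the helper name is made up for illustration.

```python
def density_mtr_per_mm2(transistors_millions, die_mm2):
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_millions / die_mm2


# (transistors in millions, die area in mm^2), as quoted in the post.
chips = {
    "RV770 (55nm)": (956.0, 256.0),
    "Core i7 (45nm)": (731.0, 263.0),
}

for name, (mtr, mm2) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(mtr, mm2):.2f} MTr/mm^2")
```

That is about 3.7 versus 2.8 MTr/mm^2. It is a whole-die average, so it says nothing by itself about why the two differ.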
 
There's another side effect to density:
RV770: 750 MHz
[GT200: 1,476 MHz]
Core i7: 3,330 MHz
 
There's another side effect to density:
RV770: 750 MHz
[GT200: 1,476 MHz]
Core i7: 3,330 MHz

Transistor density has very little to do with that.

The clock speed difference is mostly due to different pipeline lengths.
Each pipeline stage does much less work on Core i7 than on RV770.
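A toy model of that argument, purely as a sketch: assume a fixed amount of logic delay gets split evenly across the pipeline stages, plus a fixed per-stage latch overhead. The delay figures below are invented for illustration and do not describe any real chip.

```python
def max_clock_mhz(total_logic_ns, stages, overhead_ns):
    """Clock ceiling when total_logic_ns is split evenly over `stages`.

    overhead_ns is the fixed per-stage cost (latches, clock skew).
    """
    period_ns = total_logic_ns / stages + overhead_ns
    return 1000.0 / period_ns


# Hypothetical numbers: 4 ns of logic in total, 0.1 ns overhead per stage.
shallow = max_clock_mhz(4.0, stages=4, overhead_ns=0.1)
deep = max_clock_mhz(4.0, stages=16, overhead_ns=0.1)

print(f"4 stages:  {shallow:.0f} MHz")
print(f"16 stages: {deep:.0f} MHz")
```

Less work per stage means a shorter clock period, but the fixed overhead keeps the gain below the 4x you would get from the stage count alone.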
 
Where did you get it?


A bulk chip's transistor density is significantly higher than an SOI chip's.

RV770: 956M transistors, 256mm2, 55nm bulk
Core i7: 731M transistors, 263mm2, 45nm SOI

SOI vs bulk has very little to do with this.

The reason RV770 packs more transistors into a smaller space is that a bigger share of those transistors sit in caches and/or register files, i.e. SRAM, which has a much higher transistor density than "any normal logic".
 
Transistor density has very little to do with that.

The clock speed difference is mostly due to different pipeline lengths.
Each pipeline stage does much less work on Core i7 than on RV770.
Are the pipeline stages of GT200 and RV770 really so different, even though both are primarily GPUs? AFAIK you can do quite a bit about clock speed in your design apart from sheer pipeline length.

SOI vs bulk has very little to do with this.

The reason RV770 packs more transistors into a smaller space is that a bigger share of those transistors sit in caches and/or register files, i.e. SRAM, which has a much higher transistor density than "any normal logic".
Are the 9+ MBytes [was: 10+] of cache in Core i7 not SRAM also? Even if not: The density Intel achieved on Penryn/45nm (see table above) seems quite impressive.
 
Are the 10+ MBytes of cache in Core i7 not SRAM also? Even if not: The density Intel achieved on Penryn/45nm (see table above) seems quite impressive.

The Core i7 has four 256KB L2 caches and an 8MB L3, for a total of 9MB.

Here's a good example of GPUs being denser than CPUs.

First some back story.

Core i7 9xx: 731 million transistors, 263mm2 die, 45nm
Core i7 8xx: 774 million transistors, 296mm2 die, 45nm

You can see that the extra circuitry used for I/O like PCI Express isn't very dense. A 5.9% increase in transistors resulted in a 12.5% increase in die size.
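Those two percentages follow directly from the figures listed for the 9xx and 8xx parts. A minimal check, with a helper name made up for illustration:

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# Core i7 9xx -> Core i7 8xx, figures as quoted in the post.
transistor_growth = percent_increase(731.0, 774.0)  # millions of transistors
die_growth = percent_increase(263.0, 296.0)         # mm^2

print(f"transistors: +{transistor_growth:.1f}%")
print(f"die size:    +{die_growth:.1f}%")
```

The die grows roughly twice as fast as the transistor count, which is the point about I/O circuitry being sparse.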

Core i7 980X "Gulftown": 1.17 billion transistors, 240mm2 die, 32nm
Sandy Bridge 4 core: 1.12 billion transistors, ~225mm2 die, 32nm

Remember, Gulftown has 13.5MB worth of SRAM, while Sandy Bridge packs the PCI Express and DMI connections of Lynnfield and only 9MB of SRAM, yet Sandy Bridge is still denser than Gulftown.

How about a look at Sandy Bridge's GPU? The closest comparison to the ~225mm2 Sandy Bridge die is Lynnfield. There's a 346 million transistor difference between the two. Let's make it simple and say it's 300 million for the GPU.

The GPU takes roughly 40mm2. 300 million transistors in 40mm2: quite compact, isn't it?
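Putting numbers on "quite compact", as a sketch only. The 300 million and 40mm2 figures are the rounded guesses from this post, and the die-wide averages are included just for scale.

```python
# Figures as quoted in the post: transistors in millions, area in mm^2.
SANDY_BRIDGE_MTR, SANDY_BRIDGE_MM2 = 1120.0, 225.0
LYNNFIELD_MTR, LYNNFIELD_MM2 = 774.0, 296.0

# Rounded guesses from the post for the GPU block alone.
GPU_MTR_GUESS, GPU_MM2_GUESS = 300.0, 40.0

difference = SANDY_BRIDGE_MTR - LYNNFIELD_MTR
gpu_density = GPU_MTR_GUESS / GPU_MM2_GUESS
sandy_density = SANDY_BRIDGE_MTR / SANDY_BRIDGE_MM2
lynnfield_density = LYNNFIELD_MTR / LYNNFIELD_MM2

print(f"Sandy Bridge minus Lynnfield: {difference:.0f}M transistors")
print(f"GPU block (guess):      {gpu_density:.1f} MTr/mm^2")
print(f"Sandy Bridge whole die: {sandy_density:.1f} MTr/mm^2")
print(f"Lynnfield whole die:    {lynnfield_density:.1f} MTr/mm^2")
```

Under those guesses the GPU block comes out well above the average of either die, though part of that gap is the 45nm vs. 32nm difference in the Lynnfield comparison.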
 
GPUs have much higher structural "uniformity" than a typical CPU because of their streaming, data-parallel nature. That allows for dense packing of similar structures, and the SRAM arrays are only part of the whole picture here.
 
But Ontario will be 40nm, which means it will be less than half the size, about 6mm^2 based on this comparison.

This is an extremely simplified calculation.

Look at the core sizes of Windsor and Brisbane. The 65nm Brisbane isn't 50% of the size of the 90nm Windsor; it's 67%.
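To show how much that gap matters, here is a sketch that compares the ideal 90nm to 65nm ratio with the 67% quoted above, then reuses the same shortfall for a 65nm to 40nm shrink. Carrying the Windsor/Brisbane behaviour over to a different process and foundry is purely an assumption for illustration, and the 13mm^2 starting point is the 65nm estimate from earlier in the thread.

```python
import math


def ideal_area_ratio(from_nm, to_nm):
    """Area ratio if area scaled with the square of the node name."""
    return (to_nm / from_nm) ** 2


def effective_exponent(observed_ratio, from_nm, to_nm):
    """Exponent k such that (to_nm / from_nm) ** k equals observed_ratio."""
    return math.log(observed_ratio) / math.log(to_nm / from_nm)


ideal = ideal_area_ratio(90, 65)
k = effective_exponent(0.67, 90, 65)  # 0.67 is the ratio quoted in the post

# Assumption: the same exponent applies to a 65nm -> 40nm shrink.
projected_mm2 = 13.0 * (40 / 65) ** k

print(f"ideal 90nm -> 65nm ratio: {ideal:.2f}")
print(f"effective exponent: {k:.2f} (2.00 would be ideal)")
print(f"13 mm^2 at 65nm -> 40nm: {projected_mm2:.1f} mm^2")
```

With that exponent the projection comes out at roughly 7mm^2 rather than 6, still inside the 6 to 9mm^2 range discussed earlier.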

I have a feeling that when AMD said "90% of today's mainstream performance at half the size", they meant actual die size, not per core. The cores are small enough not to be of much significance.

What's that equal to? ~25mm2 for a single-core and ~50mm2 for a dual-core variant. If we assume it's the single-core variant, then they might be able to put the rest into a big enough GPU.

Perhaps it doesn't need to be ~50mm2 for the dual core either. Maybe it's 40 or 45mm2, if they don't duplicate everything.
 
The reason RV770 packs more transistors into a smaller space is that a bigger share of those transistors sit in caches and/or register files, i.e. SRAM, which has a much higher transistor density than "any normal logic".

i7 still has (far) more SRAM than RV770, but on i7 everything is tuned to reduce latency, including leaving more space between transistors.
 
Are the pipeline stages of GT200 and RV770 really so different, even though both are primarily GPUs?
Yes. NVIDIA uses a much longer pipeline in the ALUs than ATI does.

There are of course other things that affect clock speed (especially the longest critical path in any one of the pipeline stages), but that's the most important thing.
 
Yes. NVIDIA uses a much longer pipeline in the ALUs than ATI does.

There are of course other things that affect clock speed (especially the longest critical path in any one of the pipeline stages), but that's the most important thing.

Frequency is determined by a lot of factors:

1. Depth of each pipeline stage
2. Process technology
3. Power consumption and thermal dissipation (hint: this was the constraint for NV)
4. Voltage, which impacts #3

It's hard to say that any one factor is most important. In fact, I'd argue that #2 in some ways is the most important. You can design a microarchitecture around process technology, but only two companies in the world can design process technology around a microarchitecture.

David
 
You can design a microarchitecture around process technology, but only two companies in the world can design process technology around a microarchitecture.

David

That's interesting. One of the companies will of course be Intel. Which is the other one?
 