> So if we already expect a 64 core 2GHz Larrabee I, anyone care to speculate what they have in mind for Larrabee II?

Hard to tell when we don't even know how many TMUs LRB has. Though I guess we can expect LRB2 to be mass-produced on a 32nm process (if, as it seems, LRB1 is on a 45nm process).
> 32 Larrabee cores sounds doable at 45nm, while 64 looks like the middle product between the first Larrabee chips and the second generation.

Two chips on the same board configuration?
We might have to differentiate between the 32-core Larrabee and 64-core variant.
At 45nm, Atom is weighing in at 47 million transistors and 25 mm2.
Larrabee was speculated to be around 30-33 million.
A Larrabee core, going naively by transistor count, should be north of 16mm^2.
64 x 16mm^2 is 1024mm^2, and no reticle goes that high.
32 Larrabee cores sounds doable at 45nm, while 64 looks like the middle product between the first Larrabee chips and the second generation.
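The naive arithmetic above can be sketched out explicitly. The Atom figures are the ones quoted in this post; the 30-million-transistor Larrabee core is the thread's speculation, not a confirmed number:

```python
# Naive Larrabee core-area estimate from Atom's 45nm density.
# All figures are the ones quoted in this thread; treat them as speculation.
atom_transistors = 47e6                     # Atom @ 45nm
atom_area_mm2 = 25.0
density = atom_transistors / atom_area_mm2  # ~1.88M transistors per mm^2

lrb_core_transistors = 30e6                 # speculated per-core count
core_area = lrb_core_transistors / density  # per-core area at Atom-like density

print(round(core_area, 1))   # ~16.0 mm^2, "north of 16mm^2"
print(round(64 * core_area)) # ~1021 mm^2 -- beyond any reticle limit
```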
> Is it that cache transistors are cheaper / smaller than logic?

Yes.
Hang on, it's said that Larrabee (and Dunnington) are 1.9 billion transistors @ 45nm... so:
64 * 30 million = 1.92 billion transistors,
and 1.92 billion transistors at 45nm = 503mm^2 (or something like that).
So something isn't adding up? Is it that cache transistors are cheaper / smaller than logic?
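The clash between the two estimates can be made explicit. Dunnington's ~1.9 billion transistors in a ~503mm^2 die give one implied density, Atom's figures give another; all numbers are as quoted in the thread:

```python
# Two density assumptions give very different die sizes for 64 cores.
total_transistors = 64 * 30e6        # 1.92e9, roughly the 1.9B figure

dunnington_density = 1.9e9 / 503.0   # ~3.8M transistors/mm^2 (cache-heavy die)
atom_density = 47e6 / 25.0           # ~1.9M transistors/mm^2 (logic-heavy die)

print(round(total_transistors / dunnington_density))  # ~508 mm^2
print(round(total_transistors / atom_density))        # ~1021 mm^2
```

The roughly factor-of-two gap between the two estimates is exactly what the cache-versus-logic density question is getting at.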
Dunnington's transistor count is nearly two thirds cache, and the cache could fill less than half the die.
Dunnington's die size also includes a very bulky system interface that ties all the cores together. As has been mentioned, the non-core elements of Larrabee have not been factored into the area and transistor count estimations, while the hard numbers for Dunnington have figured them in by default.
Larrabee's cache transistor count per core is maybe half, and the cache might take up a quarter of a core block's area.
The question is how densely the logic can be packed, especially with the big stonkin' vector unit.
The ratio of highly compressed cache versus more expanded logic is lower for Larrabee.
Atom is the closest contemporaneous x86 to Larrabee. There are marked differences, so the comparison must be very, very loose, but it is much closer philosophically to Larrabee than, say, Core2.
There are a bunch of factors, such as the FSB interface, less dense L1, modular layout, and different circuit design targets, that can make Atom less dense than it could be, but the idea that it could be more than 3 times too large compared to Larrabee on the exact same process node just doesn't sit well with me.
Maybe Intel has done a fantastic job of compressing the core; otherwise, the physical constraints make such a chip too big.
At 32nm, 64 cores sounds far more doable.
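The cache-versus-logic argument above can be illustrated with a toy model. The 2x cache-to-logic density ratio, the 2/3 Dunnington cache fraction, and the 1/2 Larrabee cache fraction are illustrative assumptions drawn from the discussion, not known figures:

```python
# Toy model: effective transistor density as a weighted mix of
# dense cache (SRAM) and less dense logic.
# The 2x density ratio and the cache fractions are assumptions.
logic_density = 1.9e6                 # transistors/mm^2, Atom-like logic
cache_density = 2.0 * logic_density   # SRAM packs much denser than logic

def die_area(total_trans, cache_fraction):
    cache_t = total_trans * cache_fraction
    logic_t = total_trans - cache_t
    return cache_t / cache_density + logic_t / logic_density

# A Dunnington-like die (~2/3 cache transistors) vs a
# Larrabee-like die (~1/2 cache transistors):
print(round(die_area(1.9e9, 2/3)))   # ~667 mm^2
print(round(die_area(1.92e9, 0.5)))  # ~758 mm^2
```

Same transistor budget, but the lower cache ratio leaves Larrabee with a lower effective density, which is the point being made about its ratio of compressed cache to expanded logic.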
> So Nvidia's GT200 wouldn't be a good philosophical comparison then?

I don't think it's close enough; the designs and processes are very different and difficult to compare. One chip already exists, and the other, at least publicly, does not.
> 1.4 billion - 65nm - 570mm^2

Plus whatever transistors and area NVIO takes up, unless Larrabee has a separate chip for that too.
> Assuming linear scaling we see that...

That assumption is one that hasn't held true for some time, particularly for logic.
> 1.9 billion - 45nm - 380mm^2

Your basis with the GPU uses area figures for a chip with a lot of area not devoted to computation cores.
So Nvidia's GT200 wouldn't be a good philosophical comparison then?
1.4 billion - 65nm - 570mm^2
Assuming linear scaling we see that...
1.9 billion - 45nm - 380mm^2
From this perspective a 64 core 45nm part could be reached with 500mm^2. Granted, it would be INSANE! I'm sorry, I am being naive; I realise there are complex variables I don't understand that affect all this.
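The linear-scaling step above, made explicit. GT200's 1.4 billion transistors in 570mm^2 at 65nm is the starting point, and ideal (65/45)^2 area scaling is exactly the naive assumption being questioned in this exchange:

```python
# Naive process scaling from GT200's 65nm density to 45nm.
# Assumes ideal area scaling with the square of the feature size,
# which is the assumption being challenged in this thread.
gt200_density_65 = 1.4e9 / 570.0       # ~2.5M transistors/mm^2 at 65nm
scale = (65.0 / 45.0) ** 2             # ideal area shrink factor, ~2.09
density_45 = gt200_density_65 * scale  # ~5.1M transistors/mm^2 at 45nm

print(round(1.9e9 / density_45))       # ~371 mm^2 for 1.9B transistors
```

That lands near the ~380mm^2 figure quoted above, with the ~500mm^2 for a 64-core part presumably leaving headroom for uncore and I/O.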
> Also, NVIDIA's GT200 has a very very low density (think non-fill-cell-area/total-area), so NV can do much better in terms of perf/mm^2, etc. I would think that something with a very optimal, regular layout can achieve much much more (like RV770, sorry for the low blow NV)

Hey, only the truth hurts! (And in this case, that means it probably hurts a lot...) I think it's pretty mind-blowing that they decided to go down that road to maximize yields, while simultaneously having *no* coarse-grained redundancy for the GTX 280 and presumably little fine-grained redundancy. It doesn't make any sense to me unless, for some bizarre reason, parametric yield problems increase super-exponentially as you increase density. Of course, maybe their flow encouraged non-regular layout, but I suspect that's probably not the main issue.
Well, considering that Larrabee is not a 'thoroughbred' GPU, but relies on software threads to implement certain functionality, I don't think the perf/mm^2 will be anywhere near optimal for Larrabee.
> That's a very good point. Architecturally, Larrabee isn't very efficient due to the fact that it's very general purpose. However, I bet that Intel can meet or surpass ATI's RV770 transistor density easily.

Nearly by definition, higher flexibility implies a larger ratio of control logic. While this should not be exaggerated, I think it would be hard to argue that it helps to improve density... (control logic is neither naturally regular nor easy to tweak manually at a fine level). I don't disagree with you at all that Intel has a natural advantage here, but they don't have everything going for them either.
> I think that they were just lazy because of no competition

Well, if you really wanted to be a nice guy, you could also argue they over-prioritized DX11 because of the Larrabee threat, as well as 40/45nm because of the short G80->65->45nm time gap. However, I'm not convinced they did that or got lazy. I prefer the simpler explanation that they just screwed up. At the same time, AMD came out with a wonderful incremental improvement.