Larrabee: 16 Cores, 2GHz, 150W, and more...

I believe that CSI uses PCIe 2.0 as its physical layer.

So hypothetically Intel can use either bus as is appropriate.
 
INKster: From the B3D news post: "Page 2 is much more interesting however, as they link to the presentation above and also uncover the hidden Larrabee PCB diagram on slide 16." - If you look closely, you'll notice that the PCB xtreview and tgdaily are pointing at is on that slide. It is perfectly legit.

What the slides imply is that there are two chips based on the Larrabee architecture, one aimed at the GPU/Gaming market and one aimed at the GPGPU/HPC market. The former has dedicated texture samplers & other 3D-specific hardware, and sports 16 cores. The latter sports 24 cores, but doesn't have those fixed-function units.

Unsurprisingly, the slides imply the GPU uses PCIe Gen2, while the HPC chip is based on CSI. There are other possible explanations, but this one seems the most likely to me... Also, Gesher is a traditional CPU, also known as Sandy Bridge, and it is Intel's next microarchitecture after Nehalem.

Gesher has absolutely nothing to do with a coprocessor. Larrabee could be considered a coprocessor, but it looks extremely general-purpose, so that might not be a good way to represent it. I'm not so sure what's so confusing in the presentation, really... I guess if you're assuming Polaris (aka the 80-core chip, also mistakenly called TeraScale) has anything to do directly with Larrabee and Gesher (it might influence future architectures, but that's all), then you might manage to confuse yourself, heh.
 
Ok, thx for the clarification Arun! Do you also know if Larrabee is like Cell (some general-purpose cores + simpler SPEs for SIMD math operations), or if it's composed of full general-purpose x86/x64 cores?

Does anybody know if Larrabee could be used with C/OpenMP, or will it need DirectX/OpenGL shaders, dedicated assembly, or an API to be programmed? One of the slides I saw indirectly mentions that parallel programming should be easy (and also shows a DVD with the Fortran compiler cover), so I bet it could use OpenMP (which is very well supported in Intel's compilers, btw), but that's just speculation!

I think Intel hired the Quake 4 raytracing guy to give them some feedback about it. If I could use C/OpenMP with enough cores and fast instructions, I could do raytracing (like the presentation says!) very fast on TeraScale! (Spatial structures are currently a pain on the GPU due to stackless restrictions, although DX10/Evans increased the instruction count and allows jumbo textures.)
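Purely to illustrate the C/OpenMP model I'm speculating about (nothing here is Larrabee-specific, and trace_ray is just a hypothetical stand-in for per-pixel work):

```c
/* Hypothetical sketch: a raytracer inner loop in plain C + OpenMP.
   trace_ray() is a made-up stand-in for whatever per-pixel work you'd do. */
static float trace_ray(int x, int y) { return (float)(x ^ y); /* dummy */ }

void render(float *image, int width, int height) {
    /* Each thread grabs rows dynamically; no shader language or driver API. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            image[y * width + x] = trace_ray(x, y);
}
```

Compile with something like gcc -fopenmp (or icc -openmp) and it should scale across however many cores the chip exposes.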

Another question... Intel's Geneseo (CSI?) is the Torrenza equivalent, isn't it? Can it be used like HT 3.0 to interconnect all these coprocessor cards without CPU intervention? If so, why is Intel going to use PCIe 2.0 for the Larrabee graphics card? Is it better?

And one more... The Larrabee graphics card slide shows two "BSI" connectors placed where graphics cards usually put their SLI connectors... so I assume they're for "SLI"... If the idea of HT 3.0 and Geneseo is to plug multiple cards into the motherboard as coprocessors and use the bus without CPU intervention, why do these connectors exist? Or perhaps they aren't for SLI after all...

How have NVIDIA and AMD reacted to this? Larrabee/Polaris/whatever sounds "dangerous" for them (although some sites mentioned NVIDIA was going to cooperate on the Larrabee design, lol!)

I know, too many questions... let's get Mr. Sylar to open up Intel's engineers' heads and see what's inside :LOL:
 
Wow, 32K L1 with 3-cycle latency. That's enormous compared with current GPGPUs. Is the L3 expected to run at full frequency? Hopefully Intel will stick with traditional aliasing logic, simply because of sheer strength.
 
Can anyone clarify precisely where the PDF hosted on xtreview.com came from, and maybe comment on its provenance?

Arun mentions that it was "uploaded" in April...

Was it a confidential briefing that has leaked or just one of those briefings that initially flew under the radar?
 
It should be no surprise what Intel is aiming for, and will continue to aim for:

circa 2+ years ago....

[Image: platform2015.jpg — Intel's Platform 2015 roadmap]



Intel's CMP architectures will also provide the essential special-purpose performance and adaptability that future platforms will require. In addition to general-purpose cores, Intel's chips will include specialized cores for various classes of computation, such as graphics, speech recognition algorithms and communication-protocol processing. Moreover, Intel will design processors that allow dynamic reconfiguration of the cores, interconnects and caches to meet diverse and changing requirements.

Special Purpose Hardware
Over time, important functions once relegated to software and specialized chips are typically absorbed into the microprocessor itself. Intel has been at the forefront of this effort, which has been the driving force behind our business model for over 35 years. By moving functions on chip, such capabilities benefit from more-efficient execution and superior economies of scale and reduce the power consumption drastically. Low latency communication between special purpose hardware and general purpose cores will be especially critical to meet future processor architecture performance and functionality expectations.

Special-purpose hardware is an important ingredient of Intel's future processor and platform architectures. Past examples include floating point math, graphics processing and network packet processing. Over the next several years, Intel processors will incorporate dedicated hardware for a wide variety of tasks. Possible candidates include: critical function blocks of radios for wireless networking; 3D graphics rendering; digital signal processing; advanced image processing; speech and handwriting recognition; advanced security, reliability and management; XML and other Internet protocol processing; data mining; and natural language processing.

http://www.intel.com/technology/magazine/computing/platform-2015-0305.htm
 
That's definitely on the early side for Larrabee, though the process tech should be ramping by then.

At that price point, I'd hope it would be the lower-end variant of Larrabee (if that's what Intel is going to use so soon), because the theoretical specs on Larrabee's peak execution rates would make it a worse undershoot than the "meh" R600.

If it did come out that early, it would be somewhat more favorable in comparison to GPUs that are likely to be out at the time, though it would need a process node advantage to hit a temporary parity.
 
Is there even a need to point out that a 49.5mm x 49.5mm die would be the most insane thing ever?

With a die size that large, the idea of having poor yields is more than a bit scary... Then again, the architecture could lend itself well to redundancy... so they'd be able to "recycle" dies by allocating them to lower-end parts...

Now, I wonder how much TSMC or UMC would charge a fabless semiconductor company for a die like that...lol... :oops:
 
It's been noted that the size on that slide is for the entire package, not just the chip.

I don't think Intel's standard fab lithography equipment has optical reticles that are wide enough for a chip that size.

Itanium is the biggest they make, and it's nowhere near that die size.
 
This looks extremely intriguing, for something like this to be coming from Intel.

Is this a smarter, more-mass-market-supportable Cell?

How much do they lose for being x86 versus RISC?

How much do they gain from apparently having more graphics support (texture samplers), and more lower-latency threads... (arguably easier to utilize than unrolled loops/SoA)?


How will Cell stack up against this? (Is Cell dead?)

Could this find its way into Xbox 720?

Will it have proper cache control instructions :)

How many registers will they have with the in-order cores (i.e. no register renaming?)...

I need to read through all this in more detail again.
 
Is this a smarter, more-mass-market-supportable Cell?
Even if it's not, the rate of evolution is potentially higher for Larrabee.
The volumes are likely higher, and Intel has more fab capacity to burn.

There are a number of unknowns, such as how Larrabee will work out in silicon.

How much do they lose for being x86 versus RISC?
No FMADD, though this isn't a big problem if the chip can sport a MUL and an ADD pipeline.
The caches will be leaned on more heavily than a RISC's would be, thanks to the reg/mem operands and small register file.
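To make the reg/mem point concrete (a hedged sketch, nothing Larrabee-specific), the same C statement compiles to different shapes on x86 versus a classic load/store RISC:

```c
/* Why x86 code leans on the cache: memory operands fold into arithmetic. */
float accumulate(const float *data, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += data[i];  /* x86: one addss xmm0, [mem] with the load folded in;
                            RISC: a separate load, then a reg-reg add */
    return sum;
}
```

With only 8 (or 16) architectural registers, spills and reloads like that hit the L1 constantly, which is presumably part of why the 3-cycle L1 matters so much.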

x86 at Larrabee's clock speed has already been done, so that's not a huge problem.
Aside from having to hassle with register pressure more, much of x86's complications amount to little more than a few extra pipeline stages on a simple in-order core, some extra hardware, and slightly higher power draw.

On that account, a few stages is not killer, Intel can manage larger dies, and the high-end Larrabee's target power draw is already declared to be equally high.

There's other awkwardness to the ISA, but the vector extensions have not been discussed, and they may be very significant.

How much do they gain from apparently having more graphics support (texture samplers), and more lower-latency threads... (arguably easier to utilize than unrolled loops/SoA)?
The graphics hardware would most likely keep Larrabee well ahead of Cell for graphics, and is about the only reason why it would be mentioned in the same paragraph as dedicated GPUs.

How will Cell stack up against this? (Is Cell dead?)
As a GPU, Cell is already a non-starter.

At 90nm, Cell is ~200 GFLOPS.
At 45nm, Larrabee is ~1 TFLOP. (The range given in the slides is VERY wide, 0.2-1 TFLOP.)
In an ideal world, a Cell design scaled without significant design changes would be around 800 (two full node shrinks from 90nm, roughly doubling each time: 200 → 400 → 800 GFLOPS).

However, Larrabee seems to be listed as having that massive throughput with DP precision, which is more than what Cell can do right now.

There are too many unknowns, given the wide range of possible clock speeds and core counts.
A future Cell was stated to be in the same neighborhood, though I don't think that was DP.

Will it have proper cache control instructions :)
The cache looks to be a very important design element. It seems likely that greater control over the caches will be present.
With proper control instructions, Larrabee might negate much of the advantage that the LS offers Cell.
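For reference, x86 already exposes a little software cache control through SSE intrinsics; whether Larrabee strengthens this is unknown, so the following is just a sketch of the kind that exists today:

```c
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Sketch using cache control x86 already has: prefetch a fixed distance
   ahead so data is (hopefully) resident by the time the loop reaches it.
   The distance of 64 elements is an arbitrary, illustrative choice. */
float sum_with_prefetch(const float *data, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        if (i + 64 < n)
            _mm_prefetch((const char *)&data[i + 64], _MM_HINT_T0);
        sum += data[i];
    }
    return sum;
}
```

If Larrabee adds stronger variants (pinning lines, explicit evictions, streaming loads), that's when the cache starts to look like a more forgiving local store.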

I'm still unsure of the exact arrangement of the caches. Someone said the L1 was write-through, which would be painful for a shared L2 cache. I'm not clear on how the L2 is distributed.

What is still not mentioned is a DMA engine or other mechanisms for bringing in batches of data.

How many registers will they have with the in-order cores (i.e. no register renaming?)...
If working from x86, it's 8 GP registers and 8 SSE.
x86-64 is 16 of each.

Larrabee has been characterized as having a 512-bit vector FPU, which is 4 times the width of current SSE.

The number of registers, however, is still at most 16 unless they get Larrabee to run on a modified subset of x86.
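Since the vector extensions aren't public, here is nothing more than a plain-C stand-in for what one 512-bit operation would cover: 512 / 32 = 16 single-precision lanes per instruction, versus 4 for SSE. That width also makes the headline figures self-consistent, if you assume separate MUL and ADD pipes: 16 cores × 2 GHz × 16 lanes × 2 ops ≈ 1 TFLOP.

```c
/* Illustrative only: Larrabee's vector ISA is not public. A 512-bit
   register holds 512 / 32 = 16 floats, so a single vector op written
   out in plain C touches 16 lanes at once (SSE touches 4). */
void vmadd16(const float a[16], const float b[16], float c[16]) {
    for (int lane = 0; lane < 16; lane++)
        c[lane] += a[lane] * b[lane];
}
```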
 
That's definitely on the early side for Larrabee, though the process tech should be ramping by then.

At that price point, I'd hope it would be the lower-end variant of Larrabee (if that's what Intel is going to use so soon), because the theoretical specs on Larrabee's peak execution rates would make it a worse undershoot than the "meh" R600.

If it did come out that early, it would be somewhat more favorable in comparison to GPUs that are likely to be out at the time, though it would need a process node advantage to hit a temporary parity.

What makes you think that Larrabee is necessarily the only arrow in their discrete quiver? Surely that time frame would indeed suggest otherwise.
 
I don't know if Larrabee will be the initial discrete product, and it does sound early.

My intended statement was that Larrabee would be more impressive compared to GPUs of that time frame as opposed to GPUs that would be coming out later.

If it came out later, it would be running up against more powerful GPUs, and it would have a harder time making an impact in the discrete GPU space.

To complicate matters, if Intel's first product is very different from Larrabee, then Intel will have to deal with developmental whiplash when Larrabee is released.
 
To complicate matters, if Intel's first product is very different from Larrabee, then Intel will have to deal with developmental whiplash when Larrabee is released.

Mmm, maybe. It depends on the characteristics of that product, and whether they match the current competitors closely or amount to a third model that is neither the current IHVs' approach nor Larrabee.

But there might be other "goods" that they might see as more important. For instance, are they going to want AIB participation? If not in North America (and maybe there too), at least in Europe and Asia? AIBs probably have much better relationships with retailers, both e-tail and brick-and-mortar, for introducing a new graphics product. Also, if they are suddenly going to need oodles of graphics memory, as we've seen reported, wouldn't you want those relationships to get going and become solidified? How about PCB manufacture and assembly?

I just look at that date and wonder if they might have a 'tweener discrete strategy in mind to help them ramp up in some of these areas before "the main show" with Larrabee debuts.
 