Is this a smarter, more mass-market-supportable Cell?
Even if it's not, the rate of evolution is potentially higher for Larrabee.
The volumes are likely higher, and Intel has more fab capacity to burn.
There are a number of unknowns, such as how Larrabee will work out in silicon.
How much do they lose for being x86 versus RISC?
No FMADD, though this isn't a big problem if the chip can sport a MUL and an ADD pipeline.
The caches will be leaned on more heavily than they would be on a RISC design, thanks to the reg/mem operands and small register file.
x86 at Larrabee's clock speed has already been done, so that's not a huge problem.
Aside from having to hassle with register pressure more, much of x86's complications amount to little more than a few extra pipeline stages on a simple in-order core, some extra hardware, and slightly higher power draw.
On that account, a few extra stages are not a killer, Intel can manage larger dies, and the high-end Larrabee's target power draw is already declared to be equally high.
There's other awkwardness to the ISA, but the vector extensions have not been discussed, and they may be very significant.
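As a rough sketch of the FMADD point above: in peak terms, a fused multiply-add unit and a separate MUL pipe plus ADD pipe both come out to two flops per lane per cycle. This assumes both pipes can issue a full-width operation every cycle, which is my assumption and not something stated for Larrabee. The function name and the 4-lane width (single precision in a 128-bit SSE register) are only for illustration.

```python
def peak_flops_per_cycle(lanes, fmadd_units=0, mul_pipes=0, add_pipes=0):
    """Peak flops per cycle for one core (illustrative model only).

    A fused multiply-add counts as 2 flops per lane; a separate multiply
    or add pipe counts as 1 flop per lane. Assumes every pipe issues a
    full-width vector operation each cycle.
    """
    return lanes * (2 * fmadd_units + mul_pipes + add_pipes)


SSE_SP_LANES = 128 // 32  # 4 single-precision lanes in a 128-bit register

fused = peak_flops_per_cycle(SSE_SP_LANES, fmadd_units=1)
split = peak_flops_per_cycle(SSE_SP_LANES, mul_pipes=1, add_pipes=1)
print(fused, split)  # 8 8
```

The peaks match, but the split version needs two instructions issued per cycle to get there instead of one.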
How much do they gain from apparently having more graphics support (texture samplers) and more lower-latency threads (arguably easier to utilize than unrolled loops/SOA)?
The graphics hardware would most likely keep Larrabee well ahead of Cell for graphics, and is about the only reason why it would be mentioned in the same paragraph as dedicated GPUs.
How will Cell stack up against this (is Cell dead?)
As a GPU, Cell is already a non-starter.
At 90nm Cell is ~200 gflops.
At 45nm Larrabee is ~1 tflop. (The range given in the slides is VERY wide, 0.2-1 tflop)
In an ideal world a Cell design scaled to 45nm without significant design changes would be around 800 gflops.
However, Larrabee seems to be listed as having that massive throughput at DP, which is more than what Cell can do right now.
There are too many unknowns, given the wide range of possible clock speeds and core counts.
A future Cell was stated to be in the same neighborhood, though I don't think that was DP.
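The ~800 figure is just ideal area scaling: going from 90nm to 45nm halves the feature size, so roughly four times as many transistors fit in the same area. A quick sketch, assuming the clock stays the same and all the extra area goes to more of the same execution units (neither is guaranteed, and the function name is mine):

```python
def ideal_scaled_gflops(base_gflops, base_nm, target_nm):
    """Scale peak throughput by the ideal transistor-density gain.

    Density is assumed to grow with the square of the feature-size ratio,
    with clock speed unchanged and no design changes.
    """
    density_gain = (base_nm / target_nm) ** 2
    return base_gflops * density_gain


print(ideal_scaled_gflops(200, 90, 45))  # 800.0
```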
Will it have proper cache control instructions?
The cache looks to be a very important design element. It seems likely that greater control will be present for caches.
With proper controlling instructions, Larrabee might negate many of the advantages that the LS offers Cell.
I'm still unsure of the exact arrangement of the caches. Someone said the L1 was write-through, which would be painful for a shared L2 cache. I'm not clear how the L2 is distributed.
What is still not mentioned is a DMA engine or other mechanisms for bringing in batches of data.
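To put a number on why a write-through L1 would hurt a shared L2: every store goes on to the L2, while a write-back L1 only sends each dirty line once, when it is evicted. A toy count, assuming 64-byte lines and an L1 big enough to hold the whole working set (both are my assumptions, not Larrabee specifics):

```python
LINE_BYTES = 64  # assumed cache line size, not a confirmed Larrabee figure


def l2_writes_write_through(store_addresses):
    """Write-through: every store is forwarded to the L2."""
    return len(store_addresses)


def l2_writes_write_back(store_addresses, line_bytes=LINE_BYTES):
    """Write-back (idealised): each dirty line reaches the L2 once, on eviction."""
    return len({addr // line_bytes for addr in store_addresses})


# 64 consecutive 4-byte stores, covering 256 bytes (4 lines of 64 bytes).
stores = [i * 4 for i in range(64)]
print(l2_writes_write_through(stores))  # 64
print(l2_writes_write_back(stores))     # 4
```

A write-combining buffer in front of the L2 would narrow the gap, so treat this as a worst case.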
How many registers will they have with the in-order cores (i.e. no register renaming?)
If working from x86, it's 8 GP registers and 8 SSE.
x86-64 is 16 of each.
Larrabee has been characterized as having a 512-bit vector FPU, which is 4 times the width of current SSE.
The number of registers, however, is still at most 16 unless they get Larrabee to run on a modified subset of x86.
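For scale, here is the arithmetic behind the width and register numbers above, assuming the 512-bit registers simply take the place of the 128-bit SSE ones and the count stays at 16 (the helper names are made up for illustration):

```python
def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def register_file_bytes(num_registers, vector_bits):
    """Total architectural vector register storage in bytes."""
    return num_registers * vector_bits // 8


print(lanes(128, 32), lanes(512, 32), lanes(512, 64))  # 4 16 8
print(register_file_bytes(8, 128))    # 128  (x86 SSE)
print(register_file_bytes(16, 128))   # 256  (x86-64 SSE)
print(register_file_bytes(16, 512))   # 1024 (16 registers at 512 bits)
```

So even with only 16 register names, the wider registers would hold four times the data of x86-64 SSE.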