Just got back from IDC. According to the Intel guys there:
1) The LLC arrangement on Xeon Phi is not like the one in Intel's CPUs. The LLC (which is the L2 on Phi) is local to each core, so each core has only 512 kB of L2, not the 31 MB figure Intel promotes (that figure is the sum of all the per-core slices). Any cached data a core needs that is not in its local L2 has to be transferred into that local L2 before it can be accessed.
For comparison, GK110 has 1.5 MB of L2 cache, but it is a global cache, like the LLC on Ivy Bridge/Sandy Bridge CPUs, so its data is accessible to all GPU cores.
2) At least according to the Intel guys at IDC, Intel has no plans to introduce a programmable L1 cache (something like the software-managed shared memory on Nvidia GPUs) in future generations of MIC co-processors.
3) Xeon Phi's SIMD unit is more or less the same as Haswell's AVX2, just wider (512-bit vectors vs. 256-bit).
4) Unlike Hyper-Threading on CPUs, hardware multi-threading on MIC (4 threads per core) is essential for reaching peak performance.
5) The Intel guys here were very keen to promote MIC's programmability compared with Nvidia's offerings, but remained tight-lipped about how the two products compare on performance.
6) The card is likely to be cheaper than the K20/K20X, but it will not be sold at retail, only as part of a complete system solution. One company at IDC managed to pack four of these cards into a single case alongside dual-socket CPUs.
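To put items 1, 3 and 4 into numbers, here is a small back-of-the-envelope Python sketch. None of this came from the talk: the 61-core count, 4 threads per core and the 512-bit/256-bit vector widths are my own assumptions about Knights Corner and Haswell, and the helper names (aggregate_l2, lanes) are made up for illustration.

```python
# Back-of-the-envelope numbers for items 1, 3 and 4 above.
# ASSUMPTIONS (not from the talk): a 61-core Knights Corner part,
# 4 hardware threads per core, 512-bit vectors on Phi vs 256-bit AVX2 on Haswell.

KB = 1024
MB = 1024 * KB

PHI_CORES = 61                 # assumed core count; shipping SKUs have 57-61 cores
PHI_L2_PER_CORE = 512 * KB     # private L2 slice per core
PHI_THREADS_PER_CORE = 4       # hardware threads per core
GK110_L2 = 1536 * KB           # 1.5 MB, shared by the whole GPU


def aggregate_l2(cores: int, per_core: int) -> int:
    """Sum of the private L2 slices; no single core sees this much locally."""
    return cores * per_core


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements one SIMD register holds."""
    return vector_bits // element_bits


if __name__ == "__main__":
    total = aggregate_l2(PHI_CORES, PHI_L2_PER_CORE)
    print(f"Phi aggregate L2: {total / MB:.1f} MB, local to one core: {PHI_L2_PER_CORE // KB} KB")
    print(f"GK110 shared L2:  {GK110_L2 / MB:.1f} MB")
    print(f"float32 lanes per register: Phi {lanes(512, 32)}, AVX2 {lanes(256, 32)}")
    print(f"Hardware threads to keep busy: {PHI_CORES * PHI_THREADS_PER_CORE}")
```

With 61 cores the private slices add up to about 30.5 MB, which is roughly where the ~31 MB figure comes from, while any single core still only has 512 KB locally. Each 512-bit register holds 16 floats versus 8 for AVX2, and you need on the order of 244 threads to fill the chip.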