LiXiangyang
Newcomer
Kepler's 30% advantage in flops is very tiny in the grand scheme of things.
More cache and branch prediction help all code.
MIC does not have any branch prediction.
MIC has 2x more cache per core than Kepler, which makes a big difference for everything.
In terms of FP32, it's 130% more rather than merely 30%+. To be honest, I was hoping MIC would perform better, given its obviously larger cache and supposedly better branch prediction.
However, in my experience so far, it doesn't. The reasons I suspect are:
1) Maybe 512-bit SIMD is too wide a pure vector unit even for most parallel tasks, so in most cases a large part of the SIMD unit sits idle.
2) I am not entirely sure whether Intel's SIMD can do FMA; if it cannot, that is another disadvantage compared to Kepler in handling matrix math.
3) Maybe the math operations I tested (radix sort, solving systems of linear equations, and other matrix-based operations) barely benefit from a large cache.
4) Due to the lack of out-of-order execution, Intel's MIC isn't much better than GK110 at handling complicated tasks.
5) Maybe the software support for MIC was not mature when I tested it. I only played with MIC for a very short time; however, I used MKL a lot, so it is unlikely that the code failed to take advantage of MIC.
6) Intel has a reason not to make MIC as good as it could be, since it isn't as profitable for Intel as their other products (Xeon). I suspect the main reason Intel is pushing MIC is to prevent Nvidia from growing too big to handle.
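
To put a rough number on point 1, here is a minimal sketch of how much of a 512-bit vector unit does useful work for a loop of a given length. It assumes 16 FP32 lanes per vector (512 / 32) and a simple model where the last, partially filled vector still costs a full vector operation. The function name lane_utilization is made up for illustration, and this is back-of-the-envelope arithmetic, not a measurement from MIC or Kepler.

```python
import math

def lane_utilization(trip_count: int, lanes: int = 16) -> float:
    """Fraction of SIMD lanes doing useful work for a loop with
    trip_count iterations, assuming every vector op costs the same
    whether or not all of its lanes are filled (a simplification)."""
    if trip_count <= 0:
        return 0.0
    # Number of vector operations needed, rounding the tail up.
    vector_ops = math.ceil(trip_count / lanes)
    return trip_count / (vector_ops * lanes)

if __name__ == "__main__":
    # 16 lanes = 512-bit vector / 32-bit floats (assumed width).
    for n in (4, 16, 20, 1000):
        print(f"trip count {n:5d}: {lane_utilization(n):.3f} of lanes busy")
```

Under this model, short or irregular inner loops leave most of the lanes empty, while long regular loops get close to full use, so the width only pays off when the work per loop is large and uniform.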