LiXiangyang
Newcomer
Yeah, ISPC is clearly better, and it's better than the GPU computing languages too. If the whole persistent-threads research has shown us anything, it's that virtualizing (vs. parameterizing) the SIMD width is far too harmful to the performance of non-trivial kernels.
Unfortunately, HPC and other parties have too much sway as far as the standards go, and they are completely in the land of non-coding physicists who just want to trust compiler magic to get them a 2x even on 8+-wide SIMD... and again, I speak from experience on that as someone who has rewritten a lot of scientist code.
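For anyone unsure what "parameterizing" the SIMD width means in the quote above, here is a minimal sketch in plain C++ (not ISPC). The names `sum_lanes` and `W` are made up for illustration, and this makes no claim about actual speedups. The idea is only that the kernel author sees the width as a compile-time constant and can lay out per-lane state and the cross-lane reduction around it, rather than having the width hidden behind a virtual "one thread per element" model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: W is the SIMD width, exposed to the kernel
// as a compile-time parameter instead of being hidden from it.
template <std::size_t W>
float sum_lanes(const float* data, std::size_t n) {
    static_assert(W > 0, "SIMD width must be positive");
    assert(data != nullptr || n == 0);

    float acc[W] = {};  // one partial sum per lane

    // Main loop: process W elements per iteration, one per lane.
    // With W known at compile time, a compiler can map this inner loop
    // onto a single vector add (whether it does is up to the compiler).
    std::size_t i = 0;
    for (; i + W <= n; i += W) {
        for (std::size_t lane = 0; lane < W; ++lane) {
            acc[lane] += data[i + lane];
        }
    }

    // Explicit cross-lane reduction, written with knowledge of W.
    float total = 0.0f;
    for (std::size_t lane = 0; lane < W; ++lane) {
        total += acc[lane];
    }

    // Scalar tail for the leftover n % W elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The same kernel can then be instantiated per target, e.g. `sum_lanes<4>(v.data(), v.size())` for a 4-wide unit and `sum_lanes<8>(...)` for an 8-wide one, which is roughly what tuning for a parameterized width looks like.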
Very interesting, it does look like CUDA on CPU.
How much performance gain did you see in your application compared to pthreads/OpenMP with the standard Intel compiler optimizations and auto-vectorization on? Thanks.