Originally Posted by Nick
All you need is a compiler hint that you're expecting a certain loop to be vectorized, and if it isn't then a descriptive warning should be generated (the same way __forceinline works).
What you are thinking of is only the low-hanging fruit. Loops don't make up the majority of the code that should be auto-vectorized. In fact, HLSL for example had no loops at all until the higher shader models.
The challenge for an auto-vectorizer is to look at a blob of seemingly serial code, break it into independent (parallelizable) pieces, overlay those pieces, and find out whether the same operation occurs at the same point in each sequence. The operations that match up can then be executed together as one vector operation.
That said, the complex part of auto-vectorization is fine-grained auto-parallelization of carefully aligned (matched) operations.
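To make the "overlay and match" idea concrete, here is a rough sketch in plain C++. The function names and the Lanes type are made up for illustration, and the packed version is written by hand; it only shows the kind of result a vectorizer would have to discover on its own from the scalar version, not what any particular compiler actually emits.

```cpp
#include <array>
#include <cassert>

// Hypothetical straight-line scalar code: no loop in sight, and the four
// independent chains (multiply, then add) are interleaved.
void transform_scalar(const float in[4], float scale, float bias, float out[4]) {
    float t0 = in[0] * scale;
    float t1 = in[1] * scale;
    out[0] = t0 + bias;
    float t2 = in[2] * scale;
    out[1] = t1 + bias;
    float t3 = in[3] * scale;
    out[2] = t2 + bias;
    out[3] = t3 + bias;
}

// Four lanes standing in for one SIMD register (illustrative type only).
using Lanes = std::array<float, 4>;

// One multiply applied to all four lanes.
Lanes mul(const Lanes& v, float s) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = v[i] * s;
    return r;
}

// One add applied to all four lanes.
Lanes add(const Lanes& v, float s) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = v[i] + s;
    return r;
}

// The same work after overlaying the four chains: every chain does a
// multiply at step 1 and an add at step 2, so each step is one lane-wise op.
void transform_packed(const float in[4], float scale, float bias, float out[4]) {
    Lanes v = {in[0], in[1], in[2], in[3]};
    Lanes r = add(mul(v, scale), bias);
    for (int i = 0; i < 4; ++i) out[i] = r[i];
}

// Sanity check that both forms agree on one sample input.
void check_forms_agree() {
    const float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float a[4], b[4];
    transform_scalar(in, 2.0f, 0.5f, a);
    transform_packed(in, 2.0f, 0.5f, b);
    for (int i = 0; i < 4; ++i) assert(a[i] == b[i]);
}
```

If one of the four chains did a subtract instead of an add at step 2, the operations would no longer match at that point, and the packing would only partly work or not at all.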
The loop case (and regular code as well, of course) also gets more complex once there are branches inside the loop. Again, most loops are not branch-free; they will have branches. The compiler then needs to be very clever to work out whether the operation streams after a branch can be made more or less independent of it, or whether the branch can be converted into simple data moves. Sometimes that is not possible.
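A small sketch of what "converting a branch into a data move" means, again in plain C++ with made-up function names. The second loop is the if-converted form written by hand: both sides are computed and a per-element condition selects the result. Whether a given compiler performs this transformation on the first loop is not something this example shows.

```cpp
#include <cassert>
#include <cstddef>

// Branchy form: control flow depends on the data in every iteration.
void scale_or_zero_branchy(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (in[i] < 0.0f) {
            out[i] = 0.0f;
        } else {
            out[i] = in[i] * 2.0f;
        }
    }
}

// If-converted form: compute both candidate values, then pick one with a
// mask. The select is a pure data move, so every iteration runs the same
// sequence of operations and the lanes can be overlaid.
void scale_or_zero_select(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float if_negative = 0.0f;
        const float otherwise = in[i] * 2.0f;
        const bool mask = in[i] < 0.0f;
        out[i] = mask ? if_negative : otherwise;
    }
}

// Sanity check that both forms agree on one sample input.
void check_select_matches_branch() {
    const float in[4] = {-1.0f, 2.0f, -3.0f, 4.0f};
    float a[4], b[4];
    scale_or_zero_branchy(in, a, 4);
    scale_or_zero_select(in, b, 4);
    for (int i = 0; i < 4; ++i) assert(a[i] == b[i]);
}
```

This only works because both sides are cheap and harmless to evaluate unconditionally. If one side calls a function with side effects, or does something the condition was guarding against (a pointer dereference, say), the compiler cannot simply compute both and select.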
If you ask for an auto-vectorizer that only handles branchless loops, I doubt any compiler vendor sees real use in just that. One has to offer the whole thing. The whole thing is easy in HLSL; it's hard in C++ when you think of all the additions (templates, operator overloading, custom types, etc.).
Your loop-without-branches case can easily be implemented in a vector library (Havok did this, very elegantly); there's no need to torture a compiler with it.
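For illustration, a minimal sketch of the vector-library approach. This is not Havok's API; Vec4, load, store and multiply_add are hypothetical names, and the lanes are a plain float array so the example stays portable. A real library would presumably wrap the platform's SIMD type behind the same kind of interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-wide vector type. The lane storage is a plain array here;
// this is an assumption for portability, not how a production library does it.
struct Vec4 {
    float lane[4];

    static Vec4 load(const float* p) {
        return Vec4{{p[0], p[1], p[2], p[3]}};
    }

    void store(float* p) const {
        for (int i = 0; i < 4; ++i) p[i] = lane[i];
    }
};

// Lane-wise addition.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Lane-wise multiplication.
inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// The branchless loop, vectorized explicitly by the programmer:
// y[i] = a[i] * x[i] + y[i], four elements per iteration.
// Assumes n is a multiple of 4 to keep the sketch short.
void multiply_add(const float* a, const float* x, float* y, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        const Vec4 r = Vec4::load(a + i) * Vec4::load(x + i) + Vec4::load(y + i);
        r.store(y + i);
    }
}
```

The loop body reads almost like the scalar version, and the vectorization is guaranteed by construction instead of depending on what the compiler manages to prove.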