#26
Senior Member

No, that is not auto-vectorization. It's more like variable packing, which is even more fragile.

#27
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783

I just don't think it will help auto-vectorization.

I really don't follow your reasoning. Anything you write in a GPGPU language is basically just an implicit loop with independent iterations, right? So why not write that loop in plain C++, have the compiler detect that the iterations are independent, and then trivially vectorize it? A few keywords like 'restrict' or 'foreach' can help a lot with determining independence, but I don't think you need much else. Definitely not something as invasive and restrictive as C++ AMP.
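A minimal sketch of the "plain C++ loop" idea, assuming a compiler that accepts the non-standard `__restrict` extension (GCC, Clang and MSVC do; ISO C++ has no `restrict` keyword). The name `saxpy` is only an illustrative choice. Each iteration reads `x[i]` and `y[i]` and writes only `y[i]`, so the only thing that could make iterations dependent is overlap between `x` and `y`, and the qualifiers promise there is none.

```cpp
#include <cstddef>

// y[i] = a * x[i] + y[i] for every i.
// Iteration i touches only x[i] and y[i], so iterations are independent
// unless x and y overlap. __restrict (a compiler extension, not ISO C++)
// promises they don't, so an optimizing compiler can vectorize the loop
// without first emitting a runtime overlap check.
void saxpy(std::size_t n, float a,
           const float* __restrict x, float* __restrict y) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Nothing in the source forces vectorization; to check what happened, GCC's `-O3 -fopt-info-vec` and Clang's `-O3 -Rpass=loop-vectorize` report which loops were vectorized.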

#28
Senior Member

For me, auto-vectorization is the compiler detecting independent loop iterations and generating SSE/AVX code. Also, I don't consider these instructions to be vector instructions. So obviously, a standard float4 class is not going to help auto-vectorization.
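To make the distinction concrete, here is a hedged sketch; `float4` and `add_arrays` are hypothetical names, not any particular library's types. The `operator+` performs four identical scalar adds on the components of one value, and packing those into a single SSE instruction is straight-line packing (often called SLP, superword-level parallelism), which is closer to the "variable packing" mentioned in #26. The loop, by contrast, has independent iterations over `i`, which is what loop auto-vectorization exploits.

```cpp
#include <cstddef>

// Hypothetical float4 value type: four floats handled as one value.
struct float4 {
    float x, y, z, w;
};

// Four identical scalar adds on the components of a single value.
// A compiler may pack these into one 4-wide SIMD add (straight-line,
// SLP-style packing), but there is no loop here to vectorize.
inline float4 operator+(const float4& a, const float4& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Loop with independent iterations: this is what loop auto-vectorization
// targets. The compiler can process several values of i per instruction,
// and the element type does not need to be a vector type at all.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```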

#29
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783

I don't think explicit SPMD is a good idea. Compilation should never fail. The thing is, software development is getting harder every year, so we need all the help from compilers we can get, not compilers that make things more complicated. A large portion of developers will hardly care whether a loop was vectorized or not. Only when the performance doesn't meet the target do we need gentle tools to get the desired results.

GPGPU hasn't really taken off yet because (a) it's very time-consuming to learn it and then rewrite and tune your algorithms, and (b) there's a lot of fragmentation due to hardware-specific limitations/capabilities reflected in the languages/APIs, impeding a flourishing software economy. So there's a need to lower the bar and make things device-independent.

#30
Member
Join Date: Jan 2010
Posts: 375

The challenge for an auto-vectorizer is to take a blob of seemingly serial code, break it into independent (parallelizable) pieces, overlay those pieces, and try to find places where the same operations occur at the same point in the sequence of operands. Those operands can then be processed in a vector. In other words, auto-vectorization, the complex part, is fine-grained auto-parallelism of carefully aligned (matched) operations.

The loop case (and regular code as well, of course) becomes more complex when branches are introduced in the loop. Again, the majority of loops are not branch-free; they will have branches. The compiler then needs to be very clever to make the branches more or less independent of the operation streams that follow them, or to convert the branches into simple data moves. Sometimes that is not possible. If you ask for an auto-vectorizer that only handles branchless loops, I doubt anyone producing compilers sees much real use in only that. One has to offer the whole thing. The whole thing is easy in HLSL; it's hard in C++ when you think of all the additions (templates, operator overloading, custom types, etc.). Your branchless-loop case can easily be implemented in a vector library (Havok did this, very elegantly); there's no need to torture a compiler with it.
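Converting a branch into "simple data moves" is usually called if-conversion. The sketch below (the names `branchy` and `if_converted` are made up for illustration) evaluates both sides of the branch for every element and then selects one result per element; in vector code that select typically becomes a compare producing a lane mask plus a blend, instead of a jump. It is only valid when both sides are safe to evaluate unconditionally, which is one reason it is sometimes not possible.

```cpp
#include <cstddef>

// Original loop with a data-dependent branch in the body.
void branchy(const float* x, float* out, std::size_t n, float t) {
    for (std::size_t i = 0; i < n; ++i) {
        if (x[i] > t)
            out[i] = x[i] * 2.0f;
        else
            out[i] = x[i] + 1.0f;
    }
}

// If-converted form: both sides are computed for every element and the
// comparison result only selects which value is stored. In vector code the
// comparison becomes a lane mask and the select a blend, so all lanes follow
// the same instruction stream.
// This is legal here only because both sides are cheap and side-effect free;
// a side that could fault (division by zero, an out-of-bounds load) or that
// has side effects cannot simply be evaluated unconditionally.
void if_converted(const float* x, float* out, std::size_t n, float t) {
    for (std::size_t i = 0; i < n; ++i) {
        const float taken = x[i] * 2.0f;
        const float not_taken = x[i] + 1.0f;
        const bool mask = x[i] > t;
        out[i] = mask ? taken : not_taken;
    }
}
```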

#31
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783

I'm afraid rpg.314 is right that we don't have consistent terminology here. I fully realize that you can also take the kernel code, try to find sequences of identical operations, and put those in a vector. But in light of C++ AMP and how it can evolve back into plain C++, that is not what we're looking for.

#32
Senior Member

Glorified VLIW. Scatter/gather/predication.

#33
Senior Member

Writing SPMD code is no harder than ensuring that the loop you wrote is actually parallelizable. But for a compiler to verify that every loop suggested to be parallel actually is parallel is much harder.
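A small illustration of why the compiler's side is harder: whether the loop below has independent iterations depends on the pointer values passed at run time, not on the loop text. `scale_sequential` is the plain loop; `scale_independent` is a hypothetical helper that mimics "every iteration is independent" semantics (roughly what an SPMD kernel asserts) by reading every input before writing any output. They agree when the buffers don't overlap and disagree when they do, so a compiler that cannot prove non-overlap has to keep the loop scalar or guard a vectorized version with a runtime check.

```cpp
#include <cstddef>
#include <vector>

// Plain loop: iterations execute in order, so a write to dst[i] is visible
// to later reads of src[j] if the two ranges overlap.
void scale_sequential(float* dst, const float* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * 2.0f;
    }
}

// Mimics "every iteration is independent": all inputs are read before any
// output is written, as if all iterations ran at the same time.
void scale_independent(float* dst, const float* src, std::size_t n) {
    const std::vector<float> snapshot(src, src + n);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = snapshot[i] * 2.0f;
    }
}
```

For example, with a buffer `{1, 1, 1, 1, 1}`, `dst = buf + 1`, `src = buf` and `n = 4`, the sequential loop produces `{1, 2, 4, 8, 16}` while the independent version produces `{1, 2, 2, 2, 2}`.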

#34
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783

It's not VLIW at all; there's only one opcode. Perhaps it's glorified SIMD. In any case, it has the foundations for loop vectorization.
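As one concrete instance of the scatter/gather point: a loop with an indirect load, like the hypothetical `gather` function below, has independent iterations but non-contiguous addresses. Without a gather instruction, a vectorizer has to assemble each vector from scalar loads; with one (for example the AVX2 and AVX-512 gathers on x86), the compiler may vectorize the loop directly. Whether it actually does depends on the compiler, the target flags and its cost model.

```cpp
#include <cstddef>

// Indirect (gathered) load: out[i] = table[idx[i]].
// Assuming out does not overlap table or idx, iterations are independent
// (each writes only out[i]), but the loads come from arbitrary addresses,
// so a vector version needs a gather operation.
void gather(const float* table, const int* idx, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = table[idx[i]];
    }
}
```

The scatter direction (`out[idx[i]] = x[i]`) is harder to vectorize, because two lanes may target the same location and the sequential order of those writes then matters.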

#35
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783

Compilation of amp-restricted functions can fail for many reasons:
- There's no support for char or short types, and some bool limitations apply as well.
- There's no support for pointers to pointers.
- There's no support for pointers in compound types.
- There's no support for casting between integers and pointers.
- There's no support for bitfields.
- There's no support for variable argument functions.
- There's no support for virtual functions, function pointers, or recursion.
- There's no support for exceptions.
- There's no support for goto.

The list goes on, and there are also device-specific limitations on data size and such.

This fragmentation really isn't helping the adoption of general-purpose throughput computing. And an ecosystem in which code can be exchanged (commercially or otherwise) is close to non-existent. I can only see this changing for the better when the language has minimal restrictions (preferably none at all) and abstracts the device capabilities. Vendor lock-in isn't going to work anyway, and it's all evolving back to generic languages.

#36
Senior Member

Their amazing achievements notwithstanding, by refusing to change the language/programming model to match the evolution of hardware, we are back to automagical parallelization of generic C. There is no reason to believe that success in this regard will be any greater than what has been achieved so far, no matter who works on it.

JVM and CLR have seen far more revisions than vendor-neutral GPU compute has seen so far. I simply don't see how these assertions hold up against established facts.

#37
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783