Qualcomm Roadmap (2011-2012)

The chairman of the OpenCL working group should advocate for OpenCL.
http://en.wikipedia.org/wiki/Neil_Trevett
He's also Vice President of Mobile Content at nVidia, so you would think he'd have both knowledge of and a say in Tegra. If he were merely nVidia's OpenCL guy, I could certainly see mixed messages coming from divisional infighting. But with a managing role on Tegra, he has to know what their true GPU compute plans for mobile are. When he says OpenCL will be nVidia's primary focus on mobile, I have to believe it's more likely that he's telling the truth rather than merely tooting OpenCL's horn for the sake of his Khronos role, or outright lying, since he'd know if the true strategy were to go all-in on CUDA.

However, for the work I do, OpenCL's dual-source model is fundamentally unworkable, so I hope CUDA comes to fruition on mobile devices.

Originally he said mobile would be all OpenCL, and he ended by qualifying that to OpenCL being the primary tool on mobile. I guess CUDA will be available on mobile too, but they won't overplay their hand by promoting it to mobile developers.

I've heard a few interesting ideas concerning the topic, but it's neither the time nor the place yet for details like that. The notion, though, that OpenCL will be the primary tool for mobile should be correct.
 
What do you mean by dual-source model? That you need two languages, OpenCL plus something else?

Yes, you need OpenCL plus host code written in C. OpenCL code is segregated from the rest of your application. I often write classes that have some methods written for the CPU, some for the GPU, and some for both. That kind of integration is not possible with OpenCL.
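To make "some methods for the CPU, some for the GPU, and some for both" concrete, here is a minimal single-source sketch that builds as plain C++ and, under nvcc, as CUDA. The class `Saxpy`, the macros `HOST`/`HOST_DEVICE`, and `saxpy_kernel` are hypothetical names chosen for illustration; only `__host__`, `__device__`, `__global__` and the `__CUDACC__` macro come from CUDA itself. In OpenCL 1.x the kernel body would instead live in a separate OpenCL C source string and could not call a C++ member function like `apply()`.

```cpp
#include <cstddef>
#include <vector>

// Under nvcc (__CUDACC__ defined) these expand to CUDA execution-space
// qualifiers; a plain C++ compiler sees empty macros, so the file builds either way.
#ifdef __CUDACC__
#define HOST __host__
#define HOST_DEVICE __host__ __device__
#else
#define HOST
#define HOST_DEVICE
#endif

// Hypothetical example class: one type, one source file, methods for different targets.
struct Saxpy {
    float a;

    // Usable from both CPU and GPU code: the per-element math is written once.
    HOST_DEVICE float apply(float x, float y) const { return a * x + y; }

    // CPU-only method: loops over host vectors, reusing apply().
    HOST void run_on_cpu(const std::vector<float>& x, std::vector<float>& y) const {
        for (std::size_t i = 0; i < x.size(); ++i) y[i] = apply(x[i], y[i]);
    }
};

#ifdef __CUDACC__
// GPU kernel reusing the same apply(); only compiled when building with nvcc.
__global__ void saxpy_kernel(Saxpy op, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = op.apply(x[i], y[i]);
}
#endif
```

The design choice here is that the shared arithmetic sits in one `HOST_DEVICE` method, so the CPU path and the GPU kernel cannot drift apart.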

There are several practical issues with OpenCL that keep me from using it:
1. It's too verbose. To check for error messages, for example, you have to load your OpenCL code from a file into a string, pass it to the OpenCL runtime, ask the runtime to compile it, ask the runtime how many bytes of error messages were generated, allocate a buffer large enough to hold them, ask the runtime to copy the error messages into your buffer, and then print the buffer. Every argument to a kernel has to be managed explicitly. I'm aware of the C++ bindings for OpenCL, and they improve things considerably, but even so it's just a lot more work to use OpenCL.

2. C++ templates are essential to me. They're not just for templating on datatype; they're also a mechanism for reusing parallel code by inserting your own operators. For example, a reduction operator can be inserted, with no runtime overhead, into a generic reduce function (see Thrust). This is hugely useful. I'm aware of Bolt, but Bolt merely replaces the standard C++ template engine with a custom one; using Bolt involves littering my code with unnecessary macros and relying on another, orthogonal template engine. Additionally, I need to be able to use template metaprogramming in my GPU code, and I'd prefer to do it in C++.
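A minimal sketch of the "insert your own operator" idea, assuming plain C++ templates: `generic_reduce`, `Plus`, and `Max` are hypothetical names, and the sequential loop stands in for the parallel reduction tree a GPU library would actually use. Because the operator is a template parameter rather than a function pointer, the compiler sees the concrete functor at each call site and can inline it. Thrust's `thrust::reduce(first, last, init, binary_op)` accepts a functor in the same spirit.

```cpp
#include <vector>

// Generic reduction templated on the element type T and the operator Op.
// Op is resolved at compile time, so op(acc, v) can be inlined with no
// indirect call. (A real GPU reduce would combine elements in a tree.)
template <typename T, typename Op>
T generic_reduce(const std::vector<T>& data, T init, Op op) {
    T acc = init;
    for (const T& v : data) acc = op(acc, v);
    return acc;
}

// User-supplied operators: plain functors with a templated operator().
struct Plus {
    template <typename T>
    T operator()(const T& a, const T& b) const { return a + b; }
};

struct Max {
    template <typename T>
    T operator()(const T& a, const T& b) const { return a < b ? b : a; }
};
```

Swapping `Plus{}` for `Max{}` reuses the same reduction code with a different operator, and the same functor works for `int`, `float`, or `double` without any macros.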
 
I remember thinking Thrust looked interesting, but I haven't used it. My understanding (probably from an AFDS video) is that Bolt looks like the STL. Maybe I'm being naive, but I figured it wouldn't be much more complicated than using the STL and thus wouldn't involve littering the existing code base.