#1 |
|
Member
Join Date: Jan 2009
Posts: 215
|
Has anyone experimented with C++ AMP yet?
I downloaded and installed the Visual Studio 11 beta and tried some simple examples. The code is certainly cleaner and simpler than, say, OpenCL, because a lot of the cruft is now implicit, though support for some advanced scheduling etc. seems less clear. Performance is decent; it requires a lot of manual kernel optimization, but that is no different from OpenCL. The API does not really introduce any new compute capabilities over, say, OpenCL, and it is limited to VS for the moment, but I do like that MS has put some thought into design and deployment:
1. Code is mostly statically compiled and guaranteed (almost) to work on any DX11 card, so the app vendor does not have to worry about whether or not the OpenCL driver is working etc.
2. TDRs (timeout detection and recovery) are a big problem for some OpenCL apps, with no good cross-vendor guarantees really. I have had my OpenCL apps crash because a kernel happened to run longer than, say, 2 seconds, invoking a TDR without the app being given a chance to respond. Given the huge variety of hardware out there, if I am distributing my app binary, making sure that my kernel does not invoke a TDR on any machine at all is very hard. The only way is to ask end users to disable TDR by editing the registry. In C++ AMP, if a TDR happens, an exception is thrown which your app can catch and respond to. On Windows 8, you will also get the ability to programmatically declare that you want a card all to yourself, disabling TDRs, as long as it is not connected to a display, for example. Finer-grained context switching is also coming in the future with WDDM 1.2.
3. You do get the benefits of integration into VS, which IMO is a great IDE.
However, the big downside is of course that it is new, with not much of a library ecosystem, and completely dependent on VS and Windows for now.
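To make the TDR point a bit more concrete, here is a rough sketch in plain, portable C++ (not the AMP API) of the usual workaround: split one long dispatch into short slices, and shrink the slice when the runtime reports a device reset. Everything here is an assumption for illustration: `device_reset` is a hypothetical stand-in for whatever exception the runtime throws (in C++ AMP that is, as far as I know, `accelerator_view_removed`), and `run_in_slices` is a made-up helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the exception a GPU runtime raises after a TDR.
struct device_reset : std::runtime_error {
    device_reset() : std::runtime_error("device reset (simulated TDR)") {}
};

// Runs dispatch(begin, end) over [0, total) in slices of at most slice_size
// elements. If a slice triggers a device_reset, the slice size is halved and
// the same range is retried, so dispatch must be safe to repeat on a range.
// Returns the number of slices that completed successfully.
std::size_t run_in_slices(std::size_t total, std::size_t slice_size,
                          const std::function<void(std::size_t, std::size_t)>& dispatch)
{
    assert(slice_size > 0);
    std::size_t completed = 0;
    std::size_t begin = 0;
    while (begin < total) {
        const std::size_t end = begin + std::min(slice_size, total - begin);
        try {
            dispatch(begin, end);
            ++completed;
            begin = end;
        } catch (const device_reset&) {
            if (slice_size == 1)
                throw;                           // cannot shrink any further
            slice_size = (slice_size + 1) / 2;   // retry with a shorter slice
        }
    }
    return completed;
}
```

In a real app the `dispatch` callback would launch the kernel for that sub-range and wait for it, and the starting slice size would have to be tuned per machine, which is exactly the part that is hard to guarantee up front.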
|
|
|
|
|
#2 |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
I've done a bit of work with it. I love it quite a bit, to be honest. It just fits in far more naturally than everything else I've tried.
What do you have in mind WRT the more advanced scheduling aspect? I don't think CL's:
enqueMadnessandDamnationOutOfOrderOutOfSightOutOfCharacters(CL_GL_KL_DOOT_DOOT_EXT_KHR_ATI_AMD_NVIDIA_APPLE_SGI_S3_SIS_PRINT_LULZ, *ptr_to_struct, *ptr_to_array_of_structs_of_ptrs_to_enums, ...)
queuing mechanism is notably more advanced, it's just annoyingly verbose (this seems to apply in general). You have control over how the submission queues attached to one accelerator_view or another are created (either immediate or deferred), and you have a few async ops that return convenient std::future<void>s (IIRC). There are also means to create markers in the API, for a bit of added oomph. Of course, the deferred queuing aspect is reliant on having deferred contexts working in D3D drivers, and that's not all that certain these days (I think NV are the only ones who do it).
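For anyone unsure what immediate versus deferred submission means in practice, here is a toy model in plain C++. It is only a sketch of the concept, not the AMP or D3D API: `queuing_mode`, `command_queue`, `submit` and `flush` are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names: this models the idea, not the real runtime.
enum class queuing_mode { immediate, deferred };

class command_queue {
public:
    explicit command_queue(queuing_mode mode) : mode_(mode) {}

    // Immediate queues run the command right away; deferred queues only
    // record it, so the runtime is free to batch the work later.
    void submit(std::function<void()> command)
    {
        assert(static_cast<bool>(command));
        if (mode_ == queuing_mode::immediate)
            command();
        else
            pending_.push_back(std::move(command));
    }

    // Runs every recorded command in submission order and returns how many
    // were executed. Does nothing for an immediate queue.
    std::size_t flush()
    {
        std::vector<std::function<void()>> batch;
        batch.swap(pending_);
        for (auto& command : batch)
            command();
        return batch.size();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    queuing_mode mode_;
    std::vector<std::function<void()>> pending_;
};
```

The trade-off is the usual one: immediate submission gives lower latency per command, while deferred submission lets the driver batch work at the cost of needing an explicit flush (or a wait on a future or marker) before results can be relied on.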
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
#3 |
|
Regular
|
What does AMP do that C++ bindings for OpenCL or AMD's static C++ for OpenCL do not do?
As long as AMP is Windows only, it's not going anywhere, is it?
__________________
Can it play WoW? |
|
|
|
|
|
#4 | ||
|
Moderator
Join Date: Feb 2002
Location: Taiwan
Posts: 2,347
|
Quote:
Quote:
|
||
|
|
|
|
|
#5 | |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
Quote:
As long as AMP is Windows only it's only going to be the choice for consumer apps, because it's much better to bet on MS actually putting something useful out and having some sort of direction, as opposed to hoping that the perpetual tug of war that is Khronos will eventually figure it out. That's a pretty big place to go to, IMHO, as it's the only other available slot. CUDA has the "pro" people: the ones who actually have money in this, not the ones who just need to get another paper or thesis out to finish their terms, but the ones who are ready to give an IHV a large wad of cash. It's not by accident, or for lack of people trying to use OCL.
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
|
#6 |
|
Member
Join Date: Dec 2009
Posts: 171
|
I am too... and this is exactly the reason I would pick OpenCL over C++ AMP: it is available only for Windows. How can you consider something portable when it is available for a single platform?
I agree with the "productive" argument: developing anything more complex than an OpenCL kernel a few lines long is very time consuming. OpenCL definitely needs to introduce a lot more features in this area. I have some doubts, though: GPU computing is driven by hardware, and MS has no control over that at all. AMD/Intel/Apple/IBM should have more interest in, and better tools for, pushing and innovating OpenCL.
|
|
|
|
|
#7 |
|
Member
|
The context in which portability was discussed was AMP vs AMD's C++ OCL bindings. Not AMP vs OCL.
|
|
|
|
|
|
#8 |
|
Junior Member
Join Date: Jan 2010
Posts: 57
|
Could you show some examples of code?
For instance, apply Gaussian Blur on an image or something. Thanks. |
|
|
|
|
|
#9 |
|
Moderator
Join Date: Feb 2002
Location: Taiwan
Posts: 2,347
|
There are several samples provided by Microsoft here:
http://blogs.msdn.com/b/nativeconcur...-download.aspx |
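In the meantime, for a rough idea of what a blur kernel has to compute per pixel, here is a plain CPU sketch of a separable Gaussian blur in standard C++. This is not taken from the Microsoft samples and is not AMP code; the function names (`gaussian_weights`, `blur_pass`, `gaussian_blur`), the single-channel float image layout and the clamp-to-edge border handling are all assumptions for the example. In an AMP version the body of the two inner pixel loops would roughly become the lambda passed to parallel_for_each.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized 1-D Gaussian weights for taps -radius..radius.
std::vector<float> gaussian_weights(int radius, float sigma)
{
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> w(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        w[i + radius] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += w[i + radius];
    }
    for (float& x : w)
        x /= sum;                       // weights now add up to 1
    return w;
}

// One pass along x (dx = 1, dy = 0) or along y (dx = 0, dy = 1).
// Reads outside the image are clamped to the nearest edge pixel.
std::vector<float> blur_pass(const std::vector<float>& src, int width, int height,
                             const std::vector<float>& w, int dx, int dy)
{
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int radius = static_cast<int>(w.size()) / 2;
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {   // each (x, y) would be one GPU thread
            float acc = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                const int sx = std::min(std::max(x + k * dx, 0), width - 1);
                const int sy = std::min(std::max(y + k * dy, 0), height - 1);
                acc += w[k + radius] * src[sy * width + sx];
            }
            dst[y * width + x] = acc;
        }
    }
    return dst;
}

// Full blur: a horizontal pass followed by a vertical pass.
std::vector<float> gaussian_blur(const std::vector<float>& src, int width, int height,
                                 int radius, float sigma)
{
    const std::vector<float> w = gaussian_weights(radius, sigma);
    return blur_pass(blur_pass(src, width, height, w, 1, 0), width, height, w, 0, 1);
}
```

Doing two 1-D passes instead of one 2-D pass cuts the work per pixel from (2r+1)^2 taps to 2(2r+1), which is why most GPU blur samples are structured this way.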
|
|
|
|
|
#10 |
|
Member
Join Date: Jan 2010
Posts: 375
|
I also have a short fragment comparing OpenMP vs. CR (Concurrency Runtime) vs. AMP:
Code:
/* forward/backward-z */
ULONG v = 0x00808000 | (TCOMPRESS_NINDEP(format) ? 0x00 : 0xFF);
ULONG c = (ULONG)(TCOMPRESS_SIDES (format) ? 0x00 : 0xFF) << 24;

#if defined(USE_AMP)
  /* C++ AMP: one GPU thread per texel */
  Concurrency::extent<1> num(texo.Width * texo.Height);
  Concurrency::array_view<ULONG, 1> sArr(num, sTex);

  Concurrency::parallel_for_each(num, [=](Concurrency::index<1> elm) restrict(amp) {
    ULONG t = sArr[elm] | c;

    /* black matte to forward-z */
    /**/ if (( t & 0x00FFFFFF) == 0)
      t = (t | v);
    /* white matte to forward-z (can't be done here, as white is valid for partial derivatives)
    else if ((~t & 0x00FFFFFF) == 0)
      t = (t & v); */

    sArr[elm] = t;
  });

  /* copy the results back into sTex explicitly */
  sArr.synchronize();
#elif defined(USE_CCR)
  /* Concurrency Runtime: CPU task per texel index */
  parallel_for(0, (int)(texo.Width * texo.Height), [=](int elm) {
    ULONG t = sTex[elm] | c;

    /* black matte to forward-z */
    /**/ if (( t & 0x00FFFFFF) == 0)
      t = (t | v);
    /* white matte to forward-z (can't be done here, as white is valid for partial derivatives)
    else if ((~t & 0x00FFFFFF) == 0)
      t = (t & v); */

    sTex[elm] = t;
  });
#else
  /* OpenMP: rows are distributed dynamically in chunks of 4 */
#pragma omp parallel for schedule(dynamic, 4) \
        shared(sTex)
  for (int y = 0; y < (int)texo.Height; y += 1) {
  for (int x = 0; x < (int)texo.Width ; x += 1) {
    ULONG t = sTex[(y * texo.Width) + x] | c;

    /* black matte to forward-z */
    /**/ if (( t & 0x00FFFFFF) == 0)
      t = (t | v);
    /* white matte to forward-z (can't be done here, as white is valid for partial derivatives)
    else if ((~t & 0x00FFFFFF) == 0)
      t = (t & v); */

    sTex[(y * texo.Width) + x] = t;
  }
  }
#endif
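Since the three branches repeat the same per-texel body, one option is to factor that body into a small function and keep a serial reference loop around for checking the parallel versions. The sketch below is my own assumption of how that could look, not part of the original fragment: the names are hypothetical, and `std::uint32_t` stands in for the Windows `ULONG` typedef so that it compiles anywhere. For the AMP branch, a helper like this would, as far as I know, additionally need a `restrict(cpu, amp)` qualifier to be callable from the kernel lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-texel body shared by all backends (hypothetical helper name).
// `v` and `c` are the two masks computed at the top of the fragment.
inline std::uint32_t black_matte_to_forward_z(std::uint32_t texel,
                                              std::uint32_t v, std::uint32_t c)
{
    std::uint32_t t = texel | c;
    if ((t & 0x00FFFFFFu) == 0)   // RGB is pure black: substitute the forward-z pattern
        t |= v;
    return t;
}

// Serial reference loop, useful for validating the parallel variants.
inline void fix_matte_serial(std::uint32_t* texels, std::size_t count,
                             std::uint32_t v, std::uint32_t c)
{
    assert(texels != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        texels[i] = black_matte_to_forward_z(texels[i], v, c);
}
```

Running the serial loop and one of the parallel branches on copies of the same texture and comparing the buffers is a cheap way to catch indexing mistakes when porting between the three.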
|
|
|
|
|
|
#11 |
|
Senior Member
|
__________________
The views presented here are my own and not my employer's. Last edited by rpg.314; 25-May-2012 at 02:34. Reason: Pay $500 for C++ AMP? |
|
|
|
|
|
#12 | |
|
Member
Join Date: Mar 2003
Location: Finland
Posts: 933
|
Quote:
__________________
Mikael Koskinen blog: .NET Programming, Windows Phone Development, Software Architecture |
|
|
|
|
|
|
#13 | |
|
Member
Join Date: Nov 2007
Posts: 938
|
Quote:
I haven't personally tested C++ AMP in Metro applications, but it sounds like a fascinating idea if it also works on devices with ARM CPUs (and mobile GPUs). Chips at the DX 9_3 feature level obviously cannot run AMP on the GPU, but mobile GPU developers have already announced DX 11 capable chips (the first ones should be released later this year; maybe we will have those at the Windows 8 launch). It would be interesting to see how ARM Mali-T604 and PowerVR Rogue compare against AMD, Intel and NVIDIA in GPU compute.
There's a chance that C++ AMP becomes an industry-wide standard. The restrict keyword has been sent to the standards committee (it's the only new language feature required for C++ AMP programs to work). AMD has helped Microsoft a lot with the C++ AMP implementation, and it is possible that they are interested in porting it to other platforms as well (just like they have been optimizing 7-zip and others with OpenCL and giving the results to the community for free).
|
|
|
|
|
|
#14 | ||
|
Registered
Join Date: Jun 2012
Posts: 1
|
Quote:
I'm about to embark on my first spot of GPGPU development (Monte Carlo playouts for a chess program) and am working out my weapon of choice. CUDA looked nice, but is obviously Nvidia GPU specific, so I was about to head down the OpenCL route when C++ AMP appeared on the horizon. It looks ideal for clean code, but I'm not keen to spend a bundle on VC++ if I don't have to, as this project is just a hobby experiment. I was concerned about the following: http://msdn.microsoft.com/en-us/libr...1461a(v=vs.110)
Quote:
Many thanks in advance, John |
||
|
|
|
|
|
#15 | |
|
Member
Join Date: Dec 2009
Posts: 171
|
Quote:
|
|
|
|
|
|
|
#16 |
|
Moderator
Join Date: Feb 2002
Location: Taiwan
Posts: 2,347
|
IIRC it's already the case for the VS Express beta (at least the last time I checked, which was last year). However, I still think this is a bad idea. Of course, they are going to keep the older VS Express around, but I don't understand why they are doing it this way.
Of course, w.r.t. C++ AMP, anyone can implement their own version in their own compiler.
|
|
|
|
|
#17 |
|
Junior Member
Join Date: Mar 2012
Location: cracks
Posts: 53
|
|
|
|
|
|
|
#18 | |
|
Moderator
Join Date: Feb 2002
Location: Taiwan
Posts: 2,347
|
Quote:
Of course, developer tools do cost serious money to make, but making them free serves an important purpose: to push your platform. Maybe Microsoft doesn't believe it needs the push anyway.
|
|
|
|
|
|
#19 |
|
Member
Join Date: Jan 2010
Posts: 375
|
If the DirectCompute compiler didn't kill itself trying to optimize the (valid, of course) AMP code I feed it, we might have had squish (the DXT library) as an AMP version today. But apparently I have to fight the compiler instead of moving on to make a BC6/7 compressor based on squish.
The compiler is currently very sensitive to template programming as well, and bails out with an "OMG too complex for me" error in a lot of cases. Additionally, it seems the DCC (DirectCompute compiler) part is multithreaded, and I guess because parts of it are in the driver stack it just saturates the machine at a non-responsive 100%. I always drop the priority of cl as soon as it appears in the Task Manager; otherwise I'm locked out of the machine for the time needed to compile. squish-amp hasn't actually been observed to finish compiling yet, though. %^)
It has its hiccups, and if the complexity/time problems are not addressed in the final product, it's just an unfulfilled promise of low-hanging TFLOPS, which you pay for with code migration (from smart to simple, from compact to duplicated, from C++ to HLSL, etc.), and which you may only get if you're persistent.
|
|
|
|
|
#20 |
|
Member
Join Date: Nov 2007
Posts: 938
|
For me it has been working well so far, but I haven't coded anything large with it, just a parallel prefix sum and a radix sorter. But I wouldn't be surprised if there are some bugs left as it's still beta after all.
Here is a link comparing DirectCompute to C++ AMP. It clearly shows that DirectCompute requires tons of boilerplate to get even a simple algorithm running. C++ AMP requires almost nothing in addition to the lambda that describes the kernel: http://blogs.msdn.com/cfs-file.ashx/...Programmer.pdf
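For readers who haven't met a parallel prefix sum before, here is the general shape of one, written as plain C++ so it runs anywhere. This is a sketch of the textbook step-doubling (Hillis-Steele) inclusive scan, not the code mentioned above; the function name is hypothetical, and the inner loop stands in for what would be one GPU thread per element with a barrier between steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum in log2(n) steps. In step k every element adds the
// value that sits 2^k positions to its left. Two buffers are used so that a
// step only reads values produced by the previous step.
std::vector<unsigned> inclusive_scan_steps(std::vector<unsigned> data)
{
    std::vector<unsigned> next(data.size());
    for (std::size_t offset = 1; offset < data.size(); offset *= 2) {
        for (std::size_t i = 0; i < data.size(); ++i)   // one "thread" per element
            next[i] = (i >= offset) ? data[i] + data[i - offset] : data[i];
        data.swap(next);                                // plays the role of the barrier
    }
    return data;
}
```

For example, the input {3, 1, 4, 1, 5} gives {3, 4, 8, 9, 14}. This variant does O(n log n) additions; work-efficient scans exist, but this one is the easiest to map onto a tile of threads, and a scan of per-digit counts is the core building block of a GPU radix sort.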
|
|
|
|
|
#21 |
|
Member
Join Date: Jan 2010
Posts: 375
|
What I found absolutely amazing is the ability to go from AMP to CPU. I wanted to debug the AMP code before the first run; sadly, there is no AMP debugger without W8. So, what to do? Implement software AMP!
Here's the amazing part:
Code:
#define tile_static

template<typename type>
class array_view {
public:
  int w, h; type *arr;
  array_view(int ww, int hh, type *aa) { w = ww; h = hh; arr = aa; }
  type& operator() (const int &x, const int &y) { return arr[y*w+x]; }
};

array_view<const unsigned int> sArr(iwidth, iheight, (const unsigned int *)texs);
array_view<      unsigned int> dArr(cwidth, cheight, (      unsigned int *)texr);

for (int groupsy = 0; groupsy < (owidth / TY); groupsy++)
for (int groupsx = 0; groupsx < (oheight / TX); groupsx++) {
  typedef type accu[DIM];

  // tile_static UTYPE bTex[2][TY][TX];
  tile_static type fTex[2][TY][TX][DIM];
  tile_static int iTex[2][TY][TX][DIM];

  for (int tiley = 0; tiley < TY; tiley++)
  for (int tilex = 0; tilex < TX; tilex++) {
    const ULONG &t = sArr(posy, posx);
Code:
#if defined(USE_AMP)
#define local_is(a,b) (elm.local == index<2>(a, b))
#else
#define local_is(a,b) ((ly == a) && (lx == b))
#endif
Too bad I don't see a way to look at the HLSL (or IL). I'd love to get a feeling for the number of instructions; I'm totally clueless as to the complexity of the produced CS. BTW, what happened with your BC6/7 efforts, sebbi?
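The general idea of running a tiled kernel on the CPU can be sketched like this. It is a minimal illustration under my own assumptions, not the code from the fragment above: `tiled_index2` and `for_each_tiled` are hypothetical names, and the loop nest simply walks tiles first and then the threads inside each tile, handing the kernel its global, local and tile coordinates.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU stand-in for a 2-D tiled thread index.
struct tiled_index2 {
    int global_y, global_x;   // position in the whole grid
    int local_y, local_x;     // position inside the tile
    int tile_y, tile_x;       // which tile this is
};

// Calls kernel(idx) once per grid cell, tile by tile. The grid size must be
// a multiple of the tile size, mirroring the usual GPU restriction.
template <int TY, int TX, typename Kernel>
void for_each_tiled(int height, int width, Kernel kernel)
{
    assert(height % TY == 0 && width % TX == 0);
    for (int ty = 0; ty < height / TY; ++ty)
        for (int tx = 0; tx < width / TX; ++tx)
            for (int ly = 0; ly < TY; ++ly)
                for (int lx = 0; lx < TX; ++lx)
                    kernel(tiled_index2{ty * TY + ly, tx * TX + lx, ly, lx, ty, tx});
}
```

One caveat with this approach: a plain loop runs the threads of a tile one after another, so a kernel that relies on a barrier in the middle has to be split into separate passes over the tile (one pass per phase between barriers) for the emulation to give the same results.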
|
|
|
|
|
#22 | |
|
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,354
|
Quote:
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do. |
|
|
|
|
|
|
#23 |
|
Member
Join Date: Aug 2004
Posts: 244
|
Are we seriously comparing a language-agnostic API (CUDA, OpenCL) to a combination of a language-specific library and a compiler-specific extension?
Positioning them as competitors is fallacious. Any hypothetical non-Windows implementation of C++ AMP would have to be implemented in terms of OpenCL or CUDA. Have a look at ScalaCL: it's a combination of a Scala library and a Scala compiler extension, acting as an abstraction layer over OpenCL. If you want something to compare C++ AMP with, knock yourself out. Similarly, you can't rely on C++ AMP if you are programming a game with a Linux dedicated server and want to use GPGPU for anything other than pixels.
Last edited by CouldntResist; 29-May-2012 at 22:58.
|
|
|
|
|
#24 | |||
|
Member
Join Date: Nov 2007
Posts: 938
|
Quote:
Quote:
Also, running any kind of GPGPU in a scalable server farm of virtual machines is currently very difficult. Most service providers do not offer any kind of GPGPU possibility, as GPU virtualization techniques are still very much under development. It will take several years (or maybe even a decade) until GPGPU becomes common in the server world (there are so many unsolved problems). Of course a big virtualized cloud of GPGPUs is a dream of many, but it will take time until we have solid products and service providers have modified their infrastructures. These changes cost a lot, and GPU programming is still a moving target. (Different kinds of CPUs are pretty much identical, so the actual hardware can easily be hidden behind virtualization.)
Quote:
I am not saying that C++ AMP is a perfect solution (especially if it does not get industry-wide support behind it). I'd rather see a similar proposal from the C++ standards committee. C++ AMP at least gives an example of a proper C++ based parallel API with full compiler and IDE support. I personally do not find anything in it that feels like a hack. It's simple, easy to read, uses C++ constructs for everything and requires no boilerplate. Have you even tried to code with it before criticizing it?
Last edited by sebbbi; 03-Jun-2012 at 11:23.
|||
|
|
|
|
|
#25 | |
|
Senior Member
|
Quote:
ScalaCL, as I understood it from a brief look, does not handle control flow. |
|
|
|
|