C++ AMP anyone?

Discussion in 'GPGPU Technology & Programming' started by codedivine, May 17, 2012.

  1. imaxx

    Newcomer

    Joined:
    Mar 9, 2012
    Messages:
    131
    Likes Received:
    1
    Location:
    cracks
    ...because once upon a time (early '90s), you had to pay for all compilers - MS included.
    MS lost half of its market value (Ballmer rulez), so they're probably trying to go the greedy way...
    pecunia non olet (money doesn't stink)
     
  2. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,824
    Likes Received:
    253
    Location:
    Taiwan
    Well, AFAIK gcc has always been free.

    Of course, developer tools do cost serious money to make, but making them free serves an important purpose: to push your platform. Maybe Microsoft doesn't believe they need the push anyway :)
     
  3. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Well, I guess it's more accurate to say they want to push Metro instead of desktop apps, given that VC Express 11 will support only Metro apps.
     
  4. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Incorrect. C++ AMP also works fine in the free Visual Studio Express version (Metro is compatible with C++ AMP, assuming the documentation is correct). You can't use the free toolset to create traditional Windows applications anymore (but C++ AMP is not on the chopping block). I don't know if this is the proper thread to discuss Microsoft's policies on the free Visual Studio Express; this thread is about C++ AMP. I dislike the limitations in the free version as much as anyone else here. Maybe there should be a separate thread to discuss the issue of Microsoft cutting off standard C/C++ development from the free development tools?

    I haven't personally tested C++ AMP in Metro applications, but it sounds like a fascinating idea, if it also works on devices with ARM CPUs (and mobile GPUs). Chips limited to DX feature level 9_3 obviously cannot run AMP on the GPU, but mobile GPU developers have already announced DX 11 capable chips (the first ones should be released later this year, so maybe we'll have those at the Windows 8 launch). It would be interesting to see how ARM Mali-T604 and PowerVR Rogue compare against AMD, Intel and NVidia in GPU compute.

    There's a chance that C++ AMP becomes an industry-wide standard. The restrict keyword has been sent to the standards committee (it's the only new language feature required for C++ AMP programs to work). AMD has helped Microsoft a lot with the C++ AMP implementation, and it's possible that they are interested in porting it to other platforms as well (just like they have been optimizing 7-zip and others with OpenCL and giving them to the community for free).
     
  5. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    869
    Likes Received:
    277
    If the DirectCompute compiler didn't kill itself trying to optimize the (valid, ofc) AMP code I feed it, we might have had an AMP version of squish (the DXT library) today. But apparently I have to fight the compiler instead of moving on to make a BC6/7 compressor based on squish. :)

    The compiler is currently very sensitive to template programming as well, and bails out with an "OMG too complex for me" error in a lot of cases.

    Additionally, it seems the DCC (DirectCompute compiler) part is multithreaded, and I guess because parts of it are in the driver stack it just saturates the machine at a non-responsive 100%. I always drop the priority of cl when it appears in Task Manager, otherwise I'm locked out of the machine for the time needed to compile. squish-amp hasn't been observed to finish compiling yet, though. %^)

    It has its hiccups, and if the complexity/time problems are not addressed in the final product, it's just an unfulfilled promise of low-hanging TFLOPS, which you pay for with code migration (from smart to simple, from compact to duplicated, from C++ to HLSL, etc.), and which you only get, possibly, if you're persistent.
     
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    For me it has been working well so far, but I haven't coded anything large with it, just a parallel prefix sum and a radix sorter. But I wouldn't be surprised if there are some bugs left as it's still beta after all.

    Here's a link comparing DirectCompute to C++ AMP. It clearly shows that DirectCompute requires tons of boilerplate to get even a simple algorithm running. C++ AMP requires almost nothing in addition to the lambda that describes the kernel:
    http://blogs.msdn.com/cfs-file.ashx...B00_-AMP-for-the-DirectCompute-Programmer.pdf
     
  7. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    869
    Likes Received:
    277
    What I found absolutely amazing is the ability to go from AMP to CPU. I wanted to debug the AMP code before the first run; sadly, there's no AMP debugger without W8. So, what to do? Implement software-AMP! :lol:

    Here's the amazing part:
    Code:
    #define    tile_static
    
        template<typename type>
        class array_view {
        public:
          int w, h; type *arr;
          array_view(int ww, int hh, type *aa) { w = ww; h = hh; arr = aa; }
          type& operator() (const int &x, const int &y) { return arr[y*w+x]; }
        };
    
        array_view<const unsigned int> sArr(iwidth, iheight, (const unsigned int *)texs);
        array_view<      unsigned int> dArr(cwidth, cheight, (      unsigned int *)texr);
    
        for (int groupsy = 0; groupsy < (owidth  / TY); groupsy++)
        for (int groupsx = 0; groupsx < (oheight / TX); groupsx++) {
          typedef type accu[DIM];
    
    //    tile_static UTYPE bTex[2][TY][TX];
          tile_static type  fTex[2][TY][TX][DIM];
          tile_static int   iTex[2][TY][TX][DIM];
          
          for (int tiley = 0; tiley < TY; tiley++)
          for (int tilex = 0; tilex < TX; tilex++) {
            const ULONG &t = sArr(posy, posx);
    
    And a few of those:

    Code:
    #if    defined(USE_AMP)
    #define local_is(a,b) elm.local == index<2>(a, b)
    #else
    #define local_is(a,b) ((ly == a) && (lx == b))
    #endif
    
    In 5 minutes I had an AMP-to-CPU switch. Now I'm going to proof-code squish (which I actually got to finish compiling now, without the "iterative search" option, though it takes 15 minutes) and change all the things I should adjust for the sake of the GPU: shorten 4-wide SSE vectors to 3-wide, and so on. It can/should already run on 3 threads of 16. It's weird but fun to think the way you have to think for compute.
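
    For illustration, here is a minimal, self-contained sketch of the kind of CPU fallback described above. It is not the code from this post: `cpu_array_view`, `tile_sums` and the 2x2 tile size are hypothetical, and the only ideas carried over are the empty `tile_static` define, an array-view stand-in and the CPU branch of `local_is`. It assumes the image dimensions are multiples of the tile size.

```cpp
#include <cassert>

// On the CPU there is no tile memory, so the keyword expands to nothing.
#define tile_static

// CPU branch of the local_is macro: ly/lx are the tile-local loop variables.
#define local_is(a, b) ((ly == (a)) && (lx == (b)))

// Hypothetical stand-in for a 2D array view over caller-owned memory.
template <typename T>
class cpu_array_view {
public:
    cpu_array_view(int width, int height, T* data)
        : w(width), h(height), arr(data) {}
    // Row (y) first, then column (x).
    T& operator()(int y, int x) const { return arr[y * w + x]; }
    int w, h;
private:
    T* arr;
};

const int TY = 2;  // tile height (arbitrary for this sketch)
const int TX = 2;  // tile width

// Writes the sum of every TY x TX tile of src into one element of dst.
// The two outer loops replace the tile grid, the inner loops the threads.
void tile_sums(const cpu_array_view<const int>& src,
               const cpu_array_view<int>& dst) {
    for (int gy = 0; gy < src.h / TY; ++gy)
    for (int gx = 0; gx < src.w / TX; ++gx) {
        tile_static int cache[TY][TX];

        // Phase 1: every "thread" loads one element into the tile cache.
        for (int ly = 0; ly < TY; ++ly)
        for (int lx = 0; lx < TX; ++lx)
            cache[ly][lx] = src(gy * TY + ly, gx * TX + lx);

        // Finishing the loop nest above acts as the tile barrier.

        // Phase 2: only thread (0, 0) reduces the cache and writes out.
        for (int ly = 0; ly < TY; ++ly)
        for (int lx = 0; lx < TX; ++lx) {
            if (!local_is(0, 0)) continue;
            int sum = 0;
            for (int y = 0; y < TY; ++y)
            for (int x = 0; x < TX; ++x)
                sum += cache[y][x];
            dst(gy, gx) = sum;
        }
    }
}
```

    Splitting the kernel into one loop nest per phase is what makes the serial version behave like a tiled kernel with a barrier between the load and the reduction; a single fused loop would read cache entries that have not been written yet.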

    Too bad I don't see a way to look at the HLSL (or IL). I'd love to get a feeling for the number of instructions; I'm totally clueless as to the complexity of the produced CS.

    BTW, what happened with your BC6/7 efforts, sebbbi?
     
  8. CouldntResist

    Regular

    Joined:
    Aug 16, 2004
    Messages:
    264
    Likes Received:
    7
    Are we seriously comparing language-agnostic APIs (CUDA, OpenCL) to a combination of a language-specific library and a compiler-specific extension?

    Positioning them as competitors is fallacious. Any hypothetical non-Windows implementation of C++ AMP would have to be implemented in terms of OpenCL or CUDA.

    Have a look at ScalaCL. It's a combination of a Scala library and a Scala compiler extension, acting as an abstraction layer over OpenCL. If you want to have something to compare C++ AMP with, knock yourself out.

    Similarly, you can't rely on C++ AMP if you are programming a game with a Linux dedicated server and want to use GPGPU for anything other than pixels.
     
    #28 CouldntResist, May 29, 2012
    Last edited by a moderator: May 29, 2012
  9. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Of course, but that's just an implementation detail. Nobody cares if there's OpenCL or CUDA or DirectCompute (or a 1024-bit AVX2/gather CPU implementation) under the hood, as long as the code works properly, is easy to write, easy to maintain and easy to use (and understandable). C++ AMP isn't meant to be a low-level GPU API. It's a higher-level API.
    C++ AMP is still beta, so you cannot even use it in Windows games yet. When it is out, you can use Windows Server 2008 R2 or 2012 to run your dedicated game servers, if you need to run C++ AMP in your server code as well. Or you can use C++ AMP solely for your client side code. It's really easy to port, as the kernels are pure C++ lambda functions (just define out those restrict keywords, and do wrappers for those array view templates).
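
    As a rough sketch of that porting idea (an illustration under stated assumptions, not a tested port): the `cpu_amp` namespace below and everything in it are hypothetical stand-ins that cover only what this one kernel uses, `scale` is a made-up example function, and the `USE_AMP` branch assumes Visual C++ with `<amp.h>` available. With `restrict(...)` defined away, the same lambda compiles as plain C++11.

```cpp
#include <cassert>
#include <vector>

#ifdef USE_AMP
#include <amp.h>
namespace amp = concurrency;  // real C++ AMP (Visual C++ only)
#else
// Portable build: make restrict(amp) expand to nothing.
#define restrict(...)

// Hypothetical CPU stand-ins covering only what this sketch uses.
namespace cpu_amp {

template <int Rank> struct index;
template <> struct index<1> {
    explicit index(int i) : value(i) {}
    int operator[](int) const { return value; }
    int value;
};

template <int Rank> struct extent;
template <> struct extent<1> {
    explicit extent(int n) : size(n) {}
    int size;
};

template <typename T, int Rank> class array_view;
template <typename T> class array_view<T, 1> {
public:
    array_view(int /*count*/, T* data) : ptr(data) {}
    T& operator[](const index<1>& i) const { return ptr[i[0]]; }
    void synchronize() const {}  // nothing to copy back on the CPU
private:
    T* ptr;
};

// Runs the kernel serially, once per index in the extent.
template <typename Kernel>
void parallel_for_each(const extent<1>& e, const Kernel& kernel) {
    for (int i = 0; i < e.size; ++i) kernel(index<1>(i));
}

}  // namespace cpu_amp
namespace amp = cpu_amp;
#endif

// The kernel is written once and is meant to compile in both builds.
void scale(std::vector<int>& data, int factor) {
    const int n = static_cast<int>(data.size());
    amp::array_view<int, 1> view(n, data.data());
    amp::extent<1> ext(n);
    amp::parallel_for_each(ext, [=](amp::index<1> idx) restrict(amp) {
        view[idx] *= factor;
    });
    view.synchronize();
}
```

    Calling `scale` on a vector holding 1, 2, 3 with a factor of 2 leaves 2, 4, 6 in it in the portable build. The stand-ins live in their own namespace because a global `index` can clash with the legacy `index()` function that some C libraries declare.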

    Also, running any kind of GPGPU in a scalable server farm of virtual machines is currently very difficult. Most service providers do not offer any kind of GPGPU possibility, as GPU virtualization techniques are still very much under development. It will take several years (or maybe even a decade) until GPGPU becomes common in the server world (there are so many unsolved problems). Of course, a big virtualized cloud of GPGPUs is a dream of many, but it will take time until we have solid products and service providers have modified their infrastructures. These changes cost a lot, and GPU programming is still a moving target (different kinds of CPUs, by contrast, are pretty much identical, so the actual hardware can be easily hidden behind virtualization).
    Scala isn't C++. We have a million lines of existing C++ code, and I want to use my existing C++ structures/templates/classes/enumerations/etc and the metaprogramming libraries inside my kernels. C++ AMP integrates very well into existing C++ projects, and it's easy to write reusable code with it. ScalaCL likely brings a similar productivity boost to programmers who use Scala. That's a good thing (nobody wants to program with pure OpenCL or DirectCompute if they don't have to).

    I am not saying that C++ AMP is a perfect solution (especially if it does not get industry-wide support behind it). I'd rather see a similar proposal from the C++ standards committee. C++ AMP at least gives an example of a proper C++ based parallel API with full compiler and IDE support. I personally do not find anything in it that feels like a hack. It's simple, easy to read, uses C++ constructs for everything and requires no boilerplate. Have you even tried to code with it before criticizing it?
     
    #29 sebbbi, Jun 2, 2012
    Last edited by a moderator: Jun 3, 2012
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    C++ AMP is built on top of DirectCompute too.
    ScalaCL, as I understood it from a brief look, does not handle control flow.
     
  11. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,824
    Likes Received:
    253
    Location:
    Taiwan
    Apparently Microsoft reversed its decision and now there will be a Visual Studio Express Desktop edition.
     
  12. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,439
    Likes Received:
    280
    That's good news. At the very least I like debugging my command line apps in Windows.
     
  13. Miksu

    Regular

    Joined:
    Mar 9, 2003
    Messages:
    997
    Likes Received:
    10
    Location:
    Finland
  14. johnhamlen

    Newcomer

    Joined:
    Jun 12, 2012
    Messages:
    1
    Likes Received:
    0
    Dear Sebbbi. This is very good news indeed!

    I'm about to embark on my first spot of GPGPU development (Monte Carlo playouts for a chess program) and am working out my weapon of choice. CUDA looked nice, but is obviously Nvidia GPU specific, so I was about to head down the OpenCL route when C++ AMP appeared on the horizon. It looks ideal for clean code, but I'm not keen to spend a bundle on VC++ if I don't have to, as this project is just a hobby experiment.

    I was concerned about the following: http://msdn.microsoft.com/en-us/library/60k1461a(v=vs.110)
    Sorry for the newbie question, but can you explain what I need to get a VS Express development environment for C++ AMP up and running? E.g. download VSE 2010 and plug in the xxx library files from yyy.

    Many thanks in advance, John
     
  15. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,529
    Likes Received:
    108
    I think that reporting the breakage (including a repro) directly to the AMP team, through their dedicated blog, is probably the best way to get it fixed. In my experience they're quite responsive and quick to fix things / buggzorz. The conduit for doing the reporting would be http://blogs.msdn.com/b/nativeconcurrency/.
     
  16. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    869
    Likes Received:
    277
    You seem to be able to read minds. I just started getting 30-minute compile times on the TSP solver. And the group-shared memory "aliasing" is completely braindead (it doesn't exist at all, actually):

    Code:
    { /* scope 1 */
       tile_static int_4 lotsofmemory[512];
    }
    { /* scope 2 */
       tile_static int_4 lotsofmemory[512];
    }
    
    Bam, memory occupancy overflow. Unable to compile! :evil:
    Okay, let's try it manually:

    Code:
    tile_static union {
       int_4   lotsofmemory1[512];
       float_4 lotsofmemory2[512];
    };
     
    Yay. The short-vector types have non-trivial constructors (for no reason AFAIS) and can't be placed in a union, damn. :evil:

    So currently, if you use differently typed tile_static arrays, the sum of all shared memory across all scopes is your hard limit.
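
    To put numbers on the difference between summed and aliased storage, here is a small host-side sketch. It is only an illustration: `pod_int4` and `pod_float4` are hypothetical plain-struct stand-ins (the real short-vector types cannot go into a union, as noted above), and the sizes assume 4-byte `int` and `float`. Nothing here is compiled with `restrict(amp)`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical POD stand-ins for the int_4 / float_4 short-vector types.
struct pod_int4   { int   x, y, z, w; };
struct pod_float4 { float x, y, z, w; };

// Two independently scoped arrays: their storage adds up.
struct separate_scopes {
    pod_int4   scope1[512];
    pod_float4 scope2[512];
};

// Manually aliased arrays: storage is the larger of the two.
union aliased_scopes {
    pod_int4   scope1[512];
    pod_float4 scope2[512];
};

// Assumes 4-byte int and float, as on mainstream desktop targets.
static_assert(sizeof(pod_int4) == 16 && sizeof(pod_float4) == 16,
              "sketch assumes 16-byte vectors");

constexpr std::size_t separate_bytes() { return sizeof(separate_scopes); }
constexpr std::size_t aliased_bytes()  { return sizeof(aliased_scopes); }
```

    Under those assumptions the two separately scoped arrays need 16384 bytes between them, while the aliased layout needs 8192, so without aliasing every additional scope eats further into the group-shared memory budget.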

    I'll try and send my report with the TSP-solver when it's done, you're going to have fun with the code which is already 50% bigger than necessary because of "inconveniences" ...
     
    #36 Ethatron, Jun 20, 2012
    Last edited by a moderator: Jun 20, 2012
  17. Naurava kulkuri

    Newcomer

    Joined:
    Aug 12, 2012
    Messages:
    67
    Likes Received:
    0
    C++ AMP on Clang/LLVM, with OpenCL

    Hi!

    Now that I'm infrequently frequenting the forums, I'd like to point out that there's a C++ AMP implementation on top of OpenCL. It uses LLVM infrastructure in general and Clang specifically. One can read more about it at Introducing Shevlin Park–A Proof of Concept C++ AMP Implementation on OpenCL.

    As for being Windows-only, it's not necessarily a problem, especially in bigger enterprises, where at any given time there's a rather diverse mix of technologies in use (a specific set of hardware would be more of a problem). What matters more is that the technology can find use in the rest of the stack, that it has clear continuity for anything even mildly important that's being built (e.g. support for at least ten years, with some updates), and that there are people with the skills to handle the technology.

    In that regard, I have a feeling C++ AMP fits the bill. Though in the context of most ordinary enterprise workloads, I think SQL Server (2012) with Hadoop and StreamInsight will be used for the vast majority of cases.
     
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    I am glad to see C++ AMP gathering more support behind it :)

    But I honestly wasn't expecting Intel to port it to OpenCL/LLVM/Clang. AMD desperately needs to make heterogeneous computing easy (language integration / syntax / debugging / etc.) and cross-platform. C++ AMP fits this bill perfectly. It also helps AMD against CUDA (as both OpenCL and DirectCompute are awkward to use and do not have debugging support as good as CUDA's).

    But I can understand Intel's point of view as well. Haswell improves the performance of stream computation on the CPU drastically. Gather and dual FMA pipelines make Haswell stand out in stream benchmarks compared to Sandy/Ivy Bridge. And in the longer run Intel benefits even more, since stream computation scales automatically to high core counts. Intel has had problems selling CPUs with more than 4 cores to the masses, as most programs do not take advantage of multiple cores (beyond two). The stream programming paradigm removes this barrier, and if it becomes really popular in CPU programming as well, Intel's plans for 1024-bit AVX can become reality much faster.
     
  19. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,529
    Likes Received:
    108
    AMD's success in software anything has pretty much equalled -1 historically. What they desperately need is, sadly, un-correlated with what they can do (or even what they understand as necessary). It was particularly embarrassing when they had slides that said "Bring C++ to GPUs, we did it -> see C++ AMP" - ugh, no guys, Microsoft did AMP. Also, this particular type of usage is a far better fit for CL, which was always intended as a low level thing on top of which people would build actually usable libraries, as opposed to the thing you use to write your day-to-day stuff.
     
  20. Naurava kulkuri

    Newcomer

    Joined:
    Aug 12, 2012
    Messages:
    67
    Likes Received:
    0
    The timeout ate my answer, but briefly again...

    This makes me wonder about AMD's HSAIL, which with its two-pass approach isn't that different from what Microsoft is already doing with its cloud compiler. A short version I could find with quick searching can be found at Sascha Goldstein's blog post BUILD Day 4: My Summary (1/2).
    A longer story is in Subramanian Ramaswamy's presentation Deep Dive into the Kernel of .NET on Windows Phone 8. Another point of interest could be the post Going Native 2.0, The future of WinRT, which I just happened to come across.

    This already delivers, and I'm rather certain this compiler-as-SaaS approach won't stop here. Now there's even a Visual C++ Build Survey, which could bring about interesting changes to the build system (which, I gather, already uses a real database!).
     
    #40 Naurava kulkuri, Nov 19, 2012
    Last edited by a moderator: Nov 19, 2012