Pre-announcement: OpenCL BLAS

Discussion in 'GPGPU Technology & Programming' started by codedivine, May 5, 2012.

  1. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Just wanted to pre-announce a project I am working on: an OpenCL BLAS.

    1. The first version should be released at the end of May, hopefully. Significant progress has been made. The first release will not support complex types and will focus mostly on BLAS3.

    2. Tested against the AMD (Radeon 5850), Nvidia (Tesla C2050) and Intel (i7 920) implementations on Linux, and against the AMD implementation on 64-bit Windows. Trying to procure a Radeon 79xx and a GTX 680 for performance testing as well.

    3. The license will be Apache 2.0.

    4. The API will be similar to AMD's BLAS API.

    Please let me know if you are interested in beta testing, and I may make something available to you in a week or two. If you are an OpenCL implementation provider and would like to make hardware/software available to me for testing, that would be appreciated as well. I am a grad student at McGill. You can also reach me at @codedivine on Twitter.
     
  2. cho

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    I want an HPL setup guide with OpenCL BLAS :)
     
  3. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Will look into it, but do remember that it is a GPU BLAS, not a drop-in replacement for a CPU BLAS. My BLAS is more like CUBLAS, where the data transfer is left up to the programmer.
     
  4. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    I'm curious - how are you dealing with performance portability? For example, the SGEMM routine I'd write for G80 would be very different from the one I'd write for Tahiti, which would be different still from the one I'd write for GK104.

    Some choices:
    1. Don't worry about performance portability. This is a place to start, but likely will leave large performance factors (5x?) on the table.
    2. Provide a fallback implementation for unknown/obscure platforms, and custom implementations for important platforms. This is a lot of programming effort, and potentially introduces a lot of bugs.
    3. Create a code generator that attempts to autotune for a target architecture. There's been a lot of work on autotuning for BLAS, so this could work, but it might require even more work to get it stable.

    ...Something else?
     
  5. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    It's 3. Yes, it's a LOT of work :)
    When I started the project, I thought I would be done in 3-4 days, but that turned out to be a HUGE underestimate :p
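To make option 3 concrete, here is a minimal sketch of the autotuning idea: generate several variants of a kernel, time each on the target machine, and keep the fastest. It is not the project's code. The names `make_tiled_matmul` and `autotune` are hypothetical, the only tuning parameter is a tile size, and plain Python stands in for generated OpenCL kernels.

```python
import time


def make_tiled_matmul(tile):
    """Return a square matrix-multiply variant specialised for one tile size."""
    def matmul(a, b, n):
        c = [[0.0] * n for _ in range(n)]
        # Walk the iteration space in tile x tile x tile blocks.
        for ii in range(0, n, tile):
            for kk in range(0, n, tile):
                for jj in range(0, n, tile):
                    for i in range(ii, min(ii + tile, n)):
                        row_a, row_c = a[i], c[i]
                        for k in range(kk, min(kk + tile, n)):
                            a_ik, row_b = row_a[k], b[k]
                            for j in range(jj, min(jj + tile, n)):
                                row_c[j] += a_ik * row_b[j]
        return c
    return matmul


def autotune(n=32, candidates=(4, 8, 16, 32), repeats=3):
    """Time every candidate tile size and return (fastest_tile, timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        kernel = make_tiled_matmul(tile)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(a, b, n)
            # Keep the best of several runs to reduce timing noise.
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    tile, timings = autotune()
    print("fastest tile size on this machine:", tile)
```

A real GPU autotuner would search a much larger space (work-group shape, vector width, use of local memory and so on), which is part of why it is so much work to make stable.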
     
  6. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    Excellent. Are you doing the tuning at install time like FFTW or offline with a static set of profiles?
     
  7. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    I currently do not plan on distributing pre-tuned profiles, simply because it is a lot of work for an underpaid, hungry grad student to do :p

    I do provide an API to tune the library. This allows an ISV who is bundling the library DLL to call the tuning routines whenever they wish.

    I also provide an API that says "initialize with this profile", so an ISV can theoretically collect and distribute profiles themselves instead of tuning on customer machines; they will then have to implement the logic that identifies which profile to apply. (GPU product names are too confusing and too numerous, so I didn't bother.)

    Finally, for scenarios where the library installation is standalone and not part of an app bundle (for example, on a typical Linux system), I do provide the option to build a small binary executable that calls the tuning API to generate the profile. So I guess this binary should be called as part of an install process. Haven't really written proper installation scripts etc. yet (just the build right now).
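The three deployment paths described above (tune in the application, initialize from a shipped profile, or run a standalone tuner at install time) can be sketched as follows. This is an illustration only, not the library's actual API: `tune`, `save_profile`, `load_profile`, the `Blas` class, and the JSON profile layout are all hypothetical, and `tune` returns fixed placeholder values instead of measuring anything.

```python
import json


def tune(device_name):
    """Placeholder tuning run; a real one would time generated kernels."""
    # Hypothetical parameters with fixed values, not measurements.
    return {"device": device_name, "sgemm": {"tile": 16, "vector_width": 4}}


def save_profile(profile, path):
    """Write a tuning profile to disk so it can be reused or redistributed."""
    with open(path, "w") as f:
        json.dump(profile, f, indent=2)


def load_profile(path):
    """Read a tuning profile produced earlier, possibly on another machine."""
    with open(path) as f:
        return json.load(f)


class Blas:
    """Hypothetical library handle configured by a tuning profile."""

    def __init__(self, profile):
        self.profile = profile

    @classmethod
    def tuned_here(cls, device_name):
        # Path 1: the application calls the tuning API itself.
        return cls(tune(device_name))

    @classmethod
    def from_profile_file(cls, path):
        # Path 2: "initialize with this profile"; the caller decides which file.
        return cls(load_profile(path))


def standalone_tuner(device_name, path):
    # Path 3: what a small install-time executable might do.
    save_profile(tune(device_name), path)
```

In this sketch the choice of which profile file to load is left entirely to the caller, which mirrors the point that matching profiles to GPU models is the ISV's job.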
     
  8. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Removed. That link had incorrect results.
     
    #8 codedivine, Aug 17, 2012
    Last edited by a moderator: Oct 28, 2012
  9. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    You can track progress here: http://www.raijincl.org
    First alpha of SGEMM and DGEMM routines is up, though it does not currently handle transposed inputs.
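For reference, the standard BLAS GEMM operation that SGEMM (single precision) and DGEMM (double precision) implement is shown below. "Transposed inputs" refers to the choice of op for each input matrix. That the alpha covers only the non-transposed case, op(A) = A and op(B) = B, is an inference from the post rather than something it states in detail.

```latex
C \leftarrow \alpha \, \operatorname{op}(A) \, \operatorname{op}(B) + \beta \, C,
\qquad \operatorname{op}(X) \in \{ X, \; X^{T} \}
```

Here op(A) is m × k, op(B) is k × n, C is m × n, and α and β are scalars.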
     
    #9 codedivine, Oct 28, 2012
    Last edited by a moderator: Oct 28, 2012