OpenCL 1.1 Specification Released

Discussion in 'GPGPU Technology & Programming' started by Broken Hope, Jun 14, 2010.

  1. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England
    #1 Broken Hope, Jun 14, 2010
    Last edited by a moderator: Jun 14, 2010
  2. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    Appendix E – Changes

    E.1 Summary of changes from OpenCL 1.0

    The following features are added to the OpenCL platform layer and runtime (sections 4 and 5):

    Following queries to table 4.3
    o CL_DEVICE_NATIVE_VECTOR_WIDTH_{CHAR | SHORT | INT | LONG | FLOAT | DOUBLE | HALF}
    o CL_DEVICE_HOST_UNIFIED_MEMORY
    o CL_DEVICE_OPENCL_C_VERSION

    CL_CONTEXT_NUM_DEVICES to the list of queries specified to clGetContextInfo.

    Optional image formats: CL_Rx, CL_RGx and CL_RGBx.

    Support for sub-buffer objects – ability to create a buffer object that refers to a specific
    region in another buffer object using clCreateSubBuffer.

    clEnqueueReadBufferRect, clEnqueueWriteBufferRect and
    clEnqueueCopyBufferRect APIs to read from, write to and copy a rectangular region of
    a buffer object respectively.

    clSetMemObjectDestructorCallback API to allow a user to register a callback function
    that will be called when the memory object is deleted and its resources freed.

    Options that control the OpenCL C version used when building a program executable.
    These are described in section 5.6.3.5.

    CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the list of queries
    specified to clGetKernelWorkGroupInfo.

    Support for user events. User events allow applications to enqueue commands that wait
    on a user event to finish before the command is executed by the device. Following new
    APIs are added - clCreateUserEvent and clSetUserEventStatus.

    clSetEventCallback API to register a callback function for a specific command
    execution status.

    The following modifications are made to the OpenCL platform layer and runtime (sections 4 and 5):

    Minimum values of the following device queries (table 4.3) are raised:
    o CL_DEVICE_MAX_PARAMETER_SIZE from 256 to 1024 bytes
    o CL_DEVICE_LOCAL_MEM_SIZE from 16 KB to 32 KB.

    The global_work_offset argument in clEnqueueNDRangeKernel can be a non-NULL
    value.

    All API calls except clSetKernelArg are thread-safe.

    The following features are added to the OpenCL C programming language (section 6):

    3-component vector data types.

    New built-in functions
    o get_global_offset work-item function defined in section 6.11.1.
    o minmag, maxmag math functions defined in section 6.11.2.
    o clamp integer function defined in section 6.11.3.
    o (vector, scalar) variant of integer functions min and max in section 6.11.3.
    o async_work_group_strided_copy defined in section 6.11.10.
    o vec_step, shuffle and shuffle2 defined in section 6.11.12.

    cl_khr_byte_addressable_store extension is a core feature.

    cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics,
    cl_khr_local_int32_base_atomics and cl_khr_local_int32_extended_atomics
    extensions are core features. The built-in atomic function names are changed to use the
    atomic_ prefix instead of atom_.

    Macros CL_VERSION_1_0 and CL_VERSION_1_1.

    The following features in OpenCL 1.0 are deprecated:

    The clSetCommandQueueProperty API is no longer supported in OpenCL 1.1.
    The __ROUNDING_MODE__ macro is no longer supported in OpenCL C 1.1.

    The following new extensions are added to section 9:

    cl_khr_gl_event – Creating a CL event object from a GL sync object.
    cl_khr_d3d10_sharing – Sharing memory objects with Direct3D 10.

    The following modifications are made to the OpenCL ES Profile described in section 10:

    64-bit integer support is optional.
     
  3. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I am confused. Making byte-addressable stores core means it is DX11-class hardware only, yet lots of DXCS 5 features are missing.
     
  4. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,751
    Likes Received:
    128
    Location:
    Taiwan
    What do you mean? Byte addressable store is hardly a DX11 only feature. G8X supports that, for example.
     
  5. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I didn't know G80 did byte addressable stores. Got a source?
     
  6. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,751
    Likes Received:
    128
    Location:
    Taiwan
    They supported it in CUDA from the beginning.
     
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I didn't know that. Thanks for this.

    So, atomics and byte addressable stores rolled into core. Some new APIs, better thread safety.

    vec_step(int3) = 4 is a fugly hack. :(

    Still miles away from DXCS 5.0. Is an OCL 1.2 planned in the short term, i.e. before 2.0?
     
  8. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,751
    Likes Received:
    128
    Location:
    Taiwan
    3D vectors are just for convenience, so you don't have to write something like:

    Code:
    a.x = b.x + c.x;
    a.y = b.y + c.y;
    a.z = b.z + c.z;
    on scalar devices, since using 4D vectors performs an unnecessary operation on w.
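
To see why vec_step(int3) comes out as 4 while 3-component arithmetic still skips w, here is a minimal sketch in plain C. It is not OpenCL C: `int3_like`, `int3_add` and `int3_step` are hypothetical names made up for illustration. The only thing borrowed from OpenCL 1.1 is that a 3-component vector has the size and alignment of the matching 4-component one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL C's int3: three lanes, but padded and
   aligned like a four-lane vector (16 bytes for 32-bit components). */
typedef struct {
    _Alignas(16) int32_t x;
    int32_t y, z;
} int3_like;

/* Component-wise add: only x, y and z are computed, there is no w lane. */
static int3_like int3_add(int3_like b, int3_like c)
{
    int3_like a;
    a.x = b.x + c.x;
    a.y = b.y + c.y;
    a.z = b.z + c.z;
    return a;
}

/* Number of scalar elements to advance to reach the next vector in an
   array, analogous to what vec_step reports for a 3-component type. */
static size_t int3_step(void)
{
    return sizeof(int3_like) / sizeof(int32_t);
}
```

The storage step is 4 because of the padding lane, even though the arithmetic only ever touches three components.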

    On the other hand, what features are missing or "miles away" from DXCS 5.0 in your mind? There are a lot of features in OpenCL 1.1 (and even 1.0) not available in CS 5.0.
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Yes, but you could not do it on a 1.0 device, as 1.0 did not guarantee byte-addressable stores.

    Append/consume buffers.

    Like?
     
  10. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,751
    Likes Received:
    128
    Location:
    Taiwan
    Byte addressable store is an extension in 1.0. Many devices support that.

    Append/consume buffers don't look very useful to me, though.

    The [loop]/[unroll] directives are probably better examples. OpenCL should have an official extension for that (NVIDIA has their own #pragma unroll lifted from CUDA, but it's not an extension in OpenCL).

    For example, CS 5.0 does not have a byte or short data type. That means you can't use bytes in shared memory.

    Many new features in OpenCL 1.1 are not available in CS 5.0. For example, I don't think it's possible to create "sub-buffers" in CS. Events are also not available in CS AFAIK. Rectangular buffer access is also new.
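
As a rough picture of what rectangular buffer access means, the sketch below copies a 2D rectangle between two linear buffers in plain C. It only mimics the addressing on the host and is not a call into OpenCL: `copy_rect_2d`, `copy_rect_demo` and their parameters are hypothetical names. The real clEnqueueCopyBufferRect also takes a third dimension with slice pitches, plus a command queue and event arguments, all of which are left out here.

```c
#include <stddef.h>
#include <string.h>

/* Copy a rectangle of width_bytes x rows from src to dst.
   Each buffer is linear; row_pitch is the byte distance between rows. */
static void copy_rect_2d(const unsigned char *src, size_t src_row_pitch,
                         size_t src_x, size_t src_y,
                         unsigned char *dst, size_t dst_row_pitch,
                         size_t dst_x, size_t dst_y,
                         size_t width_bytes, size_t rows)
{
    for (size_t r = 0; r < rows; ++r) {
        /* Byte offset of a row start: y * row_pitch + x. */
        const unsigned char *s = src + (src_y + r) * src_row_pitch + src_x;
        unsigned char *d = dst + (dst_y + r) * dst_row_pitch + dst_x;
        memcpy(d, s, width_bytes);
    }
}

/* Hypothetical demo: a 4x4 source holding 0..15, from which the 2x2
   rectangle at (x=1, y=1) is copied into a tightly packed 2x2 buffer.
   Returns the destination byte at the given index (0..3). */
static unsigned char copy_rect_demo(size_t index)
{
    unsigned char src[16];
    unsigned char dst[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < 16; ++i)
        src[i] = (unsigned char)i;
    copy_rect_2d(src, 4, 1, 1, dst, 2, 0, 0, 2, 2);
    return dst[index];
}
```

Without a rectangular copy, the same transfer needs one plain buffer copy per row, which is what the loop above spells out.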
     
  11. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    This is the feature I'm going to like most; it was really needed.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Byte-wise data in Evergreen's LDS requires a song and a dance because the hardware is natively operating on 32-bit values addressed on 32-bit boundaries.

    EDIT: Er, actually there are byte-addressing instructions in the ISA for LDS.
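
For anyone wondering what the "song and dance" looks like where native byte stores are missing, here is a minimal sketch in plain C of emulating a byte store with a read-modify-write on a 32-bit word. It does not describe Evergreen's actual instructions (which, per the edit above, do include byte addressing for LDS). `store_byte_via_word` and `store_byte_demo` are hypothetical names, and numbering bytes from the least significant end of each word is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulate storing one byte into memory that can only be written as
   aligned 32-bit words. Assumption: byte 0 is the least significant
   byte of each word. */
static void store_byte_via_word(uint32_t *words, size_t byte_index,
                                uint8_t value)
{
    size_t word = byte_index / 4;                    /* containing word */
    unsigned shift = (unsigned)(byte_index % 4) * 8; /* bit position    */
    uint32_t mask = (uint32_t)0xFF << shift;

    /* Read the word, clear the target byte, merge in the new value. */
    words[word] = (words[word] & ~mask) | ((uint32_t)value << shift);
}

/* Hypothetical demo: overwrite byte 5 (the second byte of word 1)
   and return the updated word. */
static uint32_t store_byte_demo(void)
{
    uint32_t w[2] = {0x11223344u, 0xAABBCCDDu};
    store_byte_via_word(w, 5, 0xEE);
    return w[1];
}
```

The read-modify-write is not atomic, so two work-items writing neighbouring bytes of the same word can overwrite each other's result. That is the practical reason a guaranteed byte-addressable store matters.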
     
  13. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England