The following are known issues with the CUDA 6.0 Release Candidate that will be resolved in the production release:
‣ The minBlocksPerMultiprocessor parameter for the launch_bounds() qualifier only accepts values up to 16 when used in compiling for sm_50, even
though values up to 32 are possible on that architecture.
‣ There is a performance issue with the new SIMD video intrinsics __v*2() and __v*4() when used in compiling for the sm_50 architecture.
‣ The sm_50 architecture supports 48 KB of shared memory per block; however, the check for this limit is not functioning properly in the compiler. This can allow
programs that use more than 48 KB of shared memory per block to compile successfully, although they will fail to run because the driver component does check
the limit properly.
‣ The MT19937 random number generator in the cuRAND library generates non-deterministic results for curandGenerateUniformDouble().
‣ The NPP library function nppiAlphaComp_8u_AC4R() generates incorrect results when used with the NPPI_OP_ALPHA_ATOP_PREMUL option.
‣ The NPP library functions FilterSobelHorizSecondBorder() and FilterSobelVertSecondBorder() may generate incorrect results.