Ext3h
Regular
Thanks. __half2 is already the packed format (two half-precision values in one 32-bit word), unlike the old __half, which is just 16 significant bits in a 32-bit word. The code was generated from explicit PTX, but CUDA intrinsics would give the same result.
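
For anyone who wants to try the intrinsic route, here is a minimal sketch of a packed add. It is not the code discussed above: the kernel name `add_packed`, the float2 inputs, and the launch configuration are made up for illustration. It assumes a device with compute capability 5.3 or newer, which `__hadd2` needs.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

// __half2 holds two half-precision values in a single 32-bit word.
static_assert(sizeof(__half2) == 4, "__half2 should be one 32-bit word");

// Hypothetical kernel: converts float2 pairs to packed half2, adds them
// with one packed instruction, and converts the result back to float2.
__global__ void add_packed(const float2* a, const float2* b, float2* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 ha = __float22half2_rn(a[i]);   // pack (x, y) into one word
        __half2 hb = __float22half2_rn(b[i]);
        __half2 sum = __hadd2(ha, hb);          // adds both 16-bit lanes at once
        out[i] = __half22float2(sum);           // unpack for printing on the host
    }
}

int main()
{
    const int n = 4;
    float2 *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(float2));
    cudaMallocManaged(&b, n * sizeof(float2));
    cudaMallocManaged(&out, n * sizeof(float2));

    for (int i = 0; i < n; ++i) {
        a[i] = make_float2(1.0f * i, 2.0f * i);
        b[i] = make_float2(0.5f, 0.25f);
    }

    add_packed<<<1, 32>>>(a, b, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        printf("%d: (%f, %f)\n", i, out[i].x, out[i].y);
    }

    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return 0;
}
```

Something like `nvcc -arch=sm_53 example.cu` (or a newer architecture) should build it. Dumping the SASS with `cuobjdump -sass` is one way to check whether the intrinsic version ends up with the same packed instruction as hand-written PTX.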