Running (16 x 16 x 16) blocks of 512 empty threads...done
Running (16 x 16 x 16) blocks of 512 empty threads: 79.610 ms
Running clock() test...
kclock:
(3591554430, 3591554454): 24
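The kclock line prints two back-to-back clock() readings and their difference (24 cycles), which can be read as the overhead of the clock reads themselves. Both readings are below 2^32, so this sketch assumes a 32-bit counter that can wrap. A log post-processing helper should therefore take the difference modulo 2^32. The helper name `clock_delta` is hypothetical and not part of the benchmark.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit counter width


def clock_delta(start: int, stop: int) -> int:
    """Cycles elapsed between two 32-bit clock readings, tolerating one wrap."""
    return (stop - start) & MASK32


# Readings from the kclock line above.
print(clock_delta(3591554430, 3591554454))  # 24
# Hypothetical pair of readings that straddles a wrap of the counter.
print(clock_delta(4294967290, 10))  # 16
```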
kclock_test2: [10 blocks, 1 thread(s)/block]
Block 00: start: 3591605780, stop: 3591608096
Block 01: start: 3591606066, stop: 3591608382
Block 02: start: 3591609784, stop: 3591612100
Block 03: start: 3591605790, stop: 3591608106
Block 04: start: 3591606072, stop: 3591608388
Block 05: start: 3591609790, stop: 3591612106
Block 06: start: 3591605790, stop: 3591608106
Block 07: start: 3591606072, stop: 3591608388
Block 08: start: 3591609790, stop: 3591612106
Block 09: start: 3591605798, stop: 3591608114
kclock_test2: [30 blocks, 1 thread(s)/block]
Block 00: start: 3591616774, stop: 3591619090
Block 10: start: 3591620632, stop: 3591622948
Block 20: start: 3591616874, stop: 3591619190
Block 01: start: 3591616790, stop: 3591619106
Block 11: start: 3591620662, stop: 3591622978
Block 21: start: 3591620626, stop: 3591622942
Block 02: start: 3591616796, stop: 3591619112
Block 12: start: 3591616780, stop: 3591619096
Block 22: start: 3591616908, stop: 3591619224
Block 03: start: 3591616808, stop: 3591619124
Block 13: start: 3591616788, stop: 3591619104
Block 23: start: 3591620652, stop: 3591622968
Block 04: start: 3591616862, stop: 3591619178
Block 14: start: 3591616790, stop: 3591619106
Block 24: start: 3591616948, stop: 3591619264
Block 05: start: 3591616876, stop: 3591619192
Block 15: start: 3591616802, stop: 3591619118
Block 25: start: 3591616778, stop: 3591619094
Block 06: start: 3591616880, stop: 3591619196
Block 16: start: 3591616860, stop: 3591619176
Block 26: start: 3591616866, stop: 3591619182
Block 07: start: 3591616910, stop: 3591619226
Block 17: start: 3591620612, stop: 3591622928
Block 27: start: 3591620618, stop: 3591622934
Block 08: start: 3591620614, stop: 3591622930
Block 18: start: 3591616870, stop: 3591619186
Block 28: start: 3591616948, stop: 3591619264
Block 09: start: 3591620628, stop: 3591622944
Block 19: start: 3591620622, stop: 3591622938
Block 29: start: 3591616794, stop: 3591619110
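Every block in the 10-block kclock_test2 run reports the same stop - start of 2316 cycles, and the start times fall into three tight groups (blocks 0, 3, 6, 9; blocks 1, 4, 7; blocks 2, 5, 8). The sketch below extracts both observations from the log text. CUDA documents clock() as a per-multiprocessor counter, so start times from blocks on different multiprocessors are not guaranteed to share a time base. The grouping may therefore reflect separate counters rather than real launch order. The function names and the 100-cycle grouping tolerance are illustrative assumptions.

```python
import re

# Matches lines such as "Block 03: start: 3591605790, stop: 3591608106".
LINE = re.compile(r"Block (\d+): start: (\d+), stop: (\d+)")


def parse_blocks(text: str) -> dict[int, tuple[int, int]]:
    """Map block id -> (start, stop) clock readings."""
    return {int(b): (int(s), int(e)) for b, s, e in LINE.findall(text)}


def cluster_starts(blocks: dict[int, tuple[int, int]], tol: int = 100) -> list[list[int]]:
    """Group block ids whose sorted start times lie within `tol` cycles of the
    previous block's start (tol is an arbitrary illustrative value)."""
    order = sorted(blocks, key=lambda b: blocks[b][0])
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if blocks[cur][0] - blocks[prev][0] <= tol:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups


# The 10-block run from the log above.
log = """\
Block 00: start: 3591605780, stop: 3591608096
Block 01: start: 3591606066, stop: 3591608382
Block 02: start: 3591609784, stop: 3591612100
Block 03: start: 3591605790, stop: 3591608106
Block 04: start: 3591606072, stop: 3591608388
Block 05: start: 3591609790, stop: 3591612106
Block 06: start: 3591605790, stop: 3591608106
Block 07: start: 3591606072, stop: 3591608388
Block 08: start: 3591609790, stop: 3591612106
Block 09: start: 3591605798, stop: 3591608114
"""
blocks = parse_blocks(log)
# Mask to 32 bits in case a counter wrapped during a block.
durations = {(stop - start) & 0xFFFFFFFF for start, stop in blocks.values()}
groups = cluster_starts(blocks)
print(durations)  # {2316}
print(groups)     # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```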
Running pipeline tests...
Pipeline latency (512 dependent operations)
mul: 9228 clk (18.023 clk/warp)
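In a chain of dependent operations each operation consumes the previous result, so the operations cannot overlap. Dividing the total cycle count by the chain length then gives the per-operation latency. For the mul line, 9228 / 512 ≈ 18.023, which is the figure printed in parentheses. `per_op_latency` is a hypothetical helper name.

```python
def per_op_latency(total_clk: int, dependent_ops: int) -> float:
    """Average cycles per operation in a chain of dependent operations."""
    return total_clk / dependent_ops


print(f"{per_op_latency(9228, 512):.3f}")  # 18.023
```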
Running pipeline tests...
K_ADD_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_RSQRT_FLOAT_DEP128 latency: 17950 clk (70.117 clk/warp)
K_ADD_DOUBLE_DEP128 latency: 6148 clk (24.016 clk/warp)
K_ADD_UINT_DEP128 throughput: 4666 clk (28.091 ops/clk)
K_RSQRT_FLOAT_DEP128 throughput: 32876 clk (3.987 ops/clk)
K_ADD_DOUBLE_DEP128 throughput: 32752 clk (4.002 ops/clk)
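The parenthesized figures in the K_*_DEP128 lines can be recomputed from the cycle counts. Working backwards from the printed values (an inference, not something the log states), latency figures equal clk / 256 (4620 / 256 ≈ 18.047). That suggests 256 dependent operations per thread. Throughput figures equal 131072 / clk (131072 / 4666 ≈ 28.091), which is consistent with 512 threads × 256 operations. The sketch parses one line and recomputes its figure under those assumed constants. `recompute` and the constant names are hypothetical.

```python
import re

# Assumed from the printed numbers, not taken from the benchmark source.
OPS_PER_THREAD = 256
THROUGHPUT_THREADS = 512

PATTERN = re.compile(
    r"(\w+) (latency|throughput): (\d+) clk \(([\d.]+) (?:clk/warp|ops/clk)\)"
)


def recompute(line: str) -> tuple[str, str, float, float]:
    """Return (kernel, kind, printed figure, recomputed figure) for one log line."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    name, kind = m.group(1), m.group(2)
    clk, printed = int(m.group(3)), float(m.group(4))
    if kind == "latency":
        value = clk / OPS_PER_THREAD  # cycles per dependent op
    else:
        value = THROUGHPUT_THREADS * OPS_PER_THREAD / clk  # ops per cycle
    return name, kind, printed, round(value, 3)


samples = [
    "K_ADD_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)",
    "K_ADD_UINT_DEP128 throughput: 4666 clk (28.091 ops/clk)",
    "K_ADD_DOUBLE_DEP128 throughput: 32752 clk (4.002 ops/clk)",
]
for line in samples:
    print(recompute(line))
```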
K_ADD_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_SUB_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_MAD_UINT_DEP128 latency: 5130 clk (20.039 clk/warp)
K_MUL_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_DIV_UINT_DEP128 latency: 67596 clk (264.047 clk/warp)
K_REM_UINT_DEP128 latency: 67596 clk (264.047 clk/warp)
K_MIN_UINT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_MAX_UINT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_ADD_UINT_DEP128 throughput: 4662 clk (28.115 ops/clk)
K_SUB_UINT_DEP128 throughput: 4666 clk (28.091 ops/clk)
K_MAD_UINT_DEP128 throughput: 8224 clk (15.938 ops/clk)
K_MUL_UINT_DEP128 throughput: 8224 clk (15.938 ops/clk)
K_DIV_UINT_DEP128 throughput: 77310 clk (1.695 ops/clk)
K_REM_UINT_DEP128 throughput: 75536 clk (1.735 ops/clk)
K_MIN_UINT_DEP128 throughput: 9280 clk (14.124 ops/clk)
K_MAX_UINT_DEP128 throughput: 9796 clk (13.380 ops/clk)
K_ADD_INT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_SUB_INT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_MAD_INT_DEP128 latency: 5130 clk (20.039 clk/warp)
K_MUL_INT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_DIV_INT_DEP128 latency: 77580 clk (303.047 clk/warp)
K_REM_INT_DEP128 latency: 76044 clk (297.047 clk/warp)
K_MIN_INT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_MAX_INT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_ABS_INT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_ADD_INT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_SUB_INT_DEP128 throughput: 4662 clk (28.115 ops/clk)
K_MAD_INT_DEP128 throughput: 8224 clk (15.938 ops/clk)
K_MUL_INT_DEP128 throughput: 8228 clk (15.930 ops/clk)
K_DIV_INT_DEP128 throughput: 95372 clk (1.374 ops/clk)
K_REM_INT_DEP128 throughput: 88822 clk (1.476 ops/clk)
K_MIN_INT_DEP128 throughput: 9280 clk (14.124 ops/clk)
K_MAX_INT_DEP128 throughput: 9286 clk (14.115 ops/clk)
K_ABS_INT_DEP128 throughput: 9298 clk (14.097 ops/clk)
K_ADD_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_SUB_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_MAD_FLOAT_DEP128 latency: 5130 clk (20.039 clk/warp)
K_MUL_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_DIV_FLOAT_DEP128 latency: 162842 clk (636.102 clk/warp)
K_MIN_FLOAT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_MAX_FLOAT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_ADD_FLOAT_DEP128 throughput: 4662 clk (28.115 ops/clk)
K_SUB_FLOAT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_MAD_FLOAT_DEP128 throughput: 5432 clk (24.130 ops/clk)
K_MUL_FLOAT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_DIV_FLOAT_DEP128 throughput: 221442 clk (0.592 ops/clk)
K_MIN_FLOAT_DEP128 throughput: 9282 clk (14.121 ops/clk)
K_MAX_FLOAT_DEP128 throughput: 9280 clk (14.124 ops/clk)
K_ADD_DOUBLE_DEP128 latency: 6150 clk (24.023 clk/warp)
K_SUB_DOUBLE_DEP128 latency: 6148 clk (24.016 clk/warp)
K_MAD_DOUBLE_DEP128 latency: 6150 clk (24.023 clk/warp)
K_MUL_DOUBLE_DEP128 latency: 6148 clk (24.016 clk/warp)
K_DIV_DOUBLE_DEP128 latency: 173078 clk (676.086 clk/warp)
K_MIN_DOUBLE_DEP128 latency: 12292 clk (48.016 clk/warp)
K_MAX_DOUBLE_DEP128 latency: 12294 clk (48.023 clk/warp)
K_ADD_DOUBLE_DEP128 throughput: 32752 clk (4.002 ops/clk)
K_SUB_DOUBLE_DEP128 throughput: 32766 clk (4.000 ops/clk)
K_MAD_DOUBLE_DEP128 throughput: 32754 clk (4.002 ops/clk)
K_MUL_DOUBLE_DEP128 throughput: 32760 clk (4.001 ops/clk)
K_DIV_DOUBLE_DEP128 throughput: 258918 clk (0.506 ops/clk)
K_MIN_DOUBLE_DEP128 throughput: 65530 clk (2.000 ops/clk)
K_MAX_DOUBLE_DEP128 throughput: 65520 clk (2.000 ops/clk)
K_AND_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_OR_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_XOR_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_SHL_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_SHR_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_AND_UINT_DEP128 throughput: 4666 clk (28.091 ops/clk)
K_OR_UINT_DEP128 throughput: 4666 clk (28.091 ops/clk)
K_XOR_UINT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_SHL_UINT_DEP128 throughput: 8242 clk (15.903 ops/clk)
K_SHR_UINT_DEP128 throughput: 8242 clk (15.903 ops/clk)
K_UMUL24_UINT_DEP128 latency: 9234 clk (36.070 clk/warp)
K_MUL24_INT_DEP128 latency: 9234 clk (36.070 clk/warp)
K_UMULHI_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_MULHI_INT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_USAD_UINT_DEP128 latency: 5130 clk (20.039 clk/warp)
K_SAD_INT_DEP128 latency: 5130 clk (20.039 clk/warp)
K_UMUL24_UINT_DEP128 throughput: 9332 clk (14.045 ops/clk)
K_MUL24_INT_DEP128 throughput: 9334 clk (14.042 ops/clk)
K_UMULHI_UINT_DEP128 throughput: 8224 clk (15.938 ops/clk)
K_MULHI_INT_DEP128 throughput: 8222 clk (15.942 ops/clk)
K_USAD_UINT_DEP128 throughput: 8240 clk (15.907 ops/clk)
K_SAD_INT_DEP128 throughput: 8242 clk (15.903 ops/clk)
K_FADD_RN_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_FADD_RZ_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_FMUL_RN_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_FMUL_RZ_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_FDIVIDEF_FLOAT_DEP128 latency: 21260 clk (83.047 clk/warp)
K_FADD_RN_FLOAT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_FADD_RZ_FLOAT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_FMUL_RN_FLOAT_DEP128 throughput: 4662 clk (28.115 ops/clk)
K_FMUL_RZ_FLOAT_DEP128 throughput: 4662 clk (28.115 ops/clk)
K_FDIVIDEF_FLOAT_DEP128 throughput: 32908 clk (3.983 ops/clk)
K_DADD_RN_DOUBLE_DEP128 latency: 6148 clk (24.016 clk/warp)
K_DADD_RN_DOUBLE_DEP128 throughput: 32752 clk (4.002 ops/clk)
K_RCP_FLOAT_DEP128 latency: 74766 clk (292.055 clk/warp)
K_SQRT_FLOAT_DEP128 latency: 70688 clk (276.125 clk/warp)
K_RSQRT_FLOAT_DEP128 latency: 17950 clk (70.117 clk/warp)
K_RCP_FLOAT_DEP128 throughput: 93152 clk (1.407 ops/clk)
K_SQRT_FLOAT_DEP128 throughput: 90428 clk (1.449 ops/clk)
K_RSQRT_FLOAT_DEP128 throughput: 32884 clk (3.986 ops/clk)
K_SINF_FLOAT_DEP128 latency: 10248 clk (40.031 clk/warp)
K_COSF_FLOAT_DEP128 latency: 10248 clk (40.031 clk/warp)
K_TANF_FLOAT_DEP128 latency: 29708 clk (116.047 clk/warp)
K_EXPF_FLOAT_DEP128 latency: 27154 clk (106.070 clk/warp)
K_EXP2F_FLOAT_DEP128 latency: 22558 clk (88.117 clk/warp)
K_EXP10F_FLOAT_DEP128 latency: 27154 clk (106.070 clk/warp)
K_LOGF_FLOAT_DEP128 latency: 22552 clk (88.094 clk/warp)
K_LOG2F_FLOAT_DEP128 latency: 17950 clk (70.117 clk/warp)
K_LOG10F_FLOAT_DEP128 latency: 22552 clk (88.094 clk/warp)
K_POWF_FLOAT_DEP128 latency: 27232 clk (106.375 clk/warp)
K_SINF_FLOAT_DEP128 throughput: 32772 clk (4.000 ops/clk)
K_COSF_FLOAT_DEP128 throughput: 32778 clk (3.999 ops/clk)
K_TANF_FLOAT_DEP128 throughput: 98380 clk (1.332 ops/clk)
K_EXPF_FLOAT_DEP128 throughput: 32902 clk (3.984 ops/clk)
K_EXP2F_FLOAT_DEP128 throughput: 33012 clk (3.970 ops/clk)
K_EXP10F_FLOAT_DEP128 throughput: 32986 clk (3.974 ops/clk)
K_LOGF_FLOAT_DEP128 throughput: 32888 clk (3.985 ops/clk)
K_LOG2F_FLOAT_DEP128 throughput: 32882 clk (3.986 ops/clk)
K_LOG10F_FLOAT_DEP128 throughput: 32912 clk (3.982 ops/clk)
K_POWF_FLOAT_DEP128 throughput: 66116 clk (1.982 ops/clk)
K_INTASFLOAT_UINT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_FLOATASINT_FLOAT_DEP128 latency: 4620 clk (18.047 clk/warp)
K_INTASFLOAT_UINT_DEP128 throughput: 8220 clk (15.945 ops/clk)
K_FLOATASINT_FLOAT_DEP128 throughput: 8228 clk (15.930 ops/clk)
K_POPC_UINT_DEP128 latency: 5130 clk (20.039 clk/warp)
K_CLZ_UINT_DEP128 latency: 9228 clk (36.047 clk/warp)
K_POPC_UINT_DEP128 throughput: 8228 clk (15.930 ops/clk)
K_CLZ_UINT_DEP128 throughput: 9310 clk (14.079 ops/clk)
K_ALL_UINT_DEP128 latency: 12336 clk (48.188 clk/warp)
K_ANY_UINT_DEP128 latency: 12336 clk (48.188 clk/warp)
K_SYNC_UINT_DEP128 latency: 58 clk (0.227 clk/warp)
K_ALL_UINT_DEP128 throughput: 16484 clk (7.951 ops/clk)
K_ANY_UINT_DEP128 throughput: 16488 clk (7.950 ops/clk)
K_SYNC_UINT_DEP128 throughput: 108 clk (1213.630 ops/clk)
Pipeline latency/throughput with multiple warps (200 iterations of 256 ops)
K_ADD_UINT_DEP128:
1 warp ( 1 thr) 924000 clk (18.047 clk/warp, 0.055 ops/clk) Histogram { (18: 200) }
1 warp ( 2 thr) 924000 clk (18.047 clk/warp, 0.111 ops/clk) Histogram { (18: 200) }
1 warp ( 3 thr) 924000 clk (18.047 clk/warp, 0.166 ops/clk) Histogram { (18: 200) }
1 warp ( 4 thr) 924000 clk (18.047 clk/warp, 0.222 ops/clk) Histogram { (18: 200) }
1 warp ( 6 thr) 924000 clk (18.047 clk/warp, 0.332 ops/clk) Histogram { (18: 200) }
1 warp ( 8 thr) 924000 clk (18.047 clk/warp, 0.443 ops/clk) Histogram { (18: 200) }
1 warp ( 16 thr) 924000 clk (18.047 clk/warp, 0.887 ops/clk) Histogram { (18: 200) }
1 warp ( 24 thr) 924000 clk (18.047 clk/warp, 1.330 ops/clk) Histogram { (18: 200) }
1 warp ( 32 thr) 924000 clk (18.047 clk/warp, 1.773 ops/clk) Histogram { (18: 200) }
2 warps ( 64 thr) 924000 clk (18.047 clk/warp, 3.546 ops/clk) Histogram { (18: 400) }
3 warps ( 96 thr) 924400 clk (18.047 clk/warp, 5.317 ops/clk) Histogram { (18: 600) }
4 warps (128 thr) 924800 clk (18.051 clk/warp, 7.087 ops/clk) Histogram { (18: 800) }
5 warps (160 thr) 925600 clk (18.056 clk/warp, 8.850 ops/clk) Histogram { (18: 1000) }
6 warps (192 thr) 926000 clk (18.059 clk/warp, 10.616 ops/clk) Histogram { (18: 1200) }
7 warps (224 thr) 926800 clk (18.065 clk/warp, 12.375 ops/clk) Histogram { (18: 1400) }
8 warps (256 thr) 926916 clk (18.064 clk/warp, 14.141 ops/clk) Histogram { (18: 1600) }
9 warps (288 thr) 928328 clk (18.071 clk/warp, 15.884 ops/clk) Histogram { (18: 1800) }
10 warps (320 thr) 928742 clk (18.076 clk/warp, 17.641 ops/clk) Histogram { (18: 2000) }
11 warps (352 thr) 928930 clk (18.079 clk/warp, 19.401 ops/clk) Histogram { (18: 2200) }
12 warps (384 thr) 929168 clk (18.093 clk/warp, 21.160 ops/clk) Histogram { (18: 2400) }
13 warps (416 thr) 930940 clk (18.103 clk/warp, 22.879 ops/clk) Histogram { (18: 2600) }
14 warps (448 thr) 931248 clk (18.111 clk/warp, 24.631 ops/clk) Histogram { (18: 2800) }
15 warps (480 thr) 932606 clk (18.121 clk/warp, 26.352 ops/clk) Histogram { (18: 3000) }
16 warps (512 thr) 932754 clk (18.130 clk/warp, 28.104 ops/clk) Histogram { (18: 3200) }
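In the multi-warp K_ADD_UINT_DEP128 rows the ops/clk column equals threads × 200 × 256 / clk (200 iterations of 256 ops, per the section header). For example, 512 × 51200 / 932754 ≈ 28.104. Because each thread runs a dependent chain, it has at most one operation in flight. Little's law then predicts throughput ≈ threads / latency, which stays close to the measured value up to 16 warps. This is an interpretive check, not a claim from the log. The function names are hypothetical.

```python
ITERATIONS, OPS_PER_ITER = 200, 256  # from the section header


def ops_per_clk(threads: int, total_clk: int) -> float:
    """Aggregate throughput when every thread performs ITERATIONS * OPS_PER_ITER ops."""
    return threads * ITERATIONS * OPS_PER_ITER / total_clk


def littles_law_estimate(threads: int, latency_clk: float) -> float:
    """Throughput expected if each thread keeps exactly one dependent op in flight."""
    return threads / latency_clk


# (threads, total clk, clk/warp) taken from the rows above.
rows = [(32, 924000, 18.047), (256, 926916, 18.064), (512, 932754, 18.130)]
for threads, clk, lat in rows:
    print(threads, round(ops_per_clk(threads, clk), 3), round(littles_law_estimate(threads, lat), 3))
```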
K_MUL_FLOAT_DEP128 throughput: 4664 clk (28.103 ops/clk)
K_MAD_FLOAT_DEP128 throughput: 5374 clk (24.390 ops/clk)
KADD_MUL throughput: 4146 clk (31.614 ops/clk)
KADD_MUL2 throughput: 64 thrds 2570 clk (6.375 ops/clk)
++++++++++++++++++++++++++++++++++++++++++++++++++
K_SYNC_UINT_DEP128 latency: 58 clk (0.227 clk/warp)
K_SYNC_UINT_DEP128 latency: 60 clk (0.234 clk/warp)
K_SYNC_UINT_DEP128 latency: 62 clk (0.242 clk/warp)
K_SYNC_UINT_DEP128 latency: 64 clk (0.250 clk/warp)
K_SYNC_UINT_DEP128 latency: 66 clk (0.258 clk/warp)
K_SYNC_UINT_DEP128 latency: 68 clk (0.266 clk/warp)
K_SYNC_UINT_DEP128 latency: 70 clk (0.273 clk/warp)
K_SYNC_UINT_DEP128 latency: 72 clk (0.281 clk/warp)
K_SYNC_UINT_DEP128 latency: 76 clk (0.297 clk/warp)
K_SYNC_UINT_DEP128 latency: 78 clk (0.305 clk/warp)
K_SYNC_UINT_DEP128 latency: 80 clk (0.312 clk/warp)
K_SYNC_UINT_DEP128 latency: 82 clk (0.320 clk/warp)
K_SYNC_UINT_DEP128 latency: 84 clk (0.328 clk/warp)
K_SYNC_UINT_DEP128 latency: 88 clk (0.344 clk/warp)
K_SYNC_UINT_DEP128 latency: 90 clk (0.352 clk/warp)
K_SYNC_UINT_DEP128 latency: 94 clk (0.367 clk/warp)
Running register file test...
Max threads x regs/thread before kernel spawn failure.
[516 x 4 = 2064]
[516 x 8 = 4128]
[516 x 12 = 6192]
[516 x 16 = 8256]
[516 x 20 = 10320]
[516 x 24 = 12384]
[516 x 28 = 14448]
[516 x 32 = 16512]
[516 x 36 = 18576]
[516 x 40 = 20640]
[516 x 44 = 22704]
[516 x 48 = 24768]
[516 x 52 = 26832]
[516 x 56 = 28896]
[516 x 60 = 30960]
[516 x 64 = 33024]
[516 x 68 = 35088]
[516 x 72 = 37152]
[516 x 76 = 39216]
[516 x 80 = 41280]
[516 x 84 = 43344]
[516 x 88 = 45408]
[516 x 92 = 47472]
[516 x 96 = 49536]
[516 x 100 = 51600]
[516 x 104 = 53664]
[516 x 108 = 55728]
[516 x 112 = 57792]
[516 x 116 = 59856]
[516 x 120 = 61920]
[516 x 124 = 63984]
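Each register file entry has the form [threads x regs/thread = total registers]. When post-processing the log, it helps to check that every printed product is consistent before using the totals. This sketch parses the entries and rejects any inconsistent one. `parse_regfile` is a hypothetical helper, and the sample reuses three entries from the log above.

```python
import re

# Matches entries such as "[516 x 64 = 33024]".
ENTRY = re.compile(r"\[(\d+) x (\d+) = (\d+)\]")


def parse_regfile(text: str) -> list[tuple[int, int, int]]:
    """Parse '[threads x regs = total]' entries, verifying each product."""
    rows = []
    for t, r, total in ENTRY.findall(text):
        t, r, total = int(t), int(r), int(total)
        if t * r != total:
            raise ValueError(f"inconsistent entry: {t} x {r} != {total}")
        rows.append((t, r, total))
    return rows


sample = "[516 x 4 = 2064]\n[516 x 64 = 33024]\n[516 x 124 = 63984]"
rows = parse_regfile(sample)
print(max(total for _, _, total in rows))  # 63984
```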