Basically, each SM has a block of shared memory/L1 cache associated with it. In GP100, they doubled the L1/shared memory by having two blocks of it per what used to be an SM. Presumably because of how the memory is addressed, each half of the original SM can only see one of the blocks, so it behaves like two 64-core...