LiXiangyang
Newcomer
It seems that NVCC uses significantly more registers by default when building for Maxwell GPUs, yet the register file size per SM is the same as on Kepler.
I don't know whether this means Maxwell simply doesn't care about occupancy as much, thanks to the L2 cache and possibly reduced instruction latencies, or whether the compiler has no choice but to use that many registers because too many corners were cut elsewhere (L1?).
https://devtalk.nvidia.com/default/...hats-new-about-maxwell-/post/4127010/#4127010
Also, note that the DriverQuery that comes with CUDA 6.0 still reports the maximum shared memory (SMEM) usable per block as 48 KB on Maxwell cards, even though Maxwell has 64 KB of SMEM per SM. According to the figure linked above, though, NVCC can build SM 5.0 objects even when a block asks for 64 KB of SMEM, so I don't know whether this is an issue with the driver, NVCC, or both.
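To put the register question in numbers, here is a rough back-of-the-envelope sketch of how many warps can stay resident on one SM when registers are the only limit. It assumes 65536 32-bit registers and at most 64 resident warps per SM (the figures I believe apply to both Kepler and Maxwell), and that registers are allocated per warp in units of 256. The function names are made up for illustration, and the model ignores shared memory, block-size limits, and warp allocation granularity, so treat the results as an upper bound rather than what the hardware will actually report.

```python
# Rough register-limited occupancy estimate for a single SM.
# Assumed hardware figures (believed to hold for Kepler and Maxwell):
REGS_PER_SM = 64 * 1024      # 32-bit registers per SM
MAX_WARPS_PER_SM = 64        # resident warp limit per SM
WARP_SIZE = 32               # threads per warp
ALLOC_UNIT = 256             # assumed per-warp register allocation unit


def register_limited_warps(regs_per_thread):
    """Upper bound on resident warps per SM if registers are the only limit."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    regs_per_warp = regs_per_thread * WARP_SIZE
    # Round up to the next multiple of the allocation unit.
    regs_per_warp = -(-regs_per_warp // ALLOC_UNIT) * ALLOC_UNIT
    return min(MAX_WARPS_PER_SM, REGS_PER_SM // regs_per_warp)


def occupancy(regs_per_thread):
    """Register-limited occupancy as a fraction of the warp limit."""
    return register_limited_warps(regs_per_thread) / MAX_WARPS_PER_SM


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(regs, register_limited_warps(regs), occupancy(regs))
```

Under these assumptions, a kernel that goes from 32 to 64 registers per thread drops from full to half register-limited occupancy, which is why higher default register usage on an unchanged register file is worth asking about.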