Cool, I guess I just don't get the point you're trying to make, then. Believe me, I'm aware of all these issues and how well the various solutions map to the architecture.
FWIW I'm not saying that the current solution of CUDA threads not acting like traditional threads is necessarily a bad model going forward, but I am arguing that, given the limitations, it's more natural to think of the programming model in terms of SIMD, *not* in terms of independent threads. It's convenient to write kernels in a scalar fashion, no doubt, but it's fundamentally important to understand the underlying hardware to write anything nontrivial and useful.
Thus I will continue to assert that the choice of the term "thread" is a bad one compared to Khronos' more general "work item".
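
To make the SIMD point concrete, here's a toy sketch in plain C++. It is not CUDA and not a description of how any real GPU is built; the 8-lane "warp", the mask arrays, and the `run_branch` name are all assumptions made up for illustration. It models a scalar-looking `if (x is even) x *= 2; else x += 1;` executed by lanes that share one instruction stream: each side of the branch is issued once for the whole group if any lane needs it, and the lanes that didn't take that side just sit idle.

```cpp
#include <array>
#include <cassert>

// Toy model only: one hypothetical "warp" of 8 lanes sharing a single
// instruction stream. None of these names come from CUDA or OpenCL.
constexpr int kLanes = 8;
using Warp = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Executes the scalar-looking code
//     if (x % 2 == 0) x *= 2; else x += 1;
// for every lane, and returns how many passes the warp as a whole had to issue.
int run_branch(Warp& data) {
    Mask take_if{};
    bool any_if = false;
    bool any_else = false;

    // Every lane evaluates the condition; the results form an active mask.
    for (int lane = 0; lane < kLanes; ++lane) {
        take_if[lane] = (data[lane] % 2 == 0);
        any_if = any_if || take_if[lane];
        any_else = any_else || !take_if[lane];
    }

    int passes = 0;

    // "if" side: issued once if at least one lane wants it.
    if (any_if) {
        ++passes;
        for (int lane = 0; lane < kLanes; ++lane) {
            if (take_if[lane]) data[lane] *= 2;  // masked-off lanes idle
        }
    }

    // "else" side: issued once if at least one lane wants it.
    if (any_else) {
        ++passes;
        for (int lane = 0; lane < kLanes; ++lane) {
            if (!take_if[lane]) data[lane] += 1;  // masked-off lanes idle
        }
    }

    // Every lane takes exactly one side, so at least one pass is issued.
    assert(passes >= 1);
    return passes;
}
```

Feed it a group where every lane holds an even number and it needs one pass; feed it a mix of odd and even and it needs two, with half the lanes idle during each. Independent threads wouldn't pay that way, which is why I find the SIMD mental model the more honest one.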