Something I've been told: each SM has a dedicated double-precision MAD unit, so there are 30 in total. That works out to about 78 GFLOPS, or 1/12th of the single-precision peak, which is way less than I was expecting.
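A quick back-of-envelope check of where the 78 GFLOPS and 1/12th figures come from. This is only a sketch under stated assumptions: the 1.296 GHz shader clock, the 240 single-precision ALUs, and counting 3 FLOPs per SP ALU per clock (MAD plus a co-issued MUL) are not in the post above and are assumed here for a GTX 280-class part.

```python
# Peak-throughput arithmetic for 30 DP MAD units (one per SM, per the post).
# ASSUMPTIONS (not from the post): 1.296 GHz shader clock, 240 SP ALUs,
# and 3 SP FLOPs/clock per ALU (MAD = 2 plus a co-issued MUL = 1).
DP_UNITS = 30
DP_FLOPS_PER_CLOCK = 2       # one multiply-add counts as two FLOPs
SP_ALUS = 240                # assumed
SP_FLOPS_PER_CLOCK = 3       # assumed: MAD + MUL
SHADER_CLOCK_GHZ = 1.296     # assumed

def peak_gflops(units: int, flops_per_clock: int, clock_ghz: float) -> float:
    """Peak GFLOPS = units x FLOPs per unit per clock x clock in GHz."""
    return units * flops_per_clock * clock_ghz

dp_peak = peak_gflops(DP_UNITS, DP_FLOPS_PER_CLOCK, SHADER_CLOCK_GHZ)
sp_peak = peak_gflops(SP_ALUS, SP_FLOPS_PER_CLOCK, SHADER_CLOCK_GHZ)

print(f"DP peak: {dp_peak:.1f} GFLOPS")      # DP peak: 77.8 GFLOPS
print(f"SP peak: {sp_peak:.1f} GFLOPS")      # SP peak: 933.1 GFLOPS
print(f"SP/DP ratio: {sp_peak / dp_peak:.1f}")  # SP/DP ratio: 12.0
```

Note the ratio only comes out to 12 if the extra MUL is counted; counting MADs alone it would be 240/30 = 8.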
Maybe that's what Rys was referring to.
That's another benefit of VLIW vs. "scalar": it's much easier to reconfigure the ALUs to do different things on the fly.
Either way, isn't 78 GFLOPS kinda weak, even compared to CPUs?
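For a rough sense of scale, here is the same peak arithmetic for a CPU. The CPU figures are assumptions chosen for illustration, not anything from the thread: a quad-core x86 at 3.0 GHz where each core can issue one 2-wide SSE2 double-precision add and one 2-wide multiply per clock (4 DP FLOPs per clock per core). The GPU clock of 1.296 GHz is likewise assumed.

```python
# Peak DP comparison: hypothetical quad-core CPU vs. 30 DP MAD units.
# ASSUMPTIONS: 3.0 GHz quad-core, 4 DP FLOPs/clock/core via SSE2
# (2-wide add + 2-wide mul); GPU shader clock of 1.296 GHz.
CPU_CORES = 4
CPU_DP_FLOPS_PER_CLOCK = 4
CPU_CLOCK_GHZ = 3.0

GPU_DP_UNITS = 30
GPU_DP_FLOPS_PER_CLOCK = 2   # multiply-add = 2 FLOPs
GPU_CLOCK_GHZ = 1.296        # assumed

cpu_peak = CPU_CORES * CPU_DP_FLOPS_PER_CLOCK * CPU_CLOCK_GHZ
gpu_peak = GPU_DP_UNITS * GPU_DP_FLOPS_PER_CLOCK * GPU_CLOCK_GHZ

print(f"CPU DP peak: {cpu_peak:.1f} GFLOPS")        # CPU DP peak: 48.0 GFLOPS
print(f"GPU DP peak: {gpu_peak:.1f} GFLOPS")        # GPU DP peak: 77.8 GFLOPS
print(f"GPU/CPU: {gpu_peak / cpu_peak:.2f}x")       # GPU/CPU: 1.62x
```

Under these assumptions the GPU's DP peak is only about 1.6x a single quad-core CPU, which is the gap the question is getting at. These are theoretical peaks only; sustained throughput depends heavily on the workload.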