Most likely NVIDIA scrapped the DP capabilities from Fermi for the smaller parts. In the process they would also need to change the schedulers etc., because the ALUs no longer need to work together to process DP ops.
42. Definitely 42.
Less seriously (what could possibly be more serious than 42?), I don't see how Fermi scales to anything that isn't a multiple of 32 unless they spend more engineering resources on it than it's probably worth. As I've said in the past (it wasn't likely then and it's less likely now, but that has never stopped me before!), I think it's more likely they scale the number of TMUs per block of 32 SMs to different multiples of 2.
The rumors are 3*16 ALUs + 2 quad TMUs per SM. That sounds very plausible to me, especially since the ALU:TMU ratio of your proposal (2*16 ALUs + 2 quad TMUs) would be much too TMU-heavy.
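To put rough numbers on that ratio argument, here is a quick back-of-the-envelope sketch. It assumes a "quad TMU" means 4 texture units (my reading of the term, not something confirmed above), and the function and constant names are made up for illustration.

```python
from fractions import Fraction

# Assumption: one "quad TMU" is counted as 4 texture units.
TMUS_PER_QUAD = 4


def alu_tmu_ratio(simd_blocks: int, alus_per_block: int, quad_tmus: int) -> Fraction:
    """Return ALUs per TMU for one SM, given its SIMD blocks and quad TMUs."""
    alus = simd_blocks * alus_per_block
    tmus = quad_tmus * TMUS_PER_QUAD
    return Fraction(alus, tmus)


# Rumored config: 3*16 ALUs + 2 quad TMUs per SM.
rumored = alu_tmu_ratio(simd_blocks=3, alus_per_block=16, quad_tmus=2)

# Proposed config: 2*16 ALUs + 2 quad TMUs per SM.
proposed = alu_tmu_ratio(simd_blocks=2, alus_per_block=16, quad_tmus=2)

print(f"rumored  ALU:TMU = {rumored}:1")
print(f"proposed ALU:TMU = {proposed}:1")
```

Under that assumption the rumored layout works out to 6 ALUs per TMU versus 4 for the 2*16 proposal, which is the "too TMU-heavy" point in numbers.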