Hey, nobody said a TMU was defined as something that outputs one bilinear result per cycle. It could need two clocks to output one. Alternatively, it could refer to the number of shader cores, of course... While this might seem to be marketing BS, there's nothing that prevents your TMUs from being half-pixel at a HW level too.
Anyway, this thing is too confusing given the die size claims; I give up. At the very least, the numbers in the Series5 paper must be wrong wrt fillrate (i.e. 12.5mm² doesn't refer to the same chip as 1000MP/s with 2.5x overdraw). Otherwise, 32mm² for 8 TMUs vs 12.5mm² for 2 TMUs represents a 56% perf/mm² increase in terms of fillrate, and since the PR claims most of the perf gains are ALU-related, the overall perf/mm² increase would have to be even larger! Surely that can't be... Even DX10.1 support is unlikely to change things to such a degree.
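
For anyone who wants to check the 56% figure, here's a quick sketch of the arithmetic. It assumes both chips run at the same clock and that every TMU delivers the same per-clock throughput, so fillrate simply scales with TMU count; neither assumption is confirmed by the paper or the PR, and the function name is just made up for illustration.

```python
def tmus_per_mm2(tmus: int, area_mm2: float) -> float:
    """TMU density, used here as a stand-in for fillrate per mm²
    (assumes equal clocks and equal per-TMU throughput)."""
    return tmus / area_mm2

old_density = tmus_per_mm2(2, 12.5)   # 2 TMUs in 12.5mm²
new_density = tmus_per_mm2(8, 32.0)   # 8 TMUs in 32mm²

# Relative gain in fillrate per mm² under the assumptions above.
gain = new_density / old_density - 1.0

print(f"old: {old_density:.2f} TMU/mm², new: {new_density:.2f} TMU/mm²")
print(f"fillrate perf/mm² gain: {gain:.0%}")
```

That gives 0.16 vs 0.25 TMU/mm², i.e. roughly a 56% gain from texturing alone, before counting any ALU improvements.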