If that's the case then it can't be one dot per lane like in the GT200 diagram. Nvidia's diagrams had each SFU as being 4 lanes wide, in which case the SP:SFU ratio would have risen to 2:1.
Wouldn't SFUs mostly scale by texture unit, rather than SP?
If texture units remain fixed at 8 per TPC, I would think 4 SFUs would suffice (assuming, as you say, that these aren't indicating per-lane). You could even have TUs grow to around 10 and be okay. That's roughly the same SFU/TU ratio (3/8 vs. 4/10) and a similar SP/TU ratio (24/8 vs. 32/10). 160 TUs total would be ... smooth.
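A quick sanity check of those ratios, as a rough sketch. The per-TPC counts are the ones quoted above; the 16-cluster figure is my assumption (it is just 160 TUs divided by 10 per cluster), and `ratios` is a made-up helper name.

```python
from fractions import Fraction

def ratios(sp, sfu, tu):
    """Return (SFU/TU, SP/TU) for one cluster as exact fractions."""
    return Fraction(sfu, tu), Fraction(sp, tu)

# GT200-style cluster as quoted above: 24 SPs, 3 SFUs, 8 TUs.
old_sfu_tu, old_sp_tu = ratios(sp=24, sfu=3, tu=8)

# Speculated cluster: 32 SPs, 4 SFUs, 10 TUs.
new_sfu_tu, new_sp_tu = ratios(sp=32, sfu=4, tu=10)

# Assumption: 16 clusters, which is what 160 TUs at 10 per cluster implies.
clusters = 16
total_tus = clusters * 10

print(f"SFU/TU: {float(old_sfu_tu):.3f} vs {float(new_sfu_tu):.3f}")
print(f"SP/TU:  {float(old_sp_tu):.3f} vs {float(new_sp_tu):.3f}")
print(f"Total TUs: {total_tus}")
```

Both ratios move by under 10%, which is all "roughly the same" is meant to claim here.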
It does make it seem like G[T]200's DP functionality was thrown in (this one seems integrated), and it makes me wonder whether this fellow would choke on inverse/division problems.
But if this thing really "only" has 512 ALUs, what sort of clocks would it need to be competitive with HD 5870, and more importantly the X2?
Well, at 2 ops/clock, you'd have 2 TFLOPS at 2 GHz. Wasn't that the original aim of the G200 chips? [sorry, my mind is fuzzy this early in the AM]
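For what it's worth, the back-of-envelope math can be sketched like this. The 512 ALUs and 2 ops/clock come from the posts above; the HD 5870 numbers (1600 ALUs, 2 ops/clock, 850 MHz) are the published specs as I remember them, so treat them as an assumption. Both function names are made up for illustration, and this is theoretical peak only, which says nothing about real-world performance.

```python
def peak_tflops(alus, ops_per_clock, clock_ghz):
    """Theoretical peak in TFLOPS: ALUs * ops per clock * clock (GHz) / 1000."""
    return alus * ops_per_clock * clock_ghz / 1000.0

def clock_for_target(alus, ops_per_clock, target_tflops):
    """Shader clock in GHz needed to reach a given theoretical peak."""
    return target_tflops * 1000.0 / (alus * ops_per_clock)

# Figures from the thread: 512 ALUs, 2 ops/clock, 2 GHz.
rumored = peak_tflops(alus=512, ops_per_clock=2, clock_ghz=2.0)

# Assumption: HD 5870 at 1600 ALUs, 2 ops/clock, 0.85 GHz.
hd5870 = peak_tflops(alus=1600, ops_per_clock=2, clock_ghz=0.85)

# Clock the 512-ALU part would need to match that peak on paper.
needed_ghz = clock_for_target(alus=512, ops_per_clock=2, target_tflops=hd5870)

print(f"512 ALUs @ 2 GHz: {rumored:.3f} TFLOPS")
print(f"HD 5870 (assumed specs): {hd5870:.3f} TFLOPS")
print(f"Clock needed to match: {needed_ghz:.2f} GHz")
```

If the chip can issue more than 2 ops per ALU per clock, the required clock drops proportionally, so `ops_per_clock` is the parameter to play with.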
Of course, for all we know, the diagram is hugely misleading: the four dots at the top handle branching and instruction re-ordering for multiple WARP/clock issue, and the SFUs are integrated into the data address and setup "bars". Yeah. And the 32 items are really register/cache and TUs, and there's just one big ALU at the bottom which runs at an effective speed of around 100 GHz, but using logic that doesn't require explicit clocking...
Heh.
-Dave