So, sure, if your program were a huge bunch of dependent integer MULs, INT24 would be twice as fast as INT32. But that doesn't mean INT32 is emulated; it just means one of the two units isn't capable of it.
Well, it's a theory that's testable (assuming the driver works correctly in extracting ILP/co-issue). Write an OGL shader that uses the U24 MUL, with and without perspective correction, and see what happens. Then try it with the 32-bit MUL, with and without perspective correction, as a control group. I don't buy the theory of a separate perspective-correction MUL unit: you'd have to compute 1/w first, wait for it to finish, and then issue the MUL, so it doesn't make much sense to make it separate from the SF unit. Rather, I think the SF unit is where the MUL is located; that is, it is used to do an RCP followed by a MUL. If you ever find the "missing MUL", I bet you can't co-issue it with an SF op or an interpolant read. Whether the main MADD unit can do full 32-bit integer MULs at full speed is another question that hopefully someone will test. Given that 32-bit ADD is supported, it seems probable, but you never know.
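Once CUDA is available, the same throughput question can be probed without going through a shader: CUDA exposes a real `__umul24` intrinsic, so you can time a dependent chain of 24-bit multiplies against an identical chain of 32-bit multiplies. The sketch below is a hypothetical microbenchmark, not anything from the posts above; the grid/loop sizes are arbitrary, and the "roughly half the time" expectation in the comments only holds if the 24-bit path really is twice as fast.

```cuda
// Hypothetical microbenchmark: dependent-chain throughput of 24-bit vs
// 32-bit integer multiply. Kernel names and sizes are illustrative.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void mul24_chain(unsigned int *out, unsigned int seed, int iters)
{
    unsigned int x = seed + threadIdx.x;
    // Each __umul24 depends on the previous result, so the loop measures
    // the 24-bit multiplier's sustained rate, not ILP.
    for (int i = 0; i < iters; ++i)
        x = __umul24(x, 0x12345u);
    out[threadIdx.x] = x;  // keep the result live so it isn't optimized away
}

__global__ void mul32_chain(unsigned int *out, unsigned int seed, int iters)
{
    unsigned int x = seed + threadIdx.x;
    for (int i = 0; i < iters; ++i)
        x = x * 0x12345u;  // full 32-bit multiply, same dependence structure
    out[threadIdx.x] = x;
}

int main()
{
    unsigned int *d_out;
    cudaMalloc(&d_out, 256 * sizeof(unsigned int));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    // Time the 24-bit chain.
    cudaEventRecord(t0);
    mul24_chain<<<64, 256>>>(d_out, 1u, 1 << 20);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms24;
    cudaEventElapsedTime(&ms24, t0, t1);

    // Time the 32-bit chain under identical launch conditions.
    cudaEventRecord(t0);
    mul32_chain<<<64, 256>>>(d_out, 1u, 1 << 20);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms32;
    cudaEventElapsedTime(&ms32, t0, t1);

    // If 32-bit MUL is done at half rate, ms32 should come out near 2x ms24.
    printf("mul24: %.3f ms, mul32: %.3f ms\n", ms24, ms32);
    cudaFree(d_out);
    return 0;
}
```

The same idea carries back to the shader experiment: keep the multiply chain strictly dependent so co-issue can't hide the difference you're trying to measure.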
As for physics acceleration, call me back when I can use one in an indie game without paying $, okay? Havok FX actually carries an extra price on top of plain Havok. On the plus side, I hope CUDA will become popular within the open source community, and that engines like Bullet or ODE, or other ones, will get some much-needed love from GPGPU.
It remains to be seen, however, what exactly NVIDIA's business model for CUDA is, of course...
I don't think it will be any different from Cg. I think CUDA will be free once it exits NDA, but Quantum Physics won't be, just like Gelato isn't. It will then only be a matter of time before ODE et al. are ported, although ODE IMHO needs a lot of work to match half of what Havok does. It's possible that QPE might be free or reduced-cost for "The Way It's Meant to Be Played" developers, since inclusion of QPE practically drives demand to own an NVIDIA G8x card.