First, the CUDA and the Brook clients use different code bases and different algorithms, and they calculate different classes of proteins, so it's hard to make good comparisons on that basis.
This isn't true. The F@H GPU2 client distributes the same work units to NV and ATi hardware. The client works differently on Radeons and Geforces, but it is processing the same dataset.
Furthermore, the ATI version of Folding was developed with an ancient Brook release without support for local memory. AFAIK that is the reason it actually does twice the number of calculations: it is cheaper to redo a calculation than to store the result somewhere in memory and load it again later. Newer Brook releases support the local memory of RV770 GPUs, but Stanford never bothered to update their code. The scaling from RV670 -> RV770 -> RV870 is extremely bad (virtually non-existent), not exactly a sign of optimal, forward-looking coding.
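To illustrate the recompute-versus-store trade-off, here is a toy sketch in Python. It is not the F@H kernel or anything taken from the Brook code. The 1-D `pair_force` interaction and both function names are made up for the example. The first version lets every atom evaluate all of its partners itself, so nothing has to be shared. The second evaluates each pair once and writes the result to both atoms, which is the kind of reuse that needs somewhere cheap to store intermediate results.

```python
# Illustrative only: a toy 1-D pairwise interaction, NOT the Folding@home code.

def pair_force(xi, xj):
    """Hypothetical toy interaction: 1/d^2 repulsion acting on i due to j."""
    d = xi - xj
    return d / (abs(d) ** 3)

def forces_recompute(xs):
    """Every atom evaluates all of its partners itself; no result is shared."""
    evals = 0
    forces = [0.0] * len(xs)
    for i in range(len(xs)):
        for j in range(len(xs)):
            if i != j:
                forces[i] += pair_force(xs[i], xs[j])
                evals += 1
    return forces, evals

def forces_shared(xs):
    """Each pair is evaluated once and the result is applied to both atoms."""
    evals = 0
    forces = [0.0] * len(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            f = pair_force(xs[i], xs[j])
            forces[i] += f   # force on i due to j
            forces[j] -= f   # equal and opposite force on j
            evals += 1
    return forces, evals

if __name__ == "__main__":
    positions = [0.0, 1.0, 3.0, 7.0]
    _, n_recompute = forces_recompute(positions)
    _, n_shared = forces_shared(positions)
    print(n_recompute, n_shared)
```

Both versions give the same forces; the first just performs twice as many pair evaluations. Whether that is actually slower on a given GPU depends on how expensive the store and reload would be, which is the point made above.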
That sounds like what I alluded to hearing previously.
For the record, I'm not here to evangelize NV hardware; I'd rather have a good option from both IHVs (or more, if Intel ever joins the playing field).
I guess this should not become a discussion of how AMD's devrel department works or should be working, so I won't say anything on those points.
I think it's a relevant observation to make though, given the state of the market. We've all heard the stories about NV devrel working closely with devs to make sure their software runs properly on GeForces, but we rarely hear about ATi doing the same.