Because of the current state of AMD's driver optimizations, DX10 games (Crysis, BioShock) scale well only up to 3 GPUs and not much beyond, while DX9 games generally continue to scale all the way up to 4 GPUs. We expected the opposite, but AMD gave us some technical insight into why this is the case:
"The biggest issue is that DX10 has a lot more opportunities for persistent resources (resources rendered or updated in one frame and then read in subsequent frames). In DX9 we only had to handle texture render targets, which we have a good handle on in the DX10 driver. In addition to texture render targets, DX10 allows an application to render to IBs [index buffers] and VBs [vertex buffers], either using stream out from the GS [geometry shader] or as a traditional render target. An application can also update any resource with a copy blt operation, whereas in DX9 copy blt operations were restricted to offscreen plain surfaces and render targets. This additional flexibility makes it harder to maximize performance without impacting quality.
Another area that creates issues is constant buffers, which are new in DX10. Some applications update dynamic constant buffers every frame, while other apps update them less frequently. So again we have to find the right balance that generally works for quality without impacting performance.
We are also seeing new software bottlenecks in DX10 that we continue to work through. These software bottlenecks are sometimes caused by interactions with the OS and the Vista driver model that did not exist for DX9, most likely due to [DX9's] limited feature set. Software bottlenecks impact our multi-GPU performance more than single-GPU performance and can be a contributing factor to limited scaling.
We’re continuing to push hard to find the right solution to each challenge and boost performance and scalability wherever we can. As you can see, there are a lot of things that factor in."
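To make AMD's point about persistent resources concrete, here is a minimal Python sketch. It assumes alternate frame rendering (AFR), where frame `f` is drawn by GPU `f % num_gpus`; the function name `cross_gpu_copies` and its `reuse_distance` parameter are hypothetical and are not taken from AMD's driver. It counts how often a resource written in one frame is read in a later frame on a different GPU, which is exactly when a driver would have to copy that resource between GPUs or serialize their work.

```python
# Minimal model, assuming AFR: frame f is rendered by GPU f % num_gpus.
# A persistent resource is written in one frame and read reuse_distance frames later.
# Names are hypothetical and illustrative, not AMD driver APIs.

def cross_gpu_copies(num_gpus: int, num_frames: int, reuse_distance: int = 1) -> int:
    """Count reads of a persistent resource whose writer frame ran on another GPU."""
    copies = 0
    for frame in range(reuse_distance, num_frames):
        writer_gpu = (frame - reuse_distance) % num_gpus
        reader_gpu = frame % num_gpus
        if writer_gpu != reader_gpu:
            # The driver must copy the resource between GPUs (or serialize them).
            copies += 1
    return copies


if __name__ == "__main__":
    print(cross_gpu_copies(1, 100))                    # 0: single GPU, nothing to copy
    print(cross_gpu_copies(4, 100))                    # 99: every later frame needs a copy
    print(cross_gpu_copies(4, 100, reuse_distance=4))  # 0: writer and reader share a GPU
```

With a single GPU nothing has to move; with two or more GPUs and a resource read in the very next frame, every frame after the first needs a transfer. In this model, DX10's extra persistent resource types (stream-out IBs and VBs, copy blt destinations) simply mean more resources the driver has to track this way.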
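AMD's remark about constant buffers describes a trade-off between quality and cost. The sketch below, again assuming AFR and using made-up names (`simulate`, `broadcast`), keeps one copy of a dynamic constant buffer per GPU and compares two hypothetical propagation policies: sending each update only to the GPU rendering the current frame, or broadcasting it to every GPU. A "stale" frame is one where a GPU renders with older data than the application last wrote, which would show up as a rendering error.

```python
# Minimal model, assuming AFR: frame f is rendered by GPU f % num_gpus.
# Each GPU holds its own copy of one dynamic constant buffer. The app writes
# new contents every `update_every` frames. Names are hypothetical.

def simulate(num_gpus: int, num_frames: int, update_every: int, broadcast: bool):
    """Return (uploads, stale_frames) for one propagation policy."""
    latest = 0                 # version of the data the app most recently wrote
    have = [0] * num_gpus      # version currently held by each GPU
    uploads = 0
    stale_frames = 0
    for frame in range(num_frames):
        gpu = frame % num_gpus
        if frame % update_every == 0:
            latest += 1
            # Broadcast: send the update to all GPUs. Otherwise: only to the
            # GPU drawing this frame, which is correct only if the app rewrites
            # the buffer every frame.
            targets = range(num_gpus) if broadcast else [gpu]
            for g in targets:
                have[g] = latest
                uploads += 1
        if have[gpu] != latest:
            stale_frames += 1  # this GPU renders with out-of-date constants
    return uploads, stale_frames


if __name__ == "__main__":
    for every in (1, 10):
        for bc in (False, True):
            print(every, bc, simulate(4, 100, every, bc))
```

On 4 GPUs over 100 frames, an app that updates every frame is served correctly by the cheap policy with a quarter of the uploads (100 versus 400). For an app that updates every 10th frame, the same cheap policy leaves 70 frames rendering with stale data, while broadcasting removes the errors at four times the upload cost (40 versus 10). Neither policy is best for every application, which is the balance AMD describes.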