How to optimize for SLI/CF?

Rayne

Newcomer
I have been testing my own code on NVIDIA SLI & ATI CrossFire rigs, and the performance scaling was terrible.

On the GTX 280 SLI setup, performance was only 7% higher when running in SLI mode, and on the 4870 CF setup, performance was 5 times worse in CF than in single-card mode (from 100 down to 20 fps) :(

I tried changing the number of back buffers from 1 to 3, but this did not help (vsync off).
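For reference, setting the back buffer count in D3D9 looks roughly like this (a simplified sketch, not my exact code; hWnd and d3d are assumed to exist already):

    // Sketch: present parameters with a triple back buffer and vsync off.
    // Assumes hWnd (window handle) and d3d (IDirect3D9*) already exist.
    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed             = TRUE;
    pp.SwapEffect           = D3DSWAPEFFECT_DISCARD;
    pp.BackBufferFormat     = D3DFMT_UNKNOWN;          // use the desktop format
    pp.BackBufferCount      = 3;                       // tried 1 up to 3
    pp.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE; // vsync off
    pp.hDeviceWindow        = hWnd;

    IDirect3DDevice9* device = NULL;
    HRESULT hr = d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                                   D3DCREATE_HARDWARE_VERTEXPROCESSING,
                                   &pp, &device);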

I also know that my application is not CPU-bound, because it is using only 20-30% of one core.

I use 4 render targets in my app, including the back buffer.

Should I duplicate the render targets too? (e.g. even frame -> render to RTs A0, B0, C0 & D0; odd frame -> render to RTs A1, B1, C1 & D1)
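By duplicating I mean something like this (just a sketch with made-up names):

    // Sketch: two copies of each render target, alternated per frame.
    // rtA/rtB/rtC/rtD are created elsewhere; the index flips every frame.
    IDirect3DTexture9* rtA[2], *rtB[2], *rtC[2], *rtD[2];
    unsigned frameIndex = 0;

    void RenderFrame(IDirect3DDevice9* device)
    {
        const unsigned i = frameIndex & 1; // 0 on even frames, 1 on odd frames
        // ... render all the passes into rtA[i], rtB[i], rtC[i] and rtD[i] ...
        ++frameIndex;
    }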

Thank you very much !
 
1) Render the main scene to RT A.

2) Render texture projection / water / ... effects to RT B (uses data from RT A).

3) Render to RT A (uses data from RT B).

4) Render shadows to RT C.

5) Render to RT A (uses data from RT C).

6) Render post-processing effects to RT D (bloom, alpha blending, ...).

7) Render to RT A (uses data from RT D).

A -> B -> A -> C -> A -> D -> A

The RTs have different sizes & formats. The number of drawn primitives is low.

The vertex data remains unchanged. I use the DrawIndexedPrimitive method.
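In code, the pass chain looks roughly like this (the Draw* helpers are placeholders, not my real function names):

    // Sketch of the A -> B -> A -> C -> A -> D -> A chain.
    // surfA..surfD are the IDirect3DSurface9* of the four RTs.
    void RenderPasses(IDirect3DDevice9* dev,
                      IDirect3DSurface9* surfA, IDirect3DSurface9* surfB,
                      IDirect3DSurface9* surfC, IDirect3DSurface9* surfD)
    {
        dev->SetRenderTarget(0, surfA); DrawMainScene(dev);       // 1) -> A
        dev->SetRenderTarget(0, surfB); DrawProjectionWater(dev); // 2) reads A, -> B
        dev->SetRenderTarget(0, surfA); DrawCombineB(dev);        // 3) reads B, -> A
        dev->SetRenderTarget(0, surfC); DrawShadows(dev);         // 4) -> C
        dev->SetRenderTarget(0, surfA); DrawApplyShadows(dev);    // 5) reads C, -> A
        dev->SetRenderTarget(0, surfD); DrawPostProcess(dev);     // 6) reads A, -> D
        dev->SetRenderTarget(0, surfA); DrawFinalComposite(dev);  // 7) reads D, -> A
    }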
 
The pseudo-code you showed shouldn't hurt multi-GPU performance as long as the same sequence of passes is rendered every frame. The key to optimal multi-GPU performance is to avoid frame dependencies, i.e. every frame should be self-contained. A frame dependency occurs when, for example, you need to read from a render target that was written to in the previous frame (without it having been written to in the current frame first). Since the previous frame was rendered on the other GPU, accessing that data in the current frame means a copy operation has to be performed between the two GPUs, and this is where performance suffers due to synchronization.
Do you have any frame dependencies in your code?
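For illustration, here is a typical frame dependency and one way to break it (a sketch with hypothetical names, in D3D9 terms since you mentioned DrawIndexedPrimitive):

    // BAD: historyRT was last written in the PREVIOUS frame. With AFR that
    // frame was rendered by the other GPU, so reading it here forces an
    // inter-GPU copy and a synchronization point.
    void DependentPass(IDirect3DDevice9* dev, IDirect3DTexture9* historyRT)
    {
        dev->SetTexture(0, historyRT);
        // ... draw a feedback/accumulation effect using last frame's data ...
    }

    // BETTER: make the frame self-contained by writing the RT in the current
    // frame before any pass reads from it; the driver then never needs last
    // frame's contents from the other GPU.
    void SelfContainedPass(IDirect3DDevice9* dev,
                           IDirect3DTexture9* historyRT,
                           IDirect3DSurface9* historySurf)
    {
        dev->SetRenderTarget(0, historySurf);
        dev->Clear(0, NULL, D3DCLEAR_TARGET, 0, 1.0f, 0); // overwrite fully
        // ... re-render the data this frame ...
        dev->SetTexture(0, historyRT);
        // ... now the read only touches data produced on this GPU ...
    }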
 