CoolAsAMoose
Newcomer
How feasible would it be to implement a multi-GPU architecture the AMD Opteron way?
Integrate two 16-bit HT links on each GPU, enabling glueless connection of 4 or 8 GPUs (more are possible). Limiting it to 2 GPUs would require only a single HT link, making it more reasonable.
Would the bandwidth of the HT links (3.2 GB/s in each direction, full duplex) be enough to divide the tasks evenly?
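As a rough sanity check on that 3.2 GB/s figure, here is a back-of-the-envelope sketch of the traffic from just one source: slave GPUs shipping their finished share of each frame to a master. The resolution, colour depth, frame rate and the function name are my own assumptions for illustration, and it ignores primitive traffic, textures and protocol overhead entirely.

```python
# Figure quoted in the post: one 16-bit HT link, per direction.
HT_LINK_BYTES_PER_S = 3.2e9

def backbuffer_traffic(width, height, bytes_per_pixel, fps, num_gpus):
    """Bytes per second reaching the master when every slave sends its
    share of each finished frame (hypothetical helper, even screen split)."""
    frame_bytes = width * height * bytes_per_pixel
    slave_share = (num_gpus - 1) / num_gpus  # the master's own part stays local
    return frame_bytes * fps * slave_share

# Assumed workload: 1600x1200, 32-bit colour, 60 frames/s, 2 GPUs.
traffic = backbuffer_traffic(1600, 1200, 4, 60, 2)
print(f"{traffic / 1e6:.1f} MB/s, {traffic / HT_LINK_BYTES_PER_S:.1%} of one link")
# prints "230.4 MB/s, 7.2% of one link"
```

Under those assumptions the finished-frame copy alone is a small fraction of one link, so the open question is really how much primitive and texture traffic comes on top of it.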
How should the work be split between the GPUs?
1) The vertex work is split between the GPUs' vertex shaders. Some rough pre-calculation of which GPU will finally render each primitive may be done, minimizing the traffic on the HT links.
2) The screen is split in 2 (or more, depending on the number of GPUs). After the vertex shader step, the primitive is sent to the pixel shader responsible for rendering it. In many cases "our guess" from step 1 is correct and the primitive is processed locally; otherwise it has to be sent over the HT link.
3) Each GPU renders its part of the screen to the local framebuffer.
4) One of the GPUs (the master) is responsible for updating the screen. The slave GPUs send the contents of their backbuffers to the master over the HT link.
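To make steps 1 and 2 a bit more concrete, here is a minimal sketch of the routing decision. It assumes the screen is cut into equal vertical strips and that each primitive is represented by a single screen-space x coordinate (say, its centroid); both choices, and the names `owner_gpu` and `route`, are purely illustrative. A real design would also have to handle primitives that straddle a strip boundary.

```python
def owner_gpu(x, screen_width, num_gpus):
    """Index of the GPU whose vertical strip contains screen-space x.
    Off-screen coordinates are clamped to the nearest strip."""
    strip = screen_width / num_gpus
    return min(num_gpus - 1, max(0, int(x // strip)))

def route(primitives, screen_width, num_gpus):
    """Count primitives that stay local versus those forwarded over HT.

    Each primitive is a (guessed_gpu, screen_x) pair: guessed_gpu is the
    step-1 guess (the GPU that ran its vertex work), screen_x is where
    the primitive actually landed after the vertex shader.
    """
    local = forwarded = 0
    for guessed_gpu, screen_x in primitives:
        if owner_gpu(screen_x, screen_width, num_gpus) == guessed_gpu:
            local += 1      # guess was right: rasterize on the same GPU
        else:
            forwarded += 1  # guess was wrong: send across the HT link
    return local, forwarded

# Three made-up primitives on a 1600-pixel-wide screen with 2 GPUs.
print(route([(0, 100), (0, 900), (1, 1500)], 1600, 2))  # prints (2, 1)
```

The better the step-1 guess, the smaller the `forwarded` count, which is what keeps the HT traffic down.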
Any thoughts?
Yeah, I know that card-level cost (as well as size) is the big problem with multi-GPU cards. But one is allowed to dream!
And a note: NVidia has experience with HT (nForce, Xbox?) and multi-chip solutions (3dfx).