Multiple GPUs connected with HyperTransport links

CoolAsAMoose

Newcomer
How feasible would it be to implement a multi-GPU architecture the AMD Opteron way?

Integrate two 16-bit HT links on each GPU, enabling glueless connection of 4 or 8 GPUs (more would be possible). Limiting it to 2 GPUs would require only a single HT link, making it more reasonable.

Would the bandwidth of the HT links (3.2 GB/s in each direction, full duplex) be enough to divide the tasks evenly?
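To get a feel for what 3.2 GB/s per direction buys, here's a quick back-of-envelope in C (the resolution, color depth, and frame rate are just numbers I picked, not from any spec sheet):

```c
/* Back-of-envelope: how much data one HT direction can move per frame.
 * Assumed numbers: 3.2 GB/s link, 60 fps target, 1024x768 at 32-bit color. */
#include <stdio.h>

int main(void) {
    const double link_bytes_s = 3.2e9;              /* one direction of the HT link */
    const double fps          = 60.0;
    const double budget       = link_bytes_s / fps; /* bytes available per frame */

    /* Example payload: half of a 1024x768 32-bit color buffer, i.e. what
     * one chip might ship to the other each frame in a 2-GPU split. */
    const double half_fb = 1024.0 * 768.0 * 4.0 / 2.0;

    printf("Per-frame budget on one link: %.1f MB\n", budget  / 1e6);
    printf("Half color buffer:            %.2f MB\n", half_fb / 1e6);
    printf("Share of the budget:          %.1f %%\n", 100.0 * half_fb / budget);
    return 0;
}
```

That works out to roughly 53 MB per frame at 60 fps, of which a half-screen color buffer eats only about 3%, so the final composition step looks cheap; it's the per-primitive traffic that is harder to bound.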

How should the work be split between the GPUs?

1) The work on the vertices is split between the GPUs' vertex shaders. A rough pre-calculation of which GPU will finally render each primitive may be done, minimizing the traffic on the HT links.
2) The screen is split in 2 (or more, depending on the number of GPUs). After the vertex shader step, each primitive is sent to the pixel shader responsible for rendering it. In many cases, "our guess" from step 1 is correct and the primitive is processed locally; otherwise it has to be sent over the HT link (see the sketch after this list).
3) Each GPU renders its part of the screen to the local framebuffer.
4) One of the GPUs (the master) is responsible for updating the screen. The slave GPUs send the contents of their backbuffers to the master over the HT link.
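Here's a rough C sketch of the routing decision in steps 1-2 for a two-GPU horizontal split; all the names (owner_of, rasterize_locally, send_over_ht) are made up for illustration, not a real API:

```c
/* Hypothetical sketch of steps 1-2: after vertex shading, route each
 * post-transform primitive to the GPU that owns its screen half, and
 * cross the HT link only when the step-1 guess was wrong. */
typedef struct { float x, y; } Vec2;
typedef struct { Vec2 v[3]; } Triangle;   /* screen-space coordinates */

enum { SCREEN_H = 768 };                  /* assumed resolution */

void rasterize_locally(const Triangle *t);      /* hypothetical local path */
void send_over_ht(int dst, const Triangle *t);  /* hypothetical HT transfer */

/* Which GPU owns this primitive? Here: split at the vertical midline.
 * (A triangle straddling the split would go to both GPUs in practice.) */
static int owner_of(const Triangle *t) {
    float min_y = t->v[0].y;
    for (int i = 1; i < 3; i++)
        if (t->v[i].y < min_y)
            min_y = t->v[i].y;
    return (min_y < SCREEN_H / 2) ? 0 : 1;
}

void route_primitive(int my_gpu, const Triangle *t) {
    int dst = owner_of(t);
    if (dst == my_gpu)
        rasterize_locally(t);   /* "our guess" from step 1 was right */
    else
        send_over_ht(dst, t);   /* miss: ship it across the link */
}
```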

Any thoughts?

Yeah, I know that card-level cost (as well as size) is the big problem with multi-GPU cards. But one is allowed to dream!

And a note: NVIDIA has experience with HT (nForce, Xbox?) and with multi-chip solutions (via 3dfx).
 
Rasterizers aren't the problem the majority of the time right now; bandwidth is. Whether it's bandwidth to handle resolution/color depth/FSAA or bandwidth for moving data to the graphics chip to be processed, bandwidth is the big concern. In some instances you can end up rasterization-limited, but it really isn't very common right now (at least not when looking at any IMRs).
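To put a number on that, a quick sketch of framebuffer traffic alone (the fill numbers are illustrative guesses, not measurements):

```c
/* Rough local-memory bandwidth demand for the framebuffer alone.
 * Assumed: 1024x768, 4 B color write + 4 B Z traffic per sample,
 * overdraw of 3x, 4x supersampled FSAA, 60 fps. */
#include <stdio.h>

int main(void) {
    const double pixels   = 1024.0 * 768.0;
    const double bytes_px = 8.0;   /* color + Z, simplified */
    const double overdraw = 3.0;
    const double fsaa     = 4.0;   /* supersampling multiplies everything */
    const double fps      = 60.0;

    double gb_s = pixels * bytes_px * overdraw * fsaa * fps / 1e9;
    printf("Framebuffer traffic alone: ~%.1f GB/s\n", gb_s);
    /* Texture reads come on top of this, so a 3.2 GB/s HT link is far
     * too narrow to stand in for local memory bandwidth. */
    return 0;
}
```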

The cost of adding another core and HT support would be better spent on a wider bus right now. Even once the bandwidth problem is solved, you'd need a fair bit more than 3.2 GB/s for pretty much any of the setups you bring up. Alternating frames is likely the best route to go, although then you pretty much need to build two separate boards in one, making the cost quite prohibitive.
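For clarity, here's roughly what the alternating-frames scheme looks like; submit_frame_to and present_from are invented stand-ins for whatever the driver would actually do:

```c
/* Hypothetical alternate-frame-rendering loop: even frames go to GPU 0,
 * odd frames to GPU 1, and the display shows the other chip's previous
 * frame. Each GPU renders the full scene on its own, which is why this
 * ends up costing nearly two complete boards' worth of hardware. */
enum { NUM_GPUS = 2 };

void submit_frame_to(int gpu, unsigned frame);  /* invented driver hook */
void present_from(int gpu);                     /* invented display flip */

void render_loop(void) {
    for (unsigned frame = 0; ; frame++) {
        int gpu = (int)(frame % NUM_GPUS);  /* alternate whole frames */
        submit_frame_to(gpu, frame);        /* no screen splitting at all */
        if (frame >= 1)
            present_from(1 - gpu);          /* previous frame, other GPU */
    }
}
```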

Which hurdle do you think needs to be solved that would benefit from dual rasterizer cores? Odds are that it would be cheaper to simply build a more powerful single-chip solution outside of some limited usage scenarios (high-end 3D CAD/viz machines where geometric throughput is the limiting factor).
 
With two chips it'd be easier to have a 256-bit bus (Voodoo5 5500 anyone?).

The problem with that is load balancing. For example, in an FPS like Tribes / Tribes2, the top half of the screen is generally just sky, with maybe a little extra geometry, so a fixed 50/50 split would leave one chip mostly idle. One way around that is sketched below.
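You can move the split line each frame based on how long each half took; a hypothetical sketch (the timing inputs would have to come from the driver):

```c
/* Hypothetical adaptive split: nudge the horizontal dividing line
 * toward the GPU that finished faster, so a sky-only top half ends
 * up covering more scanlines than the geometry-heavy bottom half. */
enum { SCREEN_H = 768 };

/* GPU 0 renders rows [0, split_y), GPU 1 renders [split_y, SCREEN_H). */
int adjust_split(int split_y, double ms_top, double ms_bottom) {
    const int step = 16;          /* scanlines moved per adjustment */
    if (ms_top > ms_bottom)
        split_y -= step;          /* shrink the slow top region */
    else if (ms_bottom > ms_top)
        split_y += step;          /* shrink the slow bottom region */

    if (split_y < step)            split_y = step;
    if (split_y > SCREEN_H - step) split_y = SCREEN_H - step;
    return split_y;
}
```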
 