With multi-GPU solutions in the form of Crossfire/SLI there are all kinds of scaling problems because of shared resources, cross-frame dependencies etc., plus you get added latency. Or if you choose SFR, you waste vertex computation power. Also, with current solutions resources need to be duplicated between the two cards/chips.
I was just thinking, at least for dual-chip cards, what if we instead let the chips work in a "dual-core" fashion, just like on the CPU, with both chips accessing the same memory? For that to be performant I suppose we would still need twice the bandwidth, but we wouldn't need twice the amount of physical memory, which would be a significant cost saving.
The advantage of this would be that you could schedule independent rendering tasks to run in parallel on the different chips, like you would with different game code tasks on a dual/quad-core CPU. On consoles you can already do this for the CPU part of the rendering by using command buffers, although in the end the GPU consumes the commands in sequence. With truly threaded GPU-side rendering you wouldn't really need to explicitly build command buffers, at least as long as the number of CPU and GPU rendering threads is the same.
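To make the console-style pattern concrete, here's a minimal plain-C++ sketch (all the names are made up for illustration, this is an analogy, not a real graphics API): several CPU threads record command buffers in parallel, but a single GPU front end still drains them one after the other.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded GPU command.
struct Command { int id; };
using CommandBuffer = std::vector<Command>;

// Each CPU thread records its own command buffer independently.
static void record(CommandBuffer& cb, int base) {
    for (int i = 0; i < 4; ++i)
        cb.push_back({base + i});
}

int main() {
    CommandBuffer shadowCmds, reflectionCmds;

    // CPU-side parallelism: record both buffers at the same time.
    std::thread t0(record, std::ref(shadowCmds), 100);
    std::thread t1(record, std::ref(reflectionCmds), 200);
    t0.join();
    t1.join();

    // But a single GPU still consumes the commands in sequence.
    for (const CommandBuffer* cb : {&shadowCmds, &reflectionCmds})
        for (const Command& c : *cb)
            std::printf("GPU executes command %d\n", c.id);
}
```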
I'm thinking this might scale better than Crossfire/SLI. There are a lot of rendering tasks that are independent of other tasks. For instance you could render each shadow map in a separate thread, as well as reflection/refraction maps, and even post-effects if you let them trail by a frame. Or GPU physics for that matter, which would make more sense if we can break loose from the sequential rendering paradigm. Also, we wouldn't have the additional frame of latency of AFR, or several frames if you add more chips.
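For contrast with the sketch above, here's the same idea under the proposed model, again hedged as a plain-C++ analogy where each "chip" is modeled as a worker thread: independent passes overlap instead of queuing behind each other, and both results land in the same frame.

```cpp
#include <cstdio>
#include <thread>

// Hypothetical render passes with no dependency on each other.
static void renderShadowMap()     { std::printf("chip A: shadow map\n"); }
static void renderReflectionMap() { std::printf("chip B: reflection map\n"); }

int main() {
    // With two chips sharing memory, independent passes can run in
    // parallel, just like independent game tasks on a dual-core CPU.
    std::thread chipA(renderShadowMap);
    std::thread chipB(renderReflectionMap);
    chipA.join();
    chipB.join();
    // No AFR-style frame of added latency: both results are ready
    // within the current frame.
}
```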
For this to work we would need some kind of GPU-side synchronization mechanism so that one GPU thread could wait for results from another, like the GPU equivalent of WaitForMultipleObjects(), but the wait would only affect the GPU thread; the CPU thread would not have to block.
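As a CPU analogy of that synchronization, sketched in standard C++ with std::future (every "GPU" name here is hypothetical): the main pass on one chip blocks on a result produced by the other chip, while the CPU thread that submitted the work carries on untouched.

```cpp
#include <cstdio>
#include <future>
#include <thread>

// Hypothetical result of a render pass, e.g. a shadow map handle.
struct Texture { int handle; };

static Texture renderShadowMap() { return {42}; }

// The main pass waits on the shadow result -- the imagined GPU-side
// analogue of WaitForMultipleObjects(). Only this "GPU thread" stalls.
static void renderMainPass(std::shared_future<Texture> shadow) {
    Texture t = shadow.get();  // the wait happens here, GPU-side only
    std::printf("main pass samples shadow map %d\n", t.handle);
}

int main() {
    // "Chip A" produces the shadow map asynchronously.
    std::shared_future<Texture> shadow =
        std::async(std::launch::async, renderShadowMap).share();

    // "Chip B" consumes it; the wait is internal to that chip.
    std::thread chipB(renderMainPass, shadow);

    // Meanwhile the CPU thread is free to keep issuing work for the
    // next frame -- it never blocks on the GPU-side dependency.
    std::printf("CPU: building next frame\n");

    chipB.join();
}
```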
Thoughts?