Does the DirectX Raytracing API support multi-GPU rendering?

Hi!

I am a beginner in the DirectX world. The raytracing API is very impressive. However, I have a question: does it support multi-GPU rendering? If so, how?

If DirectX Raytracing supports multiple RTX GPUs, real-time rendering could be even faster. I would like to hear your suggestions.
 
I believe you can assign DXR to different mGPU instances, so I believe the basic answer is yes. With DX12 you are nearly in full command of the GPUs, so you will ultimately decide how to put the image back together once the GPUs are done.
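To make the "assign DXR to different GPU instances" idea concrete, here is a minimal, untested sketch (Windows-only, error handling omitted, assumes linking against d3d12.lib and dxgi.lib) of how an application could enumerate all hardware adapters and keep only those whose D3D12 device reports raytracing support:

```cpp
#include <d3d12.h>
#include <dxgi1_6.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

// Enumerate hardware adapters and keep the ones whose D3D12 device
// reports DXR support (Tier 1.0 or higher).
std::vector<ComPtr<ID3D12Device5>> EnumerateDxrCapableDevices()
{
    std::vector<ComPtr<ID3D12Device5>> devices;
    ComPtr<IDXGIFactory6> factory;
    CreateDXGIFactory2(0, IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0;
         factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue; // skip WARP / software adapters

        ComPtr<ID3D12Device5> device;
        if (FAILED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_12_0,
                                     IID_PPV_ARGS(&device))))
            continue;

        D3D12_FEATURE_DATA_D3D12_OPTIONS5 opts5 = {};
        device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                    &opts5, sizeof(opts5));
        if (opts5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0)
            devices.push_back(device);
    }
    return devices;
}
```

Each device returned here is fully independent; the application owns all cross-device synchronization and the final composition of the image, exactly as described above.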
 
Is this question just "Does raytracing work with sli" or is it something different ?
Yeah, Davros is onto the same thought train I am... Are we talking about a homogeneous GPU platform (essentially multiples of the same card), or are we talking about a heterogeneous platform (an Intel Xe plus a Radeon 6700 plus an NV 3070, all at the same time)?

iroboto's post above makes me think a heterogeneous platform might actually be feasible in DX12, if an application developer wanted to make it work. Is that true? Crazy if so...
 
iroboto's post above makes me think a heterogeneous platform might actually be feasible in DX12, if an application developer wanted to make it work. Is that true? Crazy if so...
Yep, possible I think; I was there in person when Max McMullen presented it, IIRC.

https://www.pcgamer.com/directx-12-will-be-able-to-use-your-integrated-gpu-to-improve-performance/
I guess there are different ways to cut it up, but it's a lot of work for a developer to take on. It seems to only make sense if you know everyone has the same configuration.
 
I believe I've seen either that article, or one very similar to it, a few years back. It was specifically focused on using the iGPU which comes with most entry and midrange CPUs these days. Agree with your summary: it looks like you can do almost as much as you feel like biting off...
 
does it support multi-GPU rendering? If so, how?
My guess is it won't work well enough to be a win in most cases.
We could generate the BVH on GPU1 and copy it to GPU2, so neither needs to access the other's VRAM while tracing. Sounds practical if we can generate all BVH at level load, but not so much if we have an open world and constantly generate new geometry, causing unsteady transfers. (I don't know if compatible or identical GPUs with some link can do such transfers faster.)
We could also generate the BVH on both GPUs, but then we duplicate both memory and processing time.
I'd rather try to keep RT entirely on the stronger GPU and use the second for other work that can run independently, e.g. using the iGPU for physics acceleration, audio, eventually postprocessing, shadow maps, etc.
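For the build-on-one-GPU, copy-to-the-other idea, D3D12 does offer cross-adapter sharing. A rough, untested sketch (assumes deviceA and deviceB already exist, error handling omitted): a heap created with the cross-adapter flags on the producer is opened by handle on the consumer, and a buffer placed in each view of it can be used as the copy target/source. Note that a DXR acceleration structure itself cannot simply be raw-copied between devices; you would go through serialization (CopyRaytracingAccelerationStructure with the SERIALIZE/DESERIALIZE modes), and even that is only valid between compatible devices/drivers.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a buffer both devices can see: a cross-adapter shared heap on
// deviceA, opened by NT handle on deviceB, with a placed buffer in each.
void CreateCrossAdapterBuffer(ID3D12Device* deviceA, ID3D12Device* deviceB,
                              UINT64 size,
                              ComPtr<ID3D12Resource>& bufA,
                              ComPtr<ID3D12Resource>& bufB)
{
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes = size;
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    heapDesc.Flags = D3D12_HEAP_FLAG_SHARED |
                     D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER;

    ComPtr<ID3D12Heap> heapA;
    deviceA->CreateHeap(&heapDesc, IID_PPV_ARGS(&heapA));

    // Share the heap with the second device via an NT handle.
    HANDLE shared = nullptr;
    deviceA->CreateSharedHandle(heapA.Get(), nullptr, GENERIC_ALL,
                                nullptr, &shared);
    ComPtr<ID3D12Heap> heapB;
    deviceB->OpenSharedHandle(shared, IID_PPV_ARGS(&heapB));
    CloseHandle(shared);

    // Place a cross-adapter-capable buffer in each view of the heap.
    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = size;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
    desc.Flags            = D3D12_RESOURCE_FLAG_ALLOW_CROSS_ADAPTER;

    deviceA->CreatePlacedResource(heapA.Get(), 0, &desc,
        D3D12_RESOURCE_STATE_COMMON, nullptr, IID_PPV_ARGS(&bufA));
    deviceB->CreatePlacedResource(heapB.Get(), 0, &desc,
        D3D12_RESOURCE_STATE_COMMON, nullptr, IID_PPV_ARGS(&bufB));
}
```

This only covers the plumbing; the unsteady-transfer problem described above (open worlds, streaming geometry) is a scheduling question on top of it.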

The question feels a bit rhetorical now that we have to be happy if people can afford a single dGPU. On the other hand, afaik AMD plans to put an iGPU on most future CPUs like Intel does, so dGPU+iGPU might become a standard configuration we can rely on.
 
I don't know much about it, but could it be useful and practical to denoise on the iGPU?
Too much data movement?
 
Too much data movement?
Usually you hide the movement behind latency: you lag one frame behind with display, and then you get 16 ms to transfer stuff 'for free'.
Combining denoising with other post-processing tasks makes sense as well, ofc.
But I lack personal experience. Currently debug transfers take most of my frame time, but I haven't tried to fix this yet.
 
Is this question just "does raytracing work with SLI", or is it something different?
For this, let's consider (1) two RTX 3090 cards in two different computers with different motherboard configurations, and (2) one machine with two GPUs (again RTX 3090) connected with an NVLink bridge.
 
We could generate the BVH on GPU1 and copy it to GPU2, so neither needs to access the other's VRAM while tracing. Sounds practical if we can generate all BVH at level load, but not so much if we have an open world and constantly generate new geometry, causing unsteady transfers. (I don't know if compatible or identical GPUs with some link can do such transfers faster.)
GPUs with a direct bridge can do sufficiently fast transfers, at 15-30 GB/s (or 60 GB/s latest gen). But the key feature is not the bandwidth; the key feature is not screwing up latency for everything CPU-communication based. Because even though PCIe is full-duplex, you still have no chance of getting the scheduling right to *achieve* that (very poor design choices on the driver side regarding how they interpreted copy queues!), and you often run into the situation that you risk getting stalled by a bulk transfer while trying to issue work to unrelated units...

Bonus points for something like full NVLink, which allows an almost transparent unified memory space, at a somewhat negligible overhead for *not* hitting the local memory.
 
Rendering apps will certainly utilize multiple GPUs, taking full advantage of the RT cores via OptiX-compatible renderers.
 
Do you know if Vulkan does better here than DX12?
As far as I'm aware, no. Discussed that in a different thread before, but so far the only API for which the drivers got transfers about right was CUDA on NVidia GPUs. And that mostly because the stream syntax over there is explicitly only a frontend to true dependency-graph-based scheduling, acknowledging that the developer would do a horrible job at properly grouping command buffers. (Don't get me wrong, command buffers have their uses if we talk about batching micro-grid kernel launches, as the performance of recording the buffer is everything for that use case. But everything not matching that label is a bad fit, API-wise.)

And a clever driver-internal optimization which reserves the DMA engines exclusively per direction and peer, sorting every single transfer to the correct engine rather than dropping everything into a single resource pool. It's like the difference between half- and full-duplex Ethernet: little difference in raw numbers, but one just works so much better that it's inconceivable to ever go back.
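The per-direction split can be illustrated with a toy schedule: the same transfer list dispatched to one shared engine versus one engine per direction. This is just queueing arithmetic, not a driver model, and the millisecond figures are invented for the example:

```cpp
#include <vector>

struct Transfer { bool upload; double ms; };

// One shared DMA engine: every transfer serializes behind every other,
// regardless of direction (the "single resource pool" case).
double sharedEngineMakespan(const std::vector<Transfer>& xs)
{
    double total = 0;
    for (const Transfer& x : xs) total += x.ms;
    return total;
}

// One engine per direction (full duplex): uploads and downloads each
// serialize among themselves but overlap each other, so the makespan
// is the longer of the two per-direction queues.
double perDirectionMakespan(const std::vector<Transfer>& xs)
{
    double up = 0, down = 0;
    for (const Transfer& x : xs) (x.upload ? up : down) += x.ms;
    return up > down ? up : down;
}
```

For example, uploads of 4 and 2 ms interleaved with downloads of 3 and 5 ms take 14 ms on a shared engine but only 8 ms with one engine per direction, which is the half- vs full-duplex difference described above.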
 