GPUDirect Benchmarking - HPC-Works - Confluence
September 27, 2021
The GPUDirect RDMA technology exposes GPU memory to I/O devices by enabling a direct communication path between GPUs in two remote systems. This eliminates the need for the system CPUs to stage GPU data in and out of intermediate system memory buffers. As a result, the end-to-end latency is reduced and the sustained bandwidth is increased (depending on the PCIe topology).
The GDRCopy (GPUDirect RDMA Copy) library leverages the GPUDirect RDMA APIs to create CPU memory mappings of GPU memory. The advantage of a CPU-driven copy is its very small overhead, which is helpful when low latencies are required.
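The CPU-driven copy path can be sketched with the GDRCopy user-space API: pin a GPU buffer, map it into the CPU address space, and store into the mapping directly. This is an illustrative sketch, not the benchmark's code; it assumes CUDA and the gdrcopy library/driver are installed, requires a GPU to actually run, and abbreviates error handling and page-alignment details.

```c
#include <cuda.h>
#include <gdrapi.h>
#include <stddef.h>

/* Copy `size` bytes from host memory `src` into the GPU buffer
 * `d_buf` using a CPU-driven store through a BAR mapping.
 * Assumes `d_buf` is suitably aligned (GDRCopy pins whole GPU pages). */
int gdr_write_example(CUdeviceptr d_buf, const void *src, size_t size)
{
    gdr_t g = gdr_open();                 /* open the gdrdrv device */
    if (!g)
        return -1;

    gdr_mh_t mh;
    /* Pin the GPU buffer so its pages are exposed over the PCIe BAR */
    if (gdr_pin_buffer(g, d_buf, size, 0, 0, &mh) != 0) {
        gdr_close(g);
        return -1;
    }

    void *map_ptr = NULL;
    /* Create a CPU virtual-address mapping of the pinned GPU memory */
    if (gdr_map(g, mh, &map_ptr, size) != 0) {
        gdr_unpin_buffer(g, mh);
        gdr_close(g);
        return -1;
    }

    /* CPU-driven copy: the CPU writes straight into GPU memory,
     * avoiding the DMA-engine setup cost that dominates small messages */
    gdr_copy_to_mapping(mh, map_ptr, src, size);

    gdr_unmap(g, mh, map_ptr, size);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    return 0;
}
```

In practice the pin/map steps are done once at buffer registration time, so the per-message cost is just the store into the mapping.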
...
Latency
Latency improves by more than 8.5X for messages up to 128B (comparing runs with and without GPUDirect RDMA and GDRCopy), and by more than 5X for 128B-4KB messages.
Another clear observation is that GDRCopy provides a latency benefit for small messages.
GPUDirect RDMA by itself is not enough for best small-message performance.
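Latency numbers like these are typically produced with the OSU micro-benchmarks over MPI. A hypothetical invocation is sketched below; it assumes a CUDA-enabled build of osu_latency, Open MPI over UCX, and the host names are placeholders. The "D D" arguments place both send and receive buffers in device (GPU) memory.

```shell
# GPU-to-GPU latency between two nodes, with the GDRCopy path enabled
mpirun -np 2 -H node1,node2 \
    -x UCX_TLS=rc,cuda_copy,gdr_copy \
    ./osu_latency D D

# To isolate GPUDirect RDMA without GDRCopy, drop gdr_copy from the list:
#   mpirun -np 2 -H node1,node2 -x UCX_TLS=rc,cuda_copy ./osu_latency D D
```

Comparing the two runs shows GDRCopy's small-message benefit directly, since everything else in the path stays the same.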
...
GPUDirect RDMA pushes the GPU bandwidth to the maximum PCIe capacity. GDRCopy does not influence bandwidth.