Survey paper on Deep Learning on GPUs

sparsh

The rise of deep learning (DL) has been fueled by improvements in accelerators, and the GPU remains the most widely used accelerator for DL applications.

We present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review 75+ techniques covering both inference and training, on both single-GPU and distributed multi-GPU systems. The survey covers techniques for pruning, tiling, and batching; the impact of data layouts; data-reuse schemes; and convolution strategies (FFT, direct, GEMM, Winograd). It also covers techniques for offloading data to CPU memory to avoid GPU-memory bottlenecks during training.
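As an illustration of one of the convolution strategies mentioned above, here is a minimal sketch of GEMM-based convolution (the "im2col" lowering commonly used on GPUs): each receptive field of the input is unrolled into a column, so the whole convolution becomes one large matrix multiply. This is a simplified NumPy example written for this post, not code from the surveyed paper; it assumes unit stride and no padding.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Convolution via im2col + GEMM (illustrative; stride 1, no padding).

    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Follows the DL convention (cross-correlation, no kernel flip).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    out_h, out_w = H - R + 1, W - S + 1

    # Lower the input: unroll every R x S receptive field into a column.
    cols = np.empty((C * R * S, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + R, j:j + S].ravel()
            idx += 1

    # One large matrix multiply replaces the sliding-window loops,
    # which maps well onto highly tuned GPU GEMM kernels.
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, out_h, out_w)
```

The trade-off, as discussed in such surveys, is that im2col duplicates input data (each element appears in up to R*S columns) in exchange for using a single highly optimized GEMM kernel.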

The paper, accepted in the Journal of Systems Architecture, 2019, is available here.
 