7th KAUST-NVIDIA Workshop: Best Practices for Distributed Deep Learning on Ibex
With the increasing size and complexity of both Deep Learning (DL) models and datasets, the computational cost of training these models can be non-trivial, ranging from a few tens of hours to several days. By exploiting the data parallelism inherent in the training process of DL models, we can distribute training across multiple GPUs on a single node or on multiple nodes of Ibex. We discuss and demonstrate the use of Horovod, a scalable distributed training framework, to train DL models on multiple GPUs on Ibex. Horovod integrates with TensorFlow 1 & 2, PyTorch, and MXNet. We also discuss some caveats to look out for when using Horovod for large mini-batch training.
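As a rough illustration of the workflow the session covers, below is a minimal sketch of Horovod's PyTorch API. It assumes Horovod with GPU support is installed and the script is launched with one process per GPU; the model, batch size, and learning rate are placeholders for illustration, not material from the webinar itself.

import torch
import torch.nn as nn
import horovod.torch as hvd

# Initialize Horovod and pin each worker process to one GPU.
hvd.init()
torch.cuda.set_device(hvd.local_rank())

# Placeholder model; a real script would build its own network here.
model = nn.Linear(784, 10).cuda()

# Large mini-batch caveat: with N workers the effective mini-batch is
# N times larger, so the learning rate is commonly scaled by hvd.size().
base_lr = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr * hvd.size())

# Wrap the optimizer so gradients are averaged across workers (allreduce).
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Start all workers from identical model weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    # Stand-in random batch; in practice each worker reads its own shard
    # of the dataset, e.g. via torch.utils.data.distributed.DistributedSampler.
    x = torch.randn(32, 784).cuda()
    y = torch.randint(0, 10, (32,)).cuda()

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

On four GPUs such a script could be launched with, for example, horovodrun -np 4 python train.py; on a Slurm cluster like Ibex this would typically go inside a batch job script instead. The learning-rate scaling line reflects the large mini-batch caveat mentioned above: it is a common remedy rather than a universal rule, and is often combined with a warm-up period.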

Nov 24, 2020 01:00 PM in Riyadh

This webinar has ended and registration is closed. If you have any questions, please contact the webinar host, Mohsin Shaikh.