Distributed Deep Learning on Ibex GPUs
With the increasing size and complexity of both Deep Learning (DL) models and datasets, the computational cost of training these models can be non-trivial, ranging from a few tens of hours to several days. By exploiting the data parallelism inherent in the training process of DL models, we can distribute training across multiple GPUs on a single node or on multiple nodes of Ibex. We discuss and demonstrate the use of Horovod, a scalable distributed training framework, to train DL models on multiple GPUs on Ibex. Horovod integrates with TensorFlow 1 & 2, PyTorch, and MXNet. We also discuss some caveats to watch for when using Horovod for large mini-batch training.
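To give a flavor of the pattern the session covers, below is a minimal sketch of Horovod data-parallel training with PyTorch. The toy model, synthetic dataset, hyperparameters, and launch command are illustrative assumptions, not material taken from the webinar itself; the learning-rate scaling line hints at one of the large mini-batch caveats.

```python
# Minimal Horovod + PyTorch data-parallel training sketch (illustrative only).
# Launch one process per GPU, e.g.:  horovodrun -np 4 python train.py
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()                                    # one Horovod process per GPU
torch.cuda.set_device(hvd.local_rank())       # pin this process to its local GPU

# Toy model and synthetic data (assumptions, stand-ins for a real DL workload).
model = nn.Linear(32, 1).cuda()
dataset = torch.utils.data.TensorDataset(torch.randn(1024, 32),
                                         torch.randn(1024, 1))

# Shard the dataset so each worker trains on a distinct slice (data parallelism).
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)

# The effective mini-batch grows with hvd.size(); scaling the learning rate
# accordingly is one common adjustment for large mini-batch training.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())

# Start every worker from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.MSELoss()
for epoch in range(2):
    sampler.set_epoch(epoch)                  # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()                      # Horovod averages gradients across workers
    if hvd.rank() == 0:                       # log from a single worker only
        print(f"epoch {epoch} loss {loss.item():.4f}")
```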

Apr 11, 2021 01:00 PM in Riyadh

The webinar is over and registration is closed. If you have any questions, please contact the webinar host: Mohsin Shaikh.