Deep Learning Training Efficiency

Deep Learning Requires Learning

From technology to manufacturing, from healthcare to insurance, discussions of artificial intelligence’s impact on a host of industries continue to gather momentum. At present, much of the attention is focused on training accurate models (on specific data sets) as quickly as possible. And because AI training involves complex matrix operations repeated over a vast number of iterations, the performance of the computing platform is critical, and we’re not just talking about GPUs. They’re important, of course, but a multitude of other factors also need to be addressed, beginning with the balance between GPU processing power and data storage performance:

  • Understanding the characteristics of your chosen DL framework

  • Recognizing that dual-CPU systems present special challenges: using two CPUs rather than one is only half the story

  • Identifying local storage as a major source of bottlenecks in deep learning applications

  • Selecting the right NVMe drives

  • Knowing when adding GPUs to a multi-GPU system effectively accelerates DNN model development, and when data storage holds it back

  • Gaining a clear understanding of the inherent latency of your platform
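
One quick way to sanity-check the storage items on this list is to measure the sequential read throughput your local drive actually delivers. The Python sketch below is a minimal, illustrative benchmark under stated assumptions (a small scratch file, single-threaded reads); a rigorous test would use a file larger than RAM to defeat the page cache and a dedicated tool such as fio:

```python
import os
import tempfile
import time

CHUNK = 1 << 20   # read in 1 MiB chunks
FILE_MB = 64      # scratch-file size in MiB (small, for illustration only)

# Create a scratch file filled with random bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(os.urandom(CHUNK) * FILE_MB)

# Time a full sequential read of the file.
start = time.perf_counter()
read_bytes = 0
with open(path, "rb") as f:
    while chunk := f.read(CHUNK):
        read_bytes += len(chunk)
elapsed = time.perf_counter() - start
os.remove(path)

throughput_mib_s = read_bytes / elapsed / (1 << 20)
print(f"Read {read_bytes >> 20} MiB at {throughput_mib_s:.0f} MiB/s")
```

If the measured throughput falls well below what your GPUs consume per training step, storage rather than compute is the likely bottleneck.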

And because a mismatched storage subsystem can leave the GPU (or multiple GPUs) sitting idle, severely reducing training efficiency, topics such as data richness, aligning data type with storage subsystem performance, and solving latency problems are also essential.
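
To make the idle-GPU problem concrete, the toy Python sketch below simulates one training epoch two ways: serially, where each "GPU step" waits for its batch to arrive from storage, and pipelined, where a background thread prefetches batches so loading overlaps with compute. The sleep durations and queue depth are illustrative assumptions, not figures from the whitepaper:

```python
import queue
import threading
import time

LOAD_TIME = 0.05     # simulated storage fetch per batch (assumed)
COMPUTE_TIME = 0.05  # simulated GPU training step per batch (assumed)
N_BATCHES = 5

def load_batch(i):
    time.sleep(LOAD_TIME)  # stand-in for reading a batch from disk
    return i

def compute_step(batch):
    time.sleep(COMPUTE_TIME)  # stand-in for a GPU forward/backward pass

def serial_epoch():
    """GPU waits on storage before every single step."""
    start = time.perf_counter()
    for i in range(N_BATCHES):
        compute_step(load_batch(i))
    return time.perf_counter() - start

def pipelined_epoch():
    """A background thread prefetches batches into a small buffer."""
    q = queue.Queue(maxsize=2)

    def producer():
        for i in range(N_BATCHES):
            q.put(load_batch(i))
        q.put(None)  # sentinel: no more batches

    start = time.perf_counter()
    t = threading.Thread(target=producer)
    t.start()
    while (batch := q.get()) is not None:
        compute_step(batch)  # overlaps with the next load
    t.join()
    return time.perf_counter() - start

serial = serial_epoch()
pipelined = pipelined_epoch()
print(f"serial: {serial:.2f}s, pipelined: {pipelined:.2f}s")
```

With loading and compute overlapped, epoch time approaches the larger of the two costs rather than their sum, which is why DL frameworks expose prefetching knobs such as background data-loader workers; a storage subsystem too slow to keep the buffer full defeats that overlap.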

To be successful, data scientists pursuing the fastest possible time-to-model need to fully understand the precise hardware configuration their specific data type requires. That’s why the new BOXX whitepaper, Deep Learning Training Efficiency: It’s Not Just About GPUs, is essential reading. From GPU processing and data storage performance to data richness and DNN training performance, the deep learning infrastructure experts at BOXX and Cirrascale Cloud Services provide the critical information you need to configure the right AI computing platform.