Deep Learning for NLP – Part 7

Model Compression for NLP

In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress thanks to deep learning models such as Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM) networks, and to Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT), the Generative Pre-trained Transformer (GPT-2), the Multi-task Deep Neural Network (MT-DNN), the Extra-Long Network (XLNet), the Text-to-Text Transfer Transformer (T5), T-NLG, and GShard.

What you’ll learn

  • Deep Learning for Natural Language Processing.
  • Model Compression for NLP.
  • Pruning.
  • Quantization.
  • Knowledge Distillation.
  • Parameter sharing.
  • Matrix decomposition.

Course Content

  • Introduction to Model Compression –> 4 lectures • 26min.
  • Compression of Deep Learning Models: Pruning –> 6 lectures • 1hr 13min.
  • Compression of Deep Learning Models: Quantization –> 5 lectures • 1hr 8min.
  • Compression of Deep Learning Models: Knowledge Distillation –> 7 lectures • 1hr 22min.
  • Compression of Deep Learning Models: Parameter sharing –> 5 lectures • 44min.
  • Compression of Deep Learning Models: Matrix decomposition –> 6 lectures • 40min.
  • Compression of Deep Learning Models: Applications, Summary and Future Trends –> 3 lectures • 31min.
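As a taste of the knowledge-distillation module listed above, here is a minimal NumPy sketch of the temperature-softened soft-label loss commonly used for distillation (the function names are illustrative, not taken from the course materials):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a vector of logits."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from the student's softened distribution to the
    teacher's, scaled by T^2 (the soft-label term of the distillation
    objective)."""
    p = softmax(teacher_logits, T)       # teacher "soft labels"
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student exactly matches the teacher's logits and grows as their softened distributions diverge; a higher temperature `T` exposes more of the teacher's "dark knowledge" about non-target classes.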


Requirements

  • Basics of machine learning.
  • Basic understanding of Transformer-based models and word embeddings.
  • Familiarity with Transformer models such as BERT and GPT.


These models are humongous in size: BERT (340M parameters), GPT-2 (1.5B parameters), T5 (11B parameters, 21.7GB), and so on. Real-world applications, on the other hand, demand small model sizes, low response times, and low computational power budgets. In this course, we discuss five families of methods (pruning, quantization, knowledge distillation, parameter sharing, and matrix decomposition) for compressing such models so they can be deployed in real industry NLP projects. Given the critical need for applications built on small, efficient models, and the large amount of recently published work in this area, we believe this course organizes the plethora of work done by the “deep learning for NLP” community in the past few years and presents it as a coherent story.
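To make the first of these method families concrete, here is a minimal NumPy sketch of unstructured magnitude pruning, which zeroes out the smallest-magnitude weights of a layer; this is only an illustration of the idea, not code from the course:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.75):
    """Zero out the smallest-magnitude entries of a weight matrix
    (unstructured magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold     # keep only the largest-magnitude weights
    return weights * mask

# Prune 75% of a random 4x4 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W_pruned = magnitude_prune(W, sparsity=0.75)
```

In practice the resulting sparse matrix is stored in a compressed format (or the mask is reapplied during fine-tuning), which is where the size and speed savings come from.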

Compression of deep learning text models has attracted a lot of interest in recent years, both from the research community and from industry. Many business owners shy away from deep learning models, fearing their size and infrastructure requirements. Mobile apps need a low RAM footprint and a small power envelope. Organizations working on IoT (Internet of Things) and embedded systems have been investing significantly in machine learning solutions for resource-constrained environments such as sensors.

Researchers in the field of applied deep learning for text will benefit the most, as this tutorial gives them an exhaustive overview of research on practical deep learning. We believe the tutorial will give newcomers a complete picture of the current work, introduce the important research topics in this field, and inspire them to learn more. Practitioners and people from industry will benefit from the discussions both from the methods perspective and from the point of view of the applications where such mechanisms are starting to be deployed. This is an intermediate-level tutorial: we assume the audience knows some basic deep learning architectures. Prerequisites include introductory knowledge of deep learning, specifically recurrent neural network models and Transformers, along with a basic understanding of natural language processing and machine learning concepts.

Get Tutorial