Model Compression
The process of making a model smaller is called model compression.
While many new techniques are being developed, the four types most often used are as follows:
- Low-Rank Factorization
- Knowledge Distillation
- Pruning
- Quantization
Let’s discuss each of these techniques in detail:
1. Low-Rank Factorization: The key idea behind low-rank factorization is to replace high-dimensional tensors with lower-dimensional tensors.
2. Knowledge Distillation: It is a method in which a small model (the student) is trained to mimic a larger model or an ensemble of models (the teacher). The small model is what you’ll deploy.
3. Pruning: Pruning was a method originally used for decision trees, where you remove sections of a tree that are noncritical and redundant for classification. As neural networks gained wider adoption, people started to realize that neural networks are over-parameterized and began to find ways to reduce the workload caused by the extra parameters.
4. Quantization: It is the most general and commonly used model compression method. It’s straightforward to do and generalizes over tasks and architectures. Quantization reduces a model’s size by using fewer bits to represent its parameters.
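To make the first technique concrete, here is a minimal sketch of low-rank factorization using NumPy: a dense weight matrix is approximated by the product of two thin matrices obtained from a truncated SVD. The matrix `W` and the rank `k` are illustrative choices, not from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))  # hypothetical dense weight matrix

# Truncated SVD: keep only the top-k singular values and vectors.
k = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]  # shape (256, k)
B = Vt[:k, :]         # shape (k, 256)

W_approx = A @ B  # rank-k approximation of W

original_params = W.size          # 256 * 256 = 65536
factored_params = A.size + B.size  # 256*32 + 32*256 = 16384
```

Storing `A` and `B` instead of `W` cuts the parameter count by 4x here; at inference time, `x @ W` is replaced by `(x @ A) @ B`.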
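A simple modern instance of pruning is magnitude pruning: zero out the weights with the smallest absolute values, on the assumption that they contribute least to the output. The sketch below is illustrative; `magnitude_prune` is a hypothetical helper, not a library function.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    # Zero out the `sparsity` fraction of weights with smallest magnitude.
    k = int(W.size * sparsity)
    threshold = np.partition(np.abs(W).ravel(), k)[k]
    mask = np.abs(W) >= threshold  # keep only the largest-magnitude weights
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 100))
pruned, mask = magnitude_prune(W, sparsity=0.9)
```

The resulting sparse matrix can be stored in a compressed format; realizing actual speedups also requires hardware or kernels that exploit the sparsity.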
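Finally, a minimal sketch of quantization: mapping 32-bit floating-point weights to 8-bit integers plus a single scale factor, which cuts storage by 4x. This is symmetric linear quantization under assumed illustrative names; real frameworks add per-channel scales, zero points, and calibration.

```python
import numpy as np

def quantize_int8(W):
    # Symmetric linear quantization: map [-max|W|, +max|W|] onto [-127, 127].
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(W)
W_hat = dequantize(q, scale)
```

Each weight now occupies 1 byte instead of 4, and the rounding error per weight is at most half the quantization step (`scale / 2`).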