Quantization In Neural Networks

Related topics:
  • Pruning Neural Networks
  • Deep Compression
  • Data Quantization
  • Low-Rank Approximation
  • Trained Ternary Quantization

This article answers the following questions:
  • What is quantization?
  • Why is quantization needed?
  • What is the memory requirement of a typical neural network?
  • What is the arithmetic complexity (number of parameters and operations) of a simple neural network?
  • What are different quantization methods?
  • Which frameworks support quantization?

Quantization in Neural Networks

  • VGG-16 has 138 million parameters and performs 15,300 million mult-add operations to classify a single image, at 71.5% accuracy.
  • AlexNet has 60 million parameters and performs 720 million mult-add operations, at 57.2% accuracy.
  • YOLOv3 performs 39 billion operations to process one image.
  • MobileNet has only 13 million parameters (the usual 3 million for the body and 10 million for the final layer) and 0.58 million mult-adds.
  • VGG-16 requires over 500 MB of storage.
  • AlexNet requires over 200 MB of storage.
Quantizing from 32-bit floats to 8-bit integers brings several benefits:
  • Memory transfer speed improvement of ~4×, since each weight occupies a quarter of the bytes.
  • Reduced storage requirements for the network graph, because the memory used to store all the weights and biases shrinks by the same ~4× factor.
  • Lower power consumption, due to reduced memory access and increased compute efficiency. The general assumption is: the larger the model, the more memory references it makes, and the more energy it consumes.
  • Compute performance gains, since integer arithmetic is cheaper than floating-point arithmetic on most hardware.
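The storage figures above can be checked with back-of-the-envelope arithmetic. The sketch below uses VGG-16's ~138 million parameters (stated earlier) and assumes 4 bytes per float32 weight versus 1 byte per int8 weight:

```python
# Back-of-the-envelope weight-memory footprint of VGG-16,
# stored as 32-bit floats vs. 8-bit integers.
PARAMS_VGG16 = 138_000_000  # ~138 million parameters

fp32_bytes = PARAMS_VGG16 * 4   # 4 bytes per float32 weight
int8_bytes = PARAMS_VGG16 * 1   # 1 byte per int8 weight

fp32_mb = fp32_bytes / (1024 ** 2)
int8_mb = int8_bytes / (1024 ** 2)

print(f"fp32: {fp32_mb:.0f} MB, int8: {int8_mb:.0f} MB, "
      f"ratio: {fp32_bytes // int8_bytes}x")
# fp32: 526 MB, int8: 132 MB, ratio: 4x
```

This reproduces both claims: over 500 MB for the float32 model, and the ~4× reduction from 8-bit quantization.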

Two Different Approaches to Performing Quantization

  1. Post-training quantization
    A model trained in 32-bit floating point is converted to an 8-bit quantized model after training, for example.
  2. Quantization-aware training
    The effects of quantization are simulated during training; this is generally considered to produce better model accuracy [link].
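The arithmetic behind post-training quantization can be sketched in a few lines. The example below is a minimal NumPy illustration of the standard affine (scale and zero-point) scheme for mapping float32 values to unsigned 8-bit integers and back; it is a didactic sketch, not a framework's actual implementation:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine-quantize a float array to unsigned num_bits integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)       # float step size
    zero_point = int(np.clip(round(qmin - x.min() / scale), qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return scale * (q.astype(np.float32) - zero_point)

np.random.seed(0)
weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
print("max round-trip error:", np.abs(weights - restored).max())
```

The round-trip error is bounded by the quantization step `scale`, which is why 8 bits (256 levels) are usually enough to preserve accuracy.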

Number of Parameters of a Simple Neural Network

  • Weights: 10×8 + 8×8 + 8×10 = 224 (a network with a 10-unit input layer, two 8-unit hidden layers, and a 10-unit output layer)
  • Biases: 16 + 10 = 26 (there are 16 neurons across the two hidden layers and 10 in the output layer)
  • Total parameters: 224 + 26 = 250
Source: Image by Author
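The parameter count above can be reproduced generically for any fully connected network by summing one weight per connection and one bias per non-input neuron:

```python
# Parameter count for the fully connected network described above:
# 10-unit input, two 8-unit hidden layers, 10-unit output.
layers = [10, 8, 8, 10]

# One weight per connection between adjacent layers.
weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
# One bias per neuron in every layer except the input.
biases = sum(layers[1:])
total = weights + biases

print(weights, biases, total)  # 224 26 250
```

Changing the `layers` list gives the parameter count for any other multilayer perceptron.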

Frameworks Supporting Quantization

  1. TensorFlow Lite converter [link]
  2. PyTorch Quantization [link]
  3. ONNX quantization [link]
  4. OpenVINO quantization [link]
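As one concrete example of framework support, PyTorch offers post-training dynamic quantization through `torch.quantization.quantize_dynamic`, which stores the weights of supported layers (such as `nn.Linear`) as int8 and quantizes activations on the fly. A minimal sketch, assuming a toy model rather than a real workload:

```python
import torch
import torch.nn as nn

# A tiny fp32 model; Linear layers are what dynamic quantization targets.
model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 10))

# Post-training dynamic quantization: weights stored as int8,
# activations quantized at inference time.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 10))
print(out.shape)  # torch.Size([1, 10])
```

The other frameworks follow a similar pattern: take a trained float model, hand it to a converter, and get back a quantized model with the same interface.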


References

  1. Speeding up Deep Learning with Quantization, link
  2. Why 8-bit quantization, link
  3. Neural Network Quantization, link
  4. Pruning and Quantization, link
  5. Parameters calculation, link link
  6. Quantization in AI, link
  7. Quantization and deployment of neural networks, link
  8. The 5 Algorithms for Efficient Deep Learning Inference on Small Devices, link




Imran is a computer vision and AI enthusiast with a PhD in computer vision. Imran loves to share his experience with self-improvement and technology.
