Neural Network Compression Framework for fast model inference

Background

Paper link
Code link

  • Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev
    Intel

Judging by the names, the authors all appear to be Russian.

  • Venue: CVPR 2020

Abstract

Built on the PyTorch framework, NNCF provides compression techniques such as quantization, sparsity, filter pruning, and binarization. It can be used standalone or integrated into an existing training code base.

Features

  • Support of quantization, binarization, sparsity and filter pruning algorithms with fine-tuning.
  • Automatic model graph transformation in PyTorch – the model is wrapped and additional layers are inserted in the model graph.
  • Ability to stack compression methods and apply several of them at the same time.
  • Training samples for image classification, object detection and semantic segmentation tasks as well as configuration files to compress a range of models.
  • Ability to integrate compression-aware training into third-party repositories with minimal modifications of the existing training pipelines, which allows integrating NNCF into large-scale model/pipeline aggregation repositories such as MMDetection or Transformers.
  • Hardware-accelerated layers for fast model fine-tuning and multi-GPU training support.
  • Compatibility with the OpenVINO™ Toolkit for model inference.

A few caveats and Framework Architecture

  • NNCF does not perform additional network graph transformations during the quantization process, such as batch normalization folding.
  • The sparsity algorithms implemented in NNCF constitute non-structured network sparsification approaches. Another approach is the so-called structured sparsity, which aims to prune away whole neurons or convolutional filters.
  • Each compression method acts on this wrapper by defining the following basic components:
    • Compression Algorithm Builder
    • Compression Algorithm Controller
    • Compression Loss
    • Compression Scheduler
  • Another important novelty of NNCF is the support of algorithm stacking, where users can build custom compression pipelines by combining several compression methods (e.g. a sparse and quantized model can be produced in a single fine-tuning run).
  • Usage steps (a minimal code sketch follows this list):
    • the model is wrapped by the transparent NNCFNetwork wrapper
    • one or more particular compression algorithm builders are instantiated and applied to the wrapped model.
    • The wrapped model can then be fine-tuned on the target dataset using either an original training pipeline, or a slightly modified pipeline.
    • After the compressed model is trained, we can export it to ONNX format for further usage in the OpenVINO™ inference toolkit.
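
For reference, a minimal sketch of this workflow using the NNCF PyTorch API (NNCFConfig, create_compressed_model, and the controller's loss/scheduler/export_model members). Exact import paths and config fields vary between NNCF versions, so treat this as an approximation rather than the canonical recipe:

```python
import torch
from torchvision.models import resnet18

# NNCF imports: in recent versions create_compressed_model lives in nncf.torch;
# in versions contemporary with the paper it was exported from nncf directly.
from nncf import NNCFConfig
from nncf.torch import create_compressed_model

model = resnet18(num_classes=10)

# Compression config: stack INT8 quantization with magnitude sparsity.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": [
        {"algorithm": "quantization"},
        {"algorithm": "magnitude_sparsity"},
    ],
})

# Wrap the model: NNCF traces the graph and inserts fake-quantization / mask layers.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Fine-tune with an (almost) unchanged training loop: only the extra compression
# loss term and the scheduler step are added.
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(compressed_model.parameters(), lr=1e-3)
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))  # dummy batch
for _ in range(3):
    optimizer.zero_grad()
    loss = criterion(compressed_model(images), labels) + compression_ctrl.loss()
    loss.backward()
    optimizer.step()
    compression_ctrl.scheduler.step()

# Export to ONNX for the OpenVINO toolchain.
compression_ctrl.export_model("compressed_model.onnx")
```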

Compression Methods Overview

Quantization

The quantization scheme borrows from the following methods:

  • QAT
  • PACT
  • TQT
The quantization ranges are:

  • Weights: $q_{min} = -2^{bits-1}+1$, $q_{max} = 2^{bits-1}-1$
  • Signed activations: $q_{min} = -2^{bits-1}$, $q_{max} = 2^{bits-1}-1$
  • Unsigned activations: $q_{min} = 0$, $q_{max} = 2^{bits}-1$
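
As a quick illustration (not NNCF code), the ranges above follow directly from the bit width:

```python
def quant_range(bits: int, kind: str) -> tuple[int, int]:
    """Return (q_min, q_max) for the quantization modes listed above."""
    if kind == "weights":              # symmetric range; the extra negative level is dropped
        return -(2 ** (bits - 1)) + 1, 2 ** (bits - 1) - 1
    if kind == "signed_activation":
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if kind == "unsigned_activation":
        return 0, 2 ** bits - 1
    raise ValueError(kind)

print(quant_range(8, "weights"))              # (-127, 127)
print(quant_range(8, "signed_activation"))    # (-128, 127)
print(quant_range(8, "unsigned_activation"))  # (0, 255)
```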

Symmetric quantization

The scale is a learned parameter that represents the actual floating-point range; a combined sketch of both quantization modes appears after the next subsection.

Asymmetric quantization

Training optimizes the floating-point range directly (its lower bound and extent).
The floating-point zero, after mapping, must land on an integer inside the quantization range; this constraint keeps layers with padding efficient to compute.
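
A rough, framework-agnostic sketch of the two fake-quantization modes described above (textbook formulas, not necessarily NNCF's exact parameterization; scale, input_low, and input_range stand in for the learnable quantities):

```python
import torch

def fake_quant_symmetric(x: torch.Tensor, scale: torch.Tensor, bits: int = 8, signed: bool = True):
    """Symmetric fake quantization: the zero point is 0, only the scale is learned."""
    q_max = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    q_min = -q_max if signed else 0                   # weights use the symmetric [-q_max, q_max] range
    q = torch.clamp(torch.round(x / scale * q_max), q_min, q_max)
    return q * scale / q_max                          # dequantized value used during training

def fake_quant_asymmetric(x: torch.Tensor, input_low: torch.Tensor, input_range: torch.Tensor, bits: int = 8):
    """Asymmetric fake quantization: the float range [input_low, input_low + input_range] is learned."""
    levels = 2 ** bits - 1
    scale = input_range / levels
    zero_point = torch.round(-input_low / scale)      # float 0.0 must land on an integer level
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, levels)
    return (q - zero_point) * scale

x = torch.randn(4, 8)
print(fake_quant_symmetric(x, scale=x.abs().max()))
print(fake_quant_asymmetric(x, input_low=x.min(), input_range=x.max() - x.min()))
```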

Training and inference

Unlike QAT and TQT, the method in this paper does not fold batch normalization; to keep the batch statistics consistent between training and inference, a large batch size (> 256) is required.

Mixed-precision quantization

Bit widths are selected with the HAWQ-v2 method.
The per-layer sensitivity is the HAWQ-v2 metric, i.e. the layer's average Hessian trace weighted by the squared L2 norm of the quantization perturbation:

$\Omega_i = \overline{\mathrm{Tr}}(H_i) \cdot \lVert Q(W_i) - W_i \rVert_2^2$

Compression ratio: complexity of the all-INT8 model divided by the complexity of the mixed-precision model, where
complexity = FLOPs × bit width.

Mixed-precision selection then searches for the bit-width configuration with the lowest total sensitivity among those that satisfy the compression-ratio constraint.
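
A small illustration of the ratio defined above (hypothetical per-layer numbers, not taken from the paper):

```python
# Each layer contributes FLOPs * bit-width to the total complexity.
layers_flops = [1.2e9, 0.8e9, 0.4e9]      # hypothetical per-layer FLOP counts
mixed_bits   = [8, 4, 2]                  # candidate mixed-precision assignment

int8_complexity  = sum(f * 8 for f in layers_flops)
mixed_complexity = sum(f * b for f, b in zip(layers_flops, mixed_bits))

compression_ratio = int8_complexity / mixed_complexity
print(f"compression ratio vs INT8: {compression_ratio:.2f}x")
# The search keeps only configurations whose ratio meets the target threshold,
# then picks the one with the smallest summed sensitivity.
```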

Binarization

Weights are binarized using the XNOR and DoReFa schemes; binarization is enabled gradually over four training stages (a sketch of XNOR-style weight binarization follows the list):

  • Stage 1: the network is trained without any binarization,
  • Stage 2: the training continues with binarization enabled for activations only,
  • Stage 3: binarization is enabled both for activations and weights,
  • Stage 4: the optimizer learning rate, which had been kept constant at previous stages, is decreased according to a polynomial law, while weight decay parameter of the optimizer is set to 0.
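
As a reminder of what the weight binarization itself does, here is a minimal XNOR-Net-style sketch (per-output-channel scaling factor; DoReFa uses a single scalar scale instead). This is the generic formulation, not NNCF's implementation:

```python
import torch

def binarize_weights_xnor(w: torch.Tensor) -> torch.Tensor:
    """XNOR-Net-style binarization: sign(w) scaled by the mean absolute value,
    computed per output channel of a conv weight with shape [out, in, kh, kw]."""
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)   # per-filter scaling factor
    return alpha * torch.sign(w)

w = torch.randn(16, 3, 3, 3)                             # a conv layer's weights
w_bin = binarize_weights_xnor(w)
print(w_bin.shape, w_bin.unique().numel())               # 32 distinct values: +/- alpha per filter
```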

Sparsity

NNCF supports two sparsity training schemes (a magnitude-based sketch follows):
1. Magnitude-based sparsity, where the smallest-magnitude weights are zeroed during training.
2. Training based on L0 regularization.
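
A minimal sketch of the magnitude-based idea (generic, not NNCF's actual mask module): zero out the weights whose absolute value falls below the threshold implied by the target sparsity level.

```python
import torch

def magnitude_sparsity_mask(w: torch.Tensor, sparsity_level: float) -> torch.Tensor:
    """Return a 0/1 mask keeping the (1 - sparsity_level) largest-magnitude weights."""
    k = int(sparsity_level * w.numel())                 # number of weights to zero out
    if k == 0:
        return torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values    # k-th smallest magnitude
    return (w.abs() > threshold).to(w.dtype)

w = torch.randn(64, 64)
mask = magnitude_sparsity_mask(w, sparsity_level=0.5)
print("actual sparsity:", 1.0 - mask.mean().item())     # ~0.5
w_sparse = w * mask                                      # applied in forward passes while fine-tuning
```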

Filter pruning

NNCF implements three different criteria for filter importance (a sketch follows the list):

  • L1 norm
  • L2 norm
  • geometric median
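
A rough sketch of the three criteria for convolution filters (generic formulas; the geometric-median criterion follows the FPGM idea of ranking filters by their summed distance to all other filters in the layer, so that filters close to the layer's "center" are treated as redundant). Not NNCF's implementation:

```python
import torch

def filter_importance(weight: torch.Tensor, criterion: str = "l1") -> torch.Tensor:
    """Per-filter importance scores for a conv weight of shape [out, in, kh, kw].
    Filters with the lowest scores are pruned first."""
    flat = weight.flatten(start_dim=1)                        # one row per output filter
    if criterion == "l1":
        return flat.abs().sum(dim=1)
    if criterion == "l2":
        return flat.norm(p=2, dim=1)
    if criterion == "geometric_median":
        # Sum of Euclidean distances from each filter to every other filter.
        return torch.cdist(flat, flat, p=2).sum(dim=1)
    raise ValueError(criterion)

w = torch.randn(32, 16, 3, 3)
scores = filter_importance(w, "geometric_median")
prune_idx = scores.argsort()[:8]                              # e.g. prune the 8 least important filters
print(prune_idx)
```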

Source: https://blog.csdn.net/xieyi4650/article/details/118393883
