Compressing and accelerating a deep learning model means exploiting the redundancy in its parameters and in its network structure to slim the model down, producing a model with fewer parameters and a leaner structure without hurting how well the task is performed. The compressed model needs less compute and memory than the original, so it can serve a much wider range of applications. With deep learning booming, the strong demand for deployable models has drawn particular attention to "small models" that occupy little memory and require modest compute while still maintaining high accuracy. Exploiting network redundancy to compress and accelerate deep learning models has therefore attracted wide interest from both academia and industry, and new work keeps appearing.
This post is a summary of, and study notes on, the survey 《深度学习模型压缩与加速综述》 (a survey of deep learning model compression and acceleration) published in the Journal of Software (软件学报) in 2021.
Related posts:
Model Compression and Acceleration for Deep Learning (1): Parameter Pruning
Model Compression and Acceleration for Deep Learning (2): Parameter Quantization
Model Compression and Acceleration for Deep Learning (3): Low-Rank Decomposition
Model Compression and Acceleration for Deep Learning (4): Parameter Sharing
Model Compression and Acceleration for Deep Learning (5): Compact Networks
Model Compression and Acceleration for Deep Learning (6): Knowledge Distillation
Model Compression and Acceleration for Deep Learning (7): Hybrid Approaches
Compression / acceleration technique | Description |
---|---|
Parameter pruning (A) | Design a criterion for parameter importance, use it to judge how important each network parameter is, and delete the redundant ones |
Parameter quantization (A) | Quantize network parameters from 32-bit full-precision floating point down to lower bit widths |
Low-rank decomposition (A) | Decompose high-dimensional parameter tensors into sparse, lower-dimensional ones |
Parameter sharing (A) | Map the network's internal parameters through structured matrices or clustering |
Compact networks (B) | Design new lightweight networks at three levels: convolution kernels, special layers, and overall network structure |
Knowledge distillation (B) | Distill the knowledge of a larger teacher model into a smaller student model |
Hybrid approaches (A+B) | Combinations of the methods above |
A: compresses parameters; B: compresses structure
Parameter quantization represents the 32-bit floating-point network parameters with a lower bit width. The parameters concerned include weights, activations, gradients, and errors. A single uniform bit width can be used (e.g. 16-bit, 8-bit, 2-bit, or 1-bit), or different bit widths can be combined freely, either by hand based on experience or according to some selection strategy.
Advantages: the storage and memory footprint of the parameters shrinks dramatically, and low-bit arithmetic can substantially speed up computation.
Disadvantages: quantization introduces precision loss, and aggressive (very low-bit) quantization usually makes training harder and requires retraining or special training schemes to recover accuracy.
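To make the basic mechanics concrete, here is a minimal NumPy sketch of the common affine (scale + zero-point) scheme for mapping a float32 tensor to 8-bit integers and back. It is an illustration of the general idea, not a method from the survey, and all names are illustrative.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float32 tensor to signed 8-bit integers."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)        # real-valued step between adjacent levels
    zero_point = np.round(qmin - x.min() / scale)      # integer that represents the real value 0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Map the stored integers back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 4).astype(np.float32)           # stand-in for a layer's weights
q, s, z = quantize_int8(w)
print(np.abs(w - dequantize(q, s, z)).max())           # quantization error, roughly bounded by scale / 2
```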
Binarization restricts the network parameters to the values +1 and -1. This drastically reduces the model's storage and memory requirements and turns the original multiplications into additions or bit shifts, which significantly speeds up computation, but it also makes training harder and lowers accuracy.
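As a rough illustration of the weight-binarization idea behind methods such as BinaryConnect [56] and XNOR-Net [63] (a hedged sketch of the approximation step only, not their training procedures):

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Approximate w by alpha * b with b in {-1, +1}.

    With b = sign(w), the scaling factor alpha = mean(|w|) minimizes
    ||w - alpha * b||^2, so a convolution with w can be replaced by sign
    flips / additions plus a single multiplication by alpha.
    """
    b = np.where(w >= 0, 1.0, -1.0)       # binary weights
    alpha = np.abs(w).mean()              # per-tensor scaling factor
    return alpha, b

w = np.random.randn(3, 3).astype(np.float32)
alpha, b = binarize_weights(w)
print(np.linalg.norm(w - alpha * b))      # reconstruction error of the binary approximation
```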
Ternarization builds on binarization by adding 0 as a third quantization value, which reduces the quantization error.
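A similarly minimal sketch of threshold-based ternarization in the style of ternary weight networks [67]; the 0.7 factor is the heuristic threshold from that paper, everything else is illustrative:

```python
import numpy as np

def ternarize_weights(w: np.ndarray) -> np.ndarray:
    """Map weights to {-alpha, 0, +alpha} using a magnitude threshold."""
    delta = 0.7 * np.abs(w).mean()                    # heuristic threshold (TWN-style)
    mask = np.abs(w) > delta                          # entries that stay non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return np.where(mask, np.sign(w) * alpha, 0.0)

w = np.random.randn(3, 3).astype(np.float32)
print(ternarize_weights(w))                           # only three distinct values appear
```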
When the number of parameters is very large, clustering can be used to quantize the weights.
Gong et al. [70] were the first to apply k-means clustering to quantize fully connected layer parameters. As shown in the figure, the original weights are clustered into a codebook and each weight is assigned an index into that codebook, so only the codebook and the indices need to be stored, not the original weights.
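To illustrate the codebook idea from [70], here is a small sketch (assuming scikit-learn is available; the function names are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans

def codebook_quantize(w: np.ndarray, n_clusters: int = 16):
    """Cluster the weights with k-means and keep only the codebook + indices.

    With 16 clusters every weight is stored as a 4-bit index instead of a
    32-bit float, which is where the compression comes from.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(w.reshape(-1, 1))
    codebook = km.cluster_centers_.ravel()            # the shared weight values
    indices = km.labels_.astype(np.uint8)             # per-weight index into the codebook
    return codebook, indices

def reconstruct(codebook: np.ndarray, indices: np.ndarray, shape) -> np.ndarray:
    """Rebuild an approximate weight tensor from the codebook and indices."""
    return codebook[indices].reshape(shape)

w = np.random.randn(64, 64).astype(np.float32)
codebook, indices = codebook_quantize(w)
w_hat = reconstruct(codebook, indices, w.shape)
print(np.abs(w - w_hat).mean())                       # average quantization error
```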
Wu et al. [71] extended k-means clustering to convolutional layers: the weight matrix is split into many blocks, a codebook is obtained for them by clustering, and an effective training scheme is proposed to suppress the error that accumulates across the quantized layers.
Choi et al. [72] analyzed the quantitative relationship between quantization error and the loss, identified the Hessian-weighted distortion measure as the locally correct objective for optimizing quantization, and proposed a quantization method based on Hessian-weighted k-means clustering.
Xu et al. [73] proposed single-level network quantization (SLQ) and multi-level network quantization (MLQ), targeting different bit widths. SLQ, aimed at higher bit widths, uses k-means clustering to divide the weights into several clusters and, based on the quantization loss, splits the clusters into a to-be-quantized group and a re-training group; each cluster in the former uses its cluster center as the shared weight, while the remaining parameters are re-trained. MLQ, aimed at low bit widths, quantizes the network layer by layer instead of quantizing all layers at once as SLQ does.
Because binary networks reduce a model's representational capacity, researchers have proposed hand-picking, based on experience, the combination of bit widths that works best for the different kinds of network parameters.
Method | W (weight bits) | A (activation bits) | G (gradient bits) | E (error bits) | Notes |
---|---|---|---|---|---|
[79] | 16 | 32 | 32 | 32 | Introduces stochastic rounding |
[80] | 32 | 8 | 8 | 32 | Speeds up data transfer and improves parallel-training performance |
[81] | 8 | 8 | 32 | 32 | Uses integer arithmetic only at test time |
[82] | 8 | 8 | 8 | 32 | Keeps higher precision for the last step of the gradient computation |
[83] | 2 | 8 | 8 | 8 | Discrete training and inference |
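All of the bit-width combinations above rely on some k-bit quantizer. As a hedged sketch, a DoReFa-Net-style uniform quantizer [75] can be written as follows and applied with a different k to weights, activations, gradients, or errors; the normalization of the input to [0, 1] is assumed to have happened already.

```python
import numpy as np

def quantize_k_bits(x: np.ndarray, k: int) -> np.ndarray:
    """Uniformly quantize values in [0, 1] to 2**k levels (k-bit quantization)."""
    n = 2 ** k - 1                          # number of quantization steps
    return np.round(x * n) / n

x = np.random.rand(5).astype(np.float32)    # assumed already scaled into [0, 1]
for k in (1, 2, 8):                         # e.g. 1-bit, 2-bit and 8-bit variants
    print(k, quantize_k_bits(x, k))
```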
Because hand-picking the bit widths has obvious limitations, one can instead design a strategy that helps the network choose a suitable combination of bit widths.
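As a purely hypothetical example of such a strategy (not one of the published methods cited here), a greedy rule could give every layer the lowest bit width whose quantization error stays below a tolerance:

```python
import numpy as np

def choose_bitwidths(layers, candidate_bits=(2, 4, 8), tol=1e-2):
    """Hypothetical greedy bit-width selection based on relative quantization error."""
    def relative_error(w, k):
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / (2 ** k - 1)               # uniform quantization step
        q = np.round((w - lo) / scale) * scale + lo    # quantize and dequantize
        return np.linalg.norm(w - q) / np.linalg.norm(w)

    plan = {}
    for name, w in layers.items():
        for k in candidate_bits:                       # try the lowest bit widths first
            if relative_error(w, k) < tol:
                plan[name] = k
                break
        else:
            plan[name] = 32                            # fall back to full precision
    return plan

layers = {"conv1": np.random.randn(64, 27), "fc": np.random.randn(10, 512)}
print(choose_bitwidths(layers))
```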
Because the parameters of a quantized network take discrete values, the network cannot be trained directly with gradient descent the way an ordinary convolutional neural network is; special techniques are needed so that these discrete parameters can still be optimized and training can reach its objective.
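The most common workaround is the straight-through estimator (STE), which uses the quantized values in the forward pass but lets gradients flow through as if the quantizer were the identity. Below is a minimal PyTorch sketch for the binary case; it illustrates the trick used by works such as BinaryConnect [56], not any single paper's exact implementation.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, straight-through gradient in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                          # discrete values used in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Pass the gradient straight through, zeroing it where |w| > 1 so
        # saturated weights are no longer pushed further out.
        return grad_output * (w.abs() <= 1).float()

w = torch.randn(4, requires_grad=True)                # full-precision "latent" weights
loss = BinarizeSTE.apply(w).sum()
loss.backward()
print(w.grad)                                         # non-zero despite the non-differentiable sign()
```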
Main reference: 高晗, 田育龙, 许封元, 仲盛. 深度学习模型压缩与加速综述[J]. 软件学报, 2021, 32(1): 68-92. DOI: 10.13328/j.cnki.jos.006096.
[56] Courbariaux M, Bengio Y, David JP. BinaryConnect: Training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems. 2015. 3123-3131.
[57] Hou L, Yao Q, Kwok JT. Loss-aware binarization of deep networks. arXiv preprint arXiv:1611.01600, 2016.
[58] Juefei-Xu F, Naresh Boddeti V, Savvides M. Local binary convolutional neural networks. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2017. 19-28.
[59] Guo Y, Yao A, Zhao H, et al. Network sketching: Exploiting binary structure in deep CNNs. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2017. 5955-5963.
[60] McDonnell MD. Training wide residual networks for deployment using a single bit for each weight. arXiv preprint arXiv:1802.08530, 2018.
[61] Hu Q, Wang P, Cheng J. From hashing to CNNs: Training binary weight networks via hashing. In: Proc. of the 32nd AAAI Conf. on Artificial Intelligence. 2018.
[62] Courbariaux M, Hubara I, Soudry D, et al. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016.
[63] Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Proc. of the European Conf. on Computer Vision. Cham: Springer-Verlag, 2016. 525-542.
[64] Li Z, Ni B, Zhang W, et al. Performance guaranteed network acceleration via high-order residual quantization. In: Proc. of the IEEE Int'l Conf. on Computer Vision. 2017. 2584-2592.
[65] Liu Z, Wu B, Luo W, et al. Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm. In: Proc. of the European Conf. on Computer Vision (ECCV). 2018. 722-737.
[66] Lin X, Zhao C, Pan W. Towards accurate binary convolutional neural network. In: Advances in Neural Information Processing Systems. 2017. 345-353.
[67] Li F, Zhang B, Liu B. Ternary weight networks. arXiv preprint arXiv:1605.04711, 2016.
[68] Zhu C, Han S, Mao H, et al. Trained ternary quantization. arXiv preprint arXiv:1612.01064, 2016.
[69] Achterhold J, Koehler JM, Schmeink A, et al. Variational network quantization. In: Proc. of the ICLR. 2017.
[70] Gong Y, Liu L, Yang M, et al. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014.
[71] Wu J, Leng C, Wang Y, et al. Quantized convolutional neural networks for mobile devices. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2016. 4820-4828.
[72] Choi Y, El-Khamy M, Lee J. Towards the limit of network quantization. arXiv preprint arXiv:1612.01543, 2016.
[73] Xu Y, Wang Y, Zhou A, et al. Deep neural network compression with single and multiple level quantization. In: Proc. of the 32nd AAAI Conf. on Artificial Intelligence. 2018.
[74] Lin Z, Courbariaux M, Memisevic R, et al. Neural networks with few multiplications. arXiv preprint arXiv:1510.03009, 2015.
[75] Zhou S, Wu Y, Ni Z, et al. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016.
[76] Mishra A, Nurvitadhi E, Cook JJ, et al. WRPN: Wide reduced-precision networks. arXiv preprint arXiv:1709.01134, 2017.
[77] Köster U, Webb T, Wang X, et al. Flexpoint: An adaptive numerical format for efficient training of deep neural networks. In: Advances in Neural Information Processing Systems. 2017. 1742-1752.
[78] Wang N, Choi J, Brand D, et al. Training deep neural networks with 8-bit floating point numbers. In: Advances in Neural Information Processing Systems. 2018. 7675-7684.
[79] Gupta S, Agrawal A, Gopalakrishnan K, et al. Deep learning with limited numerical precision. In: Proc. of the Int'l Conf. on Machine Learning. 2015. 1737-1746.
[80] Dettmers T. 8-bit approximations for parallelism in deep learning. arXiv preprint arXiv:1511.04561, 2015.
[81] Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 2704-2713.
[82] Banner R, Hubara I, Hoffer E, et al. Scalable methods for 8-bit training of neural networks. In: Advances in Neural Information Processing Systems. 2018. 5145-5153.
[83] Wu S, Li G, Chen F, et al. Training and inference with integers in deep neural networks. arXiv preprint arXiv:1802.04680, 2018.
[84] Wang P, Hu Q, Zhang Y, et al. Two-step quantization for low-bit neural networks. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4376-4384.
[85] Faraone J, Fraser N, Blott M, et al. SYQ: Learning symmetric quantization for efficient deep neural networks. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4300-4309.
[86] Zhang D, Yang J, Ye D, et al. LQ-Nets: Learned quantization for highly accurate and compact deep neural networks. In: Proc. of the European Conf. on Computer Vision (ECCV). 2018. 365-382.
[87] Zhou A, Yao A, Guo Y, et al. Incremental network quantization: Towards lossless CNNs with low-precision weights. arXiv preprint arXiv:1702.03044, 2017.
[88] Cai Z, He X, Sun J, et al. Deep learning with low precision by half-wave Gaussian quantization. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2017. 5918-5926.
[89] Leng C, Dou Z, Li H, et al. Extremely low bit neural network: Squeeze the last bit out with ADMM. In: Proc. of the 32nd AAAI Conf. on Artificial Intelligence. 2018.
[90] Boyd S, Parikh N, Chu E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 2011, 3(1): 1-122.
[91] Zhuang B, Shen C, Tan M, et al. Towards effective low-bitwidth convolutional neural networks. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 7920-7928.
[92] Zhou A, Yao A, Wang K, et al. Explicit loss-error-aware quantization for low-bit deep neural networks. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 9426-9435.
[93] Park E, Yoo S, Vajda P. Value-aware quantization for training and inference of neural networks. In: Proc. of the European Conf. on Computer Vision (ECCV). 2018. 580-595.
[94] Shayer O, Levi D, Fetaya E. Learning discrete weights using the local reparameterization trick. arXiv preprint arXiv:1710.07739, 2017.
[95] Louizos C, Reisser M, Blankevoort T, et al. Relaxed quantization for discretized neural networks. arXiv preprint arXiv:1810.01875, 2018.