Deep learning model compression and acceleration exploit the redundancy in a network's parameters and in its structure to slim the model down: without hurting task performance, one obtains a model with fewer parameters and a leaner structure. The compressed model needs less compute and less memory, and can therefore serve a much wider range of applications than the original. As deep learning grows ever more popular, the strong demand for deploying such models has drawn particular attention to "small models" that occupy little memory, require modest computational resources, and still maintain high accuracy. Compressing and accelerating deep learning models by exploiting this redundancy has attracted broad interest from both academia and industry, and new work on it appears continuously.
This post is a summary of, and study notes on, the survey 《深度学习模型压缩与加速综述》 (A Survey of Model Compression and Acceleration for Deep Learning), published in 软件学报 (Journal of Software) in 2021.
Related posts in this series:
Deep Learning Model Compression and Acceleration Techniques (1): Parameter Pruning
Deep Learning Model Compression and Acceleration Techniques (2): Parameter Quantization
Deep Learning Model Compression and Acceleration Techniques (3): Low-Rank Decomposition
Deep Learning Model Compression and Acceleration Techniques (4): Parameter Sharing
Deep Learning Model Compression and Acceleration Techniques (5): Compact Networks
Deep Learning Model Compression and Acceleration Techniques (6): Knowledge Distillation
Deep Learning Model Compression and Acceleration Techniques (7): Hybrid Approaches
| Compression & acceleration technique | Description |
|---|---|
| Parameter pruning (A) | Design a criterion for parameter importance, judge how important each network parameter is by that criterion, and delete the redundant ones |
| Parameter quantization (A) | Quantize network parameters from 32-bit full-precision floating point down to lower bit widths |
| Low-rank decomposition (A) | Decompose high-dimensional parameter tensors into sparse, low-dimensional factors |
| Parameter sharing (A) | Map the network's internal parameters using structured matrices or clustering methods |
| Compact networks (B) | Design new lightweight networks at three levels: convolution kernels, special layers, and overall network structure |
| Knowledge distillation (B) | Distill the knowledge of a larger teacher model into a smaller student model |
| Hybrid approaches (A+B) | Combinations of the methods above |

A: compresses parameters; B: compresses structure.
Parameter pruning starts from a large pre-trained model, designs an evaluation criterion for the network's parameters, and deletes the "redundant" parameters according to that criterion.
Unstructured pruning
Unstructured pruning works at a fine granularity: an importance criterion is designed for the individual connections between neurons, and any desired fraction of "redundant" connections can be removed without restriction, which achieves model compression. The downside is that the pruned network becomes irregular, so it is difficult to obtain effective acceleration on standard hardware.
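As a concrete illustration, below is a minimal NumPy sketch of magnitude-based unstructured pruning: weights whose absolute value falls below a global threshold are masked to zero. The criterion (plain weight magnitude), the function name `magnitude_prune`, and the 80% sparsity target are illustrative assumptions, not the specific criteria proposed in the works covered by the survey.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until roughly `sparsity` of them are removed.

    Returns a binary mask of the same shape: kept weights are 1, pruned weights are 0.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                 # number of weights to remove
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

# Example: prune 80% of a fully connected layer's weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))
mask = magnitude_prune(W, sparsity=0.8)
W_pruned = W * mask          # irregular (unstructured) sparsity pattern
print("kept fraction:", mask.mean())
```

Note that the resulting zeros are scattered irregularly, which is exactly why this kind of sparsity is hard to turn into wall-clock speedups without specialized sparse kernels or hardware.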
Structured pruning
Structured pruning works at a coarser granularity: the smallest unit removed is a group of parameters within a filter. By attaching an evaluation factor to filters or feature maps, entire filters or several channels can be deleted, making the network "narrower", so effective acceleration is obtained directly on existing software and hardware. This may cause a drop in prediction accuracy, which is usually recovered by fine-tuning the model.
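The sketch below illustrates filter-level structured pruning using each filter's L1 norm as its importance score, i.e. the first of the criteria listed next. The helper `prune_filters_by_l1` and the 50% pruning ratio are assumptions for illustration; a real pipeline must also slice the following layer's input channels accordingly and then fine-tune.

```python
import numpy as np

def prune_filters_by_l1(conv_weight: np.ndarray, prune_ratio: float):
    """Remove whole output filters with the smallest L1 norms.

    conv_weight has shape (out_channels, in_channels, kH, kW);
    returns the slimmed weight tensor and the indices of the kept filters.
    """
    out_channels = conv_weight.shape[0]
    l1_norms = np.abs(conv_weight).reshape(out_channels, -1).sum(axis=1)
    n_keep = max(1, int(round(out_channels * (1.0 - prune_ratio))))
    keep_idx = np.sort(np.argsort(l1_norms)[-n_keep:])  # keep the largest-norm filters
    return conv_weight[keep_idx], keep_idx

# Example: drop half of the 64 filters in a 3x3 convolution layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32, 3, 3))
W_slim, keep_idx = prune_filters_by_l1(W, prune_ratio=0.5)
print(W_slim.shape)  # (32, 32, 3, 3) -- the layer is literally narrower
```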
Evaluation criteria for filters fall into the following four categories:
Based on filter norms
Custom filter scoring factors
Minimizing reconstruction error
Let W be the weights of a convolutional layer with C channels and kernel size $K_1 \times K_2$, let X be its input and Y the corresponding output. Ignoring the bias term B, we have

$$Y=\sum_{c=1}^{C} \sum_{k_{1}=1}^{K_{1}} \sum_{k_{2}=1}^{K_{2}} W_{c, k_{1}, k_{2}} \times X_{c, k_{1}, k_{2}}$$

Define the contribution of channel c as

$$\hat{X}_{c}=\sum_{k_{1}=1}^{K_{1}} \sum_{k_{2}=1}^{K_{2}} W_{c, k_{1}, k_{2}} \times X_{c, k_{1}, k_{2}}$$

so that

$$Y=\sum_{c=1}^{C} \hat{X}_{c}$$

Let S denote an optimal subset chosen from the C channels. Pruning then amounts to selecting the subset S whose output deviates least from the output Y of the original C channels, i.e.

$$\underset{S}{\arg \min }\left\|Y-\sum_{j \in S} \hat{X}_{j}\right\|$$
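To make this objective concrete, here is a minimal NumPy sketch that selects channels greedily so that the retained contributions $\hat{X}_j$ reconstruct Y as closely as possible. The greedy forward selection and the helper `greedy_channel_select` are illustrative assumptions; the methods covered by the survey solve this selection problem with more principled optimization and on real training data.

```python
import numpy as np

def greedy_channel_select(contributions: np.ndarray, n_keep: int):
    """Greedily pick the channel subset S that best reconstructs the full output.

    contributions[c] is the per-channel partial output X_hat_c (already summed over
    the kernel window), so the full output is Y = contributions.sum(axis=0).
    Returns the indices of the selected channels.
    """
    C = contributions.shape[0]
    Y = contributions.sum(axis=0)
    selected, residual = [], Y.copy()
    for _ in range(n_keep):
        # pick the remaining channel that most reduces the reconstruction error
        errs = [np.linalg.norm(residual - contributions[c]) if c not in selected else np.inf
                for c in range(C)]
        best = int(np.argmin(errs))
        selected.append(best)
        residual = residual - contributions[best]
    return sorted(selected)

# Example: keep 4 of 16 channels feeding one 8x8 output map.
rng = np.random.default_rng(0)
X_hat = rng.normal(size=(16, 8, 8))   # per-channel contributions X_hat_c
S = greedy_channel_select(X_hat, n_keep=4)
print("kept channels:", S)
```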
Other methods
Main reference: 高晗, 田育龙, 许封元, 仲盛. 深度学习模型压缩与加速综述 [J]. 软件学报, 2021, 32(1): 68-92. DOI: 10.13328/j.cnki.jos.006096.