DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Approach

In the first D (Dense) step, we train a dense network to learn the connection weights and their importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections (those with small weights) and retraining the network under the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initializing the pruned parameters to zero, and retraining the whole dense network.
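The three steps can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes a single linear model trained by full-batch gradient descent on mean squared error, one global 50% magnitude-pruning ratio instead of per-layer ratios, and hypothetical helper names (`grad_step`, `magnitude_mask`, `dsd_train`). The data, learning rate, and step counts are also made up for illustration.

```python
import random


def grad_step(w, data, lr):
    """One full-batch gradient-descent step on mean squared error for y ~ w.x."""
    n = len(data)
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return [wi - lr * gj for wi, gj in zip(w, grad)]


def magnitude_mask(w, sparsity):
    """Return a 0/1 mask that prunes the `sparsity` fraction of weights with smallest |w|."""
    k = int(round(sparsity * len(w)))  # number of connections to prune
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else 1.0 for i in range(len(w))]


def dsd_train(w, data, lr=0.1, steps=200, sparsity=0.5):
    """Dense -> Sparse -> re-Dense training of a toy linear model (illustrative only)."""
    # Dense phase: ordinary training learns the weights (and hence their magnitudes).
    for _ in range(steps):
        w = grad_step(w, data, lr)

    # Sparse phase: prune small-magnitude weights, then retrain under the fixed mask.
    mask = magnitude_mask(w, sparsity)
    w = [wi * mi for wi, mi in zip(w, mask)]
    for _ in range(steps):
        w = grad_step(w, data, lr)
        w = [wi * mi for wi, mi in zip(w, mask)]  # keep pruned weights at zero
    sparse_w = list(w)

    # Re-Dense phase: drop the mask; pruned weights restart from zero (they already are 0)
    # and the whole dense model is retrained.
    for _ in range(steps):
        w = grad_step(w, data, lr)
    return sparse_w, w, mask


# Toy data (an assumption for illustration): noise-free targets from a fixed weight vector
# whose last two entries are small, so magnitude pruning should remove them.
rng = random.Random(0)
true_w = [3.0, -2.0, 0.05, 0.0]
data = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    data.append((x, sum(a * b for a, b in zip(true_w, x))))

sparse_w, dense_w, mask = dsd_train([0.0] * len(true_w), data)
print("mask:    ", mask)  # expected: [1.0, 1.0, 0.0, 0.0]
print("sparse w:", [round(v, 3) for v in sparse_w])
print("dense w: ", [round(v, 3) for v in dense_w])
```

The mask is computed once, after the Dense phase, and re-applied after every update in the Sparse phase. The re-Dense phase simply stops applying it, so the pruned weights start again from zero and can recover (here the small 0.05 weight is learned back).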

[Figure: DSD]
[Figure: Algorithm]
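The per-step updates behind the Dense, Sparse, and re-Dense steps can be written compactly. The notation here is assumed for this sketch and may differ from the paper's algorithm figure: W is a layer's weight tensor, eta the learning rate, f the training loss on the mini-batch x^(t), k the number of weights to prune, M the binary mask, and the circled dot element-wise multiplication.

```latex
\begin{aligned}
\text{Dense / re-Dense:}\quad & W^{(t)} = W^{(t-1)} - \eta\,\nabla f\!\left(W^{(t-1)}; x^{(t-1)}\right) \\
\text{Sparse (mask):}\quad & \lambda = k\text{-th smallest value of } \lvert W \rvert,\qquad M = \mathbb{1}\!\left(\lvert W \rvert > \lambda\right) \\
\text{Sparse (update):}\quad & W^{(t)} = \left(W^{(t-1)} - \eta\,\nabla f\!\left(W^{(t-1)}; x^{(t-1)}\right)\right) \odot M
\end{aligned}
```

Under this reading, M is computed once from the weights at the end of the Dense step and held fixed during the Sparse step; the re-Dense step drops the element-wise product with M, and the pruned entries begin again from zero.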

The consistent and significant performance gains in the DSD experiments show that current training methods are inadequate at finding the best local optimum, whereas DSD achieves better optimization and finds a better solution.

Experiment

[Figure: DSD results on GoogLeNet]
[Figure: DSD results on VGG-16]
[Figure: DSD results on ResNet]

References:
Song Han et al. DSD: Dense-Sparse-Dense Training for Deep Neural Networks. ICLR 2017.
