Incremental network quantization

Approach

INQ incorporates three interdependent operations: weight partition, group-wise quantization, and re-training. Weight partition divides the weights in each layer of a pre-trained full-precision CNN model into two disjoint groups that play complementary roles. The weights in the first group form a low-precision base for the original model and are quantized to powers of two or zero. The weights in the second group are re-trained to compensate for the resulting loss in model accuracy. Once the first round of quantization and re-training is finished, all three operations are applied again to the remaining full-precision group, and this repeats until every weight is either a power of two or zero. The whole procedure thus acts as an incremental network quantization and accuracy enhancement loop.
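As a concrete illustration, here is a minimal NumPy sketch of that loop under simplifying assumptions: the partition uses a magnitude-based (pruning-inspired) strategy, the quantizer rounds each weight to the nearest value in {±2^n2, ..., ±2^n1, 0}, and the re-training step is only indicated by a comment since it depends on the training framework. The function names (quantize_pow2, inq_schedule) and the default exponent bounds are illustrative, not from an official implementation.

```python
import numpy as np

def quantize_pow2(w, n1=0, n2=-7):
    """Map each weight to the nearest power of two in [2^n2, 2^n1], or to 0.
    Assumed simplification of the paper's quantization rule."""
    q = np.zeros_like(w)
    mag = np.abs(w)
    keep = mag >= 2.0 ** (n2 - 1)          # magnitudes below this collapse to zero
    p = np.clip(np.round(np.log2(mag[keep])), n2, n1)
    q[keep] = np.sign(w[keep]) * 2.0 ** p
    return q

def inq_schedule(w, ratios=(0.5, 0.75, 0.875, 1.0)):
    """Incremental schedule: at each step, grow the quantized group to the
    given fraction of all weights (largest magnitudes first) and leave the
    rest full-precision for re-training."""
    quantized = np.zeros(w.shape, dtype=bool)       # mask of frozen weights
    for r in ratios:
        target = int(round(r * w.size))
        need = target - int(quantized.sum())        # new weights to quantize
        if need > 0:
            # weight partition: pick the largest still-free magnitudes
            free = np.flatnonzero(~quantized.ravel())
            order = free[np.argsort(np.abs(w).ravel()[free])[::-1]]
            flat = np.zeros(w.size, dtype=bool)
            flat[order[:need]] = True
            new_mask = flat.reshape(w.shape)
            # group-wise quantization of the newly frozen group
            w[new_mask] = quantize_pow2(w[new_mask])
            quantized |= new_mask
        # re-training: update only w[~quantized], e.g.
        #     w[~quantized] -= lr * grad[~quantized]
        # (gradient computation omitted in this sketch)
    return w, quantized

w = (0.1 * np.random.randn(64, 27)).astype(np.float32)
wq, mask = inq_schedule(w.copy())
```

In a real training loop, the gradient mask `~quantized` is what keeps the already-quantized base fixed while the remaining full-precision weights adapt to recover accuracy.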

Experiment

References:
Aojun Zhou et al. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. ICLR 2017.