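To get familiar with research progress in surveillance video coding, this note summarizes the relevant papers from the 1990s to the present and extracts the main research content of each one, for personal use when writing papers.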
[1] P. Gorur, B. Amrutur, “Skip decision and reference frame selection for low-complexity H.264/AVC surveillance video coding,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 24, no. 7, pp. 1156-1169, Jul. 2014.
Gorur et al. [1] proposed GMM S-MD to reduce the computational cost of the detection process by using a cascade of spatial samplers that serve as rejection classifiers. The input image is first sampled sparsely, and the sampled pixels are classified as either background or foreground using the Gaussian mixture model (GMM) algorithm. The regions surrounding the foreground pixels are considered salient and are sampled again with a dense sampler. The densely sampled pixels are segmented to verify the presence of foreground objects. Macroblocks (MBs) that do not contain foreground pixels are added to the set of indices of MBs that contain only background objects.
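The sparse-then-dense cascade can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of [1]: a single-Gaussian test stands in for the per-pixel GMM, and the names `is_foreground`, `background_only_mbs`, `sparse_step`, and `min_hits` are hypothetical.

```python
def is_foreground(pixel, bg_mean, bg_std, k=2.5):
    """Single-Gaussian stand-in for the per-pixel GMM test (assumption)."""
    return abs(pixel - bg_mean) > k * bg_std


def background_only_mbs(frame, bg_mean, bg_std, mb=16, sparse_step=8, min_hits=4):
    """Return the set of (mb_row, mb_col) indices with no verified foreground."""
    h, w = len(frame), len(frame[0])

    # Stage 1: sparse sampling. A foreground hit marks the enclosing MB as salient.
    salient = set()
    for y in range(0, h, sparse_step):
        for x in range(0, w, sparse_step):
            if is_foreground(frame[y][x], bg_mean[y][x], bg_std):
                salient.add((y // mb, x // mb))

    # Stage 2: dense sampling, only inside salient MBs, to verify the foreground.
    foreground_mbs = set()
    for r, c in salient:
        hits = sum(
            is_foreground(frame[y][x], bg_mean[y][x], bg_std)
            for y in range(r * mb, min((r + 1) * mb, h))
            for x in range(c * mb, min((c + 1) * mb, w))
        )
        if hits >= min_hits:  # too few hits: treat as noise and reject
            foreground_mbs.add((r, c))

    all_mbs = {(r, c)
               for r in range((h + mb - 1) // mb)
               for c in range((w + mb - 1) // mb)}
    return all_mbs - foreground_mbs
```

Most MBs are rejected after the cheap sparse pass, so the dense pass only runs on the few salient regions.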
[2] I. Martins, L. Corte-Real, “A video coder using 3-D model based background for video surveillance applications,” International Conference on Image Processing (ICIP), vol. 2, pp. 919-923, 1998.
Martins et al. [2] presented a background-modeling-based approach for remote video surveillance applications at very low bit rates. The codec has two layers: one layer for the background, using a 3-D model, and a second layer for the part of the scene not represented by the background. The second layer may use conventional hybrid coding schemes.
[3] Tieyan Liu, Xudong Zhang, Yingning Peng, “A novel coding algorithm for video surveillance,” 2002 6th International Conference on Signal Processing (ICSP), vol. 1, pp. 660-663, 2002.
Liu et al. [3] designed a new coding algorithm for video surveillance. Using techniques such as context switch, inertia-based motion estimation, early-out DCT, and region-based quantization, the algorithm provides features that are attractive for video surveillance.
[4] Rong Shi, Xiaofeng Li, Zaiming Li, “Efficient spatiotemporal segmentation and video object generation for highway surveillance video,” IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions (ICCCAS), vol. 1, pp. 580-584, 2002.
Shi et al. [4] described a new procedure for spatiotemporal segmentation and video object generation for highway surveillance video. First, a representation model is proposed to describe the relationship between the background and the moving objects. Based on this model, the authors statistically analyze the spatiotemporal information of several successive frames to recover the background image. The moving objects are then detected, and the video object planes are extracted from the difference between the video sequence and the recovered background image. Finally, a spatiotemporal similarity operator is used for object tracking.
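A minimal sketch of the recover-then-subtract idea, assuming a per-pixel temporal median as the statistic (the summary above does not say which statistic [4] uses). The function names and the threshold value are hypothetical.

```python
from statistics import median


def recover_background(frames):
    """Per-pixel temporal median over successive frames (assumed statistic)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]


def moving_object_mask(frame, background, threshold=20):
    """Mark pixels whose difference from the background exceeds the threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]
```

The median works as long as each pixel shows the background in more than half of the frames, which is plausible for vehicles passing through a highway scene.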
[5] A. Vetro, T. Haga, K. Sumi, Huifang Sun, “Object-based coding for long-term archive of surveillance video,” 2003 International Conference on Multimedia and Expo (ICME), vol. 2, pp. II-417-20, 2003.
Vetro et al. [5] considered video coding using several automatic segmentation algorithms to achieve a significant increase in storage capacity.
[6] T. Nishi, H. Fujiyoshi, “Object-based video coding using pixel state analysis,” 2004 International Conference on Pattern Recognition (ICPR), vol. 2, pp. 306-309, 2004.
Nishi et al. [6] described object-based coding by pixel state analysis. Pixel state analysis detects the foreground objects and background regions in video frames and labels foreground object pixels as either stationary or transient. For stationary pixels, the color intensity can be restored by referring to the same pixel location in the last frame.
[7] Yu Yang, D. Doermann, “Model of Object-Based Coding for Surveillance Video,” 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 693-696, 2005.
Yang et al. [7] explored a model of the potential savings of object-based coding for surveillance video. Moving foreground objects in stationary-camera surveillance video are detected by a background subtraction technique and encoded with MPEG-4 object-based coding.
[8] J. Meessen, C. Parisot, X. Desurmont, J.-F. Delaigle, “Scene analysis for reducing motion JPEG 2000 video surveillance delivery bandwidth and complexity,” 2005 IEEE International Conference on Image Processing (ICIP), vol. 1, pp. I-577-88, 2005.
Meessen et al. [8] proposed a new object-based video coding and transmission system that uses the emerging Motion JPEG 2000 standard for the efficient storage and delivery of surveillance video over low-bandwidth channels.
[9] Yi-Lum Lin, Shu-Fa Lin, H.H. Chen, Yuh-Feng Hsu, “Improving the coding of regions of interest,” 2006 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 4313-4316, 2006.
Lin et al. [9] considered a video coding system for surveillance applications. It consists of one “base encoder” that encodes a down-sampled, full-view version of the input video sequence and one “region of interest” (ROI) encoder that encodes an ROI of the video sequence at the original image resolution. An important requirement of the system is that the ROI bit stream and the base bit stream can be decoded independently.
[10] R. Venkatesh Babu, A. Makur, “Object-based Surveillance Video Compression using Foreground Motion Compensation,” 2006 International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 1-6, 2006.
Venkatesh Babu et al. [10] proposed an object-based video compression system using foreground motion compensation for applications such as archival and transmission of surveillance video. The proposed system segments independently moving objects from the video and codes them with respect to the previously reconstructed frame. The error resulting from object-based motion compensation is coded using the shape-adaptive DCT (SA-DCT).
[11] Xuedong Liu, Yihua Tan, Jian Liu, “A Coding Scheme for Surveillance Video,” The 1st International Conference on Bioinformatics and Biomedical Engineering (ICBBE), pp. 888-891, 2007.
Liu et al. [11] built a coding scheme for surveillance video on the set partitioning in hierarchical trees (SPIHT) algorithm for wavelet-based video coding. By scaling up the wavelet coefficients that belong to the surveilled region, the region is coded with higher quality than the rest of the frame when the bit rate is limited.
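Why scaling up helps can be seen in a toy sketch. It does not implement SPIHT or a wavelet transform. It only mimics an embedded bitstream that is cut off before the lowest bit planes arrive, and all function names and the `shift` and `dropped` parameters are hypothetical.

```python
def scale_roi(coeffs, roi_mask, shift):
    """Scale coefficients inside the region of interest up by 2**shift."""
    return [c << shift if m else c for c, m in zip(coeffs, roi_mask)]


def truncate_bitplanes(coeffs, dropped):
    """Mimic an embedded bitstream cut early: the lowest bit planes are lost."""
    out = []
    for c in coeffs:
        mag = (abs(c) >> dropped) << dropped
        out.append(mag if c >= 0 else -mag)
    return out


def unscale_roi(coeffs, roi_mask, shift):
    """Decoder side: undo the scaling inside the region of interest."""
    out = []
    for c, m in zip(coeffs, roi_mask):
        if m:
            mag = abs(c) >> shift
            out.append(mag if c >= 0 else -mag)
        else:
            out.append(c)
    return out
```

With coefficients `[37, -21, 37, -21]`, mask `[1, 1, 0, 0]`, a shift of 3, and 3 dropped bit planes, the decoder recovers `[37, -21, 32, -16]`: the scaled region survives the truncation while the rest loses precision. The sketch ignores the extra bits the scaled coefficients cost, so it shows the ordering effect only, not the rate.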
[12] Wei-Di Hong, Tien-Hsu Lee, Pao-Chi Chang, “Real-Time Foreground Segmentation for the Moving Camera Based on H.264 Video Coding Information,” Future Generation Communication and Networking(FGCN), vol 1, pp. 385-390, 2007.
Hong et al. [12] proposed a real-time foreground segmentation algorithm for moving cameras based on H.264 video coding information. A relative global motion model is used to calculate the approximate global motion vector and to obtain the motion vector difference of each block. These differences are then weighted according to the block partition modes and refined spatio-temporally to improve segmentation accuracy. Finally, the foreground blocks are segmented out with an adaptive threshold.
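The core of this pipeline can be sketched as below, with loud simplifications: the global motion vector is taken as the component-wise median of the block motion vectors instead of the relative global motion model of [12], the threshold is mean plus one standard deviation, and the partition-mode weighting and spatio-temporal refinement are omitted. `foreground_blocks` is a hypothetical name.

```python
from statistics import mean, median, pstdev


def foreground_blocks(mvs):
    """mvs: 2-D grid of (dx, dy) block motion vectors. Returns a 0/1 grid."""
    flat = [mv for row in mvs for mv in row]

    # Approximate the global (camera) motion by the component-wise median.
    gx = median(dx for dx, _ in flat)
    gy = median(dy for _, dy in flat)

    # Motion vector difference of each block relative to the global motion.
    mvd = [[abs(dx - gx) + abs(dy - gy) for dx, dy in row] for row in mvs]

    # Adaptive threshold derived from the statistics of the current frame.
    values = [v for row in mvd for v in row]
    threshold = mean(values) + pstdev(values)
    return [[1 if v > threshold else 0 for v in row] for row in mvd]
```

Because the motion vectors are already present in the H.264 bitstream, this kind of segmentation needs no pixel-domain processing.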
[13] D. Venkatraman, A. Makur, “A compressive sensing approach to object-based surveillance video coding,” IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP), pp. 3513-3516, 2009.
Venkatraman et al. [13] studied the feasibility of applying compressive sensing (CS) to object-based surveillance video coding and investigated the design choices involved. The work proposes several techniques under two approaches, direct CS and transform-based CS, and analyzes them by varying trade-off parameters such as the measurement index and the number of quantization levels. Finally, the authors recommend an optimal scheme for a range of bitrates.
[14] Xin Jin, S. Goto, “Difference detection with encoder adaptability for low complexity surveillance video compression,” International Symposium on Intelligent Signal Processing and Communication Systems(ISPCS), pp. 489-492,2009.
Jin et al. [14] proposed a difference detection algorithm that works as a preprocessing module before the video encoder to reduce the computational complexity of video compression.
[15] Limin Liu, Zhen Li, E.J. Delp, “Efficient and Low-Complexity Surveillance Video Compression Using Backward-Channel Aware Wyner-Ziv Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 19, no. 4, pp. 453-465, Apr. 2009.
Liu et al. [15] presented a surveillance video compression system with a low-complexity encoder based on Wyner-Ziv coding principles to address the trade-off between computational complexity and coding efficiency. In addition, they proposed a backward-channel aware Wyner-Ziv (BCAWZ) video coding approach to improve the coding efficiency while keeping the encoder complexity low.
[16] Xuedong Liu, Hong Wang, “A fast motion segmentation algorithm based on hypothesis test for surveillance video coding,” International Conference on Image Analysis and Signal Processing(IASP), pp. 653-655, 2010.
Liu et al. [16] proposed a fast motion segmentation algorithm based on a hypothesis test. First, a statistical model of the camera noise is obtained offline. Then, pixels are classified as moving or still by the hypothesis test, and a binary mask image is generated. Median filtering is applied to remove isolated spots. Finally, a macroblock (MB) mask is formed according to the number of moving pixels inside each MB.
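The four steps can be sketched as follows. This is a sketch under assumptions, not the algorithm of [16]: the noise model is reduced to a single zero-mean Gaussian standard deviation `noise_sigma`, the test is a simple `k`-sigma rule, and `k`, `min_moving`, and the function names are hypothetical.

```python
def pixel_mask(frame, reference, noise_sigma, k=3.0):
    """Reject 'still' when the difference is unlikely under camera noise."""
    return [[1 if abs(p - r) > k * noise_sigma else 0
             for p, r in zip(row, ref_row)]
            for row, ref_row in zip(frame, reference)]


def median_filter3(mask):
    """3x3 median on a binary mask (majority vote, window clipped at borders)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if 2 * sum(window) > len(window) else 0
    return out


def mb_mask(mask, mb=16, min_moving=8):
    """Mark an MB as moving when it holds at least min_moving moving pixels."""
    h, w = len(mask), len(mask[0])
    return [[1 if sum(mask[y][x]
                      for y in range(r, min(r + mb, h))
                      for x in range(c, min(c + mb, w))) >= min_moving else 0
             for c in range(0, w, mb)]
            for r in range(0, h, mb)]
```

An encoder could then skip motion estimation for every MB whose mask entry is 0.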
[17] Xianguo Zhang, Luhong Liang, Qian Huang, Yazhou Liu, Tiejun Huang, Wen Gao, “An Efficient Coding Scheme for Surveillance Videos Captured by Stationary Cameras,” in Proc. Visual Communication and Image Processing (VCIP), vol. 7744, pp. 77442A-1-10, 2010.
Zhang et al. [17] presented a new scheme to improve the coding efficiency of sequences captured by stationary cameras for video surveillance applications. The authors introduced two novel kinds of frames, the background frame and the difference frame, to represent the foreground and background of input frames without object detection, tracking, or segmentation. A sequence structure is proposed to generate high-quality background frames and to code difference frames efficiently without delay, so that surveillance videos can be compressed by encoding the background frames and difference frames in a traditional manner.
[18] A.D. Bagdanov, M. Bertini, A. Del Bimbo, L. Seidenari, “Adaptive Video Compression for Video Surveillance Applications,” IEEE International Symposium on Multimedia(ISM), pp. 190-197, 2011.
Bagdanov et al. [18] described an approach to adaptive video coding for video surveillance applications. A combination of low-level features with low computational cost is used to control the quality of video compression, so that background elements are allocated fewer bits in the transmitted representation.
[19] Shumin Han, Xianguo Zhang, Yonghong Tian, Tiejun Huang, “An Efficient Background Reconstruction Based Coding Method for Surveillance Videos Captured by Moving Camera,” IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), pp. 160-165, 2012.
Han et al. [19] proposed to dynamically build up a background frame for each input frame from a generated panorama background and employ it for a background frame based motion compensation to improve the coding efficiency.
[20] Shanghang Zhang, Kaijin Wei, Huizhu Jia, Xiaodong Xie, Wen Gao, “An efficient foreground-based surveillance video coding scheme in low bit-rate compression,” IEEE Visual Communication and Image Processing(VCIP), pp. 1-6, 2012.
Zhang et al. [20] presented a novel foreground-based (FG-based) coding scheme to solve the problems of blocking artifacts and excessive bit consumption for gaining better video quality at low bit-rate.
[21] Wei Chen, Xianguo Zhang, Yonghong Tian, Tiejun Huang, “An efficient surveillance coding method based on a timely and bit-saving background updating model,” IEEE Visual Communication and Image Processing(VCIP), pp. 1-6, 2012.
Chen et al. [21] first built a background updating model from a detailed analysis of results on surveillance video. They then proposed a bit-saving, quality-maintaining background frame coding method. With this method the background frame can be updated in a more timely manner, which leads to better coding efficiency.
[22] Xianguo Zhang, Yonghong Tian, Luhong Liang, Tiejun Huang, Wen Gao, “Macro-Block-Level Selective Background Difference Coding for Surveillance Video,” IEEE International Conference on Multimedia and Expo(ICME), pp. 1067-1072, 2012.
Zhang et al. [22] proposed a macro-block-level selective background difference coding method (MSBDC) to address problems such as the potential “foreground pollution” caused by inaccurate foreground segmentation and low-quality or unclear background frames.
[23] Long Zhao, Xianguo Zhang, Yonghong Tian, Ronggang Wang, Tiejun Huang, “A background proportion adaptive Lagrange multiplier selection method for surveillance video on HEVC,” IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 2013.
Zhao et al. [23] proposed a Lagrange multiplier selection model to obtain the optimal coding performance for surveillance videos. Building on this model, they further developed a Lagrange-multiplier-optimized video coding method.
[24] Xianguo Zhang, Tiejun Huang, Yonghong Tian, Wen Gao, “Hierarchical-and-Adaptive Bit-Allocation with Selective Background Prediction for High Efficiency Video Coding (HEVC),” Data Compression Conference (DCC), pp. 535, 2013.
[25] B. Dey, M.K. Kundu, “Robust Background Subtraction for Network Surveillance in H.264 Streaming Video,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 23, no. 10, pp. 1695-1703, Jul. 2013.
Dey et al. [25] presented a novel approach for background subtraction in bitstreams encoded in the Baseline profile of H.264/AVC. Temporal statistics of the proposed feature vectors, which describe macro-block units in each frame, are used to select potential candidates containing moving objects. From the candidate macro-blocks, foreground pixels are determined by comparing the colors of corresponding pixels pair-wise with a background model. The main contribution relative to related approaches is that each macro-block is allowed to have a different quantization parameter, as required in both variable and constant bit-rate applications. Additionally, a low-complexity color comparison technique is proposed that yields pixel-resolution segmentation at negligible computational cost compared with classical pixel-based approaches.
[26] Peiyin Xing, Yonghong Tian, Tiejun Huang, Wen Gao, “Surveillance video coding with quadtree partition based ROI extraction,” Picture Coding Symposium (PCS), pp. 157-160, 2013.
Xing et al. [26] presented a surveillance video coding method with High Efficiency Video Coding (HEVC) quadtree-partition-based ROI extraction. With an automatically generated foreground mask and a modeled background frame, an ROI extraction that follows the block partition in HEVC's quadtree structure is performed first. Afterwards, surveillance videos are compressed by coding two-layer videos. One is the ROI-layer video, generated by merging ROIs and background data in each frame. The other is the background-layer video, produced by subtracting the ROIs from the original input video.
[27] Jianfu Wang, Lanfang Dong, “An efficient coding scheme for surveillance videos based on high efficiency video coding,” 2014 10th International Conference on Natural Computation(ICNC), pp. 899-904, 2014.
Wang et al. [27] proposed a new coding scheme for surveillance videos that uses the inter-frame difference to encode different image areas with different encoder options. The scheme is implemented through the proposed fast Coding Unit (CU) size decision algorithm. Using the luma component of the difference image, the algorithm segments moving objects from the background and then selects a proper CU size for each area.
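One plausible reading of such a CU size decision is a quadtree that keeps large CUs over static areas and splits only where the luma difference shows motion. The sketch below follows that reading as an assumption. It is not the decision rule of [27], and `luma_difference`, `cu_partition`, `threshold`, and `min_size` are hypothetical.

```python
def luma_difference(cur, prev):
    """Inter-frame difference of the luma planes."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(cur, prev)]


def cu_partition(diff, x, y, size, min_size=8, threshold=10):
    """Return a list of (x, y, size) CUs covering the square at (x, y)."""
    moving = any(abs(diff[yy][xx]) > threshold
                 for yy in range(y, y + size)
                 for xx in range(x, x + size))
    # Static areas keep the large CU; moving areas split down to min_size.
    if not moving or size <= min_size:
        return [(x, y, size)]
    half = size // 2
    cus = []
    for oy in (0, half):
        for ox in (0, half):
            cus += cu_partition(diff, x + ox, y + oy, half, min_size, threshold)
    return cus
```

Calling `cu_partition(diff, 0, 0, 64)` on a 64x64 coding tree unit returns a single 64x64 CU when nothing moved, so the encoder can skip the exhaustive size search there.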
[28] Xianguo Zhang, Tiejun Huang, Yonghong Tian, Wen Gao, “Background-Modeling-Based Adaptive Prediction for Surveillance Video Coding,” IEEE Transactions on Image Processing (TIP), vol. 23, no. 2, pp. 769-784, Feb. 2014.
Zhang et al. [28] proposed a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are first classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively used: the background reference prediction (BRP), which uses the background modeled from the original input frames as the long-term reference, and the background difference prediction (BDP), which predicts the current data in the background difference domain. For background blocks, the BRP improves prediction efficiency by using the higher-quality background as the reference, whereas for foreground-background-hybrid blocks, the BDP provides a better reference after the background pixels are subtracted.
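The selection logic might be sketched like this. The classification criterion (share of foreground pixels in a block), the `low` and `high` thresholds, and the handling of pure foreground blocks by conventional inter prediction are all assumptions for illustration. Only the BRP-for-background and BDP-for-hybrid pairing comes from the summary above.

```python
def classify_block(fg_mask_block, low=0.05, high=0.95):
    """Classify a block by its share of foreground pixels (assumed criterion)."""
    total = sum(len(row) for row in fg_mask_block)
    ratio = sum(sum(row) for row in fg_mask_block) / total
    if ratio <= low:
        return "background"
    if ratio >= high:
        return "foreground"
    return "hybrid"


# BRP and BDP pairings follow the summary; "conventional" is an assumption.
PREDICTION = {"background": "BRP", "hybrid": "BDP", "foreground": "conventional"}


def to_difference_domain(block, background_block):
    """BDP works on data with the modeled background subtracted."""
    return [[p - b for p, b in zip(row, bg_row)]
            for row, bg_row in zip(block, background_block)]
```

For a hybrid block, both the current block and its reference would be passed through `to_difference_domain` before prediction, so that the shared background cancels out.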
[29] Xianguo Zhang, Yonghong Tian, Tiejun Huang, Siwei Dong, Wen Gao, “Optimizing the Hierarchical Prediction and Coding in HEVC for Surveillance and Conference Videos With Background Modeling,” IEEE Transactions on Image Processing (TIP), vol 23, no 10, pp. 4511-4526, Oct, 2014.
Zhang et al. [29] proposed a method that uses background modeling to optimize the hierarchical prediction and coding in HEVC for surveillance and conference videos. First, several experimental and theoretical analyses are conducted on how to use the G-picture to optimize the hierarchical prediction structure and hierarchical quantization. Following these results, they propose encoding the G-picture as the long-term reference frame to improve the background prediction, and then present a G-picture-based bit allocation algorithm to increase the coding efficiency. Meanwhile, according to the proportions of background and foreground pixels in coding units (CUs), an adaptive speed-up algorithm classifies each CU into one of several categories and applies a category-specific speed-up strategy to reduce the encoding complexity.