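The Gaussian mixture model (GMM) is a classic algorithm in background modeling, and many papers have improved on or applied it since it was first proposed. OpenCV (version 2.4.13) ships both the algorithm and an improved version.
First comes the basic GMM, which OpenCV wraps as BackgroundSubtractorMOG. For a walkthrough of that source code and a translation of the related paper, see the linked post.
Then comes the improved GMM, which OpenCV wraps as the BackgroundSubtractorMOG2 class; its source is in opencv\sources\modules\video\src\bgfg_gaussmix2.cpp. How was the algorithm improved, and how well do the improvements work? A detailed analysis follows.
Let's first use a sample program to see how BackgroundSubtractorMOG2 differs from BackgroundSubtractorMOG. For how to use the algorithm in OpenCV, see http://blog.csdn.net/sinat_31337047/article/details/52586160
The test code is as follows: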
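// Header names were stripped from the original post; these are the ones this program needs in OpenCV 2.4.
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/video/background_segm.hpp>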
#include "VideoProcessor.h"
#include "FeatureTracker.h"
#include "BGFSSegmentor.h"
using namespace std;
using namespace cv;
int main(int argc, char* argv[])
{
VideoCapture capture("../768X576.avi");
if(!capture.isOpened())
return 0;
Mat frame;
Mat foreground, foreground2;
BackgroundSubtractorMOG mog;
BackgroundSubtractorMOG2 mog2;
bool stop(false);
namedWindow("Extracted Foreground");
while(!stop)
{
if(!capture.read(frame))
break;
cvtColor(frame, frame, CV_BGR2GRAY);
long long t = getTickCount();
mog(frame, foreground, 0.01);
long long t1 = getTickCount();
mog2(frame, foreground2, -1);
long long t2 = getTickCount();
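cout<<"t1 = "<<(t1-t)/getTickFrequency()<<" t2 = "<<(t2-t1)/getTickFrequency()<<endl;
// The middle of this statement was lost in the original post. The display calls,
// the second window name and the 10 ms delay below are a reconstruction (assumptions).
imshow("Extracted Foreground", foreground);
imshow("Extracted Foreground2", foreground2);
if(waitKey(10) >= 0)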
stop = true;
}
waitKey();
return 0;
}
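The background extraction results are shown in the figure below: the left side is BackgroundSubtractorMOG2 and the right side is BackgroundSubtractorMOG. Ignoring some noise in the left image for now, the biggest difference is that the left image contains regions filled with gray. These come from one of BackgroundSubtractorMOG2's improvements, shadow detection: the gray regions are what the algorithm classifies as "shadow". The other difference is running time. According to the console output, BackgroundSubtractorMOG2 takes about 0.03 s per frame and BackgroundSubtractorMOG about 0.06 s, so BackgroundSubtractorMOG2 is considerably faster (not entirely because of the algorithm itself: BackgroundSubtractorMOG2 actually runs multi-threaded in parallel).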
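So far, then, the improved GMM, i.e. BackgroundSubtractorMOG2, has two main improvements: (1) added shadow detection and (2) considerably better efficiency. The value of the latter is obvious. The value of the former is that if shadows are not detected by some means, they may be identified as foreground, which gives foreground objects the wrong shape and hurts later processing such as tracking.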
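To make the shadow point concrete: in the source listed later, the mask written by BackgroundSubtractorMOG2 uses 0 for background, 255 for foreground and, by default, 127 for shadow (defaultnShadowDetection2). A minimal sketch of stripping the shadow label before tracking might look like the following. The helper name dropShadows and the flat std::vector representation of the mask are assumptions made for illustration; they are not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): keep only pixels labeled as
// definite foreground and map everything else, including the shadow label
// (127 by default), to background (0).
std::vector<unsigned char> dropShadows(const std::vector<unsigned char>& mask,
                                       unsigned char foregroundVal = 255)
{
    // The foreground label must differ from the background label 0.
    assert(foregroundVal != 0);
    std::vector<unsigned char> out(mask.size(), 0);
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i] == foregroundVal)
            out[i] = 255;
    return out;
}
```

With a real cv::Mat mask, a binary threshold somewhere between 127 and 255 should give the same result.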
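So how does BackgroundSubtractorMOG2 achieve these two things? We have to look at the source file bgfg_gaussmix2.cpp, which describes the origins of the algorithm as follows (my own summary):
(1) The algorithm adapts the mixture model's parameters automatically (in particular the number of Gaussian components per pixel), which reduces the amount of computation. The supporting papers are "Improved adaptive Gaussian mixture model for background subtraction", "Efficient Adaptive Density Estimation per Image Pixel for the Task of Background Subtraction" and "Recursive unsupervised learning of finite mixture models".
(2) The shadow detection in the algorithm references the paper "Detecting Moving Shadows: Algorithms and Evaluation".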
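OK, that makes four foreign-language papers in total, so there was nothing for it but to download and read them. Since this post is a summary, though, I won't make you take the same detours and fall into the same pits. Regarding (1), I read the first paper carefully and translated it; see the linked post. It is a very good paper: it explains the improvement steps and the implementation clearly, and it also spells out the mathematical derivation of the update equations in the basic GMM algorithm, so I recommend reading it. The second paper is essentially the full content of the first plus another background modeling method proposed by the authors, so there is no need to read it if you only want to understand BackgroundSubtractorMOG2. I have not read the third, but I don't think that affects understanding the algorithm (a bit of a stretch, I admit). In one sentence: read "Improved adaptive Gaussian mixture model for background subtraction", which I have already translated for you.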
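As a rough illustration of the weight update with the complexity-reduction prior, here is a simplified sketch modeled on what the 2.4.13 source listed later does per pixel: each weight decays by (1 - alpha), the prior subtracts alpha*cT, the matched component gains alpha, and a component whose updated weight falls below alpha*cT is dropped. The function name updateWeights and the vector-based layout are assumptions for illustration; the real code also updates means and variances and keeps the components sorted by weight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified sketch (not the OpenCV implementation) of the per-pixel weight
// update. `matched` is the index of the component that fits the new sample,
// or -1 if no component fits. Returns the renormalized weights of the
// components that survive pruning.
std::vector<float> updateWeights(const std::vector<float>& w, int matched,
                                 float alpha, float cT)
{
    assert(alpha > 0.f && alpha < 1.f);
    std::vector<float> out;
    float total = 0.f;
    for (std::size_t m = 0; m < w.size(); ++m) {
        // o is 1 for the matched component and 0 for all others.
        float o = (static_cast<int>(m) == matched) ? 1.f : 0.f;
        float v = w[m] + alpha * (o - w[m]) - alpha * cT;
        if (v < alpha * cT)
            continue;  // weight too small: discard this component
        out.push_back(v);
        total += v;
    }
    if (total > 0.f)
        for (float& v : out)
            v /= total;  // renormalize so the weights sum to 1
    return out;
}
```

Setting cT to 0 removes the pruning pressure, which the source comments describe as giving roughly the standard Stauffer and Grimson behavior.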
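Regarding (2), there is another pitfall. I skimmed "Detecting Moving Shadows: Algorithms and Evaluation" and found that it does not propose a shadow detection algorithm; it evaluates and compares earlier ones. The shadow detection principle used in BackgroundSubtractorMOG2 probably comes from "The Sakbot System for Moving Object Detection and Tracking". I only glanced at that paper, so here is a rough summary of the principle: the authors detect shadows in HSV space (because that color space is closer to human perception and more sensitive to the brightness change a shadow causes). Their experiments show that pixels covered by a shadow become darker (V decreases) and that H and S are attenuated as well. This is an empirical conclusion, but it seems to work fairly well. One more pitfall: OpenCV's implementation of this part still works in RGB color space, although the code appears to borrow the idea above.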
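As a rough sketch of how that idea looks in RGB, the test below mirrors what detectShadowGMM in the source listed later does for a single background component: compute the brightness ratio a between the sample and the background mean, require tau <= a <= 1, then require the color distortion to stay small. The function name looksLikeShadow and the single-component simplification are assumptions for illustration; the real function loops over all background components.

```cpp
#include <cassert>

// Sketch of the per-component shadow test (single background component).
// x: current pixel, mu: background mean, both with `nchannels` values.
// Returns true when x looks like a darker version of mu.
bool looksLikeShadow(const float* x, const float* mu, int nchannels,
                     float variance, float Tb, float tau)
{
    assert(nchannels > 0);
    float num = 0.f, den = 0.f;
    for (int c = 0; c < nchannels; ++c) {
        num += x[c] * mu[c];   // projection of the sample onto the mean
        den += mu[c] * mu[c];
    }
    if (den == 0.f)
        return false;          // no division by zero
    // Brightness ratio a = num / den must lie in [tau, 1].
    if (num > den || num < tau * den)
        return false;
    float a = num / den;
    float dist2 = 0.f;
    for (int c = 0; c < nchannels; ++c) {
        float d = a * mu[c] - x[c];
        dist2 += d * d;        // squared distance to the scaled mean
    }
    // Color distortion must stay within the scaled background threshold.
    return dist2 < Tb * variance * a * a;
}
```

With the defaults from the source (Tb = 16, tau = 0.5, initial variance 15), a pixel that is a uniformly darker copy of the background down to half its brightness passes, while a pixel whose color changes noticeably does not.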
Next, let's go through bgfg_gaussmix2.cpp. The author has already commented it in some detail; I have added my own notes at some of the important places.
#include "precomp.hpp"
namespace cv
{
/*
Interface of Gaussian mixture algorithm from:
"Improved adaptive Gausian mixture model for background subtraction"
Z.Zivkovic
International Conference Pattern Recognition, UK, August, 2004
http://www.zoranz.net/Publications/zivkovic2004ICPR.pdf
Advantages:
-fast - number of Gausssian components is constantly adapted per pixel.
-performs also shadow detection (see bgfg_segm_test.cpp example)
*/
// default parameters of gaussian background detection algorithm
// For these parameters, reading "Improved adaptive Gaussian mixture model for background subtraction" once will make most of them clear
static const int defaultHistory2 = 500; // Learning rate; alpha = 1/defaultHistory2
static const float defaultVarThreshold2 = 4.0f*4.0f;
static const int defaultNMixtures2 = 5; // maximal number of Gaussians in mixture
static const float defaultBackgroundRatio2 = 0.9f; // threshold sum of weights for background test
static const float defaultVarThresholdGen2 = 3.0f*3.0f;
static const float defaultVarInit2 = 15.0f; // initial variance for new components
static const float defaultVarMax2 = 5*defaultVarInit2;
static const float defaultVarMin2 = 4.0f;
// additional parameters
static const float defaultfCT2 = 0.05f; // complexity reduction prior constant 0 - no reduction of number of components
static const unsigned char defaultnShadowDetection2 = (unsigned char)127; // value to use in the segmentation mask for shadows, set 0 not to do shadow detection
static const float defaultfTau = 0.5f; // Tau - shadow threshold, see the paper for explanation
// Note: this struct is not used later in the file and can be ignored
struct GaussBGStatModel2Params
{
//image info
int nWidth;
int nHeight;
int nND;//number of data dimensions (image channels)
bool bPostFiltering;//defult 1 - do postfiltering - will make shadow detection results also give value 255
double minArea; // for postfiltering
bool bInit;//default 1, faster updates at start
/////////////////////////
//very important parameters - things you will change
////////////////////////
float fAlphaT;
//alpha - speed of update - if the time interval you want to average over is T
//set alpha=1/T. It is also usefull at start to make T slowly increase
//from 1 until the desired T
float fTb;
//Tb - threshold on the squared Mahalan. dist. to decide if it is well described
//by the background model or not. Related to Cthr from the paper.
//This does not influence the update of the background. A typical value could be 4 sigma
//and that is Tb=4*4=16;
/////////////////////////
//less important parameters - things you might change but be carefull
////////////////////////
float fTg;
//Tg - threshold on the squared Mahalan. dist. to decide
//when a sample is close to the existing components. If it is not close
//to any a new component will be generated. I use 3 sigma => Tg=3*3=9.
//Smaller Tg leads to more generated components and higher Tg might
//lead to a small number of components but they can grow too large
float fTB;//1-cf from the paper
//TB - threshold when the component becomes significant enough to be included into
//the background model. It is the TB=1-cf from the paper. So I use cf=0.1 => TB=0.9
//For alpha=0.001 it means that the mode should exist for approximately 105 frames before
//it is considered background
float fVarInit;
float fVarMax;
float fVarMin;
//initial standard deviation for the newly generated components.
//It will will influence the speed of adaptation. A good guess should be made.
//A simple way is to estimate the typical standard deviation from the images.
//I used here 10 as a reasonable value
float fCT;//CT - complexity reduction prior
//this is related to the number of samples needed to accept that a component
//actually exists. We use CT=0.05 of all the samples. By setting CT=0 you get
//the standard Stauffer&Grimson algorithm (maybe not exact but very similar)
//even less important parameters
int nM;//max number of modes - const - 4 is usually enough
//shadow detection parameters
bool bShadowDetection;//default 1 - do shadow detection
unsigned char nShadowDetection;//do shadow detection - insert this value as the detection result
float fTau;
// Tau - shadow threshold. The shadow is detected if the pixel is darker
//version of the background. Tau is a threshold on how much darker the shadow can be.
//Tau= 0.5 means that if pixel is more than 2 times darker then it is not shadow
//See: Prati,Mikic,Trivedi,Cucchiarra,"Detecting Moving Shadows...",IEEE PAMI,2003.
};
// Struct holding the weight and variance of one Gaussian component
struct GMM
{
float weight;
float variance;
};
// shadow detection performed per pixel
// should work for rgb data, could be usefull for gray scale and depth data as well
// See: Prati,Mikic,Trivedi,Cucchiarra,"Detecting Moving Shadows...",IEEE PAMI,2003.
static CV_INLINE bool
detectShadowGMM(const float* data, int nchannels, int nmodes,
const GMM* gmm, const float* mean,
float Tb, float TB, float tau)
{
// Input is one pixel; the function decides whether a non-background pixel is foreground or shadow
float tWeight = 0;
// check all the components marked as background:
for( int mode = 0; mode < nmodes; mode++, mean += nchannels )
{
GMM g = gmm[mode];
float numerator = 0.0f;
float denominator = 0.0f;
for( int c = 0; c < nchannels; c++ )
{
numerator += data[c] * mean[c];
denominator += mean[c] * mean[c];// the Gaussian mean serves as an approximate "background" unaffected by the foreground
}
// no division by zero allowed
if( denominator == 0 )
return false;
// Precondition: the pixel's "color" is attenuated relative to the "background"
// if tau < a < 1 then also check the color distortion
if( numerator <= denominator && numerator >= tau*denominator )
{
float a = numerator / denominator;
float dist2a = 0.0f;
for( int c = 0; c < nchannels; c++ )
{
float dD= a*mean[c] - data[c];
dist2a += dD*dD;
}
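// I did not fully follow this; it looks like an empirical formula: the squared distance between the sample and the scaled mean a*mean must stay below Tb*variance scaled by a*a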
if (dist2a < Tb*g.variance*a*a)
return true;
};
tWeight += g.weight;
if( tWeight > TB )
return false;
};
return false;
}
//update GMM - the base update function performed per pixel
//
//"Efficient Adaptive Density Estimapion per Image Pixel for the Task of Background Subtraction"
//Z.Zivkovic, F. van der Heijden
//Pattern Recognition Letters, vol. 27, no. 7, pages 773-780, 2006.
//
//The algorithm similar to the standard Stauffer&Grimson algorithm with
//additional selection of the number of the Gaussian components based on:
//
//"Recursive unsupervised learning of finite mixture models "
//Z.Zivkovic, F.van der Heijden
//IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.26, no.5, pages 651-656, 2004
//http://www.zoranz.net/Publications/zivkovic2004PAMI.pdf
// This struct implements the parallel computation; search for parallel_for_ to get a rough idea of how it works
struct MOG2Invoker : ParallelLoopBody
{
MOG2Invoker(const Mat& _src, Mat& _dst,
GMM* _gmm, float* _mean,
uchar* _modesUsed,
int _nmixtures, float _alphaT,
float _Tb, float _TB, float _Tg,
float _varInit, float _varMin, float _varMax,
float _prune, float _tau, bool _detectShadows,
uchar _shadowVal)
{
src = &_src;// source image
dst = &_dst;
gmm0 = _gmm;
mean0 = _mean;
modesUsed0 = _modesUsed;
nmixtures = _nmixtures;
alphaT = _alphaT;
Tb = _Tb;
TB = _TB;
Tg = _Tg;
varInit = _varInit;
varMin = MIN(_varMin, _varMax);
varMax = MAX(_varMin, _varMax);
prune = _prune;
tau = _tau;
detectShadows = _detectShadows;
shadowVal = _shadowVal;
cvtfunc = src->depth() != CV_32F ? getConvertFunc(src->depth(), CV_32F) : 0;
}
void operator()(const Range& range) const
{
int y0 = range.start, y1 = range.end;// each parallel unit processes the row range [y0, y1)
int ncols = src->cols, nchannels = src->channels();
AutoBuffer<float> buf(src->cols*nchannels);
float alpha1 = 1.f - alphaT;
float dData[CV_CN_MAX];
for( int y = y0; y < y1; y++ )
{
const float* data = buf;
if( cvtfunc )
cvtfunc( src->ptr(y), src->step, 0, 0, (uchar*)data, 0, Size(ncols*nchannels, 1), 0);// convert this row to float: a 1xN buffer with N = ncols*nchannels
else
data = src->ptr<float>(y);
float* mean = mean0 + ncols*nmixtures*nchannels*y;
GMM* gmm = gmm0 + ncols*nmixtures*y;
uchar* modesUsed = modesUsed0 + ncols*y;
uchar* mask = dst->ptr(y);
// iterate over every pixel of the row
for( int x = 0; x < ncols; x++, data += nchannels, gmm += nmixtures, mean += nmixtures*nchannels )
{
//calculate distances to the modes (+ sort)
//here we need to go in descending order!!!
bool background = false;//return value -> true - the pixel classified as background
//internal:
bool fitsPDF = false;//if it remains zero a new GMM mode will be added
int nmodes = modesUsed[x], nNewModes = nmodes;//current number of modes in GMM
float totalWeight = 0.f;
float* mean_m = mean;
//////
//go through all modes
// 1. Check whether the sample fits the current mixture model
for( int mode = 0; mode < nmodes; mode++, mean_m += nchannels )
{
float weight = alpha1*gmm[mode].weight + prune;//need only weight if fit is found
int swap_count = 0;
////
//fit not found yet
if( !fitsPDF )
{
//check if it belongs to some of the remaining modes
float var = gmm[mode].variance;
//calculate difference and distance
float dist2;
if( nchannels == 3 )
{
dData[0] = mean_m[0] - data[0];
dData[1] = mean_m[1] - data[1];
dData[2] = mean_m[2] - data[2];
dist2 = dData[0]*dData[0] + dData[1]*dData[1] + dData[2]*dData[2];
}
else
{
dist2 = 0.f;
for( int c = 0; c < nchannels; c++ )
{
dData[c] = mean_m[c] - data[c];
dist2 += dData[c]*dData[c];
}
}
//background? - Tb - usually larger than Tg
if( totalWeight < TB && dist2 < Tb*var )
background = true;
//check fit
if( dist2 < Tg*var )
{
/////
//belongs to the mode
fitsPDF = true;
//update distribution
//update weight
weight += alphaT;
float k = alphaT/weight;
//update mean
for( int c = 0; c < nchannels; c++ )
mean_m[c] -= k*dData[c];
//update variance
float varnew = var + k*(dist2-var);
//limit the variance
varnew = MAX(varnew, varMin);
varnew = MIN(varnew, varMax);
gmm[mode].variance = varnew;
//sort
//all other weights are at the same place and
//only the matched (iModes) is higher -> just find the new place for it
for( int i = mode; i > 0; i-- )
{
//check one up
if( weight < gmm[i-1].weight )
break;
swap_count++;
//swap one up
std::swap(gmm[i], gmm[i-1]);
for( int c = 0; c < nchannels; c++ )
std::swap(mean[i*nchannels + c], mean[(i-1)*nchannels + c]);
}
//belongs to the mode - bFitsPDF becomes 1
/////
}
}//!bFitsPDF)
//check prune
if( weight < -prune )// ensures the component weight stays non-negative in the next update
{
weight = 0.0;
nmodes--;// discard this component
}
gmm[mode-swap_count].weight = weight;//update weight by the calculated value
totalWeight += weight;
}
//go through all modes
//////
//renormalize weights
// 2. Normalize the weights
totalWeight = 1.f/totalWeight;
for( int mode = 0; mode < nmodes; mode++ )
{
gmm[mode].weight *= totalWeight;
}
nmodes = nNewModes;
//make new mode if needed and exit
// 3. Add a new Gaussian component if needed
if( !fitsPDF )
{
// replace the weakest or add a new one
int mode = nmodes == nmixtures ? nmixtures-1 : nmodes++;
if (nmodes==1)
gmm[mode].weight = 1.f;
else
{
gmm[mode].weight = alphaT;
// renormalize all other weights
for( int i = 0; i < nmodes-1; i++ )
gmm[i].weight *= alpha1;
}
// init
for( int c = 0; c < nchannels; c++ )
mean[mode*nchannels + c] = data[c];
gmm[mode].variance = varInit;
//sort
//find the new place for it
for( int i = nmodes - 1; i > 0; i-- )
{
// check one up
if( alphaT < gmm[i-1].weight )
break;
// swap one up
std::swap(gmm[i], gmm[i-1]);
for( int c = 0; c < nchannels; c++ )
std::swap(mean[i*nchannels + c], mean[(i-1)*nchannels + c]);
}
}
//set the number of modes
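modesUsed[x] = uchar(nmodes);// update the number of components actually used in the GMM
// 4. If the pixel is not background, decide (depending on detectShadows) whether to run the shadow test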
mask[x] = background ? 0 :
detectShadows && detectShadowGMM(data, nchannels, nmodes, gmm, mean, Tb, TB, tau) ?
shadowVal : 255;
}
}
}
const Mat* src;
Mat* dst;
GMM* gmm0;
float* mean0;
uchar* modesUsed0;
int nmixtures;
float alphaT, Tb, TB, Tg;
float varInit, varMin, varMax, prune, tau;
bool detectShadows;
uchar shadowVal;
BinaryFunc cvtfunc;
};
BackgroundSubtractorMOG2::BackgroundSubtractorMOG2()
{
frameSize = Size(0,0);
frameType = 0;
nframes = 0;
history = defaultHistory2;
varThreshold = defaultVarThreshold2;
bShadowDetection = 1;
nmixtures = defaultNMixtures2;
backgroundRatio = defaultBackgroundRatio2;
fVarInit = defaultVarInit2;
fVarMax = defaultVarMax2;
fVarMin = defaultVarMin2;
varThresholdGen = defaultVarThresholdGen2;
fCT = defaultfCT2;
nShadowDetection = defaultnShadowDetection2;
fTau = defaultfTau;
}
BackgroundSubtractorMOG2::BackgroundSubtractorMOG2(int _history, float _varThreshold, bool _bShadowDetection)
{
frameSize = Size(0,0);
frameType = 0;
nframes = 0;
history = _history > 0 ? _history : defaultHistory2;
varThreshold = (_varThreshold>0)? _varThreshold : defaultVarThreshold2;
bShadowDetection = _bShadowDetection;
nmixtures = defaultNMixtures2;
backgroundRatio = defaultBackgroundRatio2;
fVarInit = defaultVarInit2;
fVarMax = defaultVarMax2;
fVarMin = defaultVarMin2;
varThresholdGen = defaultVarThresholdGen2;
fCT = defaultfCT2;
nShadowDetection = defaultnShadowDetection2;
fTau = defaultfTau;
}
BackgroundSubtractorMOG2::~BackgroundSubtractorMOG2()
{
}
void BackgroundSubtractorMOG2::initialize(Size _frameSize, int _frameType)
{
frameSize = _frameSize;
frameType = _frameType;
nframes = 0;
int nchannels = CV_MAT_CN(frameType);
CV_Assert( nchannels <= CV_CN_MAX );
// for each gaussian mixture of each pixel bg model we store ...
// the mixture weight (w),
// the mean (nchannels values) and
// the covariance
bgmodel.create( 1, frameSize.height*frameSize.width*nmixtures*(2 + nchannels), CV_32F );
//make the array for keeping track of the used modes per pixel - all zeros at start
bgmodelUsedModes.create(frameSize,CV_8U);
bgmodelUsedModes = Scalar::all(0);
}
void BackgroundSubtractorMOG2::operator()(InputArray _image, OutputArray _fgmask, double learningRate)
{
Mat image = _image.getMat();
bool needToInitialize = nframes == 0 || learningRate >= 1 || image.size() != frameSize || image.type() != frameType;
if( needToInitialize )
initialize(image.size(), image.type());
_fgmask.create( image.size(), CV_8U );
Mat fgmask = _fgmask.getMat();
++nframes;
learningRate = learningRate >= 0 && nframes > 1 ? learningRate : 1./min( 2*nframes, history );
CV_Assert(learningRate >= 0);
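// Parallel computation over the rows [0, image.rows); the framework splits this range into sub-ranges, and each MOG2Invoker call processes only the rows in its own sub-range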
parallel_for_(Range(0, image.rows),
MOG2Invoker(image, fgmask,
(GMM*)bgmodel.data,
(float*)(bgmodel.data + sizeof(GMM)*nmixtures*image.rows*image.cols),
bgmodelUsedModes.data, nmixtures, (float)learningRate,
(float)varThreshold,
backgroundRatio, varThresholdGen,
fVarInit, fVarMin, fVarMax, float(-learningRate*fCT), fTau,
bShadowDetection, nShadowDetection));
}
// Call this function to get, for each video frame, an approximate background image unaffected by foreground objects
void BackgroundSubtractorMOG2::getBackgroundImage(OutputArray backgroundImage) const
{
int nchannels = CV_MAT_CN(frameType);
CV_Assert(nchannels == 1 || nchannels == 3);
Mat meanBackground(frameSize, CV_MAKETYPE(CV_8U, nchannels), Scalar::all(0));
int firstGaussianIdx = 0;
const GMM* gmm = (GMM*)bgmodel.data;
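const float* mean = reinterpret_cast<const float*>(gmm + frameSize.width*frameSize.height*nmixtures);
std::vector<float> meanVal(nchannels, 0.f);
for(int row=0; row<meanBackground.rows; row++)
{
for(int col=0; col<meanBackground.cols; col++)
{
int nmodes = bgmodelUsedModes.at<uchar>(row, col);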
float totalWeight = 0.f;
for(int gaussianIdx = firstGaussianIdx; gaussianIdx < firstGaussianIdx + nmodes; gaussianIdx++)
{
GMM gaussian = gmm[gaussianIdx];
size_t meanPosition = gaussianIdx*nchannels;
for(int chn = 0; chn < nchannels; chn++)
{
meanVal[chn] += gaussian.weight * mean[meanPosition + chn];// core line: the background pixel value is the weight-weighted sum of the means of several Gaussian components
}
totalWeight += gaussian.weight;
if(totalWeight > backgroundRatio)
break;
}
float invWeight = 1.f/totalWeight;
switch(nchannels)
{
case 1:
meanBackground.at<uchar>(row, col) = (uchar)(meanVal[0] * invWeight);
meanVal[0] = 0.f;
break;
case 3:
Vec3f& meanVec = *reinterpret_cast<Vec3f*>(&meanVal[0]);
meanBackground.at<Vec3b>(row, col) = Vec3b(meanVec * invWeight);
meanVec = 0.f;
break;
}
firstGaussianIdx += nmixtures;
}
}
meanBackground.copyTo(backgroundImage);
}
}
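One detail in operator() above is easy to miss: a negative learningRate (such as the -1 passed in the test program) makes the class pick its own rate, 1/min(2*nframes, history), so the model adapts quickly during the first frames and settles at 1/history afterwards. A small sketch of that expression follows. The helper name effectiveLearningRate is an assumption for illustration and does not exist in OpenCV.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the learning-rate expression in operator().
// `nframes` already counts the current frame, so it is 1 for the first frame.
// A negative request, or the very first frame, falls back to the automatic
// rate 1 / min(2 * nframes, history).
double effectiveLearningRate(double requested, int nframes, int history)
{
    assert(nframes >= 1 && history >= 1);
    if (requested >= 0 && nframes > 1)
        return requested;
    return 1.0 / std::min(2 * nframes, history);
}
```

With the default history of 500, the automatic rate starts at 0.5 on the first frame and reaches 1/500 = 0.002 from frame 250 onwards.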