mean-shift 的特点是把支撑空间和特征空间在数据密度的框架下综合了起来。对图像来讲,支撑空间就是像素点的坐标,特征空间就是对应像素点的灰度或者RGB三分量。将这两个空间综合后,一个数据点就是一个5维的向量:[x,y,r,g,b]。
这在观念上看似简单,实质是一个飞跃,它是mean-shift方法的基点。
mean-shift方法很宝贵的一个特点就是在这样迭代计算的框架下,求得的mean-shift向量必收敛于数据密度的局部最大点。可以细看[ComaniciuMeer2002]的文章。
写了点程序,可以对图像做简单的mean-shift filtering,供参考:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [DRGB, DSD, MSSD] = MScut(sMode, RGB_raw, hs, hf, m );
% designed for segmenting a colour image using mean-shift [ComaniciuMeer 2002]
% image must be color
% procedure in mean-shift
% 1. combine support space and feature space to make a mean-shift space
% based data description
% 2. for every mean-shift space data
% 3. do mean-shift filtering
% until convergence
% 4. end
% 5. find the converged mean-shift space data that you are interested in
% and label it
% 6. repeat the above steps
%
% a -- data in support space
% b -- data in feature space
% x -- data in mean-shift space
% f(.) -- data density function
% k(.) -- profile function (implicit)
% g(.) -- profile function (explicit)
% m -- mean shift vector
% hs -- bandwidth in support space
% hf -- bandwidth in feature space
% M -- threshold to make a distinct cluster
%% enter $hs$, $hf$, $m$ if necessary
if ~exist('hs')
hs = input('please enter spatial bandwidth (hs):\n');
end
if ~exist('hf')
hf = input('please enter feature bandwidth (hf):\n');
end
if ~exist('m')
m = input('please enter minimum cluster size (m):\n');
end
switch upper(sMode)
case 'RGB'
RGB = double( RGB_raw );
case 'gray'
error('FCMcut must use colored image to do segmentation!')
end
sz = size(RGB);
mTCUT = Tcut( RGB(:,:,1) ); % trivial segmentation
%% project data into mean-shift space to make $MSSD$ (mean-shift space data)
mT = repmat([1:sz(1)]', 1, sz(2));
vX = mT(1:end)'; % row
mT = repmat([1:sz(2)], sz(1), 1);
vY = mT(1:end)'; % column
mT = RGB(:,:,1);
vR = mT(1:end)'; % red
mT = RGB(:,:,2);
vG = mT(1:end)'; % green
mT = RGB(:,:,3);
vB = mT(1:end)'; % blue
MSSD = [vX, vY, vR, vG, vB];
%% make $g$ - explicit profile function
disp('Using flat kernel: Epanechnikov kernel...')
g_s = ones(2*hs+1, 2); % 's' for support space
g_f = ones(2*hf+1, 3); % 'f' for feature space
%% main part $$
nIteration = 4;
nData = length(MSSD); % total number of data
DSD = MSSD*0; % 'DSD' for destination space data
for k = 1:nData
%
tMSSD = MSSD(k,:); % 't' for temp
for l = 1:nIteration
%
mT = abs( MSSD - repmat(tMSSD, nData, 1));
vT = logical( (mT(:,1)<=hs).*(mT(:,2)<=hs).*(mT(:,3)<=hf).*(mT(:,4)<=hf).*(mT(:,5)<=hf) );
v = MSSD(vT,:);
% update $tMSSD$
tMSSD = mean( v, 1 );
if nIteration == l
DSD(k,:) = tMSSD;
end
end
end
% show result
DRGB = RGB * 0;
DRGB(:,:,1) = reshape(DSD(:,3), sz(1), sz(2)); % red
DRGB(:,:,2) = reshape(DSD(:,4), sz(1), sz(2)); % red
DRGB(:,:,3) = reshape(DSD(:,5), sz(1), sz(2)); % red
figure, imshow(uint8(DRGB), [])
Mean Shift算法,一般是指一个迭代的步骤,即先算出当前点的偏移均值,移动该点到其偏移均值,然后以此为新的起始点,继续移动,直到满足一定的条件结束.
给定d维空间Rd的n个样本点 ,i=1,…,n,在空间中任选一点x,那么Mean Shift向量的基本形式定义为:
Sk是一个半径为h的高维球区域,满足以下关系的y点的集合,
k表示在这n个样本点xi中,有k个点落入Sk区域中.
以上是官方的说法,即书上的定义,我的理解就是,在d维空间中,任选一个点,然后以这个点为圆心,h为半径做一个高维球,因为有d维,d可能大于2,所以是高维球。落在这个球内的所有点和圆心都会产生一个向量,向量是以圆心为起点落在球内的点位终点。然后把这些向量都相加。相加的结果就是Meanshift向量。
如图所以。其中黄色箭头就是Mh(meanshift向量)。
再以meanshift向量的终点为圆心,再做一个高维的球。如下图所以,重复以上步骤,就可得到一个meanshift向量。如此重复下去,meanshift算法可以收敛到概率密度最大得地方。也就是最稠密的地方。
最终的结果如下:
Meanshift推导:
把基本的meanshift向量加入核函数,核函数的性质在这篇博客介绍:http://www.cnblogs.com/liqizhou/archive/2012/05/11/2495788.html
那么,meanshift算法变形为
(1)
解释一下K()核函数,h为半径,Ck,d/nhd 为单位密度,要使得上式f得到最大,最容易想到的就是对上式进行求导,的确meanshift就是对上式进行求导.
(2)
令:
K(x)叫做g(x)的影子核,名字听上去听深奥的,也就是求导的负方向,那么上式可以表示
对于上式,如果才用高斯核,那么,第一项就等于fh,k
第二项就相当于一个meanshift向量的式子:
那么(2)就可以表示为
下图分析的构成,如图所以,可以很清晰的表达其构成。
要使得=0,当且仅当=0,可以得出新的圆心坐标:
(3)
上面介绍了meanshift的流程,但是比较散,下面具体给出它的算法流程。
真正大牛的人就能创造算法,例如像meanshift,em这个样的算法,这样的创新才能推动整个学科的发展。还有的人就是把算法运用的实际的运用中,推动整个工业进步,也就是技术的进步。下面介绍meashift算法怎样运用到图像上的聚类核跟踪。
一般一个图像就是个矩阵,像素点均匀的分布在图像上,就没有点的稠密性。所以怎样来定义点的概率密度,这才是最关键的。
如果我们就算点x的概率密度,采用的方法如下:以x为圆心,以h为半径。落在球内的点位xi 定义二个模式规则。
(1)x像素点的颜色与xi像素点颜色越相近,我们定义概率密度越高。
(2)离x的位置越近的像素点xi,定义概率密度越高。
所以定义总的概率密度,是二个规则概率密度乘积的结果,可以(4)表示
(4)
其中:代表空间位置的信息,离远点越近,其值就越大,表示颜色信息,颜色越相似,其值越大。如图左上角图片,按照(4)计算的概率密度如图右上。利用meanshift对其聚类,可得到左下角的图。
Its been quite some time since I wrote a Data Mining post . Today, I intend to post on Mean Shift – a really cool but not very well known algorithm. The basic idea is quite simple but the results are amazing. It was invented long back in 1975 but was not widely used till two papers applied the algorithm to Computer Vision.
I learned this algorithm in my Advanced Data Mining course and I wrote the lecture notes on it. So here I am trying to convert my lecture notes to a post. I have tried to simplify it – but this post is quite involved than the other posts.
It is quite sad that there exists no good post on such a good algorithm. While writing my lecture notes, I struggled a lot for good resources . The 3 “classic" papers on Mean Shift are quite hard to understand. Most of the other resources are usually from Computer Vision courses where Mean Shift is taught lightly as yet another technique for vision tasks (like segmentation) and contains only the main intuition and the formulas.
As a disclaimer, there might be errors in my exposition – so if you find anything wrong please let me know and I will fix it. You can always check out the reference for more details. I have not included any graphics in it but you can check the ppt given in the references for an animation of Mean Shift.
Mean Shift is a powerful and versatile non parametric iterative algorithm that can be used for lot of purposes like finding modes, clustering etc. Mean Shift was introduced in Fukunaga and Hostetler [1] and has been extended to be applicable in other fields like Computer Vision.This document will provide a discussion of Mean Shift , prove its convergence and slightly discuss its important applications.
This section provides an intuitive idea of Mean shift and the later sections will expand the idea. Mean shift considers feature space as a empirical probability density function. If the input is a set of points then Mean shift considers them as sampled from the underlying probability density function. If dense regions (or clusters) are present in the feature space , then they correspond to the mode (or local maxima) of the probability density function. We can also identify clusters associated with the given mode using Mean Shift.
For each data point, Mean shift associates it with the nearby peak of the dataset’s probability density function. For each data point, Mean shift defines a window around it and computes the mean of the data point . Then it shifts the center of the window to the mean and repeats the algorithm till it converges. After each iteration, we can consider that the window shifts to a more denser region of the dataset.
At the high level, we can specify Mean Shift as follows :
1. Fix a window around each data point.
2. Compute the mean of data within the window.
3. Shift the window to the mean and repeat till convergence.
A kernel is a function that satisfies the following requirements :
1.
2.
Some examples of kernels include :
1. Rectangular
2. Gaussian
3. Epanechnikov
Kernel Density Estimation
Kernel density estimation is a non parametric way to estimate the density function of a random variable. This is usually called as the Parzen window technique. Given a kernel K, bandwidth parameter h , Kernel density estimator for a given set of d-dimensional points is
Mean shift can be considered to based on Gradient ascent on the density contour. The generic formula for gradient ascent is ,
Applying it to kernel density estimator,
Setting it to 0 we get,
Finally , we get
As explained above, Mean shift treats the points the feature space as an probability density function . Dense regions in feature space corresponds to local maxima or modes. So for each data point, we perform gradient ascent on the local estimated density until convergence. The stationary points obtained via gradient ascent represent the modes of the density function. All points associated with the same stationary point belong to the same cluster.
Assuming , we have
The quantity is called as the mean shift. So mean shift procedure can be summarized as : For each point
1. Compute mean shift vector
2. Move the density estimation window by
3. Repeat till convergence
Using a Gaussian kernel as an example,
Using the kernel profile,
To prove the convergence , we have to prove that
But since the kernel is a convex function we have ,
Using it ,
Thus we have proven that the sequence is convergent. The second part of the proof in [2] which tries to prove the sequence is convergent is wrong.
The classic mean shift algorithm is time intensive. The time complexity of it is given by where is the number of iterations and is the number of data points in the data set. Many improvements have been made to the mean shift algorithm to make it converge faster.
One of them is the adaptive Mean Shift where you let the bandwidth parameter vary for each data point. Here, the parameter is calculated using kNN algorithm. If is the k-nearest neighbor of then the bandwidth is calculated as
Here we use or norm to find the bandwidth.
An alternate way to speed up convergence is to alter the data points
during the course of Mean Shift. Again using a Gaussian kernel as
an example,
1. Even though mean shift is a non parametric algorithm , it does require the bandwidth parameter h to be tuned. We can use kNN to find out the bandwidth. The choice of bandwidth in influences convergence rate and the number of clusters.
2. Choice of bandwidth parameter h is critical. A large h might result in incorrect
clustering and might merge distinct clusters. A very small h might result in too many clusters.
3. When using kNN to determining h, the choice of k influences the value of h. For good results, k has to increase when the dimension of the data increases.
4. Mean shift might not work well in higher dimensions. In higher dimensions , the number of local maxima is pretty high and it might converge to a local optima soon.
5. Epanechnikov kernel has a clear cutoff and is optimal in bias-variance tradeoff.
Mean shift is a versatile algorithm that has found a lot of practical applications – especially in the computer vision field. In the computer vision, the dimensions are usually low (e.g. the color profile of the image). Hence mean shift is used to perform lot of common tasks in vision.
Clustering
The most important application is using Mean Shift for clustering. The fact that Mean Shift does not make assumptions about the number of clusters or the shape of the cluster makes it ideal for handling clusters of arbitrary shape and number.
Although, Mean Shift is primarily a mode finding algorithm , we can find clusters using it. The stationary points obtained via gradient ascent represent the modes of the density function. All points associated with the same stationary point belong to the same cluster.
An alternate way is to use the concept of Basin of Attraction. Informally, the set of points that converge to the same mode forms the basin of attraction for that mode. All the points in the same basin of attraction are associated with the same cluster. The number of clusters is obtained by the number of modes.
Computer Vision Applications
Mean Shift is used in multiple tasks in Computer Vision like segmentation, tracking, discontinuity preserving smoothing etc. For more details see [2],[8].
Note : I have discussed K-Means at K-Means Clustering Algorithm. You can use it to brush it up if you want.
K-Means is one of most popular clustering algorithms. It is simple,fast and efficient. We can compare Mean Shift with K-Means on number of parameters.
One of the most important difference is that K-means makes two broad assumptions – the number of clusters is already known and the clusters are shaped spherically (or elliptically). Mean shift , being a non parametric algorithm, does not assume anything about number of clusters. The number of modes give the number of clusters. Also, since it is based on density estimation, it can handle arbitrarily shaped clusters.
K-means is very sensitive to initializations. A wrong initialization can delay convergence or some times even result in wrong clusters. Mean shift is fairly robust to initializations. Typically, mean shift is run for each point or some times points are selected uniformly from the feature space [2] . Similarly, K-means is sensitive to outliers but Mean Shift is not very sensitive.
K-means is fast and has a time complexity where k is the number of clusters, n is the number of points and T is the number of iterations. Classic mean shift is computationally expensive with a time complexity .
Mean shift is sensitive to the selection of bandwidth, . A small can slow down the convergence. A large can speed up convergence but might merge two modes. But still, there are many techniques to determine reasonably well.
Update [30 Apr 2010] : I did not expect this reasonably technical post to become very popular, yet it did ! Some of the people who read it asked for a sample source code. I did write one in Matlab which randomly generates some points according to several gaussian distribution and the clusters using Mean Shift . It implements both the basic algorithm and also the adaptive algorithm. You can download my Mean Shift code here. Comments are as always welcome !
1. Fukunaga and Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition", IEEE Transactions on Information Theory vol 21 , pp 32-40 ,1975
2. Dorin Comaniciu and Peter Meer, Mean Shift : A Robust approach towards feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence vol 24 No 5 May 2002.
3. Yizong Cheng , Mean Shift, Mode Seeking, and Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence vol 17 No 8 Aug 1995.
4. Mean Shift Clustering by Konstantinos G. Derpanis
5. Chris Ding Lectures CSE 6339 Spring 2010.
6. Dijun Luo’s presentation slides.
7. cs.nyu.edu/~fergus/teaching/vision/12_segmentation.ppt
8. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, Kernel-Based Object Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence vol 25 No 5 May 2003.
9. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, The Variable Bandwidth Mean Shift and Data-Driven Scale Selection, ICCV 2001.