(From: http://en.wikipedia.org/wiki/Mahalanobis_distance)
In statistics, the Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables, by which different patterns can be identified and analyzed. It gauges the similarity of an unknown sample set to a known one. It differs from the Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant. In other words, it is a multivariate effect size.
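Formally, the Mahalanobis distance of a multivariate vector $x = (x_1, x_2, \ldots, x_N)^T$ from a group of values with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_N)^T$ and covariance matrix $S$ is defined as:
$$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$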
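(Note: 1. This is the Mahalanobis distance between x and the population mean. 2. The formula assumes S is invertible; what happens if the covariance matrix is not invertible?)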
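The definition above can be sketched in plain Python using only the standard library (no NumPy is assumed). The helper names `sample_mean`, `sample_cov`, `solve` and `mahalanobis_from_mean` are hypothetical, and estimating $\mu$ and $S$ from a sample, with the unbiased n - 1 divisor for $S$, is an assumption: the text only says "mean and covariance matrix". Instead of forming $S^{-1}$ explicitly, the sketch solves $S z = x - \mu$ by Gaussian elimination, which is the usual numerically safer choice.

```python
import math

# Illustrative sketch: all function names are hypothetical, and S is estimated
# with the unbiased (n - 1) divisor, which is an assumption.

def sample_mean(data):
    """Column-wise mean vector mu of a list of N-dimensional observations."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def sample_cov(data, mean):
    """Sample covariance matrix S (divisor n - 1)."""
    n, dim = len(data), len(mean)
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
             for j in range(dim)]
            for i in range(dim)]

def solve(a, b, tol=1e-12):
    """Solve a z = b by Gaussian elimination with partial pivoting.

    Raises ValueError when a pivot falls below an absolute tolerance (a
    simplification), i.e. when S is numerically singular and S^-1 does not exist.
    """
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]  # augmented [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def mahalanobis_from_mean(x, mean, cov):
    """D_M(x) = sqrt((x - mu)^T S^-1 (x - mu)), without forming S^-1 explicitly."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)  # z = S^-1 (x - mu)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    mu = sample_mean(data)      # [1.0, 1.0]
    S = sample_cov(data, mu)    # diag(4/3, 4/3)
    print(mahalanobis_from_mean([3.0, 1.0], mu, S))  # about 1.732, i.e. sqrt(3)
```

If $S$ is singular, `solve` raises `ValueError` instead of returning a distance, which is the case asked about in the note above. Common workarounds, not implemented here, are to replace $S^{-1}$ with the Moore-Penrose pseudoinverse of $S$ or to add a small multiple of the identity to $S$ before solving.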
The Mahalanobis distance (or "generalized squared interpoint distance" for its squared value) can also be defined as a dissimilarity measure between two random vectors $x$ and $y$ of the same distribution with the covariance matrix $S$:
$$d(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}$$
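A minimal standard-library Python sketch of the two-vector form follows; the names `solve` and `mahalanobis` are hypothetical, and the covariance matrix used in the example is made-up illustrative data. Both pairs in the example are $\sqrt{2}$ apart in Euclidean terms, but the pair whose difference (1, 1) follows the positive correlation between the coordinates gets the smaller Mahalanobis distance.

```python
import math

# Illustrative sketch: function names are hypothetical; S below is made-up data.

def solve(a, b, tol=1e-12):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]  # augmented [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def mahalanobis(x, y, cov):
    """d(x, y) = sqrt((x - y)^T S^-1 (x - y)) for two vectors sharing covariance S."""
    d = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, d)  # z = S^-1 (x - y)
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

if __name__ == "__main__":
    S = [[2.0, 1.0], [1.0, 2.0]]  # positively correlated coordinates
    x = [1.0, 2.0]
    print(mahalanobis(x, [0.0, 1.0], S))  # difference (1, 1): along the correlation
    print(mahalanobis(x, [0.0, 3.0], S))  # difference (1, -1): against it
```

With this $S$ the two printed values are $\sqrt{2/3} \approx 0.816$ and $\sqrt{2} \approx 1.414$. The function is symmetric in $x$ and $y$ and returns 0 for identical vectors.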
If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, the resulting distance measure is called the normalized Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{N} \frac{(x_i - y_i)^2}{s_i^2}}$$
where $s_i$ is the standard deviation of $x_i$ and $y_i$ over the sample set.
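The identity and diagonal special cases can be checked with a short standard-library Python sketch. The function names are hypothetical, and computing $s_i$ as the per-coordinate sample standard deviation with divisor n - 1 is an assumption. The example also illustrates the scale-invariance mentioned at the start: rescaling one coordinate changes the Euclidean distance but not the normalized one, because $s_i$ rescales with it.

```python
import math

# Illustrative sketch: function names are hypothetical; the n - 1 divisor is an assumption.

def euclidean(x, y):
    """Ordinary Euclidean distance: the Mahalanobis distance with S = I."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def sample_std(data):
    """Per-coordinate sample standard deviation s_i over the sample set."""
    n = len(data)
    stds = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        mean = sum(col) / n
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)))
    return stds

def normalized_euclidean(x, y, s):
    """Mahalanobis distance with diagonal S = diag(s_1^2, ..., s_N^2)."""
    return math.sqrt(sum(((xi - yi) / si) ** 2 for xi, yi, si in zip(x, y, s)))

if __name__ == "__main__":
    data = [[1.0, 2.0], [4.0, 6.0], [7.0, 10.0]]
    s = sample_std(data)                               # [3.0, 4.0]
    print(euclidean(data[0], data[1]))                 # 5.0
    print(normalized_euclidean(data[0], data[1], s))   # sqrt(2), about 1.414
    # Rescale the first coordinate by 1000 (e.g. a change of units).
    scaled = [[1000.0 * a, b] for a, b in data]
    print(euclidean(scaled[0], scaled[1]))             # about 3000.0027
    print(normalized_euclidean(scaled[0], scaled[1], sample_std(scaled)))  # still sqrt(2)
```

Passing `s = [1.0, 1.0]` (that is, $S = I$) makes `normalized_euclidean` return exactly the Euclidean distance.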
(Source: Baidu Baike)
Advantages and disadvantages of the Mahalanobis distance: