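Definition of a Metric
In mathematics, a metric (or distance function) is a function that defines the distance between elements of a set. A set equipped with a metric is called a metric space.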
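For reference, the standard axioms can be written as follows. The symbols $X$ (the set) and $d$ (the distance function) are chosen here for illustration and do not come from the original text.

```latex
d : X \times X \to [0, \infty), \qquad \text{for all } x, y, z \in X:
\begin{aligned}
& d(x, y) = 0 \iff x = y          && \text{(identity of indiscernibles)} \\
& d(x, y) = d(y, x)               && \text{(symmetry)} \\
& d(x, z) \le d(x, y) + d(y, z)   && \text{(triangle inequality)}
\end{aligned}
```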
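Metric learning is also commonly known as similarity learning. Its purpose is to measure how close samples are to one another, which is one of the core problems of pattern recognition. The performance of many machine learning methods, including classifiers such as k-nearest neighbors, support vector machines, and radial basis function networks, clustering methods such as K-means, and some graph-based methods, depends largely on the choice of similarity measure between samples. When computing the similarity between two images, the goal of metric learning is to find a measure under which images of different classes have low similarity and images of the same class have high similarity (maximize the inter-class variations and minimize the intra-class variations).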
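Distance metric learning was proposed by Eric Xing et al. at NIPS 2002 [1].
The usual goal of metric learning is to make distances between samples of the same class as small as possible and distances between samples of different classes as large as possible.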
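A minimal sketch of this goal, assuming toy 2-D data and a hand-picked linear transform `L` standing in for a learned one (both are illustrative assumptions, not taken from any specific algorithm). It compares the mean same-class and different-class distances before and after the transform.

```python
import numpy as np

def mean_pair_distances(X, y):
    """Return (mean same-class distance, mean different-class distance)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    same = y[:, None] == y[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # count each pair once
    return dists[same & upper].mean(), dists[~same & upper].mean()

# Toy data: the classes differ along feature 0; feature 1 is irrelevant spread.
X = np.array([[0.0, 0.0], [0.2, 3.0], [1.0, 0.1], [1.2, 3.1]])
y = np.array([0, 0, 1, 1])

# Hypothetical "learned" transform: stretch feature 0, shrink feature 1.
L = np.diag([3.0, 0.1])

intra_before, inter_before = mean_pair_distances(X, y)
intra_after, inter_after = mean_pair_distances(X @ L.T, y)

print(f"before: intra={intra_before:.3f}, inter={inter_before:.3f}")
print(f"after:  intra={intra_after:.3f}, inter={inter_after:.3f}")
```

In the original space the same-class pairs are farther apart than the different-class pairs; after the transform the relationship is reversed, which is the behavior a metric learning algorithm tries to obtain automatically.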
TODO
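Application areas include face recognition, object recognition, music similarity, human pose estimation, information retrieval, speech recognition, and handwriting recognition.
Many algorithms increasingly depend on being given a good metric over the input space. For example, K-means, k-nearest neighbors, and SVMs need a good metric that reflects the important relationships in the data. This problem is especially pronounced in unsupervised methods such as clustering. As a concrete example, consider the problem in Figure 1: suppose we need to compute the similarity (or, equivalently, the distance) between these images, for instance for clustering or nearest-neighbor classification. A basic question is how to obtain the similarity between images. If our goal is to recognize faces, we need a distance function that emphasizes suitable features (such as hair color and face shape); if our goal is to recognize poses, we need a distance function that captures pose similarity. To handle the many kinds of feature similarity, we could select suitable features and hand-craft a distance function for each task. However, this requires a great deal of manual effort and may be very fragile when the data changes. Metric learning is an appealing alternative: it learns a distance function tailored to the specific task.
Figure 1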
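The task dependence described above can be illustrated with a small sketch. The two-dimensional "image descriptors" and the per-feature weights below are hypothetical and hand-picked; a metric learning method would learn such weights from data instead.

```python
import numpy as np

def nearest_index(query, points, weights):
    """Index of the nearest point under a per-feature weighted Euclidean distance."""
    sq = ((points - query) ** 2 * weights).sum(axis=1)
    return int(np.argmin(sq))

# Hypothetical descriptors: [face-related feature, pose-related feature].
gallery = np.array([
    [0.9, 0.1],   # index 0: similar face, different pose
    [0.1, 0.9],   # index 1: different face, similar pose
])
query = np.array([1.0, 1.0])

face_weights = np.array([1.0, 0.01])  # emphasize the face feature
pose_weights = np.array([0.01, 1.0])  # emphasize the pose feature

face_match = nearest_index(query, gallery, face_weights)
pose_match = nearest_index(query, gallery, pose_weights)
print(face_match, pose_match)
```

The same query retrieves a different neighbor depending on which features the distance function emphasizes, so no single fixed distance suits both tasks.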
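According to the related literature [2,3,4], metric learning methods can be divided into metric learning via linear transformations and nonlinear models of metric learning.
Because linear metric learning is simple and extensible (it can be extended to nonlinear metrics through kernel methods), current research focuses on the linear metric learning problem. Linear metric learning is also called Mahalanobis metric learning, and its algorithms can be supervised or unsupervised.
Supervised Mahalanobis metric learning falls into two basic types:
I. Supervised global metric learning: algorithms of this type make full use of the label information of the data.
II. Supervised local metric learning: algorithms of this type consider both the label information of the data and the geometric relationships between data points.
In addition, some classic unsupervised linear dimensionality reduction algorithms can be regarded as unsupervised Mahalanobis metric learning.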
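As one illustration of this view, the sketch below uses PCA: projecting onto the top principal directions `W` gives the same distances as a Mahalanobis distance with matrix `M = W W^T`. The toy data and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([5.0, 1.0, 0.1])  # toy data
Xc = X - X.mean(axis=0)

# Top-2 principal directions from the SVD of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T          # shape (3, 2): projection matrix
M = W @ W.T           # induced rank-2, positive semidefinite matrix

a, b = X[0], X[1]
d_projected = np.linalg.norm(W.T @ a - W.T @ b)   # Euclidean distance after PCA
d_mahalanobis = np.sqrt((a - b) @ M @ (a - b))    # same distance, written via M
print(d_projected, d_mahalanobis)
```

Because `M` has rank 2, distinct points can be at distance zero, so strictly speaking this is a pseudo-metric on the original space.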
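Nonlinear metric learning is more general, and nonlinear dimensionality reduction algorithms can be regarded as nonlinear metric learning. Classic algorithms include Isometric Mapping (ISOMAP), Locally Linear Embedding (LLE), and Laplacian Eigenmap (LE). Another effective way to learn a nonlinear mapping is to extend a linear mapping through kernel methods.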
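Traditional Mahalanobis distance metric learning seeks a positive semidefinite matrix $M \in \mathbb{R}^{d \times d}$ from the training set $X$ and computes the Mahalanobis distance between two samples $x_1$ and $x_2$ as $d_M(x_1, x_2) = \sqrt{(x_1 - x_2)^T M (x_1 - x_2)}$.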
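A minimal NumPy sketch of this distance, assuming `M` is built as `L^T L` from a hypothetical linear map `L` (the values are illustrative, not from the source). It also checks a familiar equivalence: the Mahalanobis distance under `M` equals the Euclidean distance after applying `L`.

```python
import numpy as np

def mahalanobis(x1, x2, M):
    """d_M(x1, x2) = sqrt((x1 - x2)^T M (x1 - x2)), with M positive semidefinite."""
    diff = x1 - x2
    return float(np.sqrt(diff @ M @ diff))

# Hypothetical linear map L; M = L^T L is positive semidefinite by construction.
L = np.array([[2.0, 0.0],
              [1.0, 1.0]])
M = L.T @ L

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 1.0])

d_m = mahalanobis(x1, x2, M)
d_euclid_after_map = float(np.linalg.norm(L @ x1 - L @ x2))
d_identity = mahalanobis(x1, x2, np.eye(2))  # M = I gives the Euclidean distance
print(d_m, d_euclid_after_map, d_identity)
```

Learning `M` is therefore equivalent to learning a linear transform of the input space, which is why this family is called linear metric learning.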
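The linear transformation used by traditional methods cannot capture the nonlinear manifold on which face images lie.
Linear manifold
A line or plane in geometric space has the following property: the line through any two points of the set is contained in the set. In other words, lines and planes are flat and straight. Generalizing flatness and straightness to higher dimensions gives the concept of a linear manifold.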
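Stated formally, using a standard formulation with symbols chosen here for illustration ($S$ for the set, $V$ for a linear subspace):

```latex
S \subseteq \mathbb{R}^n \text{ is a linear manifold if }
\lambda x + (1 - \lambda) y \in S
\quad \text{for all } x, y \in S \text{ and all } \lambda \in \mathbb{R}.
% Equivalently, a nonempty S is a translated linear subspace:
S = x_0 + V = \{\, x_0 + v : v \in V \,\}.
```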
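To overcome this limitation of traditional methods, the paper proposes projecting the samples into a high-dimensional feature space and measuring distances in that space.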
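The text does not say which mapping the paper uses. As one illustration only, the sketch below measures distance in a kernel-induced feature space. The choice of an RBF kernel and the value of `gamma` are assumptions made here and are not necessarily the paper's approach.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def feature_space_distance(x, y, kernel):
    """Distance between phi(x) and phi(y) using only kernel evaluations:
    ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return float(np.sqrt(max(sq, 0.0)))  # clamp tiny negative rounding error

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
d = feature_space_distance(x, y, rbf_kernel)
print(d)
```

The mapping `phi` never has to be computed explicitly, which is what makes kernel extensions of linear metric learning practical.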
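Metric learning has been applied in computer vision to image retrieval and classification, face recognition, human activity recognition, and pose estimation; to text analysis; and to other areas such as music analysis, automated program debugging, and microarray data analysis [4].
Reference 2 provides "An Overview of Distance Metric Learning" and "Distance Metric Learning: A Comprehensive Survey".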
- Supervised Distance Metric Learning
Methods | Locality | Linearity | Learning Strategies | Code Download |
---|---|---|---|---|
Probabilistic Global Distance Metric Learning (PGDM) | global | linear | constrained convex programming | by Eric P. Xing |
Relevant Components Analysis (RCA) | global | linear | capture global structure; use equivalence constraints | by Aharon Bar-Hillel and Tomer Hertz |
Discriminative Component Analysis (DCA) | global | linear | improve RCA by exploring negative constraints | by Steven C.H. Hoi |
Local Fisher Discriminant Analysis (LFDA) | local | linear | extend LDA by assigning greater weights to closer connecting examples | by Masashi Sugiyama |
Neighborhood Component Analysis (NCA) | local | linear | extend the nearest neighbor classifier toward metric learning | by Charless C. Fowlkes |
Large Margin NN Classifier (LMNN) | local | linear | extend NCA through a maximum margin framework | by Kilian Q. Weinberger |
Localized Distance Metric Learning (LDM) | local | linear | optimize local compactness and local separability in a probabilistic framework | by Liu Yang |
DistBoost | global | linear | learn distance functions by training binary classifiers with margins in a boosting framework | by Tomer Hertz and Aharon Bar-Hillel; notes on calling its kernel version |
Active Distance Metric Learning (BAYES+VAR) | global | linear | select example pairs with the greatest uncertainty, posterior estimation with a full Bayesian treatment | by Liu Yang |
- Unsupervised Distance Metric Learning
Methods | Locality | Linearity | Learning Strategies | Code Download |
---|---|---|---|---|
Principal Component Analysis (PCA) | global structure preserved | linear | best preserve the variance of the data | by Deng Cai |
Multidimensional Scaling (MDS) | global structure preserved | linear | best preserve inter-point distance in low-rank | included in Matlab Toolbox for Dimensionality Reduction |
ISOMAP | global structure preserved | nonlinear | preserve the geodesic distances | by J. B. Tenenbaum, V. de Silva and J. C. Langford |
Laplacian Eigenmap (LE) | local structure preserved | nonlinear | preserve local neighbor | by Mikhail Belkin |
Locality Preserving Projections (LPP) | local structure preserved | linear | linear approximation to LE | LPP by Deng Cai; Kernel LPP by Deng Cai |
Locally Linear Embedding (LLE) | local structure preserved | nonlinear | nonlinear preserve local neighbor | by Sam T. Roweis and Lawrence K. Saul; Hessian LLE can be found at MANI fold Learning Matlab Demo, by Todd Wittman |
Neighborhood Preserving Embedding (NPE) | local structure preserved | linear | linear approximation to LLE | by Deng Cai |
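```python
# Written against an older metric-learn API; argument names may differ in newer releases.
from metric_learn import LMNN
import numpy as np

# Six 3-D samples in three classes (labels 1, 2, 0).
X = np.array([[0., 0., 1.], [0., 0., 2.], [1., 0., 0.], [2., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([1, 1, 2, 2, 0, 0])

# Learn a large-margin nearest-neighbor metric with 2 target neighbors.
lmnn = LMNN(k=2, learn_rate=1e-6)
lmnn.fit(X, Y, verbose=False)

# Map the samples into the learned metric space.
Y_c = lmnn.transform(X)
```

```text
>>> Y_c
array([[ 0.        , -0.07987306,  0.11081795],
       [ 0.        , -0.15974612,  0.22163591],
       [ 0.07113444,  0.        ,  0.        ],
       [ 0.14226889,  0.        ,  0.        ],
       [ 0.14226889, -0.04460763,  0.06188978],
       [ 0.14226889, -0.03164602,  0.04390651]])
```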
TODO
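Most of the papers listed below had a major influence on later metric learning research (the most cited has more than 5000 citations on Google). Papers 1-6 present specific methods; the last one is a survey.
List of metric learning papers at top conferences: http://blog.csdn.net/lzt1983/article/details/7831524
The number of metric learning papers at top conferences over the past two years is striking. Some of them are probably padding or hype around new concepts; DML seems to have the momentum that sparse coding had a few years ago.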
ICML 2012
Maximum Margin Output Coding
Information-theoretic Semi-supervised Metric Learning via Entropy Regularization
A Hybrid Algorithm for Convex Semidefinite Optimization
Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation
Similarity Learning for Provably Accurate Sparse Linear Classification
ICML 2011
Learning Discriminative Fisher Kernels
Learning Multi-View Neighborhood Preserving Projections
CVPR 2012
Order Determination and Sparsity-Regularized Metric Learning for Adaptive Visual Tracking
Non-sparse Linear Representations for Visual Tracking with Online Reservoir Metric Learning
Unsupervised Metric Fusion by Cross Diffusion
Learning Hierarchical Similarity Metrics
Large Scale Metric Learning from Equivalence Constraints
Neighborhood Repulsed Metric Learning for Kinship Verification
Learning Robust and Discriminative Multi-Instance Distance for Cost Effective Video Classification
PCCA: a new approach for distance learning from sparse pairwise constraints
Group Action Induced Distances for Averaging and Clustering Linear Dynamical Systems with Applications to the Analysis of Dynamic Visual Scenes
CVPR 2011
A Scalable Dual Approach to Semidefinite Metric Learning
AdaBoost on Low-Rank PSD Matrices for Metric Learning with Applications in Computer Aided Diagnosis
Adaptive Metric Differential Tracking (HUST)
Tracking Low Resolution Objects by Metric Preservation (HUST)
ACM MM 2012
Optimal Semi-Supervised Metric Learning for Image Retrieval
Low Rank Metric Learning for Social Image Retrieval
Activity-Based Person Identification Using Sparse Coding and Discriminative Metric Learning
Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification
ACM MM 2011
Biased Metric Learning for Person-Independent Head Pose Estimation
ICCV 2011
Learning Mixtures of Sparse Distance Metrics for Classification and Dimensionality Reduction
Unsupervised Metric Learning for Face Identification in TV Video
Random Ensemble Metrics for Object Recognition
Learning Nonlinear Distance Functions using Neural Network for Regression with Application to Robust Human Age Estimation
Learning parameterized histogram kernels on the simplex manifold for image and action classification
ECCV 2012
Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost
Dual-force Metric Learning for Robust Distractor Resistant Tracker
Learning to Match Appearances by Correlations in a Covariance Metric Space
Image Annotation Using Metric Learning in Semantic Neighbourhoods
Measuring Image Distances via Embedding in a Semantic Manifold
Supervised Earth Mover’s Distance Learning and Its Computer Vision Applications
Learning Class-to-Image Distance via Large Margin and L1-norm Regularization
Labeling Images by Integrating Sparse Multiple Distance Learning and Semantic Context Modeling
IJCAI 2011
Distance Metric Learning Under Covariate Shift
Learning a Distance Metric by Empirical Loss Minimization
AAAI 2011
Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification
NIPS 2011
Learning a Distance Metric from a Network
Learning a Tree of Metrics with Disjoint Visual Features
Metric Learning with Multiple Kernels
KDD 2012
Random Forests for Metric Learning with Implicit Pairwise Position Dependence
WSDM 2011
Mining Social Images with Distance Metric Learning for Automated Image Tagging
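UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
Some reference resources on DML; I will discuss them in more detail when I have time.
1. Wikipedia
2. The DML survey page compiled by Liu Yang of CMU. It classifies and summarizes the classic DML algorithms. The papers she collected are very valuable and were my own introductory reading.
3. The ECCV 2010 tutorial.
4. Weinberger's page, which hosts the paper, slides, and code for LMNN (Distance Metric Learning for Large Margin Nearest Neighbor Classification).
5. ITML (Information Theoretic Metric Learning). ITML is a classic DML algorithm and won the ICML 2007 best paper award. Slides.
References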
[1] Xing E P, Jordan M I, Russell S, et al. Distance metric learning with application to clustering with side-information[C]//Advances in neural information processing systems. 2002: 505-512.
[2] Kulis B. Metric learning: A survey[J]. Foundations and Trends in Machine Learning, 2012, 5(4): 287-364.
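[3] Yang L, Jin R. Distance metric learning: A comprehensive survey[J]. Michigan State University, 2006, 2.
[4] Wang Wei. Research on metric learning methods integrating global and local information[D]. University of Science and Technology of China, 2014.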