Learning with Local and Global Consistency

Introduction

The main approaches to semi-supervised learning (transductive inference) design a classification function that is sufficiently smooth with respect to the intrinsic structure revealed by both labeled and unlabeled data.
Given a point set $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\}$ and a label set $L = \{1, 2, \ldots, c\}$, the first $l$ points are labeled and the remaining ones are not; the performance of an algorithm is measured by the error rate on these unlabeled points.
The key to semi-supervised learning is the prior assumption of consistency, which means:

  • Nearby points are likely to have the same label.
  • Points on the same structure (typically a cluster or a manifold) are likely to have the same label.

This argument is akin to the cluster assumption. The first assumption is local, whereas the second is global; classical supervised learning methods mostly exploit only the first one.

The main differences between the various semi-supervised learning algorithms, such as spectral methods, random walks, graph mincuts and transductive SVM, lie in their way of realizing the assumption of consistency.

A simple iterative algorithm suffices to construct such a smooth function. The key idea of the method is to let every point iteratively spread its label information to its neighbors until a globally stable state is reached.

Algorithm

Given a point set $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset \mathbb{R}^m$ and a label set $L = \{1, \ldots, c\}$, the first $l$ points $x_i$ ($i \le l$) are labeled as $y_i \in L$ and the remaining points $x_u$ ($l+1 \le u \le n$) are unlabeled. The goal is to predict the labels of the unlabeled points.
Let $\mathcal{F}$ denote the set of $n \times c$ matrices with nonnegative entries. A matrix $F = [F_1^T, \ldots, F_n^T]^T \in \mathcal{F}$ corresponds to a classification on the dataset $X$: each point $x_i$ is labeled as $y_i = \arg\max_{j \le c} F_{ij}$. We can understand $F$ as a vectorial function $F : X \to \mathbb{R}^c$ which assigns a vector $F_i$ to each point $x_i$. Define an $n \times c$ matrix $Y \in \mathcal{F}$ with $Y_{ij} = 1$ if $x_i$ is labeled as $y_i = j$ and $Y_{ij} = 0$ otherwise. Clearly, $Y$ is consistent with the initial labels according to the decision rule. The algorithm is as follows:
1. Form the affinity matrix $W$ defined by $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ if $i \ne j$ and $W_{ii} = 0$.
2. Construct the matrix $S = D^{-1/2} W D^{-1/2}$, where $D$ is the diagonal matrix whose $(i,i)$-element equals the sum of the $i$-th row of $W$.
3. Iterate $F(t+1) = \alpha S F(t) + (1-\alpha) Y$ until convergence, where $\alpha$ is a parameter in $(0,1)$.
4. Let $F^*$ denote the limit of the sequence $\{F(t)\}$ and label each point $x_i$ as $y_i = \arg\max_{j \le c} F^*_{ij}$.
This algorithm can be understood in terms of spreading activation networks from experimental psychology. First, a pairwise affinity matrix $W$ with zero diagonal is defined; this amounts to a graph over the dataset whose edges are weighted by $W$. Second, the weight matrix $W$ is symmetrically normalized into $S$, which is necessary for the convergence of the iteration. The first two steps are exactly the same as in spectral clustering.
In each iteration of the third step, every point receives information from its neighbors (the first term) while also retaining its initial information (the second term).
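Below is a minimal NumPy sketch of the four steps. It assumes a Gaussian affinity; the function name, the width parameter `sigma`, and the fixed iteration count are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def local_global_consistency(X, y, n_labeled, alpha=0.99, sigma=1.0, n_iter=1000):
    """Label spreading with local and global consistency (a sketch).

    X : (n, m) array; the first n_labeled rows are the labeled points.
    y : (n_labeled,) integer array of labels in {0, ..., c-1}.
    """
    n = X.shape[0]
    c = int(y.max()) + 1

    # Step 1: Gaussian affinity matrix W with a zero diagonal.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 2: symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Initial label matrix Y: Y_ij = 1 iff x_i carries label j.
    Y = np.zeros((n, c))
    Y[np.arange(n_labeled), y] = 1.0

    # Step 3: iterate F(t+1) = alpha * S F(t) + (1 - alpha) * Y from F(0) = Y.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y

    # Step 4: label each point by the largest entry of its row of F.
    return F.argmax(axis=1)
```

In practice one would stop once successive iterates are sufficiently close, instead of running a fixed number of steps.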

The paper proves that this iteration converges. Starting from

$$F(0) = Y,$$

induction gives $F(t) = (\alpha S)^t Y + (1-\alpha) \sum_{i=0}^{t-1} (\alpha S)^i Y$; since $0 < \alpha < 1$ and the eigenvalues of $S$ lie in $[-1, 1]$, the first term vanishes and the geometric series converges, so the limit is

$$F^* = (1-\alpha)(I - \alpha S)^{-1} Y.$$
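The iteration can therefore be replaced by a single linear solve. The snippet below checks this numerically on toy data; the random affinity, the sizes, and the iteration count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, alpha = 50, 3, 0.99

# A toy symmetric affinity with zero diagonal, normalized as in step 2.
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Initial labels for the first 10 points.
Y = np.zeros((n, c))
Y[np.arange(10), rng.integers(0, c, 10)] = 1.0

# Run the iteration long enough for (alpha * S)^t to vanish ...
F = Y.copy()
for _ in range(3000):
    F = alpha * (S @ F) + (1 - alpha) * Y

# ... and compare with the closed form F* = (1 - alpha)(I - alpha S)^{-1} Y.
F_star = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
print(np.allclose(F, F_star))  # True
```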

Regularization Framework

The paper also gives a regularization-framework interpretation of the algorithm above. The cost function is

$$Q(F) = \frac{1}{2} \left( \sum_{i,j=1}^{n} W_{ij} \left\| \frac{1}{\sqrt{D_{ii}}} F_i - \frac{1}{\sqrt{D_{jj}}} F_j \right\|^2 + \mu \sum_{i=1}^{n} \left\| F_i - Y_i \right\|^2 \right)$$

Here $\mu > 0$ is the regularization parameter: the first term is the smoothness constraint and the second is the fitting constraint, with the trade-off between them controlled by $\mu$.

The smoothness term essentially splits the function value at each point among the edges attached to it before computing the local changes, and the value assigned to each edge is proportional to its weight.
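For concreteness, here is a direct transcription of $Q(F)$ in NumPy, which can be used to compare candidate solutions numerically; the function name and the dense pairwise computation are just illustrative.

```python
import numpy as np

def cost_Q(F, W, Y, mu):
    """Evaluate the regularization cost Q(F) for weight matrix W,
    initial label matrix Y, and trade-off parameter mu."""
    d = W.sum(axis=1)
    G = F / np.sqrt(d)[:, None]            # rows are F_i / sqrt(D_ii)
    diff = G[:, None, :] - G[None, :, :]   # all pairwise row differences
    smoothness = (W * (diff ** 2).sum(axis=-1)).sum()
    fitting = ((F - Y) ** 2).sum()
    return 0.5 * (smoothness + mu * fitting)
```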
Differentiating $Q(F)$ with respect to $F$ and setting the derivative to zero at the minimizer $F^*$ gives

$$F^* - SF^* + \mu (F^* - Y) = 0,$$

which can be rearranged as

$$F^* = \frac{1}{1+\mu} SF^* + \frac{\mu}{1+\mu} Y.$$

Introduce $\alpha = \frac{1}{1+\mu}$ and $\beta = \frac{\mu}{1+\mu}$, so that $\alpha + \beta = 1$. Then

$$F^* = \beta (I - \alpha S)^{-1} Y,$$

which, since $\beta = 1 - \alpha$, is exactly the closed form obtained from the iterative algorithm.
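As a sanity check, the closed-form minimizer should satisfy the stationarity condition above; the toy sizes and the value of $\mu$ below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, mu = 30, 2, 0.01

# Toy normalized affinity S and initial label matrix Y, as before.
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))
Y = np.zeros((n, c))
Y[np.arange(5), rng.integers(0, c, 5)] = 1.0

# Closed-form minimizer with alpha = 1/(1+mu) and beta = mu/(1+mu).
alpha, beta = 1 / (1 + mu), mu / (1 + mu)
F_star = beta * np.linalg.solve(np.eye(n) - alpha * S, Y)

# The residual F - SF + mu (F - Y) should vanish at F = F*.
residual = F_star - S @ F_star + mu * (F_star - Y)
print(np.abs(residual).max())  # effectively zero (machine precision)
```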
