The main approaches to the semi-supervised learning problem (transductive inference) design a classification function that is sufficiently smooth with respect to the intrinsic structure collectively revealed by the labeled and unlabeled points.
Given a set of data $X = \{x_1, \dots, x_l, x_{l+1}, \dots, x_n\}$ and a label set $L = \{1, 2, \dots, c\}$, the first $l$ points are labeled and the remaining ones are not; the performance of an algorithm is measured by the error rate on these unlabeled points.
The key to semi-supervised learning is the prior assumption of consistency, which means: (1) nearby points are likely to have the same label; and (2) points on the same structure (typically a cluster or a manifold) are likely to have the same label.
The main differences between the various semi-supervised learning algorithms, such as spectral methods, random walks, graph mincuts and transductive SVM, lie in their way of realizing the assumption of consistency.
A smooth classification function can be constructed with a simple iterative algorithm. The key to the paper's method is to let every point iteratively spread its label information to its neighbors until a global stable state is reached.
Given a point set $X = \{x_1, \dots, x_l, x_{l+1}, \dots, x_n\} \subset \mathbb{R}^m$ and a label set $L = \{1, \dots, c\}$, the first $l$ points $x_i$ ($i \le l$) are labeled as $y_i \in L$ and the remaining points $x_u$ ($l+1 \le u \le n$) are unlabeled. The goal is to predict the labels of the unlabeled points.
Let $\mathcal{F}$ denote the set of $n \times c$ matrices with nonnegative entries. A matrix $F = [F_1^T, \dots, F_n^T]^T \in \mathcal{F}$ corresponds to a classification on the dataset $X$ by labeling each point $x_i$ as $y_i = \arg\max_{j \le c} F_{ij}$. We can understand $F$ as a vectorial function $F : X \to \mathbb{R}^c$ which assigns a vector $F_i$ to each point $x_i$. Define an $n \times c$ matrix $Y \in \mathcal{F}$ with $Y_{ij} = 1$ if $x_i$ is labeled as $y_i = j$ and $Y_{ij} = 0$ otherwise. Clearly, $Y$ is consistent with the initial labels according to the decision rule. The algorithm is as follows:

1. Form the affinity matrix $W$ defined by $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ if $i \ne j$ and $W_{ii} = 0$.
2. Construct the matrix $S = D^{-1/2} W D^{-1/2}$, where $D$ is a diagonal matrix with its $(i,i)$-element equal to the sum of the $i$-th row of $W$.
3. Iterate $F(t+1) = \alpha S F(t) + (1 - \alpha) Y$ until convergence, where $\alpha$ is a parameter in $(0, 1)$.
4. Let $F^*$ denote the limit of the sequence $\{F(t)\}$. Label each point $x_i$ as $y_i = \arg\max_{j \le c} F^*_{ij}$.
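As a concrete sketch, here is a minimal NumPy implementation of these four steps; the RBF affinity follows step 1, while the function name `label_spreading` and the default values of `sigma`, `alpha`, and `n_iter` are illustrative choices rather than prescriptions from the paper:

```python
import numpy as np

def label_spreading(X, y, n_labeled, sigma=1.0, alpha=0.99, n_iter=1000):
    """Iterative label spreading over X; the first n_labeled rows are labeled."""
    n = X.shape[0]
    c = int(y.max()) + 1

    # Step 1: affinity matrix W with an RBF kernel and zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 2: symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    S = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]

    # Initial label matrix: Y_ij = 1 iff x_i is labeled j, 0 otherwise.
    Y = np.zeros((n, c))
    Y[np.arange(n_labeled), y] = 1.0

    # Step 3: iterate F(t+1) = alpha * S F(t) + (1 - alpha) * Y.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y

    # Step 4: label each point by the largest entry in its row of F*.
    return F.argmax(axis=1)
```

In the paper's experiments $\alpha$ is fixed at 0.99; $\sigma$ and the stopping rule would be tuned per dataset.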
This algorithm can be understood in terms of spreading activation networks from experimental psychology. The first step defines a pairwise relationship $W$ on the dataset, with zero diagonal entries; one can think of a graph defined on the data whose edges are weighted by $W$. In the second step, the weight matrix $W$ is normalized symmetrically, which is necessary for the convergence of the iteration below. These first two steps are exactly the same as in spectral clustering.
During each iteration of the third step, every point receives information from its neighbors (the first term) and also retains its initial information (the second term); the parameter $\alpha$ specifies the relative amount of the two sources.
The paper proves the convergence of this method: starting from

$$F(0) = Y,$$

the sequence $\{F(t)\}$ converges to

$$F^* = (1 - \alpha)(I - \alpha S)^{-1} Y.$$
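The convergence argument is a short geometric-series computation; a sketch in the same notation: unrolling the update rule from $F(0) = Y$ gives

$$F(t) = (\alpha S)^{t} Y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha S)^{i} Y.$$

Since $S = D^{-1/2} W D^{-1/2}$ is similar to the row-stochastic matrix $D^{-1} W$, its eigenvalues lie in $[-1, 1]$; with $0 < \alpha < 1$ the spectral radius of $\alpha S$ is strictly less than 1, so $(\alpha S)^{t} \to 0$ and the partial sums converge to $(I - \alpha S)^{-1}$, which yields the closed form above.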
For the above algorithm, the paper also gives an interpretation in a regularization framework. The cost function associated with $F$ is

$$\mathcal{Q}(F) = \frac{1}{2} \left( \sum_{i,j=1}^{n} W_{ij} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^{2} + \mu \sum_{i=1}^{n} \| F_i - Y_i \|^{2} \right),$$

where $\mu > 0$ is the regularization parameter: the first term is a smoothness constraint between nearby points, and the second term keeps the solution close to the initial label assignment $Y$.
The smoothness term essentially splits the function value at each point among the edges attached to it before computing the local changes, and the value assigned to each edge is proportional to its weight.
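To make the two terms concrete, here is a direct NumPy translation of $\mathcal{Q}(F)$; the function name `cost_Q` and the dense $O(n^2 c)$ pairwise computation are illustrative choices meant for small sanity checks, not part of the paper:

```python
import numpy as np

def cost_Q(F, W, Y, mu):
    """Regularization objective: smoothness term plus mu times fitting term."""
    d = W.sum(axis=1)                      # D_ii = sum of the i-th row of W
    G = F / np.sqrt(d)[:, None]            # G_i = F_i / sqrt(D_ii)
    diff = G[:, None, :] - G[None, :, :]   # all pairwise differences G_i - G_j
    smoothness = np.sum(W * np.sum(diff ** 2, axis=-1))
    fitting = np.sum((F - Y) ** 2)
    return 0.5 * (smoothness + mu * fitting)
```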
Differentiating $\mathcal{Q}(F)$ with respect to $F$ (the smoothness term differentiates to $2(I - S)F$ and the fitting term to $2\mu(F - Y)$, cancelling the factor $\frac{1}{2}$) and setting the derivative to zero at the minimizer $F^*$ gives

$$F^* - S F^* + \mu (F^* - Y) = 0,$$
which can be rearranged as

$$F^* = \frac{1}{1+\mu} S F^* + \frac{\mu}{1+\mu} Y.$$

Let $\alpha = \frac{1}{1+\mu}$ and $\beta = \frac{\mu}{1+\mu}$ (so that $\alpha + \beta = 1$); then

$$F^* = \beta (I - \alpha S)^{-1} Y.$$

Since $\beta = 1 - \alpha$, this is exactly the closed form reached by the iterative algorithm.
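A quick numerical check that the iteration limit and the regularized solution coincide; the toy sizes, seed, and $\alpha$ below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a random symmetric affinity with zero diagonal, two labeled points.
n, c, alpha = 20, 2, 0.9
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))          # S = D^{-1/2} W D^{-1/2}
Y = np.zeros((n, c))
Y[0, 0] = Y[1, 1] = 1.0

# Iterate F(t+1) = alpha*S*F(t) + (1-alpha)*Y until numerical convergence.
F = Y.copy()
for _ in range(2000):
    F = alpha * (S @ F) + (1 - alpha) * Y

# Closed form with beta = 1 - alpha: F* = beta * (I - alpha*S)^{-1} Y.
F_star = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
print(np.max(np.abs(F - F_star)))        # tiny, on the order of 1e-16
```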