Paper download
bib:
@ARTICLE{MaMeng2020SPamCo,
title = {Self-Paced Multi-View Co-Training},
author = {Fan Ma and Deyu Meng and Xuanyi Dong and Yi Yang},
journal = {J. Mach. Learn. Res.},
year = {2020},
volume = {21},
number = {1},
pages = {1--38}
}
Co-training is a well-known semi-supervised learning approach which trains classifiers on two or more different views and exchanges pseudo labels of unlabeled instances in an iterative way.
An opening sentence that frames the whole topic (a formulaic, textbook-style opener).
During the co-training process, pseudo labels of unlabeled instances are very likely to be false especially in the initial training, while the standard co-training algorithm adopts a “draw without replacement” strategy and does not remove these wrongly labeled instances from training stages.
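This points out the shortcomings of existing methods. First shortcoming: the initial pseudo labels are of poor quality, yet existing methods never replace (update) pseudo labels once they have been assigned. Papers usually identify only one shortcoming. This one raises three, which also means more contributions.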
Besides, most of the traditional co-training approaches are implemented for two-view cases, and their extensions in multi-view scenarios are not intuitive.
These issues not only degenerate their performance as well as available application range but also hamper their fundamental theory.
Second shortcoming: most existing methods are designed for two views and do not extend naturally to more than two.
Moreover, there is no optimization model to explain the objective a co-training process manages to optimize.
Third shortcoming: there is no optimization model that explains which objective a co-training process is actually optimizing.
To address these issues, in this study we design a unified self-paced multi-view co-training (SPamCo) framework which draws unlabeled instances with replacement.
Two specified co-regularization terms are formulated to develop different strategies for selecting pseudo-labeled instances during training.
The proposed framework addresses the first shortcoming. Pseudo labels assigned earlier can be replaced later, because it "draws unlabeled instances with replacement".
Both forms share the same optimization strategy which is consistent with the iteration process in co-training and can be naturally extended to multi-view scenarios.
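This addresses the second shortcoming: the framework extends naturally to multiple views and is not limited to two. Through the shared optimization strategy, it also implicitly addresses the third shortcoming.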
A distributed optimization strategy is also introduced to train the classifier of each view in parallel to further improve the efficiency of the algorithm.
An additional parallel (distributed) optimization scheme.
Furthermore, the SPamCo algorithm is proved to be PAC learnable, supporting its theoretical soundness.
Experiments conducted on synthetic, text categorization, person re-identification, image recognition and object detection data sets substantiate the superiority of the proposed method.
SPamCo: optimization problem
$$\min_{\Theta,\mathbf{V},\widetilde{\mathbf{Y}}}\ \sum_{j=1}^{M}\Big(\sum_{i=1}^{N_l}\ell_i^{(j)}+\sum_{i=N_l+1}^{N_l+N_u}\big(v_i^{(j)}\ell_i^{(j)}+f(v_i^{(j)},\lambda^{(j)})\big)\Big)+\mathcal{R}(\Theta)+\mathcal{R}(\mathbf{V}) \tag{1}$$
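To make the structure of Eq. (1) concrete, here is a minimal sketch that evaluates it for fixed per-instance losses. It rests on several assumptions. The losses $\ell_i^{(j)}$ are given as arrays, and $\mathcal{R}(\Theta)$ is dropped (taken as 0). $f$ is the self-paced term $-\lambda^{(j)} v_i^{(j)}$, and $\mathcal{R}(\mathbf{V})$ is the hard co-regularizer; both are given below. The function name `spamco_objective` is hypothetical.

```python
import numpy as np


def spamco_objective(loss_l, loss_u, V, lam, gamma):
    """Evaluate Eq. (1) with f(v, lam) = -lam * v and the hard co-regularizer.

    loss_l: (M, N_l) losses of the labeled instances in each view
    loss_u: (M, N_u) losses of the unlabeled instances under their pseudo labels
    V:      (M, N_u) selection weights v_i^(j)
    lam:    (M,) per-view age parameters lambda^(j)
    gamma:  co-regularization strength
    Assumption: R(Theta) is taken as 0 (no model regularizer).
    """
    M = V.shape[0]
    total = 0.0
    for j in range(M):
        total += loss_l[j].sum()                                # labeled term
        total += (V[j] * loss_u[j] - lam[j] * V[j]).sum()       # v*l + f(v, lam)
    # hard co-regularizer: -gamma * sum_{p<q} (v^(p))^T v^(q)
    for p in range(M):
        for q in range(p + 1, M):
            total -= gamma * (V[p] @ V[q])
    return total


loss_l = np.zeros((2, 1))
loss_u = np.array([[0.1, 2.0], [0.2, 3.0]])
lam = np.array([1.0, 1.0])
easy_only = spamco_objective(loss_l, loss_u, np.array([[1, 0], [1, 0]]), lam, 0.5)
select_all = spamco_objective(loss_l, loss_u, np.ones((2, 2)), lam, 0.5)
print(easy_only, select_all)
```

In this example, selecting only the low-loss instance in both views gives a lower objective than selecting everything. That is exactly the behaviour the self-paced term is meant to encourage.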
Self-paced Regularization term:
$$f(v_i^{(j)},\lambda^{(j)}) = -\lambda^{(j)} v_i^{(j)} \tag{2}$$
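Without any co-regularization, Eq. (2) gives the classic self-paced rule. Per instance, $v\ell - \lambda v = v(\ell - \lambda)$ over $v \in \{0, 1\}$ is minimized by $v = 1$ exactly when $\ell < \lambda$. A small sketch follows; the function name is hypothetical.

```python
import numpy as np


def self_paced_weights(losses, lam):
    """Minimize v*l - lam*v over v in {0, 1} for every instance.

    v*(l - lam) is negative only when l < lam, so an instance is
    selected exactly when its loss is below the age parameter lam.
    """
    return (np.asarray(losses) < lam).astype(int)


losses = np.array([0.2, 0.9, 1.5, 0.05])
print(self_paced_weights(losses, lam=1.0))  # [1 1 0 1]
print(self_paced_weights(losses, lam=2.0))  # [1 1 1 1]
```

Raising $\lambda$ admits harder (higher-loss) instances. This is why $\lambda$ is grown over the iterations.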
Co-Regularization Term:
hard:
$$\mathcal{R}_h(\mathbf{V}) = -\gamma\sum_{p<q}\big(\mathbf{v}^{(p)}\big)^{\top}\mathbf{v}^{(q)}$$
$$v_i^{(j)*} = \begin{cases} 1, & \ell_i^{(j)} < \lambda_c^{(j)} + \gamma\sum_{p\neq j} v_i^{(p)};\\ 0, & \text{otherwise}. \end{cases}$$
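Under the hard co-regularizer, every instance that another view has already selected lowers the bar in view $j$ by $\gamma$. The sketch below shows this, with one simplification: it uses a single $\lambda$ instead of the class-specific $\lambda_c^{(j)}$. The function name `hard_select` is hypothetical.

```python
import numpy as np


def hard_select(losses, V, j, lam, gamma):
    """Closed-form 0/1 v^(j) under the hard co-regularizer.

    losses: (N_u,) losses in view j; V: (M, N_u) current 0/1 selections.
    Instance i is selected in view j when
        l_i^(j) < lam + gamma * sum_{p != j} v_i^(p).
    """
    others = V.sum(axis=0) - V[j]  # number of *other* views that selected i
    return (losses < lam + gamma * others).astype(int)


V = np.array([[0, 0, 0],
              [1, 1, 0]])          # view 1 has selected instances 0 and 1
print(hard_select(np.array([0.8, 1.3, 1.3]), V, j=0, lam=1.0, gamma=0.5))  # [1 1 0]
```

Instances 1 and 2 have the same loss (1.3). Only instance 1 gets in, because view 1 vouched for it.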
soft:
$$\mathcal{R}_s(\mathbf{V}) = \frac{\gamma}{2}\sum_{p<q}\big(\mathbf{v}^{(p)}-\mathbf{v}^{(q)}\big)^{\top}\big(\mathbf{v}^{(p)}-\mathbf{v}^{(q)}\big)$$
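The key difference from the hard regularizer is the following. Under the hard term, an instance is either selected or not, and if other views have selected it, it is more likely to be selected in the current view as well. Under the soft term, $v_i^{(j)} \in [0, 1]$ is a real number between 0 and 1, and the regularizer pulls the selection weights of different views toward each other, much like a mean squared error.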
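The closed form that follows can be derived per instance. This short derivation fixes the other views' weights and assumes the soft term is $\frac{\gamma}{2}\sum_{p<q}\|\mathbf{v}^{(p)}-\mathbf{v}^{(q)}\|_2^2$; that is the scaling under which the closed form comes out exactly. Only the pairs involving view $j$ depend on $v = v_i^{(j)}$:

$$
\begin{aligned}
g(v) &= v\,\ell_i^{(j)} - \lambda_c^{(j)} v + \frac{\gamma}{2}\sum_{p\neq j}\big(v - v_i^{(p)}\big)^2, \qquad v\in[0,1],\\
g'(v) &= \ell_i^{(j)} - \lambda_c^{(j)} + \gamma\Big((M-1)\,v - \sum_{p\neq j} v_i^{(p)}\Big) = 0
\;\Longrightarrow\;
\hat v = \frac{1}{M-1}\Big(\sum_{p\neq j} v_i^{(p)} + \frac{\lambda_c^{(j)} - \ell_i^{(j)}}{\gamma}\Big).
\end{aligned}
$$

Since $g''(v) = \gamma(M-1) > 0$, $g$ is convex, so the constrained minimizer is $\hat v$ clipped to $[0,1]$. It gives $v_i^{(j)*} = 0$ when $\hat v \le 0$, i.e. $\ell_i^{(j)} \ge \lambda_c^{(j)} + \gamma\sum_{p\neq j} v_i^{(p)}$. It gives $v_i^{(j)*} = 1$ when $\hat v \ge 1$, i.e. $\ell_i^{(j)} \le \lambda_c^{(j)} + \gamma\sum_{p\neq j}\big(v_i^{(p)}-1\big)$.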
$$v_i^{(j)*} = \begin{cases} 0, & \ell_i^{(j)} \geq \lambda_c^{(j)} + \gamma\sum_{p\neq j} v_i^{(p)};\\ 1, & \ell_i^{(j)} \leq \lambda_c^{(j)} + \gamma\sum_{p\neq j}\big(v_i^{(p)}-1\big);\\ \dfrac{1}{M-1}\Big(\sum_{p\neq j} v_i^{(p)} + \dfrac{\lambda_c^{(j)}-\ell_i^{(j)}}{\gamma}\Big), & \text{otherwise}. \end{cases}$$
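In code, the three cases of the soft rule collapse into a single clip of the unconstrained minimizer to $[0, 1]$. This is a minimal sketch, again with a single $\lambda$ instead of the class-specific $\lambda_c^{(j)}$; the function name `soft_select` is hypothetical.

```python
import numpy as np


def soft_select(losses, V, j, lam, gamma):
    """Closed-form real-valued v^(j) in [0, 1] under the soft co-regularizer.

    Unconstrained minimizer:
        (1 / (M - 1)) * (sum_{p != j} v_i^(p) + (lam - l_i^(j)) / gamma)
    Clipping it to [0, 1] reproduces the 0 and 1 branches, because the
    per-instance objective is a convex quadratic in v.
    """
    M = V.shape[0]
    others = V.sum(axis=0) - V[j]
    v_hat = (others + (lam - losses) / gamma) / (M - 1)
    return np.clip(v_hat, 0.0, 1.0)


V = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.5, 0.0]])   # weights of the other view (view 1)
print(soft_select(np.array([0.2, 1.0, 3.0]), V, j=0, lam=1.0, gamma=1.0))
```

The easy instance saturates at 1 and the hard one at 0. The middle instance copies view 1's partial weight of 0.5, because its loss equals $\lambda$.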
Serial version of SPamCo: the views are updated one after another.
Note:
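One thing that caught my attention is why $\mathbf{v}^{(\text{vid})}$ is updated twice in the pseudocode. The main reason is that the model is updated in between, and $\mathbf{v}^{(\text{vid})}$ depends closely on the model's current predictions. Put another way, the two updates serve different purposes. The first update of $\mathbf{v}^{(\text{vid})}$ selects the samples used to update the current view's model parameters. The second update selects the unlabeled samples this view is now confident about, to be passed on to the other views.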
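The serial schedule with its two $\mathbf{v}$ updates per view can be sketched as follows. The sketch uses the hard rule and is self-contained, so several pieces are my own assumptions rather than the paper's setup. A nearest-centroid model (`fit_centroids`, hypothetical) stands in for the real per-view classifier. View $j$'s loss is a hinge loss on its own score. Pseudo labels come from the summed scores of all views, as `pred_y` does in the GitHub code. A single $\lambda$ is grown by a fixed step.

```python
import numpy as np


def fit_centroids(X, y):
    # Hypothetical stand-in classifier: one mean vector per class (labels 0/1).
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def view_scores(C, X):
    # Positive score favours class 1: ||x - c0||^2 - ||x - c1||^2.
    return ((X - C[0]) ** 2).sum(axis=1) - ((X - C[1]) ** 2).sum(axis=1)


def labels_and_losses(S, j):
    # Pseudo labels from the summed scores; hinge loss of view j's own score.
    y_hat = (S.sum(axis=0) >= 0).astype(int)
    margin = np.where(y_hat == 1, S[j], -S[j])
    return y_hat, np.maximum(0.0, 1.0 - margin)


def serial_spamco(Xl, yl, Xu, steps=5, lam=0.5, lam_step=0.5, gamma=0.5):
    """Xl, Xu: lists with one (n, d_j) array per view; yl: 0/1 labels."""
    M, n_u = len(Xl), Xu[0].shape[0]
    models = [fit_centroids(Xl[j], yl) for j in range(M)]
    S = np.stack([view_scores(models[j], Xu[j]) for j in range(M)])
    V = np.zeros((M, n_u), dtype=int)
    for _ in range(steps):
        for j in range(M):  # views are visited one after another (serial)
            others = V.sum(axis=0) - V[j]
            # 1st update of v^(j): pick pseudo-labeled data to retrain view j
            y_hat, loss = labels_and_losses(S, j)
            V[j] = loss < lam + gamma * others
            sel = V[j] == 1
            models[j] = fit_centroids(np.vstack([Xl[j], Xu[j][sel]]),
                                      np.concatenate([yl, y_hat[sel]]))
            S[j] = view_scores(models[j], Xu[j])
            # 2nd update of v^(j): view j's model (hence its losses) changed,
            # so re-select what it is now confident about, for the other views
            y_hat, loss = labels_and_losses(S, j)
            V[j] = loss < lam + gamma * others
        lam += lam_step  # grow the "age" so harder instances are admitted
    return models, V


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 2)) + np.where(y[:, None] == 1, 2.0, -2.0)
lab = np.r_[0:3, 50:53]                     # 3 labeled points per class
unl = np.setdiff1d(np.arange(100), lab)
Xl, Xu = [X[lab, :1], X[lab, 1:]], [X[unl, :1], X[unl, 1:]]
models, V = serial_spamco(Xl, y[lab], Xu)
print((V.sum(axis=0) > 0).mean())  # fraction of unlabeled data picked by some view
```

Each retraining starts again from the labeled set plus the *current* selection. An instance picked earlier can therefore drop out or change its pseudo label later, which is the "draw with replacement" behaviour.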
Parallel version of SPamCo: all views are updated in parallel. The other views' selections from the previous iteration, $v_{i,t-1}^{(j)}$, are used to update the current view's selection $v_{i,t}^{(m)}$ in the current iteration.
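The only change from the serial schedule is where view $m$ reads the other views' $\mathbf{v}$ from. The parallel version reads iteration $t-1$ (a Jacobi-style update), while the serial one reads the freshest values (Gauss-Seidel-style). A sketch with the hard rule, a single $\lambda$ and fixed losses; both function names are hypothetical:

```python
import numpy as np


def parallel_hard_update(L, V_prev, lam, gamma):
    """Every view's v at iteration t from the other views' v at t-1.

    L: (M, N_u) current losses; V_prev: (M, N_u) 0/1 selections from t-1.
    Each row depends only on V_prev, so the M rows (and the per-view
    training that follows) could run on separate workers.
    """
    others = V_prev.sum(axis=0, keepdims=True) - V_prev  # row m: sum_{j != m}
    return (L < lam + gamma * others).astype(int)


def serial_hard_update(L, V_prev, lam, gamma):
    """Serial counterpart: view m already sees the updated rows 0..m-1."""
    V = V_prev.copy()
    for m in range(L.shape[0]):
        others = V.sum(axis=0) - V[m]
        V[m] = L[m] < lam + gamma * others
    return V


L = np.array([[0.5, 1.2],
              [1.2, 0.5]])
V0 = np.zeros((2, 2), dtype=int)
print(parallel_hard_update(L, V0, lam=1.0, gamma=0.5))  # [[1 0] [0 1]]
print(serial_hard_update(L, V0, lam=1.0, gamma=0.5))    # [[1 0] [1 1]]
```

In the serial run, view 1 already benefits from view 0's fresh choice of instance 0. The parallel run only sees that choice one iteration later, which is the price paid for being able to train all views at the same time.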
GitHub
```python
from itertools import cycle, islice
import copy
import warnings

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

warnings.filterwarnings("ignore")


def sel_ids_y(score, add_num=10):
    # Mark the add_num lowest-scoring samples as negatives (-1) and the
    # add_num highest-scoring samples as positives (+1); the rest stay 0.
    ids_sort = np.argsort(score)
    add_id = np.zeros(score.shape[0])
    add_id[ids_sort[:add_num]] = -1
    add_id[ids_sort[-add_num:]] = 1
    return add_id


def update_train_untrain(sel_ids, train_data, train_labels, untrain_data):
    # Standard co-training ("draw without replacement"): selected samples are
    # moved from the unlabeled pool into the training set for good.
    sel_ids = np.asarray(sel_ids)
    add_ids = np.where(sel_ids != 0)[0]
    untrain_ids = np.where(sel_ids == 0)[0]
    add_datas = [d[add_ids] for d in untrain_data]
    new_train_data = [np.concatenate([d1, d2]) for d1, d2 in zip(train_data, add_datas)]
    add_y = (sel_ids[add_ids] > 0).astype(int)
    new_train_y = np.concatenate([train_labels, add_y])
    new_untrain_data = [d[untrain_ids] for d in untrain_data]
    return new_train_data, new_train_y, new_untrain_data


def cotrain(labeled_data, labels, unlabeled_data, iter_step=1):
    lbls = copy.deepcopy(labels)
    clfs = []
    for step in range(iter_step):
        clfs = []
        add_ids = []
        for view in range(2):
            clfs.append(LinearSVC())
            clfs[view].fit(labeled_data[view], lbls)
            score = clfs[view].decision_function(unlabeled_data[view])
            add_ids.append(sel_ids_y(score, 6))
        # Sum the votes of both views: +-2 = both agree, +-1 = one view only,
        # 0 = not selected or conflicting labels (stays unlabeled)
        add_id = sum(add_ids)
        labeled_data, lbls, unlabeled_data = update_train_untrain(
            add_id, labeled_data, lbls, unlabeled_data)
        if len(unlabeled_data[0]) == 0:
            break
    return clfs


def update_train(sel_ids, train_data, train_labels, untrain_data, pred_y):
    # SPamCo ("draw with replacement"): add the selected samples with their
    # current pseudo labels, but keep them in the unlabeled pool, so they can
    # be dropped or relabeled in later iterations.
    add_ids = np.where(np.asarray(sel_ids) != 0)[0]
    add_data = [d[add_ids] for d in untrain_data]
    new_train_data = [np.concatenate([d1, d2]) for d1, d2 in zip(train_data, add_data)]
    new_train_y = np.concatenate([train_labels, pred_y[add_ids]])
    return new_train_data, new_train_y


def spaco(l_data, lbls, u_data, iter_step=1, gamma=0.5):
    clfs = []
    scores = []
    add_ids = []
    add_num = 6
    # initial classifiers, trained on the labeled data only
    for view in range(2):
        clfs.append(LinearSVC())
        clfs[view].fit(l_data[view], lbls)
        scores.append(clfs[view].decision_function(u_data[view]))
        add_ids.append(sel_ids_y(scores[view], add_num))
    # pseudo labels from the summed scores: positive -> 1, negative -> 0
    score = sum(scores)
    pred_y = np.array([0 if s < 0 else 1 for s in score])
    for step in range(iter_step):
        for view in range(2):
            # stop when there are not enough unlabeled samples left to select
            if add_num * 2 > u_data[0].shape[0]:
                break
            # update v: samples chosen by the other view get a gamma bonus in
            # the direction of their pseudo label (hard co-regularization)
            ov = np.where(add_ids[1 - view] != 0)[0]
            scores[view][ov] += add_ids[1 - view][ov] * gamma
            add_ids[view] = sel_ids_y(scores[view], add_num)
            # update w: retrain this view on labeled + selected pseudo-labeled data
            nl_data, nlbls = update_train(add_ids[view], l_data, lbls, u_data, pred_y)
            clfs[view].fit(nl_data[view], nlbls)
            # update y, v: scores[view] changed when the model and y were updated,
            # so v is selected again. This is not a redundant computation.
            scores[view] = clfs[view].decision_function(u_data[view])
            add_num += 6
            scores[view][ov] += add_ids[1 - view][ov] * gamma
            add_ids[view] = sel_ids_y(scores[view], add_num)
            score = sum(scores)
            pred_y = np.array([0 if s < 0 else 1 for s in score])
    return clfs


def evaluate(clfs, X, y):
    # combine the two views by summing their decision scores
    score = (clfs[0].decision_function(X[:, 0].reshape(-1, 1))
             + clfs[1].decision_function(X[:, 1].reshape(-1, 1)))
    pred_y = np.array([0 if s < 0 else 1 for s in score])
    return pred_y, np.mean(pred_y == y)


def main():
    # toy 2: two Gaussian blobs; feature x^(1) is view 1, feature x^(2) is view 2
    np.random.seed(4)
    X, y = make_blobs(n_samples=400, centers=2, cluster_std=0.7)
    X[:, 0] -= 9.3
    X[:, 1] -= 2.5
    np.random.seed(1)
    pos_ids = np.where(y == 0)[0]
    neg_ids = np.where(y == 1)[0]
    # draw five labeled points from each class
    p1 = pos_ids[np.random.randint(0, len(pos_ids), 5)]
    p2 = neg_ids[np.random.randint(0, len(neg_ids), 5)]
    # generate labeled and unlabeled data, one column per view
    l_ids = np.concatenate((p1, p2))
    u_ids = np.array(list(set(np.arange(X.shape[0])) - set(l_ids)))
    l_data = [X[l_ids, 0].reshape(-1, 1), X[l_ids, 1].reshape(-1, 1)]
    u_data = [X[u_ids, 0].reshape(-1, 1), X[u_ids, 1].reshape(-1, 1)]
    labels = y[l_ids]
    colors = np.array(list(islice(cycle(['#377eb8', '#ff7f00', '#4daf4a',
                                         '#f781bf', '#a65628', '#984ea3',
                                         '#999999', '#e41a1c', '#dede00']),
                                  int(max(y) + 3))))
    x = [-1.5, 0, 1.5]
    my_xticks = [-2, 0, 2]
    # parameters
    # steps = 16
    steps = 30
    # original data; triangles and stars mark the labeled points
    fig = plt.figure(figsize=(12, 12))
    plt.subplots_adjust(bottom=.05, top=.9, left=.05, right=0.9)
    ax = fig.add_subplot(141)
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=colors[y], s=4)
    plt.scatter(X[p1, 0], X[p1, 1], marker='^', c='#0F0F0F', s=100)
    plt.scatter(X[p2, 0], X[p2, 1], marker='*', c='#0F0F0F', s=100)
    ax.set_xlabel('$x^{(1)}$')
    ax.set_ylabel('$x^{(2)}$')
    plt.xticks(x, my_xticks)
    # co-training vs. SPamCo with gamma=3 and gamma=0.3
    experiments = [
        ('cotrain', lambda: cotrain(l_data, labels, u_data, iter_step=steps)),
        ('spaco experiment(gamma=3)',
         lambda: spaco(l_data, labels, u_data, iter_step=steps, gamma=3)),
        ('spaco experiment(gamma=0.3)',
         lambda: spaco(l_data, labels, u_data, iter_step=steps, gamma=0.3)),
    ]
    for k, (name, run) in enumerate(experiments):
        pred_y, acc = evaluate(run(), X, y)
        print('%s: %0.5f' % (name, acc))
        ax = fig.add_subplot(142 + k)
        ax.set_xlabel('$x^{(1)}$')
        ax.set_ylabel('$x^{(2)}$')
        plt.scatter(X[:, 0], X[:, 1], marker='o', c=colors[pred_y], s=4)
        plt.xticks(x, my_xticks)
    plt.show()


if __name__ == '__main__':
    main()
```
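Having read the whole paper, I find it a very complete piece of work, rigorous from beginning to end. Its core idea is to combine self-paced learning, multi-view learning, and co-training, and to express the combination as a clear mathematical optimization objective. The design of the co-regularization term in particular is simple and elegant.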