[Paper Reading] Self-Paced Boost Learning for Classification

Paper download
bib:

```
@INPROCEEDINGS{PiLi2016SPBL,
  title     = {Self-Paced Boost Learning for Classification},
  author    = {Te Pi and Xi Li and Zhongfei Zhang and Deyu Meng and Fei Wu and Jun Xiao and Yueting Zhuang},
  booktitle = {IJCAI},
  year      = {2016},
  pages     = {1932--1938}
}
```

GitHub

1. Abstract

Effectiveness and robustness are two essential aspects of supervised learning studies.

For effective learning, ensemble methods are developed to build a strong effective model from an ensemble of weak models.

For robust learning, self-paced learning (SPL) is proposed to learn in a self-controlled pace from easy samples to complex ones.

Motivated by simultaneously enhancing the learning effectiveness and robustness, we propose a unified framework, Self-Paced Boost Learning (SPBL).

With an adaptive from-easy-to-hard pace in boosting process, SPBL asymptotically guides the model to focus more on the insufficiently learned samples with higher reliability.

Via a max-margin boosting optimization with self-paced sample selection, SPBL is capable of capturing the intrinsic inter-class discriminative patterns while ensuring the reliability of the samples involved in learning.

We formulate SPBL as a fully-corrective optimization for classification.

The experiments on several real-world datasets show the superiority of SPBL in terms of both effectiveness and robustness.

Note:

  1. SPBL fuses self-paced learning (learning in an easy-to-hard order) with boosting (ensemble learning), so that effectiveness and robustness are ensured simultaneously.

2. Algorithm

Problem: multi-class classification
$$\widetilde{y}(x) = \argmax_{r \in \{1, \dots, C\}} F_r(x; \Theta) \tag{1}$$

  • $\{(x_i, y_i)\}_{i=1}^n$ denotes the labeled training data, consisting of $n$ labeled samples; $x_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample and $y_i \in \{1, \dots, C\}$ is its label.
  • $F_r(\cdot): \mathbb{R}^d \rightarrow \mathbb{R}$ is the confidence score of classifying sample $x$ into class $r$. Note that this effectively turns the multi-class problem into $C$ binary problems, i.e., the one-vs-all (OvA) strategy. The advantage is that only $C$ classifiers need to be trained; the drawback is class imbalance (the "all" side contains far more samples).
  • The final multi-class prediction is the class with the highest score. In practice, this can be viewed as taking the class with the largest probability after a softmax.
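A minimal sketch of the OvA decision rule in Eq. (1), assuming the per-class scores have already been computed (all names here are illustrative):

```python
import numpy as np

def predict(scores):
    """Eq. (1): pick the class with the highest confidence score.

    scores: (n, C) array with scores[i, r] = F_r(x_i).
    Returns 1-based class labels in {1, ..., C}.
    """
    return np.argmax(scores, axis=1) + 1

# Toy example: 2 samples, C = 3 classes.
scores = np.array([[0.1, 2.0, -0.5],
                   [1.5, 0.3,  0.2]])
print(predict(scores))  # [2 1]
```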

boost:
Boosting is an ensemble-learning method whose goal is to combine multiple weak learners into one strong learner.
$$F_r(x; W) = \sum_{j=1}^{k} w_{rj} h_j(x), \quad r \in \{1, \dots, C\} \tag{2}$$

  • $h_j(x): \mathbb{R}^d \rightarrow \{0, 1\}$ is a weak binary classifier, and $w_{rj}$ is the corresponding learner weight, a learned parameter.
  • $W = [w_1, \dots, w_C] \in \mathbb{R}^{k \times C}$ with each $w_r = [w_{r1}, \dots, w_{rk}]^{\mathsf{T}}$.
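Eq. (2) is just a matrix product between the weak-learner outputs and the weight matrix. A small sketch, where the weak learners are illustrative decision stumps (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, C = 5, 4, 3, 2

X = rng.normal(size=(n, d))

# k illustrative weak binary classifiers:
# h_j(x) = 1 if x[feature_j] > threshold_j else 0.
features = rng.integers(0, d, size=k)
thresholds = rng.normal(size=k)
H = (X[:, features] > thresholds).astype(float)   # H[i, j] = h_j(x_i), shape (n, k)

W = np.abs(rng.normal(size=(k, C)))               # nonnegative weights, shape (k, C)
F = H @ W                                         # F[i, r] = F_r(x_i; W), Eq. (2)
print(F.shape)  # (5, 2)
```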

general objective of SPBL:
$$\min_{W, v} \sum_{i=1}^{n} v_i \sum_{r=1}^{C} L(\rho_{ir}) + \sum_{i=1}^{n} g(v_i; \lambda) + \upsilon R(W)$$
$$\text{s.t. } \forall i, r,\; \rho_{ir} = H_{i:} w_{y_i} - H_{i:} w_{r};\quad W \geq 0;\quad v \in [0, 1]^n \tag{3}$$

  • $H \in \mathbb{R}^{n \times k}$ with each entry $H_{ij} = h_j(x_i)$.
  • $H_{i:} w_{y_i}$ is the product of the row vector $H_{i:}$ with $w_{y_i} = [w_{y_i 1}, \dots, w_{y_i k}]^{\mathsf{T}}$.
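The margins $\rho_{ir}$ in Eq. (3) compare the score of the true class against every other class. A sketch of computing them from $H$ and $W$, following the definitions above (0-based labels for convenience):

```python
import numpy as np

def margins(H, W, y):
    """rho[i, r] = H[i, :] @ W[:, y[i]] - H[i, :] @ W[:, r]  (Eq. 3 constraint).

    H: (n, k) weak-learner outputs, W: (k, C) weights,
    y: (n,) 0-based true labels.
    """
    F = H @ W                              # (n, C) class scores
    true_score = F[np.arange(len(y)), y]   # F_{y_i}(x_i)
    return true_score[:, None] - F         # broadcast over all classes r

H = np.array([[1., 0.], [0., 1.]])
W = np.array([[2., 0., 1.],
              [0., 3., 1.]])
y = np.array([0, 1])
rho = margins(H, W, y)
# rho[i, y_i] = 0 by construction; a larger rho means a larger margin.
print(rho)
```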

specific formulation:
$$\min_{W, v} \sum_{i, r} v_i \ln(1 + \exp(-\rho_{ir})) + \sum_{i=1}^{n} g(v_i; \lambda) + \upsilon \|W\|_{2, 1}$$
$$\text{s.t. } \forall i, r,\; \rho_{ir} = H_{i:} w_{y_i} - H_{i:} w_{r};\quad W \geq 0;\quad v \in [0, 1]^n \tag{4}$$

  • $\|W\|_{2, 1} = \sum_{j=1}^{k} \|W_{j:}\|_2$, which encourages row-wise sparsity of $W$ (only a few weak learners receive nonzero weight).
  • The logistic loss. My understanding is that this loss is the negative log of the binary classification probability given by the sigmoid $\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$, since $\ln(\text{sigmoid}(x)) = -\ln(1 + \exp(-x))$, applied here to the margin $\rho_{ir}$.
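Putting the pieces together, here is a sketch of the specific objective with the logistic loss, the $\ell_{2,1}$ regularizer, and a hard self-paced regularizer $g(v; \lambda) = -\lambda v$ (a common SPL choice, which gives the closed-form selection $v_i = 1$ iff the sample loss is below $\lambda$). This is an illustrative re-implementation under that assumption, not the authors' code:

```python
import numpy as np

def sample_losses(rho):
    """Per-sample loss: sum over classes of the logistic loss ln(1 + exp(-rho_ir))."""
    return np.log1p(np.exp(-rho)).sum(axis=1)

def spl_weights(losses, lam):
    """Hard self-paced weights: with g(v; lam) = -lam * v, minimizing over
    v in [0, 1] selects v_i = 1 if loss_i < lam else 0 (easy samples first)."""
    return (losses < lam).astype(float)

def objective(rho, v, W, lam, upsilon):
    """The specific SPBL objective: weighted logistic loss + g + l2,1 penalty."""
    l21 = np.linalg.norm(W, axis=1).sum()        # sum_j ||W_{j,:}||_2
    return (v * sample_losses(rho)).sum() - lam * v.sum() + upsilon * l21

rho = np.array([[0., 2., 1.],
                [3., 0., -1.]])
losses = sample_losses(rho)
v = spl_weights(losses, lam=2.0)
print(v)  # [1. 0.] -- the easier sample is selected, the harder one left out
obj = objective(rho, v, np.eye(2), lam=2.0, upsilon=0.1)
```

In the full algorithm one would alternate: fix $v$ and optimize $W$ (the boosting step), then fix $W$ and update $v$ in closed form as above, while gradually increasing $\lambda$ so that harder samples enter the training set.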

3. Summary

As for solving the optimization objective, it involves the dual problem, which I honestly could not work through.
