Boosting (AdaBoost, GBDT)

Boosting

Boosting is another major family of Ensemble Learning methods. Unlike Bagging, which trains its learners in parallel, Boosting trains classifiers sequentially, each one trying to correct the mistakes of the ones before it. The most representative methods are AdaBoost (Adaptive Boosting) and Gradient Boosting.
Two questions are central to any Boosting method:

1. How should the weights (or the probability distribution) over the training data be changed in each round, and by what strategy?
2. How should the weak classifiers be combined into one strong classifier?


AdaBoost

AdaBoost answers the two questions above as follows:
1. Raise the weights of the samples misclassified by the previous round's weak classifier, and lower the weights of the correctly classified ones. The misclassified samples thus receive more attention in the next round, and the classification problem is "divided and conquered" by a sequence of weak classifiers.
2. Combine the weak classifiers by weighted majority voting: give a larger weight to weak classifiers with a small error rate, so they count more in the vote, and a smaller weight to those with a large error rate, so they count less.

Algorithm Flow

1. Initialize the weight distribution over the data

Assume a uniform weight distribution over the training set (only the initialization is uniform; subsequent rounds update the weights according to the error rate):

$$D_1=(w_{11},w_{12},...,w_{1N}),\quad w_{1i}=\frac{1}{N},\quad i=1,2,...,N$$

2. Fit a base classifier under the weight distribution $D_m$

$$G_m(x):x\rightarrow\{-1,1\}$$

3. Compute the classification error rate

The classification error rate of $G_m(x)$ on the training set (do not forget the sample weights) is:

$$e_m=\sum_{i=1}^N P(G_m(x_i)\neq y_i)=\sum_{i=1}^N w_{mi}\,I(G_m(x_i)\neq y_i)$$

4. Compute the coefficient of $G_m(x)$

$$a_m=\frac{1}{2}\log\frac{1-e_m}{e_m}$$

where $\log$ denotes the natural logarithm.

5. Update the weight distribution over the dataset

$$D_{m+1}=(w_{m+1,1},w_{m+1,2},...,w_{m+1,N})$$

$$w_{m+1,i}=\frac{w_{mi}}{Z_m}\exp(-a_m y_i G_m(x_i)),\quad i=1,2,...,N$$

where $Z_m$ is a normalization factor that keeps the weights summing to 1:

$$Z_m=\sum_{i=1}^N w_{mi}\exp(-a_m y_i G_m(x_i))$$

The final classifier is:

$$G(x)=\mathrm{sign}(f(x))=\mathrm{sign}\left(\sum_{m=1}^M a_m G_m(x)\right)$$
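To make these five steps concrete, here is a minimal from-scratch sketch in Python (a sketch, not a reference implementation: the function names are my own, and the base learner is assumed to be a single-threshold decision stump, matching the example below):

```python
import numpy as np

def fit_stump(x, y, w):
    """Exhaustively pick the threshold stump with minimum weighted error."""
    xs = np.sort(x)
    best = None
    for v in (xs[:-1] + xs[1:]) / 2:        # candidate thresholds between points
        for pol in (1, -1):                 # pol=1 means: predict +1 when x < v
            pred = np.where(x < v, pol, -pol)
            err = w[pred != y].sum()        # weighted error e_m
            if best is None or err < best[0]:
                best = (err, v, pol)
    return best

def adaboost(x, y, M):
    N = len(x)
    w = np.full(N, 1.0 / N)                 # step 1: uniform weights D_1
    stumps, alphas = [], []
    for _ in range(M):
        e, v, pol = fit_stump(x, y, w)      # steps 2-3: fit G_m, get e_m
        a = 0.5 * np.log((1 - e) / e)       # step 4: coefficient a_m
        pred = np.where(x < v, pol, -pol)
        w = w * np.exp(-a * y * pred)       # step 5: reweight the samples...
        w /= w.sum()                        # ...and normalize by Z_m
        stumps.append((v, pol))
        alphas.append(a)
    return stumps, alphas

def predict(x, stumps, alphas):
    """sign(sum_m a_m G_m(x)) -- weighted majority vote."""
    f = sum(a * np.where(x < v, pol, -pol) for a, (v, pol) in zip(alphas, stumps))
    return np.sign(f)
```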

Worked Example

Let us walk through an example from Li Hang's Statistical Learning Methods:
Suppose the weak classifiers are produced by thresholds of the form $x<v$ or $x>v$, where the threshold $v$ is chosen to minimize the classification error rate on the training set. Use AdaBoost to learn a strong classifier. Note that $y=1$ marks positive examples and $y=-1$ negative ones.

| No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|-----|---|---|---|----|----|----|---|---|---|----|
| x   | 0 | 1 | 2 | 3  | 4  | 5  | 6 | 7 | 8 | 9  |
| y   | 1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 |
Round 1 (m=1):
1. Initialize the weight distribution

$$D_1=(w_{11},w_{12},...,w_{1,10}),\quad w_{1i}=0.1,\quad i=1,2,...,10$$

2. Compute the error rate

The error rate of $G_1(x)$ on the training set turns out to be minimized at threshold $v=2.5$ (samples 7, 8, and 9 are misclassified; all others are correct), giving $e_1=P(G_1(x_i)\neq y_i)=0.3$.
The base classifier is therefore:
$$G_1(x)=\begin{cases}1, & x<2.5 \\ -1, & x>2.5\end{cases}$$

3. Compute the coefficient of $G_1(x)$

$$a_1=\frac{1}{2}\log\frac{1-e_1}{e_1}=0.4236$$

4. Update the weight distribution

$$\begin{aligned}
& D_2=(w_{21},w_{22},...,w_{2,10}) \\
& Z_1=7\times 0.1\times\exp(-0.4236)+3\times 0.1\times\exp(0.4236)=0.91651 \\
& w_{2i}=\frac{w_{1i}}{Z_1}\exp(-a_1 y_i G_1(x_i)),\quad i=1,2,...,10 \\
& D_2=(0.07143,0.07143,0.07143,0.07143,0.07143,0.07143,0.16667,0.16667,0.16667,0.07143) \\
& f_1(x)=0.4236\,G_1(x)
\end{aligned}$$

The classifier $\mathrm{sign}[f_1(x)]$ misclassifies 3 training samples; the change in $D_2$ shows that the weights of the misclassified samples have been increased.
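As a quick numeric check of this round-1 update, here is a standalone sketch using only the numbers above:

```python
import numpy as np

y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
g1 = np.where(np.arange(10) < 2.5, 1, -1)   # G_1(x): +1 if x < 2.5, else -1
a1 = 0.5 * np.log(0.7 / 0.3)                # a_1 = 0.4236
w = 0.1 * np.exp(-a1 * y * g1)              # unnormalized updated weights
print(w.sum())       # Z_1 = 0.9165
print(w / w.sum())   # D_2: 0.07143 for correct samples, 0.16667 for samples 7-9
```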

Round 2 (m=2):
1. Weight distribution

$$D_2=(0.07143,0.07143,0.07143,0.07143,0.07143,0.07143,0.16667,0.16667,0.16667,0.07143)$$

2. Compute the error rate

The error rate of $G_2(x)$ on the training set turns out to be minimized at threshold $v=8.5$ (samples 4, 5, and 6 are misclassified), giving $e_2=P(G_2(x_i)\neq y_i)=3\times 0.07143=0.2143$.
The base classifier is therefore:
$$G_2(x)=\begin{cases}1, & x<8.5 \\ -1, & x>8.5\end{cases}$$

3. Compute the coefficient of $G_2(x)$

$$a_2=\frac{1}{2}\log\frac{1-e_2}{e_2}=0.6496$$

4. Update the weight distribution

$$\begin{aligned}
& D_3=(0.0455,0.0455,0.0455,0.1667,0.1667,0.1667,0.1060,0.1060,0.1060,0.0455) \\
& f_2(x)=0.4236\,G_1(x)+0.6496\,G_2(x)
\end{aligned}$$

The classifier $\mathrm{sign}[f_2(x)]$ still misclassifies 3 training samples.

Round 3 (m=3):
1. Weight distribution

$$D_3=(0.0455,0.0455,0.0455,0.1667,0.1667,0.1667,0.1060,0.1060,0.1060,0.0455)$$

2. Compute the error rate

The error rate of $G_3(x)$ on the training set is minimized at threshold $v=5.5$ (samples 1, 2, 3, and 10 are misclassified), giving $e_3=P(G_3(x_i)\neq y_i)=4\times 0.0455=0.1820$.
The base classifier is therefore:
$$G_3(x)=\begin{cases}1, & x>5.5 \\ -1, & x<5.5\end{cases}$$

3. Compute the coefficient of $G_3(x)$

$$a_3=\frac{1}{2}\log\frac{1-e_3}{e_3}=0.7514$$

4. Update the weight distribution

$$\begin{aligned}
& D_4=(0.125,0.125,0.125,0.102,0.102,0.102,0.065,0.065,0.065,0.125) \\
& f_3(x)=0.4236\,G_1(x)+0.6496\,G_2(x)+0.7514\,G_3(x)
\end{aligned}$$

The classifier $\mathrm{sign}[f_3(x)]$ now misclassifies no training samples, so $G(x)=\mathrm{sign}[f_3(x)]$ is the final strong classifier.
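Feeding this dataset to the `adaboost` sketch given after the algorithm flow reproduces all three rounds (assuming ties between equally good thresholds are broken toward the smaller one, as in round 1 above):

```python
import numpy as np

x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

stumps, alphas = adaboost(x, y, M=3)   # uses fit_stump/adaboost defined earlier
print(alphas)                          # ≈ [0.4236, 0.6496, 0.7514]
print(predict(x, stumps, alphas))      # equals y: zero training errors
```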


Gradient Boosting
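Gradient Boosting builds the same kind of additive model stage by stage, but instead of reweighting samples it fits each new base learner, typically a shallow regression tree (hence GBDT, Gradient Boosting Decision Tree), to the negative gradient of the loss function at the current model's predictions; for squared-error loss that negative gradient is simply the residual $y_i-f_{m-1}(x_i)$. Here is a minimal usage sketch with scikit-learn on the same toy data (the library and hyperparameters are my own choices, not from the original):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

x = np.arange(10).reshape(-1, 1)   # same toy dataset as the AdaBoost example
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

# Each stage fits a depth-1 regression tree to the negative gradient of the
# log-loss at the current predictions, then adds it scaled by learning_rate.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=1)
clf.fit(x, y)
print(clf.predict(x))              # should recover y on this tiny training set
```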


Notes

1. Where exactly does AdaBoost enlarge the weights of misclassified samples and shrink those of correctly classified ones?

$$w_{m+1,i}=\frac{w_{mi}}{Z_m}\exp(-a_m y_i G_m(x_i)),\quad i=1,2,...,N$$

Since $y_iG_m(x_i)=1$ when the prediction is correct and $-1$ when it is wrong, the update can be rewritten as:

$$w_{m+1,i}=\begin{cases}\dfrac{w_{mi}}{Z_m}\exp(-a_m), & G_m(x_i)=y_i \\[2ex] \dfrac{w_{mi}}{Z_m}\exp(a_m), & G_m(x_i)\neq y_i\end{cases}$$

So for $a_m>0$ (i.e. $e_m<0.5$), a correctly classified sample's weight is multiplied by $e^{-a_m}<1$ and a misclassified one's by $e^{a_m}>1$: relative to a correct sample, a misclassified sample's weight grows by the factor $e^{2a_m}=\frac{1-e_m}{e_m}$.
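Checking the growth factor with the round-1 numbers ($e_1=0.3$, $a_1=0.4236$):

```python
import numpy as np

a1, e1 = 0.4236, 0.3
print(np.exp(2 * a1))   # ≈ 2.333: misclassified vs. correct weight-growth ratio
print((1 - e1) / e1)    # ≈ 2.333: the same ratio, (1 - e_m) / e_m
```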

2. How are the weak classifiers combined into a strong classifier?
They are combined through the coefficients $a_m$: $a_m$ measures how much say $G_m$ has in the final classifier, and it increases as $e_m$ decreases.
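A few values make this monotonic relationship concrete:

```python
import numpy as np

for e in (0.4, 0.3, 0.2, 0.1):
    print(f"e_m = {e}: a_m = {0.5 * np.log((1 - e) / e):.4f}")
# 0.2027, 0.4236, 0.6931, 1.0986 -- the smaller the error, the bigger the vote
```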


2019-09-02: added the Notes section.
