Fisher's Linear Discriminant Analysis (2)

"Fisher's Linear Discriminant Analysis" is split into two parts; this is Part 2. Part 1 is linked below:

  • Fisher's Linear Discriminant Analysis (1)

3. Computing the Discrimination Threshold

To determine which class a sample belongs to, the threshold $w_0$ must be computed. There are two ways to solve for it:

  1. The Bayesian approach. This is covered in detail in the separate article "Linear Discriminant Analysis".
  2. The least squares method. Its derivation is demonstrated here.

3.1 The Least Squares Method$^{[6]}$

For a detailed treatment of least squares, see the relevant chapters of reference [2], which introduces the method, in both theory and applications, from several angles.

Write the linear boundary between the two classes as:

$$g(\pmb{x}_i)=\pmb{w}^\text{T}\pmb{x}_i+w_0\tag{42}$$

The corresponding sum-of-squares error function is:

$$E=\frac{1}{2}\sum_{i=1}^n\left(g(\pmb{x}_i)-r_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(\pmb{w}^\text{T}\pmb{x}_i+w_0-r_i\right)^2\tag{43}$$

where $r_i$ is the class label of sample $i$, i.e. the true class value: positive if $i\in C_1$, negative otherwise. Without loss of generality, take the labels of the two classes to be:

$$r_{i,i\in C_1}=\frac{n}{n_1},\qquad r_{i,i\in C_2}=-\frac{n}{n_2}\tag{43-2}$$

Differentiate (43) with respect to $w_0$ and $\pmb{w}$:

$$\frac{\partial E}{\partial w_0}=\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i+w_0-r_i)\tag{44}$$

$$\frac{\partial E}{\partial\pmb{w}}=\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i+w_0-r_i)\pmb{x}_i\tag{45}$$

Set (44) to zero:

$$\begin{split} &\frac{\partial E}{\partial w_0}=\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i+w_0-r_i)=0 \\ &\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i)+\sum_{i=1}^nw_0-\sum_{i=1}^nr_i=0 \\ &\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i)+nw_0-\sum_{i=1}^nr_i=0 \end{split}\tag{46}$$

Therefore:

$$\begin{split} w_0 &= -\frac{1}{n}\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i)+\frac{1}{n}\sum_{i=1}^nr_i \\ &=-\pmb{w}^\text{T}\left(\frac{1}{n}\sum_{i=1}^n\pmb{x}_i\right)+\frac{1}{n}\sum_{i=1}^nr_i \end{split}\tag{47}$$

where:

  • $\frac{1}{n}\sum\limits_{i=1}^n\pmb{x}_i$ is the sample mean (vector), denoted $\pmb{m}$;
  • by (43-2), $r_{i,i\in C_1}=\frac{n}{n_1}$ and $r_{i,i\in C_2}=-\frac{n}{n_2}$, so $\frac{1}{n}\sum\limits_{i=1}^nr_i=\frac{1}{n}\left(n_1\frac{n}{n_1}-n_2\frac{n}{n_2}\right)=0$.

So (47) finally gives:

$$w_0=-\pmb{w}^\text{T}\pmb{m}\tag{48}$$

As for $\pmb{w}$, the earlier optimization already gave $\pmb{w}\propto\pmb{S}^{-1}_W(\pmb{m}_2-\pmb{m}_1)$ (equation (41)). Since $\pmb{w}$ specifies only a direction (a unit vector), and its magnitude does not affect the boundary, we may simply set:

$$\pmb{w}=\pmb{S}^{-1}_W(\pmb{m}_2-\pmb{m}_1)\tag{49}$$

Thus

$$\pmb{w}^{\text{T}}\pmb{x}\le\pmb{w}^{\text{T}}\pmb{m}\tag{50}$$

can be used as the decision rule.
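As a minimal numerical sketch of (48) through (50): the two Gaussian clusters below, their sizes, and the helper name `predict` are assumptions purely for illustration, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(40, 2))   # class C1 samples (synthetic)
X2 = rng.normal([3.0, 3.0], 0.5, size=(60, 2))   # class C2 samples (synthetic)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)        # class means
m = np.vstack([X1, X2]).mean(axis=0)             # overall sample mean

# Within-class scatter S_W = S_1 + S_2, as in (18)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

w = np.linalg.solve(S_W, m2 - m1)                # (49): w = S_W^{-1}(m2 - m1)
w0 = -w @ m                                      # (48): w0 = -w^T m

def predict(x):
    # (50): assign C1 when w^T x <= w^T m, i.e. w^T x + w0 <= 0
    return 1 if w @ x + w0 <= 0 else 2

print(predict(m1), predict(m2))                  # 1 2
```

With well-separated clusters, the class means themselves are classified as expected, consistent with the verification in the next subsection.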

3.2 Verification

Let $\pmb{x}=\pmb{m}_1$. This sample obviously belongs to class $C_1$; we now verify this conclusion using (50):

$$\begin{split} \pmb{w}^{\text{T}}\pmb{x}-\pmb{w}^{\text{T}}\pmb{m}&=\pmb{w}^{\text{T}}(\pmb{x}-\pmb{m})=\pmb{w}^{\text{T}}(\pmb{m}_1-\pmb{m}) \\ &=(\pmb{S}^{-1}_W(\pmb{m}_2-\pmb{m}_1))^\text{T}(\pmb{m}_1-\pmb{m})\quad(\text{substituting (49)}) \\ &=(\pmb{m}_2-\pmb{m}_1)^\text{T}(\pmb{S}^{-1}_W)^\text{T}(\pmb{m}_1-\pmb{m}) \end{split}\tag{51}$$

  • From (18), $\pmb{S}_W^{\text{T}}=\pmb{S}_W\Longrightarrow(\pmb{S}_W^{-1})^\text{T}=\pmb{S}_W^{-1}$;
  • $\pmb{m}=\frac{1}{n}\sum\limits_{i=1}^n\pmb{x}_i=\frac{1}{n}(n_1\pmb{m}_1+n_2\pmb{m}_2)$.

So (51) continues as:

$$\begin{split} \pmb{w}^{\text{T}}\pmb{x}-\pmb{w}^{\text{T}}\pmb{m} &=(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{S}^{-1}_W\left(\pmb{m}_1-\frac{1}{n}(n_1\pmb{m}_1+n_2\pmb{m}_2)\right) \\&=(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{S}^{-1}_W\left(\frac{n_2}{n}\pmb{m}_1-\frac{n_2}{n}\pmb{m}_2\right) \\&=-\frac{n_2}{n}(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{S}^{-1}_W(\pmb{m}_2-\pmb{m}_1)\lt0 \end{split}\tag{52}$$

since $\pmb{S}_W^{-1}$ is positive (semi-)definite.

The verification succeeds.
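The sign established in (52) can also be confirmed numerically. The data below are synthetic assumptions; the script evaluates the left-hand side of (52) at $\pmb{x}=\pmb{m}_1$ and the closed form in its last line, and checks that they agree and are negative.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 30, 50
X1 = rng.normal([0.0, 0.0], 1.0, size=(n1, 2))   # class C1 (synthetic)
X2 = rng.normal([2.0, 1.0], 1.0, size=(n2, 2))   # class C2 (synthetic)
n = n1 + n2
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
m = (n1 * m1 + n2 * m2) / n                      # overall mean

S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_W, m2 - m1)                # (49)

lhs = w @ m1 - w @ m                             # w^T x - w^T m with x = m1
rhs = -(n2 / n) * (m2 - m1) @ np.linalg.solve(S_W, m2 - m1)  # last line of (52)
print(np.isclose(lhs, rhs), lhs < 0)             # True True
```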

3.3 Computing $\pmb{w}$ by Least Squares

Least squares can be used to compute not only $w_0$ but also $\pmb{w}$. The direction of $\pmb{w}$ has already been derived above; here we derive it once more by least squares, as a way to understand the method more deeply.

Equation (45) gave $\frac{\partial E}{\partial\pmb{w}}$; setting it to zero allows us to solve for $\pmb{w}$, although the steps are rather tedious. The following is for reference only.

From (17):

$$\begin{split} \pmb{S}_j&=\sum_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)(\pmb{x}_i-\pmb{m}_j)^\text{T},\quad(j=1,2) \\ &=\sum_{i\in C_j}\pmb{x}_i\pmb{x}_i^\text{T}-\pmb{m}_j\sum_{i\in C_j}\pmb{x}_i^\text{T}-\sum_{i\in C_j}\pmb{x}_i\pmb{m}_j^\text{T}+n_j\pmb{m}_j\pmb{m}_j^\text{T} \\ &=\sum_{i\in C_j}\pmb{x}_i\pmb{x}_i^\text{T}-\pmb{m}_j(n_j\pmb{m}_j^\text{T})-(n_j\pmb{m}_j)\pmb{m}_j^\text{T}+n_j\pmb{m}_j\pmb{m}_j^\text{T} \\ &=\sum_{i\in C_j}\pmb{x}_i\pmb{x}_i^\text{T}-n_j\pmb{m}_j\pmb{m}_j^\text{T} \end{split}\tag{53}$$

Therefore:

$$\pmb{S}_W=\pmb{S}_1+\pmb{S}_2=\sum_{i=1}^n\pmb{x}_i\pmb{x}_i^\text{T}-n_1\pmb{m}_1\pmb{m}_1^\text{T}-n_2\pmb{m}_2\pmb{m}_2^\text{T}\tag{54}$$

Set $\frac{\partial E}{\partial\pmb{w}}=0$ in (45), i.e.:

$$\begin{split} &\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i+w_0-r_i)\pmb{x}_i=0 \\ &\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i-\pmb{w}^\text{T}\pmb{m}-r_i)\pmb{x}_i=0\quad(\text{substituting (48)}) \\ &\sum_{i=1}^n(\pmb{w}^\text{T}\pmb{x}_i-\pmb{w}^\text{T}\pmb{m})\pmb{x}_i=\sum_{i=1}^nr_i\pmb{x}_i \\ &\sum_{i=1}^n\pmb{x}_i(\pmb{x}_i^\text{T}-\pmb{m}^\text{T})\pmb{w}=\sum_{i=1}^nr_i\pmb{x}_i \end{split}\tag{55}$$

Now evaluate the two sides of the last equality separately:

$$\begin{split} \text{LHS}&=\sum_{i=1}^n\pmb{x}_i(\pmb{x}_i^\text{T}-\pmb{m}^\text{T})\pmb{w} \\&=\left(\sum_{i=1}^n\pmb{x}_i\pmb{x}_i^\text{T}-\sum_{i=1}^n\pmb{x}_i\pmb{m}^\text{T}\right)\pmb{w} \\&=\left(\sum_{i=1}^n\pmb{x}_i\pmb{x}_i^\text{T}-n\pmb{mm}^\text{T}\right)\pmb{w} \\&=\left(\sum_{i=1}^n\pmb{x}_i\pmb{x}_i^\text{T}-\frac{1}{n}(n_1\pmb{m}_1+n_2\pmb{m}_2)(n_1\pmb{m}_1+n_2\pmb{m}_2)^\text{T}\right)\pmb{w} \\&=\left(\sum_{i=1}^n\pmb{x}_i\pmb{x}_i^\text{T}-n_1\pmb{m}_1\pmb{m}_1^\text{T}-n_2\pmb{m}_2\pmb{m}_2^\text{T}+\frac{n_1n_2}{n}(\pmb{m}_2-\pmb{m}_1)(\pmb{m}_2-\pmb{m}_1)^\text{T}\right)\pmb{w} \\&=\left(\pmb{S}_W+\frac{n_1n_2}{n}\pmb{S}_B\right)\pmb{w}\quad(\text{substituting (54) and (19)}) \\ \text{RHS}&=\sum_{i=1}^nr_i\pmb{x}_i \\&=\frac{n}{n_1}\sum_{i\in C_1}\pmb{x}_i-\frac{n}{n_2}\sum_{i\in C_2}\pmb{x}_i \\&=n(\pmb{m}_1-\pmb{m}_2) \\\\ \therefore&\quad \left(\pmb{S}_W+\frac{n_1n_2}{n}\pmb{S}_B\right)\pmb{w}=n(\pmb{m}_1-\pmb{m}_2) \\ &\quad\pmb{S}_W\pmb{w}+\frac{n_1n_2}{n}\pmb{S}_B\pmb{w}=n(\pmb{m}_1-\pmb{m}_2) \\ \pmb{S}_W\pmb{w}&=-\frac{n_1n_2}{n}(\pmb{m}_2-\pmb{m}_1)(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{w}+n(\pmb{m}_1-\pmb{m}_2) \\&=\left(-\frac{n_1n_2}{n}(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{w}-n\right)(\pmb{m}_2-\pmb{m}_1) \\\\ \therefore\quad\pmb{w}&=\pmb{S}^{-1}_W\left(-\frac{n_1n_2}{n}(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{w}-n\right)(\pmb{m}_2-\pmb{m}_1) \end{split}\tag{56}$$

Since $\left(-\frac{n_1n_2}{n}(\pmb{m}_2-\pmb{m}_1)^\text{T}\pmb{w}-n\right)$ is a scalar, it follows that:

$$\pmb{w}\propto\pmb{S}^{-1}_W(\pmb{m}_2-\pmb{m}_1)\tag{57}$$
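The least squares route can be cross-checked against the closed form. The sketch below, on assumed synthetic data, minimizes (43) directly with `np.linalg.lstsq` and then verifies both (48) and the proportionality (57).

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 40, 60
X1 = rng.normal([0.0, 0.0], 1.0, size=(n1, 2))   # class C1 (synthetic)
X2 = rng.normal([2.0, 2.0], 1.0, size=(n2, 2))   # class C2 (synthetic)
X = np.vstack([X1, X2])
n = n1 + n2
r = np.concatenate([np.full(n1, n / n1), np.full(n2, -n / n2)])  # labels (43-2)

# Minimize (43) directly: least squares on g(x) = w^T x + w0
A = np.hstack([X, np.ones((n, 1))])
sol, *_ = np.linalg.lstsq(A, r, rcond=None)
w_ls, w0_ls = sol[:2], sol[2]

# (48): the fitted intercept equals -w^T m
m = X.mean(axis=0)
print(np.isclose(w0_ls, -w_ls @ m))              # True

# (57): the fitted w is proportional to S_W^{-1}(m2 - m1)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m2 - m1)
cos = w_ls @ w_fisher / (np.linalg.norm(w_ls) * np.linalg.norm(w_fisher))
print(abs(cos))                                  # ≈ 1 (up to sign)
```

Note that by (56) the proportionality constant is negative, so the two vectors are anti-parallel; the absolute cosine is still 1.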

4. Multiclass Discriminant Analysis$^{[6]}$

The preceding discussion was based on the data assumptions of Section 2.1, i.e. the two-class problem. Generalizing the within-class and between-class scatter matrices from two classes to multiple classes yields multiclass discriminant analysis.

4.1 Multiclass Within-Class Scatter Matrix

Equation (18) defined the within-class scatter matrix of two classes in the $x$ space; its definition extends directly to the multiclass within-class scatter matrix:

$$\pmb{S}_W=\sum_{j=1}^k\pmb{S}_j\tag{58}$$

where:

  • $\pmb{S}_j=\sum\limits_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)(\pmb{x}_i-\pmb{m}_j)^\text{T},\quad j=1,2,\cdots,k$;
  • $\pmb{m}_j=\frac{1}{n_j}\sum\limits_{i\in C_j}\pmb{x}_i,\quad j=1,\cdots,k$, there being $k$ classes in total;
  • $n=n_1+\cdots+n_k$, i.e. the total sample count is the sum of the per-class counts.

4.2 Multiclass Between-Class Scatter Matrix

The multiclass between-class scatter matrix cannot be obtained by directly generalizing (19).

Let $\pmb{m}$ denote the mean (vector) of all samples in the $x$ space:

$$\pmb{m}=\frac{1}{n}\sum\limits_{i=1}^n\pmb{x}_i=\frac{1}{n}\sum\limits_{j=1}^k\sum_{i\in C_j}\pmb{x}_i=\frac{1}{n}\sum\limits_{j=1}^k(n_j\pmb{m}_j)\tag{59}$$

For all samples, following the definition of the (biased) sample variance, one can define the total scatter matrix over all samples of the $X$ space (reference [1] calls it the "global scatter matrix"; in my view, since this concept refers to all samples of the current data set, which are still obtained by sampling, the word "global" invites a misleading association with the "population". Its real meaning is: the scatter matrix of all samples in this data set):

$$\pmb{S}_T=\sum_{i=1}^n(\pmb{x}_i-\pmb{m})(\pmb{x}_i-\pmb{m})^\text{T}\tag{60}$$

Rewrite (60) further as:

$$\begin{split} \pmb{S}_T&=\sum_{j=1}^k\sum_{i\in C_j}(\pmb{x}_i-\pmb{m}_j+\pmb{m}_j-\pmb{m})(\pmb{x}_i-\pmb{m}_j+\pmb{m}_j-\pmb{m})^\text{T} \\&=\sum_{j=1}^k\sum_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)(\pmb{x}_i-\pmb{m}_j)^\text{T}+\sum_{j=1}^k\sum_{i\in C_j}(\pmb{m}_j-\pmb{m})(\pmb{m}_j-\pmb{m})^\text{T}\\&\quad+\sum_{j=1}^k\sum_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)(\pmb{m}_j-\pmb{m})^\text{T}+\sum_{j=1}^k(\pmb{m}_j-\pmb{m})\sum_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)^\text{T} \end{split}\tag{61}$$

Because:

  • by (58), $\sum\limits_{j=1}^k\sum\limits_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)(\pmb{x}_i-\pmb{m}_j)^\text{T}=\pmb{S}_W$;
  • $\sum\limits_{j=1}^k\sum\limits_{i\in C_j}(\pmb{m}_j-\pmb{m})(\pmb{m}_j-\pmb{m})^\text{T}=\sum\limits_{j=1}^kn_j(\pmb{m}_j-\pmb{m})(\pmb{m}_j-\pmb{m})^\text{T}$;
  • since $\sum\limits_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)=0$ (deviations from the mean sum to zero), the cross terms vanish: $\sum\limits_{j=1}^k\sum\limits_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)(\pmb{m}_j-\pmb{m})^\text{T}=\sum\limits_{j=1}^k(\pmb{m}_j-\pmb{m})\sum\limits_{i\in C_j}(\pmb{x}_i-\pmb{m}_j)^\text{T}=0$.

So (61) becomes:

$$\pmb{S}_T = \pmb{S}_W+\sum_{j=1}^kn_j(\pmb{m}_j-\pmb{m})(\pmb{m}_j-\pmb{m})^\text{T}\tag{62}$$

Define:

$$\pmb{S}_B=\sum_{j=1}^kn_j(\pmb{m}_j-\pmb{m})(\pmb{m}_j-\pmb{m})^\text{T}\tag{63}$$

This is the multiclass between-class scatter matrix.

Setting $k=2$ in (63) recovers the two-class between-class scatter matrix:

$$\begin{split} \pmb{S}_{B_2}&=n_1(\pmb{m}_1-\pmb{m})(\pmb{m}_1-\pmb{m})^\text{T}+n_2(\pmb{m}_2-\pmb{m})(\pmb{m}_2-\pmb{m})^\text{T} \\ \because&\quad\pmb{m}_1-\pmb{m}=\pmb{m}_1-\frac{1}{n}(n_1\pmb{m}_1+n_2\pmb{m}_2)=\frac{n_2}{n}(\pmb{m}_1-\pmb{m}_2) \\ &\quad\pmb{m}_2-\pmb{m}=\pmb{m}_2-\frac{1}{n}(n_1\pmb{m}_1+n_2\pmb{m}_2)=\frac{n_1}{n}(\pmb{m}_2-\pmb{m}_1) \\ \therefore&\quad\pmb{S}_{B_2}=\frac{n_1n_2}{n}(\pmb{m}_2-\pmb{m}_1)(\pmb{m}_2-\pmb{m}_1)^\text{T} \end{split}\tag{64}$$

Compared with (19), the two-class between-class scatter matrix obtained in (64) carries the extra factor $\frac{n_1n_2}{n}$; being a constant, it does not affect the maximization of $J(\pmb{w})$.
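The decomposition (62) is easy to verify numerically. The three synthetic clusters below are assumptions for illustration; the script builds $\pmb{S}_W$, $\pmb{S}_B$, and $\pmb{S}_T$ from their definitions and checks that $\pmb{S}_T=\pmb{S}_W+\pmb{S}_B$.

```python
import numpy as np

rng = np.random.default_rng(3)
centers, sizes = [[0, 0], [3, 0], [0, 3]], [20, 30, 50]   # synthetic k = 3 classes
Xs = [rng.normal(c, 1.0, size=(nj, 2)) for c, nj in zip(centers, sizes)]
X = np.vstack(Xs)
m = X.mean(axis=0)                               # overall mean, as in (59)

# Within-class scatter (58) and between-class scatter (63)
S_W = sum((Xj - Xj.mean(axis=0)).T @ (Xj - Xj.mean(axis=0)) for Xj in Xs)
S_B = sum(len(Xj) * np.outer(Xj.mean(axis=0) - m, Xj.mean(axis=0) - m)
          for Xj in Xs)

# Total scatter (60) and the decomposition (62): S_T = S_W + S_B
S_T = (X - m).T @ (X - m)
print(np.allclose(S_T, S_W + S_B))               # True
```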

4.3 The Fisher Criterion for Multiclass Samples

Suppose $q$ unit vectors $\pmb{w}_1,\cdots,\pmb{w}_q$ serve as the projection directions (lines) for the $X$-space sample data $\pmb{x}\in\mathbb{R}^d$, giving:

$$y_l = \pmb{w}_l^\text{T}\pmb{x},\quad(l=1,\cdots,q)\tag{65}$$

In matrix form:

$$\pmb{y}=\begin{bmatrix}y_1\\\vdots\\y_q\end{bmatrix}=\begin{bmatrix}\pmb{w}_1^\text{T}\pmb{x}\\\vdots\\\pmb{w}_q^\text{T}\pmb{x}\end{bmatrix}=\begin{bmatrix}\pmb{w}_1&\cdots&\pmb{w}_q\end{bmatrix}^\text{T}\pmb{x}=\pmb{W}^\text{T}\pmb{x}\tag{66}$$

where $\pmb{W}=\begin{bmatrix}\pmb{w}_1&\cdots&\pmb{w}_q\end{bmatrix}$ is a $d\times q$ matrix.

This projects the $X$-space samples $\pmb{x}_1,\cdots,\pmb{x}_n$ onto the directions $\pmb{w}_l\,(1\le l\le q)$, yielding the $Y$-space data $\pmb{y}_i\,(i=1,\cdots,n)$:

$$\pmb{y}_i=\pmb{W}^\text{T}\pmb{x}_i,\quad(i=1,\cdots,n)\tag{67}$$

Mirroring the computation of means in the $X$ space, compute the means of the projected data in the $Y$ space:

$$\hat{\pmb{m}}_j=\frac{1}{n_j}\sum_{i\in C_j}\pmb{y}_i=\frac{1}{n_j}\sum_{i\in C_j}\pmb{W}^\text{T}\pmb{x}_i=\pmb{W}^\text{T}\pmb{m}_j, \quad(j=1,\cdots,k)\tag{68}$$

$$\hat{\pmb{m}}=\frac{1}{n}\sum_{j=1}^kn_j\hat{\pmb{m}}_j=\frac{1}{n}\sum_{j=1}^kn_j\pmb{W}^\text{T}\pmb{m}_j=\pmb{W}^\text{T}\pmb{m}\tag{69}$$

and then define the $Y$-space within-class and between-class scatter matrices:

$$\hat{\pmb{S}}_W=\sum_{j=1}^k\sum_{i\in C_j}(\pmb{y}_i-\hat{\pmb{m}}_j)(\pmb{y}_i-\hat{\pmb{m}}_j)^\text{T}\tag{70}$$

$$\hat{\pmb{S}}_B=\sum_{j=1}^kn_j(\hat{\pmb{m}}_j-\hat{\pmb{m}})(\hat{\pmb{m}}_j-\hat{\pmb{m}})^\text{T}\tag{71}$$

Substituting (67) and (68) into (70) gives:

$$\begin{split} \hat{\pmb{S}}_W &=\sum_{j=1}^k\sum_{i\in C_j}(\pmb{W}^\text{T}\pmb{x}_i-\pmb{W}^\text{T}\pmb{m}_j)(\pmb{W}^\text{T}\pmb{x}_i-\pmb{W}^\text{T}\pmb{m}_j)^\text{T} \\&=\sum_{j=1}^k\sum_{i\in C_j}\pmb{W}^\text{T}(\pmb{x}_i-\pmb{m}_j)(\pmb{x}_i-\pmb{m}_j)^\text{T}\pmb{W} \\&=\pmb{W}^\text{T}\pmb{S}_W\pmb{W}\quad(\text{by (58)}) \end{split}\tag{72}$$

Substituting (68) and (69) into (71) gives:

$$\begin{split} \hat{\pmb{S}}_B&=\sum_{j=1}^kn_j(\pmb{W}^\text{T}\pmb{m}_j-\pmb{W}^\text{T}\pmb{m})(\pmb{W}^\text{T}\pmb{m}_j-\pmb{W}^\text{T}\pmb{m})^\text{T} \\&=\sum_{j=1}^kn_j\pmb{W}^\text{T}(\pmb{m}_j-\pmb{m})(\pmb{m}_j-\pmb{m})^\text{T}\pmb{W} \\&=\pmb{W}^\text{T}\pmb{S}_B\pmb{W}\quad(\text{by (63)}) \end{split}\tag{73}$$

Thus the objective function of the multiclass Fisher criterion can be written in two ways:

  • First: via the matrix trace

    $$J_1(\pmb{W})=\text{trace}\left(\hat{\pmb{S}}^{-1}_W\hat{\pmb{S}}_B\right)=\text{trace}\left((\pmb{W}^\text{T}\pmb{S}_W\pmb{W})^{-1}(\pmb{W}^\text{T}\pmb{S}_B\pmb{W})\right)\tag{74}$$

  • Second: via determinants

    $$J_2(\pmb{W})=\frac{|\hat{\pmb{S}}_B|}{|\hat{\pmb{S}}_W|}=\frac{|\pmb{W}^\text{T}\pmb{S}_B\pmb{W}|}{|\pmb{W}^\text{T}\pmb{S}_W\pmb{W}|}\tag{75}$$

Either form leads to the same optimality condition:

$$\pmb{S}_B\pmb{w}_l=\lambda_l\pmb{S}_W\pmb{w}_l,\quad(l=1,\cdots,q)\tag{76}$$

From this, $\pmb{S}^{-1}_W\pmb{S}_B\pmb{w}_l=\lambda_l\pmb{w}_l$. Referring to the derivation following (30), $\lambda_l$ is a generalized eigenvalue and $\pmb{w}_l$ a generalized eigenvector. Taking (74) as an example, the following result can be obtained (for the detailed derivation, see reference [7]):

$$J_1(\pmb{W})=\lambda_1+\cdots+\lambda_q\tag{77}$$

Since the eigenvalues are nonnegative, $q=\text{rank}(\hat{\pmb{S}}^{-1}_W\hat{\pmb{S}}_B)$.

Moreover, since $\hat{\pmb{m}}$ is a linear combination of $\hat{\pmb{m}}_1,\cdots,\hat{\pmb{m}}_k$:

$$\text{rank}(\hat{\pmb{S}}^{-1}_W\hat{\pmb{S}}_B)=\text{rank}(\hat{\pmb{S}}_B)=\dim\text{ span}\{\hat{\pmb{m}}_1-\hat{\pmb{m}},\cdots,\hat{\pmb{m}}_k-\hat{\pmb{m}}\}\le k-1\tag{78}$$

q ≤ k − 1 q\le k-1 qk1

给定包含 k ≥ 2 k\ge2 k2 个类别的样本,多类别判别分析所能产生有效线性特征总数最多是 k − 1 k-1 k1 ,即降维的最大特征数。
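The whole multiclass procedure, (58), (63), (76), and the bound $q\le k-1$, can be sketched in a few lines. The three-class synthetic data below are an assumption; because $\pmb{S}_W$ is symmetric positive definite here, the generalized eigenproblem (76) is reduced to a standard symmetric one via Cholesky whitening (one standard numerical approach, not necessarily how reference [7] proceeds).

```python
import numpy as np

rng = np.random.default_rng(4)
k, d = 3, 4
Xs = [rng.normal(c, 1.0, size=(50, d)) for c in 5 * np.eye(k, d)]  # synthetic classes
X = np.vstack(Xs)
m = X.mean(axis=0)

# Multiclass scatter matrices (58) and (63)
S_W = sum((Xj - Xj.mean(axis=0)).T @ (Xj - Xj.mean(axis=0)) for Xj in Xs)
S_B = sum(len(Xj) * np.outer(Xj.mean(axis=0) - m, Xj.mean(axis=0) - m)
          for Xj in Xs)

# Generalized eigenproblem (76): S_B w = lambda S_W w.
# Whiten with the Cholesky factor of S_W, then solve a symmetric problem.
L = np.linalg.cholesky(S_W)                        # S_W = L L^T
A = np.linalg.solve(L, np.linalg.solve(L, S_B).T)  # A = L^{-1} S_B L^{-T}
lam, V = np.linalg.eigh(A)                         # real eigenvalues, ascending
W = np.linalg.solve(L.T, V)                        # generalized eigenvectors

q = int(np.sum(lam > 1e-8))                        # rank(S_B) = k - 1 = 2
print(q)

# Project onto the q leading directions, as in (67)
Y = X @ W[:, -q:]
print(Y.shape)                                     # (150, 2)
```

Only $k-1=2$ eigenvalues are nonzero, matching (78), so the projected data keep at most two useful dimensions.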

References

[1] 周志华 (Zhou Zhihua). Machine Learning. Beijing: Tsinghua University Press.

[2] 齐伟 (Qi Wei). Mathematical Foundations of Machine Learning. Beijing: Publishing House of Electronics Industry.

[3] The method of Lagrange multipliers.

[4] Generalized eigenvalues and the minimax principle. Excerpts from this reference follow:

1. Definition: Let $\pmb{A}$ and $\pmb{B}$ be $n\times n$ square matrices. If there exists a number $\lambda$ such that the equation $\pmb{Ax}=\lambda\pmb{Bx}$ has a nonzero solution, then $\lambda$ is called a generalized eigenvalue of $\pmb{A}$ relative to $\pmb{B}$, and $\pmb{x}$ an eigenvector of $\pmb{A}$ relative to $\pmb{B}$ belonging to the generalized eigenvalue $\lambda$.

  • When $\pmb{B}=\pmb{I}$ (the identity matrix), the generalized eigenvalue problem reduces to the standard eigenvalue problem.

  • Eigenvectors are nonzero.

  • Solving for generalized eigenvalues:

    $(\pmb{A}-\lambda\pmb{B})\pmb{x}=\pmb{0}$, or equivalently $(\lambda\pmb{B}-\pmb{A})\pmb{x}=\pmb{0}$

    Characteristic equation: $\det(\pmb{A}-\lambda\pmb{B})=0$

    Having found $\lambda$, substitute it back into $\pmb{Ax}=\lambda\pmb{Bx}$ to solve for $\pmb{x}$.

2. Equivalent formulation

If $\pmb{B}$ is positive definite, hence invertible ($\pmb{B}^{-1}$ exists), then $\pmb{B}^{-1}\pmb{Ax}=\lambda\pmb{x}$, and the generalized eigenvalue problem becomes a standard eigenvalue problem.

3. Generalized Rayleigh quotient

If $\pmb{A}$ and $\pmb{B}$ are $n\times n$ Hermitian matrices and $\pmb{B}$ is positive definite, then

$R(\pmb{x})=\frac{\pmb{x}^\text{H}\pmb{Ax}}{\pmb{x}^\text{H}\pmb{Bx}},\ (\pmb{x}\ne\pmb{0})$ is the Rayleigh quotient of $\pmb{A}$ relative to $\pmb{B}$.

[5] 谢文睿 (Xie Wenrui), 秦州 (Qin Zhou). Detailed Explanations of Machine Learning Formulas. Beijing: Posts & Telecom Press.
