Data Mining and Analysis Course Notes (Chapter 20)

Data Mining and Analysis Course Notes

  • Reference textbook: Data Mining and Analysis, by Mohammed J. Zaki and Wagner Meira Jr.

Series contents

  1. Data Mining and Analysis Course Notes (Contents)
  2. Data Mining and Analysis Course Notes (Chapter 1)
  3. Data Mining and Analysis Course Notes (Chapter 2)
  4. Data Mining and Analysis Course Notes (Chapter 5)
  5. Data Mining and Analysis Course Notes (Chapter 7)
  6. Data Mining and Analysis Course Notes (Chapter 14)
  7. Data Mining and Analysis Course Notes (Chapter 15)
  8. Data Mining and Analysis Course Notes (Chapter 20)
  9. Data Mining and Analysis Course Notes (Chapter 21)

Notes contents

  • Chapter 20: Linear Discriminant Analysis
    • 20.1 Normal LDA
    • 20.2 Kernel LDA


Chapter 20: Linear Discriminant Analysis

Setup: $\mathbf{D}=\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $y_i \in \{1, 2\}$ (or $\pm 1$, etc.); $\mathbf{D}_1=\{\mathbf{x}_i \mid y_i=1\}$ and $\mathbf{D}_2=\{\mathbf{x}_i \mid y_i=2\}$.

Goal: find a vector $\mathbf{w}\in\mathbb{R}^d$ (the direction of a line) such that the projected means of $\mathbf{D}_1$ and $\mathbf{D}_2$ are as far apart as possible, while the projected variances are as small as possible.

20.1 Normal LDA

Let $\mathbf{w}\in\mathbb{R}^d$ with $\mathbf{w}^T\mathbf{w}=1$. The projection of $\mathbf{x}_i$ onto the direction $\mathbf{w}$ is
$$\mathbf{x}_i' = \left(\frac{\mathbf{w}^T\mathbf{x}_i}{\mathbf{w}^T\mathbf{w}}\right)\mathbf{w} = a_i\,\mathbf{w}, \qquad a_i = \mathbf{w}^T\mathbf{x}_i.$$

The mean of the projections of $\mathbf{D}_1$ onto $\mathbf{w}$ (with $|\mathbf{D}_1| = n_1$) is
$$m_1 := \frac{1}{n_1}\sum_{\mathbf{x}_i\in\mathbf{D}_1} a_i = \boldsymbol{\mu}_1^T\mathbf{w},$$
i.e. the mean of the projections equals the projection of the mean $\boldsymbol{\mu}_1$.

Similarly, the mean of the projections of $\mathbf{D}_2$ onto $\mathbf{w}$ is
$$m_2 := \frac{1}{n_2}\sum_{\mathbf{x}_i\in\mathbf{D}_2} a_i = \boldsymbol{\mu}_2^T\mathbf{w}.$$
The first objective is to find $\mathbf{w}$ maximizing $(m_1 - m_2)^2$.
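
To make this concrete, here is a minimal NumPy sketch (the synthetic two-class data and the trial direction are made up purely for illustration, not from the notes) checking that the mean of the projections equals the projection of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic two-class data in R^2 (illustrative assumption)
D1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2))
D2 = rng.normal(loc=[4.0, 2.0], scale=1.0, size=(40, 2))

w = np.array([1.0, 1.0])
w /= np.linalg.norm(w)                 # enforce w^T w = 1

a1 = D1 @ w                            # a_i = w^T x_i for x_i in D1
a2 = D2 @ w
mu1, mu2 = D1.mean(axis=0), D2.mean(axis=0)

# mean of projections == projection of the mean
assert np.isclose(a1.mean(), mu1 @ w)
assert np.isclose(a2.mean(), mu2 @ w)
print(a1.mean(), a2.mean())            # m1 and m2
```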

For $\mathbf{D}_i$, define the projected scatter
$$s_i^2 = \sum_{\mathbf{x}_k\in\mathbf{D}_i}(a_k - m_i)^2.$$
Note that $s_i^2 = n_i\,\sigma_i^2$, where $|\mathbf{D}_i| = n_i$ and $\sigma_i^2$ is the variance of the projected values of class $i$.

Goal (the Fisher LDA objective):
$$\max_{\mathbf{w}} J(\mathbf{w}) = \frac{(m_1-m_2)^2}{s_1^2+s_2^2}.$$
Note that $J(\mathbf{w}) = J(w_1, w_2, \cdots, w_d)$ is a function of the $d$ coordinates of $\mathbf{w}$. For the numerator,
$$\begin{aligned}
(m_1-m_2)^2 &= \left(\mathbf{w}^T(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)\right)^2\\
&= \mathbf{w}^T\left((\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T\right)\mathbf{w}\\
&= \mathbf{w}^T\mathbf{B}\mathbf{w}.
\end{aligned}$$

$\mathbf{B}$ is called the between-class scatter matrix.

$$\begin{aligned}
s_1^2 &= \sum_{\mathbf{x}_i\in\mathbf{D}_1}(a_i - m_1)^2\\
&= \sum_{\mathbf{x}_i\in\mathbf{D}_1}\left(\mathbf{w}^T\mathbf{x}_i - \mathbf{w}^T\boldsymbol{\mu}_1\right)^2\\
&= \sum_{\mathbf{x}_i\in\mathbf{D}_1}\left(\mathbf{w}^T(\mathbf{x}_i-\boldsymbol{\mu}_1)\right)^2\\
&= \mathbf{w}^T\left(\sum_{\mathbf{x}_i\in\mathbf{D}_1}(\mathbf{x}_i-\boldsymbol{\mu}_1)(\mathbf{x}_i-\boldsymbol{\mu}_1)^T\right)\mathbf{w}\\
&= \mathbf{w}^T\mathbf{S}_1\mathbf{w}.
\end{aligned}$$

$\mathbf{S}_1$ is called the scatter matrix of $\mathbf{D}_1$; note $\mathbf{S}_1 = n_1\boldsymbol{\Sigma}_1$, where $\boldsymbol{\Sigma}_1$ is the covariance matrix of $\mathbf{D}_1$.

Similarly, $s_2^2 = \mathbf{w}^T\mathbf{S}_2\mathbf{w}$.

S = S 1 + S 2 \mathbf{S}=\mathbf{S}_{1}+\mathbf{S}_{2} S=S1+S2,则
max ⁡ w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 = w T B w w T S w \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S} \mathbf{w}} wmaxJ(w)=s12+s22(m1m2)2=wTSwwTBw
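
A minimal NumPy sketch (same kind of synthetic data as before; not the book's code) that assembles $\mathbf{B}$, $\mathbf{S}_1$, $\mathbf{S}_2$, $\mathbf{S}$ and evaluates $J(\mathbf{w})$ for a trial direction, also verifying $s_i^2 = n_i\sigma_i^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
D1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2))
D2 = rng.normal(loc=[4.0, 2.0], scale=1.0, size=(40, 2))
mu1, mu2 = D1.mean(axis=0), D2.mean(axis=0)

B = np.outer(mu1 - mu2, mu1 - mu2)       # between-class scatter matrix
S1 = (D1 - mu1).T @ (D1 - mu1)           # scatter of D1 (= n1 * Sigma1)
S2 = (D2 - mu2).T @ (D2 - mu2)           # scatter of D2
S = S1 + S2                              # within-class scatter matrix

def J(w):
    """Fisher objective (m1 - m2)^2 / (s1^2 + s2^2)."""
    return (w @ B @ w) / (w @ S @ w)

w = np.array([1.0, 0.0])                 # any trial direction
print(J(w))

# check s_1^2 = n_1 * sigma_1^2 (variance of the projected values)
assert np.isclose(w @ S1 @ w, len(D1) * (D1 @ w).var())
```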

Setting the gradient to zero:
$$\frac{d}{d\mathbf{w}}J(\mathbf{w}) = \frac{2\mathbf{B}\mathbf{w}(\mathbf{w}^T\mathbf{S}\mathbf{w}) - 2\mathbf{S}\mathbf{w}(\mathbf{w}^T\mathbf{B}\mathbf{w})}{(\mathbf{w}^T\mathbf{S}\mathbf{w})^2} = \mathbf{0},$$
which gives
$$\begin{aligned}
\mathbf{B}\mathbf{w}(\mathbf{w}^T\mathbf{S}\mathbf{w}) &= \mathbf{S}\mathbf{w}(\mathbf{w}^T\mathbf{B}\mathbf{w})\\
\mathbf{B}\mathbf{w} &= \mathbf{S}\mathbf{w}\cdot\frac{\mathbf{w}^T\mathbf{B}\mathbf{w}}{\mathbf{w}^T\mathbf{S}\mathbf{w}}\\
\mathbf{B}\mathbf{w} &= J(\mathbf{w})\cdot\mathbf{S}\mathbf{w}. \qquad (*)
\end{aligned}$$
S − 1 \mathbf{S}^{-1} S1 存在,则
S − 1 B w = J ( w ) ⋅ w \mathbf{S}^{-1}\mathbf{B}\mathbf{w}=J(\mathbf{w})\cdot\mathbf{w} S1Bw=J(w)w
故要求最大 J ( w ) J(\mathbf{w}) J(w) ,只需 S − 1 B \mathbf{S}^{-1}\mathbf{B} S1B 的最大特征值, w \mathbf{w} w 为其特征向量。

☆ Obtaining $\mathbf{w}$ without an eigenvector computation

B = ( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T \mathbf{B}=(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T} B=(μ1μ2)(μ1μ2)T 代入 ( ∗ ) (*) ()
( μ 1 − μ 2 ) ( μ 1 − μ 2 ) T w = J ( w ) ⋅ S w S − 1 ( μ 1 − μ 2 ) [ ( μ 1 − μ 2 ) T w J ( w ) ] = w \begin{aligned} (\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T}\mathbf{w} &=J(\mathbf{w})\cdot\mathbf{S} \mathbf{w}\\ \mathbf{S}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})[\frac{(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T}\mathbf{w}}{J(\mathbf{w})}]&=\mathbf{w} \end{aligned} (μ1μ2)(μ1μ2)TwS1(μ1μ2)[J(w)(μ1μ2)Tw]=J(w)Sw=w
故只需计算 S − 1 ( μ 1 − μ 2 ) \mathbf{S}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2}) S1(μ1μ2),再单位化。
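
The sketch below (same synthetic setup, for illustration only) computes $\mathbf{w}$ both ways, via the dominant eigenvector of $\mathbf{S}^{-1}\mathbf{B}$ and via the closed form $\mathbf{S}^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)$, and confirms the two directions coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
D1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2))
D2 = rng.normal(loc=[4.0, 2.0], scale=1.0, size=(40, 2))
mu1, mu2 = D1.mean(axis=0), D2.mean(axis=0)
B = np.outer(mu1 - mu2, mu1 - mu2)
S = (D1 - mu1).T @ (D1 - mu1) + (D2 - mu2).T @ (D2 - mu2)

# route 1: dominant eigenvector of S^{-1} B
evals, evecs = np.linalg.eig(np.linalg.inv(S) @ B)
w_eig = np.real(evecs[:, np.argmax(evals.real)])
w_eig /= np.linalg.norm(w_eig)

# route 2: closed form, w proportional to S^{-1}(mu1 - mu2)
w_closed = np.linalg.solve(S, mu1 - mu2)
w_closed /= np.linalg.norm(w_closed)

# the two unit directions agree up to sign
assert np.allclose(np.abs(w_eig), np.abs(w_closed))
```

The closed form works because $\mathbf{B}\mathbf{w}$ is always a multiple of $\boldsymbol{\mu}_1-\boldsymbol{\mu}_2$, so the single informative eigenvector of the rank-one matrix $\mathbf{S}^{-1}\mathbf{B}$ can be read off directly.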

20.2 Kernel LDA

Fact 1: if $(\mathbf{S}_\phi^{-1}\mathbf{B}_\phi)\mathbf{w} = \lambda\mathbf{w}$, then $\mathbf{w} = \sum\limits_{j=1}^{n} a_j\,\phi(\mathbf{x}_j)$; see the last two pages of the lecture notes for the proof.

a = ( a 1 , ⋯   , a n ) T \mathbf{a}=(a_1,\cdots,a_n)^T a=(a1,,an)T 是“事实1”中的向量。

We now convert the problem $\max\limits_{\mathbf{w}} J(\mathbf{w}) = \frac{(m_1-m_2)^2}{s_1^2+s_2^2} = \frac{\mathbf{w}^T\mathbf{B}_\phi\mathbf{w}}{\mathbf{w}^T\mathbf{S}_\phi\mathbf{w}}$ into a problem $\max G(\mathbf{a})$ that can be solved using only the kernel matrix $\mathbf{K}$.

Observe that
$$\begin{aligned}
m_i = \mathbf{w}^T\boldsymbol{\mu}_i^\phi &= \left(\sum_{j=1}^n a_j\,\phi(\mathbf{x}_j)\right)^T\left(\frac{1}{n_i}\sum_{\mathbf{x}_k\in\mathbf{D}_i}\phi(\mathbf{x}_k)\right)\\
&= \frac{1}{n_i}\sum_{j=1}^n\sum_{\mathbf{x}_k\in\mathbf{D}_i} a_j\,\phi(\mathbf{x}_j)^T\phi(\mathbf{x}_k)\\
&= \frac{1}{n_i}\sum_{j=1}^n\sum_{\mathbf{x}_k\in\mathbf{D}_i} a_j\,K(\mathbf{x}_j,\mathbf{x}_k)\\
&= \mathbf{a}^T\mathbf{m}_i,
\end{aligned}$$
where
$$\mathbf{m}_i = \frac{1}{n_i}\begin{pmatrix}
\sum_{\mathbf{x}_k\in\mathbf{D}_i} K(\mathbf{x}_1,\mathbf{x}_k)\\
\sum_{\mathbf{x}_k\in\mathbf{D}_i} K(\mathbf{x}_2,\mathbf{x}_k)\\
\vdots\\
\sum_{\mathbf{x}_k\in\mathbf{D}_i} K(\mathbf{x}_n,\mathbf{x}_k)
\end{pmatrix}_{n\times 1}.$$

$$\begin{aligned}
(m_1-m_2)^2 &= \left(\mathbf{w}^T\boldsymbol{\mu}_1^\phi - \mathbf{w}^T\boldsymbol{\mu}_2^\phi\right)^2\\
&= \left(\mathbf{a}^T\mathbf{m}_1 - \mathbf{a}^T\mathbf{m}_2\right)^2\\
&= \mathbf{a}^T(\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T\mathbf{a}\\
&= \mathbf{a}^T\mathbf{M}\mathbf{a}.
\end{aligned}$$
($\mathbf{M}$ is called the kernel between-class scatter matrix.)
$$\begin{aligned}
s_1^2 &= \sum_{\mathbf{x}_i\in\mathbf{D}_1}\left\|\mathbf{w}^T\phi(\mathbf{x}_i) - \mathbf{w}^T\boldsymbol{\mu}_1^\phi\right\|^2\\
&= \sum_{\mathbf{x}_i\in\mathbf{D}_1}\left\|\mathbf{w}^T\phi(\mathbf{x}_i)\right\|^2 - 2\sum_{\mathbf{x}_i\in\mathbf{D}_1}\mathbf{w}^T\phi(\mathbf{x}_i)\cdot\mathbf{w}^T\boldsymbol{\mu}_1^\phi + \sum_{\mathbf{x}_i\in\mathbf{D}_1}\left\|\mathbf{w}^T\boldsymbol{\mu}_1^\phi\right\|^2\\
&= \left(\sum_{\mathbf{x}_i\in\mathbf{D}_1}\left\|\sum_{j=1}^n a_j\,\phi(\mathbf{x}_j)^T\phi(\mathbf{x}_i)\right\|^2\right) - 2\,n_1\left\|\mathbf{w}^T\boldsymbol{\mu}_1^\phi\right\|^2 + n_1\left\|\mathbf{w}^T\boldsymbol{\mu}_1^\phi\right\|^2\\
&= \left(\sum_{\mathbf{x}_i\in\mathbf{D}_1}\mathbf{a}^T\mathbf{K}_i\mathbf{K}_i^T\mathbf{a}\right) - n_1\,\mathbf{a}^T\mathbf{m}_1\mathbf{m}_1^T\mathbf{a}\\
&= \mathbf{a}^T\left(\left(\sum_{\mathbf{x}_i\in\mathbf{D}_1}\mathbf{K}_i\mathbf{K}_i^T\right) - n_1\,\mathbf{m}_1\mathbf{m}_1^T\right)\mathbf{a}\\
&= \mathbf{a}^T\mathbf{N}_1\mathbf{a},
\end{aligned}$$
where $\mathbf{K}_i$ denotes the $i$-th column of the kernel matrix $\mathbf{K}$.
Similarly, letting $\mathbf{N}_2 = \sum\limits_{\mathbf{x}_i\in\mathbf{D}_2}\mathbf{K}_i\mathbf{K}_i^T - n_2\,\mathbf{m}_2\mathbf{m}_2^T$ gives $s_2^2 = \mathbf{a}^T\mathbf{N}_2\mathbf{a}$.

Hence, with $\mathbf{N} := \mathbf{N}_1 + \mathbf{N}_2$,
$$s_1^2 + s_2^2 = \mathbf{a}^T(\mathbf{N}_1+\mathbf{N}_2)\mathbf{a} = \mathbf{a}^T\mathbf{N}\mathbf{a}.$$
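
As an illustration, the following NumPy sketch builds $\mathbf{m}_1$, $\mathbf{m}_2$, $\mathbf{M}$, $\mathbf{N}_1$, $\mathbf{N}_2$, $\mathbf{N}$ from a kernel matrix alone; the RBF kernel and the synthetic data are assumptions made for this example, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2)),
               rng.normal(loc=[4.0, 2.0], scale=1.0, size=(40, 2))])
y = np.array([1] * 30 + [2] * 40)

# RBF kernel K(x_j, x_k) = exp(-||x_j - x_k||^2 / 2)  (illustrative choice)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq)

idx1, idx2 = y == 1, y == 2
n1, n2 = idx1.sum(), idx2.sum()

# m_i[j] = (1/n_i) * sum_{x_k in D_i} K(x_j, x_k)
m1 = K[:, idx1].mean(axis=1)
m2 = K[:, idx2].mean(axis=1)

M = np.outer(m1 - m2, m1 - m2)           # kernel between-class scatter

# N_i = sum_{x_j in D_i} K_j K_j^T - n_i m_i m_i^T  (K_j = j-th column of K)
N1 = K[:, idx1] @ K[:, idx1].T - n1 * np.outer(m1, m1)
N2 = K[:, idx2] @ K[:, idx2].T - n2 * np.outer(m2, m2)
N = N1 + N2
```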

Therefore
$$J(\mathbf{w}) = \frac{\mathbf{a}^T\mathbf{M}\mathbf{a}}{\mathbf{a}^T\mathbf{N}\mathbf{a}} =: G(\mathbf{a}).$$

As in Section 20.1, the optimality condition becomes $\mathbf{M}\mathbf{a} = \lambda\,\mathbf{N}\mathbf{a}$:

  • If $\mathbf{N}^{-1}$ exists, then $\mathbf{N}^{-1}\mathbf{M}\mathbf{a} = \lambda\,\mathbf{a}$, where $\lambda$ is the largest eigenvalue of $\mathbf{N}^{-1}\mathbf{M}$ and $\mathbf{a}$ is the corresponding eigenvector.

  • If $\mathbf{N}^{-1}$ does not exist, use the (Moore-Penrose) pseudo-inverse instead, e.g. MATLAB's `pinv`.

Finally, we enforce $\mathbf{w}^T\mathbf{w} = 1$, i.e.
$$\begin{aligned}
\left(\sum_{j=1}^n a_j\,\phi(\mathbf{x}_j)\right)^T\left(\sum_{i=1}^n a_i\,\phi(\mathbf{x}_i)\right) &= 1\\
\sum_{j=1}^n\sum_{i=1}^n a_j a_i\,\phi(\mathbf{x}_j)^T\phi(\mathbf{x}_i) &= 1\\
\sum_{j=1}^n\sum_{i=1}^n a_j a_i\,K(\mathbf{x}_i,\mathbf{x}_j) &= 1\\
\mathbf{a}^T\mathbf{K}\mathbf{a} &= 1.
\end{aligned}$$
After computing the eigenvector $\mathbf{a}$ of $\mathbf{N}^{-1}\mathbf{M}$, rescale $\mathbf{a}\leftarrow \frac{\mathbf{a}}{\sqrt{\mathbf{a}^T\mathbf{K}\mathbf{a}}}$ to guarantee $\mathbf{w}^T\mathbf{w} = 1$.
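
Putting the pieces together, here is a hedged end-to-end sketch (same synthetic RBF setup as above, with NumPy's `pinv` standing in for MATLAB's) that solves for $\mathbf{a}$, rescales it so $\mathbf{a}^T\mathbf{K}\mathbf{a}=1$, and evaluates the projections $\mathbf{w}^T\phi(\mathbf{x}_i) = (\mathbf{K}\mathbf{a})_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2)),
               rng.normal(loc=[4.0, 2.0], scale=1.0, size=(40, 2))])
y = np.array([1] * 30 + [2] * 40)

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq)                         # RBF kernel matrix (illustrative)
idx1, idx2 = y == 1, y == 2
m1, m2 = K[:, idx1].mean(axis=1), K[:, idx2].mean(axis=1)
M = np.outer(m1 - m2, m1 - m2)
N = (K[:, idx1] @ K[:, idx1].T - idx1.sum() * np.outer(m1, m1)
     + K[:, idx2] @ K[:, idx2].T - idx2.sum() * np.outer(m2, m2))

# N is often singular in practice, so use the pseudo-inverse
evals, evecs = np.linalg.eig(np.linalg.pinv(N) @ M)
a = np.real(evecs[:, np.argmax(evals.real)])

# rescale so that w^T w = a^T K a = 1
a = a / np.sqrt(a @ K @ a)

# projection of x_i onto w in feature space: w^T phi(x_i) = (K a)_i
proj = K @ a
print(proj[idx1].mean(), proj[idx2].mean())   # the two class means separate
```

In practice $\mathbf{N}$ is often regularized as $\mathbf{N} + \varepsilon\mathbf{I}$ before inversion; the pseudo-inverse used here plays a similar stabilizing role.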

