Logistic regression

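Regression: the output is a continuous variable (in the simplest setting the inputs are continuous as well).
Classification: the output is a discrete variable.
The model is fit by maximum likelihood: write the joint probability of the training data as a function of the parameters, then adjust the parameters so that this probability is as large as possible.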

Evaluating the regression function

Given the parameters $w$, describe the joint probability:

$$
\begin{aligned}
&g(w^Tx) = \frac{1}{1+e^{-z}} = \frac{1}{1+e^{-w^Tx}}, \quad z = w^Tx\\
&\begin{cases}
P(y=1) = g(w^Tx)\\
P(y=0) = 1-g(w^Tx)
\end{cases}\\
\Rightarrow\ &P(\text{True}) = \left(g(w^Tx_i)\right)^{y_i}\left(1-g(w^Tx_i)\right)^{1-y_i}\\
\Rightarrow\ &L(\vec w) = \prod_{i=1}^m P(\text{True})\\
\Rightarrow\ &Loss(\vec w) = -\frac{1}{m}\log L(\vec w)
\end{aligned}
$$

Here $y_i$ is the true label. $P$ is the probability of the observed outcome under the current parameters, so it measures how well those parameters explain the data. With the training data fixed, the loss is a function of $w$ only and describes how the fit changes as $w$ changes.
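As a minimal sketch of the quantities above, the following pure-Python snippet evaluates the sigmoid, the per-sample probability, and the averaged negative log-likelihood. The function names (`sigmoid`, `predict_proba`, `neg_log_likelihood`), the clamping constant `eps`, and the toy data are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}), split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """P(y = 1 | x) = g(w^T x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def neg_log_likelihood(w, X, y, eps=1e-12):
    """Loss(w) = -(1/m) * sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, xi)
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite (assumed safeguard)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(X)

# Toy data (assumed): the first feature is a constant 1 acting as the bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]

loss_zero = neg_log_likelihood([0.0, 0.0], X, y)  # every p_i is 0.5, so the loss is log 2
loss_good = neg_log_likelihood([0.0, 2.0], X, y)  # weights that separate the toy data
```

A lower loss corresponds to a higher likelihood, so `loss_good` comes out smaller than `loss_zero` on this data.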
Deriving the likelihood function $L$ and the gradient of the loss function:

$$
\begin{aligned}
h_\theta(x) &= g(\theta^Tx) = \frac{1}{1+e^{-\theta^Tx}}\\
L(\theta) &= \prod_{i=1}^{m}\left(h_\theta(x_i)\right)^{y_i}\left(1-h_\theta(x_i)\right)^{1-y_i}\\
\Rightarrow \log L(\theta) &= \sum_{i=1}^m \left[y_i\log h_\theta(x_i)+(1-y_i)\log\left(1-h_\theta(x_i)\right)\right]\\
Loss(\theta) &= -\frac{1}{m}\log L(\theta)\\
\Rightarrow \frac{\partial}{\partial\theta_j}Loss(\theta) &= -\frac{1}{m}\sum_{i=1}^m\left(y_i\frac{1}{h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i) - (1-y_i)\frac{1}{1-h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i)\right)\\
&= -\frac{1}{m}\sum_{i=1}^m\left[y_i\frac{1}{h_\theta(x_i)}-(1-y_i)\frac{1}{1-h_\theta(x_i)}\right]\frac{\partial}{\partial\theta_j}h_\theta(x_i)\\
&= -\frac{1}{m}\sum_{i=1}^m\left[y_i\frac{1}{h_\theta(x_i)}-(1-y_i)\frac{1}{1-h_\theta(x_i)}\right]h_\theta(x_i)\left(1-h_\theta(x_i)\right)\frac{\partial}{\partial\theta_j}\theta^Tx_i\\
&= -\frac{1}{m}\sum_{i=1}^m\left[y_i\left(1-h_\theta(x_i)\right)-(1-y_i)h_\theta(x_i)\right]\frac{\partial}{\partial\theta_j}\theta^Tx_i\\
&= -\frac{1}{m}\sum_{i=1}^m\left[y_i\left(1-h_\theta(x_i)\right)-(1-y_i)h_\theta(x_i)\right]x_{ij}\\
&= \frac{1}{m}\sum_{i=1}^m\left(h_\theta(x_i)-y_i\right)x_{ij}
\end{aligned}
$$

Here $x_{ij}$ is the $j$-th component of sample $x_i$, and the third step uses the sigmoid identity $g'(z) = g(z)(1-g(z))$.
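One way to sanity-check the final expression is to compare it with a finite-difference estimate of the loss gradient. The sketch below does that in plain Python; the helper names, the toy data, and the step size `1e-6` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Adequate for the moderate z values used in this check.
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta, X, y):
    """Loss(theta) = -(1/m) * log L(theta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / len(X)

def gradient(theta, X, y):
    """Analytic result of the derivation: (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        for j, xij in enumerate(xi):
            grad[j] += (h - yi) * xij
    return [g / len(X) for g in grad]

def numeric_gradient(theta, X, y, step=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += step
        down[j] -= step
        grad.append((loss(up, X, y) - loss(down, X, y)) / (2 * step))
    return grad

# Toy data (assumed): first feature is a constant bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]
theta = [0.1, -0.3]

analytic = gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
```

If the derivation is right, `max_gap` should be tiny compared with the gradient entries themselves.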

Updating the parameters

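The derivation above gives the **partial derivative** of the loss with respect to each parameter. When the parameters are updated, each one moves a step against its partial derivative (gradient descent), with learning rate $\alpha$:

$$
\theta_j := \theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x_i)-y_i\right)x_{ij}
$$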
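The update rule can be sketched as a short batch gradient-descent loop. This is only an illustration under assumed settings: the learning rate `alpha=0.1`, the step count `steps=500`, the zero initialization, and the toy data are not from the original notes.

```python
import math

def sigmoid(z):
    # Adequate for the moderate z values reached in this toy run.
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def mean_loss(theta, X, y):
    """Averaged negative log-likelihood, used only to monitor progress."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / len(X)

def fit(X, y, alpha=0.1, steps=500):
    """Repeat theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(steps):
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        # Build the new vector from the old one so all theta_j update simultaneously.
        theta = [
            theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(n)
        ]
    return theta

# Toy data (assumed): first feature is a constant bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]

theta = fit(X, y)
predicted = [1 if hypothesis(theta, xi) >= 0.5 else 0 for xi in X]
```

Computing all `errors` before touching `theta` keeps the update simultaneous across coordinates, which matches the formula, where every $\theta_j$ on the right-hand side is the old value.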

Softmax for multi-class classification

For $k$ classes, the probability function is:

$$
h_\theta(x^{(i)}) =
\begin{bmatrix}
p(y^{(i)} = 1 \mid x^{(i)};\theta)\\
p(y^{(i)} = 2 \mid x^{(i)};\theta)\\
\vdots\\
p(y^{(i)} = k \mid x^{(i)};\theta)
\end{bmatrix}
= \frac{1}{\sum_{j=1}^k e^{\theta_j^Tx^{(i)}}}
\begin{bmatrix}
e^{\theta_1^Tx^{(i)}}\\
e^{\theta_2^Tx^{(i)}}\\
\vdots\\
e^{\theta_k^Tx^{(i)}}
\end{bmatrix}
$$
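A minimal sketch of this probability vector in plain Python follows. Subtracting the largest score before exponentiating is a common numerical-stability trick that leaves the result unchanged; it is an addition here, not something the formula above states. The parameter vectors `thetas` and the input `x` are made-up values.

```python
import math

def softmax(scores):
    """Map k real scores to k probabilities that sum to 1."""
    shift = max(scores)  # stability shift; cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(thetas, x):
    """h_theta(x): one score theta_j^T x per class, then softmax."""
    scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in thetas]
    return softmax(scores)

# Hypothetical parameters for k = 3 classes and a 2-feature input.
thetas = [[0.0, 1.0], [0.5, -1.0], [-0.5, 0.0]]
x = [1.0, 2.0]

probs = class_probabilities(thetas, x)
best_class = probs.index(max(probs))  # 0-based index of the most likely class
```

With $k = 2$ this reduces to the sigmoid model above, since only the difference between the two score vectors matters.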
