Gradient Descent Derivations for Classification and Regression

Logistic Regression Gradient Descent Derivation

    • 1. Related functions and their derivatives
      • 1.1 Linear regression formula
      • 1.2 Sigmoid function
    • 2. Logistic regression
      • 2.1 Loss function (cross-entropy)
      • 2.2 Derivative with respect to θ
      • 2.3 Logistic regression gradient descent update
    • 3. Linear regression
      • 3.1 Loss function (MSE)
      • 3.2 Linear regression gradient descent update

1. Related functions and their derivatives

  • Chain rule (checked numerically in the sketch below)
    $$\frac{\partial J_{(\theta)}}{\partial \theta}=\frac{\partial J_{(h)}}{\partial h}*\frac{\partial h_{(z)}}{\partial z}*\frac{\partial z_{(\theta)}}{\partial \theta}$$
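
Every gradient in this article is the product of these three factors. As a quick sanity check (a minimal sketch with made-up placeholder functions, not part of the original derivation), the product of the three local derivatives matches a finite-difference estimate of the end-to-end derivative:

```python
import numpy as np

# Toy composition J(h(z(theta))) with simple placeholder functions:
#   z(theta) = 3*theta,  h(z) = sin(z),  J(h) = h**2
def z(theta): return 3.0 * theta
def h(z_):    return np.sin(z_)
def J(h_):    return h_ ** 2

theta = 0.7

# Chain rule: dJ/dtheta = dJ/dh * dh/dz * dz/dtheta
dJ_dh     = 2.0 * h(z(theta))   # derivative of h**2
dh_dz     = np.cos(z(theta))    # derivative of sin(z)
dz_dtheta = 3.0                 # derivative of 3*theta
analytic  = dJ_dh * dh_dz * dz_dtheta

# Central finite difference of the full composition
eps = 1e-6
numeric = (J(h(z(theta + eps))) - J(h(z(theta - eps)))) / (2 * eps)

print(np.isclose(analytic, numeric))  # True
```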

1.1 Linear regression formula

  • Function
    $$z_{(\theta)}=\theta^T x$$
  • Derivative
    $$\begin{aligned} \frac{\partial z_{(\theta)}}{\partial \theta}&=\frac{\partial \theta^T x}{\partial \theta} \\&=x \end{aligned}$$

1.2 Sigmoid function

  • Function
    $$h_{(z)}=\frac{1}{1+e^{-z}}$$
  • Derivative (verified numerically in the sketch below)

$$\begin{aligned} \frac{\partial h_{(z)}}{\partial z}&=\frac{\partial\left(\frac{1}{1+e^{-z}}\right)}{\partial z} \\&=\frac{0-\frac{\partial(1+e^{-z})}{\partial z}}{(1+e^{-z})^2} \\&=\frac{e^{-z}}{(1+e^{-z})^2} \\&=\frac{1+e^{-z}-1}{(1+e^{-z})(1+e^{-z})} \\&=\frac{1+e^{-z}-1}{1+e^{-z}}*\frac{1}{1+e^{-z}} \\&=\left[1-\frac{1}{1+e^{-z}}\right]*\frac{1}{1+e^{-z}} \\&=(1-h_{(z)})*h_{(z)} \end{aligned}$$
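
A quick numeric check of the identity h'(z) = h(z)(1 − h(z)) (a minimal numpy sketch added for illustration; the test points are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Closed form derived above: h'(z) = h(z) * (1 - h(z))
    h = sigmoid(z)
    return h * (1.0 - h)

z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)  # central difference
print(np.allclose(sigmoid_grad(z), numeric, atol=1e-8))        # True
```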

2. Logistic regression

2.1 Loss function (cross-entropy)

  • Function
    $$Loss_{(\theta)}=-\frac{1}{m}\left[\sum_{i=1}^{m}\left(y^i*\log h_{(z)}+(1-y^i)*\log(1-h_{(z)})\right)\right]$$
  • Partial derivative with respect to h
    $$\begin{aligned} \frac{\partial Loss_{(h)}}{\partial h}&=-\frac{\partial\frac{1}{m}\left[\sum_{i=1}^{m}\left(y^i*\log h_{(z)}+(1-y^i)*\log(1-h_{(z)})\right)\right]}{\partial h} \\&=-\frac{1}{m}\left[\sum_{i=1}^{m}\left(y^i*\frac{1}{h_{(z)}}+(1-y^i)*\frac{1}{1-h_{(z)}}*(-1)\right)\right] \\&=-\frac{1}{m}\left[\sum_{i=1}^{m}\left(\frac{y^i}{h_{(z)}}-\frac{1-y^i}{1-h_{(z)}}\right)\right] \\&=-\frac{1}{m}\left[\sum_{i=1}^{m}\frac{y^i*(1-h_{(z)})+(y^i-1)*h_{(z)}}{h_{(z)}*(1-h_{(z)})}\right] \\&=-\frac{1}{m}\left[\sum_{i=1}^{m}\frac{y^i-h_{(z)}}{h_{(z)}*(1-h_{(z)})}\right] \\&=\frac{1}{m}\left[\sum_{i=1}^{m}\frac{h_{(z)}-y^i}{h_{(z)}*(1-h_{(z)})}\right] \end{aligned}$$

2.2 Derivative with respect to θ

$$\begin{aligned} \frac{\partial Loss_{(\theta)}}{\partial \theta_{j}}&=\frac{\partial Loss_{(h)}}{\partial h}*\frac{\partial h_{(z)}}{\partial z}*\frac{\partial z_{(\theta)}}{\partial \theta_{j}} \\&=\frac{1}{m}\left[\sum_{i=1}^{m}\frac{h_{(z)}-y^i}{h_{(z)}*(1-h_{(z)})}*(1-h_{(z)})*h_{(z)}*x^i_{j}\right] \\&=\frac{1}{m}\sum_{i=1}^{m}(h_{(z)}-y^i)*x^i_{j} \end{aligned}$$
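
Stacking the samples into a matrix, the last line becomes grad = (1/m)·Xᵀ(h − y). Below is a hedged numpy sketch (the names X, y, theta and the random toy data are my own, not from the article) that also checks the analytic gradient against a finite-difference estimate of the cross-entropy loss from section 2.1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(theta, X, y):
    # Loss from section 2.1: -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # Result of section 2.2: (1/m) * sum_i (h(z^i) - y^i) * x^i_j, vectorized
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# Finite-difference check of the analytic gradient
eps = 1e-6
numeric = np.array([(cross_entropy(theta + eps * e, X, y) -
                     cross_entropy(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(gradient(theta, X, y), numeric, atol=1e-6))  # True
```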

2.3 Logistic regression gradient descent update

  • Update rule for θⱼ (implemented in the sketch below)
    $$\begin{aligned} \theta_{j}&:=\theta_{j}-\alpha*\frac{\partial Loss_{(\theta)}}{\partial \theta_{j}} \\&:=\theta_{j}-\alpha*\frac{1}{m}\sum_{i=1}^{m}(h_{(z)}-y^i)*x^i_{j} \end{aligned}$$
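
The update rule drops straight into a batch gradient descent loop. A minimal sketch (the learning rate, iteration count, and toy dataset are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent: theta_j := theta_j - alpha * (1/m) * sum_i (h - y^i) * x^i_j."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)       # predictions for all m samples
        grad = X.T @ (h - y) / m     # gradient from section 2.2
        theta -= alpha * grad        # simultaneous update of every theta_j
    return theta

# Toy linearly separable data: label is 1 when x1 + x2 > 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(float)
print("training accuracy:", (preds == y).mean())  # close to 1.0
```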

3. Linear regression

3.1 Loss function (MSE)

  • Function (per-sample squared error)
    $$Loss_{(\theta)}=\frac{1}{2}(z_{(\theta)}-y^i)^2$$
  • Derivative
    $$\begin{aligned} \frac{\partial Loss_{(\theta)}}{\partial \theta_{j}}&=\frac{\partial\frac{1}{2}(z_{(\theta)}-y^i)^2}{\partial \theta_{j}} \\&=\frac{1}{2}*2*(z_{(\theta)}-y^i)*\frac{\partial z_{(\theta)}}{\partial \theta_{j}} \\&=(z_{(\theta)}-y^i)*x^i_{j} \end{aligned}$$

3.2 Linear regression gradient descent update

$$\begin{aligned} \theta_{j}&:=\theta_{j}-\alpha*\frac{\partial Loss_{(\theta)}}{\partial \theta_{j}} \\&:=\theta_{j}-\alpha*(z_{(\theta)}-y^i)*x^i_{j} \end{aligned}$$
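
Since the loss in section 3.1 is written per sample, this is the stochastic (per-sample) form of the update. A minimal sketch under that assumption (the toy data and hyperparameters are my own):

```python
import numpy as np

def fit_linear_sgd(X, y, alpha=0.01, n_epochs=50, seed=0):
    """Per-sample update: theta_j := theta_j - alpha * (z(theta) - y^i) * x^i_j."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):       # shuffle sample order each epoch
            error = X[i] @ theta - y[i]    # z(theta) - y^i for one sample
            theta -= alpha * error * X[i]  # gradient from section 3.1
    return theta

# Toy data generated from known weights (no noise), so SGD should recover them
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta

theta = fit_linear_sgd(X, y)
print(theta)  # approximately [ 2.  -1.   0.5]
```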
