Gradient Descent Derivation: Logistic Regression for Binary Classification

Table of Contents

  • Dataset Format
  • Binary Classification via Linear Regression + Sigmoid
  • Derivation via the Chain Rule
    • Chain Rule Expression
    • Solving $\frac{\partial L}{\partial g}$
    • Solving $\frac{\partial g}{\partial \sigma}$
    • Solving $\frac{\partial \sigma}{\partial z}$
    • Solving $\frac{\partial z}{\partial w}$
    • Solving $\frac{\partial z}{\partial b}$
  • Final Expressions
    • Gradient Expressions
    • Gradient Update Expressions
      • Update Rule for $w$
      • Update Rule for $b$

Dataset Format

In machine learning, a dataset is typically laid out as follows.
The features and label of the $i$-th sample are written as:
$$x^i=(x_1^i,x_2^i,x_3^i,\ldots,x_d^i)^T \in R^d, \qquad y^i \in R$$
The full dataset can then be written as:
$$X=[x^1,x^2,\ldots,x^n]= \begin{bmatrix} x^1_1& x^2_1 &\ldots &x^n_1 \\ x^1_2& x^2_2 &\ldots &x^n_2 \\ \vdots& \vdots & \ddots & \vdots\\ x^1_d& x^2_d &\ldots &x^n_d \\ \end{bmatrix} \in R^{d\times n}, \qquad y=[y^1,y^2,\ldots,y^n] \in R^n$$
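As a quick illustration of this layout (a minimal NumPy sketch with a hypothetical toy dataset; the variable names are illustrative), $X$ stores one sample per column:

```python
import numpy as np

# Hypothetical toy dataset: d = 3 features, n = 5 samples.
d, n = 3, 5
rng = np.random.default_rng(0)

X = rng.normal(size=(d, n))         # X in R^{d x n}: column i is sample x^i
y = rng.integers(0, 2, size=n)      # y in R^n: binary labels y^i in {0, 1}

print(X.shape)  # (3, 5)
print(y.shape)  # (5,)
```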

Binary Classification via Linear Regression + Sigmoid

For a single sample,
$$z = w^Tx+b = w_1x_1+ w_2x_2+\ldots+w_dx_d+b$$
The $sigmoid$ function maps this output into the interval $(0, 1)$, which is what turns the linear model into a binary classifier. The $sigmoid$ function is:
$$\sigma (z) = \frac{1}{1+e^{-z}} = \frac{e^z}{1+e^z}$$
Using cross-entropy as the loss function, the per-sample loss for the binary case is
$$g(z^i)=-y^i \log{(\sigma(z^i))}-(1-y^i) \log{(1-\sigma(z^i))}$$
so the overall loss over all $n$ samples can be written as
$$L=\frac{1}{n} \sum_{i=1}^{n}g(z^i)=\frac{1}{n} \sum_{i=1}^{n}\left(-y^i \log{(\sigma(z^i))}-(1-y^i) \log{(1-\sigma(z^i))}\right)$$
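A minimal NumPy sketch of these two formulas (the function names `sigmoid` and `cross_entropy_loss` are illustrative, and the `eps` guard is an added numerical-stability assumption, not part of the derivation):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(w, b, X, y):
    """L = (1/n) * sum_i [ -y^i log(sigma(z^i)) - (1 - y^i) log(1 - sigma(z^i)) ],
    where z^i = w^T x^i + b and X has one sample per column (shape d x n)."""
    z = w @ X + b                  # shape (n,)
    s = sigmoid(z)
    eps = 1e-12                    # guard against log(0)
    return np.mean(-y * np.log(s + eps) - (1 - y) * np.log(1 - s + eps))
```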

Derivation via the Chain Rule

Chain Rule Expression

Computing the derivatives with respect to $w$ and $b$ requires the chain rule.
The decomposition is:
$$\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial w_i}, \qquad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial b}$$

Solving $\frac{\partial L}{\partial g}$

Working with a single sample (the $\frac{1}{n}\sum$ averaging over samples is dropped here and can be reapplied at the end), $L$ as a function of $g$ is simply
$$L = g$$
Therefore
$$\frac{\partial L}{\partial g}=1$$

Solving $\frac{\partial g}{\partial \sigma}$

As a function of $\sigma$, $g$ can be written as
$$g=-y \log{(\sigma)}-(1-y) \log{(1-\sigma)}$$
which gives
$$\frac{\partial g}{\partial \sigma} = \frac{\partial \left(-y \log{(\sigma)}-(1-y) \log{(1-\sigma)}\right)}{\partial \sigma} = -y \frac{\partial \log{(\sigma)}}{\partial \sigma} -(1-y) \frac{\partial \log{(1-\sigma)}}{\partial \sigma} =-\frac{y}{\sigma} + \frac {1-y}{1-\sigma}$$

Solving $\frac{\partial \sigma}{\partial z}$

As a function of $z$, $\sigma$ is
$$\sigma (z) = \frac{1}{1+e^{-z}} = \frac{e^z}{1+e^z}$$
so
$$\frac{\partial \sigma}{\partial z}=\frac{\partial}{\partial z}\left(\frac{1}{1+e^{-z}}\right) =-\frac{1}{(1+e^{-z})^2}\times e^{-z} \times (-1) =\frac{e^{-z} }{(1+e^{-z})^2}=\sigma(1-\sigma)$$
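The identity $\sigma'(z)=\sigma(1-\sigma)$ is easy to sanity-check numerically, for instance with a central finite difference (a small self-contained sketch; the step size `h` is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
h = 1e-6

numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)   # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))                # sigma * (1 - sigma)

print(np.max(np.abs(numeric - analytic)))               # on the order of 1e-10
```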

Solving $\frac{\partial z}{\partial w}$

As a function of $w$, $z = w^Tx+b$, so
$$\frac{\partial z}{\partial w_i}=x_i,\quad i=1,2,\ldots,d$$

Solving $\frac{\partial z}{\partial b}$

As a function of $b$, $z = w^Tx+b$, so
$$\frac{\partial z}{\partial b}=1$$

Final Expressions

Gradient Expressions

$$\begin{aligned} \frac{\partial L}{\partial w_i} &= \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial w_i} = 1\times\left(-\frac{y}{\sigma} + \frac {1-y}{1-\sigma} \right) \times \sigma(1-\sigma) \times x_i = x_i\left(-y(1-\sigma)+\sigma(1-y)\right) = x_i(\sigma-y) \\ \frac{\partial L}{\partial b} &= \frac{\partial L}{\partial g} \frac{\partial g}{\partial \sigma} \frac{\partial \sigma}{\partial z} \frac{\partial z}{\partial b} = 1\times\left(-\frac{y}{\sigma} + \frac {1-y}{1-\sigma} \right) \times \sigma(1-\sigma) \times 1 = -y(1-\sigma)+\sigma(1-y) = \sigma-y \end{aligned}$$
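In vectorized form over all $n$ samples, these per-sample gradients average to $\frac{\partial L}{\partial w} = \frac{1}{n}X(\sigma-y)$ and $\frac{\partial L}{\partial b} = \frac{1}{n}\sum_i(\sigma^i-y^i)$. A sketch assuming the column-wise $X$ layout from earlier (the function name `gradients` is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(w, b, X, y):
    """Gradients of the averaged loss:
    dL/dw = (1/n) * X (sigma - y),  dL/db = (1/n) * sum_i (sigma^i - y^i)."""
    n = X.shape[1]
    s = sigmoid(w @ X + b)         # predicted probabilities, shape (n,)
    grad_w = X @ (s - y) / n       # shape (d,)
    grad_b = np.mean(s - y)        # scalar
    return grad_w, grad_b
```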

Gradient Update Expressions

Update Rule for $w$

Since
$$\frac{\partial L}{\partial w_i} =x_i(\sigma-y)$$
the update for each component, with learning rate $\eta$, is
$$w_i=w_i-\eta\frac{\partial L}{\partial w_i} = w_i-\eta x_i(\sigma-y)$$

Stacking all $d$ components:
$$\begin{bmatrix} w_1\\ w_2\\ \vdots \\ w_d\\ \end{bmatrix}=\begin{bmatrix} w_1\\ w_2\\ \vdots \\ w_d\\ \end{bmatrix}-\eta(\sigma-y)\begin{bmatrix} x_1\\ x_2\\ \vdots \\ x_d\\ \end{bmatrix}$$

or, in vector form,
$$w=w-\eta(\sigma-y)x$$

Update Rule for $b$

Since
$$\frac{\partial L}{\partial b} =\sigma-y$$
the update rule is
$$b=b-\eta\frac{\partial L}{\partial b} = b-\eta(\sigma-y)$$
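Putting the update rules together, batch gradient descent for this model might look like the following (a minimal sketch, not the article's code; the learning rate, iteration count, and the random data in the usage example are arbitrary assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, eta=0.1, n_iters=1000):
    """Batch gradient descent; X is d x n (one sample per column), y in {0,1}^n."""
    d, n = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        s = sigmoid(w @ X + b)          # predicted probabilities, shape (n,)
        w = w - eta * X @ (s - y) / n   # w <- w - eta * dL/dw
        b = b - eta * np.mean(s - y)    # b <- b - eta * dL/db
    return w, b

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))
true_w = np.array([1.0, -2.0, 0.5])
y = (true_w @ X + 0.3 > 0).astype(float)   # labels from a known linear rule
w, b = train_logistic_regression(X, y)
print(w, b)
```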
