Detailed Derivation of the Gradient Descent Formula for Logistic Regression

The Gradient Descent Formula for Logistic Regression

The cost function of logistic regression is:

$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$$

Taking a gradient step of size α on this cost gives the update rule:

$$\theta_{j}:=\theta_{j}-\frac{\alpha}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$$
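As a sketch of how this update rule looks in code (the function names, learning rate, and toy data below are my own illustrations, not from the original post), batch gradient descent for logistic regression can be written in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.5, n_iters=5000):
    # Batch gradient descent for logistic regression.
    # X: (m, n) design matrix with a leading column of ones for the intercept.
    # y: (m,) labels in {0, 1}. alpha and n_iters are illustrative defaults.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for every example at once
        grad = X.T @ (h - y) / m      # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) x_j^(i)
        theta -= alpha * grad         # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

# Toy separable data: x < 1.5 labelled 0, x > 1.5 labelled 1.
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1.])
theta = gradient_descent(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
```

Note that the vectorized line `X.T @ (h - y) / m` computes the whole gradient vector at once, i.e. every partial derivative ∂J/∂θ_j simultaneously, rather than looping over j.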

Detailed Derivation

Preliminaries

The sigmoid function and its derivative (a detailed derivation is given in the reference linked below):

$$g(x)=\frac{1}{1+e^{-x}}$$

$$g^{\prime}(x)=g(x)\left(1-g(x)\right)$$
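This identity can be sanity-checked numerically by comparing the analytic form g(x)(1 − g(x)) against a central finite difference (the test point and tolerance below are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(x):
    # g(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7       # arbitrary test point
eps = 1e-6    # step for the central difference

# Numerical derivative: (g(x + eps) - g(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

# Analytic derivative from the identity g'(x) = g(x) * (1 - g(x))
analytic = sigmoid(x) * (1 - sigmoid(x))

print(abs(numeric - analytic))  # should be tiny (roundoff-level)
```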

Applying the chain rule to J(θ) term by term, the derivation proceeds as follows:

$$\begin{aligned} \frac{\partial J(\theta)}{\partial \theta_{j}} &=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \frac{1}{h_{\theta}\left(x^{(i)}\right)} \frac{\partial h_{\theta}\left(x^{(i)}\right)}{\partial \theta_{j}}-\left(1-y^{(i)}\right) \frac{1}{1-h_{\theta}\left(x^{(i)}\right)} \frac{\partial h_{\theta}\left(x^{(i)}\right)}{\partial \theta_{j}}\right) \\ &=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \frac{1}{g\left(\theta^{T} x^{(i)}\right)}-\left(1-y^{(i)}\right) \frac{1}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot \frac{\partial g\left(\theta^{T} x^{(i)}\right)}{\partial \theta_{j}} \\ &=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \frac{1}{g\left(\theta^{T} x^{(i)}\right)}-\left(1-y^{(i)}\right) \frac{1}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot g\left(\theta^{T} x^{(i)}\right)\left(1-g\left(\theta^{T} x^{(i)}\right)\right) x_{j}^{(i)} \\ &=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}\left(1-g\left(\theta^{T} x^{(i)}\right)\right)-\left(1-y^{(i)}\right) g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \\ &=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}-g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \\ &=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \cdot x_{j}^{(i)} \end{aligned}$$
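The final line of the derivation can itself be verified numerically: compute ∂J/∂θ_j by central differences on the cost function and compare it against the closed form (1/m) Σ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x_j⁽ⁱ⁾. The random data and seed below are arbitrary illustrations of mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum [ y*log(h) + (1-y)*log(1-h) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def grad(theta, X, y):
    # Closed form from the derivation: (1/m) * sum (h - y) * x_j
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m

# Random small problem: 5 examples, intercept + 2 features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y = np.array([0., 1., 1., 0., 1.])
theta = rng.normal(size=3)

# Central-difference approximation of each partial derivative.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
analytic = grad(theta, X, y)
```

If the derivation above is correct, `numeric` and `analytic` should agree to within the finite-difference error.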




References

Essential math formulas (graduate entrance exam reference): https://blog.csdn.net/zhaohongfei_358/article/details/106039576

Derivation of the sigmoid function's derivative: https://blog.csdn.net/zhaohongfei_358/article/details/119274445
