Differentiating the Softmax Regression Loss Function

The softmax regression cost function:

$$
J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y^{(i)}=j\}\log \frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}\right]
$$
Here $1\{y^{(i)}=j\}$ is the indicator function: it equals 1 when $y^{(i)}$ belongs to class $j$, and 0 otherwise.
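As a concrete illustration, the cost can be evaluated directly with NumPy. The following is a minimal sketch (not code from the original post): it assumes `X` is an m×n matrix with one sample per row, `y` holds integer labels in {0, …, k−1}, and `theta` is a k×n matrix whose j-th row plays the role of $\theta_j$; the max-subtraction is a standard numerical-stability trick that does not change the value.

```python
import numpy as np

def softmax_cost(theta, X, y):
    """J(theta) for softmax regression.

    theta : (k, n) parameter matrix, row j plays the role of theta_j
    X     : (m, n) design matrix, row i is X^(i)
    y     : (m,)   integer class labels in {0, ..., k-1}
    """
    m = X.shape[0]
    scores = X @ theta.T                                  # scores[i, j] = theta_j^T X^(i)
    scores = scores - scores.max(axis=1, keepdims=True)   # stability shift, cancels in the ratio
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # the indicator 1{y^(i)=j} keeps only the true class's log-probability
    return -log_probs[np.arange(m), y].mean()

# tiny usage example with made-up numbers
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                               # m = 5 samples, n = 3 features
y = np.array([0, 2, 1, 1, 0])                             # labels in {0, 1, 2}
theta = rng.normal(size=(3, 3))                           # k = 3 classes
print(softmax_cost(theta, X, y))
```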


Differentiating the loss function:

$$
\begin{aligned}
\nabla_{\theta_j}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left[\nabla_{\theta_j}\sum_{j=1}^{k}1\{y^{(i)}=j\}\log \frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}\right] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[1\{y^{(i)}=j\}\cdot\frac{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}{e^{\theta_j^T X^{(i)}}}\cdot\left(-\frac{e^{\theta_j^T X^{(i)}}\cdot X^{(i)}\cdot e^{\theta_j^T X^{(i)}}}{\left(\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}\right)^2} + \frac{e^{\theta_j^T X^{(i)}}\cdot X^{(i)}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}\right)\right] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[1\{y^{(i)}=j\}\cdot\frac{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}} - e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}\cdot X^{(i)}\right]
\end{aligned}
$$


In the first step of the derivation, we first use the derivative of the logarithm (here log is taken to be the natural logarithm ln):

$$
(\ln x)' = \frac{1}{x}
$$
and then the derivative of the exponential function:

$$
(e^x)' = e^x
$$

Also, the differentiation is with respect to a single parameter vector $\theta_j$, so the numerators $e^{\theta_c^T X^{(i)}}$ of the other classes ($c \neq j$) do not involve $\theta_j$; the step above therefore only differentiates the term whose numerator contains $\theta_j$. (Every term's denominator $\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}$ still depends on $\theta_j$, and that contribution is added below.)
If this step is still unclear, it helps to check the derivative on a small worked example, as in the sketch below.
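Here is a small symbolic sketch (illustrative only: a scalar feature and two classes, with made-up symbols `x`, `t1`, `t2`) that checks the two derivative cases, i.e. differentiating a class's log-probability with respect to its own parameter versus the other class's parameter.

```python
import sympy as sp

# scalar feature x and parameters t1, t2 of two classes (illustrative names)
x, t1, t2 = sp.symbols('x t1 t2', real=True)

p1 = sp.exp(t1 * x) / (sp.exp(t1 * x) + sp.exp(t2 * x))   # softmax probability of class 1
log_p1 = sp.log(p1)

d_own   = sp.diff(log_p1, t1)   # derivative w.r.t. the class's own parameter
d_other = sp.diff(log_p1, t2)   # derivative w.r.t. the other class's parameter

# expected forms: x*(1 - p1) and -x*p1; check at an arbitrary numeric point
vals = {x: 0.7, t1: 0.3, t2: -1.2}
print(float((d_own - x * (1 - p1)).subs(vals)))   # ~ 0.0
print(float((d_other + x * p1).subs(vals)))       # ~ 0.0
```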

For each sample, the estimated probability that it belongs to class $j$ is:

$$
P(y^{(i)}=j \mid X^{(i)};\theta) = \frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}
$$
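In code, these probabilities can be computed for all samples and classes at once. A minimal sketch with the same illustrative layout as above (`theta` is k×n, `X` is m×n); again, the max-subtraction is only for numerical stability and is not part of the formula itself.

```python
import numpy as np

def softmax_probs(theta, X):
    """P(y = j | X; theta) for every sample and class.

    theta : (k, n), X : (m, n)  ->  (m, k) matrix whose rows sum to 1.
    """
    scores = X @ theta.T
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical-stability shift
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```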

The terms with class index $c \neq j$ also depend on $\theta_j$ through the shared denominator: for such a term, $\nabla_{\theta_j}\log\frac{e^{\theta_c^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}} = -P(y^{(i)}=j\mid X^{(i)};\theta)\cdot X^{(i)}$. Adding this contribution to the term computed above (and noting that exactly one indicator equals 1 for each sample), the final result is:

$$
\begin{aligned}
\nabla_{\theta_j}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left[\left(1\{y^{(i)}=j\}\cdot\frac{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}} - e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}} - \left(1-1\{y^{(i)}=j\}\right)\cdot\frac{e^{\theta_j^T X^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T X^{(i)}}}\right)\cdot X^{(i)}\right] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\left[X^{(i)}\cdot\left(1\{y^{(i)}=j\} - P(y^{(i)}=j\mid X^{(i)};\theta)\right)\right]
\end{aligned}
$$
Here $\theta_j$ denotes a vector (the parameter vector of class $j$).
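The final expression vectorizes cleanly: stacking the indicators $1\{y^{(i)}=j\}$ into a one-hot matrix gives every $\nabla_{\theta_j}J(\theta)$ at once. A minimal sketch, reusing the illustrative `softmax_probs` defined above.

```python
import numpy as np

def softmax_grad(theta, X, y):
    """Gradient of J(theta); row j of the result is nabla_{theta_j} J(theta)."""
    m, k = X.shape[0], theta.shape[0]
    probs = softmax_probs(theta, X)          # (m, k): P(y^(i)=j | X^(i); theta), from the sketch above

    one_hot = np.zeros((m, k))
    one_hot[np.arange(m), y] = 1.0           # 1{y^(i) = j}

    # -(1/m) * sum_i (1{y^(i)=j} - P(y^(i)=j | X^(i); theta)) * X^(i), for every j at once
    return -(one_hot - probs).T @ X / m
```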
The parameters can then be updated with the gradient descent rule:

$$
\theta_j := \theta_j - \alpha\nabla_{\theta_j}J(\theta)
$$
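A possible update loop, reusing the `softmax_cost` and `softmax_grad` sketches above together with the small `X`, `y`, `theta` example; the learning rate and iteration count are illustrative choices.

```python
# illustrative hyperparameters; reuses softmax_cost / softmax_grad from the sketches above
alpha, n_iters = 0.1, 200
for _ in range(n_iters):
    theta = theta - alpha * softmax_grad(theta, X, y)   # theta_j := theta_j - alpha * grad_j
print(softmax_cost(theta, X, y))                         # cost should have decreased
```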

For a more detailed treatment of Softmax Regression, see the UFLDL tutorial.
