Logistic Regression (Log-Odds Regression)

Table of Contents

  • Introduction to LR
  • Loss Function
  • References

Introduction to LR

Logistic regression, also known as log-odds regression, is a classic linear method for classification; it belongs to the family of log-linear models.

Linear regression fits the data with a linear function; by passing its output through a sigmoid function, we can build a classifier on top of the linear regression model.

The sigmoid function is defined as

$$y = \frac{1}{1 + e^{-z}}$$
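As a quick illustration, here is a minimal NumPy sketch of the sigmoid function (the function name is ours; splitting the positive and negative cases is just a common trick to avoid overflow in `exp`):

```python
import numpy as np

def sigmoid(z):
    """y = 1 / (1 + exp(-z)), computed in a numerically stable way."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    exp_z = np.exp(z[~pos])          # safe: z < 0 here, so exp(z) <= 1
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

print(sigmoid([-5.0, 0.0, 5.0]))     # ~[0.0067 0.5 0.9933]
```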

*(Figure 1: the sigmoid curve)*

Taking binary classification as an example, with $y \in \{0, 1\}$, the binomial logistic regression model is defined by the following conditional probability distributions:

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)}$$

$$P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)}$$
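A small numeric sketch of these two probabilities, with hypothetical values for $w$, $b$, and $x$:

```python
import numpy as np

w = np.array([0.8, -0.4])   # hypothetical weights
b = 0.1                     # hypothetical bias
x = np.array([1.5, 2.0])    # a hypothetical input

z = w @ x + b
p1 = np.exp(z) / (1.0 + np.exp(z))   # P(Y=1|x)
p0 = 1.0 / (1.0 + np.exp(z))         # P(Y=0|x)
print(p1, p0, p1 + p0)               # the two probabilities sum to 1
```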

The odds of an event are the ratio of the probability that it occurs to the probability that it does not. If the event occurs with probability $p$, its odds are $\frac{p}{1-p}$, and its log-odds are

$$\log \frac{p}{1-p}$$

For example, an event with probability $p = 0.8$ has odds $0.8/0.2 = 4$ and log-odds $\log 4$.

For the logistic regression model,

$$\log \frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = w \cdot x + b$$

In other words, the log-odds of the output $Y = 1$ is a linear function of the input $x$.
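We can verify this relation numerically, reusing the toy values from the sketch above:

```python
import numpy as np

w, b = np.array([0.8, -0.4]), 0.1
x = np.array([1.5, 2.0])

z = w @ x + b
p = np.exp(z) / (1.0 + np.exp(z))   # P(Y=1|x)
print(np.log(p / (1.0 - p)), z)     # both equal w·x + b = 0.5
```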

Loss Function

Given a training data set, we estimate the model parameters by maximum likelihood. The likelihood function is

$$\prod_{i=1}^N \big[P(y_i=1 \mid x_i)\big]^{y_i} \big[1 - P(y_i=1 \mid x_i)\big]^{1-y_i}$$
and the log-likelihood is

$$\begin{aligned} L(w,b) &= \sum_{i=1}^N \Big[ y_i \log P(y_i=1 \mid x_i) + (1-y_i)\log\big(1 - P(y_i=1 \mid x_i)\big) \Big] \\ &= \sum_{i=1}^N \Big[ y_i \log \frac{P(y_i=1 \mid x_i)}{1 - P(y_i=1 \mid x_i)} + \log\big(1 - P(y_i=1 \mid x_i)\big) \Big] \\ &= \sum_{i=1}^N \Big[ y_i (w \cdot x_i + b) - \log\big(1 + \exp(w \cdot x_i + b)\big) \Big] \end{aligned}$$
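A short sketch of the log-likelihood in its final form (the helper name `log_likelihood` is ours; `np.logaddexp(0, z)` computes `log(1 + exp(z))` without overflow):

```python
import numpy as np

def log_likelihood(w, b, X, y):
    """L(w, b) = sum_i [ y_i (w·x_i + b) - log(1 + exp(w·x_i + b)) ]."""
    z = X @ w + b
    return np.sum(y * z - np.logaddexp(0.0, z))

# Toy check with two samples and one feature.
X = np.array([[1.0], [-2.0]])
y = np.array([1.0, 0.0])
print(log_likelihood(np.array([0.5]), 0.0, X, y))   # ~ -0.787
```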

Maximizing $L(w,b)$ then yields the estimates of $w$ and $b$. In practice, we usually turn this into the equivalent minimization problem

$$L(w,b) = -\sum_{i=1}^N \Big[ y_i (w \cdot x_i + b) - \log\big(1 + \exp(w \cdot x_i + b)\big) \Big]$$
The methods most commonly used to solve it are gradient descent and Newton's method.

From here on we write $\theta$ for the parameters, absorbing the bias $b$ into the weight vector by appending a constant 1 to $x$ (the augmented representation used below).

The gradient-descent parameter update is

$$\theta \gets \theta - \alpha \frac{\partial L(\theta)}{\partial \theta}$$
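To illustrate the update rule itself, here is a minimal generic gradient-descent loop (function and parameter names are ours), applied to a one-dimensional quadratic:

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, n_steps=100):
    """Iterate theta <- theta - alpha * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - alpha * grad(theta)
    return theta

# Example: minimize f(t) = (t - 3)^2, whose gradient is 2(t - 3); converges to 3.
print(gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=[0.0]))
```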
Newton's method iterates

$$\theta^{t+1} = \theta^{t} - \left(\frac{\partial^2 L(\theta)}{\partial \theta^2}\right)^{-1} \frac{\partial L(\theta)}{\partial \theta}$$

or, in vector form,

$$\theta^{t+1} = \theta^{t} - \left(\frac{\partial^2 L(\theta)}{\partial \theta\, \partial \theta^T}\right)^{-1} \frac{\partial L(\theta)}{\partial \theta}$$
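A single Newton step could be sketched as follows (`grad` and `hess` are assumed callables returning the gradient vector and Hessian matrix):

```python
import numpy as np

def newton_step(theta, grad, hess):
    """theta^{t+1} = theta^t - H(theta)^{-1} g(theta)."""
    g, H = grad(theta), hess(theta)
    # Solve H d = g rather than forming the inverse explicitly.
    return theta - np.linalg.solve(H, g)
```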

We now derive the first and second derivatives with respect to $\theta$.

Consider the cost function for a single sample,

$$L(\theta) = -\big[y \log \hat{y} + (1-y)\log(1-\hat{y})\big]$$

where $\hat{y} = \frac{1}{1 + \exp(-z)}$ and $z = \theta^T x$.

By the chain rule,

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial \theta}, \qquad \frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}}\frac{\partial \hat{y}}{\partial z}$$
We have

$$\frac{\partial L}{\partial \hat{y}} = \frac{\hat{y} - y}{\hat{y}(1-\hat{y})}$$

$$\begin{aligned} \frac{\partial \hat{y}}{\partial z} &= \Big(\frac{1}{1 + \exp(-z)}\Big)' \\ &= \frac{1}{1+\exp(-z)} - \frac{1}{\big(1 + \exp(-z)\big)^2} \\ &= \hat{y}(1-\hat{y}) \end{aligned}$$
Multiplying the two gives

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}}\frac{\partial \hat{y}}{\partial z} = \hat{y} - y$$
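A quick finite-difference check of this result for a single hypothetical point $(z, y)$:

```python
import numpy as np

z, y, eps = 0.7, 1.0, 1e-6
s = lambda t: 1.0 / (1.0 + np.exp(-t))                        # y_hat
L = lambda t: -(y * np.log(s(t)) + (1 - y) * np.log(1 - s(t)))
numeric = (L(z + eps) - L(z - eps)) / (2 * eps)               # dL/dz, numerically
print(numeric, s(z) - y)                                      # both ~ -0.3318
```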
Then

$$\begin{aligned} \frac{\partial L}{\partial \theta} &= \frac{\partial L}{\partial z}\frac{\partial z}{\partial \theta} \\ &= (\hat{y} - y)\frac{\partial z}{\partial \theta} \\ &= (\hat{y} - y)\,x \end{aligned}$$
Here $x$ is the augmented input, so we obtain the update rule

$$\begin{aligned} \hat{\theta} &= \theta - \alpha\frac{\partial L}{\partial \theta} \\ &= \theta - \alpha(\hat{y} - y)\,x \end{aligned}$$
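Putting the pieces together, a minimal batch gradient-descent trainer (a sketch under our own naming; the per-sample gradient $(\hat{y} - y)x$ is averaged over the data set, and $x$ is augmented with a constant 1 so that $\theta$ holds $w$ and $b$ together):

```python
import numpy as np

def fit_logistic_gd(X, y, alpha=0.5, n_iter=2000):
    """Batch gradient descent using grad = mean_i (y_hat_i - y_i) x_i."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented inputs
    theta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(Xa @ theta)))
        theta -= alpha * Xa.T @ (y_hat - y) / len(y)
    return theta

# Toy data: the label depends on the sign of the first feature (plus noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
print(fit_logistic_gd(X, y))   # the first weight comes out clearly positive
```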

By the same reasoning, the second derivative is

$$\frac{\partial^2 L(\theta)}{\partial \theta\, \partial \theta^T} = x x^T \hat{y}(1-\hat{y})$$
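With the gradient and Hessian in hand, Newton's method for the whole data set could be sketched as below (names ours; the small ridge term guards against a singular Hessian, and note that on linearly separable data the iteration can diverge, since the optimum then lies at infinity):

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=10):
    """Newton iterations with g = X^T (y_hat - y) and
    H = X^T diag(y_hat * (1 - y_hat)) X."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented inputs
    theta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(Xa @ theta)))
        g = Xa.T @ (y_hat - y)
        W = y_hat * (1.0 - y_hat)                   # per-sample curvature weights
        H = Xa.T @ (Xa * W[:, None]) + 1e-8 * np.eye(Xa.shape[1])
        theta -= np.linalg.solve(H, g)
    return theta
```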

References

Zhihu – 对数几率回归 (log-odds regression)
李航 (Li Hang), 统计学习方法 (Statistical Learning Methods)
周志华 (Zhou Zhihua), 机器学习 (Machine Learning)
