Deriving Logistic Regression by Hand (logistic regression)


1. Constructing the hypothesis function

Logistic regression can be viewed as a linear regression passed through a sigmoid activation function.

Linear regression: $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$

Sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$

So the hypothesis function is $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$,
where $h_\theta(x)$ is the predicted probability that the sample is a positive example.
That is:
$$
\begin{cases}
P(y=1 \mid x;\theta) = h_\theta(x) \\
P(y=0 \mid x;\theta) = 1 - h_\theta(x)
\end{cases} \tag{1}
$$

Equation (1) can be merged into a single expression:
$$
P(y \mid x;\theta) = h_\theta(x)^{y} \left(1-h_\theta(x)\right)^{1-y} \tag{2}
$$
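As a quick sketch in NumPy (the helper names `sigmoid`, `hypothesis`, and `prob` are mine, added for illustration):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x); X has shape (m, n), theta has shape (n,)."""
    return sigmoid(X @ theta)

def prob(theta, X, y):
    """Equation (2): P(y | x; theta) = h^y * (1 - h)^(1 - y)."""
    h = hypothesis(theta, X)
    return h ** y * (1.0 - h) ** (1.0 - y)
```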

2. Constructing the loss function

Below are two different ways to construct the loss function:
the first comes from Andrew Ng's machine learning course;
the second builds the cost function probabilistically, via maximum likelihood.

Constructing the loss function directly

Construct the cost for a single example:
$$
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if } y = 1 \\
-\log(1 - h_\theta(x)) & \text{if } y = 0
\end{cases}
$$
(Figure 1: the per-example cost as a function of $h_\theta(x)$, for $y=1$ and $y=0$.)
As Figure 1 shows:
when $y = 1$, if the hypothesis predicts 1 the cost is 0,
and the closer the prediction gets to 0, the larger the cost grows.
The case $y = 0$ is symmetric.
Merging the two cases into one expression:
$$
\mathrm{Cost}(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right) \right]
$$
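A minimal NumPy implementation of this merged cost (illustrative names; the small `eps` guards against `log(0)` in floating point, which the math above doesn't need):

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """Cross-entropy cost: -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # h_theta(x) for every example
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```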

Constructing the loss function via maximum likelihood

The goal of optimization is to make the predicted $P(y \mid x;\theta)$ match the observations as closely as possible, i.e., to maximize
$$
\prod_{i=1}^m P(y^{(i)} \mid x^{(i)};\theta).
$$
Construct the likelihood function:
$$
L(\theta) = \prod_{i=1}^m P(y^{(i)} \mid x^{(i)};\theta) = \prod_{i=1}^m h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1-y^{(i)}}
$$
Take the logarithm:
$$
\begin{aligned}
l(\theta) &= \sum_{i=1}^m \log\left( h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1-y^{(i)}} \right) \\
&= \sum_{i=1}^m \left[ \log h_\theta(x^{(i)})^{y^{(i)}} + \log\left(1 - h_\theta(x^{(i)})\right)^{1-y^{(i)}} \right] \\
&= \sum_{i=1}^m \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]
\end{aligned}
$$
Maximizing the log-likelihood can be turned into minimizing a cost function:
let $J(\theta) = -\frac{1}{m} l(\theta)$.
The final form of the loss function:
$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right) \right]
$$
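A quick numerical check, on made-up data, that $-\frac{1}{m} l(\theta)$ equals the cross-entropy cost from the previous subsection:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # made-up design matrix
y = rng.integers(0, 2, size=5)         # made-up labels in {0, 1}
theta = rng.normal(size=3)

h = 1.0 / (1.0 + np.exp(-(X @ theta)))
log_likelihood = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
J = -log_likelihood / len(y)           # J(theta) = -(1/m) * l(theta)
cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
assert np.isclose(J, cross_entropy)    # the two constructions agree
```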

3. Optimizing the loss function (gradient descent)

The sigmoid derivative identity:
$$
g'(z) = g(z)(1 - g(z))
$$
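This identity is easy to confirm numerically with a central finite difference (a sketch; the step size is arbitrary):

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))
z = np.linspace(-5, 5, 11)
numeric = (g(z + 1e-6) - g(z - 1e-6)) / 2e-6   # central difference
analytic = g(z) * (1 - g(z))                   # g'(z) = g(z)(1 - g(z))
assert np.allclose(numeric, analytic, atol=1e-8)
```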

Differentiating $J(\theta)$ with respect to $\theta_j$ and applying the identity above:
$$
\begin{aligned}
\Delta\theta_j = \frac{\partial J(\theta)}{\partial \theta_j}
&= -\frac{1}{m} \sum_{i=1}^{m}\left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} h_\theta(x^{(i)})\left(1-h_\theta(x^{(i)})\right) x_j^{(i)} - \frac{1-y^{(i)}}{1-h_\theta(x^{(i)})} h_\theta(x^{(i)})\left(1-h_\theta(x^{(i)})\right) x_j^{(i)} \right] \\
&= -\frac{1}{m} \sum_{i=1}^{m}\left[ y^{(i)}\left(1-h_\theta(x^{(i)})\right) x_j^{(i)} - (1-y^{(i)})\, h_\theta(x^{(i)})\, x_j^{(i)} \right] \\
&= -\frac{1}{m} \sum_{i=1}^{m}\left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
\end{aligned}
$$

Parameter update:
$$
\theta_j := \theta_j - \alpha \Delta\theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m}\left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$
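Putting the derivation together, a minimal batch gradient-descent loop (all names and hyperparameters are illustrative; `X` is assumed to already include a column of ones for $\theta_0$):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the cross-entropy cost.
    X: (m, n) design matrix (first column of ones for theta_0), y: (m,) in {0,1}.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = (X.T @ (h - y)) / m     # (1/m) * sum (h - y) * x
        theta -= lr * grad             # theta_j := theta_j - alpha * grad_j
    return theta
```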

4. Regularization

L1 regularization:
$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right) \right] + \lambda \lVert\theta\rVert_1
$$
L2 regularization:
$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right) \right] + \lambda \lVert\theta\rVert_2^2
$$
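In code, the L2 penalty only adds a term to the gradient: with the cost $J(\theta) + \lambda\lVert\theta\rVert_2^2$ above, its gradient contribution is $2\lambda\theta$. A sketch, following the common convention of leaving the bias $\theta_0$ unpenalized (that convention is my addition, not part of the formula above):

```python
import numpy as np

def grad_l2(theta, X, y, lam=1.0):
    """Gradient of J(theta) + lam * ||theta||_2^2; theta[0] is the bias."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    grad = (X.T @ (h - y)) / m         # gradient of the unregularized cost
    reg = 2.0 * lam * theta            # gradient of lam * ||theta||_2^2
    reg[0] = 0.0                       # conventionally, do not penalize the bias
    return grad + reg
```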

5. Why does logistic regression use the sigmoid function?

The exponential family: probability distributions that can be written in the exponential form
$$
f_X(x \mid \theta) = h(x)\exp\left(\eta(\theta)\, T(x) - A(\theta)\right)
$$

Writing the Bernoulli distribution in exponential-family form:
$$
\begin{aligned}
P(y;\phi) &= \phi^{y}(1-\phi)^{1-y} \\
&= \exp\left( y\log\phi + (1-y)\log(1-\phi) \right) \\
&= \exp\left( y\log\frac{\phi}{1-\phi} + \log(1-\phi) \right)
\end{aligned}
$$


From this we can read off the natural parameter:
$$
\eta = \log\frac{\phi}{1-\phi}
$$
Solving for $\phi$:
$$
\phi = \frac{1}{1+e^{-\eta}}
$$
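A small sanity check that inverting the logit recovers the sigmoid:

```python
import numpy as np

phi = np.linspace(0.01, 0.99, 50)
eta = np.log(phi / (1 - phi))           # natural parameter of the Bernoulli
recovered = 1.0 / (1.0 + np.exp(-eta))  # phi = 1 / (1 + e^{-eta})
assert np.allclose(recovered, phi)
```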

The probability model underlying logistic regression is the Bernoulli distribution, and the sigmoid is exactly the inverse of its natural parameter.

Why not use the mean squared error as the loss function?

Because with a sigmoid hypothesis, the mean-squared-error objective is non-convex in $\theta$, so gradient descent can get stuck in local optima; the cross-entropy loss above is convex.
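A small numeric illustration with made-up values (one feature, one example, $y=1$): the second difference of the squared-error curve changes sign as $\theta$ varies, while the cross-entropy curve's second difference stays nonnegative:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x, y = 1.0, 1.0                        # one made-up example with label y = 1
theta = np.linspace(-6, 6, 1001)       # sweep the single parameter

h = sigmoid(theta * x)
mse = (h - y) ** 2                                       # squared-error loss
xent = -(y * np.log(h) + (1 - y) * np.log(1 - h))        # cross-entropy loss

d2_mse = np.diff(mse, 2)               # discrete second derivative
d2_xent = np.diff(xent, 2)
print("MSE curvature changes sign:", d2_mse.min() < 0 < d2_mse.max())  # True
print("Cross-entropy stays convex:", (d2_xent >= -1e-12).all())        # True
```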
