Deep Learning: Deriving Linear Regression

Linear representation

$$f(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

written as

$$f(x) = \left[\begin{matrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{matrix}\right] \times \left[\begin{matrix} 1 \\ x_1 \\ \vdots \\ x_n \end{matrix}\right] = \theta^T \mathbf{x}$$
Accounting for the error,

$$f(x) = \theta^T \mathbf{x} + \epsilon$$

For each data point,

$$y_i = \theta^T x_i + \epsilon_i$$

where the error is

$$\epsilon_i = y_i - \theta^T x_i$$
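The vectorized form $\theta^T \mathbf{x}$ can be checked numerically. In this minimal sketch the values of `theta` and `x` are invented for illustration; the leading 1 in `x` absorbs the bias $\theta_0$:

```python
import numpy as np

theta = np.array([0.5, 2.0, -1.0])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 3.0, 4.0])        # [1, x_1, x_2]; leading 1 pairs with the bias

f_x = theta @ x                      # theta^T x
print(f_x)                           # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```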

The Gaussian curve

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

  • $\mu$: mean (expectation)
  • $\sigma$: standard deviation
  • $\sigma^2$: variance
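The density above is easy to evaluate directly. A minimal sketch (the helper name `gaussian_pdf` is ours, not from the text); at $x = \mu$ the density peaks at $\frac{1}{\sqrt{2\pi}\,\sigma}$:

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

print(gaussian_pdf(0.0))   # peak value 1/sqrt(2*pi) ≈ 0.3989
```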

The error $\epsilon$ is generally assumed to follow a Gaussian distribution with zero mean ($\mu = 0$):

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\epsilon^2}{2\sigma^2}}$$

Substituting $\epsilon_i = y_i - \theta^T x_i$ gives

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}}$$
Statistically speaking, this expression describes how the values are distributed and thus guides subsequent prediction; that is exactly the probability $P$.

To make the role of $\theta$ explicit, that is, as a conditional probability, we write

$$P(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}}$$

The likelihood function

$$L(\theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}}$$

where $m$ is the number of samples.

To simplify the computation, we take the logarithm:
$$\log L(\theta) = \sum_{i=1}^{m} \log\!\left( \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}} \right) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left( y_i - \theta^T x_i \right)^2$$
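This identity (the log of the product equals the sum of the logs, which collapses to the closed form above) can be verified numerically. A sketch using made-up synthetic data:

```python
import numpy as np

# Hypothetical synthetic data: m samples from y = X theta + Gaussian noise
rng = np.random.default_rng(0)
m = 100
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # bias column + one feature
theta = np.array([1.0, 2.0])
sigma = 0.5
y = X @ theta + rng.normal(scale=sigma, size=m)

# Per-sample densities p(y_i | x_i; theta)
resid = y - X @ theta
pdfs = np.exp(-resid**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# Left side: log of the product, computed as a sum of logs
lhs = np.sum(np.log(pdfs))
# Right side: the simplified closed form from the derivation
rhs = m * np.log(1 / (np.sqrt(2 * np.pi) * sigma)) - (resid @ resid) / (2 * sigma**2)
print(np.isclose(lhs, rhs))   # True
```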
To obtain the maximum probability we seek the extremum of this expression. The first term is a constant, so only the $\theta$-dependent term matters; since it enters with a minus sign, maximizing the log-likelihood amounts to minimizing
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( f_\theta(x_i) - y_i \right)^2 = \frac{1}{2} (X\theta - y)^T (X\theta - y)$$
Taking the gradient and expanding:
$$\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \left( \tfrac{1}{2} (X\theta - y)^T (X\theta - y) \right) \\
&= \nabla_\theta \left( \tfrac{1}{2} (\theta^T X^T - y^T)(X\theta - y) \right) \\
&= \nabla_\theta \left( \tfrac{1}{2} \left( \theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y \right) \right) \\
&= \tfrac{1}{2} \left( 2 X^T X \theta - X^T y - (y^T X)^T \right) \\
&= X^T X \theta - X^T y
\end{aligned}$$
Setting the gradient to zero and solving for $\theta$:
$$X^T X \theta - X^T y = 0 \quad\Rightarrow\quad X^T X \theta = X^T y \quad\Rightarrow\quad \theta = (X^T X)^{-1} X^T y$$
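The closed-form solution can be sanity-checked numerically. A sketch with made-up synthetic data; `np.linalg.solve` is used on the normal equation $X^T X \theta = X^T y$ rather than forming the inverse explicitly, which is numerically preferable:

```python
import numpy as np

# Hypothetical synthetic data: known parameters plus small noise
rng = np.random.default_rng(42)
m = 50
X = np.column_stack([np.ones(m), rng.normal(size=(m, 3))])   # bias column + 3 features
theta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ theta_true + 0.01 * rng.normal(size=m)

# Normal equation: theta = (X^T X)^{-1} X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # close to theta_true
```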
