Main contents:
1. The model's mathematical expression
2. The model's objective function
3. Solving for the model parameters
   - Maximum likelihood estimation (MLE)
   - Bayesian maximum a posteriori (MAP) estimation
1. Model expression:

h_{\theta}(x) = \sum_{i=0}^{n}{\theta_{i}x_{i}} = \theta^{T}x

(here x_{0} = 1, so \theta_{0} plays the role of the bias term)
2. Objective function (also called the "loss function"): it measures the gap between the predicted values and the true values.
J(\theta) = \frac{1}{2}\sum_{i=1}^{m}{(h_{\theta}(x_{i}) - y_{i})^{2}}
a. Interpretation 1: intuitively, it is the "distance" between the predicted value and the true value; since the difference can be positive or negative, we take the square of the difference.
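To make the hypothesis h_{\theta}(x) and the objective J(\theta) above concrete, here is a minimal NumPy sketch; the function names and the toy numbers are illustrative only, and it assumes each input row already contains the constant feature x_{0} = 1 so that \theta_{0} acts as the bias:

```python
import numpy as np

def h(theta, X):
    """Hypothesis h_theta(x) = theta^T x, evaluated for every row of X."""
    return X @ theta

def J(theta, X, y):
    """Squared-error objective J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2."""
    r = h(theta, X) - y
    return 0.5 * np.sum(r ** 2)

# Tiny usage example with made-up numbers.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])      # first column is x_0 = 1 (bias)
y = np.array([5.0, 7.0, 11.0])  # happens to satisfy y = 1 + 2*x exactly
theta = np.array([1.0, 2.0])
print(J(theta, X, y))           # 0.0 for the exact fit above
```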
3. Solving for the model parameters

Maximum likelihood estimation (MLE):

Assume

y_{i} = \theta^{T}x_{i} + \varepsilon_{i}, \quad \varepsilon_{i} \sim N(0, \sigma^{2})

i.e. the noise is Gaussian with mean 0. Since

\varepsilon_{i} = y_{i} - \theta^{T}x_{i}

and

p(\varepsilon_{i}) = \frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{\varepsilon_{i}^{2}}{2\sigma^{2}})

it follows that

p(y_{i}|x_{i};\theta) = \frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(y_{i}-\theta^{T}x_{i})^{2}}{2\sigma^{2}})
The likelihood function:

L(\theta) = \prod_{i=1}^{m}p(y_{i}|x_{i};\theta) = \prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(y_{i}-\theta^{T}x_{i})^{2}}{2\sigma^{2}})

Take the log-likelihood:

l(\theta) = logL(\theta) = log\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(y_{i}-\theta^{T}x_{i})^{2}}{2\sigma^{2}})
= \sum_{i=1}^{m}log\frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(y_{i}-\theta^{T}x_{i})^{2}}{2\sigma^{2}})
= mlog\frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{m}(y_{i}-\theta^{T}x_{i})^{2}
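A small numerical check of the expansion above, under the same Gaussian-noise assumption; the data, \sigma, and \theta values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data generated exactly under the model assumption:
# y_i = theta^T x_i + eps_i,  eps_i ~ N(0, sigma^2)
m, sigma = 200, 0.5
X = np.column_stack([np.ones(m), rng.normal(size=m)])
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(scale=sigma, size=m)

def log_likelihood(theta):
    """Sum over i of log N(y_i | theta^T x_i, sigma^2)."""
    r = y - X @ theta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - r**2 / (2 * sigma**2))

def expanded_form(theta):
    """m*log(1/(sqrt(2*pi)*sigma)) - (1/sigma^2)*(1/2)*sum of squared residuals."""
    r = y - X @ theta
    return m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma)) - 0.5 / sigma**2 * np.sum(r**2)

theta = np.array([0.5, 1.5])
print(np.isclose(log_likelihood(theta), expanded_form(theta)))  # True
```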
Since mlog\frac{1}{\sqrt{2\pi}\sigma} is a constant that does not depend on \theta, it can be dropped; maximizing l(\theta) is therefore equivalent to minimizing

J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(\theta^{T}x_{i}-y_{i})^{2}

Written in matrix form (X is the design matrix whose rows are x_{i}^{T}, and y is the vector of targets):

J(\theta) = \frac{1}{2}(X\theta - y)^{T}(X\theta - y)
= \frac{1}{2}(\theta^{T}X^{T} - y^{T})(X\theta - y)
= \frac{1}{2}(\theta^{T}X^{T}X\theta - \theta^{T}X^{T}y - y^{T}X\theta + y^{T}y)
Taking the derivative with respect to \theta:

\frac{dJ(\theta)}{d\theta} = \frac{1}{2}(2X^{T}X\theta - X^{T}y - (y^{T}X)^{T}) = X^{T}X\theta - X^{T}y

Setting the derivative to zero gives the normal equation and its solution:

X^{T}X\theta - X^{T}y = 0

\theta = (X^{T}X)^{-1}X^{T}y
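As a sanity check of the normal equation, the following sketch (with made-up data) compares the closed-form \theta with NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up regression problem; the design matrix X includes a bias column.
m = 100
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
theta_true = np.array([0.5, -1.0, 2.0])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

# Closed-form solution from the normal equation theta = (X^T X)^{-1} X^T y.
# np.linalg.solve is used instead of an explicit inverse for numerical stability.
theta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_normal_eq, theta_lstsq))  # True
```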
When X^{T}X is not invertible (or to regularize the solution), a term \lambda I is added; as the MAP derivation below shows, this yields

\theta = (X^{T}X + \lambda I)^{-1}X^{T}y
Bayesian maximum a posteriori (MAP) estimation:

- Given X and Y, derive the posterior probability of \theta:
p(\theta|Y,X) = \frac{p(\theta, Y|X)}{p(Y|X)}
- From the posterior, pick the \theta with the highest probability.
Posterior probability:

p(\theta|Y,X) = \frac{p(\theta, Y|X)}{p(Y|X)}

Prior probability: assume the prior \theta \sim N(0, \gamma^{2}) (a zero-mean Gaussian, independent of X), so

p(\theta|X) = \frac{1}{\sqrt{2\pi}\gamma}exp(-\frac{\theta^{T}\theta}{2\gamma^{2}})
p(\theta|Y,X) = \frac{p(\theta, Y|X)}{p(Y|X)} = \frac{p(Y|\theta,X)p(\theta|X)}{\int p(Y|\theta,X)p(\theta|X)d\theta}
Since p(Y|X) does not depend on \theta, maximizing the posterior is equivalent to maximizing p(Y|\theta,X)p(\theta|X).
Let L(\theta) = p(Y|\theta,X)p(\theta|X). Maximizing L(\theta) is equivalent to minimizing its negative logarithm:

l(\theta) = -logL(\theta) = -log(p(Y|\theta,X)p(\theta|X))
= -\sum_{i=1}^{m}log\,p(y_{i}|\theta,x_{i}) - log\,p(\theta|X)
= -\sum_{i=1}^{m}log\frac{1}{\sqrt{2\pi}\sigma}exp(-\frac{(y_{i}-\theta^{T}x_{i})^{2}}{2\sigma^{2}}) - log\frac{1}{\sqrt{2\pi}\gamma}exp(-\frac{\theta^{T}\theta}{2\gamma^{2}})
= \frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{m}(y_{i}-\theta^{T}x_{i})^{2} + \frac{1}{2\gamma^{2}}\theta^{T}\theta + C
Dropping the constant C and multiplying through by \sigma^{2}, minimizing l(\theta) is equivalent to minimizing

J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(y_{i}-\theta^{T}x_{i})^{2} + \frac{\lambda}{2}\theta^{T}\theta, \quad \lambda = \frac{\sigma^{2}}{\gamma^{2}}

Setting the derivative with respect to \theta to zero gives

\theta = (X^{T}X + \lambda I)^{-1}X^{T}y
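The MAP solution is exactly the ridge-regression closed form. The short sketch below uses made-up, nearly collinear data (so X^{T}X is badly conditioned) and an arbitrary \lambda to illustrate it, and checks that the gradient of the regularized objective vanishes at the closed-form \theta:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data with two nearly collinear features, so X^T X is badly conditioned
# and the plain normal equation is unreliable; lam is an illustrative value.
m = 50
x1 = rng.normal(size=m)
X = np.column_stack([np.ones(m), x1, x1 + 1e-6 * rng.normal(size=m)])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=m)

lam = 1.0  # lambda = sigma^2 / gamma^2 in the MAP derivation above

# MAP / ridge closed form: theta = (X^T X + lambda I)^{-1} X^T y
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Sanity check: the gradient of
# J(theta) = 1/2 * ||X theta - y||^2 + lambda/2 * theta^T theta
# vanishes at the closed-form solution.
grad = X.T @ (X @ theta_map - y) + lam * theta_map
print(np.allclose(grad, 0.0, atol=1e-8))  # True
```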