Andrew Ng Machine Learning Course Notes: Week 2
Combined volume of all my study notes for this course
✓ Course link: Stanford Machine Learning
Reference resources
- Course notes
- Python version of the assignments
Study outline
Download Matlab or Octave (reference link).
If you do the assignments in Python, you can skip this section.
$x_j^i$ denotes the $j$th feature of $x^i$, which is the $i$th training example: the superscript indexes the training example and the subscript indexes the feature.
The hypothesis also needs to be updated:
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
For convenience of notation, add an extra dimension and define $x_0 = 1$:
$x = [x_0, x_1, \cdots, x_n]^T \in R^{n+1}$
$\theta = [\theta_0, \theta_1, \cdots, \theta_n]^T \in R^{n+1}$
Then we have:
$h_\theta(x) = \theta^T x$
The model above is called multivariate linear regression.
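This can be computed directly in NumPy (the notes point to a Python version of the assignments). A minimal sketch, with made-up data and parameter values:

```python
import numpy as np

# Vectorized hypothesis h_theta(x) = theta^T x on made-up data.
X = np.array([[2104.0, 5.0],   # each row is one training example
              [1416.0, 3.0],
              [1534.0, 3.0]])
m = X.shape[0]

# Prepend the x_0 = 1 column so theta_0 acts as the intercept.
X = np.hstack([np.ones((m, 1)), X])   # shape (m, n+1)

theta = np.array([50.0, 0.1, 20.0])   # arbitrary parameters in R^{n+1}

# One theta^T x per row, for the whole training set at once.
h = X @ theta
print(h)
```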
Gradient descent has several tricks; the first is feature scaling.
Try to scale all features into approximately the same range.
Mean Normalization
A common form of feature scaling.
Update formula:
$x_1 \leftarrow \dfrac{x_1 - \mu_1}{s_1}$
where $\mu_1$ is the mean of the feature over the training set and $s_1$ is its range (max minus min) or its standard deviation.
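A minimal NumPy sketch of mean normalization applied column-wise, using the range as $s_j$ (the data is made up):

```python
import numpy as np

# Mean normalization: x_j <- (x_j - mu_j) / s_j, column by column.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0]])

mu = X.mean(axis=0)                 # per-feature mean mu_j
s = X.max(axis=0) - X.min(axis=0)   # per-feature range s_j

X_norm = (X - mu) / s
print(X_norm)   # every feature now lies roughly within [-1, 1]
```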
The second gradient descent trick is tuning the learning rate $\alpha$.
$J(\theta)$ should decrease after every iteration if gradient descent is working correctly.
Mathematically, for a sufficiently small $\alpha$, $J(\theta)$ can be shown to decrease on every iteration.
Normally, one decides whether to stop by looking at the plot of $J(\theta)$ against the iteration number (Ng's own preference) or by setting a convergence threshold.
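A sketch of batch gradient descent that records $J(\theta)$ at every iteration so convergence can be inspected on a plot; the function name and default values are illustrative, not from the course materials:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=400):
    """Batch gradient descent for linear regression, recording J(theta)
    each iteration. Assumes X already includes the x_0 = 1 column."""
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(n_iters):
        error = X @ theta - y                        # h_theta(x) - y, shape (m,)
        J_history.append((error @ error) / (2 * m))  # J at the current theta
        theta = theta - (alpha / m) * (X.T @ error)  # simultaneous update
    return theta, J_history

# If the recorded J values ever increase, alpha is too large; shrink it
# (Ng suggests trying values about 3x apart: ..., 0.003, 0.01, 0.03, ...).
```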
Summary
Quadratic, cubic functions: polynomial regression fits these by treating powers such as $x^2$ and $x^3$ as new features.
This makes it especially important to scale features, since the powers have very different ranges (see the sketch below).
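A short sketch of how such polynomial features might be built and scaled in NumPy (the feature values are invented):

```python
import numpy as np

# Polynomial regression via feature construction: to fit a cubic
# h = theta_0 + theta_1*x + theta_2*x^2 + theta_3*x^3, define x^2 and
# x^3 as new features and run plain linear regression on them.
x = np.array([1.0, 2.0, 3.0, 4.0])          # made-up single feature
X_poly = np.column_stack([x, x**2, x**3])   # columns: x, x^2, x^3

# Mean-normalize each column before running gradient descent, since
# x, x^2 and x^3 have very different ranges.
mu = X_poly.mean(axis=0)
s = X_poly.max(axis=0) - X_poly.min(axis=0)
X_poly = (X_poly - mu) / s
```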
Solving for $\theta$ analytically (the normal equation)
Concretely, take the partial derivative of $J$ with respect to each parameter in $\theta$, set it to zero, and solve; the optimal solution is
$\theta = (X^TX)^{-1}X^Ty$
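A minimal NumPy sketch of the normal equation on made-up data; note that unlike gradient descent, the normal equation needs no feature scaling:

```python
import numpy as np

# Normal equation: theta = (X^T X)^{-1} X^T y.
# X includes the x_0 = 1 column; the data is made up for illustration.
X = np.array([[1.0, 2104.0, 5.0],
              [1.0, 1416.0, 3.0],
              [1.0, 1534.0, 3.0],
              [1.0,  852.0, 2.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Solving the linear system is numerically safer than forming the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)
```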
A comparison between gradient descent and the normal equation, for $m$ training examples and $n$ features:
- Gradient descent needs a choice of $\alpha$ and many iterations, but works well even when $n$ is large.
- The normal equation needs no $\alpha$ and no iterations, but computing $(X^TX)^{-1}$ costs roughly $O(n^3)$, so it becomes slow when $n$ is very large.
What if $X^TX$ is not invertible (i.e., singular or degenerate)?
1. In Octave, pinv gives the right $\theta$ even when $X^TX$ is not invertible, so you need not worry about singularity (a NumPy analogue is sketched after this list).
2. If you compute the analytical solution yourself, consider the two cases below.
Reasons why $X^TX$ may be non-invertible:
- Redundant features (linearly dependent columns, e.g., one feature is a multiple of another).
- Too many features relative to examples (e.g., $m \le n$).
Solution
- Delete redundant features; with too many features, drop some or use regularization.
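As mentioned in item 1 above, NumPy's pinv behaves like Octave's: it returns the Moore-Penrose pseudoinverse, which still yields a usable $\theta$ when $X^TX$ is singular. A sketch with a deliberately redundant feature:

```python
import numpy as np

# The third column exactly duplicates the second (a redundant
# feature), so X^T X is singular and np.linalg.inv would fail.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])

theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
print(theta)   # succeeds where np.linalg.inv raises LinAlgError
```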