[Machine Learning] Andrew Ng's Machine Learning Course Notes - Week 2

Machine Learning by Andrew Ng

Study notes for Andrew Ng's Machine Learning course - Week 2
A collected edition of all my study notes for this course
✓ Course page: Stanford Machine Learning
References

  • Course notes
  • Python versions of the assignments

Outline

  • Environment Setup
  • Multivariate Linear Regression
  • Computing Parameters Analytically
  • Submitting Programming Assignments
  • Octave/Matlab Tutorial

Environment Setup

Download MATLAB or Octave (see the reference link).
If you complete the assignments in Python, you can skip this section.

Multivariate Linear Regression

1. Multiple Features

$x_j^i$ denotes the $j$th feature of $x^i$, which is the $i$th training example.
The superscript is the index of the training example; the subscript is the index of the feature.


The hypothesis also needs to be updated:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$, i.e. add one extra dimension to every example.

$x = [x_0, x_1, \cdots, x_n]^T \in R^{n+1}$

$\theta = [\theta_0, \theta_1, \cdots, \theta_n]^T \in R^{n+1}$

Then we have:

$h_\theta(x) = \theta^T x$

The formula above is called multivariate linear regression.
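
A minimal NumPy sketch of the vectorized hypothesis (the data values and variable names are placeholders of my own, not from the course):

```python
import numpy as np

# Example design: m = 4 training examples, n = 2 raw features
X_raw = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 3.0],
                  [1416.0, 2.0]])  # e.g. house size, number of bedrooms

m, n = X_raw.shape

# Prepend the x0 = 1 column, so X has shape (m, n + 1)
X = np.hstack([np.ones((m, 1)), X_raw])

theta = np.zeros(n + 1)  # theta in R^{n+1}

# h_theta(x) = theta^T x, evaluated for all m examples at once
h = X @ theta            # shape (m,)
```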

2. Gradient Descent for Multiple Variables

The gradient descent update rule for multiple variables (repeat until convergence, updating every $j = 0, \dots, n$ simultaneously):

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^i) - y^i \right) x_j^i$
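
A sketch of this update in NumPy (my own implementation, not the course's official code):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: (m, n + 1) design matrix whose first column is all ones.
    y: (m,) vector of targets.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y               # h_theta(x^i) - y^i for every i
        theta -= alpha * (X.T @ error) / m  # simultaneous update of all theta_j
    return theta
```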

3. Gradient Descent in Practice 1 - Feature Scaling

Gradient descent comes with several practical tricks; the first is feature scaling.


Try to scale all features into approximately the same range of values.

  • The goal is to make gradient descent run faster and converge more easily.


Mean Normalization

A common way to implement feature scaling.


Update formula:

$x_1 \leftarrow \frac{x_1 - \mu_1}{s_1}$

where $\mu_1$ is the mean of the feature in the training set and $s_1$ is its range (max minus min) or standard deviation.
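
A short NumPy sketch (the helper name is my own; apply it to the raw features before adding the $x_0 = 1$ column):

```python
import numpy as np

def mean_normalize(X_raw):
    """Mean normalization: x_j <- (x_j - mu_j) / s_j for each feature column.

    Here s_j is the range (max - min); the standard deviation is an
    equally common choice for the scale factor.
    """
    mu = X_raw.mean(axis=0)
    s = X_raw.max(axis=0) - X_raw.min(axis=0)
    return (X_raw - mu) / s, mu, s
```

Returning $\mu$ and $s$ lets you apply the same transform to new inputs at prediction time.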

4. Gradient Descent in Practice 2 - Learning Rate

The second trick is tuning the learning rate.

$J(\theta)$ should decrease after every iteration if gradient descent is working correctly.

Mathematically, it can be shown that for a sufficiently small $\alpha$, $J(\theta)$ decreases on every iteration.


Normally, you decide when to stop training either by looking at the plot of $J(\theta)$ versus iteration number (Ng's own preference) or by setting a convergence threshold.


Summary

  • Too small: slow convergence.
  • Too large: $J(\theta)$ may not decrease on every iteration, or may not converge at all.
  • Try values such as [0.001, 0.01, 0.1, 1], then [0.005, 0.05, 0.5], and so on, to find a good learning rate experimentally (see the sketch below).
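
A rough sketch of that trial procedure, assuming X and y are prepared as in the earlier snippets (my own code, not the course's):

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^i) - y^i)^2."""
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def cost_history(X, y, alpha, num_iters=100):
    """Record J(theta) after each gradient step for one learning rate."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        history.append(cost(X, y, theta))
    return history

# Plot one J-vs-iteration curve per candidate alpha (e.g. with matplotlib);
# pick the largest alpha whose curve still decreases smoothly. Curves that
# blow up indicate the learning rate is too large.
histories = {alpha: cost_history(X, y, alpha)
             for alpha in [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]}
```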

5. Features and Polynomial Regression

Quadratic and cubic functions can be fit by defining new features from the existing ones, for example $x_2 = x^2$ and $x_3 = x^3$.

  • In polynomial regression, feature scaling becomes even more important, because $x$, $x^2$, and $x^3$ take values on very different scales (see the sketch after this list).
  • A square root function, e.g. $\sqrt{x}$, is another possible choice of feature.
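
A small NumPy sketch of building and scaling polynomial features (the data values are illustrative placeholders):

```python
import numpy as np

# One raw feature, e.g. house size
x = np.array([100.0, 200.0, 400.0, 800.0])

# Polynomial features: x1 = x, x2 = x^2, x3 = x^3
X_poly = np.column_stack([x, x ** 2, x ** 3])

# Scaling is essential here: the columns span roughly 10^2 to 10^8
X_scaled = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)

# A square-root feature is another option
X_sqrt = np.column_stack([x, np.sqrt(x)])
```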

Computing Parameters Analytically

Solving for $\theta$ analytically.

1. Normal Equation

Concretely: take the partial derivative of $J$ with respect to every parameter $\theta_j$, set each derivative to zero, and solve.


The optimal solution is

$\theta = (X^T X)^{-1} X^T y$
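
A minimal NumPy sketch of the normal equation (solving the linear system is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta in closed form.

    X: (m, n + 1) design matrix with the leading column of ones.
    y: (m,) vector of targets.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```

No learning rate and no iterations are needed, but the solve costs roughly $O(n^3)$.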

A comparison between gradient descent and the normal equation, for $m$ training examples and $n$ features:

  • Gradient descent: needs to choose $\alpha$; needs many iterations; works well even when $n$ is large.
  • Normal equation: no need to choose $\alpha$; no iterations; must compute $(X^T X)^{-1}$, which costs roughly $O(n^3)$ and becomes slow when $n$ is large.

2. Normal Equation Non-invertibility

What if $X^T X$ is not invertible (i.e. singular or degenerate)?

1. In Octave, pinv gives the right $\theta$ even when $X^T X$ is not invertible, so you do not need to handle that case yourself, as shown below.
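
NumPy's np.linalg.pinv is the analogous routine; a minimal sketch:

```python
import numpy as np

def normal_equation_pinv(X, y):
    """Normal equation via the Moore-Penrose pseudoinverse, so theta
    is still well-defined when X^T X is singular."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```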

2. If you compute the analytical solution yourself (without pinv), consider the following two situations.

Reasons why $X^T X$ is sometimes non-invertible:

  • Redundant features (e.g. $x_1$ is linearly related to $x_2$, say $x_1 = 2.5 \, x_2$).
  • Too many features (e.g. $m \le n$, i.e. at least as many features as training examples).

Solution

  • Delete some features, or use regularization (to be covered later).


Submitting Programming Assignments

Omitted.

Octave/Matlab Tutorial

Omitted.
