Andrew Ng Machine Learning Course Study Notes: Week 1
Study Outline
Machine Learning Definition
ML Definition
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Machine Learning Algorithms: Categories
Supervised Learning
Teach the machine to learn from examples that are labeled with the right answers.
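Types of supervised learning: regression (predict a continuous-valued output) and classification (predict a discrete-valued output).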
Unsupervised Learning
Ask the machine to automatically find the structure of an unlabeled data set.
The cocktail party problem
An unsupervised learning algorithm can separate the overlapping voices at a party into their individual sources.
$(x^i, y^i)$ denotes the $i$-th training example (the superscript $i$ is an index, not an exponent)
Model Representation
h denotes the hypothesis: the function that maps an input x to a predicted y
Univariate linear regression
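For reference, the univariate hypothesis used in the course is $h_\theta(x) = \theta_0 + \theta_1 x$. Below is a minimal Python sketch of it. The function name `hypothesis` and the sample parameter values are assumptions made for illustration and do not come from the notes.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Illustrative values only: intercept 1.0, slope 2.0, input 3.0.
prediction = hypothesis(1.0, 2.0, 3.0)  # 1.0 + 2.0 * 3.0
```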
Cost Function
Idea
Choose $\theta_0, \theta_1$ such that $h_\theta(x)$ is close to $y$ for the training examples
Define the cost function as $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^i) - y^i)^2$
Squared error cost function
Our goal is
$\min_{\theta_0, \theta_1} \quad \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^i) - y^i)^2$
Our goal is to minimize the cost function and find the global minimum
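As a rough sketch of the squared error cost, the function below evaluates $\frac{1}{2m} \sum_{i=1}^m (h_\theta(x^i) - y^i)^2$ for a univariate linear hypothesis. The name `squared_error_cost` and the toy data set are assumptions made up for this example.

```python
def squared_error_cost(theta0, theta1, xs, ys):
    """Squared error cost: 1/(2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y  # h_theta(x) - y
        total += error ** 2
    return total / (2 * m)


# Toy data lying exactly on y = 1 + 2x (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_fit_cost = squared_error_cost(1.0, 2.0, xs, ys)  # every error is zero
worse_fit_cost = squared_error_cost(0.0, 0.0, xs, ys)    # predicts 0 everywhere
```

The parameters that fit the data exactly give a cost of zero, and any other choice gives a larger cost, which is why minimizing the cost selects the best-fitting line.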
A contour plot/figure to visualize the cost function
Gradient descent
A general algorithm for minimizing a function
Intuition
derivative
the (partial) derivative term
the slope of a line
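To make the intuition concrete, here is a minimal one-parameter sketch of the update $\theta := \theta - \alpha \frac{d}{d\theta} J(\theta)$. The example function $J(\theta) = (\theta - 3)^2$, the helper names, and the chosen learning rate are all assumptions picked for illustration.

```python
def derivative(theta):
    """Slope of J(theta) = (theta - 3)^2 at the point theta."""
    return 2.0 * (theta - 3.0)


def gradient_descent_1d(theta, learning_rate, steps):
    """Repeat theta := theta - learning_rate * dJ/dtheta."""
    for _ in range(steps):
        # Positive slope moves theta left; negative slope moves it right.
        theta = theta - learning_rate * derivative(theta)
    return theta


# Starting on either side of the minimum, theta moves toward theta = 3.
from_left = gradient_descent_1d(theta=0.0, learning_rate=0.1, steps=100)
from_right = gradient_descent_1d(theta=10.0, learning_rate=0.1, steps=100)
```

The sign of the slope decides the direction of the step, so the update moves toward the minimum from both sides.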
Learning rate: why it matters
If initialized at a local optimum, the derivative is zero, so the update leaves the parameters unchanged
Gradient descent can converge to a local minimum even with a fixed learning rate, because the derivative term becomes smaller as it approaches the minimum, so the steps shrink automatically.
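The effect of the learning rate can be sketched on the example function $J(\theta) = \theta^2$. This function, the helper name `step_sizes`, and the two learning rates are assumptions chosen for illustration. The sketch records how large each update is.

```python
def step_sizes(theta, learning_rate, steps):
    """Run gradient descent on J(theta) = theta^2 and record each |step|."""
    sizes = []
    for _ in range(steps):
        step = learning_rate * 2.0 * theta  # alpha * dJ/dtheta
        theta = theta - step
        sizes.append(abs(step))
    return theta, sizes


# Small fixed rate: steps shrink on their own as the slope flattens.
theta_small, sizes_small = step_sizes(theta=1.0, learning_rate=0.1, steps=20)

# Rate too large for this function: each step overshoots and theta diverges.
theta_large, sizes_large = step_sizes(theta=1.0, learning_rate=1.1, steps=20)
```

With the small rate the recorded step sizes decrease at every iteration even though the rate never changes. With the large rate they grow.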
Gradient Descent for Linear Regression
Convex function: a bowl-shaped function
On a convex cost function, gradient descent with an appropriate learning rate always converges to the global minimum (there are no other local minima)
following the trajectory, it reaches the global minimum
The algorithm above is called Batch Gradient Descent
Each step of gradient descent uses all training examples
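A minimal sketch of batch gradient descent for univariate linear regression follows. The partial derivatives of the squared error cost are $\frac{1}{m}\sum_i (h_\theta(x^i) - y^i)$ for $\theta_0$ and $\frac{1}{m}\sum_i (h_\theta(x^i) - y^i)\,x^i$ for $\theta_1$. The function name, toy data, learning rate, and iteration count are assumptions for illustration.

```python
def batch_gradient_descent(xs, ys, learning_rate, iterations):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Errors over ALL m training examples: this is what makes it "batch".
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data lying exactly on y = 1 + 2x (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = batch_gradient_descent(xs, ys, learning_rate=0.1, iterations=5000)
```

Because the cost is convex, the result approaches the intercept 1 and slope 2 that generated the toy data.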
Matrix Elements (entries of matrix)
Vector: an n by 1 matrix (n rows, 1 column)
1-indexed vs 0-indexed: two indexing conventions
Upper case for matrices: A, B, C
Lower case for vectors: a, b, c
Addition and Scalar Multiplication
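Both operations work element by element, and addition requires the two matrices to have the same dimensions. Here is a small sketch using nested lists. The helper names `mat_add` and `scalar_mul` and the sample matrices are assumptions for illustration.

```python
def mat_add(a, b):
    """Element-wise sum of two matrices with the same dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scalar_mul(c, a):
    """Multiply every entry of matrix a by the scalar c."""
    return [[c * x for x in row] for row in a]


A = [[1, 0], [2, 5], [3, 1]]      # 3 x 2
B = [[4, 0.5], [2, 5], [0, 1]]    # 3 x 2

total = mat_add(A, B)
tripled = scalar_mul(3, A)
```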
Matrix Multiplication Properties
not commutative
$A * B \neq B * A$ in general
associative
$(A * B) * C = A * (B * C)$
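Both properties can be checked numerically on small matrices. The sketch below uses a plain nested-list product. The helper name `mat_mul` and the sample matrices are assumptions for illustration.

```python
def mat_mul(a, b):
    """Matrix product of a (n x k) and b (k x p), giving an n x p matrix."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[1, 1], [0, 0]]
B = [[0, 0], [2, 0]]
C = [[1, 2], [3, 4]]

ab = mat_mul(A, B)                  # A * B
ba = mat_mul(B, A)                  # B * A, a different matrix
left = mat_mul(mat_mul(A, B), C)    # (A * B) * C
right = mat_mul(A, mat_mul(B, C))   # A * (B * C), same as left
```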
Inverse And Transpose
The inverse of A is denoted as $A^{-1}$
A non-square matrix does not have an inverse matrix.
A square matrix that does not have an inverse is called singular or degenerate.
The transpose of A is denoted as $A^T$
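As a small illustration, the sketch below transposes a matrix and inverts a 2 by 2 matrix with the closed-form formula based on the determinant. The helper names and sample matrices are assumptions. The exact `det == 0` test is a simplification for this sketch, since real numerical code would compare against a tolerance.

```python
def transpose(a):
    """Swap rows and columns: entry (i, j) of a becomes entry (j, i)."""
    return [list(col) for col in zip(*a)]


def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix, or None when the matrix is singular."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        return None  # singular (degenerate): no inverse exists
    return [[s / det, -q / det], [-r / det, p / det]]


A = [[4.0, 7.0], [2.0, 6.0]]
A_inv = inverse_2x2(A)                            # A * A_inv is the identity
singular = inverse_2x2([[1.0, 2.0], [2.0, 4.0]])  # second row is 2 * first row
A_t = transpose([[1, 2, 0], [3, 5, 9]])           # 2 x 3 becomes 3 x 2
```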