[Machine Learning] Notes on Andrew Ng's Machine Learning Course - Week 1

Machine Learning by Andrew Ng

Study notes for Week 1 of Andrew Ng's Machine Learning course

Outline

  • Introduction
  • Model and Cost Function
  • Parameter Learning
  • Linear Algebra Review

Introduction

Machine Learning Definition

ML Definition
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Types of Machine Learning Algorithms

  • Supervised Learning
  • Unsupervised Learning
  • Others: reinforcement learning (RL), recommender systems

Supervised Learning

Teach the machine to learn given the "right answers", i.e. labeled data.

Categories of supervised learning:

  • Regression
    predict a continuous value
  • Classification
    predict a discrete value

Regression

Classification

Unsupervised Learning

Ask the machine to find the structure of an unlabeled data set, i.e. automatically discover the structure of the data.



The Cocktail Party Problem

An unsupervised learning algorithm can separate the overlapping voices at a party into their individual sources.


Model and Cost Function

$(x^i, y^i)$ denotes a training example.



Model Representation
h stands for the hypothesis
Univariate linear regression: $h_\theta(x) = \theta_0 + \theta_1 x$


Cost Function

Idea
Choose $\theta_0, \theta_1$ such that $h_\theta(x)$ is close to $y$ for our training examples.

Define the cost function as

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^i) - y^i)^2$

the squared-error cost function.

Our goal is
$\text{minimize}_{\theta_0, \theta_1} \quad \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^i) - y^i)^2$

Our goal is to minimize the cost function and find the global minimum
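As a concrete sketch, the squared-error cost can be computed directly from its definition; the toy data below is invented for illustration:

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) for univariate linear regression."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x      # hypothesis h_theta(x)
        total += (h - y) ** 2
    return total / (2 * m)

# Toy data lying exactly on y = 1 + 2x, so the cost at (1, 2) is 0.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(cost(1.0, 2.0, xs, ys))  # 0.0
print(cost(0.0, 0.0, xs, ys))  # 10.5
```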

A contour plot can be used to visualize the cost function.

Parameter Learning

Gradient Descent

A general algorithm for minimizing a function.

Intuition

  • Learning rate
  • Simultaneously Update All Parameters
    A common mistake: note that all parameters must be updated simultaneously, not one after another.
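This pitfall can be sketched in Python; `g0` and `g1` below are toy gradient functions invented for illustration, not the actual linear-regression gradients:

```python
def step_simultaneous(theta0, theta1, alpha, grad0, grad1):
    # Correct: both gradients are evaluated at the OLD parameter values,
    # then both parameters are assigned together.
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1

def step_sequential(theta0, theta1, alpha, grad0, grad1):
    # Wrong: theta0 is overwritten first, so grad1 sees a mixture of
    # old and new parameter values.
    theta0 = theta0 - alpha * grad0(theta0, theta1)
    theta1 = theta1 - alpha * grad1(theta0, theta1)
    return theta0, theta1

g0 = lambda a, b: b   # toy gradients, for illustration only
g1 = lambda a, b: a
print(step_simultaneous(1.0, 1.0, 0.1, g0, g1))  # (0.9, 0.9)
print(step_sequential(1.0, 1.0, 0.1, g0, g1))    # (0.9, 0.91) -- different!
```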

Derivative
the (partial) derivative term

The derivative is the slope of the tangent line.

The Importance of the Learning Rate

  • A small learning rate can lead to slow convergence
  • A large learning rate can cause gradient descent to fail to converge, or even to diverge
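Both failure modes can be sketched with gradient descent on the simple function $f(x) = x^2$ (chosen here purely for illustration; its gradient is $2x$):

```python
def gd_on_parabola(alpha, steps=50, x0=1.0):
    """Run gradient descent on f(x) = x**2 (gradient 2x), return the final x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

print(abs(gd_on_parabola(0.4)))    # essentially 0: converged
print(abs(gd_on_parabola(0.001)))  # still near the starting point: too slow
print(abs(gd_on_parabola(1.1)))    # huge: each step overshoots, so it diverges
```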

If initialized at a local optimum, the derivative term is zero, so gradient descent leaves the parameters unchanged.

A fixed learning rate is sufficient for convergence.

Gradient descent can converge to a local minimum even with a fixed learning rate, because the derivative term becomes smaller as it approaches the local minimum, so the steps shrink automatically.


Gradient Descent for Linear Regression


Convex function: a bowl-shaped function

Gradient descent on a convex function always converges to the global minimum when the learning rate is appropriate (there are no other local minima).

Following the trajectory, gradient descent reaches the global minimum.

The algorithm above is called batch gradient descent:

each step of gradient descent uses all of the training examples.
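A minimal sketch of the full batch update for univariate linear regression; the data and hyperparameters below are illustrative:

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for univariate linear regression:
    every step sums over ALL m training examples (hence 'batch')."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m                             # dJ/d(theta0)
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m  # dJ/d(theta1)
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x; gradient descent should recover those values.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
t0, t1 = batch_gradient_descent(xs, ys)
print(round(t0, 3), round(t1, 3))   # 1.0 2.0
```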

Linear Algebra Review

Matrix: a rectangular array of numbers, whose dimension is written as (number of rows) × (number of columns)

Matrix elements (entries of a matrix): $A_{ij}$ is the entry in row $i$, column $j$

Vector: an n by 1 matrix
1-indexed vs 0-indexed: two indexing conventions

Uppercase letters for matrices: A, B, C
Lowercase letters for vectors: a, b, c
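These conventions can be sketched with NumPy (assumed here as the illustration library; note that NumPy indexing is 0-based, while the course notation is 1-based):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # a 2x3 matrix: 2 rows, 3 columns
v = np.array([[1], [2], [3]])    # a 3x1 matrix, i.e. a (column) vector

print(A.shape)   # (2, 3)
print(A[0, 1])   # 2 -- the entry the 1-indexed course notation calls A_12
print(v.shape)   # (3, 1)
```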

Addition and Scalar Multiplication
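A minimal NumPy sketch of both operations (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[10.0, 20.0],
              [30.0, 40.0]])

print(A + B)   # element-wise addition; both matrices must have the same dimensions
print(3 * A)   # scalar multiplication: every entry is multiplied by 3
```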

Matrix Vector Multiplication
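As a sketch, the hypothesis can be evaluated on several examples at once as a single matrix-vector product (NumPy assumed; the data is illustrative):

```python
import numpy as np

# Evaluating h_theta(x) = theta0 + theta1 * x on several examples at once:
# prepend a column of ones so the intercept term is handled by the product.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # one row per training example
theta = np.array([1.0, 2.0])     # theta0 = 1, theta1 = 2

predictions = X @ theta          # matrix-vector product: (3x2) @ (2,) -> (3,)
print(predictions)               # [1. 3. 5.]
```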

Matrix Matrix Multiplication
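A NumPy sketch of the dimension rule: an m×n matrix times an n×p matrix gives an m×p matrix (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # 3x2
B = np.array([[1, 0, 1],
              [0, 1, 1]])    # 2x3

C = A @ B                    # (3x2) times (2x3) gives a 3x3 result
print(C.shape)               # (3, 3)
# Each entry C[i, j] is the dot product of row i of A with column j of B.
```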

Matrix Multiplication Properties

Not commutative:

$AB \neq BA$

Associative:

$(AB)C = A(BC)$
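Both properties can be checked numerically (NumPy assumed; the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [0, 2]])

print(np.array_equal(A @ B, B @ A))              # False: not commutative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True: associative
```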

Inverse And Transpose

The inverse of $A$ is denoted $A^{-1}$.

A non-square matrix does not have an inverse matrix.

A square matrix that does not have an inverse is called singular or degenerate.

The transpose of $A$ is denoted $A^T$.
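A NumPy sketch of both operations; the matrices are illustrative, and `np.linalg.inv` raises `LinAlgError` for a singular matrix:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])   # invertible: determinant is 4*6 - 7*2 = 10

A_inv = np.linalg.inv(A)
print(A @ A_inv)             # the identity matrix (up to floating-point rounding)
print(A.T)                   # transpose: rows and columns swapped

# A singular (degenerate) matrix has no inverse; inv() raises an error.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is twice the first
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    print("S is singular")
```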
