Coursera Machine Learning (Andrew Ng) Notes (1)

I saw many recommendations for Andrew Ng's machine learning course, so I registered on Coursera and started learning. January 15, 2016

1. Introduction

1. Machine learning definition

Arthur Samuel (1959): Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell (1998, CMU): Well-posed learning problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

2. Two main types of machine learning

(1) Supervised learning

(2) Unsupervised learning

There are also others, such as reinforcement learning (e.g. checkers playing) and recommender systems.

3. Supervised learning

(1) Refers to the fact that we give the algorithm a data set in which the "right answers" are given.

(2) Mainly divided into regression (the output is continuous) and classification (the output is discrete) problems; regression can be linear or nonlinear.

(3) SVM (support vector machine)

4. Unsupervised learning

(1) No labels; the algorithm is not told the "right answer".

(2) Clustering problem
E.g. Google News automatically clusters news stories into groups about the same topic.

(3) Applications: Genes; organizing large computing clusters; social network analysis; market segmentation; astronomical data analysis

(4) Cocktail party problem
Separating two sounds, input1 and input2, that come from different sources but are superimposed on each other.
SVD: singular value decomposition
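
To make the SVD itself concrete (not the cocktail-party separation, which the course solves in one line of Octave), here is a minimal numpy sketch; the matrix `A` is made up for the example:

```python
import numpy as np

# A small made-up matrix standing in for mixed signals; not the actual
# cocktail-party data from the course.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# SVD factors A into U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstructing A from the factors confirms the decomposition.
A_reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(A, A_reconstructed))  # True
```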

2. Linear regression with one variable


1. Model Representation

Training set
m = # of training examples

$(x, y)$ = a single training example
$(x^{(i)}, y^{(i)})$ = the $i$-th training example
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
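
As a quick sketch of this hypothesis in Python (the parameter values and inputs below are made up for illustration):

```python
import numpy as np

def hypothesis(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Made-up parameters and inputs, just to show the call.
x = np.array([1.0, 2.0, 3.0])
print(hypothesis(x, theta0=0.5, theta1=2.0))  # [2.5 4.5 6.5]
```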

2. Cost Function

  1. Fit the best line to the training examples.
  2. Linear regression hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
  3. Idea: plug $x$ into $h_\theta(x)$ to get the estimated $y$,
    and therefore minimize $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
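
A minimal numpy sketch of computing this cost, with made-up training data (the helper name is my own, not from the course):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# Made-up training set: y = 2x, so theta0=0, theta1=2 gives zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(compute_cost(x, y, theta0=0.0, theta1=2.0))  # 0.0
print(compute_cost(x, y, theta0=0.0, theta1=0.0))  # ~9.33, much larger
```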

3. Gradient Descent

  1. Gradient descent is one method for solving the linear regression problem: keep repeating the following step so that $J$ gradually converges to its minimum (see the sketch after this list):
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
    At each step, $\theta_0$ and $\theta_1$ are updated simultaneously.
  2. A point that needs attention:
    choosing an appropriately sized $\alpha$
  3. The same method applies to cost functions with more than two variables,
    and can be summarized as
    repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ }
    Note that here too the updates are simultaneous.
  4. Tips
    (1) Feature scaling
    (2) Mean normalization
    (3) How to tell whether gradient descent is working?
    Plots (of $J$ against the number of iterations) can be helpful.
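
As promised above, a minimal numpy sketch of batch gradient descent with simultaneous updates, including mean normalization from the tips; the data, learning rate, and iteration count are made up for illustration:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = (theta0 + theta1 * x) - y
        # Compute both gradients first so the update is simultaneous.
        grad0 = np.sum(error) / m
        grad1 = np.sum(error * x) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Made-up data: y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Mean normalization (one of the scaling tips): (x - mean) / range.
x_scaled = (x - x.mean()) / (x.max() - x.min())

# The fitted theta is for the *scaled* feature: approx (6.0, 6.0) here.
theta0, theta1 = gradient_descent(x_scaled, y)
print(theta0, theta1)
```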

4. Linear regression and normal equations

$\theta \in \mathbb{R}^{n+1}$, $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Take the partial derivative of $J$ with respect to each $\theta_j$:
set $\frac{\partial}{\partial \theta_j} J(\theta) = 0$ (for every $j$)
and solve for $\theta_0, \theta_1, \ldots, \theta_n$.

This leads to $\theta = (X^T X)^{-1} X^T y$
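
A minimal numpy sketch of the normal equation under made-up data; `X` gets a leading column of ones so that $\theta_0$ acts as the intercept:

```python
import numpy as np

# Made-up data generated from y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones for the intercept theta0.
X = np.column_stack([np.ones_like(x), x])

# theta = (X^T X)^{-1} X^T y; pinv is the numerically safer choice
# when X^T X is close to singular.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)  # approx [1. 2.]
```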

5. Normal equation vs. gradient descent

When there are many features, e.g. around $n \approx 10^6$, gradient descent is the better fit, since computing $(X^T X)^{-1}$ costs roughly $O(n^3)$.
But when $n$ is small, gradient descent still requires choosing $\alpha$ and takes many steps to converge, so it works less well than the normal equation.
