Introduction to Machine Learning - 1

Course Requirements and Grading

Lab (30%)

  • Python

  • Synthetic data

  • 2 deliverables, distributed via Moodle


Theory exercises (0/20)

  • Close to the end of the course (early December)

Final exam (70%)

  • Theory questions (judgement-oriented)

  • Simulate running algorithms by hand


Meeting hours

  • Office: 104B, 68-72 Gower Street

  • Meeting hours: Tuesday, 14:00-15:00


Prerequisites:

Linear Algebra; Calculus; Probability; Programming


Machine Learning

data -> model -> prediction


Least squares model

Least squares solution for linear regression

d: problem dimension, e.g. 1D, 2D (can visualize)

n: training set size

Training set: input-output pairs (x_i, y_i), i = 1, ..., n, where x_i ∈ R^d, y_i ∈ R (generally y_i can be vector-valued)

w: weight vector, w ∈ R^d

ε: noise

Other notation: X = [x_1^T; ...; x_n^T] ∈ R^(n×d) (design matrix), y = [y_1; ...; y_n] ∈ R^n

Remark: ";" represents stacking into a column vector


Linear regression model

y_i = w^T x_i + ε_i, that is y = Xw + ε

Loss function: L(w) = ||y - Xw||^2 = Σ_i (y_i - w^T x_i)^2

Goal: min_w L(w)

Least squares solution for linear regression: ŵ = (X^T X)^(-1) X^T y
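A minimal sketch of the closed-form solution above, using NumPy on synthetic data (as in the labs); the "true" weight vector and noise level here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: n = 30 points in d = 2 dimensions
n, d = 30, 2
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0])             # illustrative "true" weights
y = X @ w_true + 0.1 * rng.normal(size=n)  # y = Xw + noise

# Least squares solution: w_hat = (X^T X)^{-1} X^T y
# (solve the normal equations rather than forming the inverse explicitly)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to w_true
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to computing the matrix inverse directly.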


Generalized linear regression model

Set y_i = w^T φ(x_i) + ε_i, where the feature map φ can take other forms besides the identity (if φ(x) = x, it is just the linear regression model).

If φ(x) = [1; x; x^2; ...; x^k], then it is k-th degree polynomial fitting.

If the highest order of φ is 2, then it is second-order polynomial fitting.

Set Φ = [φ(x_1)^T; ...; φ(x_n)^T], then the model is: y = Φw + ε

Least squares solution for generalized linear regression: ŵ = (Φ^T Φ)^(-1) Φ^T y
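The same closed-form solution applied to polynomial fitting can be sketched as follows; the quadratic target coefficients and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x, k):
    """Feature map for k-th degree polynomial fitting: columns [1, x, ..., x^k]."""
    return np.vander(x, k + 1, increasing=True)

# 1-D inputs, quadratic target plus noise (coefficients chosen for the demo)
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.normal(size=x.size)

Phi = phi(x, k=2)  # n x (k+1) design matrix
# Least squares in the feature space: w_hat = (Phi^T Phi)^{-1} Phi^T y
w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w_hat)  # approximately [1, 2, -3]
```

The only change from plain linear regression is that the design matrix is built from φ(x_i) instead of x_i; the solution formula is identical.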


Approximations

If n > d (e.g. 30 points, 2 dimensions): overdetermined system
If n < d (e.g. 30 points, 3000 dimensions): underdetermined system (overfitting)
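A quick sketch of the underdetermined case: with far more dimensions than points, least squares can fit the training data exactly, even when the targets are pure noise. The sizes mirror the example above:

```python
import numpy as np

rng = np.random.default_rng(2)

n, d = 30, 3000                 # far fewer points than dimensions
X = rng.normal(size=(n, d))
y = rng.normal(size=n)          # pure-noise targets, no signal at all

# lstsq returns the minimum-norm solution of the underdetermined system
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = np.linalg.norm(y - X @ w_hat)
print(residual)  # essentially 0: a perfect "fit" to noise, i.e. overfitting
```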


How to control complexity (regularized linear regression)

1. Use a vector norm (L2, L1, Lp norm) to measure the size of the weight vector as a penalty term.
Remark: different norms give different regularized linear regressions; here we use the L2 norm.

2. Rewrite the loss function: L(w) = ||y - Xw||^2 + λ||w||^2
This is ridge regression, a.k.a. L2-regularized linear regression.
Remark: λ is a "hyperparameter"; select it with cross-validation (run cross-validation for different values of λ and pick the value that minimizes the cross-validation error).

Cross-validation: "the least glorious, most effective of all methods" (as the teacher put it)
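The λ-selection procedure described above can be sketched as a simple K-fold loop; the data sizes, the candidate λ grid, and the helper names (`ridge_fit`, `cv_error`) are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (sizes and weights chosen for the demo)
n, d = 60, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(lam, k=5):
    """Mean squared validation error of ridge regression over k folds."""
    folds = np.array_split(np.arange(n), k)
    errs = []
    for val in folds:
        train = np.setdiff1d(np.arange(n), val)   # all indices not in this fold
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return np.mean(errs)

# Try a grid of lambda values and keep the one with the smallest CV error
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=cv_error)
print(best)
```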

3. Least squares solution for ridge regression: ŵ = (X^T X + λI)^(-1) X^T y
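A sketch of the ridge closed-form solution, in a setting that would be underdetermined without regularization; the value of λ here is just an example, not a tuned one:

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 30, 100                  # fewer points than dimensions
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]   # illustrative sparse "true" weights
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 1.0                       # hyperparameter lambda (pick by cross-validation)
# Ridge solution: w_hat = (X^T X + lambda I)^{-1} X^T y
# The lambda*I term makes X^T X + lambda*I invertible even when n < d
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(np.linalg.norm(w_hat))    # the penalty keeps the weight vector small
```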
