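1. Background
While reading the UFLDL Tutorial recently, I came across this note: "It is best to have some machine learning background (specifically, familiarity with the ideas of supervised learning, logistic regression, and gradient descent). If you are not familiar with these, we suggest you first go to the Machine Learning course and complete sections II, III, and IV (up to logistic regression)." So I went to that course, had a look, and watched all the videos. This is the course homepage:
The course has 9 chapters. Every chapter from II onward has an exercise to be programmed in MATLAB/Octave, and the first 6 chapters have videos. The course structure is as follows: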
|——I. INTRODUCTION
|——Welcome
|——What is Machine Learning?
|——Supervised Learning Introduction
|——Unsupervised Learning Introduction
|——Installing Octave
|——II. LINEAR REGRESSION I
|——Supervised Learning Introduction
|——Model Representation
|——Cost Function
|——Gradient Descent
|——Gradient Descent for Linear Regression
|——Vectorized Implementation
|——Exercise 2
|——III. LINEAR REGRESSION II
|——Feature Scaling
|——Learning Rate
|——Features and Polynomial Regression
|——Normal Equations
|——Exercise 3
|——IV. LOGISTIC REGRESSION
|——Classification
|——Model
|——Optimization Objective I
|——Optimization Objective II
|——Gradient Descent
|——Newton's Method I
|——Newton's Method II
|——Gradient Descent vs Newton's Method
|——Exercise 4
|——V. REGULARIZATION
|——The Problem Of Overfitting
|——Optimization Objective
|——Common Variations
|——Regularized Linear Regression
|——Regularized Logistic Regression
|——Exercise 5
|——VI. NAIVE BAYES
|——Generative Learning Algorithms
|——Text Classification
|——Exercise 6
|——VII.
|——Exercise 7
|——VIII.
|——Exercise 8
|——IX.
|——Exercise 9
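The site is entirely in English and the videos have no subtitles, so some English is needed. Even so, in most of Andrew Ng's lectures you can follow what is being taught just from what he writes on screen; the explanations are accessible and vivid. The one drawback is that the site is old (2010-2012) and is probably no longer updated or maintained. People who want to learn machine learning now generally go to the Machine Learning course on Coursera, or to Andrew Ng's machine learning course on NetEase Cloud Classroom, which has Chinese and English subtitles; these two correspond to each other. Enough of that. I followed the older course above because I did not intend to study machine learning systematically, only to pick up the first few topics.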
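2. Main content
Here are my notes in PDF form (extraction code: n8zn). If you are interested, you can read them alongside the videos and refer to them while doing the exercises; they include some of my own theoretical derivations.
The course requires the following background:
Linear algebra (matrix operations)
Calculus (derivatives)
Probability theory (probability distributions)
MATLAB programming
I wrote Exercises 2 - 5 myself and got them all working; here is my code. The exercises are quite good: they come with theoretical hints as well as reference solutions, so if your result differs from theirs you can debug until it matches exactly. It is best to watch the videos and try writing the code yourself before looking at the reference code.
2.1. myex2.m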
clear all
x = load('ex2x.dat');
y = load('ex2y.dat');
figure
plot(x,y,'o');
xlabel('Age in years'),ylabel('Height in meters');
m=length(y);
x=[ones(m,1),x];
alpha = 0.07;
theta=[0;0];
theta_record = theta;
%%%%%%%%%%%%%%%%%%%%%%%%%
% This was my first attempt. It has a small problem: theta(1) is updated
% first and the new value is then used when updating theta(2), so the
% update is not simultaneous.
% for iteration = 1:1500
%     for j=1:length(theta)
%         temp = 0;
%         for i=1:m
%             temp = temp + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
%         end
%         theta(j) = theta(j) - alpha/m*temp;
%     end
%     theta_record = [theta_record,theta];
%
% end
%%%%%%%%%%%%%%%%%%%%%%%%
for iteration = 1:1500
    % Accumulate every partial derivative before touching theta
    temp = zeros(length(theta),1);
    for j=1:length(theta)
        for i=1:m
            temp(j) = temp(j) + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
        end
    end
    % Simultaneous update of all parameters
    for n=1:length(theta)
        theta(n) = theta(n) - alpha/m*temp(n);
    end
    theta_record = [theta_record,theta];
end
hold on
plot(x(:,2),x*theta,'-');
legend('Training data','Linear Regression')
theta
%%%Prediction%%%%
age1 = [1,3.5];
age2 = [1,7];
height_predict_1 = age1 * theta;
height_predict_2 = age2 * theta;
fprintf('The boy of age 3.5,his height is %f\n',height_predict_1);
fprintf('The boy of age 7.0,his height is %f\n',height_predict_2);
theta0_vals = linspace(-3, 3, 100);
theta1_vals = linspace(-1, 1, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals)); % initialize Jvals to 100x100 matrix of 0's
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = (0.5/m).*(sum((x*t-y).^2));
    end
end
% Plot the surface plot
% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1')
% to see the approach to the global optimum more apparent
figure;
% Plot the cost function with 15 contours spaced logarithmically
% between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15));
xlabel('\theta_0'); ylabel('\theta_1');
Output:
theta =
0.7502
0.0639
The boy of age 3.5,his height is 0.973742
The boy of age 7.0,his height is 1.197334
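The element-wise loops in myex2.m can also be written as a single matrix expression. Here is a minimal sketch, assuming synthetic noise-free data in place of ex2x.dat and ex2y.dat; the data values, the age range, and the names x_raw and X are made up for illustration and are not from the exercise.

```matlab
% Vectorized batch gradient descent for linear regression (sketch).
% Synthetic data stands in for ex2x.dat / ex2y.dat (assumption).
x_raw = linspace(2, 7, 50)';           % hypothetical ages
y = 0.75 + 0.064 * x_raw;              % hypothetical noise-free heights
m = length(y);
X = [ones(m, 1), x_raw];               % add the intercept column
alpha = 0.07;
theta = zeros(2, 1);
for iteration = 1:1500
    grad = (1/m) * X' * (X * theta - y);   % all partial derivatives at once
    theta = theta - alpha * grad;          % simultaneous update of theta
end
disp(theta')
```

Because the gradient is computed in full before theta changes, the update is simultaneous by construction, which avoids the mistake in the commented-out first attempt.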
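2.2. myex3.m
This one is not well written; I did not notice it when I first wrote it. You could rewrite it with loops (in the later exercises I stuck to the principle of keeping the code concise).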
clear all;close all; clc
x = load('ex3x.dat');
y = load('ex3y.dat');
m = length(y);
x = [ones(m,1),x];xx=x;yy=y;
% The features in x are on different scales, so normalize them
sigma = std(x);
mu = mean(x);
x(:,2) = (x(:,2) - mu(2))./ sigma(2);
x(:,3) = (x(:,3) - mu(3))./ sigma(3);
MAX_ITR = 100;
%% alpha = 0.01
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
alpha = 0.01; %% My initial learning rate %%
J = zeros(MAX_ITR, 1);
%% alpha = 0.03
theta1 = zeros(size(x(1,:)))';
alpha1 = 0.03;
J1 = zeros(MAX_ITR, 1);
%% alpha = 0.1
theta2 = zeros(size(x(1,:)))';
alpha2 = 0.1;
J2 = zeros(MAX_ITR, 1);
%% alpha = 0.3
theta3 = zeros(size(x(1,:)))';
alpha3 = 0.3;
J3 = zeros(MAX_ITR, 1);
%% alpha = 1
theta4 = zeros(size(x(1,:)))';
alpha4 = 1;
J4 = zeros(MAX_ITR, 1);
%% alpha = 1.3
theta5 = zeros(size(x(1,:)))';
alpha5 = 1.3;
J5 = zeros(MAX_ITR, 1);
%% alpha = 1.4
theta6 = zeros(size(x(1,:)))';
alpha6 = 1.4;
J6 = zeros(MAX_ITR, 1);
for num_iterations = 1:MAX_ITR
    %% alpha = 0.01
    J(num_iterations) = (0.5/m).*(x * theta - y)'*(x * theta - y);%% Calculate my cost function %%
    grad = (1/m).* x' * ((x * theta) - y);
    theta = theta - alpha .* grad; %% Result of gradient descent update %%
    %% alpha = 0.03
    J1(num_iterations) = (0.5/m).*(x * theta1 - y)'*(x * theta1 - y);
    grad1 = (1/m).* x' * ((x * theta1) - y);
    theta1 = theta1 - alpha1 .* grad1;
    %% alpha = 0.1
    J2(num_iterations) = (0.5/m).*(x * theta2 - y)'*(x * theta2 - y);
    grad2 = (1/m).* x' * ((x * theta2) - y);
    theta2 = theta2 - alpha2 .* grad2;
    %% alpha = 0.3
    J3(num_iterations) = (0.5/m).*(x * theta3 - y)'*(x * theta3 - y);
    grad3 = (1/m).* x' * ((x * theta3) - y);
    theta3 = theta3 - alpha3 .* grad3;
    %% alpha = 1
    J4(num_iterations) = (0.5/m).*(x * theta4 - y)'*(x * theta4 - y);
    grad4 = (1/m).* x' * ((x * theta4) - y);
    theta4 = theta4 - alpha4 .* grad4;
    %% alpha = 1.3
    J5(num_iterations) = (0.5/m).*(x * theta5 - y)'*(x * theta5 - y);
    grad5 = (1/m).* x' * ((x * theta5) - y);
    theta5 = theta5 - alpha5 .* grad5;
    %% alpha = 1.4
    J6(num_iterations) = (0.5/m).*(x * theta6 - y)'*(x * theta6 - y);
    grad6 = (1/m).* x' * ((x * theta6) - y);
    theta6 = theta6 - alpha6 .* grad6;
end
fprintf('Finally,gradient descent with alpha= %.1f,after %d iterations,get:\n',alpha4,MAX_ITR);
final_theta = theta4
pdc_obj = [1,1650,3];
pdc_obj(2) = (pdc_obj(2) - mu(2))./ sigma(2);
pdc_obj(3) = (pdc_obj(3) - mu(3))./ sigma(3);
prediction = pdc_obj * final_theta
% Using normal equations to calculate theta:
fprintf('Finally,using normal equations,get:\n');
NE_theta = (xx'*xx) \ (xx'*yy); % solve the normal equations with backslash rather than inv()
NE_theta
prediction = [1,1650,3] * NE_theta
% now plot J
% technically, the first J starts at the zero-eth iteration
% but Matlab/Octave doesn't have a zero index
figure;
plot(0:49, J(1:50), 'b-','LineWidth',2); %% alpha = 0.01
xlabel('Number of iterations');
ylabel('Cost J');
hold on;
plot(0:49, J1(1:50), 'r-','LineWidth',2); %% alpha = 0.03
plot(0:49, J2(1:50), 'g-','LineWidth',2); %% alpha = 0.1
plot(0:49, J3(1:50), 'k-','LineWidth',2); %% alpha = 0.3
plot(0:49, J4(1:50), 'b--','LineWidth',2); %% alpha = 1
plot(0:49, J5(1:50), 'r--','LineWidth',2); %% alpha = 1.3
legend('0.01', '0.03','0.1','0.3','1','1.3');
hold off;
figure;
plot(0:49, J6(1:50), 'r--','LineWidth',2); %% alpha = 1.4
xlabel('Number of iterations');
ylabel('Cost J');
legend('1.4');
Output:
Not shown here; inspecting the variables in the workspace is more accurate.
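As noted at the top of this section, the repeated copies of the update in myex3.m can be folded into a loop over the learning rates. Here is a minimal sketch, assuming synthetic, already-scaled data in place of ex3x.dat and ex3y.dat; the data and the names alphas, thetas, and J_hist are my own and only for illustration.

```matlab
% Sketch: run gradient descent once per learning rate inside a loop.
% Synthetic, roughly standardized features stand in for ex3x.dat (assumption).
m = 40;
f1 = linspace(-1.5, 1.5, m)';                 % hypothetical feature 1
f2 = sin((1:m)');                             % hypothetical feature 2
x = [ones(m, 1), f1, f2];                     % design matrix with intercept
y = x * [3; 2; -1];                           % hypothetical noise-free targets
alphas = [0.01, 0.03, 0.1, 0.3, 1];
MAX_ITR = 100;
J_hist = zeros(MAX_ITR, length(alphas));      % one cost curve per column
thetas = zeros(size(x, 2), length(alphas));   % one final theta per column
for k = 1:length(alphas)
    theta = zeros(size(x, 2), 1);             % restart from zero for each rate
    for it = 1:MAX_ITR
        r = x * theta - y;                    % residuals
        J_hist(it, k) = (0.5/m) * (r' * r);   % cost before this update
        theta = theta - alphas(k) * (1/m) * (x' * r);
    end
    thetas(:, k) = theta;
end
```

With the costs stored column by column, a single call such as plot(0:49, J_hist(1:50, :)) draws one curve per learning rate, replacing the separate plot calls.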
2.3. myex4
clear all; close all; clc
% This script is almost entirely vectorized
x = load('ex4x.dat');
y = load('ex4y.dat');
m = length(y);
% Add intercept term to x
x = [ones(m, 1), x];
% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1); neg = find(y == 0);
% Assume the features are in the 2nd and 3rd
% columns of x
plot(x(pos, 2), x(pos,3), '+'); hold on
plot(x(neg, 2), x(neg, 3), 'ro');
xlabel('Exam 1 score');
ylabel('Exam 2 score');
% Define the sigmoid function as an anonymous function
% (inline is no longer recommended in MATLAB):
g = @(z) 1.0 ./ (1.0 + exp(-z));
% Usage: to evaluate the sigmoid at 2, call g(2); z can also be a vector.
MAX_ITR = 7;
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
J = zeros(MAX_ITR, 1);
for num_iterations = 1:MAX_ITR
    % calculate cost J, vectorized implementation
    G = g(x * theta);
    G1 = 1 - G;
    S = log(G);
    V = log(G1);
    J(num_iterations) = (-1.0/m) .* (y' * S + (1 - y)' * V); % logistic regression cost function J
    % update theta
    grad_J = (1/m) .* x' * (G - y); % gradient of J
    H = 0; % initialize the Hessian matrix
    for i = 1:m
        H = H + (1/m) .* G(i) * G1(i) .* (x(i,:)' * x(i,:));
    end
    theta = theta - H \ grad_J; % Newton's method update (solve instead of forming inv(H))
end
theta
pro = 1-g([1,20,80] * theta);
fprintf('the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is:%.3f\n',pro);
% plot the decision boundary line
plot(x(:,2),-((theta(1)*x(:,1)+theta(2)*x(:,2))/(theta(3))));
xlim([10,70]);ylim([40,100]);
legend('Admitted','Not admitted','Decision boundary');
hold off;
figure;
plot(0:MAX_ITR-1,J,'b--');hold on;
plot(0:MAX_ITR-1,J,'r*');
xlabel('Iteration');ylabel('J');
Output:
theta =
-16.3787
0.1483
0.1589
the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is:0.668
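The per-sample loop that accumulates the Hessian in the script above can be replaced by one matrix product, H = (1/m) X' diag(h(1-h)) X. Here is a minimal sketch that checks the two forms against each other, assuming a handful of made-up scores and parameters in place of ex4x.dat and the fitted theta; the names w, H_loop, and H_vec are hypothetical.

```matlab
% Sketch: vectorized Hessian for logistic regression vs. the loop form.
g = @(z) 1.0 ./ (1.0 + exp(-z));       % sigmoid
% Hypothetical exam scores with an intercept column (not the exercise data)
x = [1 20 80; 1 35 60; 1 50 45; 1 62 90; 1 15 40];
theta = [-0.5; 0.01; 0.005];           % hypothetical parameters
m = size(x, 1);
G = g(x * theta);
w = G .* (1 - G);                      % per-sample weights h*(1-h)
% Loop form, as in the script above
H_loop = zeros(size(x, 2));
for i = 1:m
    H_loop = H_loop + (1/m) * w(i) * (x(i,:)' * x(i,:));
end
% Vectorized form; bsxfun scales row i of x by w(i)
H_vec = (1/m) * x' * bsxfun(@times, x, w);
disp(max(abs(H_loop(:) - H_vec(:))))
```

The vectorized form avoids building the m-by-m diagonal matrix explicitly, which matters little for this exercise but keeps memory use low for larger m.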
2.4
2.4.1. myex5_1.m
clear all; close all; clc
% Regularized linear regression
% Using Normal Equations
x = load('ex5Linx.dat');
y = load('ex5Liny.dat');
x_nointercept = x;
m = length(y);
figure
plot(x_nointercept,y,'ro','MarkerFaceColor','r');hold on;
% Add intercept term to x
x = [ones(m, 1), x, x.^2, x.^3, x.^4, x.^5];
n = length(x(1,:));
lambda = [0,1,10];
theta_normal = zeros(n,length(lambda));
norm_theta = zeros(1,length(lambda));
for i=1:length(lambda)
    % The first diagonal entry is 0 so that the intercept term is not penalized
    theta_normal(:,i) = (x' * x + lambda(i) * diag([0,ones(1,n-1)])) \ (x' * y);
    norm_theta(i) = norm(theta_normal(:,i));
end
theta_normal
norm_theta
x_test = linspace(-1,1,50)';
x_test = [ones(length(x_test), 1), x_test, x_test.^2, x_test.^3, x_test.^4, x_test.^5];
for i=1:length(lambda)
    plot(x_test(:,2),x_test * theta_normal(:,i),'--');
end
hold off;
legend('Training data','5th order fit,\lambda=0','5th order fit,\lambda=1','5th order fit,\lambda=10');
Output:
theta_normal =
0.4725 0.3976 0.5205
0.6814 -0.4207 -0.1825
-1.3801 0.1296 0.0606
-5.9777 -0.3975 -0.1482
2.4417 0.1753 0.0743
4.7371 -0.3394 -0.1280
norm_theta =
8.1687 0.8098 0.5931
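For reference, the line that fills theta_normal in myex5_1.m is the closed-form solution of the regularized least-squares problem. Written in the usual notation (my notation, not taken from the script: X is the design matrix including the intercept column, and the square matrix has a zero in its top-left entry so that the intercept is not penalized), it reads:

```latex
\theta = \left( X^{T}X + \lambda
\begin{bmatrix}
0 &   &        &   \\
  & 1 &        &   \\
  &   & \ddots &   \\
  &   &        & 1
\end{bmatrix}
\right)^{-1} X^{T} y
```

Setting \lambda = 0 recovers the ordinary normal equations used in myex3.m, and the shrinking values of norm_theta above show the effect of larger \lambda.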
2.4.2. myex5_2.m
clear all; close all; clc
% Regularized logistic regression
% Using Newton's Method
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');
m = length(y);
x_expand = map_feature(x(:,1),x(:,2));
% Find the indices for the 2 classes
pos = find(y); neg = find(y == 0);
% plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
% hold on
% plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
% xlabel('u');ylabel('v');legend('y=1','y=0');hold off;
% Newton's Method Iterations
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid as an anonymous function (inline is no longer recommended)
MAX_ITR = 15;
theta = zeros(size(x_expand(1,:)))'; % initialize fitting parameters
lambda = [0,1,10];
J = zeros(MAX_ITR, length(lambda));
for choose_lambda = 1:length(lambda)
    % Restart from zero for each lambda so the runs do not warm-start one another
    theta = zeros(size(x_expand, 2), 1);
    for num_iterations = 1:MAX_ITR
        % Calculate cost J, vectorized implementation
        G = g(x_expand * theta);
        G1 = 1 - G;
        S = log(G);
        V = log(G1);
        % Regularized logistic regression cost function J
        % Add the regularization term
        J(num_iterations,choose_lambda) = (-1.0/m) .* (y' * S + (1 - y)' * V) + (lambda(choose_lambda)/(2*m)).* (theta(2:end)' * theta(2:end));
        % Update theta
        grad_J_before = (1/m) .* x_expand' * (G - y); % gradient of the unregularized cost
        extra_theta = [0;(lambda(choose_lambda)/m) .* theta(2:end)];
        grad_J = grad_J_before + extra_theta;
        H = 0; % initialize the Hessian matrix
        for i = 1:m
            H = H + (1/m) .* G(i) * G1(i) .* (x_expand(i,:)' * x_expand(i,:));
        end
        H = H + (lambda(choose_lambda)/m) .* diag([0,ones(1,length(theta)-1)]);
        theta = theta - H \ grad_J; % use Newton's Method to update theta
    end
    norm_theta(choose_lambda) = norm(theta);
    % Plot decision boundary
    % Define the ranges of the grid
    u = linspace(-1, 1.5, 200);
    v = linspace(-1, 1.5, 200);
    % Initialize space for the values to be plotted
    z = zeros(length(v), length(u));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            % Notice the order of j, i here: contour(u,v,z) expects the
            % rows of z to follow v and the columns to follow u
            z(j,i) = map_feature(u(i), v(j))*theta;
        end
    end
    % z is already stored as z(v,u), so it must not be transposed as well;
    % doing both would flip the axis orientation
    % Plot z = 0 by specifying the range [0, 0]
    figure
    plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
    hold on
    plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
    xlabel('u');ylabel('v');
    contour(u,v,z, [0, 0], 'LineWidth', 2)
    legend('y = 1', 'y = 0', 'Decision boundary');
    hold off;
    title(sprintf('\\lambda = %g', lambda(choose_lambda)), 'FontSize', 14);
end
J
norm_theta
fprintf('Want to see detailed value,go to workspace!\n')
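The script above calls map_feature, which is not listed in this post; it is a helper that comes with the exercise. Here is a minimal sketch of what such a helper might look like, assuming it expands two features into all monomials up to total degree 6 with the intercept as the first column (28 columns in total). The degree and the column order are my assumptions, so check them against the file shipped with the exercise.

```matlab
function out = map_feature(feat1, feat2)
% MAP_FEATURE  Sketch of a polynomial feature expansion (assumed behavior).
% feat1 and feat2 are column vectors of equal length. The output has one
% row per sample and columns 1, u, v, u^2, u*v, v^2, ... up to degree 6.
    degree = 6;                          % assumed maximum total degree
    out = ones(size(feat1(:, 1)));       % intercept column
    for i = 1:degree
        for j = 0:i
            % append the monomial u^(i-j) * v^j as a new column
            out(:, end+1) = (feat1 .^ (i - j)) .* (feat2 .^ j);
        end
    end
end
```

Saved as map_feature.m on the path, it can be called as map_feature(x(:,1), x(:,2)) for the training data or map_feature(u(i), v(j)) for a single grid point, which are the two ways the script uses it.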
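Output:
The theory I still need to study further includes the following (it is probably somewhat difficult, and I worry about not finding suitable material; if anyone has resources, please recommend them):
How logistic regression is used
Normal Equations (with and without regularization)
The theoretical basis of the multivariate Newton's method (with and without regularization)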
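As a starting point for the last item, these are the formulas that myex5_2.m implements, written out in the usual notation. The symbols are my own shorthand rather than something defined in the scripts: h_\theta(x) = g(\theta^{T}x) is the sigmoid hypothesis, m is the number of samples, and \theta_0 is the intercept, which is not regularized.

```latex
\begin{aligned}
J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\Bigl[\, y^{(i)}\log h_\theta(x^{(i)})
  + \bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr) \Bigr]
  + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2} \\
\nabla_\theta J &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)\,x^{(i)}
  + \frac{\lambda}{m}\begin{bmatrix} 0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \\
H &= \frac{1}{m}\sum_{i=1}^{m} h_\theta(x^{(i)})\bigl(1-h_\theta(x^{(i)})\bigr)\,
  x^{(i)}\bigl(x^{(i)}\bigr)^{T}
  + \frac{\lambda}{m}\,\mathrm{diag}(0,1,\dots,1) \\
\theta &:= \theta - H^{-1}\,\nabla_\theta J
\end{aligned}
```

Setting \lambda = 0 gives the unregularized update used in myex4.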
--- The tutorial here gives some explanation of the first two topics (updated 2019/7/21) ---
If you have questions, you can also add me on QQ to ask.