Openclassroom Machine Learning

1. Background

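Recently, while reading the UFLDL Tutorial, I came across this advice: "It is best to have some machine learning background (specifically, familiarity with the ideas of supervised learning, logistic regression, and gradient descent). If you are not familiar with these, we suggest you go to the Machine Learning course and complete sections II, III, and IV (up to logistic regression) first." So I went to that course and watched all of the videos. This is the course home page: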

Machine Learning

The course has 9 chapters. Each chapter has an exercise to be programmed in MATLAB/Octave, and the first 6 chapters have videos. The course structure is:

|——I. INTRODUCTION
   |——Welcome
   |——What is Machine Learning?
   |——Supervised Learning Introduction
   |——Unsupervised Learning Introduction
   |——Installing Octave
|——II. LINEAR REGRESSION I
   |——Supervised Learning Introduction
   |——Model Representation
   |——Cost Function
   |——Gradient Descent
   |——Gradient Descent for Linear Regression
   |——Vectorized Implementation
   |——Exercise 2
|——III. LINEAR REGRESSION II
   |——Feature Scaling
   |——Learning Rate
   |——Features and Polynomial Regression
   |——Normal Equations
   |——Exercise 3
|——IV. LOGISTIC REGRESSION
   |——Classification
   |——Model
   |——Optimization Objective I
   |——Optimization Objective II
   |——Gradient Descent
   |——Newton's Method I
   |——Newton's Method II
   |——Gradient Descent vs Newton's Method
   |——Exercise 4
|——V. REGULARIZATION
   |——The Problem Of Overfitting
   |——Optimization Objective
   |——Common Variations
   |——Regularized Linear Regression
   |——Regularized Logistic Regression
   |——Exercise 5
|——VI. NAIVE BAYES
   |——Generative Learning Algorithms
   |——Text Classification
   |——Exercise 6
|——VII. 
   |——Exercise 7
|——VIII. 
   |——Exercise 8
|——IX. 
   |——Exercise 9

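The site is entirely in English and the videos have no subtitles, so some English is required. That said, in Andrew Ng's lectures you can usually follow the material just from what he writes on screen, and his explanations are accessible and vivid. The one drawback is that the site is old (2010-2012) and no longer seems to be updated or maintained. These days, people who want to learn machine learning usually take the Machine Learning course on Coursera, or Andrew Ng's Machine Learning on NetEase Cloud Classroom, which has Chinese and English subtitles; those two correspond to each other. Enough background: I followed the OpenClassroom version above. I did not plan to study machine learning systematically, only to pick up the introductory material.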

2. Main Content

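These are my notes in PDF form (extraction code: n8zn). If you are interested, you can read them alongside the videos and refer to them while doing the exercises; they include some of my own theoretical derivations.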

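This course requires the following background:

Linear algebra (matrix operations)

Calculus (differentiation)

Probability theory (probability distributions)

MATLAB programming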

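I wrote Exercises 2 - 5 myself and got them all working; my code is below. The exercises are quite good: they come with theoretical hints and a reference solution, so if your result differs from theirs you can debug until it matches exactly. It is best to watch the videos and try writing the code yourself before looking at any reference code.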

2.1. myex2.m

clear all
x = load('ex2x.dat');
y = load('ex2y.dat');
figure
plot(x,y,'o');
xlabel('Age in years'),ylabel('Height in meters');
m=length(y);
x=[ones(m,1),x];
alpha = 0.07;
theta=[0;0];
theta_record = theta;
%%%%%%%%%%%%%%%%%%%%%%%%%
% This was my first attempt and it has a small bug: theta(1) is updated first, and the updated value is then used when updating theta(2)
% for iteration = 1:1500
%     for j=1:length(theta)
%         temp = 0;
%         for i=1:m
%             temp = temp + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
%         end
%         theta(j) = theta(j) - alpha/m*temp;
%     end
%     theta_record = [theta_record,theta];
%     
% end
%%%%%%%%%%%%%%%%%%%%%%%%
for iteration = 1:1500
    temp = zeros(length(theta),1);
    for j=1:length(theta)
        
        for i=1:m
            temp(j) = temp(j) + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
        end
    end
    for n=1:length(theta)
        theta(n) = theta(n) - alpha/m*temp(n);
    end
    theta_record = [theta_record,theta];
end
hold on
plot(x(:,2),x*theta,'-');
legend('Training data','Linear Regression')
theta
%%%Prediction%%%%
age1 = [1,3.5];
age2 = [1,7];
height_predict_1 = age1 * theta;
height_predict_2 = age2 * theta;
fprintf('The boy of age 3.5,his height is %f\n',height_predict_1);
fprintf('The boy of age 7.0,his height is %f\n',height_predict_2);

theta0_vals = linspace(-3, 3, 100);
theta1_vals = linspace(-1, 1, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));   % initialize J_vals to a 100x100 matrix of 0's
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = (0.5/m).*(sum((x*t-y).^2));
    end
end

% Plot the surface plot
% Because of the way meshgrids work in the surf command, we need to 
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1')
% Plot contours to make the approach to the global optimum more apparent
figure;
% Plot the cost function with 15 contours spaced logarithmically
% between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15));
xlabel('\theta_0'); ylabel('\theta_1');

Output:

theta =

0.7502
0.0639

The boy of age 3.5,his height is 0.973742
The boy of age 7.0,his height is 1.197334

ex2_result.png
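
The nested loops in myex2.m compute the gradient one component at a time. As a minimal sketch, the same simultaneous update can be written in vectorized form. The data below is synthetic (the made-up ages and heights stand in for ex2x.dat and ex2y.dat, which are not reproduced here), and the names X and theta_exact are illustrative assumptions rather than part of the original script.

```octave
% Vectorized batch gradient descent on synthetic data (illustrative only;
% the data and variable names are assumptions, not the ex2 files).
m = 50;
ages = linspace(2, 6, m)';            % made-up ages in years
y = 0.75 + 0.06 * ages;               % made-up, noise-free heights in meters
X = [ones(m, 1), ages];               % prepend the intercept column

alpha = 0.07;                         % same learning rate as myex2.m
theta = zeros(2, 1);
for iteration = 1:1500
    grad = (1 / m) * (X' * (X * theta - y));  % all gradient components at once
    theta = theta - alpha * grad;             % simultaneous update of theta
end

theta_exact = (X' * X) \ (X' * y);    % least-squares solution for comparison
disp([theta, theta_exact]);           % columns: gradient descent, exact
```

Because the whole gradient is computed from the old theta before any component changes, this form cannot fall into the non-simultaneous update described in the commented-out first attempt.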

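2.2. myex3.m

This one is not well written; I did not notice at the time. You can refactor it to use a loop (in the later exercises I stuck to the principle of keeping the code concise).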

clear all;close all; clc
x = load('ex3x.dat');
y = load('ex3y.dat');
m = length(y);
x = [ones(m,1),x];xx=x;yy=y;
% The features in x are on different scales, so normalize them
sigma = std(x);
mu = mean(x);
x(:,2) = (x(:,2) - mu(2))./ sigma(2);
x(:,3) = (x(:,3) - mu(3))./ sigma(3);

MAX_ITR = 100;
%% alpha = 0.01 
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
alpha = 0.01;   %% My initial learning rate %%
J = zeros(MAX_ITR, 1); 
%% alpha = 0.03 
theta1 = zeros(size(x(1,:)))';
alpha1 = 0.03;   
J1 = zeros(MAX_ITR, 1); 
%% alpha = 0.1 
theta2 = zeros(size(x(1,:)))';
alpha2 = 0.1;   
J2 = zeros(MAX_ITR, 1); 
%% alpha = 0.3 
theta3 = zeros(size(x(1,:)))';
alpha3 = 0.3;   
J3 = zeros(MAX_ITR, 1); 
%% alpha = 1 
theta4 = zeros(size(x(1,:)))';
alpha4 = 1;   
J4 = zeros(MAX_ITR, 1); 
%% alpha = 1.3 
theta5 = zeros(size(x(1,:)))';
alpha5 = 1.3;   
J5 = zeros(MAX_ITR, 1); 
%% alpha = 1.4 
theta6 = zeros(size(x(1,:)))';
alpha6 = 1.4;   
J6 = zeros(MAX_ITR, 1); 


for num_iterations = 1:MAX_ITR
    %% alpha = 0.01 
    J(num_iterations) = (0.5/m).*(x * theta - y)'*(x * theta - y);%% Calculate my cost function  %%
    grad = (1/m).* x' * ((x * theta) - y);
    theta = theta - alpha .* grad; %% Result of gradient descent update %%
    %% alpha = 0.03 
    J1(num_iterations) = (0.5/m).*(x * theta1 - y)'*(x * theta1 - y);
    grad1 = (1/m).* x' * ((x * theta1) - y);
    theta1 = theta1 - alpha1 .* grad1; 
    %% alpha = 0.1 
    J2(num_iterations) = (0.5/m).*(x * theta2 - y)'*(x * theta2 - y);
    grad2 = (1/m).* x' * ((x * theta2) - y);
    theta2 = theta2 - alpha2 .* grad2; 
    %% alpha = 0.3 
    J3(num_iterations) = (0.5/m).*(x * theta3 - y)'*(x * theta3 - y);
    grad3 = (1/m).* x' * ((x * theta3) - y);
    theta3 = theta3 - alpha3 .* grad3; 
    %% alpha = 1 
    J4(num_iterations) = (0.5/m).*(x * theta4 - y)'*(x * theta4 - y);
    grad4 = (1/m).* x' * ((x * theta4) - y);
    theta4 = theta4 - alpha4 .* grad4; 
    %% alpha = 1.3
    J5(num_iterations) = (0.5/m).*(x * theta5 - y)'*(x * theta5 - y);
    grad5 = (1/m).* x' * ((x * theta5) - y);
    theta5 = theta5 - alpha5 .* grad5;
    %% alpha = 1.4 
    J6(num_iterations) = (0.5/m).*(x * theta6 - y)'*(x * theta6 - y);
    grad6 = (1/m).* x' * ((x * theta6) - y);
    theta6 = theta6 - alpha6 .* grad6;
end
fprintf('Finally,gradient descent with alpha= %.1f,after %d iterations,get:\n',alpha4,MAX_ITR);
final_theta = theta4
pdc_obj = [1,1650,3];
pdc_obj(2) = (pdc_obj(2) - mu(2))./ sigma(2);
pdc_obj(3) = (pdc_obj(3) - mu(3))./ sigma(3);
prediction = pdc_obj * final_theta
% Using normal equations to calculate theta:
fprintf('Finally,using normal equations,get:\n');
NE_theta = (xx'*xx) \ (xx'*yy); % solve the normal equations without forming an explicit inverse
NE_theta
prediction = [1,1650,3] * NE_theta
% now plot J
% technically, the first J starts at the zero-eth iteration
% but Matlab/Octave doesn't have a zero index
figure;
plot(0:49, J(1:50), 'b-','LineWidth',2); %% alpha = 0.01
xlabel('Number of iterations');
ylabel('Cost J');
hold on;
plot(0:49, J1(1:50), 'r-','LineWidth',2); %% alpha = 0.03
plot(0:49, J2(1:50), 'g-','LineWidth',2); %% alpha = 0.1
plot(0:49, J3(1:50), 'k-','LineWidth',2); %% alpha = 0.3
plot(0:49, J4(1:50), 'b--','LineWidth',2); %% alpha = 1
plot(0:49, J5(1:50), 'r--','LineWidth',2); %% alpha = 1.3
legend('0.01', '0.03','0.1','0.3','1','1.3');
hold off;
figure;
plot(0:49, J6(1:50), 'r--','LineWidth',2); %% alpha = 1.4
xlabel('Number of iterations');
ylabel('Cost J');
legend('1.4');

Output:

Not shown here; inspect the variables in the workspace for accurate values.

ex3_result.png
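
One possible way to do the loop refactoring suggested above is to keep the learning rates in a vector and store one cost curve per column. This is only a sketch under stated assumptions: the data is synthetic and already normalized (it stands in for ex3x.dat and ex3y.dat), and the names alphas, thetas, f1 and f2 are hypothetical.

```octave
% Loop over learning rates instead of copying the update once per alpha.
% Synthetic, normalized data stands in for ex3x.dat / ex3y.dat (assumption).
m = 40;
f1 = linspace(-1.5, 1.5, m)';
f2 = sin((1:m)');                      % arbitrary second feature
f1 = (f1 - mean(f1)) / std(f1);
f2 = (f2 - mean(f2)) / std(f2);
X = [ones(m, 1), f1, f2];
y = X * [3; 2; -1];                    % made-up targets from known parameters

alphas = [0.01, 0.03, 0.1, 0.3, 1];
MAX_ITR = 100;
J = zeros(MAX_ITR, numel(alphas));     % one cost curve per column
thetas = zeros(size(X, 2), numel(alphas));

for k = 1:numel(alphas)
    theta = zeros(size(X, 2), 1);      % restart from zero for every alpha
    for it = 1:MAX_ITR
        r = X * theta - y;             % residuals
        J(it, k) = (0.5 / m) * (r' * r);
        theta = theta - alphas(k) * (1 / m) * (X' * r);
    end
    thetas(:, k) = theta;              % final parameters for this alpha
end
```

With this layout, plot(0:49, J(1:50, :)) draws all of the curves in one call, and adding another learning rate only means appending it to alphas.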

2.3. myex4.m

clear all; close all; clc
% This code uses a vectorized implementation almost everywhere
x = load('ex4x.dat'); 
y = load('ex4y.dat');

m = length(y);

% Add intercept term to x
x = [ones(m, 1), x];

% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1); neg = find(y == 0);

% Assume the features are in the 2nd and 3rd
% columns of x
plot(x(pos, 2), x(pos,3), '+'); hold on
plot(x(neg, 2), x(neg, 3), 'ro');
xlabel('Exam 1 score');
ylabel('Exam 2 score');


% Define the sigmoid function as an anonymous function
% (inline is no longer recommended in current MATLAB):
g = @(z) 1.0 ./ (1.0 + exp(-z));
% Usage: to evaluate the sigmoid at 2, call g(2); z can also be a vector.

MAX_ITR = 7;
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
J = zeros(MAX_ITR, 1);

for num_iterations = 1:MAX_ITR
    % Calculate cost J, vectorized implementation
    G = g(x * theta);
    G1 = 1 - G;
    S = log(G);
    V = log(G1);
    J(num_iterations) = (-1.0/m) .* (y' * S + (1 - y)' * V); % logistic regression cost function J
    
    % update theta
    grad_J = (1/m) .* x' * (G - y); % J gradient
    H = zeros(length(theta)); % initialize the Hessian matrix
    for i = 1:m
        H = H + (1/m) .* G(i) * G1(i) .* (x(i,:)' * x(i,:));
    end
    theta = theta - H \ grad_J; % Newton's method update; solve the linear system instead of forming inv(H)
end
theta
pro = 1-g([1,20,80] * theta);
fprintf('the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is:%.3f\n',pro);
% plot the decision boundary line
plot(x(:,2),-((theta(1)*x(:,1)+theta(2)*x(:,2))/(theta(3))));
xlim([10,70]);ylim([40,100]);
legend('Admitted','Not admitted','Decision boundary');
hold off;
figure;
plot(0:MAX_ITR-1,J,'b--');hold on;
plot(0:MAX_ITR-1,J,'r*');
xlabel('Iteration');ylabel('J');

Output:

theta =

-16.3787
0.1483
0.1589
the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is:0.668

ex4_result.png
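
For reference, the quantities computed in each iteration of myex4.m can be written out as follows. This is a summary of the standard logistic regression formulas, matched to the variables in the script (G is h, G1 is 1 - h, grad_J is the gradient, H is the Hessian); it is a restatement, not an additional result.

```latex
\begin{aligned}
h_\theta(x) &= g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}} \\
J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\Big] \\
\nabla_\theta J &= \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)\,x^{(i)} \\
H &= \frac{1}{m}\sum_{i=1}^{m} h_\theta(x^{(i)})\big(1-h_\theta(x^{(i)})\big)\,x^{(i)}\big(x^{(i)}\big)^{T} \\
\theta &:= \theta - H^{-1}\nabla_\theta J
\end{aligned}
```

The decision boundary drawn by the script is the line where \theta^{T}x = 0, that is, where the predicted probability equals 0.5.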

2.4

2.4.1. myex5_1.m

clear all; close all; clc

% Regularized linear regression
% Using Normal Equations
x = load('ex5Linx.dat'); 
y = load('ex5Liny.dat');
x_nointercept = x;
m = length(y);

figure
plot(x_nointercept,y,'ro','MarkerFaceColor','r');hold on;

% Add intercept term and polynomial features up to degree 5 to x
x = [ones(m, 1), x, x.^2, x.^3, x.^4, x.^5];
n = length(x(1,:));
lambda = [0,1,10];
theta_normal = zeros(n,length(lambda));
norm_theta = zeros(1,length(lambda));
for i=1:length(lambda)
    theta_normal(:,i) = (x' * x + lambda(i) * diag([0,ones(1,n-1)])) \ (x' * y);
    norm_theta(i) = norm(theta_normal(:,i));
end
theta_normal
norm_theta
x_test = linspace(-1,1,50)';
x_test = [ones(length(x_test), 1), x_test, x_test.^2, x_test.^3, x_test.^4, x_test.^5];
for i=1:length(lambda)
    plot(x_test(:,2),x_test * theta_normal(:,i),'--');
end
hold off;
legend('Training data','5th order fit,\lambda=0','5th order fit,\lambda=1','5th order fit,\lambda=10');

Output:

theta_normal =

0.4725 0.3976 0.5205
0.6814 -0.4207 -0.1825
-1.3801 0.1296 0.0606
-5.9777 -0.3975 -0.1482
2.4417 0.1753 0.0743
4.7371 -0.3394 -0.1280

norm_theta =

8.1687 0.8098 0.5931

ex5_1_result.png
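
The closed-form solution used in myex5_1.m follows from setting the gradient of the regularized cost to zero. A short derivation sketch, where D is the matrix written in the script as diag([0,ones(1,n-1)]) (the intercept is not penalized):

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\Big[(X\theta-y)^{T}(X\theta-y) + \lambda\,\theta^{T}D\,\theta\Big],
\qquad D=\operatorname{diag}(0,1,\dots,1) \\
\nabla_\theta J &= \frac{1}{m}\Big[X^{T}(X\theta-y) + \lambda D\theta\Big] = 0 \\
\Rightarrow\quad (X^{T}X+\lambda D)\,\theta &= X^{T}y \\
\theta &= (X^{T}X+\lambda D)^{-1}X^{T}y
\end{aligned}
```

Setting \lambda = 0 recovers the unregularized normal equation used in myex3.m. Larger \lambda shrinks the non-intercept parameters, which is consistent with norm_theta decreasing across the three columns above.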

2.4.2. myex5_2.m

clear all; close all; clc

% Regularized logistic regression
% Using Newton's Method
x = load('ex5Logx.dat'); 
y = load('ex5Logy.dat');
m = length(y);
x_expand = map_feature(x(:,1),x(:,2));

% Find the indices for the 2 classes
pos = find(y); neg = find(y == 0);

% plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
% hold on
% plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
% xlabel('u');ylabel('v');legend('y=1','y=0');hold off;

% Newton's Method Iterations
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid as an anonymous function; z may be a vector
MAX_ITR = 15;
lambda = [0,1,10];
J = zeros(MAX_ITR, length(lambda));
norm_theta = zeros(1, length(lambda));

for choose_lambda = 1:length(lambda)
    theta = zeros(size(x_expand(1,:)))'; % restart from zero for each lambda
    for num_iterations = 1:MAX_ITR
        % Calculate cost J, vectorized implementation
        G = g(x_expand * theta);
        G1 = 1 - G;
        S = log(G);
        V = log(G1);
        % Regularized logistic regression cost function J
        % Add the regularization term
        J(num_iterations,choose_lambda) = (-1.0/m) .* (y' * S + (1 - y)' * V) + (lambda(choose_lambda)/(2*m)).* (theta(2:end)' * theta(2:end)); 

        % Update theta
        grad_J_before = (1/m) .* x_expand' * (G - y); % J gradient
        extra_theta = [0;(lambda(choose_lambda)/m) .* theta(2:end)];
        grad_J = grad_J_before + extra_theta;
        H = 0; % Hessian matrix initial
        for i = 1:m
            H = H + (1/m) .* G(i) * G1(i) .* (x_expand(i,:)' * x_expand(i,:));
        end
        H = H + (lambda(choose_lambda)/m) .* diag([0,ones(1,length(theta)-1)]);
        theta = theta - H \ grad_J; % use Newton's Method to update theta

    end
    norm_theta(choose_lambda) = norm(theta);
    % Plot decision boundary 
    % Define the ranges of the grid
    u = linspace(-1, 1.5, 200);
    v = linspace(-1, 1.5, 200);

    % Initialize space for the values to be plotted
    z = zeros(length(v), length(u));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            % Rows of z index v and columns index u, which is the
            % layout contour(u, v, z) expects
            z(j,i) = map_feature(u(i), v(j))*theta;
        end
    end

    % z is already stored as z(v, u), so no transpose is needed here;
    % transposing would mirror the boundary across the line u = v
    % Plot z = 0 by specifying the range [0, 0]
    figure
    plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
    hold on
    plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
    xlabel('u');ylabel('v');
    contour(u,v,z, [0, 0], 'LineWidth', 2)
    legend('y = 1', 'y = 0', 'Decision boundary');
    hold off;
    title(sprintf('\\lambda = %g', lambda(choose_lambda)), 'FontSize', 14);
end
J
norm_theta
fprintf('Want to see detailed value,go to workspace!\n')

Output:


ex5_2_result.png
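
The script calls map_feature, a helper from the exercise materials that is not listed in this post. As a hedged sketch of what such a mapping presumably computes (all monomials of the two inputs up to degree 6, which gives 28 columns), the expansion can be written inline as below. The sample values of u and v are made up, and the degree and column order are assumptions based on the exercise description rather than a copy of the course file.

```octave
% Sketch of a degree-6 polynomial feature mapping for two inputs.
% u, v and the column ordering are illustrative assumptions.
u = [0.5; -1];                         % made-up first feature
v = [2; 3];                            % made-up second feature
degree = 6;

out = ones(size(u));                   % first column: intercept term
for i = 1:degree
    for j = 0:i
        % append the monomial u^(i-j) * v^j as a new column
        out(:, end + 1) = (u .^ (i - j)) .* (v .^ j);
    end
end

disp(size(out));                       % one row per sample, one column per monomial
```

Wrapping the loop in a function file named map_feature.m that takes the two feature columns and returns out would give a helper with the call signature used in myex5_2.m.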

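The theory I still need to study further includes the following (it is probably somewhat difficult, and I worry about not finding suitable material; if you have resources, please recommend them):

Using logistic regression
Normal Equations (with and without regularization)
The theoretical basis of the multivariate Newton's method (with and without regularization)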
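
As a starting point for the last item, here is a brief sketch of the usual argument, in standard textbook form rather than anything specific to the course. Newton's method minimizes the second-order Taylor approximation of J around the current \theta; the regularized gradient and Hessian below match grad_J and H in myex5_2.m, with h = g(X\theta) and D = diag(0,1,...,1).

```latex
\begin{aligned}
J(\theta+\Delta) &\approx J(\theta) + \nabla_\theta J^{T}\Delta + \tfrac{1}{2}\Delta^{T}H\Delta \\
\nabla_\theta J + H\Delta = 0 \;&\Rightarrow\; \Delta = -H^{-1}\nabla_\theta J \\
\nabla_\theta J &= \frac{1}{m}X^{T}(h-y) + \frac{\lambda}{m}D\theta \\
H &= \frac{1}{m}X^{T}SX + \frac{\lambda}{m}D,
\qquad S=\operatorname{diag}\big(h_i(1-h_i)\big),\; D=\operatorname{diag}(0,1,\dots,1)
\end{aligned}
```

Setting \lambda = 0 gives the unregularized case. Since every h_i(1-h_i) is positive, H is positive semidefinite, so the quadratic approximation has a minimum and the Newton step points toward it.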

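---You can find a tutorial here that explains the first two questions to some extent (updated 2019/7/21)-----

If you have questions, you can also add me on QQ to ask.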
