Machine Learning Course Notes (01): Linear Regression

Source: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex2/ex2.html

In this example, because the variance of the data is small, about 500 iterations of gradient descent already give a very close approximation to the solution. But, as the exercise text notes, running 1500 iterations yields a more precise answer that agrees with the theoretical (closed-form) solution to four significant figures.


Linear regression

Now, we will implement linear regression for this problem. Recall that the linear regression model is 

\begin{displaymath}
h_{\theta}(x) = \theta^T x = \sum_{i=0}^{n} \theta_i x_i,
\end{displaymath}

and the batch gradient descent update rule is

\begin{displaymath}
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \;\;\;\;\;\mbox{(for all $j$)}
\end{displaymath}

1. Implement gradient descent using a learning rate of $\alpha = 0.07$. Since Matlab/Octave indexes vectors starting from 1 rather than 0, you'll probably use theta(1) and theta(2) in Matlab/Octave to represent $\theta_0$ and $\theta_1$. Initialize the parameters to $\theta = \vec{0}$ (i.e., $\theta_0=\theta_1=0$), and run one iteration of gradient descent from this initial starting point. Record the values of $\theta_0$ and $\theta_1$ that you get after this first iteration. (To verify that your implementation is correct, later we'll ask you to check your values of $\theta_0$ and $\theta_1$ against ours.) A sketch of one possible implementation appears after this list.

2. Continue running gradient descent for more iterations until $\theta$ converges. (This will take a total of about 1500 iterations.) After convergence, record the final values of $\theta_0$ and $\theta_1$ that you get.
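The following is a minimal Octave/MATLAB sketch of one way to implement these two steps; it assumes the exercise's data files ex2x.dat (ages) and ex2y.dat (heights) are in the working directory, and the variable names are illustrative.

% Batch gradient descent sketch (Octave/MATLAB)
x = load('ex2x.dat');              % m x 1 inputs (ages)
y = load('ex2y.dat');              % m x 1 targets (heights)
m = length(y);
x = [ones(m, 1), x];               % prepend the column of ones for the intercept term

alpha = 0.07;                      % learning rate given in the exercise
theta = zeros(2, 1);               % theta_0 = theta_1 = 0

for iter = 1:1500                  % about 1500 iterations to reach convergence
    grad  = (1/m) * x' * (x*theta - y);   % both components of the gradient at once
    theta = theta - alpha * grad;         % simultaneous update of theta_0 and theta_1
end
theta                              % display the converged parameters

Because the gradient is computed from the current theta before either component is changed, both parameters are updated simultaneously, which is what the batch update rule requires.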

Understanding $J(\theta)$

We'd like to understand better what gradient descent has done, and visualize the relationship between the parameters $\theta \in {\mathbb R}^2$ and $J(\theta)$. In this problem, we'll plot $J(\theta)$ as a 3D surface plot. (When applying learning algorithms, we don't usually try to plot $J(\theta)$ since usually $\theta \in {\mathbb R}^n$ is very high-dimensional so that we don't have any simple way to plot or visualize $J(\theta)$. But because the example here uses a very low dimensional $\theta \in {\mathbb R}^2$, we'll plot $J(\theta)$ to gain more intuition about linear regression.) Recall that the formula for $J(\theta)$ is

\begin{displaymath}
J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}
\end{displaymath}
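To produce the surface plot, $J(\theta)$ can be evaluated over a grid of $(\theta_0, \theta_1)$ values. Below is a sketch; the grid ranges are just a plausible choice, and x, y, and m reuse the variables from the earlier sketch.

% Sketch: evaluate J(theta) on a grid and draw the surface
theta0_vals = linspace(-3, 3, 100);
theta1_vals = linspace(-1, 1, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = (1/(2*m)) * sum((x*t - y).^2);
    end
end
surf(theta0_vals, theta1_vals, J_vals');   % transpose so rows correspond to theta_1
xlabel('\theta_0'); ylabel('\theta_1'); zlabel('J(\theta)');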

[Figure 1: 3D surface plot of $J(\theta)$]

What is the relationship between this 3D surface and the values of $\theta_0$ and $\theta_1$ that your implementation of gradient descent found?

Linear Regression

1. After your first iteration of gradient descent, verify that you get

\begin{eqnarray*}
\theta_0 &=& 0.0745 \\
\theta_1 &=& 0.3800
\end{eqnarray*}


If your answer does not exactly match this solution, you may have implemented something wrong. Did you get the correct $\theta_0 = 0.0745$, but the wrong answer for $\theta_1$? (You might have gotten $\theta_1 = 0.4057$). If this happened, you probably updated the $\theta_j$ terms sequentially, that is, you first updated $\theta_0$, plugged that value back into $\theta$, and then updated $\theta_1$. Remember that you should not be basing your calculations on any intermediate values of $\theta$ that you would get this way.

2. After running gradient descent until convergence, verify that your parameters are approximately equal to the exact closed-form solution (which you will learn about in the next assignment):

\begin{eqnarray*}
\theta_0 &=& 0.7502 \\
\theta_1 &=& 0.0639
\end{eqnarray*}


If you run gradient descent in MATLAB for 1500 iterations at a learning rate of 0.07, you should see these exact numbers for theta. If you used fewer iterations, your answer should not differ from this by more than 0.01; otherwise, you probably did not iterate enough. For example, running gradient descent in MATLAB for 500 iterations gives theta = [0.7318, 0.0672]. This is close to convergence, but theta can still get closer to the exact value if you run gradient descent some more.

If your answer differs drastically from the solutions above, there may be a bug in your implementation. Check that you used the correct learning rate of 0.07 and that you defined the gradient descent update correctly. Then, check that your x and y vectors are indeed what you expect them to be. Remember that x needs an extra column of ones.
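As an additional sanity check, the closed-form solution mentioned above can be computed directly with the normal equation. A sketch, reusing x (with its column of ones) and y from before:

% Normal equation: solves (X'X) * theta = X' * y
theta_exact = (x' * x) \ (x' * y);
theta_exact                          % should be approximately [0.7502; 0.0639]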

3. The predicted height for age 3.5 is 0.9737 meters, and for age 7 is 1.1975 meters.
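These predictions follow directly from $h_{\theta}(x) = \theta^T x$, remembering the leading 1 for the intercept term. A sketch:

pred_age_3_5 = [1, 3.5] * theta;   % about 0.9737 with the converged theta
pred_age_7   = [1, 7.0] * theta;   % about 1.1975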

A plot of the training data with the best fit from gradient descent should look like the following graph.

[Figure 2: training data with the best-fit line from gradient descent]
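A plot of this kind can be produced with something like the following sketch (the axis labels reflect the exercise data: ages in years, heights in meters):

figure;
plot(x(:,2), y, 'o');          % training data: age vs. height
hold on;
plot(x(:,2), x*theta, '-');    % fitted line h_theta(x)
xlabel('Age in years'); ylabel('Height in meters');
legend('Training data', 'Linear regression');
hold off;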

Understanding $J(\theta)$

In your surface plot, you should see that the cost function $J(\theta)$ approaches a minimum near the values of $\theta_0$ and $\theta_1$ that you found through gradient descent. In general, the cost function for a linear regression problem will be bowl-shaped with a global minimum and no local optima.

The result is a plot like the following.

[Figure 3: plot of $J(\theta)$ showing the location of the minimum]

Now the location of the minimum is more obvious.
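One common way to make the minimum stand out is a contour plot of the same J_vals grid, with logarithmically spaced contour levels so that detail near the minimum remains visible. A sketch (the level range is an illustrative choice):

figure;
contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 2, 15));
xlabel('\theta_0'); ylabel('\theta_1');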

