An example of linear regression
For example, from our previous study, we can easily draw the conclusion that the model described by the table in the picture is a supervised learning model. Moreover, it is an example of a regression problem (回归问题).
More formally, in supervised learning we have a data set, which is called the training set (训练集). The algorithm's job is to learn from this data how to predict the "right answer" (such as the prices of houses).
Here is some notation that will help us:
- m: the number of training examples (the number of rows in the above picture).
- x^(i): the "input" variable / feature in the i-th row.
- y^(i): the "output" variable / "target" variable in the i-th row.
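The notation above can be made concrete with a small sketch. The sizes and prices below are made-up values for illustration only (they are not taken from the table), and the names `sizes`, `prices`, and `example` are hypothetical. Note that the notes count rows from 1, while Python lists are indexed from 0.

```python
# Hypothetical training set: house sizes (x) and prices (y).
# These numbers are invented for illustration only.
sizes = [50, 80, 100, 120]     # x: input variable / feature
prices = [150, 240, 300, 360]  # y: output / target variable

m = len(sizes)  # m: number of training examples (rows)


def example(i):
    """Return (x^(i), y^(i)) for the 1-based row index i used in the notes."""
    return sizes[i - 1], prices[i - 1]


print(m)           # number of training examples
print(example(1))  # first training example (x^(1), y^(1))
```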
To describe the supervised learning problem more formally: given a training set, our goal is to learn a function h: x -> y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis function (假设函数). Seen pictorially, the process is therefore like this:
So, in this example, h is a function that maps from x (the size of the house) to y (the estimated price), and we can represent h as a straight line: h(x) = θ0 + θ1x
This is also called linear regression with one variable (一元线性回归) or univariate linear regression (单变量线性回归), and it is a basic building block for learning other, more complicated models.
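As a minimal sketch, a univariate linear hypothesis of the form h(x) = θ0 + θ1x can be written as a plain function. The function name `h`, the parameter values, and the house size below are all hypothetical and chosen only to show that different parameters produce different predictions.

```python
def h(theta0, theta1, x):
    """Univariate linear hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Two hypothetical parameter choices give two different straight lines.
size = 100  # hypothetical house size

pred_a = h(0.0, 3.0, size)   # line through the origin with slope 3
pred_b = h(50.0, 2.0, size)  # line with intercept 50 and slope 2

print(pred_a)
print(pred_b)
```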
Cost function (代价函数)
Here is a model with a training set, together with its hypothesis: h(x) = θ0 + θ1x
The θi in the hypothesis are the parameters of the model (模型参数).
The task of the algorithm is to find values for these two parameters (θ0 and θ1) so that the resulting straight line fits the data well.
For example:
Clearly, different parameter values (θ0 and θ1) give different hypotheses. Therefore, in the housing price example, we need to choose appropriate θ0 and θ1 so that the predicted prices are as accurate as possible.
To choose θ0 and θ1 so that h(x) is as close to y as possible, we minimize the square (平方) of the difference between the output value of the hypothesis function and the real price of the house, taken over all training examples (that is, we make the cost function as small as possible / 使得代价函数最小).
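Written out, the quantity being minimized is usually given in the following form. This is a reconstruction, not a copy of the original formula: the symbol J and the squared difference come from the text, while the 1/(2m) scaling is the common convention for this cost function and is assumed here. The constant factor does not change which θ0 and θ1 minimize J.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2,
\qquad h(x) = \theta_0 + \theta_1 x
```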
This function, J, is called the cost function (代价函数) or squared error function (平方误差函数). We can use it to measure the accuracy of our hypothesis function.
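A squared error cost of this kind could be computed as in the sketch below. It assumes the common 1/(2m) scaling, which the text does not state explicitly. The names `h` and `cost` and the small data set are hypothetical.

```python
def h(theta0, theta1, x):
    """Hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1), assuming the 1/(2m) convention."""
    m = len(xs)
    total = sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical training set in which every price is exactly 3 * size.
xs = [50, 80, 100]
ys = [150, 240, 300]

perfect = cost(0, 3, xs, ys)  # the line passes through every point
worse = cost(0, 2, xs, ys)    # the line under-predicts every price

print(perfect)
print(worse)
```

A smaller value of J means the line is closer to the training points, so the first parameter choice is the better of the two here.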
To understand the cost function intuitively (直观地) Ⅰ
First, we use a simplified model with only one parameter, θ1, so that the hypothesis function passes through the origin (原点): h(x) = θ1x
Different values of θ1 give different hypothesis functions, and the value of the cost function J differs accordingly:
If we compute J for more values of θ1, we can plot the graph of J like this:
Our goal is to minimize the cost function J (in the ideal situation, the h(x) line passes through all the points of our training data set). In this case, J reaches its global minimum, the smallest value of the cost function, at θ1 = 1.
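The behaviour described above can be reproduced with a short sketch. It assumes a hypothetical training set in which every point lies on the line y = x, so a line through the origin with slope 1 fits perfectly, and it assumes the common 1/(2m) scaling for J. The candidate values of θ1 are arbitrary.

```python
def cost(theta1, xs, ys):
    """Squared error cost for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training set lying exactly on the line y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]

# Evaluate J at a few candidate slopes.
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
costs = {t: cost(t, xs, ys) for t in candidates}
for t, j in costs.items():
    print(f"theta1 = {t}: J = {j:.4f}")

# The candidate with the smallest cost.
best = min(costs, key=costs.get)
print(best)
```

The values of J fall as θ1 approaches 1 from either side and rise again beyond it, which is the bowl shape of the curve described in the text.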
To understand the cost function intuitively Ⅱ
When we go back to h(x) = θ0 + θ1x, the cost function J has two variables (θ0 and θ1).
This makes the graph of J a three-dimensional surface (三维图像). We can also use a contour plot (等高线图) to show it. A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points on the same line.
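To illustrate the idea of a contour line, the sketch below evaluates J on a small grid of (θ0, θ1) values and then checks that two different parameter pairs can give exactly the same cost, which places them on the same contour line. The data set and grid values are hypothetical, and the 1/(2m) scaling is assumed. A plotting library (for example, the `contour` function in matplotlib) could draw the actual curves from such a grid; that step is left out here.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1), assuming the 1/(2m) convention."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training set lying exactly on the line y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]

theta0_values = [-1.0, 0.0, 1.0]
theta1_values = [0.0, 1.0, 2.0]

# grid[i][j] = J(theta0_values[i], theta1_values[j]): heights of the surface.
grid = [[cost(t0, t1, xs, ys) for t1 in theta1_values] for t0 in theta0_values]
for t0, row in zip(theta0_values, grid):
    print(t0, [round(j, 3) for j in row])

# Two different parameter pairs with the same cost lie on the same contour line.
a = cost(1.0, 1.0, xs, ys)
b = cost(-1.0, 1.0, xs, ys)
print(a, b)
```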
Different values of θ0 and θ1 give different hypothesis functions, and the value of the cost function J differs accordingly (the result lands on a different contour line):
Our goal is to minimize the cost function J (the h(x) line should pass as close as possible to all the points of our training data set), like this:
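One crude way to see this minimization in action is a brute-force search over a grid of candidate parameters. This is only an illustration and not the method these notes use to find θ0 and θ1. The data set (points on the line y = 1 + x), the grid range, and the step size are all hypothetical, and the 1/(2m) scaling is assumed.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1), assuming the 1/(2m) convention."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training set lying exactly on the line y = 1 + x.
xs = [1, 2, 3]
ys = [2, 3, 4]

# Candidate values from -2.0 to 2.0 in steps of 0.5 for both parameters.
values = [k * 0.5 for k in range(-4, 5)]

# Pick the (theta0, theta1) pair on the grid with the smallest cost.
best = min(
    ((t0, t1) for t0 in values for t1 in values),
    key=lambda p: cost(p[0], p[1], xs, ys),
)
best_cost = cost(best[0], best[1], xs, ys)

print(best, best_cost)
```

On a contour plot, this best pair would sit at the centre of the innermost contour line.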