Modeling the hypothesis function:
In machine learning, suppose we have a dataset with k features. Roughly sketch how the feature points map to the target values.
When we try to fit a hypothesis model, the current k features, used directly, may not match the form of the hypothesis function:
We can combine multiple features into one. For example, we can combine x_1 and x_2 into a new feature x_3 by taking x_1 * x_2.
So we need to combine or expand the k features into n features, e.g. { x1, x2 } -> { x3 = x1*x2 }; { x1 } -> { x1 = x1, x2 = x1^2 }. The resulting plot is not necessarily two-dimensional. This way we can build a reasonable hypothesis model by only combining and expanding the feature values, without changing the form of the hypothesis function itself.
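A minimal Octave sketch of this idea, using hypothetical example columns x1 and x2 to build a combined feature and a polynomial expansion:

```octave
% Combining / expanding features before fitting a linear model.
% x1 and x2 are hypothetical example columns, not from the original text.
x1 = [1; 2; 3; 4];        % original feature 1
x2 = [2; 4; 6; 8];        % original feature 2

x3 = x1 .* x2;            % combined feature: x3 = x1 * x2
X_poly = [x1, x1 .^ 2];   % expanded features: { x1, x1^2 }
```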
The cost function is:
J(θ) = 1/(2m) * Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2
Octave matrix expression: J = sum((X * theta - y) .^ 2) / 2 / m;
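As a sketch, the expression above can be wrapped in a small helper function; here X is assumed to be the m x (n+1) design matrix with a leading column of ones, y the m x 1 target vector, and theta the (n+1) x 1 parameter vector:

```octave
% Sketch of a cost helper around the expression above (assumed shapes in the lead-in).
function J = computeCost(X, y, theta)
  m = length(y);                            % number of training examples
  J = sum((X * theta - y) .^ 2) / (2 * m);  % squared-error cost
end
```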
Find the θ that minimizes J(θ). In this way, polynomial regression is turned into a linear regression problem.
Next, let's go through the implementation steps of gradient descent and the normal equation in detail:
Gradient descent:
Pseudocode: repeat until convergence { θ_j := θ_j − α * (1/m) * Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) * x_j^(i), updating every θ_j simultaneously }.
Octave matrix expression for each iteration: theta = theta - alpha / m * X' * (X * theta - y);
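A sketch of a full gradient descent loop built around that update; alpha and num_iters are assumed tuning parameters, and J_history records the cost after every iteration so convergence can be inspected later:

```octave
% Gradient descent sketch; X, y, theta shaped as in the cost sketch above.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - alpha / m * X' * (X * theta - y);       % vectorized update
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);  % record cost
  end
end
```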
First, apply feature scaling (also called normalization) to the dataset; otherwise gradient descent will need many steps and converge slowly. Try to scale each feature roughly into the range [-1, 1].
A common choice is mean normalization: x_i := (x_i − μ_i) / s_i, where μ_i is the average of all the values for feature i, and s_i is the range of values (max − min) or the standard deviation of feature i.
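A sketch of this normalization in Octave; std(X) could be swapped for max(X) - min(X) to use the range instead of the standard deviation:

```octave
% Mean normalization sketch: returns the scaled features plus the
% per-feature mean and scale so new examples can be normalized the same way.
function [X_norm, mu, sigma] = featureNormalize(X)
  mu = mean(X);                 % 1 x n row vector of feature means
  sigma = std(X);               % 1 x n row vector of feature std deviations
  X_norm = (X - mu) ./ sigma;   % relies on Octave's automatic broadcasting
end
```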
Next, estimate a reasonable learning rate α. You can debug the learning rate by plotting the cost function J(θ) against the number of iterations; with a correctly chosen α, the cost should decrease and gradually converge.
If α is too small: slow convergence.
If α is too large: may not decrease on every iteration and thus may not converge.
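One way to check this is to plot the recorded cost per iteration; the sketch below assumes X and y are the prepared training data and reuses the gradientDescent sketch above:

```octave
% Debugging alpha by plotting J(theta) over iterations (assumed X, y).
initial_theta = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescent(X, y, initial_theta, 0.01, 400);

plot(1:numel(J_history), J_history, '-b');
xlabel('Number of iterations');
ylabel('Cost J(theta)');   % should decrease steadily and flatten out
```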
Normal equation:
The normal equation formula is given below: θ = (X^T X)^(-1) X^T y
Note: set x_0 = 1 for every training example, i.e. prepend a column of ones to X.
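A one-line Octave sketch of solving for θ directly; "features" is an assumed m x n matrix of raw feature values and y the m x 1 target vector, and pinv is used rather than inv so the expression still behaves if X' * X is singular:

```octave
% Normal equation sketch ("features" and y are assumed inputs).
X = [ones(size(features, 1), 1), features];  % prepend the x_0 = 1 column
theta = pinv(X' * X) * X' * y;
```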
Choosing between the two methods:
The following is a comparison of gradient descent and the normal equation:
| Gradient Descent | Normal Equation |
| --- | --- |
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| O(kn^2) | O(n^3), need to calculate inverse of X^T X |
| Works well when n is large | Slow if n is very large |

With the normal equation, computing the inversion has complexity O(n^3). So if we have a very large number of features, the normal equation will be slow. In practice, when n exceeds 10,000 it might be a good time to go from a normal solution to an iterative process.
In summary, if the number of features n is very large (greater than about 10,000), use gradient descent; otherwise use the normal equation.