Boston housing price prediction: given the prices of houses of various sizes, predict the price of a house of a different size.
$$h_\theta(x)=\theta_0+\theta_1x$$
where $\theta_0$ and $\theta_1$ are the parameters to be determined.
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})^2}$$
where $m$ is the number of samples.
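To make the cost function concrete, here is a minimal sketch that evaluates $J(\theta_0,\theta_1)$ by hand on three made-up points. The helper names `toy_hypothesis` and `toy_cost` and the data are assumptions for illustration only; they are not part of the original code.

```python
import numpy as np

def toy_hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def toy_cost(theta0, theta1, x, y):
    # J = 1/(2m) * sum of squared residuals
    m = len(x)
    residuals = toy_hypothesis(theta0, theta1, x) - y
    return np.sum(residuals ** 2) / (2 * m)

# Made-up data lying exactly on the line y = 1 + 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

perfect_fit = toy_cost(1.0, 2.0, x, y)  # residuals are all zero
all_zero = toy_cost(0.0, 0.0, x, y)     # (3^2 + 5^2 + 7^2) / (2 * 3) = 83/6
print(perfect_fit, all_zero)
```

With the true parameters the cost is zero, and any other choice gives a positive value, which is why minimizing $J$ recovers the best-fitting line.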
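Goal: choose the parameters that minimize the cost function.
Purpose of gradient descent: finding the minimum of a function.
Idea: start from a randomly chosen set of parameters, compute the cost function and its gradient, then move to the set of parameters that decreases the cost the most. Repeating this step leads to a local minimum (different initial parameters may lead to different local minima).
The update rule is: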
$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)\qquad \text{for } j=0 \text{ and } j=1$$
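When updating, all parameters must be updated simultaneously: every partial derivative is computed from the old parameter values before any parameter is overwritten.
Here $\alpha$ is the learning rate, which determines the step size. If $\alpha$ is too small, gradient descent converges slowly; if $\alpha$ is too large, the update may overshoot the minimum, so that the method fails to converge or even diverges.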
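The effect of the learning rate and the simultaneous update can be seen in a small experiment. The sketch below is an illustration under assumed values: the three data points, the helper names `step` and `run`, and the learning rates 0.001, 0.1 and 1.0 are all made up for this example.

```python
import numpy as np

# Made-up data lying exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

def toy_cost(t0, t1):
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * len(x))

def step(t0, t1, alpha):
    error = t0 + t1 * x - y
    grad0 = np.mean(error)       # dJ/d(theta0)
    grad1 = np.mean(error * x)   # dJ/d(theta1)
    # Simultaneous update: both gradients were computed from the old
    # (t0, t1) before either parameter is changed.
    return t0 - alpha * grad0, t1 - alpha * grad1

def run(alpha, steps=50):
    t0, t1 = 0.0, 0.0
    for _ in range(steps):
        t0, t1 = step(t0, t1, alpha)
    return toy_cost(t0, t1)

start = toy_cost(0.0, 0.0)
small = run(0.001)    # too small: the cost barely moves in 50 steps
moderate = run(0.1)   # reasonable: the cost drops substantially
large = run(1.0)      # too large: every step overshoots and the cost grows
print(start, small, moderate, large)
```

On this data the moderate rate ends with the lowest cost, the tiny rate ends only slightly below the starting cost, and the large rate ends far above it.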
Applying batch gradient descent to the linear regression problem:
$$\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})^2}$$

For $j=0$:

$$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})}$$

For $j=1$:

$$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m{((h_\theta(x^{(i)})-y^{(i)})\cdot x^{(i)})}$$
That is:
$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m{(h_\theta(x^{(i)})-y^{(i)})}$$

$$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m{((h_\theta(x^{(i)})-y^{(i)})\cdot x^{(i)})}$$
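One way to gain confidence in the derivatives above is to compare them with a numerical approximation. The following sketch does this with central differences on synthetic data; the function names, the random data and the test point `theta` are assumptions for illustration. The matrix form `X.T @ (X @ theta - y) / m` stacks the two partial derivatives into one vector, with a column of ones in `X` playing the role of $x_0 = 1$.

```python
import numpy as np

def cost_fn(theta, X, y):
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def analytic_gradient(theta, X, y):
    # Row 0 is the j = 0 formula, row 1 is the j = 1 formula
    return X.T @ (X @ theta - y) / len(y)

def numeric_gradient(theta, X, y, eps=1e-6):
    # Central difference in each coordinate
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[j] = eps
        grad[j] = (cost_fn(theta + shift, X, y)
                   - cost_fn(theta - shift, X, y)) / (2 * eps)
    return grad

# Synthetic data, generated only for this check
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
X = np.column_stack([np.ones_like(x), x])
y = 1.5 + 0.8 * x + rng.normal(0, 0.5, size=20)
theta = np.array([0.3, -0.2])

max_gap = np.max(np.abs(analytic_gradient(theta, X, y)
                        - numeric_gradient(theta, X, y)))
print(max_gap)
```

If the two gradients disagreed by more than rounding error, that would point to a mistake in either the derivation or the implementation.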
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
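# Build the design matrix X and the target vector y
def getX(df):
    # Prepend a column of ones (intercept term) and drop the last column (the target)
    ones = pd.DataFrame({'ones': np.ones(len(df))})
    data = pd.concat([ones, df], axis=1)
    return data.iloc[:, :-1].values

def gety(df):
    # The last column holds the target values
    return np.array(df.iloc[:, -1])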
# Cost function: J(theta) = 1/(2m) * sum((X @ theta - y)^2)
def cost(theta, X, y):
    m = X.shape[0]
    inner = X @ theta - y
    return (inner.T @ inner) / (2 * m)
# Gradient descent
def gradientDescent(theta, X, y):
    # Gradient of the cost with respect to theta (all components at once)
    return X.T @ (X @ theta - y) / X.shape[0]

def batchDo(theta, X, y, epoch, learningRate):
    # Run `epoch` batch updates and record the cost after each one
    costData = [cost(theta, X, y)]
    for i in range(epoch):
        theta = theta - learningRate * gradientDescent(theta, X, y)
        costData.append(cost(theta, X, y))
    return theta, costData
# Main script
if __name__ == '__main__':
    # Load the data and plot it
    data = pd.read_csv('ex1data1.txt', names=['population', 'profit'])
    sns.set(context="notebook", style="white", palette="dark")
    sns.lmplot(x='population', y='profit', data=data, height=10, fit_reg=True)
    plt.show()
    # Build X and y, and initialize theta randomly
    X = getX(data)
    y = gety(data)
    theta = np.random.rand(X.shape[1])
    # Gradient descent
    epoch = 800
    learningRate = 0.01
    theta, costData = batchDo(theta, X, y, epoch, learningRate)
    # Plot the cost against the number of iterations
    ax = sns.lineplot(x=np.arange(epoch + 1), y=costData)
    ax.set_xlabel("epoch")
    ax.set_ylabel("cost")
    plt.show()
    # Plot the original data and the fitted regression line
    plt.scatter(data.population, data.profit, label="Training data")
    plt.plot(data.population, data.population * theta[1] + theta[0], label="Prediction")
    plt.legend(loc=2)
    plt.show()
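As an optional sanity check, the parameters found by batch gradient descent can be compared with a closed-form least-squares fit. The sketch below uses `np.polyfit` for the reference fit, which is a different technique from the one in this post, and runs on synthetic data rather than `ex1data1.txt`; the data, learning rate and iteration count are assumptions chosen so that the loop converges on this particular data.

```python
import numpy as np

# Synthetic data, generated only for this comparison
rng = np.random.default_rng(42)
x = rng.uniform(5, 20, size=100)
y = -4.0 + 1.2 * x + rng.normal(0, 1.0, size=100)
X = np.column_stack([np.ones_like(x), x])

# Batch gradient descent with the same update rule as above
theta = np.zeros(2)
for _ in range(20000):
    theta = theta - 0.005 * (X.T @ (X @ theta - y) / len(y))

# Reference: least-squares line fit (coefficients come highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)
print(theta, intercept, slope)
```

The intercept typically converges much more slowly than the slope when the feature is not centered, which is why this sketch runs many more iterations than the 800 used above.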