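This is a lab assignment for my machine learning course. I had planned to copy a labmate's work during the session, but they had all finished ahead of time: the lab started at 8:30 and they were gone by 8:40. So I had no choice but to write it myself.
Import the libraries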
import pandas as pd
import numpy as np
from sklearn import datasets
from matplotlib import font_manager as fm, rcParams
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
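Load the dataset and pick out data and target
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older release.
boston = datasets.load_boston()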
x = pd.DataFrame(data=boston.data,columns=boston.feature_names)
y = pd.DataFrame(data=boston.target,columns=['MEDV'])
Standardize x and y
x = ((x - x.mean()) / x.std()).values
y = ((y - y.mean()) / y.std()).values
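As a quick sanity check on the z-score step, here is a minimal sketch on a made-up toy DataFrame (the column names and values are assumptions for illustration, not the housing data). After scaling, every column should have mean 0 and sample standard deviation 1.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the housing features (made-up values).
toy = pd.DataFrame({"rooms": [4.0, 6.0, 8.0], "age": [10.0, 20.0, 60.0]})

# Same z-score formula as above: subtract the column mean, divide by the column std.
toy_scaled = ((toy - toy.mean()) / toy.std()).values

col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0, ddof=1)  # pandas .std() uses ddof=1 by default
print(col_means, col_stds)
```

Note that scaling the whole dataset before splitting lets the test rows influence the mean and std; computing them on the training rows only would be the stricter choice.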
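Split the dataset; the assignment asks for five-fold cross-validation
kf = KFold(n_splits=5)
# x and y have the same number of rows, so one set of indices splits both.
# Each pass overwrites the previous split, so only the last fold is kept after the loop.
for train_index, test_index in kf.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]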
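To actually use all five folds rather than only the last one, the splits can be collected inside the loop. The following is a small sketch on made-up arrays (`x_demo`, `y_demo`, and `folds` are hypothetical names), showing that `KFold` hands out five disjoint test blocks.

```python
import numpy as np
from sklearn.model_selection import KFold

# Made-up data: 10 samples, 2 features.
x_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float).reshape(10, 1)

kf_demo = KFold(n_splits=5)
folds = []
last_test_index = None
for train_index, test_index in kf_demo.split(x_demo):
    # Keep every fold instead of overwriting the previous one.
    folds.append((x_demo[train_index], y_demo[train_index],
                  x_demo[test_index], y_demo[test_index]))
    last_test_index = test_index

# (number of training rows, number of test rows) per fold
fold_sizes = [(len(f[0]), len(f[2])) for f in folds]
print(fold_sizes)
```

With the folds stored this way, a model can be trained and scored once per fold and the five scores averaged.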
Add the bias column and initialize the theta matrix
x_train = np.insert(x_train, 0, 1, axis=1)
x_test = np.insert(x_test, 0, 1, axis=1)
theta = np.matrix(np.zeros((1, 14)))  # x now has 14 columns (13 features + bias), so theta is (1, 14)
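Define the cost function
def costfunction(x, y, theta):
    inner = np.power(x * theta.T - y, 2)  # squared error of each sample
    return np.sum(inner) / (2 * len(x))
The cost function here is the most common one: the squared (prediction - truth) error summed over all samples and divided by 2m, where m is the number of samples.
If this is unfamiliar, see the gradient descent material in Andrew Ng's machine learning course.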
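A tiny worked example may make the formula concrete. This is only a sketch with plain ndarrays and made-up numbers; `cost_demo` is a hypothetical stand-in, not the function used in this post.

```python
import numpy as np

def cost_demo(x, y, theta):
    """Hypothetical ndarray version of the cost: sum((x @ theta - y)^2) / (2m)."""
    residual = x @ theta - y
    return np.sum(residual ** 2) / (2 * len(x))

# Three samples, first column is the bias term.
x_demo = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_demo = np.array([1.0, 2.0, 3.0])

# theta = 0 predicts 0 everywhere: errors are 1, 2, 3, so cost = (1 + 4 + 9) / 6.
cost_at_zero = cost_demo(x_demo, y_demo, np.array([0.0, 0.0]))
# theta = (0, 1) reproduces y exactly, so the cost is 0.
cost_at_fit = cost_demo(x_demo, y_demo, np.array([0.0, 1.0]))
print(cost_at_zero, cost_at_fit)
```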
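Define the regularized cost function
def regularizedcost(x, y, theta, l):
    reg = (l / (2 * len(x))) * (np.power(theta, 2).sum())  # L2 penalty on all weights
    return costfunction(x, y, theta) + reg
The penalty term is there to guard against overfitting and similar problems.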
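Written out, with theta as a 1 x n row vector and X as the m x n data matrix (the convention the code uses), the regularized cost and the gradient step it leads to look roughly like this. The symbol alpha for the learning rate is my own notation, and the penalty is taken over every component of theta, bias included, because that is what `regularizedcost` sums.

```latex
J_{\text{reg}}(\theta)
  = \frac{1}{2m}\,\lVert X\theta^{T} - y \rVert^{2}
  + \frac{l}{2m}\,\theta\theta^{T}

\nabla_{\theta} J_{\text{reg}}(\theta)
  = \frac{1}{m}\,(X\theta^{T} - y)^{T} X + \frac{l}{m}\,\theta

\theta \leftarrow \theta
  - \frac{\alpha}{m}\,(X\theta^{T} - y)^{T} X
  - \frac{\alpha\, l}{m}\,\theta
```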
Define gradient descent
def gradientdescent(x, y, theta, rate, l, epoch):
    temp = np.matrix(np.zeros(np.shape(theta)))  # holds the updated theta
    parameters = int(theta.flatten().shape[1])  # number of parameters (not used below)
    cost = np.zeros(epoch)  # cost after each epoch
    m = x.shape[0]  # number of samples in x
    for i in range(epoch):  # iterate epoch times
        # update rule: gradient of the squared error plus the L2 penalty
        temp = theta - (rate / m) * ((x * theta.T - y).T * x) - (rate * l) / m * theta
        theta = temp
        cost[i] = regularizedcost(x, y, theta, l)
    return theta, cost
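One way to check an update rule like this is to compare its result with the closed-form minimizer of the same regularized cost, which solves (X^T X + l I) theta = X^T y. The sketch below does that with plain ndarrays on synthetic data; `ridge_gradient_descent`, the data, and the hyperparameters are all assumptions made for the illustration, not the settings used in this post.

```python
import numpy as np

def ridge_gradient_descent(x, y, rate, l, epoch):
    """Hypothetical ndarray version of regularized batch gradient descent."""
    m, n = x.shape
    theta = np.zeros(n)
    cost_history = np.zeros(epoch)
    for i in range(epoch):
        residual = x @ theta - y
        # Gradient of (1/2m) * (sum of squared errors + l * sum(theta^2)).
        gradient = (x.T @ residual + l * theta) / m
        theta = theta - rate * gradient
        cost_history[i] = (np.sum((x @ theta - y) ** 2) + l * np.sum(theta ** 2)) / (2 * m)
    return theta, cost_history

# Synthetic data: 50 samples, 3 standard-normal features plus a bias column.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
x_demo = np.insert(features, 0, 1, axis=1)
y_demo = x_demo @ np.array([0.5, 1.0, -2.0, 3.0]) + 0.1 * rng.normal(size=50)

theta_gd, cost_history = ridge_gradient_descent(x_demo, y_demo, rate=0.1, l=1.0, epoch=5000)

# Closed-form solution of the same objective (bias is penalized too, as in the loop above).
theta_closed = np.linalg.solve(x_demo.T @ x_demo + 1.0 * np.eye(4), x_demo.T @ y_demo)
print(np.max(np.abs(theta_gd - theta_closed)))
```

If the two disagree noticeably, the learning rate is probably too large or the number of epochs too small.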
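Finally, set the hyperparameters and train
if __name__ == '__main__':
    rate = 0.001  # learning rate
    epoch = 5000  # number of iterations
    l = 50  # regularization parameter
    # despite its name, finallycost holds the trained theta; cost is the per-epoch cost history
    finallycost, cost = gradientdescent(x_train, y_train, theta, rate, l, epoch)
    print(finallycost)  # print the final weights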
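Compare the predictions from the learned parameters against the test set
y_hat_test = np.asarray(x_test * finallycost.T).ravel()  # predictions on the test fold
t = np.arange(len(x_test))  # evenly spaced sample indices for the x axis
plt.plot(t, y_test, 'r-', linewidth=2, label=u'truth')
plt.plot(t, y_hat_test, 'b-', linewidth=2, label=u'foresee')
plt.legend(loc='upper right')
plt.grid(True, linestyle='--')  # the b= keyword was renamed to visible= in newer matplotlib
plt.show()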
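A plot is easy to misjudge by eye, so a number or two can help. Here is a minimal sketch of mean squared error and R² on made-up values; `y_true_demo` and `y_pred_demo` are hypothetical stand-ins for the test targets and the predictions.

```python
import numpy as np

# Made-up targets and predictions, standing in for the test fold.
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 6.0])

# Mean squared error: average of the squared residuals.
mse = np.mean((y_true_demo - y_pred_demo) ** 2)

# R^2: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(mse, r2)
```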
Parts of the code are adapted from: https://blog.csdn.net/weixin_44209013/article/details/106521149?ops_request_misc=&request_id=&biz_id=102&utm_term=%E6%89%8B%E5%86%99%E6%A2%AF%E5%BA%A6%E4%B8%8B%E9%99%8D%E9%A2%84%E6%B5%8B%E6%88%BF%E4%BB%B7&utm_medium=distribute.pc_search_result.none-task-blog-2allsobaiduweb~default-0-106521149.first_rank_v2_pc_rank_v29&spm=1018.2226.3001.4187