Predicting hourly bike-share rental counts for 2011-2012 (Washington, D.C.)

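Dataset link: http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
The dataset is described on that page, so I won't repeat the description here~
I predict the counts with linear regression, a decision tree, and a random forest, and compare which model is the most accurate along the way.
When using the random forest, if training time is not a big concern, you can set n_estimators a bit higher; anything up to about 200 is fine. Accuracy grows roughly like a logarithmic function of the number of trees, so it keeps improving as you add trees, but the gains flatten out.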
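
To see the flattening effect for yourself, here is a minimal sketch that sweeps n_estimators and prints the test MSE for each value. It runs on synthetic data (an assumption made so the snippet is self-contained; it is not the bike-sharing data), and the names tree_counts and mse_by_trees are made up for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data (assumption: NOT the bike-sharing dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(600, 5))
y = 10 * np.sin(np.pi * X[:, 0] * X[:, 1]) + 5 * X[:, 2] + rng.normal(0, 1, 600)

# Simple 80/20 split (the rows are already in random order)
X_train, X_test = X[:480], X[480:]
y_train, y_test = y[:480], y[480:]

# Hypothetical list of forest sizes to try
tree_counts = [1, 5, 20, 50, 100, 200]
mse_by_trees = {}
for n in tree_counts:
    forest = RandomForestRegressor(n_estimators=n, min_samples_leaf=2, random_state=0)
    forest.fit(X_train, y_train)
    mse_by_trees[n] = mean_squared_error(y_test, forest.predict(X_test))
    print(f"n_estimators={n:>3}  test MSE={mse_by_trees[n]:.3f}")
```

On data like this the MSE usually drops sharply over the first few dozen trees and then changes very little, which is why going far beyond a couple of hundred trees rarely pays off. The exact numbers depend on the data and the random seed.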

Code:
Reading the CSV file (save this as read_file.py, since the second script imports it under that name)

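import pandas as pd
import matplotlib.pyplot as plt

# Load the hourly records of the UCI Bike Sharing Dataset
bike_rentals=pd.read_csv('./data/hour.csv')

# Uncomment to plot the distribution of the hourly rental counts
#plt.hist(bike_rentals['cnt'])
#plt.show()

# Correlation of every numeric column with the target 'cnt'.
# numeric_only=True (pandas 1.5+) skips the string column 'dteday',
# which pandas 2.x would otherwise refuse to convert.
cnt_correlations=bike_rentals.corr(numeric_only=True)['cnt']
print("\n Reading success! cnt-correlations:\n")
print(cnt_correlations)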

Processing the data, building the models, and predicting

import read_file
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

bike_rentals=read_file.bike_rentals

# Formatting 'hr' column
def assign_label(hour):
    if hour >=0 and hour < 6:
        return 4
    elif hour >=6 and hour < 12:
        return 1
    elif hour >= 12 and hour < 18:
        return 2
    elif hour >= 18 and hour <=24:
        return 3

bike_rentals['time_labels']=bike_rentals['hr'].apply(assign_label)

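# Splitting data: a random 80% of the rows for training, the rest for testing
train=bike_rentals.sample(frac=.8)
# .copy() so that adding a 'predictions' column later does not raise SettingWithCopyWarning
test=bike_rentals.loc[~bike_rentals.index.isin(train.index)].copy()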

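# Removing columns that must not be used as features: the target ('cnt'),
# the columns that leak it ('casual' + 'registered' = 'cnt'), and the date string ('dteday')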
columns=list(bike_rentals.columns)
columns.remove('cnt')
columns.remove('casual')
columns.remove('dteday')
columns.remove('registered')

print("\n===========>>>>>>Predicting:\n")
# Predicting the target column, using MSE as the metric.
# LinearRegression

model=LinearRegression()
model.fit(train[columns],train['cnt'])
predictions=model.predict(test[columns])
mse=mean_squared_error(test['cnt'],predictions)
print("MSE using LinearRegression:    ",end='')
print(mse,'\n')

#DecisionTreeRegression

model=DecisionTreeRegressor(min_samples_leaf=5)
model.fit(train[columns],train['cnt'])
predictions=model.predict(test[columns])
mse=mean_squared_error(test['cnt'],predictions)
print("MSE using DecisionTreeRegression:    ",end='')
print(mse,'\n')

#RandomForestRegression

model=RandomForestRegressor(n_estimators=50,min_samples_leaf=2)
model.fit(train[columns],train['cnt'])
predictions=model.predict(test[columns])
mse=mean_squared_error(test['cnt'],predictions)
test['predictions']=predictions
print("MSE using RandomForestRegression:    ",end='')
print(mse,'\n')
# Show the first ten actual counts next to the random forest predictions
print(test.iloc[:10][['cnt','predictions']])
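
As a side note, the hour bucketing done by assign_label can also be written without a Python-level function call per row. The sketch below uses pd.cut and checks that it gives the same labels as assign_label for hours 0 to 23. It assumes the 'hr' column only holds values 0-23 (as in hour.csv); a value of 24 would not fall into any bin here. The names labels_apply and labels_cut are made up for this illustration.

```python
import pandas as pd

def assign_label(hour):
    # Same bucketing as in the script above
    if hour >= 0 and hour < 6:
        return 4
    elif hour >= 6 and hour < 12:
        return 1
    elif hour >= 12 and hour < 18:
        return 2
    elif hour >= 18 and hour <= 24:
        return 3

# Stand-in for bike_rentals['hr'] (assumption: hours run from 0 to 23)
hours = pd.Series(range(24), name="hr")

# Row-by-row version, as in the post
labels_apply = hours.apply(assign_label)

# Vectorised version: half-open bins [0,6), [6,12), [12,18), [18,24)
labels_cut = pd.cut(
    hours,
    bins=[0, 6, 12, 18, 24],
    labels=[4, 1, 2, 3],
    right=False,
).astype(int)

print(labels_apply.equals(labels_cut))
```

For a dataset of this size the speed difference hardly matters, so this is mostly a matter of taste; the pd.cut form just keeps the bin edges and labels in one place.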

Results:


[Image 1: screenshot of the output]
