Bike-Sharing Data Visualization

  • Required environment

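Windows 10, Python, PyCharm

Recommended installation order: Python first, then PyCharm

Tutorial on installing Python and third-party libraries: https://blog.csdn.net/weixin_42128329/article/details/90046108

Tutorial on installing PyCharm and third-party libraries: https://blog.csdn.net/weixin_42128329/article/details/90046725

Third-party libraries used: NumPy, Pandas, Matplotlib, Seaborn

Tutorial on installing third-party libraries in PyCharm: https://blog.csdn.net/qq_41106517/article/details/81140563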

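If PyCharm shows a "No information available" message when you try to install a package, check your network connection. If the network is working, you can fix the problem by switching pip's package index to the Alibaba Cloud (Aliyun) mirror.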

[Figure 1]

Create a file named pip.ini with the following contents:

[global]
index-url = http://mirrors.aliyun.com/pypi/simple/

[install]
trusted-host = mirrors.aliyun.com

Create a folder named pip under C:\Users\<username>\AppData\Roaming and put pip.ini inside it.
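The two manual steps above (write pip.ini, place it in a pip folder under AppData\Roaming) can also be scripted. The sketch below is an illustration, not part of the original project: the function name `write_pip_ini` is hypothetical, and it assumes Windows, where the `APPDATA` environment variable points to C:\Users\<username>\AppData\Roaming. It writes the same two settings with the standard-library `configparser`.

```python
import configparser
import os
from pathlib import Path

# Mirror settings copied from the pip.ini shown above
MIRROR_URL = "http://mirrors.aliyun.com/pypi/simple/"
MIRROR_HOST = "mirrors.aliyun.com"


def write_pip_ini(config_dir=None):
    """Write pip.ini into config_dir and return the file's path.

    Hypothetical helper. If config_dir is None, assume Windows and use
    %APPDATA%\\pip, i.e. C:\\Users\\<username>\\AppData\\Roaming\\pip.
    """
    if config_dir is None:
        config_dir = Path(os.environ["APPDATA"]) / "pip"
    config_dir = Path(config_dir)
    # Create the pip folder if it does not exist yet
    config_dir.mkdir(parents=True, exist_ok=True)

    config = configparser.ConfigParser()
    config["global"] = {"index-url": MIRROR_URL}
    # trusted-host is needed because the mirror URL uses plain http
    config["install"] = {"trusted-host": MIRROR_HOST}

    path = config_dir / "pip.ini"
    with open(path, "w", encoding="utf-8") as f:
        config.write(f)
    return path
```

On Windows, calling `write_pip_ini()` once with no arguments creates the file in the default location.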

  • Project background

We have a dataset of shared-bike rides and use Python to visualize it.

Data download (Baidu Netdisk): https://pan.baidu.com/s/11DiPR0LjT5xNgic4wrpLGg, extraction code: g7bc

  • Understanding the data

Opened in a spreadsheet program such as Excel, the data looks like this:

[Figure 2]

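Datetime: date and time (start of the one-hour period)

Season: season

Holiday: whether the day is a holiday (0 = no, 1 = yes)

Workingday: whether the day is a working day (0 = no, 1 = yes)

Casual: number of rides by non-members (casual users)

Registered: number of rides by members (registered users)

Count: total number of rides (Casual + Registered)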

How to read one row: season 1, the period from 2011/1/1 0:00 to 2011/1/1 1:00, not a holiday, not a working day, 3 rides by non-members, 13 rides by members, 16 rides in total.
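The row above shows that the total equals the two user groups added together (3 + 13 = 16). A quick way to confirm this holds for every row is sketched below. The helper name `check_count_identity` is hypothetical, and it assumes the lower-case column names (`casual`, `registered`, `count`) that the code later in this article uses.

```python
import pandas as pd


def check_count_identity(df):
    """Return True if count == casual + registered on every row (hypothetical helper)."""
    return bool((df["casual"] + df["registered"] == df["count"]).all())


# The row interpreted above: 3 casual + 13 registered = 16 rides in total
sample = pd.DataFrame({"casual": [3], "registered": [13], "count": [16]})
print(check_count_identity(sample))  # True
```

On the real file this would be `check_count_identity(pd.read_csv("train.csv"))`; a `False` result would point to rows worth inspecting before plotting.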

  • Code walkthrough

Import the required libraries:

# Imports
import calendar
from datetime import datetime

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn

# Use the SimHei font so that Chinese characters in plot labels are displayed (not needed for English-only labels)
mpl.rcParams['font.sans-serif'] = ['SimHei']
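The font setting only takes effect if SimHei is actually installed. It is typically available on Windows (the environment this article targets) but often missing on macOS and Linux, in which case matplotlib falls back to its default font and Chinese characters show up as empty boxes. The sketch below, an assumption rather than part of the original code, picks the first installed font from a candidate list using matplotlib's `font_manager`. The helper name `pick_font` is hypothetical, and "Microsoft YaHei" is only an example of another common Chinese font.

```python
import matplotlib as mpl
from matplotlib import font_manager


def pick_font(candidates):
    """Return the first font family in candidates that matplotlib knows about, or None."""
    installed = {entry.name for entry in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


# Example candidates (assumption: either may be absent on your system)
font = pick_font(["SimHei", "Microsoft YaHei"])
if font is None:
    print("No Chinese font found; Chinese labels would render as empty boxes")
else:
    mpl.rcParams["font.sans-serif"] = [font]
```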

Read the data and take a first look:

# Read the data
bikedata = pd.read_csv("train.csv")
# Inspect the data
print(bikedata)
print(bikedata.shape)
print(bikedata.head())
print(bikedata.tail())
print(bikedata.dtypes)
print(bikedata.describe())

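bikedata.shape: number of rows and columns

bikedata.head(): the first 5 rows

bikedata.tail(): the last 5 rows

bikedata.dtypes: the data type of each column

bikedata.describe(): summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) of the numeric columns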

Output:

[Figure 3]
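One detail worth noticing in the `dtypes` output: `read_csv` keeps the `datetime` column as text (dtype `object`, or a dedicated string dtype in newer pandas versions) unless told to parse it, which is why the next step extracts the date and hour with string operations. The sketch below reproduces this on a one-row CSV built from the example row interpreted earlier; it assumes the file contains only the columns listed above.

```python
import io

import pandas as pd

# One row in the layout of train.csv (only the columns listed above; the real file may have more)
CSV_TEXT = """datetime,season,holiday,workingday,casual,registered,count
2011/1/1 0:00,1,0,0,3,13,16
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)   # (1, 7)
print(df.dtypes)  # datetime is reported as a text column, the others as int64
```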

Feature extraction:

bikedata['date'] = bikedata.datetime.apply(lambda x: x.split()[0])  # new date column: the year/month/day part of datetime
bikedata['hour'] = bikedata.datetime.apply(lambda x: x.split()[1].split(':')[0])  # hour of day (still a string here)
bikedata.drop('season', axis=1, inplace=True)  # drop the original season column
bikedata['season'] = bikedata.date.apply(lambda x: x.split('/')[1])  # re-create season, temporarily holding the month from date
bikedata['weekday'] = bikedata.date.apply(lambda dateString: calendar.day_name[datetime.strptime(dateString, '%Y/%m/%d').weekday()])
bikedata['month'] = bikedata.date.apply(lambda dateString: calendar.month_name[datetime.strptime(dateString, '%Y/%m/%d').month])
bikedata['season'] = bikedata['season'].astype('int')  # month number as int
bikedata['season'] = bikedata.season.map({3: 'Spring', 4: 'Spring', 5: 'Spring', 6: 'summer', 7: 'summer', 8: 'summer', 9: 'Fall', 10: 'Fall', 11: 'Fall', 12: 'Winter', 1: 'Winter', 2: 'Winter'})  # map month to season with a dict
bikedata['hour'] = bikedata['hour'].astype('int')
varlist = ['weekday', 'month', 'season', 'holiday', 'workingday']
for x in varlist:
    bikedata[x] = bikedata[x].astype('category')  # treat these columns as categorical variables
bikedata.drop('datetime', axis=1, inplace=True)  # the raw datetime string is no longer needed

# Box plots of ride counts, used to look for outliers
fig, axes = plt.subplots(nrows=2, ncols=2)  # 2x2 grid of subplots (with no arguments, plt.subplots() creates a single one)
fig.set_size_inches(12, 12)
sn.boxplot(data=bikedata, y="count", orient="v", ax=axes[0][0])
sn.boxplot(data=bikedata, y="count", x="season", orient="v", ax=axes[0][1])
sn.boxplot(data=bikedata, y="count", x="hour", orient="v", ax=axes[1][0])
sn.boxplot(data=bikedata, y="count", x="workingday", orient="v", ax=axes[1][1])
axes[0][0].set(ylabel='Number of rides', title="Number of rides")
axes[0][1].set(ylabel='Number of rides', xlabel='Season', title="Rides by season")
axes[1][0].set(xlabel='Hour', ylabel='Number of rides', title="Rides by hour of day")
axes[1][1].set(xlabel='Working day', ylabel='Number of rides', title="Rides by working day")
plt.savefig("Abnormal_value_analysis.png")
plt.show()

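# Remove outliers: keep rows whose count lies within 3 standard deviations of the mean
bikedata1 = bikedata[np.abs(bikedata["count"] - bikedata["count"].mean()) <= (3 * bikedata["count"].std())]
bikedata1.to_csv('processed_data.csv', index=False)  # save the cleaned data (bikedata1) to processed_data.csv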

Output:

[Figure 4]
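Two notes on the preprocessing step, with a sketch below. First, the string splitting plus `calendar` lookups can also be written with pandas' `.dt` accessor; this is a swapped-in, vectorized alternative, not the article's method, and it assumes datetime strings like `2011/1/1 0:00`. Second, seaborn's box plots draw whiskers at 1.5 × IQR by default (the `whis` parameter), while the filter uses the 3-sigma rule, so points shown beyond the whiskers in the box plots are not necessarily removed. The helper names `extract_features` and `remove_outliers_3sigma` are hypothetical.

```python
from datetime import datetime

import pandas as pd

# Seasons keyed by month % 12 // 3 (Dec-Feb -> 0, Mar-May -> 1, ...);
# labels match the ones used above, including the lower-case 'summer'
SEASON_BY_QUARTER = {0: "Winter", 1: "Spring", 2: "summer", 3: "Fall"}


def extract_features(df):
    """Return a copy of df with date, hour, weekday, month and season columns."""
    out = df.copy()
    # datetime.strptime accepts non-zero-padded fields such as '1' or '0'
    ts = pd.to_datetime(out["datetime"].map(lambda s: datetime.strptime(s, "%Y/%m/%d %H:%M")))
    out["date"] = ts.dt.date
    out["hour"] = ts.dt.hour
    out["weekday"] = ts.dt.day_name()    # e.g. 'Saturday'
    out["month"] = ts.dt.month_name()    # e.g. 'January'
    out["season"] = (ts.dt.month % 12 // 3).map(SEASON_BY_QUARTER)
    return out.drop(columns="datetime")


def remove_outliers_3sigma(df, column="count"):
    """Keep rows whose value lies within 3 sample standard deviations of the mean."""
    values = df[column]
    mask = (values - values.mean()).abs() <= 3 * values.std()
    return df[mask]
```

Usage mirrors the original flow: `bikedata1 = remove_outliers_3sigma(extract_features(pd.read_csv("train.csv")))`.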

Plot the average number of rides per month:

# Average number of rides per month
def Data_Analysis_and_Visualization_month(bikedata1):
    fig1, ax1 = plt.subplots()
    fig1.set_size_inches(12, 20)
    sortOrder = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    monthAggregated = pd.DataFrame(bikedata1.groupby("month", observed=True)["count"].mean()).reset_index()
    # order= fixes the x-axis to calendar order, so the aggregated rows need no sorting first
    sn.barplot(data=monthAggregated, x="month", y="count", order=sortOrder, ax=ax1)
    ax1.set(xlabel='Month', ylabel='Average number of rides', title="Average rides per month")
    plt.savefig('result1.png')
    plt.show()

Output:

[Figure 5]
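Why the bar plot needs `order=sortOrder`: a plain `groupby` on month-name strings returns the months in alphabetical order (April, August, December, ...). The sketch below shows this and one way to get calendar order directly from the aggregation, by grouping on an ordered `pd.Categorical`. The helper name `monthly_mean` is hypothetical, and only months present in the data appear in the result.

```python
import pandas as pd

MONTH_ORDER = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def monthly_mean(df):
    """Mean of count per month, indexed in calendar order (hypothetical helper)."""
    months = pd.Categorical(df["month"], categories=MONTH_ORDER, ordered=True)
    # observed=True keeps only months that occur; sorting follows the category order
    return df.groupby(months, observed=True)["count"].mean()


demo = pd.DataFrame({"month": ["April", "January", "April"], "count": [10, 20, 40]})
print(demo.groupby("month")["count"].mean().index.tolist())  # ['April', 'January']
print(monthly_mean(demo).index.tolist())                      # ['January', 'April']
```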

Plot the number of rides by hour for each day of the week:

# Mean number of rides per hour for each day of the week
def Data_Analysis_and_Visualization_week(bikedata1):
    fig2, ax2 = plt.subplots()
    fig2.set_size_inches(12, 20)
    hueOrder = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
    hourAggregated = pd.DataFrame(bikedata1.groupby(["hour", "weekday"], observed=True)["count"].mean()).reset_index()
    print(hourAggregated)
    sn.pointplot(data=hourAggregated, x="hour", y="count", hue="weekday", hue_order=hueOrder, ax=ax2)
    ax2.set(xlabel='Hour', ylabel='Number of rides', title='Rides by hour for each day of the week')
    plt.savefig('result2.png')
    plt.show()

Output:

[Figure 6]
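The weekly function prints `hourAggregated` in long format, one row per (hour, weekday) pair, which is hard to scan. A `pivot_table` gives the same means as a grid with one row per hour and one column per weekday. This is a sketch under the assumption that the data has `hour`, `weekday` and `count` columns as built above; the helper name `hour_by_weekday` is hypothetical.

```python
import pandas as pd

WEEKDAY_ORDER = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def hour_by_weekday(df):
    """Mean count per hour (rows) and weekday (columns, Sunday first)."""
    # Convert weekday to plain strings so the column labels are not categorical
    df = df.assign(weekday=df["weekday"].astype(str))
    table = df.pivot_table(index="hour", columns="weekday", values="count", aggfunc="mean")
    # Fix the column order; weekdays with no data become all-NaN columns
    return table.reindex(columns=WEEKDAY_ORDER)
```

For example, `print(hour_by_weekday(bikedata1))` shows the morning and evening values side by side for every weekday.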

Plot the number of rides by hour for each season:

# Mean number of rides per hour for each season
def season_And_hour(bikedata1):
    fig2, ax2 = plt.subplots()
    fig2.set_size_inches(12, 20)
    hueOrder = ['Spring', 'summer', 'Fall', 'Winter']
    hourAggregated = pd.DataFrame(bikedata1.groupby(["hour", "season"], observed=True)["count"].mean()).reset_index()
    sn.pointplot(data=hourAggregated, x="hour", y="count", hue="season", hue_order=hueOrder, ax=ax2)
    ax2.set(xlabel='Hour', ylabel='Number of rides', title='Rides by hour for each season')
    plt.savefig('result3.png')
    plt.show()

Output:

[Figure 7]
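To read the busiest hour of each season as numbers instead of off the plot, the same (season, hour) means can be reduced with `idxmax`. This is an illustrative sketch, not part of the original analysis; the helper name `peak_hour_per_season` is hypothetical and it assumes `season`, `hour` and `count` columns as built above.

```python
import pandas as pd


def peak_hour_per_season(df):
    """For each season, the hour with the highest mean count and that mean value."""
    hourly = df.groupby(["season", "hour"], observed=True)["count"].mean().reset_index()
    # idxmax returns, per season, the row label of its largest hourly mean
    peaks = hourly.loc[hourly.groupby("season", observed=True)["count"].idxmax()]
    return peaks.set_index("season")[["hour", "count"]]
```

Calling `peak_hour_per_season(bikedata1)` returns one row per season.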

Plot the number of rides by hour for each user type (casual vs. registered):

# Mean number of rides per hour for casual and registered users
def user_And_hour(bikedata1):
    fig, axes = plt.subplots()
    fig.set_size_inches(12, 20)
    # Reshape from wide (one column per user type) to long (variable = user type, value = rides)
    hour_Transform = pd.melt(bikedata1[['hour', 'casual', 'registered', 'weekday']], id_vars=['hour', 'weekday'], value_vars=['casual', 'registered'])
    hour_Aggregated = pd.DataFrame(hour_Transform.groupby(['hour', 'variable'])['value'].mean()).reset_index()
    sn.pointplot(data=hour_Aggregated, x='hour', y='value', hue='variable', hue_order=['casual', 'registered'], ax=axes)
    axes.set(xlabel='Hour', ylabel='Number of rides', title='Rides by hour for each user type')
    plt.savefig('result4.png')
    plt.show()

Output:

[Figure 8]
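The user-type plot relies on `pd.melt` to turn the two columns `casual` and `registered` into rows, so that seaborn can use the user type as `hue`. The sketch below applies the same call to the single row interpreted earlier (2011/1/1 0:00, a Saturday, 3 casual and 13 registered rides) to show what the reshaped table looks like. `melt` names the new columns `variable` and `value` by default, which is why the plotting code uses those names.

```python
import pandas as pd

# Wide format: one column per user type (values from the row interpreted earlier)
wide = pd.DataFrame({"hour": [0], "weekday": ["Saturday"], "casual": [3], "registered": [13]})

# Long format: one row per (hour, weekday, user type)
long = pd.melt(wide, id_vars=["hour", "weekday"], value_vars=["casual", "registered"])
print(long)
```

The result has two rows, one with `variable` equal to `casual` and value 3, and one with `registered` and value 13.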

Source code download (Baidu Netdisk): https://pan.baidu.com/s/1ZJjKHMRo07xfFGk84OnDCQ, extraction code: a5v0
