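Data: temperature readings for ten locations (Gaomi, Laiyang, and others), recorded at intervals throughout June 17.
2 Single-city temperature visualization
We pick the city of Laixi, clean up its data with pandas, plot it with matplotlib, and save the figure as an SVG image.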
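import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser

# Load the Laixi readings: 'day' holds the timestamp, 'temp' the temperature
df_laixi = pd.read_csv('E:/wea/WeatherData/laixi.csv')
y1 = df_laixi['temp']
x1 = df_laixi['day']
# Convert the timestamp strings to datetime objects
dayoflaixi = [parser.parse(x) for x in x1]

fig, ax = plt.subplots()
plt.xticks(rotation=70)
# Label the x axis as hour:minute
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
ax.plot(dayoflaixi, y1, 'r')
# Save before calling plt.show(): after show() returns, the figure may
# already be closed and savefig would then write a blank image
plt.savefig('E:/wea/WeatherData/laixi.svg')
plt.show()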
Figure: Laixi's temperature over the course of the day
The maximum temperature tm occurs between 2 p.m. and 6 p.m.
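The time of the peak can also be read from the data instead of from the plot. The following is a minimal sketch, assuming the CSV has the same 'day' and 'temp' columns used above; the helper name peak_temperature and the sample rows (including the year) are made up for illustration and are not the measured Laixi data.

```python
import io
import pandas as pd

# Hypothetical sample in the same shape as the city CSV files:
# a 'day' timestamp column and a 'temp' column. The values are made up.
SAMPLE_CSV = """day,temp
2015-06-17 08:00:00,21.5
2015-06-17 11:00:00,25.0
2015-06-17 14:00:00,28.3
2015-06-17 17:00:00,27.1
2015-06-17 20:00:00,23.4
"""

def peak_temperature(df):
    """Return (timestamp, temperature) of the warmest reading in df."""
    times = pd.to_datetime(df['day'])
    idx = df['temp'].idxmax()  # index label of the row with the highest temperature
    return times[idx], df['temp'][idx]

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
when, tm = peak_temperature(df)
print(when.strftime('%H:%M'), tm)
```

With the real file, passing df_laixi to peak_temperature would give the hour to compare against the 2 p.m. to 6 p.m. window.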
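3 Is there a visible relationship between the day's maximum temperature tm and the distance s from the sea?
We select the three cities nearest the sea (Weihai, Mouping, Yantai) and the three farthest from it (Laixi, Pingdu, Gaomi), and compare the temperature curves of the two groups.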
#!/usr/bin/env python
# encoding: utf-8
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser
# Read the data files
df_weihai = pd.read_csv('E:/wea/WeatherData/weihai.csv')
df_mouping = pd.read_csv('E:/wea/WeatherData/mouping.csv')
df_yantai= pd.read_csv('E:/wea/WeatherData/yantai.csv')
df_pingdu = pd.read_csv('E:/wea/WeatherData/pingdu.csv')
df_gaomi = pd.read_csv('E:/wea/WeatherData/gaomi.csv')
df_laixi = pd.read_csv('E:/wea/WeatherData/laixi.csv')
# Extract the y-axis (temperature) and x-axis (time) data
y1 = df_weihai['temp']
x1 = df_weihai['day']
y2 = df_mouping['temp']
x2 = df_mouping['day']
y3 = df_yantai['temp']
x3 = df_yantai['day']
y4 = df_laixi['temp']
x4 = df_laixi['day']
y5 = df_pingdu['temp']
x5 = df_pingdu['day']
y6 = df_gaomi['temp']
x6 = df_gaomi['day']
# Convert the dates from strings to datetime objects
day_weihai = [parser.parse(x) for x in x1]
day_mouping = [parser.parse(x) for x in x2]
day_yantai = [parser.parse(x) for x in x3]
day_laixi = [parser.parse(x) for x in x4]
day_pingdu = [parser.parse(x) for x in x5]
day_gaomi = [parser.parse(x) for x in x6]
# Create the figure and axes with subplots()
fig, ax = plt.subplots()
plt.xticks(rotation=70)
hours = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hours)
# Draw all six curves on one panel:
# red = the three cities nearest the sea, green = the three farthest
ax.plot(day_weihai, y1, 'r', day_mouping, y2, 'r', day_yantai, y3, 'r')
ax.plot(day_laixi, y4, 'g', day_pingdu, y5, 'g', day_gaomi, y6, 'g')
plt.show()
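The result is positive: the maximum temperatures of the three farthest cities are clearly higher than those of the three cities nearest the coast, which indicates a visible relationship between tm and s.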
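The visual comparison can be backed by a single number, for example the gap between the two groups' average maxima. This is only a sketch: the function name mean_of_maxima and the readings below are placeholders invented for illustration, not the measured values.

```python
def mean_of_maxima(temps_by_city):
    """Average the per-city maximum temperatures of one group of cities."""
    maxima = [max(temps) for temps in temps_by_city.values()]
    return sum(maxima) / len(maxima)

# Placeholder readings, NOT the measured data. In the script above the
# values would be df_weihai['temp'], df_laixi['temp'], and so on.
near_sea = {
    'near_1': [20.0, 25.0, 26.0],
    'near_2': [21.0, 26.0, 27.0],
    'near_3': [20.0, 27.0, 28.0],
}
far_from_sea = {
    'far_1': [22.0, 30.0, 32.0],
    'far_2': [23.0, 31.0, 33.0],
    'far_3': [22.0, 32.0, 34.0],
}

# A positive gap means the inland group reached higher maxima
gap = mean_of_maxima(far_from_sea) - mean_of_maxima(near_sea)
print(gap)
```

Because a pandas Series is iterable, each 'temp' column can be passed in directly as a dictionary value.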
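4 Quantifying the relationship between maximum temperature and distance s
Take the maximum temperature of each of the ten cities and draw a scatter plot of tm against s.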
#!/usr/bin/env python
# encoding: utf-8
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser
df_gaomi = pd.read_csv('E:/wea/WeatherData/gaomi.csv')
df_laixi = pd.read_csv('E:/wea/WeatherData/laixi.csv')
df_laiyang = pd.read_csv('E:/wea/WeatherData/laiyang.csv')
df_longkou = pd.read_csv('E:/wea/WeatherData/longkou.csv')
df_mouping = pd.read_csv('E:/wea/WeatherData/mouping.csv')
df_pingdu = pd.read_csv('E:/wea/WeatherData/pingdu.csv')
df_qixia= pd.read_csv('E:/wea/WeatherData/qixia.csv')
df_weihai = pd.read_csv('E:/wea/WeatherData/weihai.csv')
df_yantai= pd.read_csv('E:/wea/WeatherData/yantai.csv')
df_zhaoyuan = pd.read_csv('E:/wea/WeatherData/zhaoyuan.csv')
# dist: each city's distance s from the sea (same value on every row, so take the first)
dist = [df_weihai['dist'][0],
df_yantai['dist'][0],
df_mouping['dist'][0],
df_zhaoyuan['dist'][0],
df_longkou['dist'][0],
df_qixia['dist'][0],
df_laiyang['dist'][0],
df_laixi['dist'][0],
df_pingdu['dist'][0],
df_gaomi['dist'][0]
]
# temp_max: the maximum temperature of each city
temp_max = [df_weihai['temp'].max(),
df_yantai['temp'].max(),
df_mouping['temp'].max(),
df_zhaoyuan['temp'].max(),
df_longkou['temp'].max(),
df_qixia['temp'].max(),
df_laiyang['temp'].max(),
df_laixi['temp'].max(),
df_pingdu['temp'].max(),
df_gaomi['temp'].max()
]
# temp_min: the minimum temperature of each city
temp_min = [df_weihai['temp'].min(),
df_yantai['temp'].min(),
df_mouping['temp'].min(),
df_zhaoyuan['temp'].min(),
df_longkou['temp'].min(),
df_qixia['temp'].min(),
df_laiyang['temp'].min(),
df_laixi['temp'].min(),
df_pingdu['temp'].min(),
df_gaomi['temp'].min()
]
# Plot maximum temperature against distance s
fig, ax = plt.subplots()
ax.plot(dist, temp_max, 'ro')
plt.show()
The scatter plot shows that within 100 km, tm and the distance s are approximately linearly related; beyond 100 km the relationship changes.
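The three hand-written lists above repeat the same expression for every city. A loop over a dictionary of DataFrames yields the same numbers in one table. This is a sketch, assuming every frame has the 'dist' and 'temp' columns used above; summarize_cities and the two tiny frames are hypothetical stand-ins for the real CSV data.

```python
import pandas as pd

def summarize_cities(frames):
    """Build one row per city: distance s, maximum and minimum temperature."""
    rows = []
    for name, df in frames.items():
        rows.append({
            'city': name,
            'dist': df['dist'].iloc[0],   # distance is constant within a file
            'temp_max': df['temp'].max(),
            'temp_min': df['temp'].min(),
        })
    # Sort by distance so the rows run from the coast inland
    return pd.DataFrame(rows).sort_values('dist').reset_index(drop=True)

# Made-up frames for illustration only
frames = {
    'inland': pd.DataFrame({'dist': [120, 120, 120], 'temp': [22.0, 31.5, 27.0]}),
    'coastal': pd.DataFrame({'dist': [5, 5, 5], 'temp': [20.0, 26.5, 24.0]}),
}
summary = summarize_cities(frames)
print(summary)
```

With the real data, frames could be built as {'weihai': df_weihai, 'yantai': df_yantai, ...}, and summary['dist'] and summary['temp_max'] would replace the dist and temp_max lists.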
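5 Regression analysis with scikit-learn
Based on the analysis above, we assume two separate linear relationships and use scikit-learn to model the trend of each line.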
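from sklearn.svm import SVR

# Split into two groups: dist1 is near the sea, dist2 is far from it
dist1 = dist[0:5]
dist2 = dist[5:10]
# scikit-learn expects a 2-D feature array, so wrap each value in a list
dist1 = [[x] for x in dist1]
dist2 = [[x] for x in dist2]
temp_m1 = temp_max[0:5]
temp_m2 = temp_max[5:10]
# Support vector regression with a linear kernel; C is the regularization
# parameter (a larger C penalizes training errors more heavily)
svr_lin1 = SVR(kernel='linear', C=1e3)
svr_lin2 = SVR(kernel='linear', C=1e3)
svr_lin1.fit(dist1, temp_m1)
svr_lin2.fit(dist2, temp_m2)
# Distances at which to evaluate each fitted line
xp1 = np.arange(10, 100, 10).reshape((9, 1))
xp2 = np.arange(50, 400, 50).reshape((7, 1))
yp1 = svr_lin1.predict(xp1)
yp2 = svr_lin2.predict(xp2)
fig, ax = plt.subplots()
ax.set_xlim(0, 400)
ax.plot(xp1, yp1, c='b', label='strong sea weather')
ax.plot(xp2, yp2, c='g', label='low sea weather')
ax.legend()  # show the labels defined above
plt.show()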
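On the fitted plot, the two lines cross near 50 km, which suggests that the maritime influence on the maximum temperature reaches roughly 50 km inland.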
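As a cross-check on the SVR fit, the same straight-line model can be fitted by ordinary least squares. Note that this is a different technique from the one used above: SVR minimizes an epsilon-insensitive loss, so the two methods need not produce identical coefficients. The sketch below uses numpy.polyfit; fit_line and the demo points are hypothetical and chosen to lie exactly on a known line.

```python
import numpy as np

def fit_line(s_values, tm_values):
    """Ordinary least-squares fit of tm = a*s + b; returns (a, b)."""
    a, b = np.polyfit(s_values, tm_values, 1)  # degree 1: [slope, intercept]
    return float(a), float(b)

# Placeholder points lying exactly on tm = 0.05*s + 27 (not measured data)
s_demo = [10.0, 20.0, 30.0, 40.0]
tm_demo = [0.05 * s + 27.0 for s in s_demo]

a, b = fit_line(s_demo, tm_demo)
print(a, b)
```

With the article's data, the call would be fit_line(dist[0:5], temp_max[0:5]) for the near group, using the flat dist list from before it was reshaped into nested lists.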
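We describe each line in the form y = ax + b:
print(svr_lin1.coef_)       # slope of the near-sea line
print(svr_lin1.intercept_)  # intercept of the near-sea line
print(svr_lin2.coef_)       # slope of the far-from-sea line
print(svr_lin2.intercept_)  # intercept of the far-from-sea line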
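Output:
Conclusion:
On the Shandong Peninsula, within 50 km of the coastline, the local maximum temperature tm is influenced by the maritime climate and is approximately related to the distance s (km) from the coastline by:
tm = 0.04794118 s + 27.65617647
Beyond 50 km, it approximately satisfies:
tm = 0.00401274 s + 29.98745223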
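The two formulas can be combined into one piecewise estimate, and their crossing point can be computed directly. The sketch below uses only the coefficients reported above; the function names and the 50 km default threshold are illustrative choices. Setting the two lines equal gives s = (b2 - b1) / (a1 - a2), which works out to roughly 53 km, consistent with the crossing seen near 50 km.

```python
# Coefficients reported in the conclusion above (tm = a*s + b)
NEAR_A, NEAR_B = 0.04794118, 27.65617647   # within about 50 km of the coast
FAR_A, FAR_B = 0.00401274, 29.98745223     # beyond about 50 km

def crossing_distance(a1, b1, a2, b2):
    """Distance s at which a1*s + b1 equals a2*s + b2."""
    if a1 == a2:
        raise ValueError("parallel lines never cross")
    return (b2 - b1) / (a1 - a2)

def predict_tm(s, threshold=50.0):
    """Piecewise estimate of tm; threshold is the assumed switch-over distance in km."""
    if s < 0:
        raise ValueError("distance must be non-negative")
    if s <= threshold:
        return NEAR_A * s + NEAR_B
    return FAR_A * s + FAR_B

s_cross = crossing_distance(NEAR_A, NEAR_B, FAR_A, FAR_B)
print(round(s_cross, 1))
print(predict_tm(20.0), predict_tm(200.0))
```

Because the lines meet at about 53 km and not at exactly 50 km, the piecewise estimate has a small jump at the threshold; passing threshold=s_cross removes it.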