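This article takes its inspiration from https://blog.csdn.net/xufive/article/details/104093197 and improves on it in my own way: finer precision, and a more direct heatmap visualization.
Python version: 3.7.5. I also installed the pyecharts map extension packages. Installation guides are easy to find on Baidu, so I won't repeat them here.
I work in BME (biomedical engineering) and am not a computer science major. If the code can be improved, please contact me; my contact details are at the bottom.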
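First, the result (data: nationwide confirmed cases as of 2020-02-04 10:28:21):
This article covers only data collection, analysis, and plotting, and expresses no opinion whatsoever on the epidemic.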
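The data source of choice: Tencent's real-time epidemic tracker (腾讯疫情实时追踪).
In Firefox, press F12 to open the developer tools and find the useful JSON.
Looking at the URL, and following the experience of earlier authors, we can ignore the timestamp at the end. That makes the scraper module easy. The code is as follows: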
import datetime
import json

import requests


def catch():
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5'
    # The 'data' field of the response is itself a JSON string, hence the second decode
    data = json.loads(requests.get(url=url, timeout=10).json()['data'])
    distb = data['areaTree'][0]      # first entry of areaTree (the national node)
    growth = data['chinaDayList']    # national daily series (not used below)
    provinces = distb['children']
    cityTotalConfirm = {}
    cityTotalSuspect = {}
    cityTotalDead = {}
    cityTotalHeal = {}
    cityTodayConfirm = {}
    cityTodaySuspect = {}
    cityTodayDead = {}
    cityTodayHeal = {}
    # City names are the keys, so a name that occurs in more than one
    # province keeps only the last value seen
    for province in provinces:
        for city in province['children']:
            name = city['name']
            cityTotalConfirm[name] = city['total']['confirm']
            cityTotalSuspect[name] = city['total']['suspect']
            cityTotalDead[name] = city['total']['dead']
            cityTotalHeal[name] = city['total']['heal']
            cityTodayConfirm[name] = city['today']['confirm']
            cityTodaySuspect[name] = city['today']['suspect']
            cityTodayDead[name] = city['today']['dead']
            cityTodayHeal[name] = city['today']['heal']
    return (cityTodayConfirm, cityTodaySuspect, cityTodayDead, cityTodayHeal,
            cityTotalConfirm, cityTotalSuspect, cityTotalDead, cityTotalHeal)
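This gives eight data sets: today's and cumulative counts for confirmed, suspected, dead, and recovered cases. Unlike the earlier article, my data go down to the city level.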
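The nested structure that catch() walks can be tried offline. Below is a minimal sketch that assumes a tiny hypothetical payload: the city names are real but the counts are invented. Only the field names (areaTree, children, name, total, today, confirm, suspect, dead, heal) are taken from catch(). The sketch shows the second json.loads on the string-valued 'data' field. It collects all eight tables in one dict keyed by (period, metric). flatten_cities is a hypothetical helper, not part of the original code.

```python
import json

# Hypothetical, heavily trimmed stand-in for the endpoint's response.
# As in catch(), the value of 'data' is a JSON *string*, not a dict.
SAMPLE_RESPONSE = {
    "data": json.dumps({
        "areaTree": [{
            "name": "中国",
            "children": [{
                "name": "湖北",
                "children": [
                    {"name": "武汉",
                     "total": {"confirm": 10, "suspect": 0, "dead": 1, "heal": 2},
                     "today": {"confirm": 3, "suspect": 0, "dead": 0, "heal": 1}},
                    {"name": "孝感",
                     "total": {"confirm": 5, "suspect": 0, "dead": 0, "heal": 0},
                     "today": {"confirm": 1, "suspect": 0, "dead": 0, "heal": 0}},
                ],
            }],
        }],
    }),
}

METRICS = ("confirm", "suspect", "dead", "heal")


def flatten_cities(response):
    """Return {(period, metric): {city: count}} for period in ('today', 'total')."""
    data = json.loads(response["data"])  # second decode: 'data' is a string
    tables = {(p, m): {} for p in ("today", "total") for m in METRICS}
    for province in data["areaTree"][0]["children"]:
        for city in province["children"]:
            for (period, metric), table in tables.items():
                table[city["name"]] = city[period][metric]
    return tables


tables = flatten_cities(SAMPLE_RESPONSE)
print(tables[("total", "confirm")])  # {'武汉': 10, '孝感': 5}
```

Keying by (period, metric) instead of using eight named variables turns adding or dropping a metric into a one-line change. catch() itself keeps the eight-tuple interface that main() unpacks.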
For convenience I used pyecharts here. Life is short, so there is no point fighting the tools.
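import datetime

import numpy as np
from pyecharts import options as opts
from pyecharts.charts import Geo
from pyecharts.globals import ChartType


def geo_heatmap(k, v) -> Geo:
    try:
        c = Geo()
        c.add_schema(maptype="china")
        c.add(
            "ratio",
            [list(z) for z in zip(k, v)],
            type_=ChartType.HEATMAP,
        )
    except Exception:
        # Some place name has no coordinates in pyecharts' table. The exception
        # type varies between pyecharts versions (the original caught TypeError).
        print("Bad place name found, starting error check......")
        with open("place.txt", mode="a", encoding="gbk") as f:
            i = 0
            c = Geo()
            c.add_schema(maptype="china")
            # Add the first i pairs; if that fails, pair i - 1 is the culprit.
            # '<=' makes sure the last pair is checked as well.
            while i <= len(k):
                try:
                    c.add(
                        "ratio",
                        [list(z) for z in zip(k[:i], v[:i])],
                        type_=ChartType.HEATMAP,
                    )
                except Exception:
                    print(k[i - 1])
                    f.write(k[i - 1] + '\n')  # remembered for later runs
                    del k[i - 1]
                    del v[i - 1]
                    i -= 1
                else:
                    i += 1
        # Rebuild and render from the cleaned lists
        return geo_heatmap(k, v)
    c.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
    timen = datetime.datetime.now().strftime('%Y-%m-%d')
    # '.' separates the time fields (':' is not allowed in Windows file names)
    timea = datetime.datetime.now().strftime('%Y-%m-%d %H.%M.%S')
    c.set_global_opts(
        # VisualMapOpts defaults to max_=100; use the data maximum so that
        # larger values are not clipped
        visualmap_opts=opts.VisualMapOpts(max_=max(v) if v else 100),
        title_opts=opts.TitleOpts(title="2019nCoV{} BY PZW".format(timen)),
    )
    c.render(path='{}HEATMAP.html'.format(timea))
    print('Rendering finished')
    return c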
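You can check the data shape handed to Geo.add, and the output file name, without pyecharts. Below is a minimal sketch. heatmap_pairs and output_name are hypothetical helpers that reproduce the list comprehension and the strftime pattern used in geo_heatmap. The '.' in the time part is presumably there because ':' is not allowed in Windows file names.

```python
import datetime


def heatmap_pairs(names, values):
    """Pair names with values as [[name, value], ...], the shape passed to Geo.add."""
    return [list(z) for z in zip(names, values)]


def output_name(now):
    """HTML file name in the same pattern as geo_heatmap, with '.' in the time part."""
    return "{}HEATMAP.html".format(now.strftime("%Y-%m-%d %H.%M.%S"))


print(heatmap_pairs(["武汉", "北京"], [10, 7]))  # [['武汉', 10], ['北京', 7]]
print(output_name(datetime.datetime(2020, 2, 4, 10, 28, 21)))
# 2020-02-04 10.28.21HEATMAP.html
```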
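Some place names scraped from Tencent do not match the names in pyecharts' coordinate table, which is why the error-checking step in the middle exists. That step adds the data one prefix at a time. Whenever an add fails, the most recently added name is written to place.txt and removed from the lists, and the chart is then rebuilt from the cleaned lists. Since I'm only doing this as a hobby and don't know a better way to optimize it, I've left it like this.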
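An alternative to the prefix-by-prefix retry is to look each name up once before building any chart. Below is a minimal sketch that assumes a hypothetical lookup callable returning coordinates or None. The toy table COORDS (rough coordinates, for illustration only) stands in for pyecharts' coordinate data. How to query that data directly depends on the pyecharts version, so that part is left as an assumption. split_known is a hypothetical helper.

```python
def split_known(names, values, lookup):
    """Split (name, value) pairs by whether lookup(name) finds coordinates.

    `lookup` is any callable returning coordinates or None; it stands in
    for the map library's coordinate table (an assumption, not its API).
    """
    kept_names, kept_values, unknown = [], [], []
    for name, value in zip(names, values):
        if lookup(name) is None:
            unknown.append(name)
        else:
            kept_names.append(name)
            kept_values.append(value)
    return kept_names, kept_values, unknown


# Hypothetical coordinate table; '未知地名' ("unknown place") is simply absent
COORDS = {"武汉": (114.3, 30.6), "北京": (116.4, 39.9)}
k, v, bad = split_known(["武汉", "未知地名", "北京"], [10, 3, 7], COORDS.get)
print(k, v, bad)  # ['武汉', '北京'] [10, 7] ['未知地名']
```

This makes one lookup per name in a single pass and builds no throwaway chart series. The `bad` list can be appended to place.txt just as the original error check does.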
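After the first error check, a file (place.txt) listing the place names that failed to map is created. Before every render, the program reads this file and excludes those names to avoid errors.
Here is the code that reads it: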
def Rep():
    """Return the place names recorded in place.txt, one per line."""
    try:
        with open("place.txt", mode='r', encoding='gbk') as ef:
            return [line.strip() for line in ef if line.strip()]
    except FileNotFoundError:
        # First run: no failed names have been recorded yet
        return []
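You can exercise the read/append cycle around place.txt in a temporary directory. Below is a minimal sketch that assumes the same file name and GBK encoding as Rep(). load_excluded and record_excluded are hypothetical helpers. A missing file is treated as an empty exclusion list, which covers the very first run. Returning a set makes the membership test in main()'s filtering step constant-time.

```python
import os
import tempfile


def load_excluded(path):
    """Read one place name per line; a missing file means nothing is excluded yet."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="gbk") as f:
        return {line.strip() for line in f if line.strip()}


def record_excluded(path, names):
    """Append newly found bad names, one per line."""
    with open(path, "a", encoding="gbk") as f:
        for name in names:
            f.write(name + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "place.txt")
    before = load_excluded(path)          # first run: empty set
    record_excluded(path, ["未知地名"])
    after = load_excluded(path)
print(before, after)  # set() {'未知地名'}

# Filtering as in main(): drop the placeholder and the recorded names
cities = ["武汉", "未知地名", "地区待确认"]
kept = [c for c in cities if c != "地区待确认" and c not in after]
```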
Next comes the main program:
def main(raw=0):
    tdc, tds, tdd, tdh, ttc, tts, ttd, tth = catch()
    # Plot cumulative confirmed cases: names and values come from the same dict (ttc)
    cities, numbers = list(ttc.keys()), list(ttc.values())
    k, v = [], []
    ep = set(Rep())  # names known to have no coordinates
    for city, number in zip(cities, numbers):
        # '地区待确认' ("location pending confirmation") has no coordinates
        if city != '地区待确认' and city not in ep:
            k.append(city)
            v.append(number)
    if raw == 0:
        # Log-compress the counts: 10 * log2(v + 1)
        v = list(np.log2(np.array(v) + 1) * 10)
    geo_heatmap(k, v)


if __name__ == '__main__':
    main()
The data I pass to the heatmap is ttc, the cumulative number of confirmed cases. You can swap it for another data set as needed.
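catch() returns its eight dicts positionally, so switching the plotted quantity means picking a different element of that tuple. Below is a minimal sketch with hypothetical field labels (FIELDS and pick_metric are not part of the original code). The usage line uses dummy dicts in place of real data.

```python
# Order of the eight dicts returned by catch()
FIELDS = ("today_confirm", "today_suspect", "today_dead", "today_heal",
          "total_confirm", "total_suspect", "total_dead", "total_heal")


def pick_metric(results, field):
    """Return the {city: count} dict for `field` from catch()'s 8-tuple."""
    return dict(zip(FIELDS, results))[field]


fake = tuple({"武汉": i} for i in range(8))  # dummy stand-in for catch()
print(pick_metric(fake, "total_confirm"))  # {'武汉': 4}
```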
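Epidemiological models of infection are exponential, so case counts differ enormously between regions. To give the heatmap reasonable contrast, the default raw=0 does not use the raw counts. Instead it applies a log transform weighted by 10, i.e. 10 * log2(v + 1). With raw=1 the raw counts are used instead, which also works.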
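You can check the effect of the raw=0 transform by hand. Below is a minimal sketch of the same formula, 10 * log2(v + 1), written with math.log2 instead of NumPy. compress is a hypothetical name. The counts are powers of two minus one, so the results are exact.

```python
import math


def compress(counts, weight=10):
    """Map each count v to weight * log2(v + 1), as main() does when raw == 0."""
    return [weight * math.log2(c + 1) for c in counts]


print(compress([0, 1, 15, 1023]))  # [0.0, 10.0, 40.0, 100.0]
```

A 1023:1 ratio between the largest and smallest nonzero count shrinks to 100:10, i.e. 10:1. The +1 keeps zero counts at 0 instead of negative infinity.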
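The result (2020-02-04 10.28.21):
That completes the plot. If anything here infringes on your rights, please contact me.
The source code has been uploaded to GitHub: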
https://github.com/A-nnonymous/2019nCoV_HEATMAP
QQ:617428699