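I had meant to build a live COVID-19 data display two or three months ago, but kept putting it off for one reason or another and only recently found time to write it. The data is scraped with Python from Baidu's real-time COVID-19 big-data report (百度疫情实时大数据报告): requests sends the HTTP request, XPath parses the response, wordcloud draws the word clouds, pyecharts draws the maps and line charts, and the data is stored in Excel workbooks that are read and written with openpyxl.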
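The program draws maps of cumulative confirmed cases and line charts of the daily figures; fetching and displaying other data can be added to the same program. Deployed on a server and run on a schedule, it can keep the charts close to real time, and the pyecharts plotting module can also be integrated into a web framework such as Django or Flask.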
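The scheduled run mentioned above could be as simple as a loop. The following is a minimal sketch, not part of the original project: run_periodically and its parameters are illustrative names, and the job passed in is assumed to wrap the calls made in main.py. On a server, cron or a similar scheduler is a common alternative.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing interval_seconds between runs.

    max_runs=None loops forever; a number stops after that many runs.
    sleep is injectable so the loop can be exercised without waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Only pause if another run is still to come
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

For example, run_periodically(refresh, 3600) would call a hypothetical refresh() function once an hour until the process is stopped.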
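The data distinguishes two scopes: global (全球), which includes China, and foreign (境外), which excludes China. Four charts are drawn later: a map of cumulative confirmed cases in China, a map of cumulative confirmed cases worldwide (including China), a line chart of China's daily data, and a line chart of foreign daily data (excluding China).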
Note: the response returned by requesting the page directly does not contain the daily data for each country. That data is served from: https://voice.baidu.com/newpneumonia/get?target=trend&isCaseIn=1&stage=publish
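As a rough illustration of how that address is put together, the sketch below rebuilds it from its three query parameters and shows one way to download it. It uses the standard library's urllib as a stand-in for requests; build_trend_url and fetch_trend are hypothetical helper names, and the layout of the returned JSON is not described in this article, so the sketch does not assume one.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request, urlopen

TREND_ENDPOINT = "https://voice.baidu.com/newpneumonia/get"


def build_trend_url(endpoint=TREND_ENDPOINT):
    """Rebuild the URL from the article out of its three query parameters."""
    params = {"target": "trend", "isCaseIn": "1", "stage": "publish"}
    return endpoint + "?" + urlencode(params)


def fetch_trend(url, timeout=10):
    """Download and decode the JSON body; raises on network or JSON errors."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_trend_url()
print(url)
print(parse_qs(urlsplit(url).query))
```

fetch_trend is defined but deliberately not called here, since whether the endpoint still responds depends on the site.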
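Looking at the source of Baidu's epidemic data page reveals a lot of neatly structured data, presumably the epidemic figures. Save the page and pretty-print it, and it is easy to see that all of the data sits inside the script element whose id is captain-config (the element targeted by the XPath expression used later in this article). The title fields inside it are Unicode escape sequences; converting them to Chinese makes the different data categories easier to tell apart.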
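Converting the escaped titles needs no extra work if the text is parsed as JSON, because json.loads decodes \uXXXX sequences on its own. A small sketch, using a made-up sample string rather than real page content:

```python
import json

# Hypothetical sample: a title stored as Unicode escape sequences
raw = '{"title": "\\u56fd\\u5185\\u75ab\\u60c5"}'

decoded = json.loads(raw)
print(decoded["title"])  # 国内疫情

# ensure_ascii=False keeps the Chinese characters readable when dumping again
pretty = json.dumps(decoded, ensure_ascii=False, indent=2)
print(pretty)
```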
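Because there is so much data, it helps to extract the main body, delete duplicates and other clutter, and keep only the rough layout, so that the structure can be analyzed before writing the extraction code.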
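The rough shape of the data can be pieced together from the keys that the functions in this article read. The sketch below is only that: the key names come from the code in this article, every value is a made-up placeholder, and keys the code never touches are left out. The outline helper is a hypothetical convenience for printing the key hierarchy.

```python
# Placeholder values only; key names are the ones read by the code in this article.
sample = {
    "component": [
        {
            "mapLastUpdatedTime": "<time>",
            "foreignLastUpdatedTime": "<time>",
            "summaryDataIn": {"curConfirm": "0", "confirmed": "0", "cured": "0",
                              "died": "0", "confirmedRelative": "0"},
            "caseList": [
                {"area": "<province>", "curConfirm": "0", "confirmed": "0",
                 "crued": "0", "died": "0",
                 "subList": [{"city": "<city>", "confirmed": "0", "crued": "0",
                              "died": "0", "confirmedRelative": "0"}]}
            ],
            "caseOutsideList": [{"area": "<country>", "confirmed": "0"}],
            "globalList": [{"area": "<continent>",
                            "subList": [{"country": "<country>", "confirmed": "0"}]}],
            "trend": {"updateDate": ["<date>"], "list": [{"data": [0]}]},
            "allForeignTrend": {"updateDate": ["<date>"], "list": [{"data": [0]}]},
        }
    ]
}


def outline(node, depth=0):
    """Return indented key names, descending into dicts and the first list item."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append("  " * depth + key)
            lines.extend(outline(value, depth + 1))
    elif isinstance(node, list) and node:
        lines.extend(outline(node[0], depth))
    return lines


print("\n".join(outline(sample)))
```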
Data fetching, word cloud drawing, and map drawing go into three separate files, data_get.py, data_wordcloud.py, and data_map.py, and a main file, main.py, calls the functions they define.
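import data_get
import data_wordcloud
import data_map

# Fetch the page once and reuse the parsed dictionary
data_dict = data_get.init()

# Write the Excel workbooks
data_get.china_total_data(data_dict)
data_get.global_total_data(data_dict)
data_get.china_daily_data(data_dict)
data_get.foreign_daily_data(data_dict)

# Draw the word clouds and the maps
data_wordcloud.china_wordcloud()
data_wordcloud.global_wordcloud()
data_map.all_map()  # all_map() is not listed in this excerpt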
The XPath expression //script[@id="captain-config"]/text() extracts the content of that element, and json.loads converts it into a dictionary that the other functions can use.
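To see the same extraction step in isolation, here is a sketch that swaps lxml and XPath for the standard library's html.parser, so it runs without third-party packages. ScriptGrabber and extract_config are hypothetical names, and the HTML string is a tiny made-up page, not the real one.

```python
import json
from html.parser import HTMLParser


class ScriptGrabber(HTMLParser):
    """Collect the text of the first <script> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.chunks = []
        self._inside = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and not self._done and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self._done = True


def extract_config(html, target_id="captain-config"):
    """Return the JSON inside the matching script element as a dict."""
    parser = ScriptGrabber(target_id)
    parser.feed(html)
    parser.close()
    return json.loads("".join(parser.chunks))


# Made-up page for illustration
page = ('<html><body><script id="captain-config">'
        '{"component": [{"caseList": []}]}'
        '</script></body></html>')
config = extract_config(page)
print(type(config).__name__, list(config["component"][0]))
```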
# data_get.py
import json

import openpyxl  # used by the functions that follow in this file
import requests
from lxml import etree


def init():
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.13 Safari/537.36'
    }
    url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia/'
    response = requests.get(url=url, headers=headers)
    tree = etree.HTML(response.text)
    # xpath() returns a list; its first item is the JSON text inside the script element
    config_text = tree.xpath('//script[@id="captain-config"]/text()')
    return json.loads(config_text[0])
def china_total_data(data):
    """Save China's province-level and city-level data to COVID-19-China.xlsx.

    1. Province / municipality / autonomous region / SAR data
       (the code reads the key 'crued', not 'cured', in these lists):
        area                province / municipality / autonomous region / SAR
        curConfirm          current confirmed
        confirmed           cumulative confirmed
        crued               cumulative cured
        died                cumulative deaths
        curConfirmRelative  change in current confirmed
        confirmedRelative   change in cumulative confirmed
        curedRelative       change in cumulative cured
        diedRelative        change in cumulative deaths
    """
    wb = openpyxl.Workbook()  # create the workbook
    ws_china = wb.active  # get the default worksheet
    ws_china.title = "中国省份疫情数据"  # name the worksheet
    ws_china.append(['省/直辖市/自治区/行政区', '现有确诊', '累计确诊', '累计治愈',
                     '累计死亡', '现有确诊增量', '累计确诊增量',
                     '累计治愈增量', '累计死亡增量'])
    china = data['component'][0]['caseList']
    for province in china:
        ws_china.append([province['area'],
                         province['curConfirm'],
                         province['confirmed'],
                         province['crued'],
                         province['died'],
                         province['curConfirmRelative'],
                         province['confirmedRelative'],
                         province['curedRelative'],
                         province['diedRelative']])

    # 2. City data: city, curConfirm (current confirmed), confirmed (cumulative
    #    confirmed), crued (cumulative cured), died (cumulative deaths),
    #    confirmedRelative (change in cumulative confirmed)
    ws_city = wb.create_sheet('中国城市疫情数据')
    ws_city.append(['城市', '现有确诊', '累计确诊',
                    '累计治愈', '累计死亡', '累计确诊增量'])
    for province in china:
        for city in province['subList']:
            # Some cities have no curConfirm value: default it to '0'.
            # Empty crued and died values are also replaced with '0'.
            if 'curConfirm' not in city:
                city['curConfirm'] = '0'
            if city['crued'] == '':
                city['crued'] = '0'
            if city['died'] == '':
                city['died'] = '0'
            # The original wrote a literal '0' here, discarding curConfirm
            ws_city.append([city['city'], city['curConfirm'], city['confirmed'],
                            city['crued'], city['died'], city['confirmedRelative']])

    # 3. Time of the last update of the China data: mapLastUpdatedTime
    time_domestic = data['component'][0]['mapLastUpdatedTime']
    ws_time = wb.create_sheet('中国疫情数据更新时间')
    ws_time.column_dimensions['A'].width = 22  # widen the column
    ws_time.append(['中国疫情数据更新时间'])
    ws_time.append([time_domestic])

    wb.save('COVID-19-China.xlsx')
    print('China data saved to COVID-19-China.xlsx!')
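The missing-value handling in the city loop can also be pulled into a small helper. This is a sketch under the assumption that any of the listed fields may be absent or an empty string; normalize_city is a hypothetical name, and the sample record is invented.

```python
CITY_FIELDS = ("curConfirm", "confirmed", "crued", "died", "confirmedRelative")


def normalize_city(city, fields=CITY_FIELDS):
    """Return a copy of city in which missing or empty fields become '0'."""
    cleaned = dict(city)  # leave the caller's dictionary untouched
    for field in fields:
        if cleaned.get(field, "") == "":
            cleaned[field] = "0"
    return cleaned


# Invented record: curConfirm is missing and crued is empty
record = {"city": "示例市", "confirmed": "12", "crued": "", "died": "1",
          "confirmedRelative": "0"}
print(normalize_city(record))
```

Returning a copy rather than editing in place keeps the parsed dictionary unchanged for the functions that run afterwards.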
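After the global totals had been extracted, drawing the map showed that China's data was missing, so China's figures have to be inserted into the Excel workbook separately when the global data is written.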
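The idea of adding China's row to the country rows can be shown without Excel. In this sketch the field names follow the summaryDataIn keys used in this article, while with_china_row and all the numbers are illustrative only.

```python
def with_china_row(rows, summary, name="中国"):
    """Return rows plus one row for China built from the summary totals.

    If a row for China is already present, the rows are returned unchanged.
    """
    if any(row[0] == name for row in rows):
        return list(rows)
    china = [name, summary["curConfirm"], summary["confirmed"],
             summary["cured"], summary["died"], summary["confirmedRelative"]]
    return list(rows) + [china]


# Invented numbers for illustration
rows = [["美国", "1", "2", "3", "4", "5"]]
summary = {"curConfirm": "10", "confirmed": "20", "cured": "30",
           "died": "40", "confirmedRelative": "50"}
merged = with_china_row(rows, summary)
print(merged[-1])
```

The check for an existing row guards against adding China twice if the source ever starts including it.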
def global_total_data(data):
    """Save per-country data to COVID-19-Global.xlsx.

    1. Country data:
        country            country
        curConfirm         current confirmed
        confirmed          cumulative confirmed
        crued              cumulative cured
        died               cumulative deaths
        confirmedRelative  change in cumulative confirmed
    """
    wb = openpyxl.Workbook()
    ws_global = wb.active
    ws_global.title = "全球各国疫情数据"

    # Save the data by country
    countries = data['component'][0]['caseOutsideList']
    ws_global.append(['国家', '现有确诊', '累计确诊', '累计治愈', '累计死亡', '累计确诊增量'])
    for country in countries:
        ws_global.append([country['area'],
                          country['curConfirm'],
                          country['confirmed'],
                          country['crued'],
                          country['died'],
                          country['confirmedRelative']])

    # Save the data by continent, one sheet per continent
    continent = data['component'][0]['globalList']
    for area in continent:
        ws_foreign = wb.create_sheet(area['area'] + '疫情数据')
        ws_foreign.append(['国家', '现有确诊', '累计确诊', '累计治愈', '累计死亡', '累计确诊增量'])
        for country in area['subList']:
            ws_foreign.append([country['country'],
                               country['curConfirm'],
                               country['confirmed'],
                               country['crued'],
                               country['died'],
                               country['confirmedRelative']])

    # Add China's totals to both the all-countries sheet and the Asia sheet
    ws1, ws2 = wb['全球各国疫情数据'], wb['亚洲疫情数据']
    original_data = data['component'][0]['summaryDataIn']
    add_china_data = ['中国',
                      original_data['curConfirm'],
                      original_data['confirmed'],
                      original_data['cured'],
                      original_data['died'],
                      original_data['confirmedRelative']]
    ws1.append(add_china_data)
    ws2.append(add_china_data)

    # 2. Time of the last update of the global data: foreignLastUpdatedTime
    time_foreign = data['component'][0]['foreignLastUpdatedTime']
    ws_time = wb.create_sheet('全球疫情数据更新时间')
    ws_time.column_dimensions['A'].width = 22  # widen the column
    ws_time.append(['全球疫情数据更新时间'])
    ws_time.append([time_foreign])

    wb.save('COVID-19-Global.xlsx')
    print('Global data saved to COVID-19-Global.xlsx!')
def china_daily_data(data):
    """Append China's daily cumulative series to COVID-19-China.xlsx.

    i_dict = data['component'][0]['trend']
    i_dict['updateDate']  dates
    i_dict['list'][0]     confirmed
    i_dict['list'][1]     suspected
    i_dict['list'][2]     cured
    i_dict['list'][3]     deaths
    i_dict['list'][4]     new confirmed
    i_dict['list'][5]     new suspected
    i_dict['list'][6]     new cured
    i_dict['list'][7]     new deaths
    i_dict['list'][8]     cumulative imported cases
    i_dict['list'][9]     new imported cases
    """
    ccd_dict = data['component'][0]['trend']
    update_date = ccd_dict['updateDate']  # dates
    china_confirmed = ccd_dict['list'][0]['data']  # daily cumulative confirmed
    china_crued = ccd_dict['list'][2]['data']  # daily cumulative cured
    china_died = ccd_dict['list'][3]['data']  # daily cumulative deaths
    wb = openpyxl.load_workbook('COVID-19-China.xlsx')

    # Daily cumulative confirmed (the loop variable no longer shadows `data`)
    ws_china_confirmed = wb.create_sheet('中国每日累计确诊数据')
    ws_china_confirmed.append(['日期', '数据'])
    for row in zip(update_date, china_confirmed):
        ws_china_confirmed.append(row)

    # Daily cumulative cured
    ws_china_crued = wb.create_sheet('中国每日累计治愈数据')
    ws_china_crued.append(['日期', '数据'])
    for row in zip(update_date, china_crued):
        ws_china_crued.append(row)

    # Daily cumulative deaths
    ws_china_died = wb.create_sheet('中国每日累计死亡数据')
    ws_china_died.append(['日期', '数据'])
    for row in zip(update_date, china_died):
        ws_china_died.append(row)

    wb.save('COVID-19-China.xlsx')
    print('China daily cumulative confirmed/cured/deaths saved to COVID-19-China.xlsx!')
def foreign_daily_data(data):
    """Append the foreign daily cumulative series to COVID-19-Global.xlsx.

    te_dict = data['component'][0]['allForeignTrend']
    te_dict['updateDate']  dates
    te_dict['list'][0]     cumulative confirmed
    te_dict['list'][1]     cured
    te_dict['list'][2]     deaths
    te_dict['list'][3]     current confirmed
    te_dict['list'][4]     new confirmed
    """
    te_dict = data['component'][0]['allForeignTrend']
    update_date = te_dict['updateDate']  # dates
    foreign_confirmed = te_dict['list'][0]['data']  # daily cumulative confirmed
    foreign_crued = te_dict['list'][1]['data']  # daily cumulative cured
    foreign_died = te_dict['list'][2]['data']  # daily cumulative deaths
    wb = openpyxl.load_workbook('COVID-19-Global.xlsx')

    # Daily cumulative confirmed (the loop variable no longer shadows `data`)
    ws_foreign_confirmed = wb.create_sheet('境外每日累计确诊数据')
    ws_foreign_confirmed.append(['日期', '数据'])
    for row in zip(update_date, foreign_confirmed):
        ws_foreign_confirmed.append(row)

    # Daily cumulative cured
    ws_foreign_crued = wb.create_sheet('境外每日累计治愈数据')
    ws_foreign_crued.append(['日期', '数据'])
    for row in zip(update_date, foreign_crued):
        ws_foreign_crued.append(row)

    # Daily cumulative deaths
    ws_foreign_died = wb.create_sheet('境外每日累计死亡数据')
    ws_foreign_died.append(['日期', '数据'])
    for row in zip(update_date, foreign_died):
        ws_foreign_died.append(row)

    wb.save('COVID-19-Global.xlsx')
    print('Foreign daily cumulative confirmed/cured/deaths saved to COVID-19-Global.xlsx!')
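Both daily-data functions repeat the same step three times: pair each date with a value and write the pairs under a two-column header. A possible way to factor that out is sketched below; series_rows is a hypothetical helper, and the length check is an addition of this sketch, since zip on its own silently drops unmatched items.

```python
def series_rows(dates, values, header=("日期", "数据")):
    """Return a header row followed by one [date, value] row per day."""
    if len(dates) != len(values):
        raise ValueError("dates and values must have the same length")
    return [list(header)] + [list(pair) for pair in zip(dates, values)]


# Invented sample series
rows = series_rows(["1.20", "1.21", "1.22"], [291, 440, 571])
for row in rows:
    print(row)
```

Each returned row could then be passed to a worksheet's append method, as the functions above do.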
# data_wordcloud.py
import openpyxl
import wordcloud


def china_wordcloud():
    wb = openpyxl.load_workbook('COVID-19-China.xlsx')  # open the existing workbook
    ws_china = wb['中国省份疫情数据']  # province-level sheet
    # Map each province to its cumulative confirmed count, skipping the header row
    china_dict = {}
    for row in ws_china.iter_rows(min_row=2, values_only=True):
        china_dict[row[0]] = int(row[2])
    # font_path points at a Windows font; a font with Chinese glyphs is needed
    word_cloud = wordcloud.WordCloud(font_path='C:/Windows/Fonts/simsun.ttc',
                                     background_color='#CDC9C9',
                                     min_font_size=15,
                                     width=900, height=500)
    word_cloud.generate_from_frequencies(china_dict)
    word_cloud.to_file('WordCloud-China.png')
    print('China province word cloud drawn!')
def global_wordcloud():
    wb = openpyxl.load_workbook('COVID-19-Global.xlsx')
    ws_global = wb['全球各国疫情数据']
    # Map each country to its cumulative confirmed count, skipping the header row
    global_dict = {}
    for row in ws_global.iter_rows(min_row=2, values_only=True):
        global_dict[row[0]] = int(row[2])
    word_cloud = wordcloud.WordCloud(font_path='C:/Windows/Fonts/simsun.ttc',
                                     background_color='#CDC9C9',
                                     width=900, height=500)
    word_cloud.generate_from_frequencies(global_dict)
    word_cloud.to_file('WordCloud-Global.png')
    print('Global word cloud drawn!')
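Both word cloud functions turn sheet rows into a name-to-count dictionary, and int() raises an error if a cell is empty. The sketch below shows a more tolerant version of that step on plain lists; frequencies is a hypothetical helper, and skipping unparseable rows is a choice made for this sketch, not something the original code does.

```python
def frequencies(rows, name_col=0, value_col=2):
    """Build {name: count} from rows, skipping the header row.

    Rows whose value cannot be read as an integer are skipped.
    """
    result = {}
    for row in rows[1:]:
        try:
            result[row[name_col]] = int(row[value_col])
        except (TypeError, ValueError):
            continue
    return result


# Invented sample rows: header, two usable rows, one row with empty cells
rows = [
    ["省/直辖市/自治区/行政区", "现有确诊", "累计确诊"],
    ["湖北", "1", "68135"],
    ["示例", "", ""],
    ["西藏", 0, 1],
]
print(frequencies(rows))
```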
# data_map.py
import openpyxl
from pyecharts import options as opts
from pyecharts.charts import Map


def china_total_map():
    wb = openpyxl.load_workbook('COVID-19-China.xlsx')  # open the existing workbook
    ws_time = wb['中国疫情数据更新时间']  # sheet holding the update time
    ws_data = wb['中国省份疫情数据']  # province-level sheet
    province = []  # province names
    confirmed = []  # cumulative confirmed (third column)
    # Skip the header row
    for row in ws_data.iter_rows(min_row=2, values_only=True):
        province.append(row[0])
        confirmed.append(row[2])
    time_china = ws_time['A2'].value  # update time

    # Color bands; the last band has no upper bound, matching its label
    pieces = [
        {'max': 0, 'min': 0, 'label': '0', 'color': '#FFFFFF'},
        {'max': 9, 'min': 1, 'label': '1-9', 'color': '#FFE5DB'},
        {'max': 99, 'min': 10, 'label': '10-99', 'color': '#FF9985'},
        {'max': 999, 'min': 100, 'label': '100-999', 'color': '#F57567'},
        {'max': 9999, 'min': 1000, 'label': '1000-9999', 'color': '#E64546'},
        {'min': 10000, 'label': '≧10000', 'color': '#B80909'}
    ]

    # Draw the map
    ct_map = (
        Map()
        .add(series_name='累计确诊人数',
             data_pair=[list(z) for z in zip(province, confirmed)],
             maptype="china")
        .set_global_opts(
            title_opts=opts.TitleOpts(title="中国疫情数据(累计确诊)",
                                      subtitle=f'数据更新至:{time_china}\n\n来源:百度疫情实时大数据报告'),
            visualmap_opts=opts.VisualMapOpts(max_=300, is_piecewise=True, pieces=pieces)
        )
    )
    return ct_map
def global_total_map():
    wb = openpyxl.load_workbook('COVID-19-Global.xlsx')
    ws_time = wb['全球疫情数据更新时间']
    ws_data = wb['全球各国疫情数据']
    country = []  # country names
    confirmed = []  # cumulative confirmed (third column)
    # Skip the header row
    for row in ws_data.iter_rows(min_row=2, values_only=True):
        country.append(row[0])
        confirmed.append(row[2])
    time_global = ws_time['A2'].value  # update time

    # English map names mapped to the Chinese country names used in the data
    name_map = {
        "Somalia": "索马里", "Liechtenstein": "列支敦士登", "Morocco": "摩洛哥",
        "W. Sahara": "西撒哈拉", "Serbia": "塞尔维亚", "Afghanistan": "阿富汗",
        "Angola": "安哥拉", "Albania": "阿尔巴尼亚", "Andorra": "安道尔共和国",
        "United Arab Emirates": "阿拉伯联合酋长国", "Argentina": "阿根廷", "Armenia": "亚美尼亚",
        "Australia": "澳大利亚", "Austria": "奥地利", "Azerbaijan": "阿塞拜疆",
        "Burundi": "布隆迪", "Belgium": "比利时", "Benin": "贝宁",
        "Burkina Faso": "布基纳法索", "Bangladesh": "孟加拉国", "Bulgaria": "保加利亚",
        "Bahrain": "巴林", "Bahamas": "巴哈马", "Bosnia and Herz.": "波斯尼亚和黑塞哥维那",
        "Belarus": "白俄罗斯", "Belize": "伯利兹", "Bermuda": "百慕大",
        "Bolivia": "玻利维亚", "Brazil": "巴西", "Barbados": "巴巴多斯",
        "Brunei": "文莱", "Bhutan": "不丹", "Botswana": "博茨瓦纳",
        "Central African Rep.": "中非共和国", "Canada": "加拿大", "Switzerland": "瑞士",
        "Chile": "智利", "China": "中国", "Côte d'Ivoire": "科特迪瓦",
        "Cameroon": "喀麦隆",
        # The original had these two swapped: the Democratic Republic of the
        # Congo is 刚果(金) and the Republic of the Congo is 刚果(布)
        "Dem. Rep. Congo": "刚果(金)", "Congo": "刚果(布)",
        "Colombia": "哥伦比亚", "Cape Verde": "佛得角", "Costa Rica": "哥斯达黎加",
        "Cuba": "古巴", "N. Cyprus": "北塞浦路斯", "Cyprus": "塞浦路斯",
        "Czech Rep.": "捷克", "Germany": "德国", "Djibouti": "吉布提",
        "Denmark": "丹麦", "Dominican Rep.": "多米尼加", "Algeria": "阿尔及利亚",
        "Ecuador": "厄瓜多尔", "Egypt": "埃及", "Eritrea": "厄立特里亚",
        "Spain": "西班牙", "Estonia": "爱沙尼亚", "Ethiopia": "埃塞俄比亚",
        "Finland": "芬兰", "Fiji": "斐济", "France": "法国",
        "Gabon": "加蓬", "United Kingdom": "英国", "Georgia": "格鲁吉亚",
        "Ghana": "加纳", "Guinea": "几内亚", "Gambia": "冈比亚",
        "Guinea-Bissau": "几内亚比绍", "Eq. Guinea": "赤道几内亚", "Greece": "希腊",
        "Grenada": "格林纳达", "Greenland": "格陵兰岛", "Guatemala": "危地马拉",
        "Guam": "关岛", "Guyana": "圭亚那合作共和国", "Honduras": "洪都拉斯",
        "Croatia": "克罗地亚", "Haiti": "海地", "Hungary": "匈牙利",
        "Indonesia": "印度尼西亚", "India": "印度", "Br. Indian Ocean Ter.": "英属印度洋领土",
        "Ireland": "爱尔兰", "Iran": "伊朗", "Iraq": "伊拉克",
        "Iceland": "冰岛", "Israel": "以色列", "Italy": "意大利",
        "Jamaica": "牙买加", "Jordan": "约旦", "Japan": "日本",
        "Siachen Glacier": "锡亚琴冰川", "Kazakhstan": "哈萨克斯坦", "Kenya": "肯尼亚",
        "Kyrgyzstan": "吉尔吉斯斯坦", "Cambodia": "柬埔寨", "Korea": "韩国",
        "Kuwait": "科威特", "Lao PDR": "老挝", "Lebanon": "黎巴嫩",
        "Liberia": "利比里亚", "Libya": "利比亚", "Sri Lanka": "斯里兰卡",
        "Lesotho": "莱索托", "Lithuania": "立陶宛", "Luxembourg": "卢森堡",
        "Latvia": "拉脱维亚", "Moldova": "摩尔多瓦", "Madagascar": "马达加斯加",
        "Mexico": "墨西哥", "Macedonia": "马其顿", "Mali": "马里",
        "Malta": "马耳他", "Myanmar": "缅甸", "Montenegro": "黑山",
        "Mongolia": "蒙古国", "Mozambique": "莫桑比克", "Mauritania": "毛里塔尼亚",
        "Mauritius": "毛里求斯", "Malawi": "马拉维", "Malaysia": "马来西亚",
        "Namibia": "纳米比亚", "New Caledonia": "新喀里多尼亚", "Niger": "尼日尔",
        "Nigeria": "尼日利亚", "Nicaragua": "尼加拉瓜", "Netherlands": "荷兰",
        "Norway": "挪威", "Nepal": "尼泊尔", "New Zealand": "新西兰",
        "Oman": "阿曼", "Pakistan": "巴基斯坦", "Panama": "巴拿马",
        "Peru": "秘鲁", "Philippines": "菲律宾", "Papua New Guinea": "巴布亚新几内亚",
        "Poland": "波兰", "Puerto Rico": "波多黎各", "Dem. Rep. Korea": "朝鲜",
        "Portugal": "葡萄牙", "Paraguay": "巴拉圭", "Palestine": "巴勒斯坦",
        "Qatar": "卡塔尔", "Romania": "罗马尼亚", "Russia": "俄罗斯",
        "Rwanda": "卢旺达", "Saudi Arabia": "沙特阿拉伯", "Sudan": "苏丹",
        "S. Sudan": "南苏丹", "Senegal": "塞内加尔", "Singapore": "新加坡",
        "Solomon Is.": "所罗门群岛", "Sierra Leone": "塞拉利昂", "El Salvador": "萨尔瓦多",
        "Suriname": "苏里南", "Slovakia": "斯洛伐克", "Slovenia": "斯洛文尼亚",
        "Sweden": "瑞典", "Swaziland": "斯威士兰", "Seychelles": "塞舌尔",
        "Syria": "叙利亚", "Chad": "乍得", "Togo": "多哥",
        "Thailand": "泰国", "Tajikistan": "塔吉克斯坦", "Turkmenistan": "土库曼斯坦",
        "Timor-Leste": "东帝汶", "Tonga": "汤加", "Trinidad and Tobago": "特立尼达和多巴哥",
        "Tunisia": "突尼斯", "Turkey": "土耳其", "Tanzania": "坦桑尼亚",
        "Uganda": "乌干达", "Ukraine": "乌克兰", "Uruguay": "乌拉圭",
        "United States": "美国", "Uzbekistan": "乌兹别克斯坦", "Venezuela": "委内瑞拉",
        "Vietnam": "越南", "Vanuatu": "瓦努阿图", "Yemen": "也门",
        "South Africa": "南非", "Zambia": "赞比亚", "Zimbabwe": "津巴布韦",
        "Aland": "奥兰群岛", "American Samoa": "美属萨摩亚", "Fr. S. Antarctic Lands": "南极洲",
        "Antigua and Barb.": "安提瓜和巴布达", "Comoros": "科摩罗", "Curaçao": "库拉索岛",
        "Cayman Is.": "开曼群岛",
        # The original reused 多米尼加 (the Dominican Republic) for Dominica
        "Dominica": "多米尼克",
        "Falkland Is.": "福克兰群岛马尔维纳斯", "Faeroe Is.": "法罗群岛", "Micronesia": "密克罗尼西亚",
        "Heard I. and McDonald Is.": "赫德岛和麦克唐纳群岛", "Isle of Man": "曼岛", "Jersey": "泽西岛",
        "Kiribati": "基里巴斯", "Saint Lucia": "圣卢西亚", "N. Mariana Is.": "北马里亚纳群岛",
        "Montserrat": "蒙特塞拉特", "Niue": "纽埃", "Palau": "帕劳",
        "Fr. Polynesia": "法属波利尼西亚", "S. Geo. and S. Sandw. Is.": "南乔治亚岛和南桑威奇群岛",
        "Saint Helena": "圣赫勒拿", "St. Pierre and Miquelon": "圣皮埃尔和密克隆群岛",
        "São Tomé and Principe": "圣多美和普林西比", "Turks and Caicos Is.": "特克斯和凯科斯群岛",
        "St. Vin. and Gren.": "圣文森特和格林纳丁斯", "U.S. Virgin Is.": "美属维尔京群岛",
        "Samoa": "萨摩亚"
    }

    # Color bands; the last band has no upper bound, matching its label
    pieces = [
        {'max': 0, 'min': 0, 'label': '0', 'color': '#FFFFFF'},
        {'max': 49, 'min': 1, 'label': '1-49', 'color': '#FFE5DB'},
        {'max': 99, 'min': 50, 'label': '50-99', 'color': '#FFC4B3'},
        {'max': 999, 'min': 100, 'label': '100-999', 'color': '#FF9985'},
        {'max': 9999, 'min': 1000, 'label': '1000-9999', 'color': '#F57567'},
        {'max': 99999, 'min': 10000, 'label': '10000-99999', 'color': '#E64546'},
        {'max': 999999, 'min': 100000, 'label': '100000-999999', 'color': '#B80909'},
        {'min': 1000000, 'label': '≧1000000', 'color': '#8A0808'}
    ]

    gt_map = (
        Map()
        .add(series_name='累计确诊人数',
             data_pair=[list(z) for z in zip(country, confirmed)],
             maptype="world",
             name_map=name_map,
             is_map_symbol_show=False)
        .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title="全球疫情数据(累计确诊)",
                                      subtitle=f'数据更新至:{time_global}\n\n来源:百度疫情实时大数据报告'),
            visualmap_opts=opts.VisualMapOpts(max_=300, is_piecewise=True, pieces=pieces),
        )
    )
    return gt_map
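To check that a set of color bands covers the values you expect, the bands can be tested outside pyecharts. The sketch below is plain Python and not how pyecharts resolves bands internally; find_piece is a hypothetical helper that treats a missing min or max as unbounded, and the bands are a shortened example.

```python
def find_piece(value, pieces):
    """Return the label of the first band containing value, or None."""
    for piece in pieces:
        low = piece.get("min", float("-inf"))
        high = piece.get("max", float("inf"))
        if low <= value <= high:
            return piece["label"]
    return None


# Shortened example bands; the last one has no upper bound
pieces = [
    {"max": 0, "min": 0, "label": "0"},
    {"max": 9, "min": 1, "label": "1-9"},
    {"max": 9999, "min": 10, "label": "10-9999"},
    {"min": 10000, "label": "≧10000"},
]

for value in (0, 7, 68135, -1):
    print(value, find_piece(value, pieces))
```

A value that returns None falls outside every band, which is a quick way to spot gaps before drawing the map.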