[Python and Data Analysis Lab Report] Using the Matplotlib Plotting Library

Contents

  • Introduction
  • Lab Task
  • Lab Code
  • Rant

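Introduction

The liberal arts students at my university have nearly finished their final exams, while certain CS students are still holding a pile of lab reports. The one consolation is that this is the last lab report before revision starts QwQ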

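Lab Task

You are given the data file owid-covid-data.csv, which records suspected cases, confirmed cases, vaccinations, and other figures for every country in the world since the Covid-19 outbreak. Following the provided example, a dynamic visualization of Bitcoin price changes (see the notebook file 'matplotlib动态可视化 - 比特币价格变化.ipynb'), use owid-covid-data.csv to draw a dynamic visualization of how the number of confirmed Covid-19 cases changed over time in one country (or region).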

Lab Code

Before the actual code, let's see which countries (regions) we can choose from ⚆_⚆

df.location.unique()

array(['Afghanistan', 'Africa', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Asia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Cayman Islands', 'Central African Republic', 'Chad', 'Chile',
       'China', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
       'Cyprus', 'Czechia', 'Democratic Republic of Congo', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Eswatini', 'Ethiopia', 'Europe', 'European Union',
       'Faeroe Islands', 'Falkland Islands', 'Fiji', 'Finland', 'France',
       'French Polynesia', 'Gabon', 'Gambia', 'Georgia', 'Germany',
       'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guam',
       'Guatemala', 'Guernsey', 'Guinea', 'Guinea-Bissau', 'Guyana',
       'Haiti', 'High income', 'Honduras', 'Hong Kong', 'Hungary',
       'Iceland', 'India', 'Indonesia', 'International', 'Iran', 'Iraq',
       'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan',
       'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kosovo',
       'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho',
       'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Low income',
       'Lower middle income', 'Luxembourg', 'Macao', 'Madagascar',
       'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta',
       'Marshall Islands', 'Mauritania', 'Mauritius', 'Mexico',
       'Micronesia (country)', 'Moldova', 'Monaco', 'Mongolia',
       'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar',
       'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia',
       'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue',
       'North America', 'North Korea', 'North Macedonia',
       'Northern Cyprus', 'Northern Mariana Islands', 'Norway', 'Oceania',
       'Oman', 'Pakistan', 'Palau', 'Palestine', 'Panama',
       'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn',
       'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Romania', 'Russia',
       'Rwanda', 'Saint Helena', 'Saint Kitts and Nevis', 'Saint Lucia',
       'Saint Pierre and Miquelon', 'Saint Vincent and the Grenadines',
       'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia',
       'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore',
       'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia',
       'Solomon Islands', 'Somalia', 'South Africa', 'South America',
       'South Korea', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan',
       'Suriname', 'Sweden', 'Switzerland', 'Syria', 'Taiwan',
       'Tajikistan', 'Tanzania', 'Thailand', 'Timor', 'Togo', 'Tokelau',
       'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
       'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda',
       'Ukraine', 'United Arab Emirates', 'United Kingdom',
       'United States', 'United States Virgin Islands',
       'Upper middle income', 'Uruguay', 'Uzbekistan', 'Vanuatu',
       'Vatican', 'Venezuela', 'Vietnam', 'Wallis and Futuna',
       'Western Sahara', 'World', 'Yemen', 'Zambia', 'Zimbabwe'],
      dtype=object)
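The location list mixes individual countries with aggregate rows such as 'World', 'Asia', and 'High income'. Here is a minimal sketch of separating the two and of failing loudly on a misspelt name. It uses a tiny made-up DataFrame in place of owid-covid-data.csv; the `AGGREGATES` set is copied by hand from the output above (not an official list), and the helper name `pick_location` is hypothetical.

```python
import pandas as pd

# Aggregate rows visible in the unique() output (hand-copied assumption).
AGGREGATES = {
    "Africa", "Asia", "Europe", "European Union", "High income",
    "International", "Low income", "Lower middle income", "North America",
    "Oceania", "South America", "Upper middle income", "World",
}

# Tiny stand-in for the real CSV; the numbers are made up for illustration.
demo = pd.DataFrame({
    "location": ["Somalia", "Somalia", "World", "Asia", "Spain"],
    "date": ["2020-03-16", "2020-03-17", "2020-03-16", "2020-03-16", "2020-03-16"],
    "total_cases": [1, 2, 100, 50, 7],
})

def pick_location(frame, name):
    """Return the rows for one location, raising KeyError on an unknown name."""
    if name not in frame["location"].unique():
        raise KeyError(f"{name!r} is not in the location column")
    return frame[frame["location"] == name]

# Countries only: every distinct location that is not an aggregate row.
countries = sorted(set(demo["location"].unique()) - AGGREGATES)
somalia = pick_location(demo, "Somalia")
```

Checking the name first avoids silently getting an empty DataFrame when the location is mistyped.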

There are a whole lot of countries (regions) here, so let's just pick one at random: Somalia.
Now let's get to work.

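# Import packages and set parameters
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib
from IPython.display import HTML
from matplotlib import cm

matplotlib.rcParams['animation.embed_limit'] = 2**128  # raise the size limit (in MB) for embedded animations; without this the default limit was exceeded
plt.rcParams['font.sans-serif'] = ['SimHei']  # a font that can render the Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # keep minus signs rendering correctly with this font
plt.rc('axes', axisbelow=True)  # draw grid lines below the plotted data
df = pd.read_csv('owid-covid-data.csv')
df_Somalia = df[df.location == 'Somalia'].loc[:, ['date', 'total_cases']]  # keep only the two columns we need
df_Somalia.head()
df_Somalia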

Here we can see that the filtered Somalia data still carries the index of the original DataFrame.
So we have to tidy it up.

# Reset the index so it starts from 0 again
df_Somalia=df_Somalia.reset_index(drop=True)
df_Somalia
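To make the effect of `reset_index(drop=True)` concrete, here is a small sketch on a made-up DataFrame (the rows and values are invented for illustration). Filtering keeps the original row labels; `drop=True` discards them instead of moving them into a new column.

```python
import pandas as pd

demo = pd.DataFrame({
    "location": ["Albania", "Somalia", "Somalia", "Spain", "Somalia"],
    "total_cases": [5, 1, 2, 9, 3],
})

# Boolean filtering keeps the original labels: 1, 2, 4.
subset = demo[demo["location"] == "Somalia"]

# Without drop=True the old labels are preserved in a new "index" column.
kept = subset.reset_index()

# With drop=True the old labels are thrown away and rows are relabelled 0..n-1.
dropped = subset.reset_index(drop=True)

print(list(subset.index), list(dropped.index))
```

A 0-based index matters later on, because the drawing function slices rows with `.loc` using the frame number as a label.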

All the data is now tidied up, 806 rows in total. Let's pick out 150 of them to see how it looks.

# Check that slicing a 150-day span works
Span=150
N_Span=0
df_temp=df_Somalia.loc[N_Span*Span:(N_Span+1)*Span,:]  # .loc slices by label and includes both endpoints
df_temp
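One detail worth knowing about the slice above: `.loc` is label-based and includes both endpoints, so `.loc[0:150]` returns 151 rows, whereas the position-based `.iloc[0:150]` returns 150. A minimal sketch on a synthetic 806-row DataFrame (the values are placeholders, not the real case counts):

```python
import pandas as pd

# Synthetic stand-in with the same number of rows as the Somalia subset.
demo = pd.DataFrame({"total_cases": range(806)})
span = 150

by_label = demo.loc[0:span, :]      # labels 0..150, both ends included
by_position = demo.iloc[0:span, :]  # positions 0..149, end excluded

print(len(by_label), len(by_position))
```

For this plot the extra row is harmless, but it explains why the window shows one more day than `Span` suggests.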

Taking a part of the data works fine, so now let's see what the visualization looks like over the whole period.

# Check the view over the whole period
Span=806
N_Span=0
df_temp=df_Somalia.loc[N_Span*Span:(N_Span+1)*Span,:]
fig, ax = plt.subplots(figsize=(6,4), dpi=100)
plt.subplots_adjust(left=0.12, right=0.98, top=0.85, bottom=0.1,hspace=0,wspace=0)
# Filled red area under the curve, black outline on top
plt.fill_between(df_temp.date.values, y1=df_temp.total_cases.values, y2=0,alpha=0.75, facecolor='r', linewidth=1,edgecolor ='none',zorder=1)
plt.plot(df_temp.date, df_temp.total_cases, color='k',zorder=2)
# Marker and value label at the last data point
plt.scatter(df_temp.date.values[-1], df_temp.total_cases.values[-1], color='white',s=150,edgecolor ='k',linewidth=2,zorder=3)
plt.text(df_temp.date.values[-1], df_temp.total_cases.values[-1]*1.18,s=np.round(df_temp.total_cases.values[-1],1),size=10,ha='center', va='top')
plt.ylim(0, df_temp.total_cases.max()*1.68)
# One x tick every 150 days
plt.xticks(ticks=df_temp.date.values[0:Span+1:150],labels=df_temp.date.values[0:Span+1:150],rotation=0)
plt.margins(x=0.01)
ax = plt.gca()  # get the current axes
ax.spines['top'].set_color('none')   # hide the top spine
ax.spines['right'].set_color('none')  # hide the right spine
ax.spines['left'].set_color('none')   # hide the left spine
plt.grid(axis="y",c=(217/256,217/256,217/256),linewidth=1)   # light grey horizontal grid lines
plt.show()
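The code above plots the `date` column as plain strings and places ticks by hand every 150 rows. An alternative sketch, not what the original notebook does, is to parse the dates with `pd.to_datetime` and let a Matplotlib date locator choose the ticks. The data here is synthetic (a made-up start date and a straight line of values), and the `Agg` backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in: 300 daily rows with dates stored as strings, as in the CSV.
demo = pd.DataFrame({
    "date": pd.date_range("2020-03-16", periods=300, freq="D").strftime("%Y-%m-%d"),
    "total_cases": range(300),
})
demo["date"] = pd.to_datetime(demo["date"])  # strings -> datetime64

fig, ax = plt.subplots(figsize=(6, 4), dpi=100)
ax.fill_between(demo["date"], demo["total_cases"], 0, alpha=0.75, facecolor="r")
ax.plot(demo["date"], demo["total_cases"], color="k")

# One tick at the start of each quarter, labelled as year-month.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 4, 7, 10]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))

fig.canvas.draw()  # force tick labels to be computed
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
plt.close(fig)
```

With real dates on the axis, the ticks fall on calendar boundaries rather than on every 150th row.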

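Our goal is a dynamic visualization. An animation is really just individual frames stitched together, so first we need a tool (a function) that can draw a single frame.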

# Define a function that draws one frame: a 150-day window ending at row Num_Date
def draw_areachart(Num_Date):
    Span_Date=150
    ax.clear()
    
    if Num_Date<Span_Date:
        # First 150 days: the x axis stays fixed while the bars grow to the right
        df_temp=df_Somalia.loc[0:Num_Date,:]
        df_span=df_Somalia.loc[0:Span_Date,:]
        colors = cm.Spectral_r(df_span.total_cases.values / float(max(df_span.total_cases.values)))
        plt.bar(df_temp.date.values,df_temp.total_cases.values,color=colors,width=1.5,align="center",zorder=1)
        plt.plot(df_temp.date, df_temp.total_cases, color='k',zorder=2)
        plt.scatter(df_temp.date.values[-1], df_temp.total_cases.values[-1], color='white',s=150,edgecolor ='k',linewidth=2,zorder=3)
        plt.text(df_temp.date.values[-1], df_temp.total_cases.values[-1]*1.18,s=np.round(df_temp.total_cases.values[-1],1),
                 size=10,ha='center', va='top')
        plt.ylim(0, df_span.total_cases.max()*1.68)
        plt.xlim(df_span.date.values[0], df_span.date.values[-1])
        plt.xticks(ticks=df_span.date.values[0:Span_Date+1:30],labels=df_span.date.values[0:Span_Date+1:30],rotation=0,fontsize=9)
    else:
        # Afterwards: a sliding window covering the most recent 150 days
        df_temp=df_Somalia.loc[Num_Date-Span_Date:Num_Date,:]
        colors = cm.Spectral_r(df_temp.total_cases / float(max(df_temp.total_cases)))
        # Bars and line stop two points before the right edge of the window
        plt.bar(df_temp.date.values[:-2],df_temp.total_cases.values[:-2],color=colors[:-2],width=1.5,align="center",zorder=1)
        plt.plot(df_temp.date[:-2], df_temp.total_cases[:-2], color='k',zorder=2)
        # Marker sits at the fourth-from-last point; the label uses the last value
        plt.scatter(df_temp.date.values[-4], df_temp.total_cases.values[-4], color='white',s=150,edgecolor ='k',linewidth=2,zorder=3)
        plt.text(df_temp.date.values[-1], df_temp.total_cases.values[-1]*1.18,s=np.round(df_temp.total_cases.values[-1],1),
                 size=10,ha='center', va='top')
        plt.ylim(0, df_temp.total_cases.max()*1.68)
        plt.xlim(df_temp.date.values[0], df_temp.date.values[-1])
        plt.xticks(ticks=df_temp.date.values[0:Span_Date+1:30],labels=df_temp.date.values[0:Span_Date+1:30],rotation=0,fontsize=9)   
    plt.margins(x=0.2)
    ax.spines['top'].set_color('none')  # hide the top spine
    ax.spines['right'].set_color('none')  # hide the right spine
    ax.spines['left'].set_color('none')  # hide the left spine
    plt.grid(axis="y",c=(217/256,217/256,217/256),linewidth=1)         # light grey horizontal grid lines
    # Axis label: "Confirmed cases (people)"
    plt.text(0.01, 0.95,"确诊人数(人)",transform=ax.transAxes, size=10, weight='light', ha='left')
    # Title: "Confirmed Covid-19 cases in Somalia, 2020 to 2022"
    ax.text(-0.07, 1.03, '2020年到2022年Somalia新冠疫情确诊情况',transform=ax.transAxes, size=17, weight='light', ha='left')

fig, ax = plt.subplots(figsize=(6,4), dpi=100)
plt.subplots_adjust(top=1,bottom=0.1,left=0.1,right=0.9,hspace=0,wspace=0)  
draw_areachart(805)
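The windowing logic inside `draw_areachart` can be isolated into a small pure function, which makes it easier to check by hand. This is only a sketch of that logic; the name `window_bounds` and its return layout are hypothetical and do not appear in the original code.

```python
def window_bounds(num_date, span=150):
    """Return (data_start, data_end, axis_start, axis_end) as row labels.

    While num_date < span the axis is pinned to rows 0..span and the data
    grows from row 0 up to num_date. After that, data and axis both cover
    the sliding window num_date - span .. num_date.
    """
    if num_date < span:
        return 0, num_date, 0, span
    start = num_date - span
    return start, num_date, start, num_date

# Early frame: data covers rows 0..10, axis covers rows 0..150.
early = window_bounds(10)
# Last frame of the 806-row dataset: data and axis both cover rows 655..805.
last = window_bounds(805)
print(early, last)
```

So the call `draw_areachart(805)` above draws the final 150-day window, rows 655 through 805.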

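Finally, we stitch the animation together. This normally takes a few minutes, so just be patient; I suggest letting it run and leaving it alone in the meantime.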

# Build the animation: one frame per row of df_Somalia
fig, ax = plt.subplots(figsize=(6,4), dpi=100)
plt.subplots_adjust(left=0.12, right=0.98, top=0.85, bottom=0.1,hspace=0,wspace=0)  
animator = animation.FuncAnimation(fig, draw_areachart, frames=np.arange(0,df_Somalia.shape[0],1),interval=100)
HTML(animator.to_jshtml()) 
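Since the drawing function is called once per entry in `frames`, one way to shorten the wait is to render only every few rows. The sketch below is an assumption-laden variant rather than the original approach: it builds a frame list with a hypothetical `step` of 5 for an 806-row dataset and renders just three frames with a trivial stand-in drawing function, on the `Agg` backend so that it runs without a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np

n_rows = 806  # number of rows in the Somalia subset
step = 5      # hypothetical choice: draw every 5th day

frame_ids = np.arange(0, n_rows, step)
if frame_ids[-1] != n_rows - 1:
    # Make sure the animation still ends on the last row.
    frame_ids = np.append(frame_ids, n_rows - 1)

fig, ax = plt.subplots(figsize=(3, 2), dpi=50)
drawn = []  # records which frame ids were actually drawn

def draw_frame(i):
    """Stand-in for draw_areachart: clears the axes and draws i + 1 points."""
    drawn.append(int(i))
    ax.clear()
    ax.plot(range(int(i) + 1))

# Render only the first three frames to keep the demo fast.
anim = animation.FuncAnimation(fig, draw_frame, frames=frame_ids[:3], interval=100)
html_text = anim.to_jshtml()
plt.close(fig)
```

With `step = 5` the full run would need 162 frames instead of 806, at the cost of a less smooth animation.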


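Rant

This was clearly a very simple lab, but a certain poor sucker forgot at first to change the arguments passed to animation.FuncAnimation and spent a whole morning running it over the full dataset of 190,000 rows. After that was fixed, running the 806 rows still took an hour and a half, an absolute tragedy; after restarting the Jupyter Notebook kernel, though, a rerun took only a few minutes. Incidentally, the animation exceeds the default size limit of 20 MB, so not all of the data could be loaded into it; that is why the animation embed_limit setting has to be changed at the start.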
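The `animation.embed_limit` setting is a size in MB and only applies to animations embedded as HTML, so a moderate value is enough in place of `2**128`. Another way around the limit is to write the animation to a file. The sketch below shows both on a tiny stand-in animation; the value 200, the file name `demo.gif`, and the drawing function are arbitrary choices for illustration, and saving a GIF this way relies on Matplotlib's `PillowWriter`.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.animation as animation
import matplotlib.pyplot as plt

# Limit for HTML-embedded animations, in MB (200 is an arbitrary example value).
matplotlib.rcParams["animation.embed_limit"] = 200

fig, ax = plt.subplots(figsize=(3, 2), dpi=50)

def draw_frame(i):
    """Stand-in for draw_areachart: a bar chart that grows by one bar per frame."""
    ax.clear()
    ax.bar(range(i + 1), range(1, i + 2))

anim = animation.FuncAnimation(fig, draw_frame, frames=range(5), interval=100)

# Saving to a file is not subject to the embed limit.
gif_path = os.path.join(tempfile.mkdtemp(), "demo.gif")
anim.save(gif_path, writer=animation.PillowWriter(fps=10))
plt.close(fig)
```

The saved GIF can then be opened outside the notebook.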
