Date | Total number of license issued | lowest price | avg price | Total number of applicants |
---|---|---|---|---|
2-Jan | 1400 | 13600 | 14735 | 3718 |
2-Feb | 1800 | 13100 | 14057 | 4590 |
2-Mar | 2000 | 14300 | 14662 | 5190 |
Questions
(1) In which auction did the winning rate (licenses issued / total applicants) first fall below 5%?
# winning rate = licenses issued / total applicants
idf1 = df[df['Total number of license issued'] / df['Total number of applicants'] < 0.05]
print(idf1.index[0])
Result:
159
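The positional index can be mapped back to an auction date if needed (a sketch, assuming df still holds the original Date column):
print(df.loc[159, 'Date'])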
(2) For each year, compute the following statistics of the auction's lowest price: maximum, mean, and 0.75 quantile, all shown in one table. (Part (3) is carried out first.)
# note: the column name 'lowest price ' carries a trailing space in the source data
group_year = idf2.drop(columns = ['Month',
'Total number of license issued','avg price',
'Total number of applicants']).groupby('Year')
idf3 = pd.DataFrame()
idf3['max'] = group_year['lowest price '].max()
idf3['mean'] = group_year['lowest price '].mean()
idf3['quantile'] = group_year['lowest price '].quantile(0.75)
print(idf3)
Result:
Year | max | mean | quantile |
---|---|---|---|
2002 | 30800 | 20316.666667 | 24300.0 |
2003 | 38500 | 31983.333333 | 36300.0 |
2004 | 44200 | 29408.333333 | 38400.0 |
2005 | 37900 | 31908.333333 | 35600.0 |
2006 | 39900 | 37058.333333 | 39525.0 |
2007 | 53800 | 45691.666667 | 48950.0 |
2008 | 37300 | 29945.454545 | 34150.0 |
2009 | 36900 | 31333.333333 | 34150.0 |
2010 | 44900 | 38008.333333 | 41825.0 |
2011 | 53800 | 47958.333333 | 51000.0 |
2012 | 68900 | 61108.333333 | 65325.0 |
2013 | 90800 | 79125.000000 | 82550.0 |
2014 | 74600 | 73816.666667 | 74000.0 |
2015 | 85300 | 80575.000000 | 83450.0 |
2016 | 88600 | 85733.333333 | 87475.0 |
2017 | 93500 | 90616.666667 | 92350.0 |
2018 | 89000 | 87825.000000 | 88150.0 |
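The same table can be built in one step with named aggregation, without dropping the unused columns first (a sketch, assuming pandas >= 0.25):
idf3 = idf2.groupby('Year')['lowest price '].agg(
    max = 'max',
    mean = 'mean',
    quantile = lambda x: x.quantile(0.75))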
(3) Split the first (date) column into two new columns, one for the year (in the form 20××) and one for the month (English abbreviation). Insert them as the first and second columns, delete the original first column, and shift the remaining columns right accordingly.
df['Year'] = df['Date'].apply(lambda x: 2000 + int(x.split('-')[0]))  # '2-Jan' -> 2002
df['Month'] = df['Date'].apply(lambda x: x.split('-')[1])             # '2-Jan' -> 'Jan'
idf2 = df.drop(columns = 'Date')
idf2 = idf2.reindex(columns = ['Year', 'Month',
'Total number of license issued', 'lowest price ',
'avg price', 'Total number of applicants'])
print(idf2.head())
Result:
| Year | Month | … | avg price | Total number of applicants |
---|---|---|---|---|---|
0 | 2002 | Jan | … | 14735 | 3718 | |
1 | 2002 | Feb | … | 14057 | 4590 | |
2 | 2002 | Mar | … | 14662 | 5190 | |
3 | 2002 | Apr | … | 16334 | 4806 | |
4 | 2002 | May | … | 18357 | 4665 |
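The split can also be done by parsing the dates directly (a sketch; the format '%y-%b' maps a bare '2' to 2002 and matches English month abbreviations):
dates = pd.to_datetime(df['Date'], format = '%y-%b')
df['Year'] = dates.dt.year
df['Month'] = dates.dt.strftime('%b')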
(4) Now make the row index a multi-level index, with the year as the outer level and the variable names of the original columns 2-5 as the inner level; the column index should be the months.
I roughly know what the result should look like, but I could not get it to work. A possible construction is sketched below.
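A sketch building on idf2 from (3) (note that unstack sorts the month labels alphabetically, so a reindex over the columns would restore calendar order):
idf5 = (idf2.set_index(['Year', 'Month'])   # Year and Month into the row index
            .stack()                        # the four variable names become the inner row level
            .unstack('Month'))              # months move out to the column index
print(idf5.head())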
(5) In general, the difference between a month's lowest price and the previous month's lowest price has the same sign as the difference between the corresponding average prices. Which auction dates do not follow this pattern?
for index in range(len(df) - 1):   # compare each month with the next one
    flag = (df.loc[index,'lowest price ']-df.loc[index+1,'lowest price '])*\
    (df.loc[index,'avg price']-df.loc[index+1,'avg price'])
    if flag < 0:                   # the two differences have opposite signs
        print(df.loc[index + 1, 'Date'])
Result:
3-Oct
3-Nov
4-Jun
5-Jan
5-Feb
5-Sep
6-May
6-Sep
7-Jan
7-Feb
7-Dec
12-Oct
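The loop can be written more compactly with diff(), which also avoids hard-coding the number of rows (a sketch; diff() leaves the first row NaN, which the comparison treats as False):
mask = (df['lowest price '].diff() * df['avg price'].diff()) < 0
print(df.loc[mask, 'Date'])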
(6) Define a month's issue gain as the difference between that month's number of licenses issued and the mean of the previous two months' issues, filling the first two months with 0. Find the dates at which the issue gain attains its extreme values.
df['issue gain'] = 0
for index in range(2, len(df)):    # the first two months stay 0
    df.loc[index,'issue gain'] = df.loc[index,'Total number of license issued']\
    -(df.loc[index-1,'Total number of license issued']\
    +df.loc[index-2,'Total number of license issued'])/2
imin = df[df['issue gain']==df['issue gain'].min()]['Date']
imax = df[df['issue gain']==df['issue gain'].max()]['Date']
print("min:",imin.values[0],'\nmax:',imax.values[0])
Result:
min: 8-Apr
max: 8-Jan
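An equivalent vectorized version compares each month with a rolling mean of the previous two (a sketch):
issued = df['Total number of license issued']
df['issue gain'] = (issued - issued.rolling(2).mean().shift(1)).fillna(0)
print('min:', df.loc[df['issue gain'].idxmin(), 'Date'],
      '\nmax:', df.loc[df['issue gain'].idxmax(), 'Date'])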
Airport name | Year | January | February | March | April | May | June | July | August | September | October | November | December | Whole year | Airport coordinates |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abakan | 2019 | 44.7 | 66.21 | 72.7 | 75.82 | 100.34 | 78.38 | 63.88 | 73.06 | 66.74 | 75.44 | 110.5 | 89.8 | 917.57 | "(Decimal('91.399735'), Decimal('53.751351'))" |
Aikhal | 2019 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | "(Decimal('111.543324'), Decimal('65.957161'))" |
Loss | 2019 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | "(Decimal('125.398355'), Decimal('58.602489'))" |
Amderma | 2019 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | "(Decimal('61.577429'), Decimal('69.759076'))" |
Anadyr (Carbon) | 2019 | 81.63 | 143.01 | 260.9 | 304.36 | 122 | 106.87 | 84.99 | 130 | 102 | 118 | 94 | 199 | 1746.76 | "(Decimal('177.738273'), Decimal('64.713433'))" |
Questions
(1) Compute the total freight volume for each year.
group_year = df.groupby('Year')
print(group_year['Whole year'].sum())
Result:
Year | Whole year |
---|---|
2007 | 659438.23 |
2008 | 664682.46 |
2009 | 560809.77 |
2010 | 693033.98 |
2011 | 818691.71 |
2012 | 846388.03 |
2013 | 792337.08 |
2014 | 729457.12 |
2015 | 630208.97 |
2016 | 679370.15 |
2017 | 773662.28 |
2018 | 767095.28 |
2019 | 764606.27 |
(2) Are the same airports recorded every year?
print(group_year['Airport name'].count())
Result:
Year | count |
---|---|
2007 | 292 |
2008 | 292 |
2009 | 292 |
2010 | 292 |
2011 | 292 |
2012 | 292 |
2013 | 292 |
2014 | 292 |
2015 | 292 |
2016 | 292 |
2017 | 292 |
2018 | 248 |
2019 | 251 |
The counts differ across years, so the airports recorded each year are not exactly the same.
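Counting only shows that the sizes differ; comparing the actual sets of names makes the check explicit (a sketch):
names = group_year['Airport name'].apply(set)
print(names.apply(lambda s: s == names.iloc[0]))   # compare every year with the first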
(3) For each year from 2010 to 2015, compute the proportion of airports whose whole-year freight volume is recorded as 0.
idf3 = group_year['Whole year'].agg(rate = lambda x: str(len(x[x == 0])/len(x)*100)+'%')
print(idf3.loc[2010:2015])
Result:
Year | rate |
---|---|
2010 | 76.71232876712328% |
2011 | 77.05479452054794% |
2012 | 77.05479452054794% |
2013 | 77.05479452054794% |
2014 | 77.05479452054794% |
2015 | 77.05479452054794% |
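Since the mean of a boolean mask is already a proportion, the rate can also be kept numeric instead of formatted as a string (a sketch):
print(group_year['Whole year'].agg(lambda x: (x == 0).mean()).loc[2010:2015])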
(4) If an airport has at least 5 years in which every monthly volume record is 0, remove all of its records (for all years) from the table, and return the processed table.
group_name = df.groupby('Airport name')
# 'Whole year' == 0 implies every monthly record is 0, since volumes are nonnegative
zerocount = group_name['Whole year'].agg(count = lambda x: len(x[x == 0]))
# "at least 5 years" means >= 5 rather than > 5
idf4 = df.set_index('Airport name').drop(zerocount[zerocount['count'] >= 5].index)
print(idf4)
Result:
| Year | … | Airport coordinates |
---|---|---|---|
Airport name | | | |
Abakan | 2019 | … | (Decimal('91.399735'), Decimal('53.751351')) |
Anadyr (Carbon) | 2019 | … | (Decimal('177.738273'), Decimal('64.713433')) |
Anapa (Vitjazevo) | 2019 | … | (Decimal('37.341511'), Decimal('45.003748')) |
Arkhangelsk (Talagy) | 2019 | … | (Decimal('40.714892'), Decimal('64.596138')) |
Astrakhan (Narimanovo) | 2019 | … | (Decimal('47.999896'), Decimal('46.287344')) |
[5 rows x 15 columns]
(5) Divide all airports into four regions (north, south, east, west) in a reasonable way, and report the region with the largest total freight volume over 2017-2019.
I am not sure how to draw north/south/east/west boundaries. If the four regions are instead taken as the northeast, southeast, northwest, and southwest quadrants, we can split on the mean longitude and latitude and then compare the regional freight totals, as sketched below.
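A sketch of that idea, assuming the coordinate strings look like "(Decimal('91.399735'), Decimal('53.751351'))" with longitude first; the helper parse_coords and the lon/lat/region columns are introduced here only for illustration:
import re

def parse_coords(s):
    # pull the two quoted numbers out of the coordinate string
    lon, lat = map(float, re.findall(r"'([\d.\-]+)'", s))
    return lon, lat

coords = df['Airport coordinates'].apply(parse_coords)
df['lon'], df['lat'] = coords.str[0], coords.str[1]
# split on the mean longitude/latitude into NE/NW/SE/SW quadrants
lat_mid, lon_mid = df['lat'].mean(), df['lon'].mean()
ns = ['N' if lat >= lat_mid else 'S' for lat in df['lat']]
ew = ['E' if lon >= lon_mid else 'W' for lon in df['lon']]
df['region'] = [a + b for a, b in zip(ns, ew)]
recent = df[df['Year'].between(2017, 2019)]
print(recent.groupby('region')['Whole year'].sum().idxmax())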
(6) In statistics, ranks are often used to represent standings. Define the rank of a given airport in a given month as the position of that month's freight volume among all 12 months of that year for that airport (e.g., if airport ***'s January 2019 volume is the largest of the 12 months of 2019, its rank is 1). To judge the relative size of a month's volume, the rank method adds up the ranks of all airports for that month; call this sum the month's rank composite index. Compute the rank composite index for the 12 months of 2016.
df2016 = df.query('Year == 2016').reset_index()
Month = df2016.columns[3:15]              # the twelve monthly columns
irank = pd.DataFrame(index = Month)
for ix in df2016.index:                   # rank each airport's months, 1 = largest volume
    rank = df2016.loc[ix,Month].sort_values(ascending = False).index.to_list()
    irank[ix] = [rank.index(mon)+1 for mon in Month]
print(irank.sum(axis = 1))                # sum the ranks over airports
Result:
Month | rank composite index |
---|---|
January | 3406 |
February | 3076 |
March | 2730 |
April | 2432 |
May | 2276 |
June | 2047 |
July | 1854 |
August | 1527 |
September | 1269 |
October | 1009 |
November | 728 |
December | 422 |
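The double loop can be replaced by a row-wise rank (a sketch; method = 'first' mirrors the positional tie-breaking of the sort above):
irank2 = df2016[Month].rank(axis = 1, ascending = False, method = 'first')
print(irank2.sum())   # one rank composite index per month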
US confirmed cases
UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key | 2020/1/22 | 2020/1/23 | 2020/1/24 | 2020/1/25 | 2020/1/26 | 2020/1/27 | 2020/1/28 | 2020/1/29 | 2020/1/30 | 2020/1/31 | 2020/2/1 | 2020/2/2 | 2020/2/3 | 2020/2/4 | 2020/2/5 | 2020/2/6 | 2020/2/7 | 2020/2/8 | 2020/2/9 | 2020/2/10 | 2020/2/11 | 2020/2/12 | 2020/2/13 | 2020/2/14 | 2020/2/15 | 2020/2/16 | 2020/2/17 | 2020/2/18 | 2020/2/19 | 2020/2/20 | 2020/2/21 | 2020/2/22 | 2020/2/23 | 2020/2/24 | 2020/2/25 | 2020/2/26 | 2020/2/27 | 2020/2/28 | 2020/2/29 | 2020/3/1 | 2020/3/2 | 2020/3/3 | 2020/3/4 | 2020/3/5 | 2020/3/6 | 2020/3/7 | 2020/3/8 | 2020/3/9 | 2020/3/10 | 2020/3/11 | 2020/3/12 | 2020/3/13 | 2020/3/14 | 2020/3/15 | 2020/3/16 | 2020/3/17 | 2020/3/18 | 2020/3/19 | 2020/3/20 | 2020/3/21 | 2020/3/22 | 2020/3/23 | 2020/3/24 | 2020/3/25 | 2020/3/26 | 2020/3/27 | 2020/3/28 | 2020/3/29 | 2020/3/30 | 2020/3/31 | 2020/4/1 | 2020/4/2 | 2020/4/3 | 2020/4/4 | 2020/4/5 | 2020/4/6 | 2020/4/7 | 2020/4/8 | 2020/4/9 | 2020/4/10 | 2020/4/11 | 2020/4/12 | 2020/4/13 | 2020/4/14 | 2020/4/15 | 2020/4/16 | 2020/4/17 | 2020/4/18 | 2020/4/19 | 2020/4/20 | 2020/4/21 | 2020/4/22 | 2020/4/23 | 2020/4/24 | 2020/4/25 | 2020/4/26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
84001001 | US | USA | 840 | 1001 | Autauga | Alabama | US | 32.53952745 | -86.64408227 | "Autauga | Alabama | US" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 6 | 6 | 6 | 6 | 6 | 7 | 8 | 10 | 12 | 12 | 12 | 12 | 12 | 12 | 15 | 17 | 19 | 19 | 19 | 23 | 24 | 26 | 26 | 25 | 26 | 28 | 30 | 32 | 33 | 36 |
84001003 | US | USA | 840 | 1003 | Baldwin | Alabama | US | 30.72774991 | -87.72207058 | "Baldwin | Alabama | US" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 4 | 4 | 5 | 5 | 10 | 15 | 18 | 19 | 20 | 24 | 28 | 29 | 29 | 38 | 42 | 44 | 56 | 59 | 66 | 71 | 72 | 87 | 91 | 101 | 103 | 109 | 112 | 117 | 123 | 132 | 143 | 147 |
84001005 | US | USA | 840 | 1005 | Barbour | Alabama | US | 31.868263 | -85.3871286 | "Barbour | Alabama | US" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 9 | 9 | 10 | 10 | 11 | 12 | 14 | 15 | 18 | 20 | 22 | 28 | 29 | 30 | 32 |
84001007 | US | USA | 840 | 1007 | Bibb | Alabama | US | 32.99642064 | -87.1251146 | "Bibb | Alabama | US" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 3 | 4 | 4 | 4 | 5 | 7 | 8 | 9 | 9 | 11 | 13 | 16 | 17 | 17 | 18 | 22 | 24 | 26 | 28 | 32 | 32 | 34 | 33 | 34 |
84001009 | US | USA | 840 | 1009 | Blount | Alabama | US | 33.98210918 | -86.56790593 | "Blount | Alabama | US" | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 4 | 5 | 5 | 5 | 5 | 5 | 6 | 9 | 10 | 10 | 10 | 10 | 10 | 11 | 12 | 12 | 13 | 14 | 16 | 17 | 18 | 20 | 20 | 21 | 22 | 26 | 29 | 31 | 31 |
US deaths
UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | Combined_Key | Population | 2020/1/22 | 2020/1/23 | 2020/1/24 | 2020/1/25 | 2020/1/26 | 2020/1/27 | 2020/1/28 | 2020/1/29 | 2020/1/30 | 2020/1/31 | 2020/2/1 | 2020/2/2 | 2020/2/3 | 2020/2/4 | 2020/2/5 | 2020/2/6 | 2020/2/7 | 2020/2/8 | 2020/2/9 | 2020/2/10 | 2020/2/11 | 2020/2/12 | 2020/2/13 | 2020/2/14 | 2020/2/15 | 2020/2/16 | 2020/2/17 | 2020/2/18 | 2020/2/19 | 2020/2/20 | 2020/2/21 | 2020/2/22 | 2020/2/23 | 2020/2/24 | 2020/2/25 | 2020/2/26 | 2020/2/27 | 2020/2/28 | 2020/2/29 | 2020/3/1 | 2020/3/2 | 2020/3/3 | 2020/3/4 | 2020/3/5 | 2020/3/6 | 2020/3/7 | 2020/3/8 | 2020/3/9 | 2020/3/10 | 2020/3/11 | 2020/3/12 | 2020/3/13 | 2020/3/14 | 2020/3/15 | 2020/3/16 | 2020/3/17 | 2020/3/18 | 2020/3/19 | 2020/3/20 | 2020/3/21 | 2020/3/22 | 2020/3/23 | 2020/3/24 | 2020/3/25 | 2020/3/26 | 2020/3/27 | 2020/3/28 | 2020/3/29 | 2020/3/30 | 2020/3/31 | 2020/4/1 | 2020/4/2 | 2020/4/3 | 2020/4/4 | 2020/4/5 | 2020/4/6 | 2020/4/7 | 2020/4/8 | 2020/4/9 | 2020/4/10 | 2020/4/11 | 2020/4/12 | 2020/4/13 | 2020/4/14 | 2020/4/15 | 2020/4/16 | 2020/4/17 | 2020/4/18 | 2020/4/19 | 2020/4/20 | 2020/4/21 | 2020/4/22 | 2020/4/23 | 2020/4/24 | 2020/4/25 | 2020/4/26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
84001001 | US | USA | 840 | 1001 | Autauga | Alabama | US | 32.53952745 | -86.64408227 | "Autauga | Alabama | US" | 55869 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 2 |
84001003 | US | USA | 840 | 1003 | Baldwin | Alabama | US | 30.72774991 | -87.72207058 | "Baldwin | Alabama | US" | 223234 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
84001005 | US | USA | 840 | 1005 | Barbour | Alabama | US | 31.868263 | -85.3871286 | "Barbour | Alabama | US" | 24686 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
84001007 | US | USA | 840 | 1007 | Bibb | Alabama | US | 32.99642064 | -87.1251146 | "Bibb | Alabama | US" | 22394 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
84001009 | US | USA | 840 | 1009 | Blount | Alabama | US | 33.98210918 | -86.56790593 | "Blount | Alabama | US" | 57826 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Questions
(1) Use corr() to compute the correlation coefficient between county population (each row is a county) and the death count on the last recorded date.
print(df2[['Population','2020/4/26']].corr())
Result:
| Population | 2020/4/26 |
---|---|---|
Population | 1.000000 | 0.403844 |
2020/4/26 | 0.403844 | 1.000000 |
(2) As of April 1, compute the proportion of zero-infection counties in each state.
group_Province = df1[['Province_State', '2020/4/1']].groupby('Province_State')
idf2 = group_Province['2020/4/1'].agg(rate = lambda x: str(len(x[x == 0])/len(x)*100)+'%')
print(idf2)
Result:
Province_State | rate |
---|---|
Alabama | 11.940298507462686% |
Alaska | 79.3103448275862% |
Arizona | 0.0% |
Arkansas | 29.333333333333332% |
California | 13.793103448275861% |
Colorado | 21.875% |
Connecticut | 0.0% |
Delaware | 0.0% |
District of Columbia | 0.0% |
Florida | 16.417910447761194% |
Georgia | 12.578616352201259% |
Hawaii | 20.0% |
Idaho | 38.63636363636363% |
Illinois | 48.03921568627451% |
Indiana | 10.869565217391305% |
Iowa | 40.4040404040404% |
Kansas | 60.952380952380956% |
Kentucky | 44.166666666666664% |
Louisiana | 6.25% |
Maine | 25.0% |
Maryland | 4.166666666666666% |
Massachusetts | 14.285714285714285% |
Michigan | 19.27710843373494% |
Minnesota | 36.7816091954023% |
Mississippi | 6.097560975609756% |
Missouri | 39.130434782608695% |
Montana | 62.5% |
Nebraska | 75.26881720430107% |
Nevada | 47.05882352941176% |
New Hampshire | 10.0% |
New Jersey | 0.0% |
New Mexico | 42.42424242424242% |
New York | 8.064516129032258% |
North Carolina | 18.0% |
North Dakota | 54.71698113207547% |
Ohio | 18.181818181818183% |
Oklahoma | 37.66233766233766% |
Oregon | 27.77777777777778% |
Pennsylvania | 10.44776119402985% |
Rhode Island | 0.0% |
South Carolina | 6.521739130434782% |
South Dakota | 56.060606060606055% |
Tennessee | 11.578947368421053% |
Texas | 45.2755905511811% |
Utah | 48.275862068965516% |
Vermont | 14.285714285714285% |
Virginia | 27.06766917293233% |
Washington | 12.82051282051282% |
West Virginia | 47.27272727272727% |
Wisconsin | 31.944444444444443% |
Wyoming | 34.78260869565217% |
(3) Find the three counties that reported confirmed cases earliest.
idf3 = df1.copy()
towns = []
for day in df1.columns[11:]:               # scan the dates in chronological order
    if len(towns) >= 3: break
    town = idf3[idf3[day] > 0]['Admin2']   # counties first reporting on this day
    if town.shape[0] > 0:
        towns.extend(town.values)
        idf3 = idf3.drop(index = town.index)   # avoid reporting a county twice
print(towns)
Result (five counties appear because several reported their first case on the same day):
['King', 'Cook', 'Maricopa', 'Los Angeles', 'Orange']
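An alternative that avoids repeatedly mutating a copy: compute each county's first reporting day and sort (a sketch; counties that never report stay NaT and sort last):
has_case = df1.iloc[:, 11:].gt(0)
first_day = has_case.idxmax(axis = 1).where(has_case.any(axis = 1))
order = pd.to_datetime(first_day).sort_values()
print(df1.loc[order.index[:5], 'Admin2'].tolist())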
(4) Compute the single-day increase in deaths for each state, and report which state had the largest single-day increase and on which day (taken jointly over all states and all days, not separately for each). A sketch follows.
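A minimal sketch, assuming df2 is the deaths table with the date columns starting at position 12:
dates = list(df2.columns[12:])
state_total = df2.groupby('Province_State')[dates].sum()   # cumulative deaths per state
increase = state_total.diff(axis = 1)                      # single-day increases
state = increase.max(axis = 1).idxmax()                    # state with the largest jump
day = increase.loc[state].idxmax()                         # day on which it happened
print(state, day, increase.loc[state, day])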
(5) For each state, compile a confirmed-and-deaths table whose first column is the date, starting from the first day a death was recorded in that state, and whose second and third columns are the confirmed and death counts. Save each state as a separate csv file named "state name.csv". A sketch follows.
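Also a minimal sketch, assuming df1 (confirmed, dates from column 11) and df2 (deaths, dates from column 12) cover the same date range:
confirmed = df1.groupby('Province_State')[list(df1.columns[11:])].sum()
deaths = df2.groupby('Province_State')[list(df2.columns[12:])].sum()
for state in deaths.index:
    d = deaths.loc[state]
    nonzero = d[d > 0]
    if nonzero.empty:
        continue                                  # this state never records a death
    cols = d.loc[nonzero.index[0]:].index         # dates from the first death onward
    out = pd.DataFrame({'Date': cols,
                        'Confirmed': confirmed.loc[state, cols].values,
                        'Deaths': d.loc[cols].values})
    out.to_csv(state + '.csv', index = False)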
(6) For each day from April 1 to April 10, compile a table of new confirmed cases and new deaths whose first column is the state name and whose second and third columns are the new confirmed and new death counts. Save them as ten separate csv files named "date.csv". A sketch follows.
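A minimal sketch under the same assumptions, reusing the per-state tables from (5); '/' cannot appear in file names, so the dates are reformatted:
conf_inc = confirmed.diff(axis = 1)
death_inc = deaths.diff(axis = 1)
for i in range(1, 11):
    day = '2020/4/%d' % i
    out = pd.DataFrame({'Province_State': confirmed.index,
                        'New confirmed': conf_inc[day].values,
                        'New deaths': death_inc[day].values})
    out.to_csv(day.replace('/', '-') + '.csv', index = False)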