017 pandas DataFrame Comprehensive Data Exercise

Problem: count the 911 call categories and the number of calls in each

  Data format (first five rows)

         lat        lng                                               desc  \
0  40.297876 -75.581294  REINDEER CT & DEAD END;  NEW HANOVER; Station ...   
1  40.258061 -75.264680  BRIAR PATH & WHITEMARSH LN;  HATFIELD TOWNSHIP...   
2  40.121182 -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...   
3  40.116153 -75.343513  AIRY ST & SWEDE ST;  NORRISTOWN; Station 308A;...   
4  40.251492 -75.603350  CHERRYWOOD CT & DEAD END;  LOWER POTTSGROVE; S...   

       zip                    title            timeStamp                twp  \
0  19525.0   EMS: BACK PAINS/INJURY  2015-12-10 17:10:52        NEW HANOVER   
1  19446.0  EMS: DIABETIC EMERGENCY  2015-12-10 17:29:21  HATFIELD TOWNSHIP   
2  19401.0      Fire: GAS-ODOR/LEAK  2015-12-10 14:39:21         NORRISTOWN   
3  19401.0   EMS: CARDIAC EMERGENCY  2015-12-10 16:47:36         NORRISTOWN   
4      NaN           EMS: DIZZINESS  2015-12-10 16:56:52   LOWER POTTSGROVE   

                         addr  e  
0      REINDEER CT & DEAD END  1  
1  BRIAR PATH & WHITEMARSH LN  1  
2                    HAWS AVE  1  
3          AIRY ST & SWEDE ST  1  
4    CHERRYWOOD CT & DEAD END  1  

Manual iteration:

import pandas as pd
import numpy as np

file_path = '数据'  # placeholder ("data"): set this to the path of the 911 CSV file
pd.set_option('display.max_columns',20)
df = pd.read_csv(file_path)

# Get the categories
# print(df['title'].str.split(':'))
# Split each title on ':' and convert the result to a list
temp_list = df['title'].str.split(':').tolist()
cate_list = list(set([i[0] for i in temp_list]))
# print(df.head(5))

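# Build a DataFrame of zeros with one column per category
zero_df = pd.DataFrame(np.zeros((df.shape[0], len(cate_list))), columns=cate_list)
# Loop over cate_list and set matching rows to 1; this takes only three iterations
for cate in cate_list:
    # Use .loc rather than chained indexing so the assignment reliably updates zero_df
    zero_df.loc[df['title'].str.contains(cate), cate] = 1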

# Looping over temp_list row by row also works, but it needs about 250,000 iterations and is slow
# for i in range(df.shape[0]):
#     zero_df.loc[i, temp_list[i][0]] = 1
# print(zero_df)

# Count how many times each category occurs
sum_ret = zero_df.sum(axis=0).sort_values()
print(sum_ret)
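
The indicator-column idea above can be tried on a few rows without the real file. The following is a minimal sketch: the sample titles and the names `sample`, `indicator`, and `counts` are made up for illustration, and it matches on the title prefix with `str.startswith` instead of `str.contains`, which is a choice made here and not the original script's method.

```python
import numpy as np
import pandas as pd

# Hypothetical sample titles in the same "Category: detail" form as the data above
sample = pd.DataFrame({
    'title': [
        'EMS: BACK PAINS/INJURY',
        'EMS: DIABETIC EMERGENCY',
        'Fire: GAS-ODOR/LEAK',
        'Traffic: VEHICLE ACCIDENT -',
        'EMS: DIZZINESS',
    ]
})

# The category is the text before the first ':'
categories = sorted(set(sample['title'].str.split(':').str[0]))

# One column per category, initialised to 0
indicator = pd.DataFrame(
    np.zeros((len(sample), len(categories))),
    columns=categories,
)

# One boolean-mask assignment per category; .loc avoids chained indexing
for cate in categories:
    indicator.loc[sample['title'].str.startswith(cate + ':'), cate] = 1

# Column sums give the number of rows in each category
counts = indicator.sum(axis=0).sort_values()
print(counts)
```

Matching on the prefix means a category name that happens to appear later in a title is not counted a second time. Whether that matters for the real data depends on the titles it contains.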

Using grouping and aggregation

import pandas as pd
import numpy as np

file_path = '../youtube_video_data/911.csv'
pd.set_option('display.max_columns',20)
df = pd.read_csv(file_path)

# Get the category of each row
# print(df['title'].str.split(':'))
temp_list = df['title'].str.split(':').tolist()
cate_list = [i[0] for i in temp_list]
# Assign the list directly; reshaping it into an array and wrapping it in a DataFrame is unnecessary
df['cate'] = cate_list

# print(df.head(5))
print(df.groupby(by='cate').count()['title'])
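
The grouping step can also be written without building an intermediate Python list. This is a minimal sketch on made-up sample rows, not the real 911 file; the names `sample`, `by_group`, and `by_value_counts` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample titles in the same "Category: detail" form as the data above
sample = pd.DataFrame({
    'title': [
        'EMS: BACK PAINS/INJURY',
        'EMS: DIABETIC EMERGENCY',
        'Fire: GAS-ODOR/LEAK',
        'Traffic: VEHICLE ACCIDENT -',
        'EMS: DIZZINESS',
    ]
})

# .str[0] takes the first piece of each split title, i.e. the category
sample['cate'] = sample['title'].str.split(':').str[0]

# Group by category and count the titles in each group
by_group = sample.groupby('cate')['title'].count()

# value_counts on the category column yields the same counts
by_value_counts = sample['cate'].value_counts()

print(by_group)
print(by_value_counts)
```

Both results hold integer counts. `groupby(...).count()` orders the result by category name, while `value_counts()` orders it by descending count.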

 

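Result (output of the manual-iteration script; the counts are floats because zero_df is built with np.zeros)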

Fire        37432.0
Traffic     87465.0
EMS        124844.0
dtype: float64


 
