pandas usage tips

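I have been doing data analysis work during my internship and using pandas a lot, so I am writing down the functions and tips I use most often, partly so I can look them up later when I forget. I will keep updating this post~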

 

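1. Create a DataFrame and add rows to it

The code below creates an empty DataFrame with the columns question_id and city_name, then adds one row to it. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is used here instead.

import pandas as pd

data_topic = pd.DataFrame(columns=["question_id", "city_name"])

new_row = pd.DataFrame([{"question_id": question_id, "city_name": city_name}])
data_topic = pd.concat([data_topic, new_row], ignore_index=True)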
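When rows are added inside a loop, growing the DataFrame one row at a time copies the data on every step. A minimal sketch of an alternative, assuming the rows can be collected first: gather them as dicts in a list and build the DataFrame once. The sample records here are made up for illustration.

```python
import pandas as pd

# Hypothetical sample records standing in for the real question_id / city_name values
records = [
    (101, "厦门市"),
    (102, "福州市"),
]

rows = []
for question_id, city_name in records:
    # Collect each row as a plain dict instead of touching the DataFrame
    rows.append({"question_id": question_id, "city_name": city_name})

# Build the DataFrame once, after all rows are collected
data_topic = pd.DataFrame(rows, columns=["question_id", "city_name"])
print(data_topic)
```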

2. Read tab-separated data and set the column names

school_area = pd.read_csv("./school_area.csv", sep="\t")
school_area.columns = ["school_id", "province", "city_name"]
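One thing to watch: by default read_csv treats the first line of the file as the header, so renaming the columns afterwards discards that line. If the file has no header row, passing header=None together with names keeps every line as data. A small sketch, assuming a headerless file; the contents are made up and read from an in-memory string so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical contents of a headerless, tab-separated file
raw = "1\t福建省\t厦门市\n2\t广东省\t深圳市\n"

school_area = pd.read_csv(
    io.StringIO(raw),   # stands in for "./school_area.csv"
    sep="\t",
    header=None,        # the first line is data, not column names
    names=["school_id", "province", "city_name"],
)
print(school_area)
```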

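3. Replace values based on a condition

The line below sets city_name to "福建省" (Fujian Province) for every row whose province is "福建省". The .ix indexer was removed in pandas 1.0, so .loc is used instead.

school_area.loc[school_area["province"] == "福建省", "city_name"] = "福建省"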
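To see what the conditional assignment does, here is a small worked example on a made-up table (the rows are purely illustrative). The boolean mask picks the rows, and the column label picks which cell in those rows is overwritten.

```python
import pandas as pd

# Made-up sample data for illustration
school_area = pd.DataFrame({
    "school_id": [1, 2, 3],
    "province": ["福建省", "广东省", "福建省"],
    "city_name": ["厦门市", "深圳市", "福州市"],
})

# True for the rows that should be changed
mask = school_area["province"] == "福建省"

# Overwrite city_name only in the masked rows; other rows are untouched
school_area.loc[mask, "city_name"] = "福建省"
print(school_area)
```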

4. Select a few columns of a DataFrame

The line below keeps only the school_id and city_name columns.

school_area = school_area[["school_id", "city_name"]]

5. Remove fully duplicated rows with drop_duplicates()

school_area = school_area.drop_duplicates()
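Called with no arguments, drop_duplicates only removes rows that match in every column. A short sketch, using made-up data, of how the subset and keep parameters narrow the comparison to chosen columns:

```python
import pandas as pd

# Made-up sample data: school 1 appears twice identically, school 2 with two cities
school_area = pd.DataFrame({
    "school_id": [1, 1, 2, 2],
    "city_name": ["厦门市", "厦门市", "深圳市", "广州市"],
})

# Rows must match in every column to count as duplicates
full = school_area.drop_duplicates()

# Only school_id is compared; the first row for each id is kept
by_id = school_area.drop_duplicates(subset=["school_id"], keep="first")

print(len(full), len(by_id))
```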

6. See which distinct values a column contains

print(set(school_area["city_name"]))
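pandas also has built-in methods for this, which avoid the round trip through a Python set. A minimal sketch on a made-up column: unique() returns the distinct values in order of first appearance, nunique() counts them, and value_counts() shows how often each one occurs.

```python
import pandas as pd

# Made-up sample column for illustration
cities = pd.Series(["厦门市", "深圳市", "厦门市"], name="city_name")

distinct = cities.unique()      # distinct values, in order of first appearance
n_distinct = cities.nunique()   # number of distinct values
counts = cities.value_counts()  # occurrences of each value, most frequent first

print(distinct)
print(n_distinct)
print(counts)
```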

7. Filter the rows of a DataFrame

The line below keeps only the rows whose school_id is not "bg-1".

question_data = question_data[question_data["school_id"] != "bg-1"]
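When more than one value has to be excluded, isin combined with ~ (logical not) is easier to read than chaining several != comparisons. A sketch with made-up data; "bg-2" is a hypothetical second value added only for illustration.

```python
import pandas as pd

# Made-up sample data; "bg-2" is a hypothetical extra value to exclude
question_data = pd.DataFrame({
    "question_id": [1, 2, 3, 4],
    "school_id": ["10", "bg-1", "11", "bg-2"],
})

bad_ids = ["bg-1", "bg-2"]

# isin marks the rows to drop; ~ inverts the mask so the rest are kept
filtered = question_data[~question_data["school_id"].isin(bad_ids)]
print(filtered)
```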

8. Merge data

My join column was not stored as a numeric type, so I convert it to int in both tables before merging.

question_data["school_id"] = question_data["school_id"].astype(int)
school_area["school_id"] = school_area["school_id"].astype(int)
alldata = pd.merge(question_data, school_area, how="left", on=["school_id"])
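With how="left", every row of the left table is kept, and rows with no match on the right get NaN in the right-hand columns. A small sketch on made-up tables; the indicator=True argument adds a _merge column that shows where each row came from, which is handy for checking how many rows failed to match.

```python
import pandas as pd

# Made-up sample tables; school 12 has no entry in school_area
question_data = pd.DataFrame({
    "question_id": [1, 2, 3],
    "school_id": ["10", "11", "12"],
})
school_area = pd.DataFrame({
    "school_id": ["10", "11"],
    "city_name": ["厦门市", "深圳市"],
})

# Make the key the same dtype on both sides before merging
question_data["school_id"] = question_data["school_id"].astype(int)
school_area["school_id"] = school_area["school_id"].astype(int)

alldata = pd.merge(
    question_data, school_area,
    how="left", on="school_id",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)
print(alldata)
```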

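9. A homemade way to drop missing values

First fill the missing values with the placeholder "空" ("empty"), then keep only the rows where that column is not "空". What remains has no missing values in city_name.

alldata["city_name"] = alldata["city_name"].fillna("空")
alldata = alldata[alldata["city_name"] != "空"]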
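pandas has a built-in for the same job: dropna with subset removes rows that have a missing value in the listed columns, with no placeholder needed. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample data with one missing city_name
alldata = pd.DataFrame({
    "question_id": [1, 2, 3],
    "city_name": ["厦门市", None, "深圳市"],
})

# Drop only the rows where city_name is missing
cleaned = alldata.dropna(subset=["city_name"])
print(cleaned)
```

This also avoids accidentally dropping rows whose real value happens to equal the placeholder string.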


  
