Notes on some handy pandas tricks that come up in day-to-day work
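Count how often each distinct value occurs in a column (NaN is excluded); the result is sorted in descending order by default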
- loans_2007['loan_status'].value_counts()
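As a quick sketch of how `value_counts()` treats missing values, the toy Series below (made-up data standing in for the real `loans_2007` column) compares the default call with `dropna=False` and `normalize=True`:

```python
import pandas as pd

# Toy stand-in for loans_2007['loan_status']; the values are illustrative only.
status = pd.Series(["Fully Paid", "Charged Off", "Fully Paid", None, "Fully Paid"])

counts = status.value_counts()                       # NaN excluded, sorted descending
counts_with_nan = status.value_counts(dropna=False)  # NaN gets its own row
shares = status.value_counts(normalize=True)         # fractions of the non-null rows

print(counts)
```

Comparing `counts.sum()` with `len(status)` is a cheap way to see how many rows are missing.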
Replace the status labels with 1 and 0
- status_replace = { "loan_status" : { "Fully Paid": 1, "Charged Off": 0, } }
- loans_2007 = loans_2007.replace(status_replace)
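One thing worth knowing about the nested-dict form of `replace` is that labels missing from the mapping are left as they are. The sketch below uses a toy frame with an extra, hypothetical "Current" label to show this, next to `Series.map`, which turns unmapped labels into NaN instead:

```python
import pandas as pd

# Toy frame; "Current" is an extra label added here only to show unmapped values.
loans_2007 = pd.DataFrame({"loan_status": ["Fully Paid", "Charged Off", "Current"]})

status_replace = {"loan_status": {"Fully Paid": 1, "Charged Off": 0}}

# replace() keeps labels that are not in the mapping ...
replaced = loans_2007.replace(status_replace)

# ... whereas map() returns NaN for them.
mapped = loans_2007["loan_status"].map(status_replace["loan_status"])

print(replaced["loan_status"].tolist())
print(mapped.isna().tolist())
```

Filtering the frame down to the two mapped statuses before replacing avoids ending up with a mixed column of numbers and strings.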
Select the columns of a given dtype
- object_columns_df = loans.select_dtypes(include=["object"])
- print(object_columns_df.iloc[0])
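A small sketch of `select_dtypes` on a toy frame (the column names and values are hypothetical). The text column is forced to `object` dtype here so the example does not depend on how a given pandas version infers string columns; `include=["number"]` is shown as the numeric counterpart:

```python
import pandas as pd

# Toy frame with hypothetical columns; "term" is explicitly object dtype.
loans = pd.DataFrame({
    "term": pd.Series(["36 months", "60 months"], dtype="object"),
    "loan_amnt": [5000.0, 2500.0],
    "delinq_2yrs": [0, 1],
})

object_columns_df = loans.select_dtypes(include=["object"])
numeric_columns_df = loans.select_dtypes(include=["number"])

print(object_columns_df.iloc[0])         # first row of the text columns
print(list(numeric_columns_df.columns))  # float and int columns together
```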
Sort by several columns
- data.sort_values(['Fare','Age'],ascending=False)
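`ascending` also accepts one flag per sort key, which helps when the columns should not all go in the same direction. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up rows; only the column names follow the snippet above.
data = pd.DataFrame({
    "Fare": [7.25, 71.28, 7.25, 53.10],
    "Age": [22.0, 38.0, None, 35.0],
})

# Fare descending, ties broken by Age ascending; NaN ages are placed last by default.
sorted_data = data.sort_values(["Fare", "Age"], ascending=[False, True])

print(sorted_data)
```

Note that `sort_values` returns a new frame and keeps the original index labels; chain `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.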
Group by a column with groupby
- data.groupby(by='Sex').count()
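`count()` after a `groupby` reports the number of non-null values per column in each group, which is not always the same as the number of rows. The sketch below, on made-up data, contrasts it with `size()` and with a single-column aggregate:

```python
import pandas as pd

# Made-up rows; one Age is missing on purpose.
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

per_column_counts = data.groupby(by="Sex").count()  # non-null values per column
group_sizes = data.groupby(by="Sex").size()         # rows per group, NaN included
mean_age = data.groupby(by="Sex")["Age"].mean()     # one statistic for one column

print(per_column_counts)
print(group_sizes)
```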
Pivot tables; when aggfunc is not set, the mean is used by default
- data.pivot_table(index='Pclass',values='Age',aggfunc=[len,'mean','median'])
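- After the pivot table is built, you usually need to reset the index; note that drop=True discards the old index, while omitting it turns the index back into a regular column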
- data_reindexed = new_data.reset_index(drop = True)
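To make the pivot-table defaults and the effect of `reset_index` concrete, here is a minimal sketch on made-up data. String aggregator names are used as one way to avoid importing anything beyond pandas:

```python
import pandas as pd

# Made-up rows; only the column names follow the snippets above.
data = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3],
    "Age": [38.0, 35.0, 27.0, 22.0, 26.0],
})

default_pivot = data.pivot_table(index="Pclass", values="Age")  # aggfunc defaults to mean
multi_pivot = data.pivot_table(index="Pclass", values="Age",
                               aggfunc=["mean", "median"])

flat = default_pivot.reset_index()               # Pclass becomes a regular column again
dropped = default_pivot.reset_index(drop=True)   # Pclass is discarded

print(multi_pivot)
print(flat)
```

With a list of aggregators the result has two column levels, so a single value is addressed as `multi_pivot.loc[3, ("median", "Age")]`.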
Create dummy (one-hot) variables
- dummy_df = pd.get_dummies(loans[cat_columns])
- loans = pd.concat([loans, dummy_df], axis=1)
- loans = loans.drop(cat_columns, axis=1)
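The three steps above can be tried end to end on a toy frame. The column names and values below are hypothetical, and `cat_columns` is assumed to be a plain list of the categorical column names:

```python
import pandas as pd

# Toy frame; names and values are illustrative only.
loans = pd.DataFrame({
    "home_ownership": ["RENT", "OWN", "RENT"],
    "loan_amnt": [5000, 2500, 2400],
})
cat_columns = ["home_ownership"]

dummy_df = pd.get_dummies(loans[cat_columns])  # one indicator column per category
loans = pd.concat([loans, dummy_df], axis=1)   # append the indicators
loans = loans.drop(cat_columns, axis=1)        # remove the original text column

print(list(loans.columns))
```

Depending on the pandas version, the indicator columns come back as booleans or small integers; passing `dtype=int` to `get_dummies` makes the choice explicit.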
Drop the columns that contain only one distinct value (ignoring NaN)
orig_columns = loans_2007.columns
drop_columns = []
for col in orig_columns:
    col_series = loans_2007[col].dropna().unique()
    if len(col_series) == 1:
        drop_columns.append(col)
loans_2007 = loans_2007.drop(drop_columns, axis=1)
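The same filter can probably be written without an explicit loop by using `DataFrame.nunique()`, which also ignores NaN by default. This is an alternative sketch on a toy frame with illustrative column names, not a replacement for the loop above:

```python
import pandas as pd

# Toy frame; column names and values are illustrative only.
loans_2007 = pd.DataFrame({
    "pymnt_plan": ["n", "n", "n"],     # constant -> dropped
    "policy_code": [1, None, 1],       # constant apart from NaN -> dropped
    "loan_amnt": [5000, 2500, 2400],   # varies -> kept
})

# nunique() skips NaN by default, matching dropna().unique() in the loop.
drop_columns = loans_2007.columns[loans_2007.nunique() == 1]
loans_2007 = loans_2007.drop(columns=drop_columns)

print(list(loans_2007.columns))
```

Comparing against `== 1` rather than `<= 1` keeps all-NaN columns in place, which is also what the loop does.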