Common pandas operations

Handling missing values

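```
import pandas as pd

# Number of missing values per column, largest first (train is an existing DataFrame)
total = train.isnull().sum().sort_values(ascending=False)
# Share of missing values per column, as a percentage rounded to 2 decimals
percent = (total / len(train) * 100).round(2)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
```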

View the count of each category: value_counts()
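
As a minimal sketch of how `value_counts()` behaves, the example below counts the categories in a single column. The DataFrame and the column name `color` are made up for illustration and are not from the original notes.

```python
import pandas as pd

# Hypothetical data; the column name "color" is an assumption for illustration.
df = pd.DataFrame({"color": ["red", "blue", "red", None, "red"]})

# Count each category; missing values are excluded by default.
counts = df["color"].value_counts()

# Include missing values in the count as well.
counts_with_na = df["color"].value_counts(dropna=False)

# Relative frequencies instead of absolute counts.
shares = df["color"].value_counts(normalize=True)

print(counts)
print(counts_with_na)
print(shares)
```

Passing `dropna=False` is useful when combined with the missing-value summary above, since it shows how many rows have no category at all.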


Group statistics

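https://blog.csdn.net/elecjack/article/details/50760736

df[df['column_name'].isin([values])]

This command returns the rows whose value in that column is one of the given values. Sometimes you may want a histogram of several related columns of a DataFrame. For example: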

In [263]: data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
     ...:                      'Qu2': [2, 3, 1, 2, 3],
     ...:                      'Qu3': [1, 5, 2, 4, 4]})

In [264]: data
Out[264]:
   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4


Passing pandas.value_counts to this DataFrame's apply function gives:

In [265]: result = data.apply(pd.value_counts).fillna(0)

In [266]: result
Out[266]:
   Qu1  Qu2  Qu3
1  1.0  1.0  1.0
2  0.0  2.0  1.0
3  2.0  2.0  0.0
4  2.0  0.0  2.0
5  0.0  0.0  1.0


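Here, the row labels of the result are the unique values found across all columns. The entries are the counts of each of those values in each column.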
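
The same result can be reproduced as a plain script. As far as I know, recent pandas releases mark the top-level `pd.value_counts` function as deprecated, so this sketch calls the `Series.value_counts` method through a lambda instead. The explicit `sort_index()` is an addition to make the row order deterministic.

```python
import pandas as pd

data = pd.DataFrame({"Qu1": [1, 3, 4, 3, 4],
                     "Qu2": [2, 3, 1, 2, 3],
                     "Qu3": [1, 5, 2, 4, 4]})

# Count the values of each column separately. A value that never occurs
# in a column produces NaN there, which fillna(0) turns into a zero count.
result = data.apply(lambda col: col.value_counts()).fillna(0).sort_index()

print(result)
```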

dataset3['is_weekend'] = dataset3.day_of_week.apply(lambda x: 1 if x in (6, 7) else 0)
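
The line above flags weekend rows with a per-element `apply`. A vectorized sketch using `isin` gives the same column. The small `dataset3` here is a hypothetical stand-in, and it assumes `day_of_week` is numbered 1 to 7 with 6 and 7 as the weekend, as the original tuple `(6, 7)` suggests.

```python
import pandas as pd

# Hypothetical stand-in for dataset3; day_of_week is assumed to run 1-7.
dataset3 = pd.DataFrame({"day_of_week": [1, 5, 6, 7, 3]})

# isin returns a boolean Series; astype(int) converts it to 1/0 flags.
dataset3["is_weekend"] = dataset3["day_of_week"].isin([6, 7]).astype(int)

print(dataset3)
```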

Finding the positions of missing values with Python pandas (repost):
https://blog.csdn.net/u012387178/article/details/52571725
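
One possible way to locate missing values, shown here as a sketch that is not taken from the linked post: build a boolean mask with `isnull()` and read the row and column positions out of it with `numpy.where`. The DataFrame contents are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
train = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# True wherever a cell is missing.
mask = train.isnull()

# Rows that contain at least one missing value.
rows_with_na = train[mask.any(axis=1)]

# Integer positions of the missing cells, mapped back to index and column labels.
row_idx, col_idx = np.where(mask.to_numpy())
positions = [(train.index[r], train.columns[c]) for r, c in zip(row_idx, col_idx)]

print(rows_with_na)
print(positions)
```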

One-hot encoding in pandas, and the difference between pd.get_dummies() and OneHotEncoder from sklearn.preprocessing (repost)
https://blog.csdn.net/lanchunhui/article/details/72870358

One-hot encoding
weekday_dummies = pd.get_dummies(dataset3.day_of_week)
weekday_dummies.columns = ['weekday' + str(i+1) for i in range(weekday_dummies.shape[1])]
dataset3 = pd.concat([dataset3, weekday_dummies], axis= 1)
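
The renaming step above numbers the dummy columns by position, so it labels them correctly only if all seven days occur in the data. A sketch that avoids this assumption lets `get_dummies` build the names from the actual values through its `prefix` and `prefix_sep` parameters. The sample `dataset3` is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for dataset3 in which only days 1, 3 and 7 occur.
dataset3 = pd.DataFrame({"day_of_week": [1, 3, 3, 7]})

# Column names are built as prefix + prefix_sep + value, e.g. "weekday3".
weekday_dummies = pd.get_dummies(dataset3["day_of_week"],
                                 prefix="weekday", prefix_sep="")
dataset3 = pd.concat([dataset3, weekday_dummies], axis=1)

print(dataset3.columns.tolist())
```

With positional renaming the same data would yield `weekday1`, `weekday2`, `weekday3`, which mislabels days 3 and 7.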

pandas merge explained in detail
https://www.cnblogs.com/bigshow1949/p/7016235.html

Fixing errors when connecting to a database from Python 3
https://www.cnblogs.com/magicc/p/6490671.html

pandas DataFrame: generating a new column from conditions on the values of multiple columns
https://blog.csdn.net/qq_30565883/article/details/79464266
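
As a rough sketch of this kind of operation (not necessarily the approach in the linked post), `numpy.where` handles a two-way decision and `numpy.select` handles several conditions checked in order. The column names `math` and `english` and the threshold of 60 are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; column names and the pass mark of 60 are assumptions.
df = pd.DataFrame({"math": [90, 55, 70, 30], "english": [80, 65, 40, 20]})

# Two outcomes: "yes" only if both columns meet the threshold.
both_pass = (df["math"] >= 60) & (df["english"] >= 60)
df["passed"] = np.where(both_pass, "yes", "no")

# Several outcomes: conditions are evaluated in order, first match wins.
any_pass = (df["math"] >= 60) | (df["english"] >= 60)
df["level"] = np.select([both_pass, any_pass], ["both", "one"], default="none")

print(df)
```

Both functions work on whole columns at once, which is usually faster than `df.apply(..., axis=1)` with a row-wise function.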


![Explanation of common statistical methods](https://img-blog.csdn.net/20180705021040262?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2doajc4NjExMA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

