Summary statistics with pandas in Python

1. Import the module
>>> import pandas as pd
2. Fix truncated row and column display for DataFrames
>>> pd.set_option('display.max_rows', 100,'display.max_columns', 1000,"display.max_colwidth",1000,'display.width',1000)
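The call above changes the display options for the whole session. If the wider display is only needed for one printout, a scoped alternative is `pd.option_context`, which restores the previous settings on exit. The following is a minimal sketch; the 30-column frame `wide` is a made-up example, not the Titanic data.

```python
import pandas as pd

# Hypothetical wide frame: 30 columns, one row.
wide = pd.DataFrame({f"col{i}": [i] for i in range(30)})

# Inside the block, all columns are rendered on one line;
# outside it, the previous display settings apply again.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full_text = repr(wide)

print(full_text)
```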
3. Load the data table
>>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
4. Compute the mean age
>>> titanic["Age"].mean()
29.69911764705882

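By default, missing values (NaN) are skipped, and the statistic is computed down each column rather than across the columns of a row.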
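To make both defaults visible, here is a small sketch on made-up data (the three-row frame `df` is an assumption, standing in for the Titanic table). It compares the default with `skipna=False`, and the per-column result with an explicit `axis=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing age.
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0],
                   "Fare": [7.25, 8.05, 71.28]})

mean_skip = df["Age"].mean()               # NaN ignored: (22 + 38) / 2
mean_keep = df["Age"].mean(skipna=False)   # NaN propagates into the result
col_means = df.mean()                      # default axis=0: one value per column
row_means = df.mean(axis=1)                # one value per row, only on request

print(mean_skip)   # 30.0
print(mean_keep)   # nan
print(col_means)
print(row_means)
```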

5. Compute the median age and fare
>>> titanic[["Age", "Fare"]].median()
Age     28.0000
Fare    14.4542
dtype: float64
6. Multi-column statistics with the predefined summary from describe()
>>> titanic[["Age", "Fare"]].describe()
              Age        Fare
count  714.000000  891.000000
mean    29.699118   32.204208
std     14.526497   49.693429
min      0.420000    0.000000
25%     20.125000    7.910400
50%     28.000000   14.454200
75%     38.000000   31.000000
max     80.000000  512.329200
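The count for Age (714) is lower than for Fare (891) because describe() also skips missing values. The quartile rows are only the default: describe() accepts a `percentiles` argument. A minimal sketch on made-up ages (the frame `df` is an assumption):

```python
import pandas as pd

# Hypothetical ages, evenly spaced so the percentiles are easy to check by hand.
df = pd.DataFrame({"Age": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.1, 0.9])
print(summary)
```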
7. Multi-column statistics with custom aggregations
>>> titanic.agg({'Age': ['min', 'max', 'median', 'skew'],
...              'Fare': ['min', 'max', 'median', 'mean']})
              Age        Fare
max     80.000000  512.329200
mean          NaN   32.204208
median  28.000000   14.454200
min      0.420000    0.000000
skew     0.389108         NaN
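The NaN entries in this output are not missing data: they mark statistics that were not requested for that column (mean for Age, skew for Fare). A minimal sketch on made-up data (the frame `df` is an assumption) shows the same pattern and one way to drop the padding for a single column.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0],
                   "Fare": [7.25, 71.28, 7.92, 53.10]})

# Different statistics per column; the result is the union of all rows.
result = df.agg({"Age": ["min", "max"], "Fare": ["min", "mean"]})
print(result)

# Keep only the statistics that were actually computed for Age.
age_only = result["Age"].dropna()
print(age_only)
```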
8. Group by category and compute statistics
[Figure: grouped statistics workflow]
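>>> titanic.groupby("Sex").mean(numeric_only=True)    # mean of each numeric column, by sex (pandas 2.0+ raises TypeError on text columns without numeric_only=True)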
        PassengerId  Survived    Pclass        Age     SibSp     Parch       Fare
Sex                                                                              
female   431.028662  0.742038  2.159236  27.915709  0.694268  0.649682  44.479818
male     454.147314  0.188908  2.389948  30.726645  0.429809  0.235702  25.523893
>>> titanic.groupby("Sex")["Age"].mean()    # mean age by sex
Sex
female    27.915709
male      30.726645
Name: Age, dtype: float64
>>> titanic.groupby(["Sex", "Pclass"])["Fare"].mean()    # mean fare for each combination of sex and cabin class
Sex     Pclass
female  1         106.125798
        2          21.970121
        3          16.118810
male    1          67.226127
        2          19.741782
        3          12.661633
Name: Fare, dtype: float64
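What groupby does can be read as three steps: split the rows by key, apply a statistic to each piece, and combine the results. The sketch below spells those steps out by hand on made-up data (the four-row frame `df` is an assumption) and checks that the one-line groupby call gives the same numbers.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic table.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1],
    "Age": [22.0, 38.0, 26.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 51.86],
})

# Split by sex, apply mean() to each piece, combine into one Series.
pieces = {}
for sex, group in df.groupby("Sex"):
    pieces[sex] = group["Age"].mean()
manual = pd.Series(pieces)

# The same result in one call.
grouped = df.groupby("Sex")["Age"].mean()

# Grouping by two keys produces a two-level index.
by_two = df.groupby(["Sex", "Pclass"])["Fare"].mean()

print(manual)
print(grouped)
print(by_two)
```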
9. Count the records in each category
>>> titanic.groupby("Pclass")["Pclass"].count()
Pclass
1    216
2    184
3    491
Name: Pclass, dtype: int64
>>> 
>>> titanic["Pclass"].value_counts()
3    491
1    216
2    184
Name: Pclass, dtype: int64

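The value_counts() method counts the number of records for each category in a column. It is a shortcut: under the hood it is a groupby operation followed by a count of the records in each group. As the two outputs show, the only visible difference is the ordering: groupby sorts by category, while value_counts() sorts by count, largest first.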
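A quick way to check the equivalence is to run both on the same column and compare after aligning the order. This is a minimal sketch on made-up data (the six-row frame `df` is an assumption):

```python
import pandas as pd

# Hypothetical cabin classes.
df = pd.DataFrame({"Pclass": [3, 1, 3, 3, 2, 1]})

via_groupby = df.groupby("Pclass")["Pclass"].count()   # sorted by category
via_value_counts = df["Pclass"].value_counts()         # sorted by count, descending

# Same counts once both are put in the same order.
same = via_value_counts.sort_index().tolist() == via_groupby.tolist()

print(via_groupby)
print(via_value_counts)
print(same)   # True
```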
