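Consider the following data set: Age is a floating-point number; New_Salutation (form of address) takes five values: Mr, Mrs, Miss, Master, Others; Pclass (passenger class) takes three values: 1, 2, 3; Sex takes two values: male, female. The first few rows look like this: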
>>> df[["Age", "New_Salutation", "Pclass", "Sex"]]
         Age  New_Salutation  Pclass     Sex
0  22.000000              Mr       3    male
1  38.000000             Mrs       1  female
2  26.000000            Miss       3  female
3  35.000000             Mrs       1  female
4  35.000000              Mr       3    male
5  29.699118              Mr       3    male
6  54.000000              Mr       1    male
7   2.000000          Master       3    male
8  27.000000             Mrs       3  female
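As a minimal sketch, the nine rows shown above can be rebuilt by hand to check the column types and the distinct values in each grouping column. The variable name `sample` is hypothetical, and these nine rows are only the visible excerpt, so not every category described above (for example Others, or Pclass 2) appears in them.

```python
import pandas as pd

# Only the nine rows printed above; the full data set is assumed to be larger.
sample = pd.DataFrame(
    {
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 29.699118, 54.0, 2.0, 27.0],
        "New_Salutation": ["Mr", "Mrs", "Miss", "Mrs", "Mr", "Mr", "Mr", "Master", "Mrs"],
        "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3],
        "Sex": ["male", "female", "female", "female", "male", "male", "male", "male", "female"],
    }
)

# Age is stored as a float column.
print(sample.dtypes)

# Distinct values present in each grouping column of the excerpt.
for col in ["New_Salutation", "Pclass", "Sex"]:
    print(col, sorted(sample[col].unique()))
```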
The data can be analyzed with the pivot_table method from the pandas library, used as follows.
Meaning of the pivot_table parameters:
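values: the cells of the pivot table show values computed from Age
index: the rows are indexed by the five values of New_Salutation (Mr, Mrs, Miss, Master, Others)
columns: the data is first split into three groups by the three values of Pclass (1, 2, 3), and each group is then split in two by Sex (male, female), giving six groups in total. Passing only Pclass splits the data into three groups with no further subdivision.
aggfunc: each cell shows the median of Age for its group (np.median is the function passed in the call)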
For the people whose New_Salutation is Master, Pclass is 1, and Sex is male, the median Age is 4.0; this is the value in the Master row under Pclass 1, male in the output.
>>> import numpy as np
>>> table = df.pivot_table(values="Age", index=["New_Salutation"], columns=["Pclass", "Sex"], aggfunc=np.median)
The output is:
Pclass               1             2               3
Sex             female  male  female  male     female       male
New_Salutation
Master             NaN   4.0     NaN   1.0        NaN   6.500000
Miss              30.0   NaN    24.0   NaN  22.000000        NaN
Mr                 NaN  36.0     NaN  30.0        NaN  29.699118
Mrs               38.5   NaN    32.0   NaN  29.699118        NaN
Others            28.5  47.0    28.0  46.5        NaN        NaN
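The following self-contained sketch applies the same call to the nine sample rows listed at the start, once with columns=["Pclass", "Sex"] and once with Pclass alone, and then reads a single cell. The names `rows`, `sample`, `by_both`, and `by_pclass` are hypothetical. Because only nine rows are used, the numbers differ from the full table above. The string "median" is used here in place of np.median as an assumption of convenience: pandas accepts either, and the string form avoids a deprecation warning that some recent pandas versions emit for NumPy callables.

```python
import pandas as pd

# The nine sample rows: (Age, New_Salutation, Pclass, Sex).
rows = [
    (22.0, "Mr", 3, "male"),
    (38.0, "Mrs", 1, "female"),
    (26.0, "Miss", 3, "female"),
    (35.0, "Mrs", 1, "female"),
    (35.0, "Mr", 3, "male"),
    (29.699118, "Mr", 3, "male"),
    (54.0, "Mr", 1, "male"),
    (2.0, "Master", 3, "male"),
    (27.0, "Mrs", 3, "female"),
]
sample = pd.DataFrame(rows, columns=["Age", "New_Salutation", "Pclass", "Sex"])

# Two-level columns: Pclass first, then Sex within each Pclass.
by_both = sample.pivot_table(
    values="Age",
    index=["New_Salutation"],
    columns=["Pclass", "Sex"],
    aggfunc="median",
)

# Single-level columns: Pclass only, no further split by Sex.
by_pclass = sample.pivot_table(
    values="Age",
    index=["New_Salutation"],
    columns=["Pclass"],
    aggfunc="median",
)

print(by_both)
print(by_pclass)

# Read one cell: row label first, then a (Pclass, Sex) tuple for the column.
print(by_both.loc["Master", (3, "male")])
```

Combinations that never occur in the data (here, anything with Pclass 2) produce no column at all, while combinations missing for a particular row show up as NaN.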