Python之pandas表格排序及透视图

1.导入模块
>>> import pandas as pd
2.设置显示列
>>> pd.set_option('display.max_rows', 100,'display.max_columns', 1000,"display.max_colwidth",1000,'display.width',1000)
3.导入表格数据
>>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
>>> titanic.head()
   PassengerId  Survived  Pclass                                                 Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
0            1         0       3                              Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38.0      1      0          PC 17599  71.2833   C85        C
2            3         1       3                               Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
3            4         1       1         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
4            5         0       3                             Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
>>> air_quality = pd.read_csv(r"C:\Users\Administrator\Desktop\air_quality_long.csv", index_col="date.utc", parse_dates=True)
>>> air_quality.head()
                                city country location parameter  value   unit
date.utc                                                                     
2019-06-18 06:00:00+00:00  Antwerpen      BE  BETR801      pm25   18.0  µg/m³
2019-06-17 08:00:00+00:00  Antwerpen      BE  BETR801      pm25    6.5  µg/m³
2019-06-17 07:00:00+00:00  Antwerpen      BE  BETR801      pm25   18.5  µg/m³
2019-06-17 06:00:00+00:00  Antwerpen      BE  BETR801      pm25   16.0  µg/m³
2019-06-17 05:00:00+00:00  Antwerpen      BE  BETR801      pm25    7.5  µg/m³
4.对年龄进行升序排序
>>> titanic.sort_values(by="Age").head()
     PassengerId  Survived  Pclass                             Name     Sex   Age  SibSp  Parch  Ticket     Fare Cabin Embarked
803          804         1       3  Thomas, Master. Assad Alexander    male  0.42      0      1    2625   8.5167   NaN        C
755          756         1       2        Hamalainen, Master. Viljo    male  0.67      1      1  250649  14.5000   NaN        S
644          645         1       3           Baclini, Miss. Eugenie  female  0.75      2      1    2666  19.2583   NaN        C
469          470         1       3    Baclini, Miss. Helene Barbara  female  0.75      2      1    2666  19.2583   NaN        C
78            79         1       2    Caldwell, Master. Alden Gates    male  0.83      0      2  248738  29.0000   NaN        S
5.对客舱等级和年龄按降序对数据进行排序
>>> titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()
     PassengerId  Survived  Pclass                       Name     Sex   Age  SibSp  Parch  Ticket    Fare Cabin Embarked
851          852         0       3        Svensson, Mr. Johan    male  74.0      0      0  347060  7.7750   NaN        S
116          117         0       3       Connors, Mr. Patrick    male  70.5      0      0  370369  7.7500   NaN        Q
280          281         0       3           Duane, Mr. Frank    male  65.0      0      0  336439  7.7500   NaN        Q
483          484         1       3     Turkula, Mrs. (Hedwig)  female  63.0      0      0    4134  9.5875   NaN        S
326          327         0       3  Nysveen, Mr. Johan Hansen    male  61.0      0      0  345364  6.2375   NaN        S
6.过滤列指定值
>>> no2 = air_quality[air_quality["parameter"] == "no2"]
>>> no2
                             city country            location parameter  value   unit
date.utc                                                                             
2019-06-21 00:00:00+00:00   Paris      FR             FR04014       no2   20.0  µg/m³
2019-06-20 23:00:00+00:00   Paris      FR             FR04014       no2   21.8  µg/m³
2019-06-20 22:00:00+00:00   Paris      FR             FR04014       no2   26.5  µg/m³
2019-06-20 21:00:00+00:00   Paris      FR             FR04014       no2   24.9  µg/m³
2019-06-20 20:00:00+00:00   Paris      FR             FR04014       no2   21.4  µg/m³
...                           ...     ...                 ...       ...    ...    ...
2019-04-09 06:00:00+00:00  London      GB  London Westminster       no2   41.0  µg/m³
2019-04-09 05:00:00+00:00  London      GB  London Westminster       no2   41.0  µg/m³
2019-04-09 04:00:00+00:00  London      GB  London Westminster       no2   41.0  µg/m³
2019-04-09 03:00:00+00:00  London      GB  London Westminster       no2   67.0  µg/m³
2019-04-09 02:00:00+00:00  London      GB  London Westminster       no2   67.0  µg/m³

[3447 rows x 6 columns]
7.将不同行的值转为列值显示
>>> no2_subset = no2.sort_index().groupby(["location"]).head(3)
>>> no2_subset
                                city country            location parameter  value   unit
date.utc                                                                                
2019-04-09 01:00:00+00:00  Antwerpen      BE             BETR801       no2   22.5  µg/m³
2019-04-09 01:00:00+00:00      Paris      FR             FR04014       no2   24.4  µg/m³
2019-04-09 02:00:00+00:00     London      GB  London Westminster       no2   67.0  µg/m³
2019-04-09 02:00:00+00:00  Antwerpen      BE             BETR801       no2   53.5  µg/m³
2019-04-09 02:00:00+00:00      Paris      FR             FR04014       no2   27.4  µg/m³
2019-04-09 03:00:00+00:00  Antwerpen      BE             BETR801       no2   54.5  µg/m³
2019-04-09 03:00:00+00:00     London      GB  London Westminster       no2   67.0  µg/m³
2019-04-09 03:00:00+00:00      Paris      FR             FR04014       no2   34.2  µg/m³
2019-04-09 04:00:00+00:00     London      GB  London Westminster       no2   41.0  µg/m³
>>> no2_subset.pivot(columns="location", values="value")
location                   BETR801  FR04014  London Westminster
date.utc                                                       
2019-04-09 01:00:00+00:00     22.5     24.4                 NaN
2019-04-09 02:00:00+00:00     53.5     27.4                67.0
2019-04-09 03:00:00+00:00     54.5     34.2                67.0
2019-04-09 04:00:00+00:00      NaN      NaN                41.0
8.使用透视表查看每个站的no2和pm25的平均值
>>> air_quality.pivot_table(values="value", index="location", columns="parameter", aggfunc="mean")
parameter                 no2       pm25
location                                
BETR801             26.950920  23.169492
FR04014             29.374284        NaN
London Westminster  29.740050  13.443568
>>> air_quality
                                city country            location parameter  value   unit
date.utc                                                                                
2019-06-18 06:00:00+00:00  Antwerpen      BE             BETR801      pm25   18.0  µg/m³
2019-06-17 08:00:00+00:00  Antwerpen      BE             BETR801      pm25    6.5  µg/m³
2019-06-17 07:00:00+00:00  Antwerpen      BE             BETR801      pm25   18.5  µg/m³
2019-06-17 06:00:00+00:00  Antwerpen      BE             BETR801      pm25   16.0  µg/m³
2019-06-17 05:00:00+00:00  Antwerpen      BE             BETR801      pm25    7.5  µg/m³
...                              ...     ...                 ...       ...    ...    ...
2019-04-09 06:00:00+00:00     London      GB  London Westminster       no2   41.0  µg/m³
2019-04-09 05:00:00+00:00     London      GB  London Westminster       no2   41.0  µg/m³
2019-04-09 04:00:00+00:00     London      GB  London Westminster       no2   41.0  µg/m³
2019-04-09 03:00:00+00:00     London      GB  London Westminster       no2   67.0  µg/m³
2019-04-09 02:00:00+00:00     London      GB  London Westminster       no2   67.0  µg/m³

[5272 rows x 6 columns]

你可能感兴趣的:(Python之pandas表格排序及透视图)