Python: Handling Datetimes with pandas

1. Import the module
>>> import pandas as pd
2. Load the data table
>>> air_quality = pd.read_csv(r"C:\Users\Administrator\Desktop\air_quality_no2_long.csv")
3. Rename the column
>>> air_quality = air_quality.rename(columns={"date.utc": "datetime"})
>>> air_quality
        city country                   datetime            location parameter  value   unit
0      Paris      FR  2019-06-21 00:00:00+00:00             FR04014       no2   20.0  µg/m³
1      Paris      FR  2019-06-20 23:00:00+00:00             FR04014       no2   21.8  µg/m³
2      Paris      FR  2019-06-20 22:00:00+00:00             FR04014       no2   26.5  µg/m³
3      Paris      FR  2019-06-20 21:00:00+00:00             FR04014       no2   24.9  µg/m³
4      Paris      FR  2019-06-20 20:00:00+00:00             FR04014       no2   21.4  µg/m³
...      ...     ...                        ...                 ...       ...    ...    ...
2063  London      GB  2019-05-07 06:00:00+00:00  London Westminster       no2   26.0  µg/m³
2064  London      GB  2019-05-07 04:00:00+00:00  London Westminster       no2   16.0  µg/m³
2065  London      GB  2019-05-07 03:00:00+00:00  London Westminster       no2   19.0  µg/m³
2066  London      GB  2019-05-07 02:00:00+00:00  London Westminster       no2   19.0  µg/m³
2067  London      GB  2019-05-07 01:00:00+00:00  London Westminster       no2   23.0  µg/m³

[2068 rows x 7 columns]
4. Convert the datetime column to a datetime dtype
>>> air_quality["datetime"]
0       2019-06-21 00:00:00+00:00
1       2019-06-20 23:00:00+00:00
2       2019-06-20 22:00:00+00:00
3       2019-06-20 21:00:00+00:00
4       2019-06-20 20:00:00+00:00
                  ...            
2063    2019-05-07 06:00:00+00:00
2064    2019-05-07 04:00:00+00:00
2065    2019-05-07 03:00:00+00:00
2066    2019-05-07 02:00:00+00:00
2067    2019-05-07 01:00:00+00:00
Name: datetime, Length: 2068, dtype: object
>>> air_quality["datetime"] = pd.to_datetime(air_quality["datetime"])
>>> air_quality["datetime"]
0      2019-06-21 00:00:00+00:00
1      2019-06-20 23:00:00+00:00
2      2019-06-20 22:00:00+00:00
3      2019-06-20 21:00:00+00:00
4      2019-06-20 20:00:00+00:00
                  ...           
2063   2019-05-07 06:00:00+00:00
2064   2019-05-07 04:00:00+00:00
2065   2019-05-07 03:00:00+00:00
2066   2019-05-07 02:00:00+00:00
2067   2019-05-07 01:00:00+00:00
Name: datetime, Length: 2068, dtype: datetime64[ns, UTC]

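With the to_datetime function, pandas interprets the strings and converts them to datetime objects, so the column dtype changes from object to datetime64[ns, UTC]. Alternatively, the conversion can be done while importing the data by passing the parse_dates parameter, as in pd.read_csv("xx.csv", parse_dates=["xxxx"]), although this may slow down loading to some extent.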
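
As a minimal sketch of the parse_dates route, the snippet below reads a tiny inline CSV instead of the real file. The inline data and the name sample are assumptions made only for illustration; the column name date.utc matches the one renamed in step 3.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for air_quality_no2_long.csv
csv_text = """city,date.utc,value
Paris,2019-06-21 00:00:00+00:00,20.0
Paris,2019-06-20 23:00:00+00:00,21.8
London,2019-05-07 01:00:00+00:00,23.0
"""

# parse_dates converts the named column while the data is being read
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date.utc"])
sample = sample.rename(columns={"date.utc": "datetime"})

# The column is already a datetime dtype, so no to_datetime call is needed
print(sample["datetime"].dtype)
print(sample["datetime"].dt.month.tolist())
```

Either route ends with the same kind of column, so the choice mostly depends on whether the conversion should happen at load time or later.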

5. Working with datetimes
>>> air_quality["datetime"].min(), air_quality["datetime"].max()     # earliest and latest timestamps
(Timestamp('2019-05-07 01:00:00+0000', tz='UTC'), Timestamp('2019-06-21 00:00:00+0000', tz='UTC'))
>>> air_quality["datetime"].max() - air_quality["datetime"].min()    # time span between the latest and earliest timestamps
Timedelta('44 days 23:00:00')
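
Subtracting two timestamps gives a Timedelta, which can be broken down further. The sketch below rebuilds the two timestamps from the output above by hand (the names earliest, latest and span are illustrative only) and expresses the span in days and in hours.

```python
import pandas as pd

# Timestamps copied from the min/max output above
earliest = pd.Timestamp("2019-05-07 01:00:00+00:00")
latest = pd.Timestamp("2019-06-21 00:00:00+00:00")

span = latest - earliest              # a pandas Timedelta
print(span)                           # 44 days 23:00:00
print(span.days)                      # whole days only: 44
print(span.total_seconds() / 3600)    # full span in hours: 1079.0
```
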
>>> air_quality["month"] = air_quality["datetime"].dt.month    # add a column holding the month number
>>> air_quality
        city country                  datetime            location parameter  value   unit  month
0      Paris      FR 2019-06-21 00:00:00+00:00             FR04014       no2   20.0  µg/m³      6
1      Paris      FR 2019-06-20 23:00:00+00:00             FR04014       no2   21.8  µg/m³      6
2      Paris      FR 2019-06-20 22:00:00+00:00             FR04014       no2   26.5  µg/m³      6
3      Paris      FR 2019-06-20 21:00:00+00:00             FR04014       no2   24.9  µg/m³      6
4      Paris      FR 2019-06-20 20:00:00+00:00             FR04014       no2   21.4  µg/m³      6
...      ...     ...                       ...                 ...       ...    ...    ...    ...
2063  London      GB 2019-05-07 06:00:00+00:00  London Westminster       no2   26.0  µg/m³      5
2064  London      GB 2019-05-07 04:00:00+00:00  London Westminster       no2   16.0  µg/m³      5
2065  London      GB 2019-05-07 03:00:00+00:00  London Westminster       no2   19.0  µg/m³      5
2066  London      GB 2019-05-07 02:00:00+00:00  London Westminster       no2   19.0  µg/m³      5
2067  London      GB 2019-05-07 01:00:00+00:00  London Westminster       no2   23.0  µg/m³      5

[2068 rows x 8 columns]
>>> air_quality.groupby([air_quality["datetime"].dt.weekday, "location"])["value"].mean()       # mean NO2 concentration per weekday and location (0 = Monday)
datetime  location          
0         BETR801               27.875000
          FR04014               24.856250
          London Westminster    23.969697
1         BETR801               22.214286
          FR04014               30.999359
          London Westminster    24.885714
2         BETR801               21.125000
          FR04014               29.165753
          London Westminster    23.460432
3         BETR801               27.500000
          FR04014               28.600690
          London Westminster    24.780142
4         BETR801               28.400000
          FR04014               31.617986
          London Westminster    26.446809
5         BETR801               33.500000
          FR04014               25.266154
          London Westminster    24.977612
6         BETR801               21.896552
          FR04014               23.274306
          London Westminster    24.859155
Name: value, dtype: float64
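
The weekday numbers in the grouped result run from 0 (Monday) to 6 (Sunday). The following sketch checks this on a small hand-made frame; the three readings and the name sample are assumptions for illustration, not rows from the real data set.

```python
import pandas as pd

# Hypothetical sample with readings on known calendar days
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-05-06 10:00:00+00:00",  # a Monday
        "2019-05-06 11:00:00+00:00",  # the same Monday
        "2019-05-12 10:00:00+00:00",  # a Sunday
    ]),
    "location": ["FR04014", "FR04014", "FR04014"],
    "value": [20.0, 30.0, 10.0],
})

# dt.weekday numbers the days 0 (Monday) through 6 (Sunday)
print(sample["datetime"].dt.weekday.tolist())
# dt.day_name() returns a readable label for the same days
print(sample["datetime"].dt.day_name().tolist())

# Same grouping as in the walkthrough: mean value per weekday and location
means = sample.groupby([sample["datetime"].dt.weekday, "location"])["value"].mean()
print(means)
```

Grouping on dt.day_name() instead of dt.weekday would give labelled rows, at the cost of alphabetical rather than calendar ordering.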
6. Use the datetime as the index and turn each location into a column
>>> no_2 = air_quality.pivot(index="datetime", columns="location", values="value")
>>> no_2
location                   BETR801  FR04014  London Westminster
datetime                                                       
2019-05-07 01:00:00+00:00     50.5     25.0                23.0
2019-05-07 02:00:00+00:00     45.0     27.7                19.0
2019-05-07 03:00:00+00:00      NaN     50.4                19.0
2019-05-07 04:00:00+00:00      NaN     61.9                16.0
2019-05-07 05:00:00+00:00      NaN     72.4                 NaN
...                            ...      ...                 ...
2019-06-20 20:00:00+00:00      NaN     21.4                 NaN
2019-06-20 21:00:00+00:00      NaN     24.9                 NaN
2019-06-20 22:00:00+00:00      NaN     26.5                 NaN
2019-06-20 23:00:00+00:00      NaN     21.8                 NaN
2019-06-21 00:00:00+00:00      NaN     20.0                 NaN

[1033 rows x 3 columns]
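
To make the reshaping easier to follow, here is a minimal sketch of the same pivot call on three hand-written rows. The frame long_df and its values are assumptions for illustration. A location with no reading at a given time ends up as NaN, which is why the full result above contains so many NaN cells.

```python
import pandas as pd

# Hypothetical long-format sample: one row per (datetime, location) reading
long_df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-05-07 01:00:00+00:00",
        "2019-05-07 01:00:00+00:00",
        "2019-05-07 02:00:00+00:00",
    ]),
    "location": ["BETR801", "FR04014", "FR04014"],
    "value": [50.5, 25.0, 27.7],
})

# Each distinct location becomes a column; missing readings become NaN
wide = long_df.pivot(index="datetime", columns="location", values="value")
print(wide)
```

Note that pivot raises an error when the same datetime and location pair occurs more than once; in that case pivot_table with an aggregation function is the usual alternative.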
