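DataFrame.dropna() removes rows or columns that contain null (missing) values.
Syntax: dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
axis: 0 (or 'index') drops rows; 1 (or 'columns') drops columns.
how: 'any' drops a row/column that contains at least one missing value; 'all' drops it only when every value is missing.
thresh: the minimum number of non-missing values a row/column needs in order to be kept.
subset: labels along the other axis to check (for example, column names when dropping rows).
inplace: if True, modify the DataFrame itself and return None.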
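The thresh and subset parameters are easy to confuse, so here is a minimal sketch of both. The frame `toy` and its column names are made up for illustration and are not the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not the article's df)
toy = pd.DataFrame({
    "x": [1.0, np.nan, np.nan],
    "y": [2.0, 5.0, np.nan],
    "z": [3.0, 6.0, np.nan],
})

# thresh=2: keep only rows that have at least 2 non-NaN values
kept = toy.dropna(axis=0, thresh=2)
print(kept.index.tolist())  # [0, 1]

# subset=["x"]: only column x is checked when deciding which rows to drop
kept_x = toy.dropna(subset=["x"])
print(kept_x.index.tolist())  # [0]
```

Note that thresh is passed on its own here; recent pandas versions do not accept how and thresh together.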
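DataFrame.fillna() replaces missing values with a value you specify.
Syntax: fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Parameters:
value: a scalar, dict, Series, or DataFrame giving the replacement value(s).
method: 'ffill' or 'bfill', to propagate the previous or next valid value (recent pandas releases deprecate this in favor of DataFrame.ffill() and DataFrame.bfill()).
axis: the axis along which to fill.
inplace: if True, modify the DataFrame itself and return None.
limit: the maximum number of consecutive missing values to fill.
downcast: an optional dict controlling how dtypes are downcast after filling.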
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=list("abcd"))
df.loc[1, 'a'] = 2
df.loc[1, 'c'] = 2.0
df.loc[6, 'c'] = np.nan
df.loc[3, 'c'] = 10
df.loc[3, ['c', 'd']] = np.nan
df["year"] = '2023'
df["date"] = ['08-25','08-26','08-27','08-28','08-29','08-30','08-31','09-01']
# Combine year and date into a single column
df["ydate"] = df["year"].map(str) + "-" + df["date"].map(str)
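# Column names stay in Chinese: 高温 = daily high, 低温 = daily low, 空气质量 = air quality (优 = excellent, 良 = good, 差 = poor)
df["高温"] = ['15°', '16°', '20°', '19°', '20°', '22°', '24°', '23°']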
df["低温"] = ['10°', '11°', '18°', '17°', '10°', '18°', '20°', '17°']
df["空气质量"] = ['优', '良', '优', '优', '差', '良', '优', np.nan]
print(df)
a b c d year date ydate 高温 低温 空气质量
0 0 1 2.0 3.0 2023 08-25 2023-08-25 15° 10° 优
1 2 5 2.0 7.0 2023 08-26 2023-08-26 16° 11° 良
2 8 9 10.0 11.0 2023 08-27 2023-08-27 20° 18° 优
3 12 13 NaN NaN 2023 08-28 2023-08-28 19° 17° 优
4 16 17 18.0 19.0 2023 08-29 2023-08-29 20° 10° 差
5 20 21 22.0 23.0 2023 08-30 2023-08-30 22° 18° 良
6 24 25 NaN 27.0 2023 08-31 2023-08-31 24° 20° 优
7 28 29 30.0 31.0 2023 09-01 2023-09-01 23° 17° NaN
# Append a row (index 8) in which every value is NaN
df1 = df.copy()
df1.loc[8, :] = np.nan
df1
a b c d year date ydate 高温 低温 空气质量
0 0.0 1.0 2.0 3.0 2023 08-25 2023-08-25 15° 10° 优
1 2.0 5.0 2.0 7.0 2023 08-26 2023-08-26 16° 11° 良
2 8.0 9.0 10.0 11.0 2023 08-27 2023-08-27 20° 18° 优
3 12.0 13.0 NaN NaN 2023 08-28 2023-08-28 19° 17° 优
4 16.0 17.0 18.0 19.0 2023 08-29 2023-08-29 20° 10° 差
5 20.0 21.0 22.0 23.0 2023 08-30 2023-08-30 22° 18° 良
6 24.0 25.0 NaN 27.0 2023 08-31 2023-08-31 24° 20° 优
7 28.0 29.0 30.0 31.0 2023 09-01 2023-09-01 23° 17° NaN
8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
# Drop rows in which every value is NaN (removes row 8)
df1 = df1.dropna(axis=0, how='all')
df1
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 1.0 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2.0 | 5.0 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8.0 | 9.0 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
3 | 12.0 | 13.0 | NaN | NaN | 2023 | 08-28 | 2023-08-28 | 19° | 17° | 优 |
4 | 16.0 | 17.0 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20.0 | 21.0 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
6 | 24.0 | 25.0 | NaN | 27.0 | 2023 | 08-31 | 2023-08-31 | 24° | 20° | 优 |
7 | 28.0 | 29.0 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN |
# Drop rows that contain at least one NaN
df2 = df.dropna(axis=0, how='any')
df2
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
# Drop columns that contain at least one NaN (c, d, and 空气质量)
df3 = df.dropna(axis=1, how='any')
df3
| | a | b | year | date | ydate | 高温 | 低温 |
---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2023 | 08-25 | 2023-08-25 | 15° | 10° |
1 | 2 | 5 | 2023 | 08-26 | 2023-08-26 | 16° | 11° |
2 | 8 | 9 | 2023 | 08-27 | 2023-08-27 | 20° | 18° |
3 | 12 | 13 | 2023 | 08-28 | 2023-08-28 | 19° | 17° |
4 | 16 | 17 | 2023 | 08-29 | 2023-08-29 | 20° | 10° |
5 | 20 | 21 | 2023 | 08-30 | 2023-08-30 | 22° | 18° |
6 | 24 | 25 | 2023 | 08-31 | 2023-08-31 | 24° | 20° |
7 | 28 | 29 | 2023 | 09-01 | 2023-09-01 | 23° | 17° |
# Add a column in which every value is NaN
df4 = df.copy()
df4["unknow"] = np.nan
df4
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 | unknow |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 | NaN |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 | NaN |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 | NaN |
3 | 12 | 13 | NaN | NaN | 2023 | 08-28 | 2023-08-28 | 19° | 17° | 优 | NaN |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 | NaN |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 | NaN |
6 | 24 | 25 | NaN | 27.0 | 2023 | 08-31 | 2023-08-31 | 24° | 20° | 优 | NaN |
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN | NaN |
# Drop columns in which every value is NaN (removes "unknow")
df4 = df4.dropna(axis=1, how='all')
df4
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
3 | 12 | 13 | NaN | NaN | 2023 | 08-28 | 2023-08-28 | 19° | 17° | 优 |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
6 | 24 | 25 | NaN | 27.0 | 2023 | 08-31 | 2023-08-31 | 24° | 20° | 优 |
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN |
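The examples above reassign the result of dropna() instead of using the inplace parameter from the signature. The sketch below contrasts the two styles on a made-up frame; `toy` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one all-NaN column
toy = pd.DataFrame({"x": [1.0, np.nan], "empty": [np.nan, np.nan]})

# Reassignment style: dropna returns a new frame, the original is untouched
reassigned = toy.dropna(axis=1, how="all")
print(reassigned.columns.tolist())  # ['x']
print(toy.columns.tolist())         # ['x', 'empty']

# inplace=True modifies the frame itself and returns None
result = toy.dropna(axis=1, how="all", inplace=True)
print(result)                # None
print(toy.columns.tolist())  # ['x']
```

Because the inplace call returns None, writing `toy = toy.dropna(..., inplace=True)` would replace the frame with None, which is a common slip.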
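To delete the rows where one specific column is NaN, first locate the index labels of those rows, then drop those labels.
df[np.isnan(df['c'])].index  # index labels of the rows where column 'c' is NaN
Output: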
Index([3, 6], dtype='int64')
# Drop those index labels to remove the rows
df5 = df.drop(df[np.isnan(df['c'])].index)
df5
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN |
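The locate-then-drop approach should give the same rows as dropna() with subset. A minimal sketch on a made-up frame (`toy` and the `label` column are assumptions for illustration) compares the two:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: NaN in column c at index 1 and 3
toy = pd.DataFrame({
    "c": [2.0, np.nan, 10.0, np.nan],
    "label": ["a", "b", None, "d"],
})

# Locate the rows where c is missing, then drop them by index label
by_index = toy.drop(toy[toy["c"].isna()].index)

# Same result in one call: only column c is checked
by_subset = toy.dropna(subset=["c"])

print(by_index.index.tolist())   # [0, 2]
print(by_subset.index.tolist())  # [0, 2]
```

The sketch uses Series.isna() instead of np.isnan(), since np.isnan() is meant for numeric data and, as far as I know, raises a TypeError on an object column of strings such as 空气质量.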
# Drop rows where both c and d are NaN (only row 3)
df6 = df.dropna(axis=0, how='all', subset=['c', 'd'])
df6
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
6 | 24 | 25 | NaN | 27.0 | 2023 | 08-31 | 2023-08-31 | 24° | 20° | 优 |
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN |
# Select the rows where column d is NaN
df_nan = df[df['d'].isna()]
df_nan
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
3 | 12 | 13 | NaN | NaN | 2023 | 08-28 | 2023-08-28 | 19° | 17° | 优 |
# Select the rows where column d is not NaN
df_notnan = df[~df['d'].isna()]
df_notnan
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
6 | 24 | 25 | NaN | 27.0 | 2023 | 08-31 | 2023-08-31 | 24° | 20° | 优 |
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN |
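Negating isna() with ~ works, and pandas also provides notna() for the same mask. A short sketch on a made-up Series (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([3.0, np.nan, 7.0], name="d")

mask_missing = s.isna()   # True where the value is missing
mask_present = s.notna()  # True where the value is present

print(mask_present.tolist())                  # [True, False, True]
print((mask_present == ~mask_missing).all())  # True
print(s[mask_present].tolist())               # [3.0, 7.0]
```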
df9 = df.fillna('')  # replace NaN with an empty string
df9
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2.0 | 3.0 | 2023 | 08-25 | 2023-08-25 | 15° | 10° | 优 |
1 | 2 | 5 | 2.0 | 7.0 | 2023 | 08-26 | 2023-08-26 | 16° | 11° | 良 |
2 | 8 | 9 | 10.0 | 11.0 | 2023 | 08-27 | 2023-08-27 | 20° | 18° | 优 |
3 | 12 | 13 | | | 2023 | 08-28 | 2023-08-28 | 19° | 17° | 优 |
4 | 16 | 17 | 18.0 | 19.0 | 2023 | 08-29 | 2023-08-29 | 20° | 10° | 差 |
5 | 20 | 21 | 22.0 | 23.0 | 2023 | 08-30 | 2023-08-30 | 22° | 18° | 良 |
6 | 24 | 25 | | 27.0 | 2023 | 08-31 | 2023-08-31 | 24° | 20° | 优 |
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | |
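Filling every column with '' mixes strings into the numeric columns, which most likely turns them into object columns. One alternative, sketched here on a made-up frame (`toy`, `quality`, and the fill values are assumptions), is to pass a dict to fillna() so each column gets a value of its own type:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one numeric column, one text column
toy = pd.DataFrame({
    "c": [2.0, np.nan, 18.0],
    "quality": ["good", "fair", np.nan],
})

# A dict maps column name -> fill value for that column
filled = toy.fillna({"c": 0.0, "quality": "unknown"})

print(filled["c"].tolist())        # [2.0, 0.0, 18.0]
print(filled["quality"].tolist())  # ['good', 'fair', 'unknown']
print(filled["c"].dtype)           # float64
```

Column c stays float64, so later numeric operations on it keep working.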
# Select rows where the 高温 string has 2 or fewer characters, or 空气质量 is missing
nan_df = df[(df['高温'].str.len() <= 2) | (df['空气质量'].isna())]
nan_df
| | a | b | c | d | year | date | ydate | 高温 | 低温 | 空气质量 |
---|---|---|---|---|---|---|---|---|---|---|
7 | 28 | 29 | 30.0 | 31.0 | 2023 | 09-01 | 2023-09-01 | 23° | 17° | NaN |