The dropna() function removes missing values from a DataFrame. The examples here are taken directly from the official documentation.
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
Drop the columns that contain missing values.
df.dropna(axis='columns')  # equivalent to df.dropna(axis=1)
       name
0    Alfred
1    Batman
2  Catwoman
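The row-wise and column-wise calls above can be compared side by side. This is a minimal sketch that rebuilds the same df; the variable names rows_kept, cols_kept and cols_kept_alias are made up for illustration and are not from the documentation.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Default (axis=0): drop every row that has at least one missing value
rows_kept = df.dropna()

# axis=1 and axis='columns' are two spellings of the same thing:
# drop every column that has at least one missing value
cols_kept = df.dropna(axis=1)
cols_kept_alias = df.dropna(axis="columns")

print(rows_kept.index.tolist())         # [1]
print(cols_kept.columns.tolist())       # ['name']
print(cols_kept.equals(cols_kept_alias))  # True
```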
>>> df['sex'] = [np.nan,np.nan,np.nan]
>>> df
       name        toy       born  sex
0    Alfred        NaN        NaT  NaN
1    Batman  Batmobile 1940-04-25  NaN
2  Catwoman   Bullwhip        NaT  NaN
>>> df.dropna(how='all',axis=1)
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
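The session above relies on the how parameter: with how='all' a column is dropped only when every value in it is missing, while the default how='any' drops it as soon as one value is missing. A small sketch of the difference, again rebuilding df from scratch (any_cols and all_cols are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})
# A column in which every value is missing
df["sex"] = [np.nan, np.nan, np.nan]

# how='any' (the default): a single missing value is enough to drop the column
any_cols = df.dropna(axis=1, how="any")

# how='all': only columns that are entirely missing are dropped
all_cols = df.dropna(axis=1, how="all")

print(any_cols.columns.tolist())  # ['name']
print(all_cols.columns.tolist())  # ['name', 'toy', 'born']
```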
>>> df.dropna(subset=['name', 'born'])
     name        toy       born
1  Batman  Batmobile 1940-04-25
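With subset, only the listed columns are checked for missing values; a missing value in any other column does not cause the row to be dropped. One way to see this is to compare it with an explicit boolean mask. This is only a sketch of the idea, not how pandas implements it, and the names by_subset, mask and by_mask are made up here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Only 'name' and 'born' are inspected; 'toy' is ignored
by_subset = df.dropna(subset=["name", "born"])

# Equivalent filter written by hand: keep rows where both columns are present
mask = df[["name", "born"]].notna().all(axis=1)
by_mask = df[mask]

print(by_subset.index.tolist())  # [1]
print(by_mask.index.tolist())    # [1]
```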
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>>
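By default dropna() returns a new DataFrame and leaves the original untouched. To keep the result, either assign it to a new variable or set the inplace parameter to True to modify the original df (many DataFrame methods accept this parameter).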
>>> df2 = df.dropna()
>>> df2
     name        toy       born
1  Batman  Batmobile 1940-04-25
>>>
>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
>>>
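The two approaches can be checked in a short script. This is a minimal sketch under the same sample data; it only looks at row counts, and the names df2, rows_before and rows_after are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Approach 1: assign the result; the original df keeps all its rows
df2 = df.dropna()
rows_before = len(df)
print(rows_before, len(df2))  # 3 1

# Approach 2: inplace=True changes df itself.
# The call is made for its side effect, so its return value is not used here.
df.dropna(inplace=True)
rows_after = len(df)
print(rows_after)  # 1
```

Assigning to a new variable is usually the easier option to reason about, since the original data stays available for later steps.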