Pandas —— 处理缺失数据dropna( )和fillna( )

dropna( )

对于Serial对象

丢弃带有NAN的所有项

In [152]: data=pd.Series([1,np.nan,5,np.nan])

In [153]: data
Out[153]:
0    1.0
1    NaN
2    5.0
3    NaN
dtype: float64

In [154]: data.dropna()
Out[154]:
0    1.0
2    5.0
dtype: float64

对于DataFrame对象

丢弃带有NAN的行

In [19]: data=pd.DataFrame([[1,5,9,np.nan],[np.nan,3,7,np.nan],[6,np.nan,2,np.nan]
    ...: ,[np.nan,np.nan,np.nan,np.nan],[1,2,3,np.nan]])

In [20]: data
Out[20]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  NaN  3.0  7.0 NaN
2  6.0  NaN  2.0 NaN
3  NaN  NaN  NaN NaN
4  1.0  2.0  3.0 NaN

In [21]: data.dropna()
Out[21]:
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []

丢弃所有元素都是NAN的行

In [22]: data.dropna(how='all')
Out[22]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  NaN  3.0  7.0 NaN
2  6.0  NaN  2.0 NaN
4  1.0  2.0  3.0 NaN

丢弃所有元素都是NAN的列


In [23]: data.dropna(axis=1,how='all')
Out[23]:
     0    1    2
0  1.0  5.0  9.0
1  NaN  3.0  7.0
2  6.0  NaN  2.0
3  NaN  NaN  NaN
4  1.0  2.0  3.0

只保留至少有3个非NAN值的行

In [24]: data.dropna(thresh=3)
Out[24]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
4  1.0  2.0  3.0 NaN

fillna( )

以常数替换NAN值

In [25]: data.fillna(0)
Out[25]:
     0    1    2    3
0  1.0  5.0  9.0  0.0
1  0.0  3.0  7.0  0.0
2  6.0  0.0  2.0  0.0
3  0.0  0.0  0.0  0.0
4  1.0  2.0  3.0  0.0

后向填充

In [27]: data.fillna(method='ffill')
Out[27]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  1.0  3.0  7.0 NaN
2  6.0  3.0  2.0 NaN
3  6.0  3.0  2.0 NaN
4  1.0  2.0  3.0 NaN

后项填充且可以连续填充的最大数量为1

In [28]: data.fillna(method='ffill',limit=1)
Out[28]:
     0    1    2   3
0  1.0  5.0  9.0 NaN
1  1.0  3.0  7.0 NaN
2  6.0  3.0  2.0 NaN
3  6.0  NaN  2.0 NaN
4  1.0  2.0  3.0 NaN
方法 说明
dropna 对缺失的数据进行过滤
fillna 用指定值或插值的方法填充缺失数据
isnull 判断数据是否缺失
notnull isnull的否定式

你可能感兴趣的:(Pandas)