Python Pandas (Part 4): Handling Missing Data

Code:

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
# Set two cells to NaN; columns B and C are upcast to float as a result
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan
print(df)

Output:

             A     B     C   D
2013-01-01   0   NaN   2.0   3
2013-01-02   4   5.0   NaN   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23

Code:

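# Method 1: drop the rows (axis=0) or columns (axis=1) that contain NaN
# how='any' drops on a single NaN; how='all' drops only if every value is NaN
print(df.dropna(axis=0, how='any'))
# Method 2: fill every NaN with a given value
print(df.fillna(value=1))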

Output:

             A     B     C   D
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
             A     B     C   D
2013-01-01   0   1.0   2.0   3
2013-01-02   4   5.0   1.0   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
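The `how` parameter above accepts both 'any' and 'all', and `axis` selects rows or columns. The following is a minimal sketch of those variations, plus filling each column with its own value by passing a dict to `fillna`. It rebuilds the same DataFrame so that it runs on its own; creating the data as float and the particular fill values (0 and -1) are assumptions made for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Assumption: build the frame as float so assigning NaN needs no dtype change
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# how='all' drops a row only when every value in it is NaN; no row qualifies here
rows_all = df.dropna(axis=0, how='all')
# axis=1 drops columns instead of rows; B and C each contain one NaN
cols_any = df.dropna(axis=1, how='any')
# A dict maps column names to fill values (illustrative values)
filled = df.fillna(value={'B': 0, 'C': -1})

print(rows_all.shape)
print(list(cols_any.columns))
print(filled.iloc[0, 1], filled.iloc[1, 2])
```

Note that `dropna` and `fillna` both return a new DataFrame and leave `df` unchanged, which is why the results are assigned to new names here.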

Code:

# Check whether any values are missing
print(df.isnull())
# Reduce the boolean frame to a single True/False
print(df.isnull().values.any())

Output:

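                A      B      C      D
2013-01-01  False   True  False  False
2013-01-02  False  False   True  False
2013-01-03  False  False  False  False
2013-01-04  False  False  False  False
2013-01-05  False  False  False  False
2013-01-06  False  False  False  False
True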
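A single True says that something is missing, but not where or how much. As a rough sketch, the same boolean frame can be summed to count the missing values per column, or used to select the affected rows. The variable names below are made up for illustration, and the DataFrame is rebuilt so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
# Assumption: build the frame as float so assigning NaN needs no dtype change
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# True counts as 1, so summing the boolean frame counts NaN per column
missing_per_column = df.isnull().sum()
# any(axis=1) flags each row that contains at least one NaN
rows_with_nan = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_nan)
```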
