[Learning Pandas with Stack Overflow] - Dropping rows that contain NaN

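I have recently been writing a blog series called Learning Pandas with Stack Overflow.

Series page: http://blog.csdn.net/column/details/16726.html

I searched Stack Overflow for the pandas tag and then sorted the questions by number of votes: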
https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15

How to drop rows of Pandas DataFrame whose value in certain columns is NaN

Preparing the data

We generate a random DataFrame with 10 rows and 3 columns, then set some of its values to NaN.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10,3), columns=['col1', 'col2', 'col3'])

df.iloc[::2,0] = np.nan
df.iloc[::4,1] = np.nan
df.iloc[::3,2] = np.nan
print(df)

#        col1      col2      col3
# 0       NaN       NaN       NaN
# 1 -0.498336 -0.960804  0.705309
# 2       NaN -2.120032  2.123329
# 3  0.791883 -0.283840       NaN
# 4       NaN       NaN -1.241788
# 5 -0.399644 -0.968515 -1.509056
# 6       NaN  0.897637       NaN
# 7  1.826128  1.015091 -0.497022
# 8       NaN       NaN -1.889871
# 9  0.379287 -1.762229       NaN

pandas.notnull

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.notnull.html

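It accepts a Series or a DataFrame (as well as scalars and other array-like objects).

pandas.notnull is documented as a replacement for numpy.isfinite / -numpy.isnan that is also suitable for object arrays. The two are not identical: np.isfinite also returns False for inf, whereas pd.notnull treats inf as a valid value by default.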

pd.notnull(df['col1'])

# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# 5     True
# 6    False
# 7     True
# 8    False
# 9     True
# Name: col1, dtype: bool

print(pd.notnull(df))

#     col1   col2   col3
# 0  False  False  False
# 1   True   True   True
# 2  False   True   True
# 3   True   True  False
# 4  False  False   True
# 5   True   True   True
# 6  False   True  False
# 7   True   True   True
# 8  False  False   True
# 9   True   True  False
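The boolean Series returned by pd.notnull can be used directly as a row filter. The following is a minimal sketch on a small hand-written frame; the name df_demo and its values are illustrative assumptions and are not part of the data above.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, written by hand so the result is deterministic.
df_demo = pd.DataFrame({
    'col1': [np.nan, 1.0, np.nan, 4.0],
    'col2': [np.nan, 2.0, 3.0, np.nan],
})

# True where col1 holds a value, False where it is NaN.
mask = pd.notnull(df_demo['col1'])

# Boolean indexing keeps only the rows where the mask is True.
kept = df_demo[mask]
print(kept)
```

Rows whose col1 is NaN are removed, while NaN values in col2 are left alone because only col1 was tested.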

np.isfinite / numpy.isnan

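np.isfinite tests each value and returns True where it is finite (neither NaN nor infinite). We can combine the boolean results for different columns to select exactly the rows we want.
numpy.isnan tests whether each value is NaN.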

np.isfinite(df['col1'])

# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# 5     True
# 6    False
# 7     True
# 8    False
# 9     True
# Name: col1, dtype: bool

df1 = df[np.isfinite(df['col1'])]
print(df1)

#        col1      col2      col3
# 1 -0.498336 -0.960804  0.705309
# 3  0.791883 -0.283840       NaN
# 5 -0.399644 -0.968515 -1.509056
# 7  1.826128  1.015091 -0.497022
# 9  0.379287 -1.762229       NaN
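Combining the boolean results of several columns, as mentioned above, might look like the sketch below. The frame df_demo and its values are made up for illustration, and an inf value is included to show where np.isfinite and np.isnan disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data containing both NaN and inf.
df_demo = pd.DataFrame({
    'col1': [1.0, np.nan, 3.0, np.inf],
    'col2': [np.nan, 2.0, 3.0, 4.0],
})

# Element-wise AND of two masks: keep rows where both columns are finite.
both_finite = np.isfinite(df_demo['col1']) & np.isfinite(df_demo['col2'])
rows_both_finite = df_demo[both_finite]
print(rows_both_finite)

# np.isnan flags NaN only, so negating it with ~ keeps the inf row as well.
not_nan = ~np.isnan(df_demo['col1'])
rows_not_nan = df_demo[not_nan]
print(rows_not_nan)
```

Use & for "and" and | for "or" between masks; Python's own and / or keywords do not work element-wise on a Series.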

dropna

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

dropna accepts several parameters:

axis : {0 or ‘index’, 1 or ‘columns’}, or tuple/list thereof
Pass tuple or list to drop on multiple axes (deprecated since pandas 0.23; current releases accept only a single axis)

how : {‘any’, ‘all’}
any : if any NA values are present, drop that label
all : if all values are NA, drop that label

thresh : int, default None
int value : require that many non-NA values

subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include

inplace : boolean, default False
If True, do operation inplace and return None.
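As a rough illustration of the thresh and axis parameters listed above, here is a small sketch. The frame df_demo and its column names are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: row 0 has 3 values, row 1 has 2, row 2 has 1.
df_demo = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [2.0, 5.0, np.nan],
    'c': [3.0, 6.0, 7.0],
})

# thresh=2: keep only the rows that have at least 2 non-NA values.
at_least_two = df_demo.dropna(thresh=2)
print(at_least_two)

# axis=1: look at columns instead of rows and drop every column containing NaN.
full_columns = df_demo.dropna(axis=1)
print(full_columns)

# inplace defaults to False, so df_demo itself is left unchanged.
print(df_demo.shape)
```

Since inplace is False by default, each call returns a new DataFrame that has to be assigned to a name if the result should be kept.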

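# By default, dropna() drops every row that contains at least one NaN.
# (The values below differ from the earlier table, presumably because the random data was regenerated.)
print(df.dropna())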

#        col1      col2      col3
# 1  1.944899 -1.792510 -0.612904
# 5 -0.609380  1.087689 -1.145582
# 7 -2.045037  1.043837  0.429135

print(df.dropna(how='all'))  # drop only the rows in which every value is NaN
#        col1      col2      col3
# 1  1.944899 -1.792510 -0.612904
# 2       NaN  0.780487 -1.239197
# 3 -1.050320 -0.121033       NaN
# 4       NaN       NaN -0.537213
# 5 -0.609380  1.087689 -1.145582
# 6       NaN -0.721761       NaN
# 7 -2.045037  1.043837  0.429135
# 8       NaN       NaN -0.096989
# 9  1.514520  0.224193       NaN
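The original question asks about NaN in certain columns only, which is what the subset parameter is for. Below is a minimal sketch on a hand-written frame; df_demo and its values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with NaN scattered across three columns.
df_demo = pd.DataFrame({
    'col1': [np.nan, 1.0, np.nan, 4.0],
    'col2': [np.nan, 2.0, 3.0, np.nan],
    'col3': [np.nan, np.nan, 5.0, 6.0],
})

# Drop rows where col1 is NaN; NaN in other columns is ignored.
by_col1 = df_demo.dropna(subset=['col1'])
print(by_col1)

# Drop rows where col1 or col2 is NaN (how='any' is the default).
by_any = df_demo.dropna(subset=['col1', 'col2'])
print(by_any)

# Drop rows only when col2 and col3 are both NaN.
by_all = df_demo.dropna(subset=['col2', 'col3'], how='all')
print(by_all)
```

Compared with building a mask from pd.notnull or np.isfinite, subset keeps the intent in a single call and combines naturally with how.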

For more details, see the official dropna documentation.
