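In a computer, data is never absolutely equal; so-called equality only holds within the precision the machine allows! np.nan stands for "not a number". Strictly speaking, NaN's self-inequality is not a precision issue: the IEEE 754 floating-point standard defines NaN to compare unequal to every value, itself included.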
Summary:
import numpy as np
c = np.array([1., 2., np.nan, 3., 4.])
np.isnan(c)
>>> array([False, False, True, False, False])
np.nan in c  # you cannot use `in` to check whether an ndarray contains np.nan
>>> False
np.nan == np.nan
>>> False
# because NaN propagates through reductions such as min and sum:
np.min(c)
>>> nan
np.sum(c)
>>> nan
# so the following also tells whether the array contains np.nan:
np.isnan(np.min(c))
>>> True
np.isnan(np.sum(c))
>>> True
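The min/sum trick works because NaN propagates. A more direct check is to reduce the isnan mask itself. The sketch below uses only standard NumPy calls (np.isnan, ndarray.any, np.flatnonzero). It also shows np.nanmin and np.nansum, the NaN-ignoring counterparts of the reductions above.

```python
import numpy as np

c = np.array([1., 2., np.nan, 3., 4.])

mask = np.isnan(c)                  # element-wise boolean mask
has_nan = mask.any()                # True if at least one element is NaN
nan_count = mask.sum()              # booleans sum as 0/1
nan_positions = np.flatnonzero(mask)

print(has_nan, nan_count, nan_positions)  # True 1 [2]

# NaN-ignoring versions of min and sum skip the NaN instead of returning it
print(np.nanmin(c), np.nansum(c))         # 1.0 10.0
```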
np.nan + 3
>>> nan
a = np.array([1,2,3,np.nan])
np.isnan(a)
>>> array([False, False, False, True])
a = np.array([1,2,3,None])
np.isnan(a)  # TypeError (an error, not a warning): None is not NaN; the array's dtype is object, which isnan does not support
>>> TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
a = np.array(['1','2','3',np.nan])
np.isnan(a)  # TypeError: isnan cannot operate on a string array (np.nan was stored as the string 'nan')
>>> TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
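Both errors come from the array's dtype rather than from isnan itself. This sketch shows two workarounds, assuming the goal is a float array:

- Request dtype=float when building the array. NumPy then stores None as NaN.
- Parse a string array with astype(float).

pd.isna, by contrast, accepts an object array directly.

```python
import numpy as np
import pandas as pd

# Ask for a float dtype up front: None is stored as NaN
a_obj = np.array([1, 2, 3, None], dtype=float)
obj_mask = np.isnan(a_obj)

# In a string array np.nan became the string 'nan'; astype(float) parses it back
a_str = np.array(['1', '2', '3', np.nan])
a_num = a_str.astype(float)
str_mask = np.isnan(a_num)

# pd.isna handles object arrays and treats None as missing
pd_mask = pd.isna(np.array([1, 2, 3, None], dtype=object))

print(obj_mask.tolist(), str_mask.tolist(), pd_mask.tolist())
```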
np.nan == np.nan
>>> False
np.isnan(np.nan)
>>> True
np.nan is None
>>> False
type(np.nan)
>>> float
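Because type(np.nan) is plain float, the IEEE 754 rule that NaN is unequal to itself applies directly. That gives a classic, dependency-free NaN test. The helper name in this sketch is hypothetical. In real code, math.isnan or np.isnan states the intent more clearly.

```python
import math
import numpy as np

def is_nan_by_self_compare(x: float) -> bool:
    """Hypothetical helper: NaN is the only float value unequal to itself."""
    return x != x

values = [1.0, float("inf"), np.nan, float("nan"), math.nan]
flags = [is_nan_by_self_compare(v) for v in values]
print(flags)                                      # [False, False, True, True, True]
print(flags == [math.isnan(v) for v in values])   # True
```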
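import numpy as np
from pandas import DataFrame
df1 = DataFrame(data=np.random.randint(0, 20, size=(5, 5)), columns=list("abcde"))
df1.loc[1, "b"] = None  # assign through .loc; chained df1["b"].iloc[1] = None may not write back to df1
df1.loc[2, "c"] = None
df1.loc[2, "d"] = np.nan  # explicitly assigning np.nan (not a great habit)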
df1
>>>
a b c d e
0 19 12.0 18.0 14.0 11
1 11 NaN 4.0 13.0 6
2 5 5.0 NaN NaN 0
3 9 3.0 8.0 13.0 13
4 5 2.0 9.0 8.0 6
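As you can see, both None and np.nan end up as NaN. Columns b, c and d originally held integers, and they are upcast to float64 so that they can store it.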
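Whether None becomes NaN depends on the column's dtype. In this minimal sketch, numeric data turns None into NaN and is stored as float64, whereas an object column keeps None as-is. dtype=object is passed explicitly so the result does not depend on how a given pandas version infers string columns. isna() flags both kinds of missing value either way.

```python
import numpy as np
import pandas as pd

# Numeric data: None becomes NaN and the integers are stored as float64
num = pd.Series([1, None, 3])
print(num.dtype)               # float64
print(num.isna().tolist())     # [False, True, False]

# Object data: None is kept as the None object, np.nan stays NaN
obj = pd.Series(["x", None, np.nan], dtype=object)
print(obj[1] is None)          # True
print(obj.isna().tolist())     # [False, True, True]
```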
id(np.nan)
>>> 2345077159808
id(np.nan)
>>> 2345077159808
id(np.nan) == id(np.nan)
>>> True
np.nan is np.nan
>>> True
np.nan == np.nan
>>> False
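The combination "np.nan is np.nan is True, but np.nan == np.nan is False" has a visible side effect. Python containers check identity before equality, whereas ndarray membership uses only ==. A short sketch:

```python
import numpy as np

# list membership tries `is` first, so the very same NaN object is found
same_obj = np.nan in [np.nan]
# a different NaN object is not found, because NaN == NaN is False
other_obj = float("nan") in [np.nan]
# ndarray membership only uses ==, so even np.nan itself is not found
in_array = np.nan in np.array([np.nan])

print(same_obj, other_obj, in_array)   # True False False
```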
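Nullable types are pandas extension types that represent missing scalar values without changing the data type.
The three missing-value markers NaN, None and NaT overlap and behave inconsistently in both function and type. Nullable types are designed to provide a single missing-value indicator that can be used consistently. Rather than NaN, None or NaT being converted differently from case to case, all three are replaced by the unified NA marker, and the dtype is left unchanged.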
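Here is a minimal sketch with the nullable integer dtype "Int64", one of pandas' nullable extension dtypes. By default, the same values would force a float64 column. With "Int64", the column keeps an integer dtype, and both None and np.nan are stored as pd.NA.

```python
import numpy as np
import pandas as pd

# Default behaviour: a missing value forces the integers to float64
plain = pd.Series([1, None, 3])
print(plain.dtype)                 # float64

# Nullable "Int64": the data stays integer, None and np.nan both become pd.NA
nullable = pd.Series([1, None, np.nan, 3], dtype="Int64")
print(nullable.dtype)              # Int64
print(nullable[1] is pd.NA)        # True
print(nullable.isna().tolist())    # [False, True, True, False]

# pd.NA propagates through comparisons instead of returning False
print(pd.NA == pd.NA)              # <NA>
```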
df = pd.DataFrame({
'A':[1,5,3,2],'B':[2,np.nan,4,0],'C':[3,5,np.nan,1]})
df
>>>
A B C
0 1 2.0 3.0
1 5 NaN 5.0
2 3 4.0 NaN
3 2 0.0 1.0
df.sort_values(by=['B'])  # returns a new sorted DataFrame; df itself is unchanged
df
>>>
A B C
0 1 2.0 3.0
1 5 NaN 5.0
2 3 4.0 NaN
3 2 0.0 1.0
df.sort_values(by=['B'], inplace=True)  # sorts df itself; NaN goes last by default (na_position='last')
df
>>>
A B C
3 2 0.0 1.0
0 1 2.0 3.0
2 3 4.0 NaN
1 5 NaN 5.0
df.sort_values(by=['B'], inplace=True, na_position='first')  # na_position='first' moves NaN to the top
df
>>>
A B C
1 5 NaN 5.0
3 2 0.0 1.0
0 1 2.0 3.0
2 3 4.0 NaN
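This sketch, on the same data, shows two more points about NaN and sort_values:

- na_position does not depend on the sort direction, so with ascending=False the NaN row is still placed last.
- Assigning the returned frame back is the usual alternative to inplace=True. Here ignore_index=True (available since pandas 1.0) renumbers the rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 3, 2], 'B': [2, np.nan, 4, 0], 'C': [3, 5, np.nan, 1]})

# Descending sort: NaN is still last because na_position defaults to 'last'
desc = df.sort_values(by='B', ascending=False)
print(desc.index.tolist())     # [2, 0, 3, 1]

# Reassign instead of inplace=True; NaN first, index renumbered 0..n-1
df = df.sort_values(by='B', na_position='first', ignore_index=True)
print(df['A'].tolist())        # [5, 2, 1, 3]
```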
[Figure: math.py source code]
np.nan is np.nan
>>> True
np.nan == np.nan
>>> False
math.nan is math.nan
>>> True
math.nan == math.nan
>>> False
np.nan is math.nan
>>> False
np.nan == math.nan  # not equal: NaN never compares equal, even to another NaN
>>> False
np.isnan(math.nan)
>>> True
np.isnan(np.nan)
>>> True
type(math.nan)
>>> float
type(np.nan)
>>> float
math.isnan(-1000000000000000000000.11)
>>> False
np.isnan(-10000000000000000000000.11)
>>> False
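math.nan and np.nan are different objects that never compare equal, so the robust check is always an isnan function. In this sketch, math.isnan handles one scalar at a time, while np.isnan is vectorised over arrays.

```python
import math
import numpy as np

# Three spellings of NaN: not equal to one another, but all detected by isnan
nans = [math.nan, np.nan, float("nan")]
print(nans[0] == nans[1])                # False
print([math.isnan(x) for x in nans])     # [True, True, True]

# np.isnan works element-wise on a whole array
mask = np.isnan(np.array([1.0, math.nan, -1e21]))
print(mask.tolist())                     # [False, True, False]
```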
First, look at the following lines:
pd.isnull
>>> <function pandas.core.dtypes.missing.isna(obj)>
pd.isna
>>> <function pandas.core.dtypes.missing.isna(obj)>
pd.isnull == pd.isna
>>> True
Why?
Start with the change made in pandas 0.21. The official release notes say:
In order to promote more consistency among the pandas API, we have
added additional top-level functions isna() and notna() that are
aliases for isnull() and notnull(). The naming scheme is now more
consistent with methods like .dropna() and .fillna(). Furthermore in
all cases where .isnull() and .notnull() methods are defined, these
have additional methods named .isna() and .notna(), these are included
for classes Categorical, Index, Series, and DataFrame. (GH15001).
The configuration option pd.options.mode.use_inf_as_null is deprecated,
and pd.options.mode.use_inf_as_na is added as a replacement.
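The documentation is clear: isna() and notna() are aliases of isnull() and notnull(), and they are used in exactly the same way.
Note: pandas versions before 0.21 do not have isna or notna. On such an old version you get AttributeError: module 'pandas' has no attribute 'isna'.
Upgrading with pip install --upgrade pandas fixes this.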
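The aliasing can be checked directly, because the top-level functions are the same function objects. For code that must also run on pandas older than 0.21, a fallback like the one below is one option. The module-level names isna and notna here are illustrative, not part of pandas.

```python
import pandas as pd

# Since 0.21 the old and new top-level names point to the same functions
print(pd.isnull is pd.isna)      # True
print(pd.notnull is pd.notna)    # True

# Hypothetical compatibility shim: prefer the new names, fall back to the old ones
isna = getattr(pd, "isna", pd.isnull)
notna = getattr(pd, "notna", pd.notnull)
print(isna(None), notna(None))   # True False
```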
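Some further background:
So why are there two differently named methods that do the same thing?
In short, pandas' DataFrame was modeled on R's data.frame, and R distinguishes NULL from NA, as its documentation explains: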
NULL represents the null object in R: it is a reserved word. NULL is
often returned by expressions and functions whose values are
undefined.
NA is a logical constant of length 1 which contains a missing value
indicator. NA can be freely coerced to any other vector type except
raw. There are also constants NA_integer_, NA_real_, NA_complex_ and
NA_character_ of the other atomic vector types which support missing
values: all of these are reserved words in the R language.
df_f = pd.DataFrame({
'A':[1,3,np.nan],'B':[2,4,np.nan],'C':[3,5,np.nan]})
df_f
>>>
A B C
0 1.0 2.0 3.0
1 3.0 4.0 5.0
2 NaN NaN NaN
pd.DataFrame.isna(df_f)
>>>
A B C
0 False False False
1 False False False
2 True True True
pd.isna(df_f)
>>>
A B C
0 False False False
1 False False False
2 True True True
df_f.isna()
>>>
A B C
0 False False False
1 False False False
2 True True True
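Because isna() returns a boolean DataFrame, ordinary reductions turn it into missing-value counts. Here is a short sketch on df_f that also confirms notna() is the negation of isna():

```python
import numpy as np
import pandas as pd

df_f = pd.DataFrame({'A': [1, 3, np.nan], 'B': [2, 4, np.nan], 'C': [3, 5, np.nan]})

per_column = df_f.isna().sum()                  # True counts as 1
rows_with_nan = df_f.isna().any(axis=1).sum()   # rows containing at least one NaN
print(per_column.tolist())                      # [1, 1, 1]
print(int(rows_with_nan))                       # 1

# notna() is the element-wise negation of isna()
print(df_f.notna().equals(~df_f.isna()))        # True
```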
pd.DataFrame.isna('dog')  # TypeError: the argument must be a DataFrame
>>> TypeError: super(type, obj): obj must be an instance or subtype of type
pd.isna('dog')
>>> False
pd.isna('')
>>> False
pd.isna([])
>>> array([], dtype=bool)
pd.isna([np.nan, 1, 3])
>>> array([ True, False, False])
index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None,"2017-07-08"])
index
>>> DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'], dtype='datetime64[ns]', freq=None)
pd.isna(index)
>>> array([False, False, True, False])
pd.isna(None)
>>> True
pd.isna(np.nan)
>>> True
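This short sketch summarises which scalars pd.isna treats as missing. None, np.nan, pd.NaT and pd.NA all count. Infinities, the empty string and zero do not, assuming the default options under which infinity is not treated as NA.

```python
import numpy as np
import pandas as pd

# Every pandas/NumPy missing-value marker is reported as missing
markers = [None, np.nan, pd.NaT, pd.NA]
missing_flags = [bool(pd.isna(m)) for m in markers]
print(missing_flags)    # [True, True, True, True]

# Ordinary values that only look "empty" are not
others = [np.inf, -np.inf, "", 0]
other_flags = [bool(pd.isna(o)) for o in others]
print(other_flags)      # [False, False, False, False]
```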