pandas Basics: Partial Value Replacement and Missing-Value Handling

1. Partial value replacement

replace(to_replace=None, value=None, regex=False, inplace=False) 

  1. Direct form: to_replace and value are used together to map a single old value to a new one: to_replace = old, value = new
  2. Equal-length lists (paired element by element): to_replace = [old1, old2], value = [new1, new2]
  3. List -> single value: to_replace = [old1, old2], value = new
  4. Dicts keyed by the same columns: to_replace = {column1: old1, column2: old2}, value = {column1: new1, column2: new2}
  5. Dict -> single value: to_replace = {column1: old1, column2: old2}, value = new
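Forms 1, 3 and 5 can be sketched on a small frame like this. It is a minimal illustration: the frame `df` and the replacement values 100, 0 and -1 are arbitrary choices, not taken from the original post.

```python
import pandas as pd

df = pd.DataFrame({'qu1': [1, 7, 41, 3, 4],
                   'qu2': [1, 9, 4, 37, 4]})

# Form 1: one old value -> one new value, in every column
r1 = df.replace(to_replace=1, value=100)

# Form 3: several old values -> the same new value
r3 = df.replace(to_replace=[1, 4], value=0)

# Form 5: {column: old value} -> one new value; only the named column/value pairs match
r5 = df.replace(to_replace={'qu1': 7, 'qu2': 9}, value=-1)

print(r1, r3, r5, sep='\n\n')
```

Without inplace=True, each call returns a new frame and `df` itself is left unchanged.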
import pandas as pd
import numpy as np
data=pd.DataFrame({'qu1':[1,7,41,3,4],
                   'qu2':[1,9,4,37,4],
                   'qu3':[1,12,25,3,37]})
print(data)
data.replace(to_replace=[1,4],value=['replace1','replace4'],inplace=True)
print(data)



out:
   qu1  qu2  qu3
0    1    1    1
1    7    9   12
2   41    4   25
3    3   37    3
4    4    4   37
        qu1       qu2       qu3
0  replace1  replace1  replace1
1         7         9        12
2        41  replace4        25
3         3        37         3
4  replace4  replace4        37



data1=pd.DataFrame({'qu1':[1,7,41,3,4],
                   'qu2':[1,9,4,37,4],
                   'qu3':[1,12,25,3,37]})
print(data1)
data1.replace(to_replace={'qu1':1,'qu2':9},value={'qu1':'replace1','qu2':'replace9'},inplace=True)
print(data1)


out:
   qu1  qu2  qu3
0    1    1    1
1    7    9   12
2   41    4   25
3    3   37    3
4    4    4   37
        qu1       qu2  qu3
0  replace1         1    1
1         7  replace9   12
2        41         4   25
3         3        37    3
4         4         4   37
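A note on inplace: the examples above pass inplace=True, which modifies the frame directly and makes replace() return None. A minimal sketch contrasting it with the default inplace=False (the frames `data2` and `data3` are illustrative):

```python
import pandas as pd

# inplace=True: the frame itself changes and the call returns None
data2 = pd.DataFrame({'qu1': [1, 7, 41, 3, 4]})
ret = data2.replace(1, 'replace1', inplace=True)
print(ret)  # None

# inplace=False (default): the original is untouched and a new frame is returned
data3 = pd.DataFrame({'qu1': [1, 7, 41, 3, 4]})
new = data3.replace(1, 'replace1')
print(data3['qu1'].tolist())
print(new['qu1'].tolist())
```

Assigning the result (`data3 = data3.replace(...)`) is the more common style, since it also works in method chains.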

2. Missing-value handling

  • Steps: first confirm where values are missing with isnull / notnull, then handle them with dropna / fillna.
  • isnull() returns an object of booleans indicating which values are NA; notnull() is its negation.
import pandas as pd
import numpy as np

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])
print(df_obj)
out:
  col1  col2  col3
0   1.0   6.5   3.0
1   4.6   NaN   2.4
2   NaN   NaN   3.9
3   NaN   8.5   NaN
print(df_obj.isnull())
out:
    col1   col2   col3
0  False  False  False
1  False   True  False
2   True   True  False
3   True  False   True
print(df_obj.notnull())
out:
    col1   col2   col3
0   True   True   True
1   True  False   True
2  False  False   True
3  False   True  False
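Because isnull() returns booleans, summing the mask counts the missing values, which is a quick way to do the confirmation step column by column. A small sketch on the same df_obj:

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])

# notnull() is the element-wise negation of isnull()
same = df_obj.notnull().equals(~df_obj.isnull())
print(same)  # True

# True counts as 1, so summing the mask gives the number of missing values per column
missing_per_col = df_obj.isnull().sum()
print(missing_per_col)

# Total number of missing cells in the whole frame
total_missing = int(missing_per_col.sum())
print(total_missing)  # 5
```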


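  • dropna() filters labels along the specified axis depending on whether they contain missing data; the thresh parameter adjusts how many missing values are tolerated.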
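The thresh rule can be checked by hand: a row survives dropna(thresh=n) when it has at least n non-NA values. This sketch rebuilds that mask with notnull().sum(axis=1) and compares it with dropna; it only illustrates the rule and is not how pandas implements it internally.

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])

# Count the non-NA values in each row: rows 0..3 have 3, 2, 1, 1
non_na_per_row = df_obj.notnull().sum(axis=1)

# Keep the rows that meet the threshold (here n = 2)
manual = df_obj[non_na_per_row >= 2]

# Same rows as dropna(thresh=2): rows 0 and 1
print(manual.equals(df_obj.dropna(thresh=2)))  # True
print(manual)
```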



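# how='any': drop a row/column if it contains any NaN; how='all': drop it only if every value is NaN.
# axis=0 operates on rows, axis=1 on columns; thresh=n keeps only rows/columns with at least n non-NA values;
# subset restricts which columns are checked.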
print(df_obj.dropna(how='any',axis=0))
out:
  col1  col2  col3
0   1.0   6.5   3.0

print(df_obj.dropna(how='all',subset=['col2']))
out:
 col1  col2  col3
0   1.0   6.5   3.0
3   NaN   8.5   NaN

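# thresh=2 keeps rows with at least 2 non-NA values (recent pandas versions raise a TypeError if how and thresh are passed together)
print(df_obj.dropna(thresh=2))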
out:
   col1  col2  col3
0   1.0   6.5   3.0
1   4.6   NaN   2.4
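The examples above drop rows (axis=0). A sketch of the column direction and of how='all' on the same frame:

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])

# axis=1 drops columns; every column here contains a NaN, so none survive
no_cols = df_obj.dropna(axis=1)
print(no_cols.shape)  # (4, 0)

# Keep only columns with at least 3 non-NA values: col1 and col2 have 2, col3 has 3
print(df_obj.dropna(axis=1, thresh=3))

# how='all' drops a row only when every value is NaN; no row here qualifies
print(len(df_obj.dropna(how='all')))  # 4
```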

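# With a dict, each key is a column label and its value fills that column's NaNs; columns not listed (col3) are left alone
print(df_obj.fillna({'col1': 9, 'col2': 8}))
out: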
  col1  col2  col3
0   1.0   6.5   3.0
1   4.6   8.0   2.4
2   9.0   8.0   3.9
3   9.0   8.5   NaN
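fillna() also accepts a scalar, or a Series whose index matches the column labels. Passing df_obj.mean() fills each column with its own mean; mean imputation is used here only as an illustration, not as the right strategy for every dataset.

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])

# A scalar fills every NaN in the frame
zero_filled = df_obj.fillna(0)
print(zero_filled)

# A Series is matched to columns by label; mean() skips NaN,
# giving col1 = 2.8, col2 = 7.5, col3 = 3.1 (up to float rounding)
col_means = df_obj.mean()
mean_filled = df_obj.fillna(col_means)
print(mean_filled)
```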

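# ffill propagates the last valid value downward (forward); bfill takes the next valid value from below (backward).
# limit=1 fills at most one consecutive NaN per gap. Older code writes this as fillna(method='ffill', limit=1), which pandas 2.1 deprecated.
print(df_obj.ffill(limit=1))
out:
   col1  col2  col3
0   1.0   6.5   3.0
1   4.6   6.5   2.4
2   4.6   NaN   3.9
3   NaN   8.5   3.9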
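The backward direction works the same way. A sketch comparing ffill and bfill on df_obj, using the ffill()/bfill() methods:

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])

# Forward fill: at most one NaN per gap is filled from the value above it
forward = df_obj.ffill(limit=1)

# Backward fill: each NaN takes the next valid value below it;
# trailing NaNs (nothing valid below them) stay NaN
backward = df_obj.bfill()

print(forward)
print(backward)
```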
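
# replace() on the float frame: 4.6 -> 'replace0', 3 (matches 3.0) -> '5a'; NaN values are left alone
print(df_obj.replace(to_replace=[4.6, 3], value=['replace0', '5a']))
out:
       col1  col2 col3
0       1.0   6.5   5a
1  replace0   NaN  2.4
2       NaN   NaN  3.9
3       NaN   8.5  NaN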
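Since replace() also matches NaN, the two halves of this post connect: replace(np.nan, 0) behaves like fillna(0), and the reverse direction turns a sentinel code into a real missing value. The -999 sentinel and the `score` column below are hypothetical.

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame([[1, 6.5, 3], [4.6, np.nan, 2.4], [np.nan, np.nan, 3.9],
                       [np.nan, 8.5, np.nan]], columns=['col1', 'col2', 'col3'])

# replace() matches NaN, so this produces the same frame as fillna(0)
print(df_obj.replace(np.nan, 0).equals(df_obj.fillna(0)))  # True

# Reverse direction: mark a hypothetical sentinel code (-999) as missing,
# then confirm and fill it like any other NaN
raw = pd.DataFrame({'score': [88, -999, 92, -999]})
clean = raw.replace(-999, np.nan)
print(clean.isnull().sum())
filled = clean.fillna(clean['score'].mean())  # mean of 88 and 92 is 90.0
print(filled)
```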
