When cleaning data, you will often run into missing values. One way to handle them is to drop them outright.
DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
0. Build an example
import pandas as pd
import numpy as np
df = pd.DataFrame({'name':['zhao','qian','sun','li'],
'mark':[150,122,np.nan,32],'gender':['female',np.nan,np.nan,'male']})
df
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
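Before dropping anything, it can help to see where the missing values are. The sketch below rebuilds the same example and counts NaN values per column and per row with `isna()`. This step is not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['zhao', 'qian', 'sun', 'li'],
                   'mark': [150, 122, np.nan, 32],
                   'gender': ['female', np.nan, np.nan, 'male']})

# Missing values per column (name, mark, gender)
per_column = df.isna().sum()
# Missing values per row: rows 1 and 2 contain at least one NaN
per_row = df.isna().sum(axis=1)

print(per_column.tolist())  # [0, 1, 2]
print(per_row.tolist())     # [0, 1, 2, 0]
```

Rows or columns with a non-zero count are the ones `dropna` will remove under the default `how='any'`.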
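1. axis: axis=0 or 'index' drops the rows that contain missing values; axis=1 or 'columns' drops the columns that contain missing values. The default is 0 (rows).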
df.dropna(axis=0)  # drop rows containing a missing value; rows 1 and 2 are removed
name mark gender
0 zhao 150.0 female
3 li 32.0 male
df.dropna(axis='columns')  # drop columns containing a missing value; mark and gender are both removed
name
0 zhao
1 qian
2 sun
3 li
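The integer and string spellings of `axis` are interchangeable. A minimal sketch, rebuilding the four-row example, that checks both spellings give identical results:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['zhao', 'qian', 'sun', 'li'],
                   'mark': [150, 122, np.nan, 32],
                   'gender': ['female', np.nan, np.nan, 'male']})

# axis=0 and axis='index' both drop rows that contain a missing value
rows_int = df.dropna(axis=0)
rows_str = df.dropna(axis='index')

# axis=1 and axis='columns' both drop columns that contain a missing value
cols_int = df.dropna(axis=1)
cols_str = df.dropna(axis='columns')

print(rows_int.index.tolist())    # [0, 3]
print(cols_int.columns.tolist())  # ['name']
```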
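2. how: the deletion mode, either 'any' or 'all'.
any: drop the row or column if it contains at least one missing value. This is the default.
all: drop the row or column only if every value in it is missing.
df.dropna(axis=0,how='any')  # 'any' mode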
name mark gender
0 zhao 150.0 female
3 li 32.0 male
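# For the demo, rebuild df with an extra row (index 4) that is entirely empty
df = pd.DataFrame({'name':['zhao','qian','sun','li',np.nan],
'mark':[150,122,np.nan,32,np.nan],'gender':['female',np.nan,np.nan,'male',np.nan]})
df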
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
4 NaN NaN NaN
df.dropna(axis=0,how='all')  # 'all' mode: only the fifth row (index 4) is dropped
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
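`how` works the same way along columns. The sketch below adds a hypothetical, entirely empty column `note` (not part of the original example) to show the difference between the two modes with `axis=1`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['zhao', 'qian', 'sun', 'li'],
                   'mark': [150, 122, np.nan, 32],
                   'gender': ['female', np.nan, np.nan, 'male'],
                   'note': [np.nan] * 4})  # hypothetical column, entirely NaN

# how='all': only 'note' is completely empty, so only it is removed
all_cols = df.dropna(axis=1, how='all')
# how='any': every column except 'name' has at least one NaN
any_cols = df.dropna(axis=1, how='any')

print(all_cols.columns.tolist())  # ['name', 'mark', 'gender']
print(any_cols.columns.tolist())  # ['name']
```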
3. thresh: keep a row (or column) only if it has at least this many non-missing values.
df
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
4 NaN NaN NaN
df.dropna(axis=0,thresh=1)  # keep rows that have at least 1 non-missing value
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
df.dropna(axis=0,thresh=2)  # keep rows that have at least 2 non-missing values
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
3 li 32.0 male
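To check what `thresh` counts, here is a sketch of an equivalent boolean filter on the five-row df: count the non-missing values in each row, then keep rows whose count reaches the threshold.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['zhao', 'qian', 'sun', 'li', np.nan],
                   'mark': [150, 122, np.nan, 32, np.nan],
                   'gender': ['female', np.nan, np.nan, 'male', np.nan]})

# Number of non-missing values in each row
non_null = df.notna().sum(axis=1)
print(non_null.tolist())  # [3, 2, 1, 3, 0]

# Keeping rows with at least 2 non-missing values by hand ...
manual = df[non_null >= 2]
# ... selects the same rows as dropna with thresh=2
via_thresh = df.dropna(axis=0, thresh=2)
print(via_thresh.index.tolist())  # [0, 1, 3]
```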
4. subset: a list of column names. Only these columns are checked for missing values, and matching rows are dropped. It can be combined with how.
df
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
4 NaN NaN NaN
df.dropna(how='all',subset=['name','gender'])  # drop a row only when name and gender are both missing
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
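With the default `how='any'`, `subset` drops a row if any of the listed columns is missing, and columns outside `subset` are ignored. A minimal sketch on the five-row df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['zhao', 'qian', 'sun', 'li', np.nan],
                   'mark': [150, 122, np.nan, 32, np.nan],
                   'gender': ['female', np.nan, np.nan, 'male', np.nan]})

# how='any' (default): drop the row if name OR gender is missing
any_subset = df.dropna(subset=['name', 'gender'])
# how='all': drop the row only if name AND gender are both missing
all_subset = df.dropna(how='all', subset=['name', 'gender'])
# Only 'name' is checked: row 2 has a NaN in 'mark' but is kept
name_only = df.dropna(subset=['name'])

print(any_subset.index.tolist())  # [0, 3]
print(all_subset.index.tolist())  # [0, 1, 2, 3]
print(name_only.index.tolist())   # [0, 1, 2, 3]
```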
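5. inplace: if True, df itself is modified and the method returns None. If False, a new DataFrame is returned and df is left unchanged. Default False.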
df
name mark gender
0 zhao 150.0 female
1 qian 122.0 NaN
2 sun NaN NaN
3 li 32.0 male
4 NaN NaN NaN
df.dropna(inplace=True)  # df itself has been modified
df
name mark gender
0 zhao 150.0 female
3 li 32.0 male
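A sketch contrasting `inplace=True` with the more common pattern of reassigning the returned copy. The helper `make_df` is hypothetical and only rebuilds the five-row example.

```python
import numpy as np
import pandas as pd

def make_df():
    # Hypothetical helper: rebuilds the five-row example used above
    return pd.DataFrame({'name': ['zhao', 'qian', 'sun', 'li', np.nan],
                         'mark': [150, 122, np.nan, 32, np.nan],
                         'gender': ['female', np.nan, np.nan, 'male', np.nan]})

a = make_df()
result = a.dropna(inplace=True)  # modifies a directly
print(result)                    # None: nothing is returned

b = make_df()
b = b.dropna()                   # alternative: reassign the returned copy

print(a.index.tolist())          # [0, 3]
```

Because `inplace=True` returns None, writing `df = df.dropna(inplace=True)` would replace df with None.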
DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Parameters:
0. Build an example
import pandas as pd
import numpy as np
df = pd.DataFrame({'name':['张三','李四','王二','麻子','杜甫'],'mark':[120,111,135,150,151],'gender':['male','female','female','male',np.nan]})
df  # the example DataFrame
name mark gender
0 张三 120 male
1 李四 111 female
2 王二 135 female
3 麻子 150 male
4 杜甫 151 NaN
1. labels: the row or column labels to drop. Accepts a single label or a list.
df.drop('name',axis=1)  # dropping a column requires axis=1
mark gender
0 120 male
1 111 female
2 135 female
3 150 male
4 151 NaN
df.drop([1,3])  # pass a list to drop several labels at once
name mark gender
0 张三 120 male
2 王二 135 female
4 杜甫 151 NaN
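`drop` returns a new DataFrame and leaves the original untouched unless `inplace=True`. A minimal sketch on the same example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['张三', '李四', '王二', '麻子', '杜甫'],
                   'mark': [120, 111, 135, 150, 151],
                   'gender': ['male', 'female', 'female', 'male', np.nan]})

no_name = df.drop('name', axis=1)  # single column label
no_rows = df.drop([1, 3])          # list of row labels (axis=0 by default)

print(df.shape)                    # (5, 3): the original is unchanged
print(no_name.columns.tolist())    # ['mark', 'gender']
print(no_rows.index.tolist())      # [0, 2, 4]
```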
2. axis: whether to drop rows (0 or 'index') or columns (1 or 'columns'). The default is 0.
# Same effect as above, so it is not demonstrated again here
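For completeness, a short sketch that checks the integer and string spellings of `axis` are interchangeable in `drop`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['张三', '李四', '王二', '麻子', '杜甫'],
                   'mark': [120, 111, 135, 150, 151],
                   'gender': ['male', 'female', 'female', 'male', np.nan]})

# Column drops: axis=1 and axis='columns' are the same
cols_int = df.drop('gender', axis=1)
cols_str = df.drop('gender', axis='columns')

# Row drops: axis=0 and axis='index' are the same
rows_int = df.drop(4, axis=0)
rows_str = df.drop(4, axis='index')
```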
3. index / columns: since version 0.21.0, you can pass index= or columns= directly instead of combining labels= with axis=. Each accepts a single label or a list.
df
name mark gender
0 张三 120 male
1 李四 111 female
2 王二 135 female
3 麻子 150 male
4 杜甫 151 NaN
df.drop(index=1)  # equivalent to df.drop(1, axis=0)
name mark gender
0 张三 120 male
2 王二 135 female
3 麻子 150 male
4 杜甫 151 NaN
df.drop(columns=['name','gender'])  # equivalent to df.drop(['name','gender'], axis=1)
mark
0 120
1 111
2 135
3 150
4 151
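Because `index=` and `columns=` are separate keywords, rows and columns can be dropped in a single call. The sketch below checks that this matches two chained drops:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['张三', '李四', '王二', '麻子', '杜甫'],
                   'mark': [120, 111, 135, 150, 151],
                   'gender': ['male', 'female', 'female', 'male', np.nan]})

# Drop rows 0 and 4 and the 'gender' column in one call
both = df.drop(index=[0, 4], columns=['gender'])
# The same result using labels + axis, chained
chained = df.drop([0, 4], axis=0).drop(['gender'], axis=1)

print(both.shape)  # (3, 2)
```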
4. level: for a MultiIndex, the level to drop labels from. Levels are numbered from 0.
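# Create a two-level (MultiIndex) row index
m_index1=pd.MultiIndex.from_tuples([("A","x1"),("A","x2"),("B","y1"),("B","y2"),("B","y3")],names=["class1","class2"])
df1=pd.DataFrame(np.random.randint(1,10,(5,3)),index=m_index1)  # values are random, so yours will differ
df1  # the example DataFrame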
0 1 2
class1 class2
A x1 6 6 6
x2 1 8 6
B y1 3 6 2
y2 5 9 2
y3 9 8 1
df1.drop(index=['y1','y2','x1'],level=1)  # levels count from 0, so the second (inner) level is level=1
0 1 2
class1 class2
A x2 1 8 6
B y3 9 8 1
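Since the example above uses random values, here is a reproducible sketch with fixed values. It drops a label from the outer level (`level=0`) and shows that a level can also be referred to by name.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", "x1"), ("A", "x2"), ("B", "y1"), ("B", "y2"), ("B", "y3")],
    names=["class1", "class2"])
# Fixed values instead of random ones, so the output is reproducible
df1 = pd.DataFrame(np.arange(15).reshape(5, 3), index=idx)

# level=0 is the outer level ('class1'): dropping 'A' removes both of its rows
outer = df1.drop(index='A', level=0)
# A level can also be given by name instead of position
inner = df1.drop(index=['y1', 'y2', 'x1'], level='class2')

print(outer.index.tolist())  # [('B', 'y1'), ('B', 'y2'), ('B', 'y3')]
print(inner.index.tolist())  # [('A', 'x2'), ('B', 'y3')]
```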
5. inplace: works the same as in dropna. With inplace=True, the rows or columns are removed from df itself and None is returned.
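6. errors: if a passed label does not exist, a KeyError is raised. errors='ignore' suppresses the error and drops only the labels that exist. Options are 'raise' and 'ignore'. The default is 'raise'.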
df
name mark gender
0 张三 120 male
1 李四 111 female
2 王二 135 female
3 麻子 150 male
4 杜甫 151 NaN
df.drop(columns=['name','number'])  # there is no 'number' column, so this raises a KeyError
KeyError Traceback (most recent call last)
in
----> 1 df.drop(columns= ['name','number'])
df.drop(columns=['name','number'],errors='ignore')  # ignore the error and drop only the columns that exist
mark gender
0 120 male
1 111 female
2 135 female
3 150 male
4 151 NaN
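A sketch showing that `errors='ignore'` also applies to row labels. It also shows an explicit alternative: intersect the wanted labels with the existing columns before dropping, so a genuinely mistyped name is not silently ignored elsewhere. The label 99 is a hypothetical missing row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['张三', '李四', '王二', '麻子', '杜甫'],
                   'mark': [120, 111, 135, 150, 151],
                   'gender': ['male', 'female', 'female', 'male', np.nan]})

# With the default errors='raise', a missing label raises KeyError
try:
    df.drop(columns=['name', 'number'])
    raised = False
except KeyError:
    raised = True

# errors='ignore' also works for rows: label 99 does not exist and is skipped
rows = df.drop(index=[0, 99], errors='ignore')

# Explicit alternative: keep only labels that actually exist, then drop them
existing = df.columns.intersection(['name', 'number'])
cols = df.drop(columns=existing)

print(existing.tolist())  # ['name']
```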
Reposted from: https://blog.csdn.net/lisnyuan/article/details/106560468?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522165483095816782350984195%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=165483095816782350984195&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2allsobaiduend~default-1-106560468-null-null.142v12control,157v13control&utm_term=dropna%E5%92%8Cdrop%E7%9A%84%E5%8C%BA%E5%88%AB&spm=1018.2226.3001.4187