Deleting from Excel with Python: how do I delete rows from an Excel file based on multiple conditions in Python?

I'm not sure I understood your question correctly, so I'll offer a few different approaches.

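If you want to delete all rows that contain the same id, Type and Time values, you can use the following:

    import pandas as pd

    # read_excel already returns a DataFrame, so no extra conversion is needed
    df = pd.read_excel(io=r"D:\xxxxxx\test.xlsx")

    # Keep the first row of each (id, Type, Time) combination and drop the rest
    drop_dup = df.drop_duplicates(subset=["id", "Type", "Time"])

    print(drop_dup)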

The result shows that there are 7 rows with exactly the same Type, id and Time values.
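To make the effect of `subset` concrete, here is a minimal sketch on a small in-memory DataFrame. The sample values are made up for illustration and only stand in for the Excel file; the number of removed rows is obtained by comparing the lengths before and after.

```python
import pandas as pd

# Hypothetical sample data standing in for test.xlsx
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Type": ["Tea", "Tea", "Coffee", "Coffee"],
    "Time": ["18:08", "18:08", "15:20", "21:04"],
    "Attribute": ["a", "b", "c", "d"],
})

# Rows count as duplicates only when id, Type and Time all match;
# the first occurrence is kept by default (keep="first").
drop_dup = df.drop_duplicates(subset=["id", "Type", "Time"])

# Number of rows that were dropped
removed = len(df) - len(drop_dup)

print(drop_dup)
print(removed, "row(s) removed")
```

Note that the Attribute column is ignored when deciding what counts as a duplicate, so the second row is dropped even though its Attribute value differs.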

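If you only want to delete rows that are completely identical (comparing all columns together), this gives the desired result:

    df = df.drop_duplicates()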

In addition:

    dup = df.duplicated(subset=["id", "Type", "Time"])

returns a True/False Series that indicates whether each row is a duplicate of an earlier row:

    0    False
    1    False
    2    False
    3    False
    4    False
    5    False
    6     True
    7    False
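The boolean Series can also be used directly, for example to count the duplicates or to filter them out. The following is a small sketch with made-up sample data, not the original file:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "id": [1, 1, 2],
    "Type": ["Tea", "Tea", "Coffee"],
    "Time": ["18:08", "18:08", "15:20"],
})

cols = ["id", "Type", "Time"]
dup = df.duplicated(subset=cols)

# True counts as 1, so the sum is the number of duplicate rows
n_dup = int(dup.sum())

# Inverting the mask keeps only the rows that are not duplicates
unique_rows = df[~dup]

# Filtering with the inverted mask matches drop_duplicates on the same columns
same = unique_rows.equals(df.drop_duplicates(subset=cols))

print(n_dup, same)
```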

If you want to know which individual values of the DataFrame are duplicated, use:

    dupl_val = df.apply(pd.Series.duplicated, axis=1)

          id  Duplicate 1  Duplicate 2  Total Duplicates   Time   Type  Attribute
    0  False        False        False             False  False  False      False
    1  False        False        False             False  False  False      False
    2  False        False        False             False  False  False      False
    3  False        False        False             False  False  False      False
    4  False        False        False              True  False  False      False
    5  False        False         True              True  False  False      False
    6  False        False        False              True  False  False      False

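pd.Series.duplicated is used here because, with axis=1, apply passes each row of the DataFrame to the function as a pandas Series, so every value is compared with the other values in the same row.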
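The direction in which `apply` works is easy to mix up, so here is a short sketch that contrasts the two axes. The column names and values are invented for the example and deliberately contain no NaN.

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 5, 3],
    "c": [9, 2, 9],
})

# axis=1: each row is passed to pd.Series.duplicated, so a value is flagged
# when it repeats an earlier value in the same row.
within_rows = df.apply(pd.Series.duplicated, axis=1)

# axis=0 (the default): each column is passed instead, so a value is flagged
# when it repeats an earlier value in the same column.
within_cols = df.apply(pd.Series.duplicated, axis=0)

print(within_rows)
print(within_cols)
```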

If you don't want to delete rows but only want to mark which values are duplicates, use:

    dupl_val = df.apply(pd.Series.duplicated, axis=1)

    # Keep values where the mask is False, write "duplicate" where it is True
    df = df.where(~dupl_val, "duplicate")

    print(df)

              id  Duplicate 1  Duplicate 2  Total Duplicates  \
    0  121349100          NaN          NaN               NaN
    1  121350610          NaN          NaN               NaN
    2  124426041          NaN          NaN               NaN
    3  124436734          NaN          NaN               NaN
    4  124451775            1          NaN         duplicate
    5  124451775            1    duplicate         duplicate
    6  124451775          NaN            1         duplicate

                      Time    Type  Attribute
    0  2017-04-19 18:08:00     Tea        NaN
    1  2017-04-19 18:08:00     Tea        NaN
    2  2017-05-05 12:21:00     Tea        NaN
    3  2017-04-25 15:20:00  Coffee        NaN
    4  2017-04-05 21:04:00  Coffee         No
    5  2017-06-05 07:38:00     Tea         No
    6  2017-04-05 21:04:00  Coffee        NaN
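`DataFrame.where` keeps a value wherever the condition is True and replaces it otherwise, which is why the mask has to be inverted with `~`. A minimal sketch with made-up data, also showing `DataFrame.mask` as the variant that takes the mask without inversion:

```python
import pandas as pd

# Hypothetical data: the first row contains the same value twice
df = pd.DataFrame({"x": [1, 2], "y": [1, 3]})

# True where a value repeats an earlier value in the same row
dupl_val = df.apply(pd.Series.duplicated, axis=1)

# where: keep values where the condition holds, replace the others
marked = df.where(~dupl_val, "duplicate")

# mask: replace values where the condition holds, keep the others
marked_alt = df.mask(dupl_val, "duplicate")

print(marked)
```

Writing a string into a numeric column changes that column to a mixed object column, so this form is better suited to inspection than to further numeric processing.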

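Edit:

If you only want to set the Attribute column to a special value (I chose "duplicate") when a row's "id", "Type" and "Time" values duplicate those of another row, and you don't want to change the values of the remaining columns, this code should give the desired result:

    import pandas as pd

    df = pd.read_excel(io=r"D:\xxxxx\test.xlsx")

    # True for every row that repeats the (id, Type, Time) of an earlier row
    dup = df.duplicated(subset=["id", "Type", "Time"])

    duplicate = "duplicate"

    # Overwrite Attribute only in the rows flagged as duplicates
    for i in range(len(dup)):
        if dup[i]:
            df.loc[i, "Attribute"] = duplicate

    print(df)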

              id  Duplicate 1  Duplicate 2  Total Duplicates  \
    0  121349100          NaN          NaN               NaN
    1  121350610          NaN          NaN               NaN
    2  124426041          NaN          NaN               NaN
    3  124436734          NaN          NaN               NaN
    4  124451775          1.0          NaN               1.0
    5  124451775          1.0          1.0               1.0
    6  124451775          NaN          1.0               1.0
    7  124463136          NaN          NaN               NaN

                      Time    Type  Attribute
    0  2017-04-19 18:08:00     Tea        NaN
    1  2017-04-19 18:08:00     Tea        NaN
    2  2017-05-05 12:21:00     Tea        NaN
    3  2017-04-25 15:20:00  Coffee        NaN
    4  2017-04-05 21:04:00  Coffee         No
    5  2017-06-05 07:38:00     Tea         No
    6  2017-04-05 21:04:00  Coffee  duplicate
    7  2017-06-05 05:40:00  Coffee        NaN

    [85 rows x 7 columns]

As you can see, row 6 (= row 8 in the original Excel file) contains the first duplicate. In this case it is a duplicate of row 6 in the Excel file.
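The loop can also be written as a single boolean-mask assignment. The sketch below uses invented sample data; the conversion from index label to Excel row assumes a default 0-based index and exactly one header row in the sheet, which is consistent with index 6 corresponding to Excel row 8.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
df = pd.DataFrame({
    "id": [10, 20, 10],
    "Type": ["Coffee", "Tea", "Coffee"],
    "Time": ["21:04", "07:38", "21:04"],
    "Attribute": ["No", "No", "Yes"],
})

dup = df.duplicated(subset=["id", "Type", "Time"])

# Assign to all flagged rows at once instead of looping over them
df.loc[dup, "Attribute"] = "duplicate"

# Assumption: 0-based index plus one header row, so Excel row = index + 2
excel_rows = [int(i) + 2 for i in df.index[dup.to_numpy()]]

print(df)
print(excel_rows)
```

Besides being shorter, the mask-based form does not rely on the index labels being 0, 1, 2, ..., which the `dup[i]` lookup in the loop does.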

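Edit 2

In my second edit, the code now marks all duplicates (including the first occurrence) as "duplicate". In addition, the code no longer searches over all three columns (id, Time, Type) at once, but looks for duplicates in (id and Time) or (id and Type) or (Time and Type), that is, in every pairwise combination of the three criteria:

    # One mask per column pair; keep=False also flags the first occurrence
    dup = [df.duplicated(subset=list(cols), keep=False)
           for cols in [("id", "Type"), ("id", "Time"), ("Time", "Type")]]

    duplicate = "duplicate"

    # Mark a row if any of the three masks flags it
    for i in range(len(dup)):
        for j in range(len(dup[i])):
            if dup[i][j]:
                df.loc[j, "Attribute"] = duplicate

    print(df)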

              id  Duplicate 1  Duplicate 2  Total Duplicates  \
    0  121349100          NaN          NaN               NaN
    1  121350610          NaN          NaN               NaN
    2  124426041          NaN          NaN               NaN
    3  124436734          NaN          NaN               NaN
    4  124451775          1.0          NaN               1.0
    5  124451775          1.0          1.0               1.0
    6  124451775          NaN          1.0               1.0

                      Time    Type  Attribute
    0  2017-04-19 18:08:00     Tea  duplicate
    1  2017-04-19 18:08:00     Tea  duplicate
    2  2017-05-05 12:21:00     Tea        NaN
    3  2017-04-25 15:20:00  Coffee        NaN
    4  2017-04-05 21:04:00  Coffee  duplicate
    5  2017-06-05 07:38:00     Tea         No
    6  2017-04-05 21:04:00  Coffee  duplicate
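The pairwise check can be sketched more compactly by generating the column pairs with `itertools.combinations` and combining the masks with a logical OR. This is one possible variant rather than the code used above, and the sample data is invented:

```python
from functools import reduce
from itertools import combinations
import operator

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "Type": ["Tea", "Tea", "Coffee", "Tea"],
    "Time": ["18:08", "18:08", "15:20", "07:38"],
    "Attribute": ["Yes", "Yes", "Yes", "No"],
})

# All pairs of the three key columns: (id, Type), (id, Time), (Type, Time)
pairs = list(combinations(["id", "Type", "Time"], 2))

# keep=False flags every member of a duplicate group, including the first
masks = [df.duplicated(subset=list(p), keep=False) for p in pairs]

# A row is marked if at least one pair of columns is duplicated
any_dup = reduce(operator.or_, masks)

df.loc[any_dup, "Attribute"] = "duplicate"
print(df)
```

Here only the first two rows are marked, because they share both Type and Time; the two rows with id 3 differ in Type and in Time, so no pair of columns matches for them.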

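For more information on these functions, see the documentation for drop_duplicates and duplicated, which exist for both Series and DataFrames (the main difference being that for a Series the functions are applied to individual values, while for a DataFrame they are applied to rows over the specified columns).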
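The difference between the Series and DataFrame variants can be seen in a short sketch with made-up values:

```python
import pandas as pd

# Series: individual values are compared
s = pd.Series(["Tea", "Coffee", "Tea"])
series_flags = s.duplicated()

# DataFrame: whole rows are compared, optionally restricted to some columns
df = pd.DataFrame({"Type": ["Tea", "Coffee", "Tea"], "id": [1, 2, 3]})
all_cols = df.duplicated()                  # no two rows are fully identical
type_only = df.duplicated(subset=["Type"])  # only the Type column is compared

print(series_flags.tolist(), all_cols.tolist(), type_only.tolist())
```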
