Handling Duplicate Values in Jupyter

import os
import pandas as pd
import numpy as np
# Raw string so the backslashes in the Windows path are not read as escape sequences
os.chdir(r'D:\Workspaces\Jupyter')
df = pd.read_excel('data_test.xlsx')
df
# Boolean Series: True marks a row that exactly repeats an earlier row
df.duplicated()
# Show only the rows flagged as duplicates
df[df.duplicated()]
# Check for duplicates using only these two columns
df.duplicated(subset=['EventSubType','EventType'])
df[df.duplicated(subset=['EventSubType','EventType'])]
# keep='last': the last occurrence is the one kept; earlier occurrences are flagged as duplicates
df.duplicated(subset=['EventSubType','EventType'],keep='last')
# Number of fully duplicated rows
np.sum(df.duplicated())
# Drop fully duplicated rows (returns a new DataFrame; df itself is unchanged)
df.drop_duplicates()
# Drop rows whose values in these two columns repeat an earlier row
df.drop_duplicates(subset=['EventSubType','EventType'])
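
The calls above depend on data_test.xlsx, which is not included here. As a rough sketch of how the `keep` options differ, the same calls can be tried on a small in-memory DataFrame. The frame `df_demo`, its `Value` column, and all of its values are invented for illustration; only the column names `EventType` and `EventSubType` come from the code above.

```python
import pandas as pd

# Hypothetical sample data: values are made up, only the two
# column names are taken from the original notebook.
df_demo = pd.DataFrame({
    'EventType':    ['A', 'A', 'B', 'A', 'B'],
    'EventSubType': ['x', 'x', 'y', 'x', 'z'],
    'Value':        [1, 1, 2, 3, 4],
})

cols = ['EventSubType', 'EventType']

# Default keep='first': the first occurrence is kept, later ones are flagged
first_mask = df_demo.duplicated(subset=cols)

# keep='last': the last occurrence is kept, earlier ones are flagged
last_mask = df_demo.duplicated(subset=cols, keep='last')

# keep=False: every row that belongs to a repeated group is flagged
all_mask = df_demo.duplicated(subset=cols, keep=False)

# drop_duplicates returns a new DataFrame; df_demo is left untouched
deduped = df_demo.drop_duplicates(subset=cols).reset_index(drop=True)

print(first_mask.tolist())  # [False, True, False, True, False]
print(last_mask.tolist())   # [True, True, False, False, False]
print(all_mask.tolist())    # [True, True, False, True, False]
print(len(df_demo), len(deduped))  # 5 3
```

To keep the result, assign it back (`df_demo = df_demo.drop_duplicates(...)`), since the original frame is not modified by the call itself.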

 
