python: removing identical elements from two Excel sheets_Python data processing -- removing duplicates, replacing values, and merging tables...

Import the required packages: numpy and pandas.

import numpy as np
import pandas as pd

Create a table:

df = pd.DataFrame({"id":[1001,1002,1003,1004,1005,1006],
                   "date":pd.date_range('20130102', periods=6),
                   "city":['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'Beijing '],
                   "age":[23,44,54,32,34,32],
                   "category":['100-A','100-B','110-A','110-C','210-A','130-F'],
                   "price":[1200,np.nan,2133,5433,np.nan,4432]},
                  columns=['id','date','city','category','age','price'])

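This produces a six-row table with the columns id, date, city, category, age and price. The price column is missing values (NaN) in the second and fifth rows.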
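You can inspect the resulting table by printing it and checking its shape, column types and missing values. The sketch below rebuilds the same DataFrame so that it runs on its own. The inspection calls (print, shape, dtypes, isnull) are standard pandas added here for illustration; they are not taken from the original article.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the text (same values and column order).
df = pd.DataFrame({"id": [1001, 1002, 1003, 1004, 1005, 1006],
                   "date": pd.date_range('20130102', periods=6),
                   "city": ['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'Beijing '],
                   "age": [23, 44, 54, 32, 34, 32],
                   "category": ['100-A', '100-B', '110-A', '110-C', '210-A', '130-F'],
                   "price": [1200, np.nan, 2133, 5433, np.nan, 4432]},
                  columns=['id', 'date', 'city', 'category', 'age', 'price'])

print(df)                           # the six-row table
print(df.shape)                     # (6, 6): rows, columns
print(df.dtypes)                    # price becomes float64 because it contains NaN
print(df["price"].isnull().sum())   # 2 missing prices
```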

Handling duplicate data in Python

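The drop_duplicates function removes duplicate values. Take the city column as an example. It contains a duplicate value: 'Beijing ' (with a trailing space) appears in both the first and the last row. By default, drop_duplicates() keeps the first occurrence and removes any that appear later. Adding the keep='last' parameter does the opposite: it removes the earlier occurrences and keeps the last one. The code and a comparison of the results follow. df["city"].drop_duplicates() keeps...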
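You can check this behaviour on the city column alone. The following minimal sketch rebuilds the column as a standalone Series so it runs without the rest of the table. keep=False is an extra pandas option that the text does not mention. It drops every value that occurs more than once.

```python
import pandas as pd

# The city column from the example table; 'Beijing ' (trailing space) appears at index 0 and 5.
city = pd.Series(['Beijing ', 'SH', ' guangzhou ', 'Shenzhen', 'shanghai', 'Beijing '],
                 name="city")

first = city.drop_duplicates()               # default keep='first': drops the later 'Beijing ' (index 5)
last = city.drop_duplicates(keep='last')     # keeps the last occurrence: drops index 0 instead
no_dups = city.drop_duplicates(keep=False)   # drops both copies of 'Beijing '

print(first.index.tolist())    # [0, 1, 2, 3, 4]
print(last.index.tolist())     # [1, 2, 3, 4, 5]
print(no_dups.index.tolist())  # [1, 2, 3, 4]
```

Only exact matches count as duplicates. 'SH' and 'shanghai' are different strings, so neither is removed. To deduplicate whole rows of the table by city, df.drop_duplicates(subset=['city']) applies the same keep rules to the DataFrame.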
