>>> data=pd.merge(data,userfeature,on='uid',how='inner')
>>> data.info()
Index: 0 entries
Data columns (total 33 columns):
aid 0 non-null int64
uid 0 non-null object
label 0 non-null int64
advertiserId 0 non-null int64
campaignId 0 non-null int64
creativeId 0 non-null int64
creativeSize 0 non-null int64
adCategoryId 0 non-null int64
productId 0 non-null int64
productType 0 non-null int64
LBS 0 non-null object
age 0 non-null object
appIdAction 0 non-null object
appIdInstall 0 non-null object
carrier 0 non-null object
consumptionAbility 0 non-null object
ct 0 non-null object
education 0 non-null object
gender 0 non-null object
house 0 non-null object
interest1 0 non-null object
interest2 0 non-null object
interest3 0 non-null object
interest4 0 non-null object
interest5 0 non-null object
kw1 0 non-null object
kw2 0 non-null object
kw3 0 non-null object
marriageStatus 0 non-null object
os 0 non-null object
topic1 0 non-null object
topic2 0 non-null object
topic3 0 non-null object
dtypes: int64(9), object(24)
memory usage: 0.0+ bytes
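As shown above, an inner merge (how='inner') is one way to check whether two DataFrames share any values (or objects) in a given column: if the merged result has 0 entries, the column has no values in common.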
>>> ojbk=pd.DataFrame({'x1':[11,22,33],
...                    'x2':[55,66,77]})
>>> ojb=pd.DataFrame({'x1':[44,55,66],
...                   'x2':[88,99,100]})
>>> ooo=pd.merge(ojbk,ojb,on='x2',how='inner')
>>> ooo.info()
Index: 0 entries
Data columns (total 3 columns):
x1_x 0 non-null int64
x2 0 non-null int64
x1_y 0 non-null int64
dtypes: int64(3)
memory usage: 0.0+ bytes
>>> ojb=pd.DataFrame({'x1':[44,55,66],
...                   'x2':[77,99,100]})
>>> ooo=pd.merge(ojbk,ojb,on='x2',how='inner')
>>> ooo
   x1_x  x2  x1_y
0    33  77    44
>>> ooo.info()
Int64Index: 1 entries, 0 to 0
Data columns (total 3 columns):
x1_x 1 non-null int64
x2 1 non-null int64
x1_y 1 non-null int64
dtypes: int64(3)
memory usage: 32.0 bytes
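
The check demonstrated in this session can be wrapped in a small helper. The following is only a minimal sketch: the function name shared_key_values and the frame names no_overlap and one_overlap are hypothetical (they stand in for the two versions of ojb used above), and the last line assumes that Series.isin is an acceptable alternative when only a yes/no answer is needed.

```python
import pandas as pd


def shared_key_values(left, right, key):
    """Return the sorted distinct values of `key` found in both frames.

    Hypothetical helper: it inner-merges only the key column, so an
    empty list means the two frames have no key values in common.
    """
    merged = pd.merge(left[[key]].drop_duplicates(),
                      right[[key]].drop_duplicates(),
                      on=key, how='inner')
    return sorted(merged[key].tolist())


# Same data as in the session above
ojbk = pd.DataFrame({'x1': [11, 22, 33], 'x2': [55, 66, 77]})
no_overlap = pd.DataFrame({'x1': [44, 55, 66], 'x2': [88, 99, 100]})
one_overlap = pd.DataFrame({'x1': [44, 55, 66], 'x2': [77, 99, 100]})

print(shared_key_values(ojbk, no_overlap, 'x2'))   # []
print(shared_key_values(ojbk, one_overlap, 'x2'))  # [77]

# Yes/no check that avoids building the merged frame
print(ojbk['x2'].isin(one_overlap['x2']).any())    # True
```

Merging only the key column keeps the result small and avoids the x1_x / x1_y suffix columns; for large frames such as data and userfeature, the isin form is likely cheaper when the matching rows themselves are not needed.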