Fuzzy matching on two fields in Python: fuzzy matching between 2 columns (Python)

I have a pandas dataframe called "df_combo" which contains the columns "worker_id", "url_entrance" and "company_name". I am trying to produce an output column that tells me whether the URL in the "url_entrance" column contains any word from the "company_name" column. Even a close match, like the ones fuzzywuzzy produces, would work.

For example, if the URL is "www.grandhotelseattle.com" and the "company_name" is "Hotel Prestige Seattle", then the fuzz ratio might be somewhere around 70-80.

I have tried the following script:

>>> fuzz.ratio(df_combo['url_entrance'], df_combo['company_name'])

but it returns only one number, which is the overall fuzz ratio for the whole column. I would like to have a fuzz ratio for every row and store those ratios in a new column.

Solution

Thanks, everyone, for your input. I have solved my problem! The link that "agg3l" provided was helpful. The "TypeError" I saw occurred because the "url_entrance" or "company_name" column contained float values in certain rows. I converted both columns to strings using the following lines, re-ran the fuzz.ratio script, and got it to work!

df_combo['url_entrance']=df_combo['url_entrance'].astype(str)

df_combo['company_name']=df_combo['company_name'].astype(str)
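
For anyone landing here later, this is a minimal sketch of how the per-row score could be computed once both columns are strings. It is not necessarily the exact script from the linked answer. To keep it runnable without extra packages, it uses difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio (scores may differ slightly from fuzzywuzzy's). The sample rows, the "similarity" helper and the "fuzz_ratio" column name are made up for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a, b):
    """Return a 0-100 similarity score for two values, compared as strings."""
    # str() is a defensive guard so that a stray float cannot raise a TypeError.
    return round(100 * SequenceMatcher(None, str(a), str(b)).ratio())


# Hypothetical sample data; the NaN stands in for the float values
# that caused the original TypeError.
df_combo = pd.DataFrame({
    "worker_id": [1, 2, 3],
    "url_entrance": ["www.grandhotelseattle.com", "www.example.com", float("nan")],
    "company_name": ["Hotel Prestige Seattle", "www.example.com", "Acme Corp"],
})

# Convert both columns to strings first, as described above.
df_combo["url_entrance"] = df_combo["url_entrance"].astype(str)
df_combo["company_name"] = df_combo["company_name"].astype(str)

# Score each row individually (axis=1) and store the result in a new column.
df_combo["fuzz_ratio"] = df_combo.apply(
    lambda row: similarity(row["url_entrance"], row["company_name"]),
    axis=1,
)

print(df_combo[["url_entrance", "company_name", "fuzz_ratio"]])
```

If fuzzywuzzy is installed, the lambda could call fuzz.ratio(row["url_entrance"], row["company_name"]) instead of the helper. The key point is applying the function with axis=1, so that it receives one pair of strings per row rather than two whole columns.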
