Measuring String Similarity in Python

Python's difflib library

import difflib

query_str = '广州公安'
s1 = '广州市邮政局'
s2 = '广州市公安局'
s3 = '广州市检查院'
# quick_ratio() returns an upper bound on ratio() and is cheaper to compute
print(difflib.SequenceMatcher(None, query_str, s1).quick_ratio())
print(difflib.SequenceMatcher(None, query_str, s2).quick_ratio())
print(difflib.SequenceMatcher(None, query_str, s3).quick_ratio())

The output is:
0.4
0.8
0.4
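
A common next step is to pick the closest candidate rather than print each score. The sketch below is one way to do that, assuming the same query and candidate strings as above; the helper name similarity and the cutoff value 0.6 are illustrative choices, not part of the original example. It uses ratio() for the exact score and also shows difflib.get_close_matches, which does the ranking and filtering in one call.

```python
import difflib

query_str = '广州公安'
candidates = ['广州市邮政局', '广州市公安局', '广州市检查院']


def similarity(candidate):
    # Hypothetical helper: exact similarity between the query and one candidate.
    return difflib.SequenceMatcher(None, query_str, candidate).ratio()


# Rank all candidates from most to least similar.
ranked = sorted(candidates, key=similarity, reverse=True)
best = ranked[0]
print(best, similarity(best))

# get_close_matches keeps only candidates scoring at least `cutoff`
# and returns at most `n` of them, best first.
close = difflib.get_close_matches(query_str, candidates, n=1, cutoff=0.6)
print(close)
```

Only the second candidate clears the 0.6 cutoff here, so both approaches agree on the same string.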

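When comparing strings, you often find that certain characters, such as spaces, carry no useful information and should be ignored. SequenceMatcher supports this through its first argument, isjunk: a function that takes one element and returns True if that element should be treated as junk. For example: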

from difflib import SequenceMatcher
 
 
def show_results(match):
    # match is a Match(a, b, size) named tuple; this helper reads the globals A and B
    print('  a    = {}'.format(match.a))
    print('  b    = {}'.format(match.b))
    print('  size = {}'.format(match.size))
    i, j, k = match
    print('  A[a:a+size] = {!r}'.format(A[i:i + k]))
    print('  B[b:b+size] = {!r}'.format(B[j:j + k]))
 
 
A = " abcd"
B = "abcd abcd"
 
print('A = {!r}'.format(A))
print('B = {!r}'.format(B))
 
print('\nWithout junk detection:')
s1 = SequenceMatcher(None, A, B)
match1 = s1.find_longest_match(0, len(A), 0, len(B))
show_results(match1)
 
print('\nTreat spaces as junk:')
# The first argument (isjunk) marks elements to ignore; here, the space character
s2 = SequenceMatcher(lambda x: x == " ", A, B)
match2 = s2.find_longest_match(0, len(A), 0, len(B))
show_results(match2)

The output is as follows:
A = ' abcd'
B = 'abcd abcd'

Without junk detection:
  a    = 0
  b    = 4
  size = 5
  A[a:a+size] = ' abcd'
  B[b:b+size] = ' abcd'

Treat spaces as junk:
  a    = 1
  b    = 0
  size = 4
  A[a:a+size] = 'abcd'
  B[b:b+size] = 'abcd'

The ratio() function:

Returns a measure of the two sequences' similarity as a float in the range [0, 1].
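
To make the score less of a black box: with T the total number of elements in both sequences and M the number of matched elements, ratio() is 2.0 * M / T. The sketch below recomputes that by hand from get_matching_blocks() and compares it with ratio(); the sample strings "abcd" and "bcde" are made up for illustration. It also checks that quick_ratio() and real_quick_ratio() are upper bounds on ratio(), as the difflib documentation describes.

```python
from difflib import SequenceMatcher

a = "abcd"
b = "bcde"
matcher = SequenceMatcher(None, a, b)

# M: total size of all matching blocks (the final sentinel block has size 0).
matched = sum(block.size for block in matcher.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(a) + len(b)
manual = 2.0 * matched / total

print(manual)           # hand-computed 2.0 * M / T
print(matcher.ratio())  # same value from the library

# The cheaper variants never report less than the exact ratio.
print(matcher.ratio() <= matcher.quick_ratio() <= matcher.real_quick_ratio())
```

Here the only matching block is "bcd", so M is 3, T is 8, and the score is 0.75.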

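The Differ class and its compare() method:

class difflib.Differ(linejunk=None, charjunk=None)

The optional keyword arguments linejunk and charjunk are filter functions (or None); both default to None. linejunk takes a single string (a line) and returns true if that line is junk; charjunk takes a single character and returns true if that character is junk.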
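
As a minimal sketch of how compare() is used: it takes two sequences of lines and yields a delta in which each line is prefixed with "- " (only in the first sequence), "+ " (only in the second), or two spaces (common to both). The line lists before and after are invented for illustration, and no junk filters are passed.

```python
from difflib import Differ

# Each element is one line, including its trailing newline.
before = ['one\n', 'two\n', 'three\n']
after = ['one\n', 'three\n', 'four\n']

# compare() returns a generator of delta lines; list() materializes it.
delta = list(Differ().compare(before, after))
print(''.join(delta), end='')
```

When two lines are similar but not identical, compare() may also emit lines starting with "? " that point at the differing characters; this example has none because every line is either kept, removed, or added whole.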
