Python: finding the common substring between two strings


I want to compare two strings and keep the part that matches, splitting off where the comparison fails.

So if I have two strings:

string1 = apples

string2 = appleses

answer = apples

Another example, since the strings can contain multiple words:

string1 = apple pie available

string2 = apple pies

answer = apple pie

I'm sure there is a simple Pythonic way to do this, but I can't work it out. Any help and explanation is appreciated.

14 solutions

129 votes

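For completeness, difflib in the standard library provides lots of sequence-comparison utilities. For instance SequenceMatcher.find_longest_match, which finds the longest common substring when used on strings. Example use: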

from difflib import SequenceMatcher

string1 = "apple pie available"
string2 = "come have some apple pies"
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(match)  # -> Match(a=0, b=15, size=9)
print(string1[match.a: match.a + match.size])  # -> apple pie
print(string2[match.b: match.b + match.size])  # -> apple pie
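A minimal sketch wrapping the call above in a reusable function (the name longest_common_substring is hypothetical). It passes autojunk=False, a documented SequenceMatcher parameter: with the default autojunk=True, when the second string is at least 200 characters long, characters whose duplicates make up more than 1% of it are treated as junk, so a match made only of such characters can be missed.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> str:
    """Longest block of characters shared by a and b ("" if none)."""
    # autojunk=False turns off the "popular element" heuristic, which applies
    # when b has 200+ items and can hide matches made of very common characters.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


print(longest_common_substring("apple pie available", "come have some apple pies"))
```

When several matches share the maximal length, difflib returns the one that starts earliest in a.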

RickardSjogren answered 2020-01-09T01:53:47Z

33 votes

def common_start(sa, sb):
    """ returns the longest common substring from the beginning of sa and sb """
    def _iter():
        for a, b in zip(sa, sb):
            if a == b:
                yield a
            else:
                return

    return ''.join(_iter())

>>> common_start("apple pie available", "apple pies")
'apple pie'

Or a slightly stranger way:

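def stop_iter():
    """An easy way to break out of a generator"""
    raise StopIteration

def common_start(sa, sb):
    # Python < 3.7 only: since PEP 479 (Python 3.7+), a StopIteration raised
    # inside a generator expression is turned into RuntimeError.
    return ''.join(a if a == b else stop_iter() for a, b in zip(sa, sb))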

Or, perhaps more readable:

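def terminating(cond):
    """An easy way to break out of a generator"""
    if cond:
        return True
    raise StopIteration

def common_start(sa, sb):
    # Python < 3.7 only: under PEP 479 the StopIteration raised by terminating()
    # inside this generator expression becomes a RuntimeError.
    return ''.join(a for a, b in zip(sa, sb) if terminating(a == b))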
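Both StopIteration variants above rely on behaviour that PEP 479 removed in Python 3.7. A sketch of a one-liner-style alternative that never raises StopIteration: next() with a default value finds the first mismatching index. The function reuses the common_start name for comparison; it is an illustration, not the answerer's code.

```python
def common_start(sa: str, sb: str) -> str:
    """Return the longest common prefix of sa and sb."""
    # First index where the characters differ; if there is no mismatch within
    # the shorter string, the whole shorter string is the common prefix.
    stop = next((i for i, (a, b) in enumerate(zip(sa, sb)) if a != b),
                min(len(sa), len(sb)))
    return sa[:stop]


print(common_start("apple pie available", "apple pies"))  # apple pie
```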

Eric answered 2020-01-09T01:54:12Z

15 votes

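This is called the Longest Common Substring problem. Here I present a simple, easy-to-understand but inefficient solution. It will take a long time to produce the correct output for large strings, as the complexity of this algorithm is O(N^2).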

def longestSubstringFinder(string1, string2):
    # Caveats: string2 is always aligned from its start (string1[i + j] vs string2[j]),
    # so some matches are missed, and a match reaching the end of string2 is never saved.
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
    return answer

print(longestSubstringFinder("apple pie available", "apple pies"))
print(longestSubstringFinder("apples", "appleses"))
print(longestSubstringFinder("bapples", "cappleses"))

Output:

apple pie

apples

apples
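The usual way to solve the longest common substring problem for arbitrary positions is dynamic programming: the cell for (i, j) holds the length of the common run ending at s1[i - 1] and s2[j - 1], so each cell only needs the one diagonally above it. A sketch (the name lcs_dp is hypothetical); it runs in O(len(s1) * len(s2)) time, keeps only the previous row in memory, and unlike the loop above it checks every alignment of the two strings.

```python
def lcs_dp(s1: str, s2: str) -> str:
    """Longest common substring by dynamic programming, O(len(s1) * len(s2)) time."""
    best_len, best_end = 0, 0   # length of the best run and where it ends in s1
    prev = [0] * (len(s2) + 1)  # run lengths from the previous row
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # extend the diagonal run that ended at s1[i - 2], s2[j - 2]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s1[best_end - best_len:best_end]


for a, b in [("apple pie available", "apple pies"),
             ("apples", "appleses"),
             ("bapples", "cappleses")]:
    print(lcs_dp(a, b))
```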

thefourtheye answered 2020-01-09T01:53:27Z

10 votes

One might also consider os.path.commonprefix, which works character by character and can therefore be used with any strings.

import os

common = os.path.commonprefix(['apple pie available', 'apple pies'])
assert common == 'apple pie'
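A short illustration of what "works character by character" implies: the function is not path-aware, and it only finds a shared prefix, not a shared part in the middle. For a result made of whole path components, the standard library also has os.path.commonpath (Python 3.5+).

```python
import os

# Works on any list of strings, one character at a time.
print(os.path.commonprefix(["apples", "appleses"]))          # apples
# Not path-aware: the result need not be a real directory.
print(os.path.commonprefix(["/usr/lib", "/usr/local"]))      # /usr/l
# Only a prefix: the shared middle part "apples" is not found.
print(repr(os.path.commonprefix(["bapples", "cappleses"])))  # ''
```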

jonas answered 2020-01-09T01:54:31Z

5 votes

Same as Evo's, but with an arbitrary number of strings to compare:

def common_start(*strings):
    """ Returns the longest common substring
        from the beginning of the `strings`
    """
    def _iter():
        for z in zip(*strings):
            if z.count(z[0]) == len(z):  # check all elements in `z` are the same
                yield z[0]
            else:
                return

    return ''.join(_iter())
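When only the common prefix of many strings is needed, a known shortcut (CPython's os.path.commonprefix is implemented this way) is to compare just the lexicographically smallest and largest strings: every other string sorts between them, so any prefix those two share is shared by all. A sketch; the name common_start_minmax is hypothetical.

```python
def common_start_minmax(*strings: str) -> str:
    """Longest common prefix of any number of strings."""
    if not strings:
        return ""
    lo, hi = min(strings), max(strings)
    # Every string sorts between lo and hi, so a prefix shared by lo and hi
    # is shared by all of them.
    for i, ch in enumerate(lo):
        if ch != hi[i]:
            return lo[:i]
    return lo


print(repr(common_start_minmax("apple pie available", "apple pies", "apple tart")))
```

This needs one pass over the shortest-sorting string instead of zipping all strings together.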

SergeyR answered 2020-01-09T01:54:51Z

3 votes

A version of the first longestSubstringFinder answer with its bugs fixed:

def longestSubstringFinder(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        for j in range(len2):
            lcs_temp=0
            match=''
            # extend the match while both indices are in range and the characters agree
            while ((i+lcs_temp < len1) and (j+lcs_temp < len2) and string1[i+lcs_temp] == string2[j+lcs_temp]):
                match += string2[j+lcs_temp]
                lcs_temp+=1
            if (len(match) > len(answer)):
                answer = match
    return answer

print(longestSubstringFinder("dd apple pie available", "apple pies"))
print(longestSubstringFinder("cov_basic_as_cov_x_gt_y_rna_genes_w1000000", "cov_rna15pcs_as_cov_x_gt_y_rna_genes_w1000000"))
print(longestSubstringFinder("bapples", "cappleses"))
print(longestSubstringFinder("apples", "apples"))

user7733798 answered 2020-01-09T01:55:11Z

1 votes

Try:

import itertools as it

''.join(el[0] for el in it.takewhile(lambda t: t[0] == t[1], zip(string1, string2)))

It compares the two strings from the beginning.

Birei answered 2020-01-09T01:55:35Z

1 votes

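This is not the most efficient way, but it is what I could come up with, and it works. If anyone can improve it, please do. It builds a matrix with a 1 wherever the characters match, then scans the matrix for the longest diagonal run of 1s, keeping track of where that run starts and ends. Finally it returns the substring of the input string between those start and end positions.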

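Note: this only finds one longest common substring. If there is more than one, you could collect the results in an array and return that instead. It is also case-sensitive, so (Apple pie, apple pie) returns pple pie.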

import numpy

def longestSubstringFinder(str1, str2):
    answer = ""
    if len(str1) == len(str2):
        if str1==str2:
            return str1
        else:
            longer=str1
            shorter=str2
    elif (len(str1) == 0 or len(str2) == 0):
        return ""
    elif len(str1)>len(str2):
        longer=str1
        shorter=str2
    else:
        longer=str2
        shorter=str1
    # matrix[i][j] is 1 where shorter[i] == longer[j]
    matrix = numpy.zeros((len(shorter), len(longer)))
    for i in range(len(shorter)):
        for j in range(len(longer)):
            if shorter[i]== longer[j]:
                matrix[i][j]=1
    longest=0
    start=[-1,-1]
    end=[-1,-1]
    # follow each diagonal run of 1s with an offset (count), so i and j are never modified
    for i in range(len(shorter)-1, -1, -1):
        for j in range(len(longer)):
            count=0
            begin = [i,j]
            while i+count < len(shorter) and j+count < len(longer) and matrix[i+count][j+count]==1:
                count=count+1
            if count>longest:
                longest=count
                start=begin
                end=[i+count-1, j+count-1]
    answer=shorter[int(start[0]): int(end[0])+1]
    return answer
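The note above suggests collecting every longest match when there is more than one. A sketch of that idea without NumPy: instead of storing 1s and scanning diagonals afterwards, each cell stores the length of the diagonal run ending there, so the longest runs are known once the table is filled. The function name and its ignore_case parameter are hypothetical; with ignore_case=True characters are compared after str.casefold, one at a time, and results keep the casing of str1.

```python
from typing import List


def all_longest_common_substrings(str1: str, str2: str,
                                  ignore_case: bool = False) -> List[str]:
    """Every distinct longest common substring, in order of appearance in str1."""
    # Compare characters one at a time so indices into str1 stay valid.
    key = str.casefold if ignore_case else (lambda ch: ch)
    # runs[i][j] = length of the common run ending at str1[i - 1] and str2[j - 1]
    runs = [[0] * (len(str2) + 1) for _ in range(len(str1) + 1)]
    best, ends = 0, []  # best run length so far, and its end positions in str1
    for i in range(1, len(str1) + 1):
        for j in range(1, len(str2) + 1):
            if key(str1[i - 1]) == key(str2[j - 1]):
                runs[i][j] = runs[i - 1][j - 1] + 1
                if runs[i][j] > best:
                    best, ends = runs[i][j], [i]
                elif runs[i][j] == best:
                    ends.append(i)
    if best == 0:
        return []
    # dict.fromkeys removes repeats while keeping first-seen order
    return list(dict.fromkeys(str1[end - best:end] for end in ends))


print(all_longest_common_substrings("abcxyz", "xyzabc"))
print(all_longest_common_substrings("Apple pie", "apple pie", ignore_case=True))
```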

Rali Tsanova answered 2020-01-09T01:56:01Z

1 votes

def matchingString(x,y):
    match=''
    for i in range(0,len(x)):
        for j in range(0,len(y)):
            k=1
            # keep extending the slices while they match and stay within both x and y
            while (i+k <= len(x) and j+k <= len(y) and x[i:i+k]==y[j:j+k]):
                if len(match) <= len(x[i:i+k]):
                    match = x[i:i+k]
                k=k+1
    return match

print(matchingString('apple','ale'))  #le
print(matchingString('apple pie available','apple pies'))  #apple pie

radhikesh93 answered 2020-01-09T01:56:16Z

0 votes

Returns the first longest common substring:

def compareTwoStrings(string1, string2):
    list1 = list(string1)
    list2 = list(string2)
    match = []
    output = ""
    length = 0
    for i in range(0, len(list1)):
        if list1[i] in list2:
            match.append(list1[i])
            # j goes up to len(list1) so substrings ending at the last character are tried too
            for j in range(i + 1, len(list1) + 1):
                if ''.join(list1[i:j]) in string2:
                    match.append(''.join(list1[i:j]))
                else:
                    continue
        else:
            continue
    for string in match:
        if length < len(list(string)):
            length = len(list(string))
            output = string
        else:
            continue
    return output

xXDaveXx answered 2020-01-09T01:56:36Z

0 votes

First, a helper function adapted from the itertools pairwise recipe to generate substrings.

import itertools

def n_wise(iterable, n = 2):
    '''n = 2 -> (s0,s1), (s1,s2), (s2, s3), ...
       n = 3 -> (s0,s1, s2), (s1,s2, s3), (s2, s3, s4), ...'''
    a = itertools.tee(iterable, n)
    for x, thing in enumerate(a[1:]):
        for _ in range(x+1):
            next(thing, None)
    return zip(*a)

Then a function that iterates over the substrings, longest first, and tests for membership. (Efficiency was not considered.)

def foo(s1, s2):
    '''Finds the longest matching substring
    '''
    # the longest matching substring can only be as long as the shortest string
    #which string is shortest?
    shortest, longest = sorted([s1, s2], key = len)
    #iterate over substrings, longest substrings first, down to length 1
    for n in range(len(shortest), 0, -1):
        for sub in n_wise(shortest, n):
            sub = ''.join(sub)
            if sub in longest:
                #return the first one found, it should be the longest
                return sub

s = "fdomainster"
t = "exdomainid"
print(foo(s,t))

>>>

domain

>>>
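For plain strings, the n_wise helper can be replaced by slicing. A sketch of the same longest-first membership search (the name lcs_bruteforce is hypothetical) that returns "" instead of None when the strings share no character. Like the original, it is not meant to be efficient: it may test every substring of the shorter string.

```python
def lcs_bruteforce(s1: str, s2: str) -> str:
    """Try substrings of the shorter string, longest first."""
    shortest, longest = sorted([s1, s2], key=len)
    for n in range(len(shortest), 0, -1):           # candidate lengths, longest first
        for start in range(len(shortest) - n + 1):  # every window of length n
            sub = shortest[start:start + n]
            if sub in longest:
                return sub
    return ""  # no character in common


print(lcs_bruteforce("fdomainster", "exdomainid"))  # domain
```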

wwii answered 2020-01-09T01:57:01Z

0 votes

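This is a classroom problem known as the "longest sequence finder". Here is some simple code that worked for me. My inputs are lists holding a sequence, which can also be a string; it may help you: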

def longest_substring(list1,list2):
    both=[]
    if len(list1)>len(list2):
        small=list2
        big=list1
    else:
        small=list1
        big=list2
    # copy into a list: strings have no pop(), and this avoids mutating the caller's list
    big=list(big)
    removes=0
    stop=0
    for i in small:
        for j in big:
            if i!=j:
                removes+=1
                if stop==1:
                    break
            elif i==j:
                both.append(i)
                for q in range(removes+1):
                    big.pop(0)
                stop=1
                break
        removes=0
    return both

Bantu Manjunath answered 2020-01-09T01:57:21Z

0 votes

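def LongestSubString(s1,s2):
    # Caveat: once s2[left] occurs in s1 it only shrinks from the right, so a longer
    # match starting later in s2 can be missed (s1="bcx ab", s2="abcx" gives "ab").
    left = 0
    right =len(s2)
    while(left < right):
        if(s2[left] not in s1):
            left = left+1
        else:
            if(s2[left:right] not in s1):
                right = right - 1
            else:
                return(s2[left:right])

s1 = "pineapple"
s2 = "applc"
print(LongestSubString(s1,s2))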

user3838498 answered 2020-01-09T01:57:36Z

0 votes

**Return the longest common substring**

def longestSubString(str1, str2):
    longestString = ""
    maxLength = 0
    for i in range(0, len(str1)):
        if str1[i] in str2:
            # + 1 so that substrings ending at the last character of str1 are tried too
            for j in range(i + 1, len(str1) + 1):
                if str1[i:j] in str2:
                    if(len(str1[i:j]) > maxLength):
                        maxLength = len(str1[i:j])
                        longestString = str1[i:j]
    return longestString

Jagat Singh answered 2020-01-09T01:57:52Z
