Data Structures: Strings and the KMP Algorithm

Introduction

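A basic introduction to KMP is easy to find with a quick search (for example on Baidu). Before reading on, make sure you understand what the prefixes and suffixes of a string are. The main advantage of KMP during pattern matching is that **the main string (str) never has to back up; only the pattern string (pat)** backs up.
The way the pattern backs up also follows a regular rule.
So when a comparison fails, how should the pattern back up, and by how much?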
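For contrast, here is a minimal sketch of brute-force matching, in which the main string pointer does back up after every failed attempt. The function name `naive_find` is hypothetical and only serves as an illustration; it is not part of the original post.

```python
def naive_find(text: str, pat: str) -> int:
    """Brute-force search: on a mismatch the text pointer i moves backwards."""
    i, j = 0, 0
    while i < len(text) and j < len(pat):
        if text[i] == pat[j]:
            i += 1
            j += 1
        else:
            # Back up in the text to one past the previous starting point.
            # This is the rollback of the main string that KMP avoids.
            i = i - j + 1
            j = 0
    # j == len(pat) means every pattern character matched.
    return i - len(pat) if j == len(pat) else -1
```

In the worst case this repeats many comparisons, because `i` is reset on each mismatch; KMP keeps `i` where it is and only moves `j`.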

The next array

As an example, take the pattern string aabaac.

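| index   | 0  | 1  | 2 | 3  | 4  | 5 |
|---------|----|----|---|----|----|---|
| pattern | a  | a  | b | a  | a  | c |
| next    | -1 | 0  | 1 | 0  | 1  | 2 |
| nextval | -1 | -1 | 1 | -1 | -1 | 2 |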

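Here the pattern is stored in an array whose indices start at 0, so the first entry of next is written as -1. Each later entry next[j] is found by looking only at the characters before index j (the character at j itself is excluded) and taking the length of the longest suffix of that part that equals a prefix of it, where the whole part itself does not count. That length is next[j]. For example, the characters before index 5 are aabaa, whose longest matching prefix and suffix is aa, so next[5] = 2.
When the pattern's indices start at 1, the first entry of next is written as 0 instead, and each later entry is that same longest length plus 1.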
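The 0-based next values can be checked straight from the definition. The following is a slow but literal sketch, assuming 0-based indexing with next[0] = -1 as in the table; the helper name `next_by_definition` is made up for this illustration.

```python
def next_by_definition(pat: str) -> list[int]:
    """Compute the 0-based next array by brute force (quadratic or worse)."""
    nxt = [-1] * len(pat)
    for j in range(1, len(pat)):
        before = pat[:j]  # the characters before index j
        # Longest k < j such that the first k characters of `before`
        # equal its last k characters. k = 0 always qualifies.
        nxt[j] = max(k for k in range(j) if before[:k] == before[j - k:])
    return nxt


print(next_by_definition("aabaac"))  # [-1, 0, 1, 0, 1, 2]
```

This is only meant for checking small examples by hand; the linear-time construction appears in the solution code later on.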
Formula
$$
next[j]=
\begin{cases}
0, & j=1\\
\max\{\,k \mid 1<k<j \text{ and } p_1\cdots p_{k-1}=p_{j-k+1}\cdots p_{j-1}\,\}, & \text{if this set is non-empty}\\
1, & \text{otherwise}
\end{cases}
$$
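The formula for next[j] uses 1-based indexing (its first case is j = 1). As a rough sketch, it can be evaluated literally by padding the pattern so that position 1 holds the first character. The name `next_one_based` and the padding trick are assumptions made for this example.

```python
def next_one_based(pat: str) -> list[int]:
    """Evaluate the 1-based next formula literally; result[0] is unused."""
    p = " " + pat  # pad so that p[1] is the first pattern character
    n = len(pat)
    nxt = [0] * (n + 1)  # nxt[1] = 0 covers the case j = 1
    for j in range(2, n + 1):
        # All k with 1 < k < j and p_1..p_{k-1} == p_{j-k+1}..p_{j-1}
        candidates = [k for k in range(2, j) if p[1:k] == p[j - k + 1:j]]
        nxt[j] = max(candidates) if candidates else 1
    return nxt


print(next_one_based("aabaac")[1:])  # [0, 1, 2, 1, 2, 3]
```

Each value is exactly 1 larger than the corresponding 0-based entry (-1, 0, 1, 0, 1, 2) for the same pattern.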
For the 0-based arrays used in the example, nextval[0] = -1, and for j = 1 ... len(pat) - 1:

$$
nextval[j]=
\begin{cases}
nextval[next[j]], & pat[j]=pat[next[j]]\\
next[j], & \text{otherwise}
\end{cases}
$$
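The idea behind nextval is that if pat[j] equals the character at its fallback position next[j], falling back there would compare the same character against the same position of the main string and fail again, so the fallback is skipped in advance. A minimal sketch, assuming a 0-based next array is already available (the function name `build_nextval` is hypothetical):

```python
def build_nextval(pat: str, nxt: list[int]) -> list[int]:
    """Derive nextval from a 0-based next array (nxt[0] must be -1)."""
    nextval = list(nxt[:len(pat)])
    for j in range(1, len(pat)):
        k = nxt[j]
        if pat[j] == pat[k]:
            # Same character at the fallback position: reuse its nextval,
            # which was already finalized because k < j.
            nextval[j] = nextval[k]
    return nextval


print(build_nextval("aabaac", [-1, 0, 1, 0, 1, 2]))  # [-1, -1, 1, -1, -1, 2]
```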

Example problem: LeetCode 28

Implement strStr().

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:
Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:
Input: haystack = "aaaaa", needle = "bba"
Output: -1

Clarification:
What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C’s strstr() and Java’s indexOf().

Constraints:

  • haystack and needle consist only of lowercase English characters.
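When testing a hand-written solution to this problem, it can help to have a trusted baseline. One possible sketch uses Python's built-in `str.find`, which also returns 0 for an empty needle and -1 when there is no match; the wrapper name `reference_strstr` is invented for this example.

```python
def reference_strstr(haystack: str, needle: str) -> int:
    """Baseline built on str.find, for cross-checking other solutions."""
    return haystack.find(needle)


print(reference_strstr("hello", "ll"))    # 2
print(reference_strstr("aaaaa", "bba"))   # -1
print(reference_strstr("hello", ""))      # 0
```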

Code

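def strStr(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    m = len(needle)

    # Build the next array (0-based, next[0] = -1). It is named nxt here so
    # that it does not shadow Python's built-in next().
    i, j = 0, -1
    nxt = [0] * (m + 1)
    nxt[0] = -1
    while i < m:
        if j == -1 or needle[i] == needle[j]:
            i += 1
            j += 1
            nxt[i] = j
        else:
            j = nxt[j]

    # Turn next into nextval in place: if needle[i] equals the character at
    # its fallback position, retrying there would fail again, so skip ahead.
    for i in range(1, m):
        j = nxt[i]
        if needle[i] == needle[j]:
            nxt[i] = nxt[j]

    # KMP search: i (haystack) never moves backwards; only j (needle) falls back.
    i, j = 0, 0
    while i < len(haystack) and j < m:
        if j == -1 or haystack[i] == needle[j]:
            i += 1
            j += 1
        else:
            j = nxt[j]

    # j == m means the whole needle matched, ending just before index i.
    return i - m if j == m else -1


haystack = "aabaabaabaac"
needle = "aabaac"
print(strStr(haystack, needle))  # 6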
