Algorithms: Thinking in a Functional Programming Way - Boyer-Moore Pattern Matching, Part 1

Problem Spec

Given:

  • pattern string pattern
  • text string text

Find:

  • index x such that
    text[x .. x + pattern.length - 1] equals pattern (or -1 if there is no such x)
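
To pin down what "matches" means here, a brute-force sketch can serve as a reference to check faster versions against. The helper names `matchesAt` and `naiveSearch` are made up for this illustration and are not part of the algorithm developed below.

```typescript
// Hypothetical helper: true when pattern occurs in text starting at index x.
function matchesAt(text: string, pattern: string, x: number): boolean {
    return text.slice(x, x + pattern.length) === pattern
}

// Hypothetical reference search: try every x from left to right
// and return the first one that matches, or -1 if none does.
function naiveSearch(text: string, pattern: string): number {
    for (let x = 0; x + pattern.length <= text.length; x++) {
        if (matchesAt(text, pattern, x)) return x
    }
    return -1
}
```

This version checks every position; the rest of the article is about skipping positions that cannot match.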

Solution Part 1 (Bad Character Move)

We look for the match position by aligning the two char arrays.

So we need a pointer align_head: the index in text that (conceptually) lines up vertically with pattern[0].

We start by aligning at text[0], then shift the pointer to later positions whenever the whole pattern cannot be matched.

How far to the right can align_head go?

For any align_head, the last char in text that aligns with the pattern string is text[align_head + pattern.length - 1].

At the rightmost alignment that char is text[text.length - 1], so we get

align_head + pattern.length - 1 = text.length - 1 => align_head = text.length - pattern.length

Range of the align head

0 <= align_head <= text.length - pattern.length

Now we have the bounds of our loop.
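
As a quick check of that bound, here is a small sketch that lists every align head worth trying. The helper name `alignHeads` is hypothetical and only exists for this illustration.

```typescript
// Hypothetical helper: every align_head from 0 up to
// text.length - pattern.length, inclusive.
function alignHeads(text: string, pattern: string): number[] {
    const last_align_head = text.length - pattern.length
    const heads: number[] = []
    for (let align_head = 0; align_head <= last_align_head; align_head++) {
        heads.push(align_head)
    }
    return heads
}
```

For a text of length 5 and a pattern of length 3 this gives 0, 1 and 2; when the pattern is longer than the text the list is empty, so there is nothing to try.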

Next question: given a fixed align head, how do we perform the matching?

We start the comparison from the last char of the pattern and go backward one char at a time.

We call the offset (within the pattern) of the char currently being compared align_offset.

When all chars from right to left match the pattern string, bingo: align_head is the answer.
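
The right-to-left comparison can be sketched as a plain loop. The function name `firstMismatchFromRight` is an assumption made for this example, and it uses a loop rather than the recursive style the article adopts later.

```typescript
// Hypothetical helper: scan the pattern from its last char to its first.
// Returns the align_offset of the first mismatch found,
// or -1 when every char matched.
function firstMismatchFromRight(text: string, pattern: string, align_head: number): number {
    for (let align_offset = pattern.length - 1; align_offset >= 0; align_offset--) {
        if (text[align_head + align_offset] !== pattern[align_offset]) {
            return align_offset
        }
    }
    return -1
}
```

A negative result therefore means "full match at this align_head", which is the same convention the full implementation relies on.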

Once a mismatch is found, we want to shift align_head to the right by some number of steps.

If the unmatched char in text IS in the pattern string, we shift align_head to the right so that this char lines up vertically with its occurrence in the pattern (the rightmost occurrence, if there are several).

```
$unmatched_char = text[align_head + align_offset]
$unmatched_char_pos_in_pattern_str = getIndex(pattern, $unmatched_char)
new_align_head + $unmatched_char_pos_in_pattern_str === align_head + align_offset
new_align_head = align_head + align_offset - $unmatched_char_pos_in_pattern_str
```

So we shift `align_head` to the right by `align_offset - $unmatched_char_pos_in_pattern_str`, that is, `new_align_head = align_head + align_offset - $unmatched_char_pos_in_pattern_str`. (Caution: if this shift is zero or negative we would stay put or move left, which we must never do. In that case we just move right by one step.)

Otherwise, the unmatched char does not occur in the pattern at all, and we set new_align_head = align_head + align_offset + 1 (simply jump past the unmatched char, because no alignment that still covers this char can match the pattern).
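
The shift rule above can be sketched on its own as follows. The name `nextAlignHead` is hypothetical, `lastIndexOf` stands in for the `getIndex` lookup, and the one-step fallback for a non-positive shift is an assumption of this sketch (it is the usual safe choice, since moving by one can never skip a match).

```typescript
// Hypothetical helper: compute the next align_head after a mismatch
// at align_head + align_offset.
function nextAlignHead(text: string, pattern: string, align_head: number, align_offset: number): number {
    const unmatched_char = text[align_head + align_offset]
    // Rightmost position of the unmatched char in the pattern, or -1.
    const pos_in_pattern = pattern.lastIndexOf(unmatched_char)

    // Char is not in the pattern: jump past it entirely.
    if (pos_in_pattern < 0) {
        return align_head + align_offset + 1
    }

    // Char is in the pattern: line it up, but never move left or stay put.
    const aligned = align_head + align_offset - pos_in_pattern
    return aligned > align_head ? aligned : align_head + 1
}
```

For example, with text "cabd" and pattern "abd", the mismatch at offset 2 is the char 'b', which sits at index 1 in the pattern, so the head moves from 0 to 1. With text "xxaaba" and pattern "aaba", the mismatch at offset 2 is 'a', whose rightmost index in the pattern is 3, so the computed shift is negative and the head moves by one instead.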

Now let's put this into code:


function bmSearch(text: string, pattern: string): number {

    // Map each char of the pattern to its rightmost index in the pattern
    // (later occurrences overwrite earlier ones).
    const char_to_index_in_pattern_mapping: Record<string, number> = {}
    pattern.split('').forEach((char, index) => {
        char_to_index_in_pattern_mapping[char] = index
    })

    const initial_align_head = 0
    return tryMatch(initial_align_head)

    function tryMatch(fixed_align_head: number): number {

        // The pattern no longer fits into the rest of the text.
        if (fixed_align_head > text.length - pattern.length) {
            const none_match = -1
            return none_match
        }

        // Compare right to left, starting from the last char of the pattern.
        const unmatched_char_offset = compareAlignTail(pattern.length - 1)

        const fully_match = unmatched_char_offset < 0

        return fully_match
        ? fixed_align_head
        : tryMatch(
            shiftAlignHead(unmatched_char_offset)
        )

        // Returns the offset of the first mismatch, or -1 if all chars match.
        function compareAlignTail(align_offset: number): number {

            if (align_offset < 0) {
                return align_offset
            }

            return text[fixed_align_head + align_offset] === pattern[align_offset]
            ? compareAlignTail(align_offset - 1)
            : align_offset
        }

        // Returns the next align head, always to the right of the current one.
        function shiftAlignHead(offset_in_text: number): number {
            const unmatched_char = text[fixed_align_head + offset_in_text]
            const offset_in_pattern: number | undefined = char_to_index_in_pattern_mapping[unmatched_char]
            const shifted_align_head_to_skip_over = fixed_align_head + offset_in_text + 1

            // The unmatched char is not in the pattern: jump past it.
            if (offset_in_pattern === undefined) {
                return shifted_align_head_to_skip_over
            }

            const shifted_align_head_to_aligned_matched_char = fixed_align_head + (offset_in_text - offset_in_pattern)

            // Align the char with the pattern only if that moves us right;
            // otherwise move right by a single step.
            return shifted_align_head_to_aligned_matched_char > fixed_align_head
            ? shifted_align_head_to_aligned_matched_char
            : fixed_align_head + 1
        }
    }
}
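
The recursive version makes one nested call per shift, and most JavaScript engines do not optimize tail calls, so a very long text could run into call stack limits. As a hedged alternative, here is a loop-based sketch of the same bad character move. The name `bmSearchLoop` and the use of a `Map` are choices made for this example, not something the article prescribes.

```typescript
// Hypothetical loop-based variant of the search above.
function bmSearchLoop(text: string, pattern: string): number {
    // Rightmost index of each char in the pattern.
    const last_index = new Map<string, number>()
    for (let i = 0; i < pattern.length; i++) {
        last_index.set(pattern[i], i)
    }

    let align_head = 0
    while (align_head <= text.length - pattern.length) {
        // Compare right to left until a mismatch or the start of the pattern.
        let align_offset = pattern.length - 1
        while (align_offset >= 0 && text[align_head + align_offset] === pattern[align_offset]) {
            align_offset--
        }
        if (align_offset < 0) return align_head // full match

        const pos = last_index.get(text[align_head + align_offset])
        const aligned = pos === undefined
            ? align_head + align_offset + 1   // char not in pattern: skip over it
            : align_head + align_offset - pos // char in pattern: line it up
        // Never move left or stay put.
        align_head = Math.max(aligned, align_head + 1)
    }
    return -1
}
```

With text "xxaaba" and pattern "aaba", the head moves 0, 1, 2 and the function returns 2. That input is a useful test case because the first mismatch produces a negative alignment shift.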
