Problem Spec
Given:
- pattern string `pattern`
- text string `text`
Find:
- the index `x` such that the chars of `text` starting at `text[x]` match `pattern` from its first char until its end
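To make the spec concrete, here is a minimal brute-force sketch that simply checks every index. The name `naiveSearch` is hypothetical, and returning `-1` when nothing matches is a convention assumed here for illustration. It is only a baseline to compare against, not the approach developed below.

```typescript
// Hypothetical baseline: test the spec at every index, with no skipping.
function naiveSearch(text: string, pattern: string): number {
  for (let x = 0; x + pattern.length <= text.length; x++) {
    // Does text, starting at text[x], match the whole pattern?
    if (text.substring(x, x + pattern.length) === pattern) {
      return x
    }
  }
  // Assumed convention: -1 means "no match".
  return -1
}
```

For example, `naiveSearch("hello world", "world")` evaluates to `6`, because `text[6]` is where `world` starts.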
Solution Part 1 (Bad Prefix Move)
We try to find the matched position by aligning two char arrays.
So we need a pointer `align_head`: the index in `text` that sits (conceptually) directly above `pattern[0]` when the two arrays are stacked vertically.
We first try to align at `text[0]`, then shift the pointer to later positions whenever the whole pattern cannot be matched there.
Eventually we reach the rightmost position, where the last char of `pattern` sits under the last char of `text`, which is `text[text.length - 1]`.
We know the last char in `text` that aligns with `pattern` is `text[align_head + pattern.length - 1]`, so we get
align_head + pattern.length - 1 = text.length - 1
=> align_head = text.length - pattern.length
Range in which the align head can move:
0 <= align_head <= text.length - pattern.length
Now we have the bounds of our loop.
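The bounds above can be sketched as a loop that lists every align head worth trying. The function name `candidateAlignHeads` is a hypothetical helper for illustration; a real search would not build this array, it would just loop over the same range.

```typescript
// Hypothetical helper: every align head for which the pattern still fits in the text.
function candidateAlignHeads(text: string, pattern: string): number[] {
  const last_align_head = text.length - pattern.length
  const heads: number[] = []
  for (let align_head = 0; align_head <= last_align_head; align_head++) {
    heads.push(align_head)
  }
  // Empty when the pattern is longer than the text.
  return heads
}
```

With `text = "abcdef"` and `pattern = "def"`, the last align head is `6 - 3 = 3`, so the candidates are 0, 1, 2 and 3.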
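Next question: given a fixed align head, how do we perform the matching?
We start the comparison from the last char of the pattern and go backward one char at a time.
We call the pointer to the char currently being compared `align_offset`. It is an offset from `align_head`, so we compare `text[align_head + align_offset]` with `pattern[align_offset]`.
When all chars from right to left match the pattern string, we have found our position. Bingo!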
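A minimal sketch of this right-to-left comparison for one fixed align head follows. The name `firstMismatchOffset` is hypothetical, and using `-1` to signal "everything matched" is an assumption made for this example.

```typescript
// Hypothetical helper: compare pattern against text at a fixed align head,
// walking align_offset from the last char of the pattern down to 0.
function firstMismatchOffset(text: string, pattern: string, align_head: number): number {
  for (let align_offset = pattern.length - 1; align_offset >= 0; align_offset--) {
    if (text[align_head + align_offset] !== pattern[align_offset]) {
      // First unmatched char, seen from the right.
      return align_offset
    }
  }
  // Assumed convention: -1 means every char matched.
  return -1
}
```

With `text = "abcabd"` and `pattern = "abd"`, align head 0 fails immediately at offset 2 (`c` vs `d`), while align head 3 matches all three chars.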
Once a mismatch is found, we want to shift `align_head` to the right by some steps.
If the unmatched char in text IS in the pattern string, we right-shift `align_head` so that this char lines up vertically with its occurrence in the pattern.
```
$unmatched_char = text[align_head + align_offset]
$unmatched_char_pos_in_pattern_str = getIndex(pattern, $unmatched_char)
new_align_head + $unmatched_char_pos_in_pattern_str === align_head + align_offset
new_align_head = align_head + align_offset - $unmatched_char_pos_in_pattern_str
```
So we right-shift `align_head` by `align_offset - $unmatched_char_pos_in_pattern_str`. (Caution: if this amount is negative we would end up left-shifting, which we will not do. Shifting right by a single position is the safe fallback in that case.)
Otherwise, when the unmatched char is not in the pattern at all, we just set `new_align_head = align_head + align_offset + 1`
(simply jump past the unmatched char, because no alignment that still covers this char can match the pattern).
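The shift rule can be sketched as a single function covering all three cases. This is an illustration under assumptions: `nextAlignHead` is a hypothetical name, and `getIndex` is read here as "rightmost position of the char in the pattern", which is realized with the standard `String.prototype.lastIndexOf`.

```typescript
// Hypothetical helper: where to move align_head after a mismatch at align_offset.
function nextAlignHead(
  pattern: string,
  align_head: number,
  align_offset: number,
  unmatched_char: string
): number {
  // Assumption: use the rightmost occurrence of the char in the pattern.
  const pos_in_pattern = pattern.lastIndexOf(unmatched_char)
  if (pos_in_pattern === -1) {
    // Char is not in the pattern: jump past it.
    return align_head + align_offset + 1
  }
  // Line the text char up with its occurrence in the pattern...
  const aligned = align_head + align_offset - pos_in_pattern
  // ...but never move left or stay in place.
  return Math.max(aligned, align_head + 1)
}
```

With `pattern = "abd"` and `align_head = 0`: a mismatch on `c` at offset 2 jumps to 3, a mismatch on `a` at offset 2 moves to `0 + 2 - 0 = 2`, and a mismatch on `d` at offset 0 would give `-2`, so the fallback moves to 1 instead.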
Now, let's put this into code.
```typescript
function bmSearch(text: string, pattern: string): number {
  // Rightmost index of each char in the pattern (later indexes overwrite earlier ones).
  const char_to_index_in_pattern_mapping: Record<string, number> = {}
  pattern.split('').forEach((char, index) => {
    char_to_index_in_pattern_mapping[char] = index
  })
  const initial_align_head = 0
  return tryMatch(initial_align_head)

  function tryMatch(fixed_align_head: number): number {
    // Past the last valid align head: the pattern no longer fits in the text.
    if (fixed_align_head > text.length - pattern.length) {
      const none_match = -1
      return none_match
    }
    // Compare right to left, starting from the last char of the pattern.
    const unmatched_char_offset = compareAlignTail(pattern.length - 1)
    const fully_match = unmatched_char_offset < 0
    return fully_match
      ? fixed_align_head
      : tryMatch(
        shiftAlignHead(unmatched_char_offset)
      )

    // Returns the offset of the first mismatch, or -1 if every char matches.
    function compareAlignTail(align_offset: number): number {
      if (align_offset < 0) {
        return align_offset
      }
      return text[fixed_align_head + align_offset] === pattern[0 + align_offset]
        ? compareAlignTail(align_offset - 1)
        : align_offset
    }

    function shiftAlignHead(offset_in_text: number): number {
      const unmatched_char = text[fixed_align_head + offset_in_text]
      const offset_in_pattern = char_to_index_in_pattern_mapping[unmatched_char]
      const shifted_align_head_to_skip_over = fixed_align_head + offset_in_text + 1
      if (offset_in_pattern === undefined) {
        // The unmatched char is not in the pattern: jump past it.
        return shifted_align_head_to_skip_over
      }
      const shifted_align_head_to_aligned_matched_char = fixed_align_head + (offset_in_text - offset_in_pattern)
      // Never shift left or stay in place: move right by at least one position.
      return Math.max(shifted_align_head_to_aligned_matched_char, fixed_align_head + 1)
    }
  }
}
```
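The recursive style above uses one call frame per shift and per compared char, which may hit the call stack limit on long inputs, since JavaScript engines generally do not eliminate tail calls. As a hedged alternative, here is an iterative sketch of the same idea. The name `bmSearchIterative` and the plain-object lookup table are choices made for this example, not part of the original listing.

```typescript
// Iterative sketch of the same search (hypothetical name).
function bmSearchIterative(text: string, pattern: string): number {
  // Rightmost index of each char in the pattern.
  const last_index_in_pattern: { [char: string]: number | undefined } = {}
  for (let i = 0; i < pattern.length; i++) {
    last_index_in_pattern[pattern[i]] = i
  }
  let align_head = 0
  while (align_head <= text.length - pattern.length) {
    // Compare right to left at this align head.
    let align_offset = pattern.length - 1
    while (align_offset >= 0 && text[align_head + align_offset] === pattern[align_offset]) {
      align_offset--
    }
    if (align_offset < 0) {
      return align_head // every char matched
    }
    const pos = last_index_in_pattern[text[align_head + align_offset]]
    align_head = pos === undefined
      ? align_head + align_offset + 1 // char not in pattern: jump past it
      : Math.max(align_head + align_offset - pos, align_head + 1) // align, but always move right
  }
  return -1 // no match
}
```

Tracing `bmSearchIterative("hello world", "world")`: align head 0 mismatches on `o` at offset 4 and moves to 3, align head 3 mismatches on `o` again and moves to 6, and align head 6 matches, so the result is `6` after only three alignments.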