Boyer-Moore search of a list for a sub-list in Python

I had the need to search a list for a sub-list in Python, rather like std::search() in C++, but there isn’t a built-in equivalent in Python. A naive way of implementing it would be to try to match the sub-list against every position in the list, but that wouldn’t be very efficient. An efficient algorithm for string searching is the Boyer-Moore algorithm, and this can be adapted to search a list.

Below is an implementation of Boyer-Moore for searching a list in Python. It’s based on the Wikipedia article. The function search() takes a list to search (haystack), and a sub-list to search for (needle), and returns the starting index of needle, or -1 if it isn’t found.

def search(haystack, needle):
"""
Search list haystack for sub-list needle.
"""
if len(needle) == 0:
return 0
char_table = make_char_table(needle)
offset_table = make_offset_table(needle)
i = len(needle) - 1
while i < len(haystack):
j = len(needle) - 1
while needle[j] == haystack[i]:
if j == 0:
return i
i -= 1
j -= 1
i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i]));
return -1

def make_char_table(needle):
"""
Makes the jump table based on the mismatched character information.
"""
table = {}
for i in range(len(needle) - 1):
table[needle[i]] = len(needle) - 1 - i
return table

def make_offset_table(needle):
"""
Makes the jump table based on the scan offset in which mismatch occurs.
"""
table = []
last_prefix_position = len(needle)
for i in reversed(range(len(needle))):
if is_prefix(needle, i + 1):
last_prefix_position = i + 1
table.append(last_prefix_position - i + len(needle) - 1)
for i in range(len(needle) - 1):
slen = suffix_length(needle, i)
table[slen] = len(needle) - 1 - i + slen
return table

def is_prefix(needle, p):
"""
Is needle[p:end] a prefix of needle?
"""
j = 0
for i in range(p, len(needle)):
if needle[i] != needle[j]:
return 0
j += 1
return 1

def suffix_length(needle, p):
"""
Returns the maximum length of the substring ending at p that is a suffix.
"""
length = 0;
j = len(needle) - 1
for i in reversed(range(p + 1)):
if needle[i] == needle[j]:
length += 1
else:
break
j -= 1
return length



An example program:

def main():
haystack = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1]
needle = [0, 0, 1]
index = search(haystack, needle)
print(index)

if __name__ == '__main__':
main()


Output:

4