I had the need to search a list for a sub-list in Python, rather like std::search()
in C++, but there isn’t a built-in equivalent in Python. A naive way of implementing it would be to try to match the sub-list against every position in the list, but that wouldn’t be very efficient. An efficient algorithm for string searching is the Boyer-Moore algorithm, and this can be adapted to search a list.
Below is an implementation of Boyer-Moore for searching a list in Python. It’s based on the Wikipedia article. The function search()
takes a list to search (haystack
), and a sub-list to search for (needle
), and returns the starting index of needle
, or -1 if it isn’t found.
def search(haystack, needle): """ Search list `haystack` for sub-list `needle`. """ if len(needle) == 0: return 0 char_table = make_char_table(needle) offset_table = make_offset_table(needle) i = len(needle) - 1 while i < len(haystack): j = len(needle) - 1 while needle[j] == haystack[i]: if j == 0: return i i -= 1 j -= 1 i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i])); return -1 def make_char_table(needle): """ Makes the jump table based on the mismatched character information. """ table = {} for i in range(len(needle) - 1): table[needle[i]] = len(needle) - 1 - i return table def make_offset_table(needle): """ Makes the jump table based on the scan offset in which mismatch occurs. """ table = [] last_prefix_position = len(needle) for i in reversed(range(len(needle))): if is_prefix(needle, i + 1): last_prefix_position = i + 1 table.append(last_prefix_position - i + len(needle) - 1) for i in range(len(needle) - 1): slen = suffix_length(needle, i) table[slen] = len(needle) - 1 - i + slen return table def is_prefix(needle, p): """ Is needle[p:end] a prefix of needle? """ j = 0 for i in range(p, len(needle)): if needle[i] != needle[j]: return 0 j += 1 return 1 def suffix_length(needle, p): """ Returns the maximum length of the substring ending at p that is a suffix. """ length = 0; j = len(needle) - 1 for i in reversed(range(p + 1)): if needle[i] == needle[j]: length += 1 else: break j -= 1 return length
An example program:
def main(): haystack = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1] needle = [0, 0, 1] index = search(haystack, needle) print(index) if __name__ == '__main__': main()
Output:
4