# Greedy Set Cover in Python

In the Set Cover problem, we are given a universal set $$U$$ and a family of subsets $$S_1, \ldots, S_k \subseteq U$$. A set cover is a collection of subsets $$C$$ from $$S_1, \ldots, S_k$$ whose union is the universal set $$U$$. We would like to minimise $$\left\vert{C}\right\vert$$.

The problem of finding the optimum $$C$$ is NP-Complete, but a greedy algorithm can give an $$O(log_e n)$$ approximation to optimal solution.

The greedy algorithm selects the set $$S_i$$ containing the largest number of uncovered points at each step, until all of the points have been covered.

Below is an implementation in Python:

def set_cover(universe, subsets):
"""Find a family of subsets that covers the universal set"""
elements = set(e for s in subsets for e in s)
# Check the subsets cover the universe
if elements != universe:
return None
covered = set()
cover = []
# Greedily add the subsets with the most uncovered points
while covered != elements:
subset = max(subsets, key=lambda s: len(s - covered))
cover.append(subset)
covered |= subset

return cover

def main():
universe = set(range(1, 11))
subsets = [set([1, 2, 3, 8, 9, 10]),
set([1, 2, 3, 4, 5]),
set([4, 5, 7]),
set([5, 6, 7]),
set([6, 7, 8, 9, 10])]
cover = set_cover(universe, subsets)
print(cover)

if __name__ == '__main__':
main()


Output:

[set([1, 2, 3, 8, 9, 10]), set([4, 5, 7]), set([5, 6, 7])]