Greedy Set Cover in Python

In the Set Cover problem, we are given a universal set \(U\) and a family of subsets \(S_1, \ldots, S_k \subseteq U\). A set cover is a collection of subsets \(C\) from \(S_1, \ldots, S_k\) whose union is the universal set \(U\). We would like to minimise \(\left\vert{C}\right\vert\).

The problem of finding the optimum \(C\) is NP-Complete, but a greedy algorithm can give an \(O(log_e n)\) approximation to optimal solution.

The greedy algorithm selects the set \(S_i\) containing the largest number of uncovered points at each step, until all of the points have been covered.

Below is an implementation in Python:

def set_cover(universe, subsets):
    """Find a family of subsets that covers the universal set"""
    elements = set(e for s in subsets for e in s)
    # Check the subsets cover the universe
    if elements != universe:
        return None
    covered = set()
    cover = []
    # Greedily add the subsets with the most uncovered points
    while covered != elements:
        subset = max(subsets, key=lambda s: len(s - covered))
        cover.append(subset)
        covered |= subset

    return cover

def main():
    universe = set(range(1, 11))
    subsets = [set([1, 2, 3, 8, 9, 10]),
        set([1, 2, 3, 4, 5]),
        set([4, 5, 7]),
        set([5, 6, 7]),
        set([6, 7, 8, 9, 10])]
    cover = set_cover(universe, subsets)
    print(cover)

if __name__ == '__main__':
    main()

Output:

[set([1, 2, 3, 8, 9, 10]), set([4, 5, 7]), set([5, 6, 7])]