# Combinatorial algorithms

Combinatorial algorithms are algorithms that deal with combinatorial structures, which are sets, ordered n-tuples, and any structures that can be built from them, like graphs.

Combinatorial algorithms include algorithms for:

• Generation: List all structures of a given type, such as combinations and permutations, connected components of a graph
• Search: Find at least one structure with a given property
• Optimisation and approximation algorithms can be used to solve search problems
• Optimisation methods for search problems include exhaustive search, backtracking, branch and bound, and dynamic programming
• Approximation methods include greedy algorithms

# Spanning forest of a graph in C

If a graph isn’t connected, there isn’t a spanning tree that covers all of the vertices. We can, however, construct a spanning forest, which is a set of spanning trees, one for each connected component in the graph. This is rather similar to finding connected components, except that in a spanning forest the components are represented by a set of edges, rather than a set of vertices. Any vertices in the graph that are entirely unconnected will not appear in the spanning forest.

To find the spanning forest, all we need to do is to use the depth-first search algorithm for finding a spanning tree repeatedly, starting at each unvisited vertex in turn. Once all vertices that appear in edges have been visited, the spanning forest is complete.

Below is an implementation in C. The function spanning_forest() takes a graph in edge list format, the number of edges (size), the number of vertices (order), and a callback function that is called with each spanning tree found. It reuses the spanning_tree_recursive() function from the spanning tree algorithm to find each spanning tree.

#include <stdlib.h>

typedef struct {
unsigned int first;
unsigned int second;
} edge;

typedef void (*treefn)(const unsigned int *, size_t, const edge *, size_t);

void spanning_tree_recursive(const edge *edges, unsigned int size,
unsigned int order, unsigned int *visited, unsigned int *tree,
unsigned int vertex, int edge, unsigned int *len)
{
unsigned int e;
visited[vertex] = 1;
if (edge != -1) {
tree[(*len)++] = edge;
}
for (e = 0; e < size; e++) {
if (edges[e].first == vertex || edges[e].second == vertex) {
unsigned int neighbour = edges[e].first == vertex ?
edges[e].second : edges[e].first;
if (!visited[neighbour]) {
spanning_tree_recursive(edges, size, order, visited, tree,
neighbour, e, len);
}
}
}
}

void spanning_forest(const edge *edges, unsigned int size, unsigned int order,
treefn fun)
{
unsigned int *visited = calloc(order, sizeof(unsigned int));
unsigned int *tree = malloc((order - 1) * sizeof(unsigned int));
unsigned int len, v;
if (visited == NULL || tree == NULL) {
free(visited);
free(tree);
return;
}
for (v = 0; v < order; v++) {
if (!visited[v]) {
len = 0;
spanning_tree_recursive(edges, size, order, visited, tree, v, -1, &len);
if (len > 0) {
fun(tree, len, edges, size);
}
}
}
free(visited);
free(tree);
}


Here is an example program that finds the spanning forest of the graph shown at the top.

#include <stdio.h>
#include <stdlib.h>

/* Connect two edges */
void edge_connect(edge *edges, unsigned int first, unsigned int second,
unsigned int *pos)
{
edges[*pos].first = first;
edges[*pos].second = second;
(*pos)++;
}

void print(const unsigned int *tree, size_t tree_size, const edge *edges, size_t size)
{
unsigned int e;
for (e = 0; e < tree_size; e++) {
printf("(%u, %u) ", edges[tree[e]].first, edges[tree[e]].second);
}
putchar('\n');
}

int main(void)
{
const unsigned int order = 9; /* Vertices */
const unsigned int size = 8; /* Edges */
edge *edges;

edges = malloc(size * sizeof(edge));
if (edges == NULL) {
return 1;
}

/* Square */
edges[0].first = 0;
edges[0].second = 1;
edges[1].first = 1;
edges[1].second = 2;
edges[2].first = 2;
edges[2].second = 3;
edges[3].first = 3;
edges[3].second = 0;

/* Triangle */
edges[4].first = 4;
edges[4].second = 5;
edges[5].first = 5;
edges[5].second = 6;
edges[6].first = 6;
edges[6].second = 4;

/* Line */
edges[7].first = 7;
edges[7].second = 8;

spanning_forest(edges, size, order, print);

free(edges);
return 0;
}


The output:

(0, 1) (1, 2) (2, 3)
(4, 5) (5, 6)
(7, 8)


# Spanning tree of a graph in C

In a previous post I showed an algorithm to find all spanning trees in a graph. A simpler problem is just to find a single spanning tree. This can be solved using a depth-first search. We simply need to record, for each vertex we visit, the edge by which we reached it.

Below is an implementation in C. The function spanning_tree() takes a graph in edge list format, the number of edges (size), the number of vertices (order), and the address of a pointer to which to assign the spanning tree. The spanning tree is in the form of an array of edge indices. The function returns the number of edges in this array, which will be order – 1 if the graph is connected. If the graph is not connected, the function will return a spanning tree of the component containing vertex 0, and the returned size will be correspondingly smaller.

#include <stdlib.h>

typedef struct {
unsigned int first;
unsigned int second;
} edge;

void spanning_tree_recursive(const edge *edges, unsigned int size,
unsigned int order, unsigned int *visited, unsigned int *tree,
unsigned int vertex, int edge, unsigned int *len)
{
unsigned int e;
visited[vertex] = 1;
if (edge != -1) {
tree[(*len)++] = edge;
}
for (e = 0; e < size; e++) {
if (edges[e].first == vertex || edges[e].second == vertex) {
unsigned int neighbour = edges[e].first == vertex ?
edges[e].second : edges[e].first;
if (!visited[neighbour]) {
spanning_tree_recursive(edges, size, order, visited, tree,
neighbour, e, len);
}
}
}
}

unsigned int spanning_tree(const edge *edges, unsigned int size, unsigned int order,
unsigned int **tree)
{
unsigned int *visited = calloc(order, sizeof(unsigned int));
*tree = malloc((order - 1) * sizeof(unsigned int));
unsigned int len = 0;
if (visited == NULL || *tree == NULL) {
free(visited);
free(*tree);
*tree = NULL;
return 0;
}
spanning_tree_recursive(edges, size, order, visited, *tree, 0, -1, &len);
free(visited);
return len;
}


Here is an example program that finds a spanning tree of the complete graph on 5 vertices:

/* Calculate the nth triangular number T(n) */
unsigned int triangular_number(unsigned int n)
{
return (n * (n + 1)) / 2;
}

/* Construct a complete graph on v vertices */
unsigned int complete_graph(unsigned int v, edge **edges)
{
unsigned int n = triangular_number(v - 1);
unsigned int i, j, k;
*edges = malloc(n * sizeof(edge));
if (edges != NULL) {
for (i = 0, k = 0; i < v - 1; i++) {
for (j = i + 1; j < v; j++) {
(*edges)[k].first = i;
(*edges)[k].second = j;
k++;
}
}
}
else {
n = 0;
}
return n;
}

int main(void)
{
edge *edges;
const unsigned int order = 5; /* Vertices */
const unsigned int size = complete_graph(5, &edges); /* Edges */
unsigned int *tree;
unsigned int tree_size = spanning_tree(edges, size, order, &tree);
unsigned int e;
for (e = 0; e < tree_size; e++) {
printf("(%u, %u) ", edges[tree[e]].first, edges[tree[e]].second);
}
putchar('\n');
free(edges);
free(tree);
return 0;
}


The output:

(0, 1) (1, 2) (2, 3) (3, 4)


# Recursive integer partitions in C

Here is a recursive algorithm to generate integer partitions in antilexicographic order. The function partitions() takes an integer to partition, and a callback function to call for each partition found.

#include <stdlib.h>

unsigned int min(int a, int b)
{
return a < b ? a : b;
}

typedef void (*partitionfn)(const unsigned int *, size_t);

static void partitions_recursive(unsigned int n, unsigned int maximum, unsigned int *partition,
size_t length, partitionfn fun)
{
unsigned int i;
if (n == 0) {
fun(partition, length);
}
for (i = min(maximum, n); i >= 1; i--) {
partition[length] = i;
partitions_recursive(n - i, i, partition, length + 1, fun);
}
}

void partitions(unsigned int n, partitionfn fun)
{
unsigned int *partition = malloc(n * sizeof(unsigned int));
if (partition) {
partitions_recursive(n, n, partition, 0, fun);
free(partition);
}
}


An example program:

#include <stdio.h>

void print(const unsigned int *partition, size_t length)
{
unsigned int i;
for (i = 0; i < length; i++) {
printf("%u ", partition[i]);
}
putchar('\n');
}

int main(void)
{
partitions(6, print);
return 0;
}


The output:

6
5 1
4 2
4 1 1
3 3
3 2 1
3 1 1 1
2 2 2
2 2 1 1
2 1 1 1 1
1 1 1 1 1 1


Reference: Partition.java

# Levenshtein distance in C

The Levenshtein distance is a measure of the similarity of two strings. It is defined as the minimum number of insertions, deletions, and substitutions necessary to transform the first string into the second.

For example, the Levenshtein distance between “CHALK” and “CHEESE” is 4, as follows:

1. Substitute E for A
2. Substitute E for L
3. Substitute S for K
4. Insert E

The Levenshtein distance can be calculated using a dynamic programming algorithm due to Wagner and Fischer. The algorithm fills a table with the distance between all of the prefixes of the two strings, the final entry being the distance between the two entire strings.

Below is an implementation in C. I have recorded in the matrix cells information about the edits and the previous cell from which each cell was derived to enable tracing back the sequence of edits – the edit script. The function levenshtein_distance() takes two strings, and the address of a pointer to which to assign an array containing the edit script. It returns the Levenshtein distance.

#include <string.h>
#include <stdlib.h>

typedef enum {
INSERTION,
DELETION,
SUBSTITUTION,
NONE
} edit_type;

struct edit {
unsigned int score;
edit_type type;
char arg1;
char arg2;
unsigned int pos;
struct edit *prev;
};
typedef struct edit edit;

static int min3(int a, int b, int c)
{
if (a < b && a < c) {
return a;
}
if (b < a && b < c) {
return b;
}
return c;
}

static unsigned int levenshtein_matrix_calculate(edit **mat, const char *str1, size_t len1,
const char *str2, size_t len2)
{
unsigned int i, j;
for (j = 1; j <= len2; j++) {
for (i = 1; i <= len1; i++) {
unsigned int substitution_cost;
unsigned int del = 0, ins = 0, subst = 0;
unsigned int best;
if (str1[i - 1] == str2[j - 1]) {
substitution_cost = 0;
}
else {
substitution_cost = 1;
}
del = mat[i - 1][j].score + 1; /* deletion */
ins = mat[i][j - 1].score + 1; /* insertion */
subst = mat[i - 1][j - 1].score + substitution_cost; /* substitution */
best = min3(del, ins, subst);
mat[i][j].score = best;
mat[i][j].arg1 = str1[i - 1];
mat[i][j].arg2 = str2[j - 1];
mat[i][j].pos = i - 1;
if (best == del) {
mat[i][j].type = DELETION;
mat[i][j].prev = &mat[i - 1][j];
}
else if (best == ins) {
mat[i][j].type = INSERTION;
mat[i][j].prev = &mat[i][j - 1];
}
else {
if (substitution_cost > 0) {
mat[i][j].type = SUBSTITUTION;
}
else {
mat[i][j].type = NONE;
}
mat[i][j].prev = &mat[i - 1][j - 1];
}
}
}
return mat[len1][len2].score;
}

static edit **levenshtein_matrix_create(const char *str1, size_t len1, const char *str2,
size_t len2)
{
unsigned int i, j;
edit **mat = malloc((len1 + 1) * sizeof(edit *));
if (mat == NULL) {
return NULL;
}
for (i = 0; i <= len1; i++) {
mat[i] = malloc((len2 + 1) * sizeof(edit));
if (mat[i] == NULL) {
for (j = 0; j < i; j++) {
free(mat[j]);
}
free(mat);
return NULL;
}
}
for (i = 0; i <= len1; i++) {
mat[i][0].score = i;
mat[i][0].prev = NULL;
mat[i][0].arg1 = 0;
mat[i][0].arg2 = 0;
}

for (j = 0; j <= len2; j++) {
mat[0][j].score = j;
mat[0][j].prev = NULL;
mat[0][j].arg1 = 0;
mat[0][j].arg2 = 0;
}
return mat;
}

unsigned int levenshtein_distance(const char *str1, const char *str2, edit **script)
{
const size_t len1 = strlen(str1), len2 = strlen(str2);
unsigned int i, distance;

/* If either string is empty, the distance is the other string's length */
if (len1 == 0) {
return len2;
}
if (len2 == 0) {
return len1;
}
/* Initialise the matrix */
mat = levenshtein_matrix_create(str1, len1, str2, len2);
if (!mat) {
*script = NULL;
return 0;
}
/* Main algorithm */
distance = levenshtein_matrix_calculate(mat, str1, len1, str2, len2);
/* Read back the edit script */
*script = malloc(distance * sizeof(edit));
if (*script) {
i = distance - 1;
i--;
}
}
}
else {
distance = 0;
}
/* Clean up */
for (i = 0; i <= len1; i++) {
free(mat[i]);
}
free(mat);

return distance;


Here is an example program that finds the distance from CHALK to CHEESE and prints the edit script:

#include <stdlib.h>
#include <stdio.h>

void print(const edit *e)
{
if (e->type == INSERTION) {
printf("Insert %c", e->arg2);
}
else if (e->type == DELETION) {
printf("Delete %c", e->arg1);
}
else {
printf("Substitute %c for %c", e->arg2, e->arg1);
}
printf(" at %u\n", e->pos);
}

int main(void)
{
edit *script;
unsigned int distance;
unsigned int i;
const char str1[] = "CHALK";
const char str2[] = "CHEESE";
distance = levenshtein_distance(str1, str2, &script);
printf("Distance is %d:\n", distance);
for (i = 0; i < distance; i++) {
print(&script[i]);
}
free(script);
return 0;
}


The output:

Distance is 4:
Substitute E for A at 2
Substitute E for L at 3
Substitute S for K at 4
Insert E at 4


# Permutation cycles in C

A cycle of a permutation is a subset of the elements that replace one another in sequence, until the last element is replaced by the first. For example, consider the permutation below:

$$\sigma=\begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 3 & 7 & 0 & 4 & 8 & 2 & 6 & 5\end{pmatrix}$$

We can find the cycles:
$$0 \rightarrow 1, 1 \rightarrow 3, 3 \rightarrow 0$$
$$2 \rightarrow 7, 7 \rightarrow 6, 6 \rightarrow 2$$
$$4 \rightarrow 4$$
$$5 \rightarrow 8, 8 \rightarrow 5$$

These can be written as:
$$(0, 1, 3)(2, 7, 6)(4)(5, 8)$$

It’s customary to omit cycles of length 1, so this would usually be written as
$$(0, 1, 3)(2, 7, 6)(5, 8).$$

To find the cycle decomposition of a permutation, we can use an algorithm that is very similar to depth-first search (DFS) on a graph. We begin a new search for each unvisited vertex (number), and visit its neighbour (image in the permutation) until we get back to the first vertex again.

Below is an implementation in C. The function permutation_cycles() takes a permutation in the form of integers starting from 0, its length, and a callback function that is called for each cycle found.

#include <stdlib.h>

typedef void(*cyclefn)(const unsigned int *, size_t);

void permutation_cycles_recursive(const unsigned int *permutation, unsigned int *visited,
unsigned int start, unsigned int current, unsigned int *path,
size_t length, cyclefn fun)
{
visited[current] = 1;
path[length] = current;
if (start == current && length > 0) {
fun(path, length);
}
else {
permutation_cycles_recursive(permutation, visited, start, permutation[current],
path, length + 1, fun);
}
}

void permutation_cycles(const unsigned int *permutation, size_t n, cyclefn fun)
{
unsigned int i;
unsigned int *visited = calloc(n, sizeof(unsigned int));
unsigned int *path = malloc(n * sizeof(unsigned int));
if (!(visited && path)) {
free(visited);
free(path);
return;
}
for (i = 0; i < n; i++) {
if (!visited[i]) {
permutation_cycles_recursive(permutation, visited, i, i,
path, 0, fun);
}
}
free(visited);
free(path);
}


Here is an example program that finds the cycle decomposition of the permutation shown above:

#include <stdio.h>

void print(const unsigned int *cycle, size_t length)
{
if (length > 1) {
unsigned int i;
putchar('(');
for (i = 0; i < length; i++) {
printf("%u", cycle[i]);
if (i < length - 1) {
printf(", ");
}
}
putchar(')');
}
}

int main(void)
{
unsigned int permutation[] = {1, 3, 7, 0, 4, 8, 2, 6, 5};
const size_t n = sizeof(permutation) / sizeof(permutation[0]);
permutation_cycles(permutation, n, print);
putchar('\n');
return 0;
}


The output:

(0, 1, 3)(2, 7, 6)(5, 8)


# Permutations in lexicographic order in C

This recursive algorithm produces the permutations in the most natural order, and is also the easiest to understand. It uses two buffers, one containing the permutation being built, and another for the remaining, unused letters.

The algorithm is:

if (there are no more characters to arrange) {
print (the current permutation);
}
else {
choose a remaining character;
remove it from the remaining characters;
use recursion to add the remaining characters to the permutation in the same way;
}


Here it is in C:

#include <stdlib.h>
#include <string.h>

typedef void(*permutationfn)(const char *);

static void permutations_recursive(char *perm, size_t used, char *remaining, size_t len,
permutationfn fun)
{
if (used == len) {
perm[used] = '\0';
fun(perm);
}
else {
const size_t unused = len - used;
unsigned int i;
for (i = 0; i < unused; i++) {
/* Remove character at i from remaining */
char c = remaining[i];
memmove(remaining + i, remaining + i + 1, unused - i);
/* Append it to perm */
perm[used] = c;
permutations_recursive(perm, used + 1, remaining, len, fun);
/* Put character back */
memmove(remaining + i + 1, remaining + i, unused - i);
remaining[i] = c;
}
}
}

void permutations(const char *str, permutationfn fun)
{
const size_t len = strlen(str);
char *perm = malloc(len + 1);
char *remaining = malloc(len + 1);
if (perm && remaining) {
strcpy(remaining, str);
permutations_recursive(perm, 0, remaining, len, fun);
}
free(perm);
free(remaining);


Example program:

#include <stdio.h>

int main(void)
{
char str[] = "ABCD";
permutations(str, (permutationfn)puts);
return 0;
}


The output:

ABCD
ABDC
ACBD
ACDB
BACD
BCDA
BDAC
BDCA
CABD
CBDA
CDAB
CDBA
DABC
DACB
DBAC
DBCA
DCAB
DCBA


Reference: Exhaustive recursion and backtracking

# Permutations in transposition order in C

This algorithm is similar to the direct insertion order permutation algorithm, except that instead of inserting a new element into the spaces between existing elements, the new element is inserted into a space already occupied by an existing element or at the end. If the space is already occupied, the existing element is bumped to the end of the permutation.

#include <string.h>
#include <stdlib.h>

typedef void(*permutationfn)(const char *);

static void permutations_recursive(const char *str, char *perm, unsigned int level, size_t len,
permutationfn fun)
{
if (level == len) {
perm[level] = '\0';
fun(perm);
}
else {
/* Insert character str[level] in every position */
unsigned int i;
for (i = 0; i <= level; i++) {
if (level > 0 && i < level) {
/* Bump the existing element to the end */
perm[level] = perm[i];
}
perm[i] = str[level];
permutations_recursive(str, perm, level + 1, len, fun);
if (level > 0 && i < level) {
/* Move the existing element back */
perm[i] = perm[level];
}
}
}
}

void permutations(const char *str, permutationfn fun)
{
char *perm = malloc(strlen(str) + 1);
permutations_recursive(str, perm, 0, strlen(str), fun);
free(perm);
}


Example program:

#include <stdio.h>

int main(void)
{
char str[] = "ABCD";
permutations(str, (permutationfn)puts);
return 0;
}


The output:

DABC
CDBA
CABD
DCAB
BDAC
BCDA
DACB
BDCA
BACD
DBAC
CDAB
CBDA
DCBA
ACDB
ACBD
DBCA
ABDC
ABCD


Reference: Decision Trees

# Permutations in direct insertion order in C

To form a permutation in direct insertion order, we take an existing partial permutation containing $$k$$ elements, and then insert the next element into each of the $$k+1$$ possible places immediately before an element or at the end. So for example if we start with the permutation “A”, we can insert “B” before the “A” or at the end, producing “BA” and “AB”. Taking “BA”, we can then insert “C” at the beginning, between the “B” and the “A”, or at the end, producing “CBA”, “BCA”, “BAC”. The process continues until every element has been added.

Below is an implementation in C. I used a recursive function that takes a callback to process each permutation as it is found. Notice the use of memmove() to move existing elements out of the way, and then to move them back after the recursive call. We can’t use memcpy() for this, because the source and destination ranges overlap.

typedef void(*permutationfn)(const char *);

static void permutations_recursive(const char *str, char *perm, unsigned int level, size_t len,
permutationfn fun)
{
if (level == len) {
perm[level] = '\0';
fun(perm);
}
else {
/* Insert character str[level] in every position */
unsigned int i;
for (i = 0; i <= level; i++) {
if (level > 0 && i < level) {
/* Move the existing elements out of the way */
memmove(perm + i + 1, perm + i, level - i);
}
perm[i] = str[level];
permutations_recursive(str, perm, level + 1, len, fun);
if (level > 0 && i < level) {
/* Move the existing elements back */
memmove(perm + i, perm + i + 1, len - i);
}
}
}
}

void permutations(const char *str, permutationfn fun)
{
char *perm = malloc(strlen(str) + 1);
permutations_recursive(str, perm, 0, strlen(str), fun);
free(perm);
}


An example program:

int main(void)
{
char str[] = "ABCD";
permutations(str, (permutationfn)puts);
return 0;
}


The output:

DCBA
CDBA
CBDA
DBCA
BDCA
BCDA
DBAC
BDAC
BACD
DCAB
CDAB
CABD
DACB
ACDB
ACBD
DABC
ABDC
ABCD


Reference: Decision Trees

# Greedy partition in C

In a previous post, I showed a dynamic programming algorithm that gives an exact solution to the partition problem. It’s also possible to get an approximate solution using a greedy algorithm.

The algorithm is:

1. Sort the elements of the set in decreasing order
2. For each element in the set:
1. If the sum of the first subset is greater, put the element in the second set
2. Else, put it in the first set

You can see intuitively that this is going to make the sets as close as possible to having the same sum without using an exact algorithm. Below is an implementation in C. The function partition() takes an array of unsigned integers and its size, and returns an array of unsigned integers which are 0 if the corresponding element is in the first set of the partition, and 1 if it is in the second set. Note that the indices of this set correspond to the array of numbers after having been sorted.

#include <stdlib.h>

int compare_unsigned_ints_decreasing(const unsigned int *p1, const unsigned int *p2)
{
if (*p1 > *p2) {
return -1;
}
if (*p1 < *p2) {
return 1;
}
return 0;
}

unsigned int *partition(unsigned int *numbers, size_t n)
{
unsigned int i, sum0 = 0, sum1 = 0;
unsigned int *solution = calloc(n, sizeof(unsigned int));
if (solution == NULL) {
return NULL;
}
qsort(numbers, n, sizeof(unsigned int),
(int(*)(const void *, const void *))compare_unsigned_ints_decreasing);
for (i = 0; i < n; i++) {
if (sum1 < sum0) {
solution[i] = 1;
sum1 += numbers[i];
}
else {
sum0 += numbers[i];
}
}
return solution;
}


Here is an example program that partitions the set of numbers from 1 to 7:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
unsigned int numbers[] = {1, 2, 3, 4, 5, 6, 7};
const size_t len = sizeof(numbers) / sizeof(numbers[0]);
unsigned int i;
unsigned int *solution = partition(numbers, len);
printf("First set:\n");
for (i = 0; i < len; i++) {
if (solution[i] == 0) {
printf("%u\n", numbers[i]);
}
}
printf("Second set:\n");
for (i = 0; i < len; i++) {
if (solution[i] == 1) {
printf("%u\n", numbers[i]);
}
}
putchar('\n');
free(solution);
return 0;
}


Here is the output. In this case, the partition divides the set perfectly into two sets of sum 14, but that won’t generally be the case for this algorithm as it is an approximate one.

First set:
7
4
3
Second set:
6
5
2
1