Bin packing

Introduction

The bin packing problem is a classic problem with a long history. It’s one of the earliest problems shown to be intractable. The problem lends itself to simple algorithms that need clever analysis.
This post contains a number of classic approximate bin packing algorithms, showing their implementation in C and examples of the results they produce.

The problem is to find the minimum number k of identical bins of capacity C needed to store a finite collection of weights w1, w2, w3, … , wn so that no bin has weights stored in it whose sum exceeds the bin’s capacity. Traditionally, C is 1, and the weights wn are real numbers, but for this article I’m going to use a positive integer C, and the weights are positive integers less than C.

An important consideration in bin packing is whether we need to pack the items in a fixed order (in a real world application, the order in which they arrive), or if we are able to rearrange them in the hope of achieving a better packing. Algorithms for solving the first case are called online algorithms, while those for the second are offline algorithms. Offline algorithms will require additional storage for holding the rearranged items.

The rest of this post is a collection of online and offline algorithms for solving the bin packing problem. Each algorithm is in the form of a function with a prototype like this:

unsigned int fit(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)

The arguments are as follows:
binsize: This is C, the capacity of the bins.
sizes: These are the weights of the individual items, as a sequence of integers.
bins: This is an array of the same size as sizes that the caller passes in. The packing function will populate it with the 0-based bin number of the bin that has been used to store the corresponding item in the sizes array. Algorithms may rearrange the items in the sizes array, but they ensure that the bin numbers in the bins array still correspond.
numitems: The size of the sizes and bins arrays.

The return value is the number of bins used.

For the examples, C, the capacity of the bins, is 7, and we want to pack items with the following weights: 1, 4, 9, 4, 1, 5, 8, 3, 2, 5, 7, 3, 2, 6.

A typical test program will look like this:

#include <stdio.h>

#include <binpack.h>

int main(void)
{
    unsigned int sizes[] = {1, 4, 9, 4, 1, 5, 8, 3, 2, 5, 7, 3, 2, 6};
    const unsigned int numitems = 10;
    unsigned int bins[numitems];
    const unsigned int binsize = 7;
    unsigned int item;
    unsigned int bins_used = fit(binsize, sizes, bins, numitems);
    printf("%d bins were used\n", bins_used);
    for (item = 0; item < numitems; item++) {
        printf("Item #%d (size %d) is in bin %d\n", item, sizes[item], bins[item]);
    }
    return 0;
}

Next Fit (NF)

Our first bin-packing algorithm is very simple:

  1. Place each item in a single bin until an item will not fit
  2. When an item won’t fit, close that bin and begin another

You can imagine this as being for a real world situation where the bins are shipped off as soon as they cannot take the next item. Since only one bin is in use at once, this requires minimal space for the packing operation.

The Next Fit algorithm can’t use more than 2 × M, the optimal number of bins, since for any two adjacent bins, their combined weights can’t be less than 1, as in that case the contents of the second bin would have been placed in the first. Consequently the combined weight of all of the bins can’t be less than half the total weight, so no more than 2M bins are required.

unsigned int next_fit(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    unsigned int bin = 0;
    unsigned int capacity = binsize;
    unsigned int item;
    for (item = 0; item < numitems; item++) {
        if (sizes[item] > capacity) {
            /* Start a new bin */
            bin++;
            capacity = binsize;
        }
        /* Put item in bin */
        capacity -= sizes[item];
        bins[item] = bin;
    }
    return bin + 1;
}

Here are the results of the Next Fit algorithm:

7 bins were used
Item #0 (size 1) is in bin 0
Item #1 (size 4) is in bin 0
Item #2 (size 9) is in bin 1
Item #3 (size 4) is in bin 2
Item #4 (size 1) is in bin 2
Item #5 (size 5) is in bin 2
Item #6 (size 8) is in bin 3
Item #7 (size 3) is in bin 4
Item #8 (size 2) is in bin 4
Item #9 (size 5) is in bin 4
Item #10 (size 7) is in bin 5
Item #11 (size 3) is in bin 5
Item #12 (size 2) is in bin 6
Item #13 (size 6) is in bin 6

Bin packing by Next Fit (NF) algorithm

Figure 1: Bin packing by Next Fit (NF) algorithm

First Fit (FF)

Looking at the results of Next Fit, it’s clear that we could better fill the bins, and possibly even use fewer bins, if we could keep a bin open when it won’t accommodate the next item, in the hope that a smaller item might come along in future that will fit. This is the basis of First Fit, which is a greedy approximation algorithm:

  1. Try to place the item in the first bin that will accommodate it
  2. If no bin is found, start a new bin

You can easily see that First Fit never uses more than 2 × M, the optimal number of bins, since more than 1 bin can be half full, or otherwise the contents of two half-full bins could be combined. In fact, it has been proved that First Fit never uses more than 1.7M bins.
If we store the bins in a balanced binary search tree like an AVL tree the algorithm can be implemented in O(n log n) time.

First, we need an algorithm to find the first element greater than or equal to a value in an AVL tree:

void *avltree_first_fit(const avltree *tree, const void *data)
{
	void *found = NULL;

	avltreenode *node = tree->root;
	while (!found && node != NULL) {
		int rv = tree->compare(node->data, data);
		if (rv >= 0) {
			found = node->data;
		}
		else {
            node = node->right;
		}
	}
	return found;
}

Then we can use that to find the bin to use in the First Fit algorithm:

bin *find_first_bin(const avltree *tree, unsigned int size)
{
    bin b = {0, size};
    return avltree_first_fit(tree, &b);
}

unsigned int first_fit(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    unsigned int bins_used = 0;
    unsigned int item;
    avltree *tree = avltree_create((avltree_cmpfn)bin_compare);
    if (tree == NULL) {
        return 0;
    }
    for (item = 0; item < numitems; item++) {
        bin *b = find_first_bin(tree, sizes[item]);
        if (b) {
            /* Add to this bin */
            avltree_remove(tree, b);
            bins[item] = b->id;
            bin_use(b, sizes[item]);
            avltree_add(tree, b);
        }
        else {
            /* Create a new bin and add to it */
            b = bin_create(bins_used, binsize);
            bins[item] = bins_used;
            bin_use(b, sizes[item]);
            avltree_add(tree, b);
            bins_used++;
        }
    }
    avltree_for_each(tree, (avltree_forfn)bin_delete);
    avltree_delete(tree);
    return bins_used;
}

These are the results with First Fit. Notice how the bins are no longer being filled in sequence.

7 bins were used
Item #0 (size 1) is in bin 0
Item #1 (size 4) is in bin 0
Item #2 (size 9) is in bin 1
Item #3 (size 4) is in bin 0
Item #4 (size 1) is in bin 0
Item #5 (size 5) is in bin 2
Item #6 (size 8) is in bin 3
Item #7 (size 3) is in bin 2
Item #8 (size 2) is in bin 2
Item #9 (size 5) is in bin 4
Item #10 (size 7) is in bin 5
Item #11 (size 3) is in bin 4
Item #12 (size 2) is in bin 3
Item #13 (size 6) is in bin 6

Bin packing by First Fit (FF) algorithm

Figure 2: Bin packing by First Fit (FF) algorithm

Best Fit (BF)

The idea of Best Fit is to try to pack each item in the tightest spot available, in the hope that this will better fill the bins:

  1. Try to place the item in the fullest bin that will accommodate it, i.e., the bin that will leave the least room left over
  2. If no bin is found, start a new bin

This has been proved to use no more than 1.7 × M bins, the same upper bound as First Fit.
It can also be implemented in O(n log n) time with an AVL tree to store the bins.

Here is the function to find the least element greater than or equal to a value in an AVL tree:

void *avltree_best_fit(const avltree *tree, const void *data)
{
    void *best = NULL;
    unsigned int found = 0;
    avltreenode *node = tree->root;
    while (node != NULL && !found) {
        int result = tree->compare(node->data, data);
        if (result > 0) {
            if (best == NULL || tree->compare(node->data, best) < 0) {
                best = node->data;
            }
            node = node->left;
        }
        else if (result < 0) {
            node = node->right;
        }
        else {
            best = node->data;
            found = 1;
        }
    }
    return best;
}

We can now use this to find the bin to use in the Best Fit algorithm:

bin *find_best_bin(const avltree *tree, unsigned int size)
{
    bin b = {0, size};
    return avltree_best_fit(tree, &b);
}

unsigned int best_fit(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    unsigned int bins_used = 0;
    unsigned int item;
    avltree *tree = avltree_create((avltree_cmpfn)bin_compare);
    if (tree == NULL) {
        return 0;
    }
    for (item = 0; item < numitems; item++) {
        bin *b = find_best_bin(tree, sizes[item]);
        if (b) {
            /* Add to this bin */
            avltree_remove(tree, b);
            bins[item] = b->id;
            bin_use(b, sizes[item]);
            avltree_add(tree, b);
        }
        else {
            /* Create a new bin and add to it */
            b = bin_create(bins_used, binsize);
            bins[item] = b->id;
            bin_use(b, sizes[item]);
            avltree_add(tree, b);
            bins_used++;
        }
    }
    avltree_for_each(tree, (avltree_forfn)bin_delete);
    avltree_delete(tree);
    return bins_used;
}

These are the results of the Best Fit algorithm. Comparing them to First Fit, notice how item #11 was placed in bin 5, rather than bin 4, because bin 5 had the lesser amount of free space while still being able to accommodate it.

7 bins were used
Item #0 (size 1) is in bin 0
Item #1 (size 4) is in bin 0
Item #2 (size 9) is in bin 1
Item #3 (size 4) is in bin 0
Item #4 (size 1) is in bin 0
Item #5 (size 5) is in bin 2
Item #6 (size 8) is in bin 3
Item #7 (size 3) is in bin 2
Item #8 (size 2) is in bin 2
Item #9 (size 5) is in bin 4
Item #10 (size 7) is in bin 5
Item #11 (size 3) is in bin 5
Item #12 (size 2) is in bin 3
Item #13 (size 6) is in bin 6

Bin packing by Best Fit (BF) algorithm

Figure 3: Bin packing by Best Fit (BF) algorithm

Worst Fit (WF)

This is the dual of Best Fit, in which each item is placed in the largest space available.

  1. Try to place the item in the least full bin that will accommodate it, i.e., bin that will leave the most room left over
  2. If no bin is found, start a new bin

Although this is the dual of Best Fit, the data structure to use is a heap, rather than a binary search tree, because we want the bin with the most space.

unsigned int worst_fit(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    unsigned int bins_used = 0;
    unsigned int item;
    maxheap *heap = maxheap_create();
    if (heap == NULL) {
        return 0;
    }
    for (item = 0; item < numitems; item++) {
        bin *b = maxheap_remove_max(heap);
        if (b && b->capacity < sizes[item]) {
            /* Too small; put b back */
            maxheap_add(heap, b, b->capacity);
            b = NULL;
        }
        if (!b) {
            /* Create a new bin */
            b = bin_create(bins_used, binsize);
            bins_used++;
        }
        bins[item] = b->id;
        bin_use(b, sizes[item]);
        maxheap_add(heap, b, b->capacity);
    }
    maxheap_for_each(heap, (maxheap_forfn)bin_delete);
    maxheap_delete(heap);
    return bins_used;
}

These are the results for Worst Fit. Notice when comparing to Best Fit that item #12 was placed in bin 5 rather than bin 3, because bin 5 had more space.

7 bins were used
Item #0 (size 1) is in bin 0
Item #1 (size 4) is in bin 0
Item #2 (size 9) is in bin 1
Item #3 (size 4) is in bin 0
Item #4 (size 1) is in bin 0
Item #5 (size 5) is in bin 2
Item #6 (size 8) is in bin 3
Item #7 (size 3) is in bin 2
Item #8 (size 2) is in bin 2
Item #9 (size 5) is in bin 4
Item #10 (size 7) is in bin 5
Item #11 (size 3) is in bin 4
Item #12 (size 2) is in bin 5
Item #13 (size 6) is in bin 6

Bin packing by Worst Fit (WF) algorithm

Figure 4: Bin packing by Worst Fit (WF) algorithm

First Fit Decreasing (FFD)

A recurring theme with the algorithms we’ve seen is that they are very susceptible to the order of the items. In particular, if large items follow small ones it’s hard to accommodate them in the open bins and a new one often needs to be opened for them.

It’s this insight that drives the next algorithms, which operate by first sorting the items in descending order of size, so the largest items come first, with the intention that the smaller items will then fit into the gaps they leave. Note that this requires that we can collect all of the items together before we start packing, in other words, these are offline algorithms as described in the introduction.

The first such algorithm is a First Fit after sorting the items:

  1. Sort the items to be inserted in decreasing order by size
  2. Go through the items in sorted order
    1. Try to place the item in the first bin that will accommodate it
    2. If no bin is found, start a new bin

It can been proved that First Fit Decreasing uses at most (4M + 1) / 3 bins if the optimal is M.

For this algorithm we need a couple of utilities to compare and sort unsigned integers:

int compare_uints_decreasing(const void *v1, const void *v2)
{
    const unsigned int *p1 = v1;
    const unsigned int *p2 = v2;
    if (*p1 > *p2) {
        return -1;
    }
    if (*p1 < *p2) {
        return 1;
    }
    return 0;
}

void sort_uints_decreasing(unsigned int *sizes, unsigned int numitems)
{
    qsort(sizes, numitems, sizeof(unsigned int), compare_uints_decreasing);
}

With those in place, we just need to sort the items and then perform the existing First Fit on them:

unsigned int first_fit_decreasing(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    sort_uints_decreasing(sizes, numitems);
    return first_fit(binsize, sizes, bins, numitems);
}

These are the results. Notice that this time we have an optimal packing, with only 6 bins used. In fact I got an optimal packing like this for all of the offline algorithms.

6 bins were used
Item #0 (size 9) is in bin 0
Item #1 (size 8) is in bin 1
Item #2 (size 7) is in bin 2
Item #3 (size 6) is in bin 3
Item #4 (size 5) is in bin 4
Item #5 (size 5) is in bin 4
Item #6 (size 4) is in bin 3
Item #7 (size 4) is in bin 5
Item #8 (size 3) is in bin 2
Item #9 (size 3) is in bin 5
Item #10 (size 2) is in bin 1
Item #11 (size 2) is in bin 5
Item #12 (size 1) is in bin 0
Item #13 (size 1) is in bin 5

Bin packing by First Fit Decreasing (FFD) algorithm

Figure 5: Bin packing by First Fit Decreasing (FFD) algorithm

Best Fit Decreasing (BFD)

This is a Best Fit after sorting the items.

  1. Sort the items to be inserted in decreasing order by size
  2. Go through the items in sorted order
    1. Try to place the item in the fullest bin that will accommodate it, i.e., the one that will leave the least space remaining
    2. If no bin is found, start a new bin
unsigned int best_fit_decreasing(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    sort_uints_decreasing(sizes, numitems);
    return best_fit(binsize, sizes, bins, numitems);
}

The results were exactly the same as for First Fit Decreasing.

Worst Fit Decreasing (WFD)

This is Worst Fit after sorting.

  1. Sort the items to be inserted in decreasing order by size
  2. Go through the items in sorted order
    1. Try to place the item in the least full bin that will accommodate it, i.e., the one that will leave the most space remaining
    2. If no bin is found, start a new bin
unsigned int worst_fit_decreasing(unsigned int binsize, unsigned int *sizes, unsigned int *bins,
        unsigned int numitems)
{
    sort_uints_decreasing(sizes, numitems);
    return worst_fit(binsize, sizes, bins, numitems);
}

The results look exactly the same as the other offline algorithms, but examining the order of packing shows that item #10 and item #11 were packed the opposite way round, because there was more space in bin 5 when item #10 was packed. The end result is the same because the items are the same size.

6 bins were used
Item #0 (size 9) is in bin 0
Item #1 (size 8) is in bin 1
Item #2 (size 7) is in bin 2
Item #3 (size 6) is in bin 3
Item #4 (size 5) is in bin 4
Item #5 (size 5) is in bin 4
Item #6 (size 4) is in bin 3
Item #7 (size 4) is in bin 5
Item #8 (size 3) is in bin 5
Item #9 (size 3) is in bin 2
Item #10 (size 2) is in bin 5
Item #11 (size 2) is in bin 1
Item #12 (size 1) is in bin 0
Item #13 (size 1) is in bin 5

Conclusion

That concludes out tour of approximate bin-packing algorithms. The offline algorithms we’ve seen very often produce optimal results, but that hasn’t prevented a great deal of research on optimal algorithms. A new technique called Bin Completion (Korf, 2002) is believed to be the fastest known optimal algorithm.

References

Feature Column from the AMS: Bin Packing
Bin Packing Algorithms
A New Algorithm for Optimal Bin Packing, Richard E Korf

Cirque in C

A circular queue, or cirque, or ring buffer is an array that wraps around, so that it can be used as a queue without walking backwards in memory.

This implementation reallocates the buffer when it becomes full (i.e., when the head and tail of the queue meet).

The header file:

#ifndef CIRQUE_H
#define CIRQUE_H

struct cirque {
	unsigned int head; /* First element */
	unsigned int tail; /* 1 past the last element */
	unsigned int is_full;
	void ** entries;
	unsigned int size;
};
typedef struct cirque cirque;

typedef void (*cirque_forfn)(void*);

cirque * cirque_create(void);
void cirque_delete(cirque * queue);
unsigned int cirque_insert(cirque * queue, void * data);
void *  cirque_remove(cirque * queue);
void *cirque_peek(const cirque * queue);
unsigned int cirque_get_count(const cirque * queue);
void cirque_for_each(const cirque * queue, cirque_forfn fun);

#endif /* CIRQUE_H */

Implementation:

#include <stdlib.h>

#include <cirque.h>

cirque * cirque_create(void)
{
    const unsigned int size = 4;
	cirque * queue = malloc(sizeof(cirque));
	if (queue) {
		queue->entries = malloc(size * sizeof(void *));
		if (queue->entries) {
			queue->size = size;
			queue->head = 0;
			queue->tail = 0;
			queue->is_full = 0;
		}
		else {
			free(queue);
			queue = NULL;
		}
	}
	return queue;
}

void cirque_delete(cirque * queue)
{
	if (queue) {
		free(queue->entries);
		free(queue);
	}
}

static void cirque_resize(cirque * queue)
{
	void **temp = malloc(queue->size * 2 * sizeof(void *));
	if (temp) {
		unsigned int i = 0;
		unsigned int h = queue->head;
		do {
			temp[i] = queue->entries[h++];
			if (h == queue->size) {
				h = 0;
			}
			i++;
		} while (h != queue->tail);
		free(queue->entries);
		queue->entries = temp;
		queue->head = 0;
		queue->tail = queue->size;
		queue->size *= 2;
		queue->is_full = 0;
	}
}

static unsigned int cirque_is_empty(const cirque * queue)
{
	return (queue->head == queue->tail) && !queue->is_full;
}

unsigned int cirque_insert(cirque * queue, void * data)
{
	unsigned int result;
	if (queue->is_full) {
		cirque_resize(queue);
		if (queue->is_full) {
			result = 0;
		}
	}
	if (!queue->is_full) {
		queue->entries[queue->tail++] = data;
		if (queue->tail == queue->size) {
			queue->tail = 0;
		}
		if (queue->tail == queue->head) {
			queue->is_full = 1;
		}
		result = 1;
	}
	return result;
}

void * cirque_remove(cirque * queue)
{
	void * data = NULL;
	if (!cirque_is_empty(queue)) {
		if (queue->is_full) {
			queue->is_full = 0;
		} 
		data = queue->entries[queue->head++];
		if (queue->head == queue->size) {
			queue->head = 0;
		}
	}
	return data;
}
  
void *cirque_peek(const cirque * queue)
{
	void *data = NULL;
	if (!cirque_is_empty(queue)) {
		data = queue->entries[queue->head];
	}
	return data;
}

unsigned int cirque_get_count(const cirque * queue)
{
	unsigned int count;
   	if (cirque_is_empty(queue)) {
		count = 0;
	}
	else if (queue->is_full) {
		count = queue->size;
	}
	else if (queue->tail > queue->head) {
		count = queue->tail - queue->head;
	}
	else {
		count = queue->size - queue->head;
		if (queue->tail > 0) {
			count += queue->tail - 1;
		}
	}
	return count;
}

void cirque_for_each(const cirque * queue, cirque_forfn fun)
{
	if (!cirque_is_empty(queue)) {
		unsigned int h = queue->head;
		do {
			fun(queue->entries[h++]);
			if (h == queue->size) {
				h = 0;
			}
		} while (h != queue->tail);
	}
}

An example program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#include <cirque.h>

int main(void)
{
    cirque * queue;
    char buf[16];
    unsigned int f;
    const unsigned int max = 32;
    const unsigned int limit = 16;

    queue = cirque_create();
    for (f = 0; f < max; f++) {
        sprintf(buf, "Item %d", f);
        if (f >= limit) {
            /* Start removing at limit to show the queue doesn't keep growing */
            char *data = cirque_remove(queue);
            printf("Removed %s\n", data);
            free(data);
        }
        printf("Inserting %s\n", buf);
        cirque_insert(queue, strdup(buf));
    }
    cirque_for_each(queue, (cirque_forfn)puts);
    printf("Size is %d (should be %d)\n", queue->size, limit);
    cirque_for_each(queue, free);
    cirque_delete(queue);

    return 0;
}

Sorted array in C

This is a sorted dynamic array. It allows finding elements in O(log n) time, the same as for a binary search tree. Inserting or deleting elements is in amortized O(n) time because of the need to shift elements to make room or fill gaps.

The main advantages of a sorted array over a binary search tree are:

  • Simpler implementation
  • Better locality of reference, so improved cache performance
  • Iteration does not involve following pointers, just increasing an index variable
  • Finding the nth element is O(1), while it is O(n) in a binary search tree
  • The array, or a range from it, can be copied to another memory location easily without iteration

Here is the header file:

#ifndef SORTEDARRAY_H
#define SORTEDARRAY_H

#include <dynarray.h>

typedef int (*sortedarray_cmpfn)(const void*, const void*);
typedef void (*sortedarray_forfn)(void*);

struct sortedarray {
	dynarray *array;
	sortedarray_cmpfn compare;
};

typedef struct sortedarray sortedarray;

sortedarray * sortedarray_create(sortedarray_cmpfn compare);
void sortedarray_delete(sortedarray * array);
void * sortedarray_add(sortedarray * array, void * data);
void * sortedarray_remove(sortedarray * array, const void * data);
void * sortedarray_find(const sortedarray * array, const void * data);
void sortedarray_for_each(const sortedarray * array, sortedarray_forfn forfn);
unsigned int sortedarray_get_count(const sortedarray *array);
void * sortedarray_get(const sortedarray *array, unsigned int pos);
void * sortedarray_remove_at(sortedarray *array, unsigned int pos);
int sortedarray_find_index(const sortedarray *array, const void *data);

#endif /* SORTEDARRAY_H */

The implementation:

#include <stdlib.h>

#include <sortedarray.h>

sortedarray * sortedarray_create(sortedarray_cmpfn compare)
{
	sortedarray *array = malloc(sizeof(sortedarray));
	if (array) {
		array->array = dynarray_create(0);
		array->compare = compare;
	}
	return array;
}

void sortedarray_delete(sortedarray * array)
{
	if (array) {
		dynarray_delete(array->array);
		free(array);
	}
}

typedef struct {
	void * data;
	unsigned int pos;
} search_result;

static search_result sortedarray_search(const sortedarray *array, const void *data)
{
	search_result sr;
	unsigned int elements = array->array->count;
	unsigned int offset = 0;
	unsigned int middle = 0;
	
	sr.data = NULL;
	while (elements > 0 && !sr.data) {
		int result;
		middle = elements / 2;
		result = array->compare(data, dynarray_get(array->array, offset + middle));
		if (result > 0) {
			offset = offset + middle + 1;
			elements = elements - (middle + 1);
		}
		else if (result < 0) {
			elements = middle;
		}
		else {
			sr.data = dynarray_get(array->array, offset + middle);
			offset += middle;
		}
	}
	sr.pos = offset;

	return sr;
}

void * sortedarray_add(sortedarray * array, void * data)
{
	search_result result = sortedarray_search(array, data);
	void *existing = result.data;
	if (existing) {
		/* Replace */
		dynarray_set(array->array, result.pos, data);
	}
	else {
		/* Add new */
		dynarray_insert(array->array, result.pos, data);
	}
	return existing;
}

void * sortedarray_remove(sortedarray * array, const void * data)
{
	search_result result = sortedarray_search(array, data);
	if (result.data) {
		dynarray_remove(array->array, result.pos);
	}
	return result.data;
}

void * sortedarray_find(const sortedarray * array, const void * data)
{
	search_result result = sortedarray_search(array, data);
	return result.data;
}

void sortedarray_for_each(const sortedarray * array, sortedarray_forfn forfn)
{
	dynarray_for_each(array->array, forfn);
}

unsigned int sortedarray_get_count(const sortedarray *array)
{
	return array->array->count;
}

void * sortedarray_get(const sortedarray *array, unsigned int pos)
{
	return dynarray_get(array->array, pos);
}

void * sortedarray_remove_at(sortedarray * array, unsigned int pos)
{
	return dynarray_remove(array->array, pos);
}

int sortedarray_find_index(const sortedarray *array, const void *data)
{
	int index = -1;
	search_result result = sortedarray_search(array, data);
	if (result.data) {
		index = result.pos;
	}
	return index;
}

An example program:

#include <stdio.h>
#include <string.h>

#include <sortedarray.h>

int main(void)
{
    sortedarray * array;
    const char * result;
    unsigned int e;
    char * elements[] = {"orange", "apple", "pear", "grapefruit", "cherry", "plum"};
    const unsigned int n = sizeof(elements) / sizeof(const char*);

    array = sortedarray_create((sortedarray_cmpfn)strcmp);
    for (e = 0; e < n; e++) {
        sortedarray_add(array, elements[e]);
    }
    sortedarray_for_each(array, (sortedarray_forfn)puts);
    for (e = 0; e < n; e++) {
        result = sortedarray_find(array, elements[e]);
        if (result) {
            printf("Found: %s at %d\n", result, 
                    sortedarray_find_index(array, elements[e]));
        }
        else {
            printf("Couldn't find %s\n", elements[e]);
        }
    }
    for (e = 0; e < n; e++) {
        result = sortedarray_remove(array, elements[e]);
        if (result) {
            printf("Removed: %s\n", result);
        }
        else {
            printf("Couldn't remove %s\n", elements[e]);
        }
    }
    sortedarray_delete(array);

    return 0;
}

Singly-linked list in C

A singly-linked list is a simple data structure in which each node has a pointer to the next node in the list. The main difference with a doubly-linked list is that to insert before a node, or remove a node, you need to have found the previous node as well, so that you can change its next pointer. This means that when iterating over the list to perform these modifications you need to keep track of a previous node shadowing the current one as follows:

snode *current, *previous = NULL;
for (current = list->head; current != NULL; current = current->next) {
    /* Modify current, using previous where necessary */
    previous = current;
}

Here is the header file:

#ifndef SLIST_H
#define SLIST_H

struct snode {
	void * data;
	struct snode * next;
};
typedef struct snode snode;

struct slist {
	snode * head;
	snode * tail;
	unsigned int count;
};
typedef struct slist slist;

typedef void (*slist_forfn)(void *);

slist * slist_create(void);
void slist_empty(slist * list);
void slist_delete(slist * list);
void slist_add_tail(slist * list, void * data);
void slist_add_head(slist * list, void * data);
void * slist_remove_head(slist * list);
void * slist_remove_tail(slist * list);
void * slist_remove(slist *list, snode *node, snode *previous);
void slist_insert_before(slist * list, snode * node, snode * previous, void * data);
snode * slist_insert_after(slist * list, snode * node, void * data);
void slist_for_each(const slist * list, slist_forfn fun);
unsigned int slist_get_count(const slist * list);

#endif /* SLIST_H */

The implemention:

#include <stdlib.h>

#include <slist.h>

static snode * snode_create(void * data)
{
	snode * node = malloc(sizeof(snode));
	if (node) {
		node->data = data;
		node->next = NULL;
	}

	return node;
}

slist * slist_create(void)
{
	slist * list = malloc(sizeof(slist));
	if (list) {
		list->head = NULL;
		list->tail = NULL;
		list->count = 0;
	}

	return list;
}

void slist_empty(slist * list)
{
	snode * node, * temp;
	node = list->head;
	while (node != NULL) {
		temp = node->next;
		free(node);
		node = temp;
	}
}

void slist_delete(slist * list)
{
	if (list) {
		slist_empty(list);
		free(list);
	}
}

void slist_add_tail(slist * list, void * data)
{
	snode * node = snode_create(data);
	if (list->head == NULL) {
		/* Adding the first node */
		list->head = node;
		list->tail = node;
	}
	else {
		list->tail->next = node;
		list->tail = node;
	}
	list->count++;
}

void slist_add_head(slist * list, void * data)
{
	snode * node = snode_create(data);
	if (list->tail == NULL) {
		/* Adding the first node */
		list->head = node;
		list->tail = node;
	}
	else {
		node->next = list->head;
		list->head = node;
	}
	list->count++;
}

void * slist_remove_head(slist * list)
{
	void *data = NULL;

	if (list->head) {
		snode *temp = list->head;
		if (list->head->next) {
			list->head = list->head->next;
		}
		else {
			/* List is now empty */
			list->head = NULL;
			list->tail = NULL;
		}
		data = temp->data;
		free(temp);
		list->count--;
		if (list->count == 1) {
			list->tail = list->head;
		}
	}
	
	return data;
}

void * slist_remove_tail(slist * list)
{
	void *data = NULL;

	if (list->tail) {
		snode *current, *previous = NULL;
		current = list->head;
		while (current->next) {
			previous = current;
			current = current->next;
		}
		data = current->data;
		free(current);
		if (previous) {
			previous->next = NULL;
		}
		else {
			/* List is now empty */
			list->head = NULL;
			list->tail = NULL;
		}
		list->count--;
		if (list->count == 1) {
			list->head = list->tail;
		}
	}

	return data;
}

void * slist_remove(slist *list, snode *node, snode *previous)
{
	void *data;
	if (node == list->head) {
		data = slist_remove_head(list);
	}
	else {
		previous->next = node->next;
		data = node->data;
		list->count--;
		if (list->count == 1) {
			list->tail = list->head;
		}
		else if (node == list->tail) {
			list->tail = previous;
		}
		free(node);
	}
	return data;
}

void slist_insert_before(slist * list, snode * node, snode * previous, void * data)
{
	if (node == list->head) {
		slist_add_head(list, data);
	}
	else {
		snode * newnode = snode_create(data);
		newnode->next = node;
		previous->next = newnode;
		list->count++;
	}
}

snode * slist_insert_after(slist * list, snode * node, void * data)
{
	snode * newnode;
	if (node == NULL) {
		slist_add_head(list, data);
		newnode = list->head;
	}
	else {
		newnode = snode_create(data);
		if (newnode) {
			newnode->next = node->next;
			node->next = newnode;
			if (node == list->tail) {
				list->tail = newnode;
			}
		}
		list->count++;
	}
	return newnode;
}

void slist_for_each(const slist * list, slist_forfn fun)
{
	snode * node;
	for (node = list->head; node != NULL; node = node->next) {
		fun(node->data);
	}
}

unsigned int slist_get_count(const slist * list)
{
	return list->count;
}

An example program:

#include <stdio.h>
#include <string.h>

#include <slist.h>

int main(void)
{
    slist * list = slist_create();
    char * elements[] = {"A", "B", "C", "D", "E", "F"};
    const unsigned int n = sizeof(elements) / sizeof(const char*);
    unsigned int i;
    snode * node, * previous = NULL;
    unsigned int found = 0;

    /* Populate the list with A, B, ..., F */
    for (i = 0; i < n; i++) {
        slist_add_tail(list, elements[i]);
    }
    
    /* Insert X and Y either side of C */
    for (node = list->head; node != NULL && !found; node = node->next) {
        if (strcmp((const char*)node->data, "C") == 0) {
            slist_insert_before(list, node, previous, "X");
            slist_insert_after(list, node, "Y");
            found = 1;
        }
        previous = node;
    }

    /* Forward iteration */
    for (node = list->head; node != NULL; node = node->next) {
        printf("%s\n", (const char*)node->data);
    }

    slist_delete(list);

    return 0;
}

Linked list in C

This is a doubly-linked list implementation. A linked list allows easy insertion and removal at either end or in the middle. Being doubly-linked allows traversal in reverse as well as forwards.

Here is an example program:

#include <stdio.h>
#include <string.h>

#include <linkedlist.h>

int main(void)
{
    linkedlist * list = linkedlist_create();
    char * elements[] = {"A", "B", "C", "D", "E", "F"};
    const unsigned int n = sizeof(elements) / sizeof(const char*);
    unsigned int i;
    listnode * node;
    unsigned int found = 0;

    /* Populate the list with A, B, ..., F */
    for (i = 0; i < n; i++) {
        linkedlist_add_tail(list, elements[i]);
    }
    
    /* Insert X and Y either side of C */
    for (node = list->head; node != NULL && !found; node = node->next) {
        if (strcmp((const char*)node->data, "C") == 0) {
            linkedlist_insert_before(list, node, "X");
            linkedlist_insert_after(list, node, "Y");
            found = 1;
        }
    }

    /* Forward iteration */
    for (node = list->head; node != NULL; node = node->next) {
        printf("%s\n", (const char*)node->data);
    }
    putchar('\n');

    /* Reverse iteration */
    for (node = list->tail; node != NULL; node = node->previous) {
        printf("%s\n", (const char*)node->data);
    }
    
    linkedlist_delete(list);

    return 0;
}

The header file:

#ifndef LINKEDLIST_H
#define LINKEDLIST_H

struct listnode {
	struct listnode * next;
	struct listnode * previous;
	void * data;
};
typedef struct listnode listnode;

struct linkedlist {
	listnode * head;
	listnode * tail;
	unsigned int count;
};
typedef struct linkedlist linkedlist;

typedef void (*linkedlist_forfn)(void *);

linkedlist * linkedlist_create(void);
void linkedlist_empty(linkedlist * list);
void linkedlist_delete(linkedlist * list);
void linkedlist_add_head(linkedlist * list, void * data);
void linkedlist_add_tail(linkedlist * list, void * data);
void linkedlist_insert_before(linkedlist * list, listnode * node, void * data);
void linkedlist_insert_after(linkedlist * list, listnode * node, void * data);
void * linkedlist_remove_head(linkedlist * list);
void * linkedlist_remove_tail(linkedlist * list);
void * linkedlist_remove_node(linkedlist * list, listnode * node);
void linkedlist_for_each(const linkedlist * list, linkedlist_forfn fun);
unsigned int linkedlist_get_count(const linkedlist * list);

#endif /* LINKEDLIST_H */

The implementation:

#include <stdlib.h>

#include <linkedlist.h>

listnode * listnode_create(void * data)
{
	listnode * node = malloc(sizeof(listnode));
	if (node) {
		node->next = NULL;
		node->previous = NULL;
		node->data = data;
	}
	return node;
}

linkedlist * linkedlist_create(void)
{
	linkedlist * list = malloc(sizeof(linkedlist));
	if (list) {
		list->head = NULL;
		list->tail = NULL;
		list->count = 0;
	}
	return list;
}

void linkedlist_empty(linkedlist * list)
{
	while(list->head != NULL) {
		linkedlist_remove_tail(list);
    }
}

void linkedlist_delete(linkedlist * list)
{
	if (list) {
		linkedlist_empty(list);
		free(list);
	}
}

void linkedlist_add_head(linkedlist * list, void * data)
{
	listnode * node;
	node = listnode_create(data);
	if (list->head != NULL) {
		list->head->previous = node;
		node->next = list->head;
		list->head = node;
	}
	else {
		list->head = node;
		list->tail = node;
	}
	list->count++;
}

void linkedlist_add_tail(linkedlist * list, void * data)
{
	listnode * node;
	node = listnode_create(data);
	if (list->tail != NULL) {
		list->tail->next = node;
		node->previous = list->tail;
		list->tail = node;
	}
	else {
		list->head = node;
		list->tail = node;
	}
	list->count++;
}

void linkedlist_insert_before(linkedlist * list, listnode * node, void * data)
{
	listnode * newnode;
	if (node == list->head) {
		linkedlist_add_head(list, data);
	}
	else {
		newnode = listnode_create(data);
		newnode->next = node;
		newnode->previous = node->previous;
		node->previous->next = newnode;
		node->previous = newnode;
		list->count++;
	}
}

void linkedlist_insert_after(linkedlist * list, listnode * node, void * data)
{
	listnode * newnode;
	if (node == list->tail) {
		linkedlist_add_tail(list, data);
	}
	else {
		newnode = listnode_create(data);
		newnode->previous = node;
		newnode->next = node->next;
		node->next->previous = newnode;
		node->next = newnode;
		list->count++;
	}
}

void * linkedlist_remove_head(linkedlist * list)
{
	void * data = NULL;
	if (list->head != NULL) {
		listnode * temp = list->head;
		list->head = list->head->next;
		if (list->head == NULL) {
			list->tail = NULL;
		}
		else {
			list->head->previous = NULL;
			if (list->head->next != NULL) {
				list->head->next->previous = list->head;
			}
			else {
				list->tail = list->head;
			}
		}
		data = temp->data;
		free(temp);
		list->count--;
	}
	return data;
}

void * linkedlist_remove_tail(linkedlist * list)
{
	void * data = NULL;
	if (list->tail != NULL) {
		listnode * temp = list->tail;
		list->tail = list->tail->previous;
		if (list->tail == NULL) {
			list->head = NULL;
		}
		else {
			list->tail->next = NULL;
			if (list->tail->previous != NULL) {
				list->tail->previous->next = list->tail;
			}
			else {
				list->head = list->tail;
			}
		}
		data = temp->data;
		free(temp);
		list->count--;
	}
	
	return data;
}

void * linkedlist_remove_node(linkedlist * list, listnode * node)
{
	void * data;
	if (node == list->head) {
		data = linkedlist_remove_head(list);
	}
	else if (node == list->tail) {
		data = linkedlist_remove_tail(list);
	}
	else {
		node->previous->next = node->next;
		node->next->previous = node->previous;
		data = node->data;
		free(node);
		list->count--;
	}
	return data;
}

void linkedlist_for_each(const linkedlist * list, forfn fun)
{
	listnode * node;
	for (node = list->head; node != NULL; node = node->next) {
		fun(node->data);
    }
}

unsigned int linkedlist_get_count(const linkedlist * list)
{
	return list->count;
}

Finding out if a binary search tree is AVL recursively

A binary search tree with the AVL property has no node whose left and right heights differ by more than 1. AVL trees maintain this property by maintaining balance information in their nodes, and rebalancing themselves when they find the property has been violated.
It is possible, however, to determine whether a tree without any balance information has the AVL property by a depth-first search of the tree, checking the property at every node and propagating the result upwards.

typedef struct {
	unsigned int height;
	unsigned int avl;
} avl_info;

avl_info binarytree_avl_recursive(const btnode *node)
{
	avl_info info;
	info.height = 0;
	info.avl = 1;
	unsigned int leftheight = 0, rightheight = 0;
    if (node->left || node->right) {
        unsigned int leftavl = 1, rightavl = 1;
        avl_info child_info;
        if (node->left) {
            child_info = binarytree_avl_recursive(node->left);
            leftheight = child_info.height + 1;
            leftavl = child_info.avl;
        }
        if (node->right) {
            child_info = binarytree_avl_recursive(node->right);
            rightheight = child_info.height + 1;
            rightavl = child_info.avl;
        }
        if (leftheight > rightheight) {
            info.height = leftheight;
        } 
        else {
            info.height = rightheight;
        }
        info.avl = leftavl && rightavl && abs(leftheight - rightheight) <= 1;
    }
	return info;
}

unsigned int binarytree_avl(const binarytree *tree)
{
    unsigned int result = 1;
    if (tree->root != NULL) {
        result = binarytree_avl_recursive(tree->root).avl;
    }
    return result;
}

Counting internal nodes in a binary tree recursively

This is the counterpart of counting leaves in a binary tree recursively. If we are an internal node, we count 1 for ourselves, then recurse into the left and right subtrees and sum the count of internal nodes in them.

unsigned int binarytree_count_internal_nodes_recursive(const btnode *root)
{
    unsigned int count = 0;
    if (root->left != NULL || root->right != NULL) {
        count = 1;
        if (root->left != NULL) {
		    count += binarytree_count_internal_nodes_recursive(root->left);
	    }
        if (root->right != NULL) {
            count += binarytree_count_internal_nodes_recursive(root->right);
        }
    }
	return count;
}

unsigned int binarytree_count_internal_nodes(const binarytree *tree)
{
    unsigned int count = 0;
    if (tree->root != NULL) {
	    count = binarytree_count_internal_nodes_recursive(tree->root);
    }
    return count;
}

Counting leaves in a binary tree recursively

This is another recursive algorithm. If we are a leaf node, we return 1, and if not, we recurse into the left and right subtrees and return the sum of the leaves in them.

unsigned int binarytree_count_leaves_recursive(const btnode *root)
{
    unsigned int count = 0;
    if (root->left == NULL && root->right == NULL) {
        count = 1;
    }
    else {
        if (root->left != NULL) {
		    count += binarytree_count_leaves_recursive(root->left);
	    }
        if (root->right != NULL) {
            count += binarytree_count_leaves_recursive(root->right);
        }
    }
	return count;
}

unsigned int binarytree_count_leaves(const binarytree *tree)
{
    unsigned int count = 0;
    if (tree->root != NULL) {
	    count = binarytree_count_leaves_recursive(tree->root);
    }
    return count;
}

Avoiding deeply-nested if…else chains

Sometimes you get code that looks like this:

if (check1()) {
    // Do something...
    if (check2()) {
        // Do more...
        if (check3()) {
            // And more...
        }
    }
}

The code is rapidly moving away from the left margin, and becoming difficult to read. You can’t just combine the checks with && because something is being done between them.

I think the best thing to do in this case is to put the code into a separate function, negate the checks, and use a return when they fail. This has the advantage that you can return a different value for each check failure, so it becomes much easier to test.

int f()
{
    if (!check1()) {
        return CHECK1_FAILED;
    }
    // Do something...
    if (!check2()) {
        return CHECK2_FAILED;
    }
    // Do more...
    if (!check3()) {
        return CHECK3_FAILED;
    }
    // And more...
    return SUCCEEDED;
}

About iostream::eof()

The iostream::eof() function detects that the stream has reached the end of the file, and no more data can be read. However, it only reaches this state when an attempt is made to read after the end of the file, so it can’t be used to determine if there is more data to be read.

For example, consider the following program:

#include <iostream>
#include <fstream>

int main()
{
    char filename[] = "numbers.dat";
    std::ifstream is(filename);
    if (is) {
        int i;
        while (!is.eof()) {
            is >> i;
            std::cout << i << "\n";
        }
    }
    else {
        std::cerr << "Couldn't open " << filename << " for reading\n";
    }
}

Imagine that numbers.dat contains the following:

1 2 3 4 5

The program won’t work properly because the loop condition !is.eof() will still be true after reading the last 5, so a subsequent read and assignment will be made, when there are no more numbers left in the file. When I run it I get this result, with the 5 repeated at the end:

1
2
3
4
5
5

The most common, and a correct way to loop while reading a file is to check that the read operation has succeeded in the loop condition instead:


        int i;
        while (is >> i) {
            std::cout << i << "\n";
        }

This produces the expected output:

1
2
3
4
5

It works correctly because after the last 5 has been read, the extraction operation returns false, and so the loop body is not entered.

The precise reason for this is that after the last 5 has been read the next read operation will fail, and the stream enters a failed state (is.fail() would return true). The loop is checking the return value of the extraction operation, which returns the stream itself. iostream::operator void*() is called, which returns NULL when fail() returns true, and this NULL pointer is implicitly converted to boolean false, which ends the loop.

The eof() function isn’t useless however; you can use it to determine whether a reading loop has ended because EOF has been reached, or if something else has gone wrong.

Consider the following:

        int i;
        while (is >> i) {
            std::cout << i << "\n";
        }
        if (is.eof()) {
            std::cout << "End of file reached\n";
        }
        else {
            std::cerr << "Error reading\n";
        }

If I run this on the file of numbers, I get:

1
2
3
4
5
End of file reached

But if I run it on a file containing letters as well:

1 2 3 a b 4 5

I get this result:

1
2
3
Error reading

So using eof() has allowed me to tell that there was a problem reading. Without it, I might have thought there were only 3 numbers in the file.