Tag Archives: Prefixes

Prefixes, suffixes, and substrings in C

Prefixes

To find all prefixes of a string, just find all substrings starting at the beginning of the string of length from 1 to the length of the string. A good way of returning these to the client is to allow the client to pass in a function pointer, which is called for each prefix found:

typedef void(*substring_fn)(const char *, size_t, void *);

void prefixes(const char *str, size_t len, substring_fn fun, void *data)
{
    unsigned int length;
    for (length = 1; length <= len; length++) {
        fun(str, length, data);
    }
}

The client function is given the beginning of the string and a length, rather than a nul-terminated string. This means that the original string does not need to be modified or copied, which is much more efficient. The client can use the length to do something with the string, like printing it using printf with the * format specifier to specify the length:

#include <stdio.h>
#include <string.h>

#include <substring.h>

void print(const char *str, size_t len, void *data)
{
    printf("%.*s\n", (int)len, str);
}

int main(void)
{
    char str[] = "abracadabra";
    prefixes(str, strlen(str), print, NULL);
    return 0;
}

This produces the following output:

a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra

The third argument to the client function is a void * pointer that the client can use to pass anything it needs into the function. For example, it might pass in a dynamic array and add the prefixes to that:

#include <stdio.h>
#include <string.h>

#include <substring.h>
#include <dynarray.h>

void add_to_dynarray(const char *str, size_t len, void *data)
{
    dynarray *array = data;
    char *token = calloc(len + 1, 1);
    memcpy(token, str, len);
    dynarray_add_tail(array, token);
}

int main(void)
{
    char str[] = "abracadabra";
    dynarray *array = dynarray_create(0);
    prefixes(str, strlen(str), add_to_dynarray, array);
    dynarray_for_each(array, (dynarray_forfn)puts);
    dynarray_for_each(array, free);
    dynarray_delete(array);
    return 0;
}

Suffixes

To find all suffixes of a string, walk a pointer from the beginning of the string to 1 before the end, passing the client the pointer and the length remaining to the end of the string.

void suffixes(const char *str, size_t len, substring_fn fun, void *data)
{
    unsigned int start;
    for (start = 0; start < len; start++) {
        fun(str + start, len - start, data);
    }
}
abracadabra
bracadabra
racadabra
acadabra
cadabra
adabra
dabra
abra
bra
ra
a

Substrings

A substring is simply a prefix of a suffix, or a suffix of a prefix, so you can find all substrings by finding all suffixes and then finding all of their prefixes, or vice-versa. We can use the function pointer interface above for this by wrapping the client function pointer and data in our own structure and passing that from prefixes to suffixes as the third argument as follows:

typedef struct {
    substring_fn fun;
    void *data;
} substrings_data;

void call_suffixes(const char *str, size_t len, void *data)
{
    substrings_data *sd = data;
    suffixes(str, len, sd->fun, sd->data);
}

void substrings(const char *str, size_t len, substring_fn fun, void *data)
{
    substrings_data sd = {fun, data};
    prefixes(str, len, call_suffixes, &sd);
}

This produces the following output:

a
ab
b
abr
br
r
abra
bra
ra
a
abrac
brac
rac
ac
c
abraca
braca
raca
aca
ca
a
abracad
bracad
racad
acad
cad
ad
d
abracada
bracada
racada
acada
cada
ada
da
a
abracadab
bracadab
racadab
acadab
cadab
adab
dab
ab
b
abracadabr
bracadabr
racadabr
acadabr
cadabr
adabr
dabr
abr
br
r
abracadabra
bracadabra
racadabra
acadabra
cadabra
adabra
dabra
abra
bra
ra
a

Unique substrings

The substrings function will produce all substrings, which can include duplicates, as you can see in the output above. You can use a set data structure like a hash table to find just the unique substrings. Again, we can make good use of the function pointer interface by wrapping the client function and the hashtable in a structure, and then passing that to hashtable‘s own iteration function hashtable_for_each2.

void add_to_hashtable(const char *str, size_t len, void *data)
{
    hashtable *table = data;
    char *buf = malloc(len + 1);
    strncpy(buf, str, len);
    buf[len] = '\0';
    buf = hashtable_add(table, buf);
    if (buf) {
        free(buf);
    }
}

void call_substring_fun(const char *substring, void *data)
{
    substrings_data *sd = data;
    sd->fun(substring, strlen(substring), data);
}

void unique_substrings(const char *str, size_t len, substring_fn fun, void *data)
{
    hashtable *table = hashtable_create(7, (hashtable_cmpfn)strcmp);
    substrings(str, len, add_to_hashtable, table);
    substrings_data sd = {fun, data};
    hashtable_for_each2(table, (hashtable_forfn2)call_substring_fun, &sd);
    hashtable_for_each(table, free);
    hashtable_delete(table);
}

The substrings produced are now unique:

abrac
abraca
abracad
acada
ad
adabr
b
brac
bracada
bracadab
cada
dabr
rac
acadabra
c
ca
abra
abracadabra
br
bracadabra
d
dabra
r
racada
racadabra
acad
adabra
bra
cadabra
da
ra
racad
racadabr
aca
acadabr
bracad
cad
ab
abr
abracada
abracadabr
adab
bracadabr
cadab
racadab
a
abracadab
ac
acadab
ada
braca
cadabr
dab
raca

Related