Prefixes
To find all prefixes of a string, just find all substrings starting at the beginning of the string of length from 1 to the length of the string. A good way of returning these to the client is to allow the client to pass in a function pointer, which is called for each prefix found:
typedef void(*substring_fn)(const char *, size_t, void *); void prefixes(const char *str, size_t len, substring_fn fun, void *data) { unsigned int length; for (length = 1; length <= len; length++) { fun(str, length, data); } }
The client function is given the beginning of the string and a length, rather than a nul
-terminated string. This means that the original string does not need to be modified or copied, which is much more efficient. The client can use the length to do something with the string, like printing it using printf
with the *
format specifier to specify the length:
#include <stdio.h> #include <string.h> #include <substring.h> void print(const char *str, size_t len, void *data) { printf("%.*s\n", (int)len, str); } int main(void) { char str[] = "abracadabra"; prefixes(str, strlen(str), print, NULL); return 0; }
This produces the following output:
a ab abr abra abrac abraca abracad abracada abracadab abracadabr abracadabra
The third argument to the client function is a void *
pointer that the client can use to pass anything it needs into the function. For example, it might pass in a dynamic array and add the prefixes to that:
#include <stdio.h> #include <string.h> #include <substring.h> #include <dynarray.h> void add_to_dynarray(const char *str, size_t len, void *data) { dynarray *array = data; char *token = calloc(len + 1, 1); memcpy(token, str, len); dynarray_add_tail(array, token); } int main(void) { char str[] = "abracadabra"; dynarray *array = dynarray_create(0); prefixes(str, strlen(str), add_to_dynarray, array); dynarray_for_each(array, (dynarray_forfn)puts); dynarray_for_each(array, free); dynarray_delete(array); return 0; }
Suffixes
To find all suffixes of a string, walk a pointer from the beginning of the string to 1 before the end, passing the client the pointer and the length remaining to the end of the string.
void suffixes(const char *str, size_t len, substring_fn fun, void *data) { unsigned int start; for (start = 0; start < len; start++) { fun(str + start, len - start, data); } }
abracadabra bracadabra racadabra acadabra cadabra adabra dabra abra bra ra a
Substrings
A substring is simply a prefix of a suffix, or a suffix of a prefix, so you can find all substrings by finding all suffixes and then finding all of their prefixes, or vice-versa. We can use the function pointer interface above for this by wrapping the client function pointer and data in our own structure and passing that from prefixes
to suffixes
as the third argument as follows:
typedef struct { substring_fn fun; void *data; } substrings_data; void call_suffixes(const char *str, size_t len, void *data) { substrings_data *sd = data; suffixes(str, len, sd->fun, sd->data); } void substrings(const char *str, size_t len, substring_fn fun, void *data) { substrings_data sd = {fun, data}; prefixes(str, len, call_suffixes, &sd); }
This produces the following output:
a ab b abr br r abra bra ra a abrac brac rac ac c abraca braca raca aca ca a abracad bracad racad acad cad ad d abracada bracada racada acada cada ada da a abracadab bracadab racadab acadab cadab adab dab ab b abracadabr bracadabr racadabr acadabr cadabr adabr dabr abr br r abracadabra bracadabra racadabra acadabra cadabra adabra dabra abra bra ra a
Unique substrings
The substrings
function will produce all substrings, which can include duplicates, as you can see in the output above. You can use a set data structure like a hash table to find just the unique substrings. Again, we can make good use of the function pointer interface by wrapping the client function and the hashtable in a structure, and then passing that to hashtable
‘s own iteration function hashtable_for_each2
.
void add_to_hashtable(const char *str, size_t len, void *data) { hashtable *table = data; char *buf = malloc(len + 1); strncpy(buf, str, len); buf[len] = '\0'; buf = hashtable_add(table, buf); if (buf) { free(buf); } } void call_substring_fun(const char *substring, void *data) { substrings_data *sd = data; sd->fun(substring, strlen(substring), data); } void unique_substrings(const char *str, size_t len, substring_fn fun, void *data) { hashtable *table = hashtable_create(7, (hashtable_cmpfn)strcmp); substrings(str, len, add_to_hashtable, table); substrings_data sd = {fun, data}; hashtable_for_each2(table, (hashtable_forfn2)call_substring_fun, &sd); hashtable_for_each(table, free); hashtable_delete(table); }
The substrings produced are now unique:
abrac abraca abracad acada ad adabr b brac bracada bracadab cada dabr rac acadabra c ca abra abracadabra br bracadabra d dabra r racada racadabra acad adabra bra cadabra da ra racad racadabr aca acadabr bracad cad ab abr abracada abracadabr adab bracadabr cadab racadab a abracadab ac acadab ada braca cadabr dab raca