Tag Archives: BFS

Generating sentences from a grammar in C++

Although phrase structure grammars are used in the development of parsers, they really describe how to generate sentences in a language. If a grammar is context-free, there is a simple algorithm to generate sentences from it. The algorithm uses a queue to perform a breadth-first search as follows:

  1. Put the start symbol in the queue
  2. While the queue is not empty and more sentences are wanted:
    1. Pop the first sentential form from the queue
    2. Find the first non-terminal in it
    3. For every right-hand side of that non-terminal
      1. Make a copy of the sentential form
      2. Substitute the right-hand side for the non-terminal
      3. Put this new sentential form into the queue
    4. If there are no non-terminals left in the sentential form, it is a complete sentence, so return it

Note that the queue, and consequent breadth-first search, are required because if the rule’s right-hand side is left-recursive, the algorithm would go into an infinite recursion without producing anything if it performed a depth-first search.

Here is the code in C++, using the class I described earlier in Context-free grammar in C++. The generator class is instantiated with a grammar, and its generate() method will write n sentences to the supplied iterator, if n sentences can be generated.

#ifndef GENERATOR_HPP
#define GENERATOR_HPP

#include <vector>
#include <list>
#include <queue>
#include <algorithm>

#include <grammar.hpp>

namespace MB
{

class generator
{
public:
    generator(const grammar& grammar)
        : grammar_(grammar)
    {
        // Store the left-hand sides
        std::transform(grammar_.rules().begin(), grammar_.rules().end(),
                std::back_inserter(lefts_),
                [](rule::const_ptr r)->const std::string& { return r->left(); }); 
    }
    template <class InputIt>
    InputIt generate(InputIt init, unsigned int n) const
    {
        std::queue<sentential_form> queue;
        unsigned int sentences = 0;
        // Add the start symbol to the queue
        queue.push(sentential_form(1, grammar_.get_start_left()));
        while (sentences < n && queue.size()) {
            sentential_form sf = queue.front();
            queue.pop();
            // Find the first non-terminal
            sentential_form::iterator it = std::find_first_of(sf.begin(), sf.end(), 
                    lefts_.begin(), lefts_.end());
            if (it != sf.end()) {
                // Replace with each right-hand side alternative
                std::string left = *it;
                std::vector<rule::const_ptr> rules;
                grammar_.get_rules_for_left(left, std::back_inserter(rules));
                for (rule::const_ptr rule : rules) {
                    for (const std::vector<std::string>& alternative : rule->right()) {
                        sentential_form new_sf(sf);
                        it = std::find(new_sf.begin(), new_sf.end(), left);
                        new_sf.insert(it, alternative.begin(), alternative.end());
                        new_sf.erase(it);
                        queue.push(new_sf);
                    }
                }
            }
            else {
                // Finished sentence
                *init++ = std::vector<std::string>(sf.begin(), sf.end());
                ++sentences;
            }
        }
        return init;
    }
private:
    typedef std::list<std::string> sentential_form;
    const grammar& grammar_;
    std::vector<std::string> lefts_;
};

} // namespace MB

#endif // GENERATOR_HPP

Here is an example grammar:

S -> NP VP
NP -> NP PP
NP -> DET N
VP -> V NP
VP -> V
PP -> P NP
DET -> a | the
N -> man | dog | house | telescope | spoon
V -> saw | fed | barked
P -> with

And an example program that generates the first 50 sentences:

#include <fstream>
#include <iostream>
#include <vector>
#include <algorithm>

#include <grammar.hpp>
#include <generator.hpp>

int main()
{
    std::string filename = "generator.dat";
    std::ifstream ifs(filename);
    if (ifs) {
        MB::grammar grammar(ifs);
        MB::generator gen(grammar);
        std::vector<std::vector<std::string> > sentences;
        gen.generate(std::back_inserter(sentences), 50);
        for (auto sentence : sentences) {
            std::copy(sentence.begin(), sentence.end(),
                    std::ostream_iterator<const std::string&>(std::cout, " "));
            std::cout << '\n';
        }
    }
    else {
        std::cerr << "Couldn't open " << filename << " for reading\n";
    }
}

This is the somewhat surreal output:

a man saw
a man fed
a man barked
a dog saw
a dog fed
a dog barked
a house saw
a house fed
a house barked
a telescope saw
a telescope fed
a telescope barked
a spoon saw
a spoon fed
a spoon barked
the man saw
the man fed
the man barked
the dog saw
the dog fed
the dog barked
the house saw
the house fed
the house barked
the telescope saw
the telescope fed
the telescope barked
the spoon saw
the spoon fed
the spoon barked
a man saw a man
a man saw a dog
a man saw a house
a man saw a telescope
a man saw a spoon
a man saw the man
a man saw the dog
a man saw the house
a man saw the telescope
a man saw the spoon
a man fed a man
a man fed a dog
a man fed a house
a man fed a telescope
a man fed a spoon
a man fed the man
a man fed the dog
a man fed the house
a man fed the telescope
a man fed the spoon

Breadth-First Search of a graph in C

A graph

A breadth-first search (BFS) of a graph visits every neighbour of the current vertex before visiting the neighbours’ neighbours. This means that a breadth-first search extends radially from the starting vertex.

The algorithm is as follows:

  1. Create an empty set for the visited vertices
  2. Create a queue
  3. Put the starting vertex in the visited set
  4. Put the starting vertex in the queue
  5. While the queue is not empty:
    1. Remove a vertex from the queue
    2. Perform any visit actions on it
    3. For each of the vertex’s neighbours:
      • If the neighbour is not in the visited set:
        1. Put the neighbour in the visited set
        2. Add the neighbour to the queue

Here is an implementation in C using a cirque for the queue. The function breadth_first_search() takes a graph in edge list form, the number of edges (size), the number of vertices (order), and a callback function that is called for each vertex visited.

#include <stdlib.h>

#include <cirque.h>

typedef struct {
    unsigned int first;
    unsigned int second;
} edge;

typedef void (*searchfn)(unsigned int);

void breadth_first_search(const edge *edges, unsigned int size, unsigned int order, searchfn fun)
{
    unsigned int start = 0;
    unsigned int *visited = calloc(order, sizeof(unsigned int));
    cirque *queue = cirque_create();
    if (!(visited && queue)) {
        free(visited);
        cirque_delete(queue);
        return;
    }
    visited[start] = 1;
    cirque_insert(queue, &start);
    while (cirque_get_count(queue)) {
        unsigned int e;
        unsigned int *current = cirque_remove(queue);
        fun(*current);
        for (e = 0; e < size; e++) {
            if (edges[e].first == *current || edges[e].second == *current) {
                const unsigned int *neighbour = edges[e].first == *current ?
                    &edges[e].second : &edges[e].first;
                if (!visited[*neighbour]) {
                    visited[*neighbour] = 1;
                    cirque_insert(queue, (void*)neighbour);
                }
            }
        }
    }
    free(visited);
    cirque_delete(queue);
}

Here is an example program that constructs the graph shown at the top and then performs a BFS on it, printing the numbers of the vertices as they are visited.

#include <stdio.h>
#include <stdlib.h>



/* Connect two edges */
void edge_connect(edge *edges, unsigned int first, unsigned int second, 
        unsigned int *pos)
{
    edges[*pos].first = first;
    edges[*pos].second = second;
    (*pos)++;
}

void print(unsigned int vertex)
{
    printf("Visiting vertex %u\n", vertex);
}

int main(void)
{
    const unsigned int size = 12; /* Edges */
    const unsigned int order = 13; /* Vertices */
    edge *edges = malloc(size * sizeof(edge));
    unsigned int i = 0;
    edge_connect(edges, 0, 1, &i);
    edge_connect(edges, 0, 2, &i);
    edge_connect(edges, 0, 3, &i);
    edge_connect(edges, 0, 4, &i);
    edge_connect(edges, 1, 5, &i);
    edge_connect(edges, 1, 6, &i);
    edge_connect(edges, 2, 7, &i);
    edge_connect(edges, 2, 8, &i);
    edge_connect(edges, 3, 9, &i);
    edge_connect(edges, 3, 10, &i);
    edge_connect(edges, 4, 11, &i);
    edge_connect(edges, 4, 12, &i);

    breadth_first_search(edges, size, order, print);

    free(edges);
    return 0;
}

The output:

Visiting vertex 0
Visiting vertex 1
Visiting vertex 2
Visiting vertex 3
Visiting vertex 4
Visiting vertex 5
Visiting vertex 6
Visiting vertex 7
Visiting vertex 8
Visiting vertex 9
Visiting vertex 10
Visiting vertex 11
Visiting vertex 12

Dijkstra’s Shortest Paths algorithm in C

A weighted graph

Dijksta’s algorithm finds the shortest path in a weighted graph from a single vertex to one or all other vertices. It is a greedy breadth-first search (BFS) algorithm which works as follows:

  1. Create a table of distances to all vertices
  2. Set the distance to 0 for the starting vertex, and infinity for the others
  3. Make a list of unvisited vertices, which starts off containing all vertices
  4. Set the current vertex to be the starting vertex
  5. While the unvisited list is not empty:
    1. For each neighbour of the current vertex:
      • If the distance to the current vertex + the distance to the neighbour < that the stored distance to the neighbour:
        • Update the stored distance to the neighbour to be the distance to the current vertex + the distance to the neighbour
    2. Remove the current vertex from the unvisited list
    3. Find the nearest unvisited vertex (i.e., the one with the lowest distance in the distances table)
    4. Make it the new current vertex

Below is an implemention in C. The function dijkstra(), takes an array of edges, the number of edges (size) and the number of vertices (order), and returns an array containing the distance to each vertex. For the unvisited list I used an array of integers which I set to 0 or 1 depending on whether the corresponding vertex was unvisited or visited respectively.

#include <stdlib.h>

typedef struct {
    unsigned int first;
    unsigned int second;
    unsigned int weight;
} weighted_edge;

unsigned int *dijkstra(const weighted_edge *edges, unsigned int size, unsigned int order,
        unsigned int vertex)
{
    unsigned int i;
    unsigned int *distances = malloc(order * sizeof(unsigned int));
    unsigned int *unvisited = malloc(order * sizeof(unsigned int));
    unsigned int unvisited_count = order;
    unsigned int current = vertex;
    if (distances == NULL || unvisited == NULL) {
        free(distances);
        free(unvisited);
        return NULL;
    }
    /* All distances start infinite and all vertices start unvisited */
    for (i = 0; i < order; i++) {
        distances[i] = UINT_MAX;
        unvisited[i] = 1;
    }
    /* Distance to starting vertex is 0 */
    distances[vertex] = 0;
    while (unvisited_count > 0) {
        /* Update the distances to all neighbours */
        unsigned int e, v;
        unsigned int min_distance;
        for (e = 0; e < size; e++) {
            if (edges[e].first == current || edges[e].second == current) {
                const unsigned int neighbour = edges[e].first == current ?
                    edges[e].second : edges[e].first;
                const unsigned int distance = distances[current] + edges[e].weight;
                if (distance < distances[neighbour]) {
                    distances[neighbour] = distance;
                }
            }
        }
        /* Finished with this vertex */
        unvisited[current] = 0;
        unvisited_count--;
        /* Find the nearest unvisited vertex */
        min_distance = 0;
        for (v = 0; v < order; v++) {
            if (unvisited[v] == 1 && (min_distance == 0 || distances[v] < min_distance)) {
                min_distance = distances[v];
                current = v;
            }
        }
    }
    free(unvisited);
    return distances;
}

Here is an example program that finds all of the distances from vertex 0 in the graph pictured above:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const unsigned int size = 9; /* Edges */
    const unsigned int order = 6; /* Vertices */
    weighted_edge *edges = malloc(size * sizeof(weighted_edge));
    unsigned int *distances;
    unsigned int i = 0;

    weighted_edge_connect(edges, 0, 1, 2, &i);
    weighted_edge_connect(edges, 0, 2, 4, &i);
    weighted_edge_connect(edges, 1, 2, 1, &i);
    weighted_edge_connect(edges, 1, 3, 4, &i);
    weighted_edge_connect(edges, 1, 4, 2, &i);
    weighted_edge_connect(edges, 2, 4, 3, &i);
    weighted_edge_connect(edges, 3, 4, 3, &i);
    weighted_edge_connect(edges, 3, 5, 2, &i);
    weighted_edge_connect(edges, 4, 5, 2, &i);

    distances = dijkstra(edges, size, order, 0);

    for (i = 0; i < order; i++) {
        printf("%u: %u\n", i, distances[i]);
    }

    free(distances);
    free(edges);

    return 0;
}

The output:

0: 0
1: 2
2: 3
3: 6
4: 4
5: 6

Related

Connected components of a graph in C

Introduction

A connected component of a graph is a maximal subgraph in which the vertices are all connected, and there are no connections between the subgraph and the rest of the graph. A connected graph has only one connected component, which is the graph itself, while unconnected graphs have more than one component. For example, the graph shown below has three components, (0, 1, 2, 3), (4, 5, 6), and (7, 8).
Graph with three connected components
The connected components of a graph can be found using either a depth-first search (DFS), or a breadth-first search (BFS). We start at an arbitrary vertex, and visit every vertex adjacent to it recursively, adding them to the first component. Once this search has finished, we have visited all of the vertices in the first connected component, so we choose another unvisited vertex (if any) and perform the same search starting from it, adding the vertices we find to the second component. This process continues until all vertices have been visited, at which point we know the number of connected components in the graph, and which vertices they contain.

These are implementations of both connected components algorithms in C. An array is used to store the number of the connected component for each vertex, starting with component 0. The array elements are initialised to -1 so the array is also used to determine which vertices have not yet been visited, as their component number will still be -1.
The graph edges are represented as a struct as follows:

typedef struct {
    unsigned int first;
    unsigned int second;
} edge;

Depth-first search

#include <stdlib.h>

void connected_components_recursive(const edge *edges, unsigned int n, 
        int *components, unsigned int order, unsigned int vertex,
        unsigned int component)
{
    unsigned int i;
    /* Put this vertex in the current component */
    components[vertex] = component; 
    for (i = 0; i < n; i++) {
        if (edges[i].first == vertex || edges[i].second == vertex) {
            /* Adjacent */
            const unsigned int neighbour = edges[i].first == vertex ?
                    edges[i].second : edges[i].first;
            if (components[neighbour] == -1) {
                /* Not yet visited */
                connected_components_recursive(edges, n, components, order, neighbour, component);
            }
        }
    }
}

unsigned int connected_components(const edge *edges, unsigned int n, unsigned int order, 
        int **components)
{
    unsigned int i;
    unsigned int component = 0;
    *components = malloc(order * sizeof(int));
    if (components == NULL) {
        return 0;
    }
    for (i = 0; i < order; i++) {
        (*components)[i] = -1;
    }
    
    for (i = 0; i < order; i++) {
        if ((*components)[i] == -1) {
            connected_components_recursive(edges, n, *components, order, i, component);
            component++;
        }
    }
    return component;
}

Breadth-first search

#include <stdlib.h>

#include <cirque.h>

static void connected_components_internal(const edge *edges, unsigned int n,
        int *components, unsigned int order, unsigned int vertex,
        unsigned int component)
{
    unsigned int i;
    /* Put this vertex in the current component */
    components[vertex] = component;
    cirque *queue = cirque_create();
    if (!queue) {
        cirque_delete(queue);
        return;
    }
    cirque_insert(queue, &vertex);
    while (cirque_get_count(queue)) {
        unsigned int e;
        unsigned int *current = cirque_remove(queue);
        for (e = 0; e < n; e++) {
            if (edges[e].first == *current || edges[e].second == *current) {
                const unsigned int *neighbour = edges[e].first == *current ?
                    &edges[e].second : &edges[e].first;
                if (components[*neighbour] == -1) {
                    components[*neighbour] = component;
                    cirque_insert(queue, (void*)neighbour);
                }
            }
        }
    }
    cirque_delete(queue);
}

unsigned int connected_components(const edge *edges, unsigned int n, unsigned int order,
   int **components)
{
    unsigned int i;
    unsigned int component = 0;
    *components = malloc(order * sizeof(int));
    if (components == NULL) {
        return 0;
    }
    for (i = 0; i < order; i++) {
        (*components)[i] = -1;
    }

    for (i = 0; i < order; i++) {
        if ((*components)[i] == -1) {
            connected_components_internal(edges, n, *components, order, i, component);
            component++;
        }
    }
    return component;
}

Example program

Here is an example program that constructs the graph shown above and then finds its connected components:

#include <stdio.h>
#include <stdlib.h>

static void print_components(int *components, unsigned int order)
{
    unsigned int i;
    for (i = 0; i < order; i++) {
        printf("Vertex %u is in component %d\n", i, components[i]);
    }
}

int main(void)
{
    const unsigned int order = 9; /* Vertices */
    const unsigned int n = 8; /* Edges */
    edge *edges;
    int *components;
    unsigned int c;
   
    edges = malloc(n * sizeof(edge));
    if (edges == NULL) {
        return 1;
    }

    /* Square */
    edges[0].first = 0;
    edges[0].second = 1;
    edges[1].first = 1;
    edges[1].second = 2;
    edges[2].first = 2;
    edges[2].second = 3;
    edges[3].first = 3;
    edges[3].second = 0;

    /* Triangle */
    edges[4].first = 4;
    edges[4].second = 5;
    edges[5].first = 5;
    edges[5].second = 6;
    edges[6].first = 6;
    edges[6].second = 4;

    /* Line */
    edges[7].first = 7;
    edges[7].second = 8;

    c = connected_components(edges, n, order, &components);
    if (components == NULL) {
        free(edges);
        return 1;
    }
    printf("There are %u components:\n", c);
    print_components(components, order);
    free(edges);
    free(components);

    return 0;
}

The output:

There are 3 components:
Vertex 0 is in component 0
Vertex 1 is in component 0
Vertex 2 is in component 0
Vertex 3 is in component 0
Vertex 4 is in component 1
Vertex 5 is in component 1
Vertex 6 is in component 1
Vertex 7 is in component 2
Vertex 8 is in component 2

The connected components of a graph can also be represented as sets of edges, rather than vertices. This is called the spanning forest of a graph.