Generating sentences from a grammar in C++

Although phrase structure grammars are used in the development of parsers, they really describe how to generate sentences in a language. If a grammar is context-free, there is a simple algorithm to generate sentences from it. The algorithm uses a queue to perform a breadth-first search as follows:

  1. Put the start symbol in the queue
  2. While the queue is not empty and more sentences are wanted:
    1. Pop the first sentential form from the queue
    2. Find the first non-terminal in it
    3. For every right-hand side of that non-terminal
      1. Make a copy of the sentential form
      2. Substitute the right-hand side for the non-terminal
      3. Put this new sentential form into the queue
    4. If there are no non-terminals left in the sentential form, it is a complete sentence, so return it

Note that the queue, and consequent breadth-first search, are required because if the rule’s right-hand side is left-recursive, the algorithm would go into an infinite recursion without producing anything if it performed a depth-first search.

Here is the code in C++, using the class I described earlier in Context-free grammar in C++. The generator class is instantiated with a grammar, and its generate() method will write n sentences to the supplied iterator, if n sentences can be generated.

#ifndef GENERATOR_HPP
#define GENERATOR_HPP

#include <vector>
#include <list>
#include <queue>
#include <algorithm>

#include <grammar.hpp>

namespace MB
{

class generator
{
public:
    generator(const grammar& grammar)
        : grammar_(grammar)
    {
        // Store the left-hand sides
        std::transform(grammar_.rules().begin(), grammar_.rules().end(),
                std::back_inserter(lefts_),
                [](rule::const_ptr r)->const std::string& { return r->left(); }); 
    }
    template <class InputIt>
    InputIt generate(InputIt init, unsigned int n) const
    {
        std::queue<sentential_form> queue;
        unsigned int sentences = 0;
        // Add the start symbol to the queue
        queue.push(sentential_form(1, grammar_.get_start_left()));
        while (sentences < n && queue.size()) {
            sentential_form sf = queue.front();
            queue.pop();
            // Find the first non-terminal
            sentential_form::iterator it = std::find_first_of(sf.begin(), sf.end(), 
                    lefts_.begin(), lefts_.end());
            if (it != sf.end()) {
                // Replace with each right-hand side alternative
                std::string left = *it;
                std::vector<rule::const_ptr> rules;
                grammar_.get_rules_for_left(left, std::back_inserter(rules));
                for (rule::const_ptr rule : rules) {
                    for (const std::vector<std::string>& alternative : rule->right()) {
                        sentential_form new_sf(sf);
                        it = std::find(new_sf.begin(), new_sf.end(), left);
                        new_sf.insert(it, alternative.begin(), alternative.end());
                        new_sf.erase(it);
                        queue.push(new_sf);
                    }
                }
            }
            else {
                // Finished sentence
                *init++ = std::vector<std::string>(sf.begin(), sf.end());
                ++sentences;
            }
        }
        return init;
    }
private:
    typedef std::list<std::string> sentential_form;
    const grammar& grammar_;
    std::vector<std::string> lefts_;
};

} // namespace MB

#endif // GENERATOR_HPP

Here is an example grammar:

S -> NP VP
NP -> NP PP
NP -> DET N
VP -> V NP
VP -> V
PP -> P NP
DET -> a | the
N -> man | dog | house | telescope | spoon
V -> saw | fed | barked
P -> with

And an example program that generates the first 50 sentences:

#include <fstream>
#include <iostream>
#include <vector>
#include <algorithm>

#include <grammar.hpp>
#include <generator.hpp>

int main()
{
    std::string filename = "generator.dat";
    std::ifstream ifs(filename);
    if (ifs) {
        MB::grammar grammar(ifs);
        MB::generator gen(grammar);
        std::vector<std::vector<std::string> > sentences;
        gen.generate(std::back_inserter(sentences), 50);
        for (auto sentence : sentences) {
            std::copy(sentence.begin(), sentence.end(),
                    std::ostream_iterator<const std::string&>(std::cout, " "));
            std::cout << '\n';
        }
    }
    else {
        std::cerr << "Couldn't open " << filename << " for reading\n";
    }
}

This is the somewhat surreal output:

a man saw
a man fed
a man barked
a dog saw
a dog fed
a dog barked
a house saw
a house fed
a house barked
a telescope saw
a telescope fed
a telescope barked
a spoon saw
a spoon fed
a spoon barked
the man saw
the man fed
the man barked
the dog saw
the dog fed
the dog barked
the house saw
the house fed
the house barked
the telescope saw
the telescope fed
the telescope barked
the spoon saw
the spoon fed
the spoon barked
a man saw a man
a man saw a dog
a man saw a house
a man saw a telescope
a man saw a spoon
a man saw the man
a man saw the dog
a man saw the house
a man saw the telescope
a man saw the spoon
a man fed a man
a man fed a dog
a man fed a house
a man fed a telescope
a man fed a spoon
a man fed the man
a man fed the dog
a man fed the house
a man fed the telescope
a man fed the spoon