Tag Archives: C++

The C++ Rule of Three

The “Rule of Three” states that if you need to define one of assignment operator, copy constructor, or destructor then you probably need to define all three.
This is because defining any one implies that your class has state to manage, and if it does this needs to be considered in the other two cases.

You can prevent the managed state from being copied by making the assignment operator and copy constructor private. This makes the class non-copyable.
This can also be implemented by inheriting the class from boost::noncopyable.

If the primary purpose of your class is not resource management, you can avoid implementing any of the Rule of Three methods by wrapping the resources in classes that are designed for resource management, i.e., smart pointers like boost::shared_ptr.

Operator overloading in C++

Operators to overload

  • Assignment operator
  • Bitwise shift operators
  • Function call operator
  • Comparison operators
  • Arithmetic operators
  • Array subscripting operators
  • Pointer operators

Assignment operator

You only need to implement the assignment operator if you need to implement the copy constructor and destructor – and vice-versa.
Use the copy-and-swap idiom to implement the assignment operator:

  1. Take other by value – this invokes the copy constructor
  2. Swap the contents of this with other – you need to implement swap
  3. other is destroyed at the end of the function, taking the old state with it
X& X::operator=(X other)
{
  swap(other);
  return *this;
}

Note that this does not do a self-assignment check, but self-assignment is guaranteed to be harmless.

Bitwise shift operators

These are most commonly overloaded to insert into and extract from iostreams, or some other sort of stream.
They need to be friend functions if they need to access private members, as they can’t be members themselves.

When used with iostreams, these functions need to take the stream by non-const reference (iostreams cannot be copied and will be modified) and return it by reference.

std::ostream& operator<<(std::ostream& os, const T& obj)
{
  // Write the object to the stream
  return os;
}

std::istream& operator>>(std::istream& is, T& obj)
{
  // Read the object from the stream
  return is;
}

Function call operator

Use the function all operator to make a functor.
This is always a member function.
It can be overloaded to take any number and type of arguments.
It can return any type, or void.
Functors are often copied, so the class must be copyable and inexpensive to copy.

Comparison operators

These should be implemented as non-member functions.
Missing operators are not automatically deduced from the others.
The standard library only uses < (strict weak ordering), but you should overload them all for consistency.

You only need to fully implement operator== and operator<, as the others can be implemented in terms of them. They also make your class model the EqualityComparable and LessThanComparable concepts.

bool operator ==(const X& lhs, const X& rhs)
{
    // Do actual comparison
}
bool operator < (const X& lhs, const X& rhs)
{
    // Do actual comparison
}
bool operator !=(const X& lhs, const X& rhs)
{
    return !operator==(lhs, rhs);
}
bool operator > (const X& lhs, const X& rhs)
{
    return operator< (rhs, lhs);
}
bool operator <=(const X& lhs, const X& rhs)
{
    return !operator> (lhs, rhs);
}
bool operator >=(const X& lhs, const X& rhs)
{
    return !operator< (lhs, rhs);
}

Arithmetic operators

Unary arithmetic operators

If you overload the prefix form, you should also overload the postfix form, and vice-versa.

class X
{
    X& operator++()
    {
        // Do actual increment
        return *this;
    }

The postfix form is distinguished by a dummy int argument.
You need to use a temporary to save the old value before incrementing it so that the old value can be returned.

    X operator++(int)
    {
       X temp(*this);
       operator++();
       return temp;
    }
};

Binary arithmetic operators

You should overload op and op=.
op= is a member while op is a non-member.
You can implement op in terms of op=.
op needs to take its lhs by copy so it can modify and return it without modifying the original.

class X
{
    X& operator+=(const X& rhs)
    {
        // Do actual addition of rhs to *this
        return *this;
    }
};

X operator+(X lhs, const X& rhs)
{
    lhs += rhs;
    return lhs;
}

Array subscripting operators

These must be members. They can only take 1 argument (although it can be of any type).
You should implement const and non-const versions to allow const overloading.

class X 
{
    value_type& operator[](size_type index);
    const value_type& operator[](size_type index) const;
};

Pointer operators

This is how you implement a smart pointer.
Overload the -> and * operators.
You should implement const and non-const versions to allow const overloading.

class SmartPtr 
{
    value_type& operator*();
    const value_type& operator*() const;
    value_type* operator->();
    const value_type* operator->() const;
};

Related

Default operator== in C++

Casting operators in C++

  • static_cast<>
  • const_cast<>
  • dynamic_cast<>
  • reinterpret_cast<>
  • C-style casts

static_cast<>
Is for widening numeric conversions (e.g., int to float), converting between pointer types and void*.
Can also be used to call explicit (or implicit) conversion functions.
Can cast around the hierarchy but it isn’t checked.

const_cast<>
Add or remove const to a variable.
Not so important because of mutable.
Can be used to call const-overloaded member functions.
Can also add or remove volatile

dynamic_cast<>
For casting around hierarchies.
Will return nullptr on failure for a pointer, or throw std::bad_cast for references.
Can cast sideways and up other hierarchies
Doesn’t work if you have a diamond and haven’t used virtual inheritance
Only works for public inheritance

reinterpret_cast<>
Will convert anything into anything.
Use to cast a pointer to or from an opaque type, or to cast a blob of memory to a POD type
Is reversible unless you cast to a type with less storage space

C-style casts
T(object) or (T)object
This is the same thing as a "function-style cast"
Perform the new casts in order:

  1. const_cast
  2. static_cast
  3. static_cast then const_cast
  4. reinterpret_cast
  5. reinterpret_cast then const_cast

Note that dynamic_cast is not in this list – you can cast around the hierarchy with C-style casts, but it isn’t checked

Differences Between pointers and references in C++

  1. A pointer can be declared uninitialized, but a reference must be declared referring to something
  2. A pointer can be reassigned, but a reference cannot be reassigned – assigning to it changes the referent
  3. Pointers can be NULL, but there is no equivalent for references – they always need to refer to a valid object
  4. You can take the address of a pointer, but you can’t take the address of a reference (you’ll get the address of the referent)
  5. You can make a pointer to a pointer, or a reference to a pointer, but you can’t make a reference to a reference
  6. You can do arithmetic with pointers, but you can’t do arithmetic with references (it would attempt to do arithmetic with the referent)
  7. You de-reference pointers, but you don’t de-reference references (they are automatically de-referenced)
  8. When you call a function that takes a pointer, you can always see at the call site that the argument is a pointer (because it’s declared as a pointer variable, or you take the address of it), but the creation of a reference by function call isn’t explicit at the call site

How to split a string in C++

Java has String.split(), Python has string.split(), Perl has split. There is no simple string-splitting method in C++, but there are plenty of ways of doing it. Here are some methods:

  1. Put it in a stringstream and extract the tokens
  2. Put it in a stringstream and use getline() with a delimiter
  3. Use string::find progressively
  4. Use string::find_first_of progressively with a number of delimiters
  5. Use boost::split()
  6. Use boost::split_iterator
  7. Use boost::tokenizer
  8. Use boost::sregex_token_iterator
  9. Use pystring::split
  10. Use my C split function

1. Put it in a stringstream and extract the tokens

#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

template <class Container>
void split1(const std::string& str, Container& cont)
{
    std::istringstream iss(str);
    std::copy(std::istream_iterator<std::string>(iss),
         std::istream_iterator<std::string>(),
         std::back_inserter(cont));
}

2. Put it in a stringstream and use getline() with a delimiter

#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

template <class Container>
void split2(const std::string& str, Container& cont, char delim = ' ')
{
    std::stringstream ss(str);
    std::string token;
    while (std::getline(ss, token, delim)) {
        cont.push_back(token);
    }
}

3. Use string::find progressively

#include <string>
#include <algorithm>
#include <iterator>

template <class Container>
void split3(const std::string& str, Container& cont,
              char delim = ' ')
{
    std::size_t current, previous = 0;
    current = str.find(delim);
    while (current != std::string::npos) {
        cont.push_back(str.substr(previous, current - previous));
        previous = current + 1;
        current = str.find(delim, previous);
    }
    cont.push_back(str.substr(previous, current - previous));
}

4. Use string::find_first_of progressively with a number of delimiters

#include <string>
#include <algorithm>
#include <iterator>

template <class Container>
void split4(const std::string& str, Container& cont,
              const std::string& delims = " ")
{
    std::size_t current, previous = 0;
    current = str.find_first_of(delims);
    while (current != std::string::npos) {
        cont.push_back(str.substr(previous, current - previous));
        previous = current + 1;
        current = str.find_first_of(delims, previous);
    }
    cont.push_back(str.substr(previous, current - previous));
}

5. Use boost::split()

#include <string>
#include <boost/algorithm/string.hpp>

template <class Container>
void split5(const std::string& str, Container& cont,
              const std::string& delims = " ")
{
    boost::split(cont, str, boost::is_any_of(delims));
}

Reference: Function template split

6. Use boost::split_iterator

#include <string>
#include <boost/algorithm/string.hpp>

template <class Container>
void split6(const std::string& str, Container& cont,
              char delim = ' ')
{
    typedef boost::split_iterator<std::string::const_iterator> spliterator;
    std::string sdelim(1, delim);
    for (spliterator it = boost::make_split_iterator(str, 
               boost::first_finder(sdelim, boost::is_equal()));
               it != spliterator(); ++it) {
        cont.push_back(boost::copy_range<std::string>(*it));
    }
}

Reference: Function template make_split_iterator

7. Use Use boost::tokenizer

#include <string>
#include <algorithm>
#include <boost/tokenizer.hpp>

template <class Container>
void split7(const std::string& str, Container& cont,
              const std::string& delims = " ")
{
    typedef boost::char_separator<char> separator;
    boost::tokenizer<separator> tokens(str, separator(delims.c_str()));
    std::copy(tokens.begin(), tokens.end(), std::back_inserter(cont)); 
}

Reference: Tokenizer Class

8. Use boost::sregex_token_iterator

#include <string>
#include <algorithm>
#include <boost/regex.hpp>

template <class Container>
void split8(const std::string& str, Container& cont,
              const std::string delim = "\\s+")
{
    boost::regex re(delim);
    std::copy(boost::sregex_token_iterator(str.begin(), str.end(), re, -1),
            boost::sregex_token_iterator(), 
            std::back_inserter(cont)); 
}

Reference: regex_token_iterator

9. Use pystring::split()

#include <pystring.h>

template <class Container>
void split9(const std::string& str, Container& cont,
              const std::string delim = " ")
{
    std::vector<std::string> vec;
    pystring::split(str, vec, delim);
    std::copy(vec.begin(), vec.end(), std::back_inserter(cont));
}

Reference: pystring/pystring.h

10. Use my C split function

template <class Container>
void add_to_container(const char *str, size_t len, void *data)
{
    Container *cont = static_cast<Container*>(data);
    cont->push_back(std::string(str, len));
}

template <class Container>
void split10(const std::string& str, Container& cont, char delim = ' ')
{
    split(str.c_str(), delim, static_cast<split_fn>(add_to_container<Container>), &cont);
}

Reference: Split a string in C

An example program

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <vector>

int main()
{
    char str[] = "The quick brown fox jumps over the lazy dog";
    std::vector<std::string> words;
    split1(str, words);
    std::copy(words.begin(), words.end(),
         std::ostream_iterator<std::string>(std::cout, "\n"));
}
The
quick
brown
fox
jumps
over
the
lazy
dog

Related

Sorting algorithms

These are some sorting algorithms I’ve written in C++.

Contents

  • Bubble sort
  • Cocktail sort
  • Tree sort
  • Selection sort
  • Min-max sort
  • Example program

Bubble Sort

In a bubble sort, a series of passes is made over the data, and in each pass, every pair of adjacent elements is compared. If they are in the wrong order, they are swapped. The sort completes when no changes have been made in a pass. In each pass the maximum element is “bubbled” to the end, as it will be swapped with every succeeding element, and consequently becoming a part of the next comparison. The effect of this is that a sorted sublist is built up at the right of the input list. The end of the sublist to be sorted is shortened by one element after each pass to avoid examining the sorted sublist.

template <class RandomAccessIterator>
void bubble_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    bool finished = false;
    RandomAccessIterator it, end = last - 1;
    while (!finished) {
        bool changed = false;
        for (it = first; it < end; ++it) {
            if (*it > *(it + 1)) {
                std::swap(*it, *(it + 1));
                changed = true;
            }
        }
        if (changed) {
            --end;
        }
        else {
            finished = true;
        }
    }
}

Cocktail Sort

By implementing a bubble sort bidirectionally, with the direction of each pass alternating, both the smallest and largest elements of the unsorted range are put in their proper place on each pass. This has the advantage of moving small elements at the end of the range to their proper place much more quickly. The effect is that a sorted sublist is built up at each end of the input list, and the sublist still to be sorted is correspondingly reduced by one element at each end after each pass.

template <class RandomAccessIterator>
void cocktail_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    bool finished = false;
    RandomAccessIterator it, begin = first, end = last - 1;
    while (!finished) {
        // Forward pass
        bool changed = false;
        for (it = begin; it < end; ++it) {
            if (*it > *(it + 1)) {
                std::swap(*it, *(it + 1));
                changed = true;
            }
        }
        if (changed) {
            --end;
        }
        else {
            finished = true;
        }
        if (!finished) {
            // Reverse pass
            changed = false;
            for (it = end; it > begin; --it) {
                if (*it < *(it - 1)) {
                    std::swap(*it, *(it - 1));
                    changed = true;
                }
            }
            if (changed) {
                ++begin;
            }
            else {
                finished = true;
            }
        }
    }
}

Tree Sort

A binary tree is a sorted data structure, and a list can be sorted by inserting its elements into a binary tree, and then retrieving them in order.

template <class RandomAccessIterator>
void tree_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    treenode<RandomAccessIterator>* tree = treenode<RandomAccessIterator>::build(first, last);
    tree->write(&first);
    tree->destroy();
}

This sort requires auxilliary storage, and perhaps in C++ it should have a different interface in order to allow an Allocator to be supplied by the client.

This is the process of building up the tree:

    static treenode<RandomAccessIterator>* build(RandomAccessIterator first, RandomAccessIterator last)
    {
        treenode<RandomAccessIterator>* root = NULL;
        for (RandomAccessIterator it = first; it < last; ++it) {
            treenode<RandomAccessIterator>* node = new treenode<RandomAccessIterator>(*it);
            if (root == NULL) {
                root = node;
            }
            else {
                treenode<RandomAccessIterator>* current = root,* previous = NULL;
                bool less;
                while (current != NULL) {
                    less = node->value_ < current->value_;
                    previous = current;
                    current = less ? current->left_ : current->right_;
                }
                if (less) {
                    previous->left_ = node;
                }
                else {
                    previous->right_ = node;
                }
            }
        }
        return root;
    }

Once the tree has been assembled, it is traversed in order using a recursive function and its contents are stored in the iterator. Note that the iterator is passed by address; this is how you ensure that a value is shared across all stack frames in a recursive algorithm.

    void write(RandomAccessIterator* it) const
    {
        if (left_) {
            left_->write(it);
        }
        **it = value_;
        ++(*it);
        if (right_) {
            right_->write(it);
        }
    }

After the sorted data has been stored back to the iterator, the tree is recursively destroyed. This needs to be performed in postorder so that the subtrees are destroyed before the parent node.

    void destroy()
    {
        if (left_) {
            left_->destroy();
        }
        if (right_) {
            right_->destroy();
        }
        delete this;
    }

Selection Sort

In a selection sort the data is searched for the minimum (or maximum) element, and this is swapped with the element in its place at the end. As with a bubble sort, the end of the range to be sorted is adjusted after each pass as a sorted sublist is built up at the end. The construction of the sorted range is identical to that in a bubble sort, but it is performed much more efficiently because there are fewer swaps.

template <class RandomAccessIterator>
void selection_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    RandomAccessIterator begin = first;
    while (begin < last) {
        RandomAccessIterator it, min;
        bool started = false;
        for (it = begin; it < last; ++it) {
            if (!started || *it < *min) {
                min = it;
            }
            started = true;
        }
        std::swap(*min, *begin);
        ++begin;
    }
}

Min-Max Sort

One can perform a selection sort and simultaneously search for the minimum and maximum element and move them to their proper places in each pass. This has the effect of building up a sorted sublist at each end, with the unsorted part becoming smaller from each ends, as in a cocktail sort. Again, the construction of the sorted ranges is performed much more efficiently in this sort than a cocktail sort because there are fewer swaps.

template <class RandomAccessIterator>
void minmax_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    RandomAccessIterator begin = first;
    RandomAccessIterator end = last - 1;
    while (begin < end) {
        RandomAccessIterator it, min, max;
        bool started = false;
        for (it = begin; it <= end; ++it) {
            if (!started) {
                min = it;
                max = it;
                started = true;
            }
            else {
                if (*it < *min) {
                    min = it;
                }
                else if (*it > *max) {
                    max = it;
                }
            }
        }
        if (max == begin && min == end) {
            // Swap min and max only
            std::swap(*max, *min);
        }
        else if (max == begin) {
            // Swap max and end first
            std::swap(*max, *end);
            std::swap(*min, *begin);
        }
        else {
            // Swap min and begin first
            std::swap(*min, *begin);
            std::swap(*max, *end);
        }
        ++begin;
        --end;
    }
}

Example Program

Below is an example program to exercise all of the sorting algorithms:

#include <vector>
#include <iostream>
#include <functional>
#include <iterator>

#include "bubble_sort.h"
#include "cocktail_sort.h"
#include "tree_sort.h"
#include "selection_sort.h"
#include "minmax_sort.h"

namespace
{

template <class ForwardIterator>
bool is_sorted(ForwardIterator first, ForwardIterator last)
{
    typedef typename std::iterator_traits<ForwardIterator>::value_type T;
    return std::adjacent_find(first, last, std::greater<T>()) == last;
}

};

int main()
{
    typedef std::vector<int>::iterator I;
    typedef void (*sortfn)(I, I);
    sortfn sorts[] = { bubble_sort<I>, cocktail_sort<I>, tree_sort<I>, selection_sort<I>, minmax_sort<I> };
    const size_t numsorts = sizeof(sorts) / sizeof(sortfn);
    std::vector<int> numbers;
    for (unsigned int i = 0; i < 100; i++) {
        numbers.push_back(rand());
    }
    std::copy(numbers.begin(), numbers.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << "\n";
    for (unsigned int s = 0; s < numsorts; s++) {
        std::vector<int> vec(numbers.begin(), numbers.end());
        sorts[s](vec.begin(), vec.end());
        std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << is_sorted(vec.begin(), vec.end()) << "\n";
    }

    return 0;
}

Consistent hash ring

Consistent hashing was first described in a paper, Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web (1997) by David Karger et al. It is used in distributed storage systems like Amazon Dynamo and memcached.

Consistent hashing is a very simple solution to a common problem: how can you find a server in a distributed system to store or retrieve a value identified by a key, while at the same time being able to cope with server failures and network partitions?

Simply finding a server for value is easy; just number your set of s servers from 0 to s – 1. When you want to store or retrieve a value, hash the value’s key modulo s, and that gives you the server.

The problem comes when servers fail or become unreachable through a network partition. At that point, the servers no longer fill the hash space, so the only option is to invalidate the caches on all servers, renumber them, and start again. Given that, in a system with hundreds or thousands of servers, failures are commonplace, this solution is not feasible.
The solution

In consistent hashing, the servers, as well as the keys, are hashed, and it is by this hash that they are looked up. The hash space is large, and is treated as if it wraps around to form a circle – hence hash ring. The process of creating a hash for each server is equivalent to placing it at a point on the circumference of this circle. When a key needs to be looked up, it is hashed, which again corresponds to a point on the circle. In order to find its server, one then simply moves round the circle clockwise from this point until the next server is found. If no server is found from that point to end of the hash space, the first server is used – this is the “wrapping round” that makes the hash space circular.

The only remaining problem is that in practice hashing algorithms are likely to result in clusters of servers on the ring (or, to be more precise, some servers with a disproportionately large space before them), and this will result in greater load on the first server in the cluster and less on the remainder. This can be ameliorated by adding each server to the ring a number of times in different places. This is achieved by having a replica count, which applies to all servers in the ring, and when adding a server, looping from 0 to the count – 1, and hashing a string made from both the server and the loop variable to produce the position. This has the effect of distributing the servers more evenly over the ring. Note that this has nothing to do with server replication; each of the replicas represents the same physical server, and replication of data between servers is an entirely unrelated issue.

I’ve written an example implementation of consistent hashing in C++. As you can imagine from the description above, it isn’t terribly complicated. Here is the main class:

template <class Node, class Data, class Hash = HASH_NAMESPACE::hash<const char*> >
class HashRing
{
public:
    typedef std::map<size_t, Node> NodeMap;

    HashRing(unsigned int replicas)
        : replicas_(replicas), hash_(HASH_NAMESPACE::hash<const char*>())
    {
    }

    HashRing(unsigned int replicas, const Hash& hash)
        : replicas_(replicas), hash_(hash)
    {
    }

    size_t AddNode(const Node& node);
    void RemoveNode(const Node& node);
    const Node& GetNode(const Data& data) const;

private:
    NodeMap ring_;
    const unsigned int replicas_;
    Hash hash_;
};

template <class Node, class Data, class Hash>
size_t HashRing<Node, Data, Hash>::AddNode(const Node& node)
{
    size_t hash;
    std::string nodestr = Stringify(node);
    for (unsigned int r = 0; r < replicas_; r++) {
        hash = hash_((nodestr + Stringify(r)).c_str());
        ring_[hash] = node;
    }
    return hash;
}

template <class Node, class Data, class Hash>
void HashRing<Node, Data, Hash>::RemoveNode(const Node& node)
{
    std::string nodestr = Stringify(node);
    for (unsigned int r = 0; r < replicas_; r++) {
        size_t hash = hash_((nodestr + Stringify(r)).c_str());
        ring_.erase(hash);
    }
}

template <class Node, class Data, class Hash>
const Node& HashRing<Node, Data, Hash>::GetNode(const Data& data) const
{
    if (ring_.empty()) {
        throw EmptyRingException();
    }
    size_t hash = hash_(Stringify(data).c_str());
    typename NodeMap::const_iterator it;
    // Look for the first node >= hash
    it = ring_.lower_bound(hash);
    if (it == ring_.end()) {
        // Wrapped around; get the first node
        it = ring_.begin();
    }
    return it->second;
}

A few points to note:

  • The default hash function is hash from <map>.
    In practice you probably don’t want to use this. Something like MD5 would probably be best.
  • I had to define HASH_NAMESPACE because g++ puts the non-standard hash in a different namespace than that which other compilers do.
    Hopefully this will all be resolved with the widespread availablity of std::unordered_map.
  • The Node and Data types need to have operator << defined for a std::ostream.
    This is because I write them to an ostringstream in order to "stringify" them before getting the hash.