Of internal and external iterators, and generators

Internal iterators

• Are easy to write because you can use loops and recursion
• Can’t be controlled by the client (except for stopping early)
• Don’t suffer from invalidation because only one can be in operation at once (unless the container is being accessed by multiple threads)
• Don’t need to hold on to precious resources like files and network and database connections for an unpredictable length of time
• Can’t run in parallel, or be interleaved
• Can only be nested by passing callbacks to callbacks

External iterators

• Are more difficult to write because you need to maintain state and can’t use loops and recursion
• Are controlled by the client
• Can be run in parallel or interleaved
• May be invalidated – you need to specify invalidation behaviour or make them robust
• May need to hold on to precious resources like files and network and database connections for an unpredictable length of time
• – although you can implement timeouts and lazy connects

Generators

Generators give you the best of both worlds, in the sense that they are as easy to write as internal iterators, but are exposed as external iterators.
Because they are external iterators, they can be invalidated, and can hold on to resources for an unpredictable length of time.

Why STL iterator ranges go [begin, end)

At first sight, it seems strange that STL iterators go [begin, end), including begin but not end in the range, but when you think about it more you realise that this is exactly the way conventional loops work.

STL iteration loop

A typical STL iteration loop goes:

for (it = begin; it != end; ++it) {
// do something with *it
}

Note that:

• The iterator end is not part of the range processed
• After the end of the loop it is equal to end, i.e., one past the end of the range
• The size of range traversed is endbegin

Now consider the following loops:

Loop over an array with an index

int numbers[] = {1, 2, 3, 4, 5, 6};
const size_t n = sizeof(numbers) / sizeof(int);
int i;
for (int i = 0; i < n; ++i) {
std::cout << numbers[i] << "\n";
}
std::cout << i << "\n";

Here, our range of indices processed is [0, n).

• numbers[n] is not part of the range processed
• After the end of the loop, i is n, i.e., one past the end of the range
• The size of range traversed is n - 0 = n , i.e., the end – the beginning

Loop over a string

const char str[] = "The quick brown fox jumps over the lazy dog";
for (i = 0; str[i] != '\0'; ++i) {
std::cout << str[i] << "\n";
}
for (const char* c = str; c != '\0'; ++c) {
std::cout << *str << "\n";
}

A string is defined by [str, '\0'), or [*str, '\0')

• The '\0' character is not part of the string
• After the end of the loop, the pointer or index points to the '\0'.
• The size of range traversed is (position of the '\0') – 0 = strlen(str), i.e., the end – the beginning

Loop over an array of pointers with sentinel

const char* strings[] = {"apples", "pears", "bananas", "cherries", NULL};
for (i = 0; strings[i] != NULL; ++i) {
std::cout << strings[i] << "\n";
}
for (const char** pc = strings; *pc != NULL ; ++pc) {
std::cout << *pc << "\n";
}

Similarly, here the range is [strings, NULL), or [*strings, NULL)

• The NULL is not considered part of the range processed
• After the end of the loop, the pointer or index points to the NULL
• The size of range traversed is (position of the NULL) – 0 = the position of the NULL, i.e., the end – the beginning

Hopefully, this has demonstrated that STL iterators have exactly the same semantics as ordinary loop variables.

I’ll leave the last word to E. W. Dijkstra, with his explanation of Why numbering should start at zero.

Erasing from a vector by index

Use the vector::erase function to erase an element, or range of elements from a vector. It returns an iterator pointing to the element after the last one erased:

v.erase(v.begin() + pos);
v.erase(v.begin() + start, v.begin() + end);

For example, this program populates a vector with the numbers 0 to 9 and then erases 8, before erasing 3 to 5:

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>

int main()
{
typedef std::vector<int> intvec;
intvec v(10);
std::iota(v.begin(), v.end(), 0);
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";

std::cout << "erase 8:\n";
intvec::const_iterator it = v.erase(v.begin() + 8);
std::cout << "next element is " << *it << "\n";
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";

std::cout << "erase 3 to 5:\n";
it = v.erase(v.begin() + 3, v.begin() + 6);
std::cout << "next element is " << *it << "\n";
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";
}

You can make a generic function to erase from a container by index by using std::advance to move to the first (and optionally last) element to remove and then calling the container’s own erase method:

template <class Container>
typename Container::iterator erase(Container& cont, size_t pos, size_t len = 1)
{
typename Container::iterator it1 = cont.begin();
typename Container::iterator it2 = it1;
return cont.erase(it1, it2);
}

Use it like this:

erase(vec, 8);    // erase 8
erase(vec, 3, 3); // erase 3 to 5

Note that this will be O(n) for lists because their iterators are not RandomAccessIterators.

Getting the Index of an Iterator

Method 1: Subtract the iterator from the beginning of the container

#include <vector>
#include <algorithm>
#include <iostream>

int main()
{
typedef std::vector<int> intvec;
intvec vec(10);
std::iota(vec.begin(), vec.end(), 0);
for (intvec::const_iterator it = vec.begin(); it != vec.end(); ++it) {
std::cout << *it << " is at " << it - vec.begin() << "\n";
}
}

Note that this will only work with a RandomAccessIterator.

Method 2: Use std::distance on the beginning of the container and the iterator

#include <list>
#include <algorithm>
#include <iostream>

int main()
{
typedef std::list<std::string> stringlist;
stringlist list;
const char *words[] = {"The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"};
const size_t n = sizeof(words) / sizeof(const char*);
std::copy(words, words + n, std::back_inserter(list));
for (stringlist::const_iterator it = list.begin(); it != list.end(); ++it) {
std::cout << *it << " is at " << std::distance(list.cbegin(), it) << "\n";
}
}

This will work with any kind of iterator, but will be O(n) for a list iterator.
Notice the use of cbegin() to get a const iterator to the beginning. The iterator returned by begin() is non-const, and so cannot be compared to a const iterator by distance() as it’s a different type.

Iterator invalidation rules for C++ containers

If you need to worry about iterator invalidation it may be that you should be using an existing algorithm so that you do not need to modify the container’s contents directly.
You definitely need to understand invalidation however, if you want to write an algorithm.

Insertion

Sequence containers

vector

All iterators and references before the point of insertion are unaffected, unless the new container size is greater than the previous capacity (in which case all iterators and references are invalidated).
Reason: A vector is a contiguous array, so an insertion causes a block shift which moves all elements after the point of insertion. If the capacity is increased, the whole block is reallocated, potentially changing its address, and so the addresses of all elements.

deque

All iterators and references are invalidated, unless the inserted member is at an end (front or back) of the deque (in which case all iterators are invalidated, but references to elements are unaffected).
Reason: A deque is an array of fixed-size arrays (see Deque in C). Addition to the either end (i.e., to the free space in one of the arrays on an end), cannot cause elements to move, so references remain valid. However, as iterators are based on counting, they are all invalidated by addition.

list

All iterators and references unaffected.
Reason: A list is a linked list. Only the connections between nodes are affected by addition, not the nodes themselves.

Associative containers

set, map, multiset, multimap

All iterators and references unaffected.
Reason: They are all binary trees. Only the connections between nodes are affected by addition, not the nodes themselves.

stack and queue

Inherited from underlying container.

priority_queue

Inherited from underlying container.

Erasure

Sequence containers

vector

Every iterator and reference after the point of erase is invalidated.
Reason:An erasure causes a block shift leftwards of the succeeding elements, so iterators to them become invalid.

deque

All iterators and references are invalidated, unless the erased members are at an end (front or back) of the deque (in which case only iterators and references to the erased members are invalidated)
Reason:Removing from the end arrays cannot affect the other elements.

list

Only the iterators and references to the erased element is invalidated

Associative containers

set, map, multiset, multimap

Only iterators and references to the erased elements are invalidated.