# Graph data structures

## Introduction

In this post, I introduce the concept of a graph and describe some ways of representing graphs in C.

## Definitions

### Graphs, vertices and edges

A graph is a collection of nodes called vertices, and the connections between them, called edges.

### Undirected and directed graphs

When the edges in a graph have a direction, the graph is called a directed graph or digraph, and the edges are called directed edges or arcs.
Here, I shall be exclusively concerned with directed graphs, and so when I refer to an edge, I mean a directed edge.
This is not a limitation, since an undirected graph can easily be implemented as a directed graph by adding edges between connected vertices in both directions.
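
The idea of modelling an undirected connection as a pair of directed edges can be sketched as follows. This is only an illustration: the fixed-capacity `edge_list` type, the integer vertex ids, and the function names `add_directed` and `add_undirected` are all invented for this sketch and are not part of the library described later.

```c
#include <stddef.h>

#define MAX_EDGES 16 /* arbitrary capacity chosen for this sketch */

typedef struct {
    int from;
    int to;
} directed_edge;

typedef struct {
    directed_edge edges[MAX_EDGES];
    size_t count;
} edge_list;

/* Append the directed edge <from, to>; returns 0 if the list is full. */
int add_directed(edge_list *list, int from, int to)
{
    if (list->count >= MAX_EDGES) {
        return 0;
    }
    list->edges[list->count].from = from;
    list->edges[list->count].to = to;
    list->count++;
    return 1;
}

/* Model the undirected connection a - b as the two arcs <a, b> and <b, a>. */
int add_undirected(edge_list *list, int a, int b)
{
    if (list->count + 2 > MAX_EDGES) {
        return 0; /* not enough room for both directions */
    }
    add_directed(list, a, b);
    add_directed(list, b, a);
    return 1;
}
```

The cost of this approach is that every undirected connection occupies two entries, which is why a representation dedicated to undirected graphs can often be made smaller.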

A representation can often be simplified if it is only being used for undirected graphs, and I’ll mention in passing how this can be achieved.

A vertex that is the end-point of an edge is called a neighbour of the vertex that is its starting-point.
The first vertex is said to be adjacent to the second.

### An example

The following diagram shows a graph with 5 vertices and 7 edges.
The edges between A and D, and between B and C, are pairs that together make a bidirectional connection, represented here by a double-headed arrow.

### Mathematical definition

More formally, a graph is an ordered pair, G = <V, A>, where V is the set of vertices, and A, the set of arcs, is itself a set of ordered pairs of vertices.

For example, the following expressions describe the graph shown above in set-theoretic language:

V = {A, B, C, D, E}
A = {<A, B>, <A, D>, <B, C>, <C, B>, <D, A>, <D, C>, <D, E>}
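
As a rough illustration, the two sets above can be written down almost literally as C arrays. The names `arc`, `example_vertices`, `example_arcs` and `has_arc` are made up for this sketch, and single characters stand in for vertices; none of this is the library's own API.

```c
#include <stddef.h>

/* An arc is an ordered pair of vertices; vertices are single letters here. */
typedef struct {
    char from;
    char to;
} arc;

/* V = {A, B, C, D, E} */
const char example_vertices[] = { 'A', 'B', 'C', 'D', 'E' };

/* A = {<A, B>, <A, D>, <B, C>, <C, B>, <D, A>, <D, C>, <D, E>} */
const arc example_arcs[] = {
    { 'A', 'B' }, { 'A', 'D' }, { 'B', 'C' }, { 'C', 'B' },
    { 'D', 'A' }, { 'D', 'C' }, { 'D', 'E' }
};

const size_t example_vertex_count =
    sizeof example_vertices / sizeof example_vertices[0];
const size_t example_arc_count =
    sizeof example_arcs / sizeof example_arcs[0];

/* Return 1 if the ordered pair <from, to> is in the arc set, 0 otherwise. */
int has_arc(const arc *arcs, size_t count, char from, char to)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (arcs[i].from == from && arcs[i].to == to) {
            return 1;
        }
    }
    return 0;
}
```

Because the pairs are ordered, `has_arc` distinguishes <A, B> from <B, A>: the first is in the set and the second is not.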


## Functions

A graph implementation needs a basic set of functions to assemble and modify graphs, and to enumerate vertices, edges and neighbours.

The following functions are provided by each representation.
These are the declarations for the intuitive representation, graph1:

graph1 *graph1_create(void);
Create an empty graph
void graph1_delete(graph1 *graph);
Delete a graph
vertex *graph1_add(graph1 *graph, const char *name, void *data);
Add a vertex to the graph with a name, and optionally some data
vertex *graph1_get_vertex(const graph1 *graph, const char *name);
Retrieve a vertex by name
void *graph1_remove(graph1 *graph, vertex *vertex);
Remove a vertex
void graph1_add_edge(graph1 *graph, vertex *vertex1, vertex *vertex2);
Create a directed edge between vertex1 and vertex2
void graph1_remove_edge(graph1 *graph, vertex *vertex1, vertex *vertex2);
Remove the directed edge from vertex1 to vertex2
unsigned int graph1_get_adjacent(const graph1 *graph, const vertex *vertex1, const vertex *vertex2);
Determine if there is an edge from vertex1 to vertex2
iterator *graph1_get_neighbours(const graph1 *graph, const vertex *vertex);
Get the neighbours of a vertex
iterator *graph1_get_edges(const graph1 *graph);
Get all of the edges in the graph
iterator *graph1_get_vertices(const graph1 *graph);
Get all of the vertices in the graph
unsigned int graph1_get_neighbour_count(const graph1 *graph, const vertex *vertex);
Get the count of neighbours of a vertex
unsigned int graph1_get_edge_count(const graph1 *graph);
Get the count of edges in the graph
unsigned int graph1_get_vertex_count(const graph1 *graph);
Get the count of vertices in the graph

## Representation of vertices and edges

### Vertices

All of the graph representations use the following definition of a vertex:

typedef struct {
    char *name;
    void *data;
    void *body;
    deletefn del;
} vertex;


Note the body field, which is not of interest to clients, but is used by some representations (Adjacency List and Incidence List) to add per-vertex structure.

The following functions are provided for working with vertices:

const char *vertex_get_name(const vertex *v);
Get the vertex’s name
void *vertex_get_data(const vertex *v);
Get the data associated with a vertex

### Edges

How edges are implemented internally varies with the representation.
In fact, in three representations, Adjacency List, Adjacency Matrix and Incidence Matrix, edges do not exist internally as objects at all.
From the viewpoint of clients, however, the edges enumerated by the iterator that the edge-retrieval function returns have this structure:

typedef struct {
    vertex *from;
    vertex *to;
} edge;


The following functions are provided for working with edges:

const vertex *edge_get_from(const edge *e);
Get the vertex that is the starting-point of an edge
const vertex *edge_get_to(const edge *e);
Get the vertex that is the end-point of an edge

## Example program

The following program constructs the graph shown in the introduction using the intuitive representation, graph1, and then enumerates the vertices, neighbours and edges:

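#include <stdio.h>

#include <graph1.h>

int main(void)
{
    graph1 *graph;
    vertex *v;
    vertex *A, *B, *C, *D, *E;
    iterator *vertices, *edges;
    edge *e;

    /* Create a graph */
    graph = graph1_create();

    /* Add the vertices */
    A = graph1_add(graph, "A", NULL);
    B = graph1_add(graph, "B", NULL);
    C = graph1_add(graph, "C", NULL);
    D = graph1_add(graph, "D", NULL);
    E = graph1_add(graph, "E", NULL);

    /* Add the edges */
    graph1_add_edge(graph, A, B);
    graph1_add_edge(graph, A, D);
    graph1_add_edge(graph, B, C);
    graph1_add_edge(graph, C, B);
    graph1_add_edge(graph, D, A);
    graph1_add_edge(graph, D, C);
    graph1_add_edge(graph, D, E);

    /* Display each vertex followed by a comma-separated list of its neighbours */
    printf("Vertices (%u) and their neighbours:\n\n", graph1_get_vertex_count(graph));
    vertices = graph1_get_vertices(graph);
    while ((v = iterator_get(vertices))) {
        iterator *neighbours;
        vertex *neighbour;
        unsigned int n = 0;
        printf("%s (%u): ", vertex_get_name(v), graph1_get_neighbour_count(graph, v));
        neighbours = graph1_get_neighbours(graph, v);
        while ((neighbour = iterator_get(neighbours))) {
            printf("%s", vertex_get_name(neighbour));
            /* Print a separator after every neighbour except the last */
            if (n < graph1_get_neighbour_count(graph, v) - 1) {
                fputs(", ", stdout);
            }
            n++;
        }
        putchar('\n');
        iterator_delete(neighbours);
    }
    putchar('\n');
    iterator_delete(vertices);

    /* Display the edges as ordered pairs */
    printf("Edges (%u):\n\n", graph1_get_edge_count(graph));
    edges = graph1_get_edges(graph);
    while ((e = iterator_get(edges))) {
        printf("<%s, %s>\n", vertex_get_name(edge_get_from(e)), vertex_get_name(edge_get_to(e)));
    }
    putchar('\n');
    iterator_delete(edges);

    /* Delete */
    graph1_delete(graph);

    return 0;
}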


## Graph representations

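There are essentially 5 ways of representing a graph:

• The intuitive representation
• Adjacency List
• Adjacency Matrix
• Incidence Matrix
• Incidence List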

### The intuitive representation: graph1

What I call the "intuitive" representation, which can also be called the "object-oriented" representation, is a direct translation of the mathematical definition of a graph into a data type:

typedef struct {
    set *vertices;
    set *edges;
} graph1;

• Adding a vertex simply requires adding it to the vertex set.
• Adding an edge simply requires adding it to the edge set.
• Removing vertices and edges simply means removing them from the respective sets.
• To find a vertex’s neighbours, search the edge set for edges having the vertex as the from field.
• To determine if two vertices are adjacent, search the edge set for an edge having the first vertex as its from field, and the second vertex as its to field.
• Getting all of the edges is easy; just return an iterator over the edge set.
• For undirected graphs, each edge would be stored only once, and getting neighbours and adjacency testing would look at both vertices. The edge object would then hold not from and to but simply first and second, i.e., an unordered pair.

• This is one of the representations where edges exist internally as objects (Incidence List is the other).
• This is most like a sparse Adjacency Matrix, with the edge set holding those pairs that are adjacent, and non-adjacent pairs being absent.
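
The neighbour search and edge removal described above can be sketched with the edge set reduced to a plain array of pairs. The `set` type used by graph1 is not shown in this post, so the array, the integer vertex ids, and the names `pair`, `pairs_neighbours` and `pairs_remove` are stand-ins invented purely for illustration.

```c
#include <stddef.h>

/* One directed edge, with vertices identified by small integers. */
typedef struct {
    int from;
    int to;
} pair;

/* Neighbours of v: scan the whole edge set for edges whose 'from' is v.
   Stores at most max ids in out and returns how many were stored. */
size_t pairs_neighbours(const pair *edges, size_t edge_count, int v,
                        int *out, size_t max)
{
    size_t i, n = 0;
    for (i = 0; i < edge_count && n < max; i++) {
        if (edges[i].from == v) {
            out[n++] = edges[i].to;
        }
    }
    return n;
}

/* Remove <from, to> by overwriting it with the last edge in the array.
   Returns the new edge count (unchanged if the edge was not found). */
size_t pairs_remove(pair *edges, size_t edge_count, int from, int to)
{
    size_t i;
    for (i = 0; i < edge_count; i++) {
        if (edges[i].from == from && edges[i].to == to) {
            edges[i] = edges[edge_count - 1];
            return edge_count - 1;
        }
    }
    return edge_count;
}
```

Both operations are linear in the total number of edges in the graph, not in the number of edges at the vertex concerned, which is the main cost of keeping all of the edges in a single set.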

### Adjacency List: graph2

The graph is made up of a set of vertices.
Each vertex contains a set of vertices for its neighbours.

typedef struct {
    set *vertices;
} graph2;

typedef struct {
    set *neighbours;
} vertex_body;


For the graph shown in the introduction, the sets of neighbours would look like this:

A: {B, D}
B: {C}
C: {B}
D: {A, C, E}
E: {}

• Adding a vertex just means adding it to the vertex set.
• Adding an edge means adding the end-point of it to the starting vertex’s neighbour set.
• It is easy to go from a vertex to its neighbours, because the vertex stores them all: just return an iterator over them. This makes the graph argument in the function to retrieve neighbours unnecessary in this implementation.

• Testing for adjacency is easy; just search the first vertex’s neighbours for the second vertex.
• Getting all edges is more difficult to implement in this representation, because edges don’t exist as objects. You need to iterate over the neighbours of each vertex in turn, and construct the edge from the vertex and the neighbour.
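
A minimal sketch of these operations follows, using fixed-size arrays in place of the `set` type (which is not shown in this post). Vertices are identified by their index, and every name beginning with `al_` or `AL_` is hypothetical, chosen only for this example.

```c
#include <stddef.h>

#define AL_MAX_VERTICES 8   /* arbitrary capacities for this sketch */
#define AL_MAX_NEIGHBOURS 8

typedef struct {
    size_t neighbours[AL_MAX_NEIGHBOURS]; /* indices of end-point vertices */
    size_t neighbour_count;
} al_vertex;

typedef struct {
    al_vertex vertices[AL_MAX_VERTICES];
    size_t vertex_count;
} al_graph;

typedef struct {
    size_t from;
    size_t to;
} al_edge;

/* Set up a graph with vertex_count vertices (clamped to capacity) and no edges. */
void al_init(al_graph *g, size_t vertex_count)
{
    size_t i;
    if (vertex_count > AL_MAX_VERTICES) {
        vertex_count = AL_MAX_VERTICES;
    }
    g->vertex_count = vertex_count;
    for (i = 0; i < vertex_count; i++) {
        g->vertices[i].neighbour_count = 0;
    }
}

/* Add <from, to> by appending 'to' to the neighbours of 'from'.
   Returns 1 on success, 0 if an index is out of range or the list is full. */
int al_add_edge(al_graph *g, size_t from, size_t to)
{
    al_vertex *v;
    if (from >= g->vertex_count || to >= g->vertex_count) {
        return 0;
    }
    v = &g->vertices[from];
    if (v->neighbour_count >= AL_MAX_NEIGHBOURS) {
        return 0;
    }
    v->neighbours[v->neighbour_count++] = to;
    return 1;
}

/* Adjacency test: search only the neighbours of 'from' for 'to'. */
int al_adjacent(const al_graph *g, size_t from, size_t to)
{
    size_t i;
    const al_vertex *v = &g->vertices[from];
    for (i = 0; i < v->neighbour_count; i++) {
        if (v->neighbours[i] == to) {
            return 1;
        }
    }
    return 0;
}

/* Edges do not exist as objects, so construct them from each vertex and
   its neighbours. Stores at most max edges and returns how many were stored. */
size_t al_edges(const al_graph *g, al_edge *out, size_t max)
{
    size_t i, j, n = 0;
    for (i = 0; i < g->vertex_count; i++) {
        const al_vertex *v = &g->vertices[i];
        for (j = 0; j < v->neighbour_count && n < max; j++) {
            out[n].from = i;
            out[n].to = v->neighbours[j];
            n++;
        }
    }
    return n;
}
```

Note that `al_adjacent` only looks at one vertex's neighbours, whereas the intuitive representation has to search the whole edge set.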

### Adjacency Matrix: graph3

The graph is made up of a set of vertices and a matrix, whose rows and columns are indexed by vertices, and which contains a 1 entry where there is an edge from the row’s vertex to the column’s vertex.

typedef struct {
    set    *vertices;
    matrix *edges;
} graph3;


The adjacency matrix for the graph shown in the introduction would look like this:

$$\begin{array}{c c} & \begin{array}{c c c c c} A & B & C & D & E \\ \end{array} \\ \begin{array}{c} A\\ B\\ C\\ D\\ E\\ \end{array} & \left[ \begin{array}{c c c c c} 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right] \end{array}$$

• When adding a vertex, add a row and column to the matrix.
• When removing a vertex, remove its row and column. As adding and removing rows and columns is expensive, this makes the adjacency matrix unsuitable for graphs in which vertices are frequently added and removed.

• Adding and removing edges is easy however, and requires no allocation or deallocation of memory, just setting a matrix element.
• To get neighbours, look along the vertex’s row for 1s.
• To determine adjacency, look for a 1 at the intersection of the first vertex’s row and the second vertex’s column.
• To get the edge set, find all of the 1s in the matrix and construct the edges from the corresponding vertices.
• If the graph is undirected, the matrix will be symmetrical about the main diagonal. This means that you can drop half of it, making a triangular matrix.
• The vertex set needs to be ordered so that the index number of vertices can be looked up, or the matrix needs to be a 2-d map keyed by the vertices themselves.
• Memory used for edges is a constant $|V|^2$ entries, however many edges there are. The best use of this is a graph that is nearly complete, i.e., has a lot of edges.
• The matrix can be sparse; this relates the memory usage more closely to the number of edges. It also makes addition and removal of columns easier (no block shifts), but requires renumbering afterwards.

• You can use booleans or bits in the matrix to save memory.
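
The matrix operations above can be sketched with a fixed-size two-dimensional array. The `matrix` type is not shown in this post, so the array, the index-based vertex ids, and all of the `am_` names are assumptions made for illustration. The last function shows one common way of indexing a triangular matrix for the undirected case; it is an example layout, not necessarily the one any particular implementation uses.

```c
#include <stddef.h>

#define AM_MAX 8 /* arbitrary capacity for this sketch */

typedef struct {
    size_t vertex_count;
    unsigned char cells[AM_MAX][AM_MAX]; /* cells[from][to] is 1 for an edge */
} am_graph;

/* Set up a graph with vertex_count vertices (clamped to AM_MAX) and no edges. */
void am_init(am_graph *g, size_t vertex_count)
{
    size_t i, j;
    if (vertex_count > AM_MAX) {
        vertex_count = AM_MAX;
    }
    g->vertex_count = vertex_count;
    for (i = 0; i < AM_MAX; i++) {
        for (j = 0; j < AM_MAX; j++) {
            g->cells[i][j] = 0;
        }
    }
}

/* Add (present != 0) or remove (present == 0) the edge <from, to>.
   No allocation is needed; callers must pass indices below vertex_count. */
void am_set_edge(am_graph *g, size_t from, size_t to, int present)
{
    g->cells[from][to] = present ? 1 : 0;
}

/* Adjacency test: a single lookup at row 'from', column 'to'. */
int am_adjacent(const am_graph *g, size_t from, size_t to)
{
    return g->cells[from][to];
}

/* Neighbours of v: look along v's row for 1s.
   out must have room for vertex_count entries; returns the number found. */
size_t am_neighbours(const am_graph *g, size_t v, size_t *out)
{
    size_t to, n = 0;
    for (to = 0; to < g->vertex_count; to++) {
        if (g->cells[v][to]) {
            out[n++] = to;
        }
    }
    return n;
}

/* Undirected case: map (i, j) to an index into a flat array holding only
   the lower triangle (diagonal included), n * (n + 1) / 2 cells in total. */
size_t am_triangular_index(size_t i, size_t j)
{
    if (i < j) {
        size_t t = i;
        i = j;
        j = t;
    }
    return i * (i + 1) / 2 + j;
}
```

Swapping `i` and `j` when `i < j` is what makes the triangular layout symmetric: (0, 2) and (2, 0) map to the same cell.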

### Incidence Matrix: graph4

The graph is made up of a set of vertices and a matrix, as in Adjacency Matrix, but the matrix is vertices × edges, with each column containing two non-zero entries, one for the starting-point vertex and one for the end-point.

typedef struct {
    set    *vertices;
    matrix *edges;
} graph4;


The incidence matrix for the graph shown in the introduction looks like this (1 means "from" and 2 means "to"):

$$\begin{array}{c c} & \\ \begin{array}{c} A\\ B\\ C\\ D\\ E\\ \end{array} & \left[ \begin{array}{c c c c c c c} 1 & 1 & 0 & 0 & 2 & 0 & 0 \\ 2 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 2 & 1 & 0 & 2 & 0 \\ 0 & 2 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 \\ \end{array} \right] \end{array}$$

• When you add a vertex, you add a row to the matrix.
• When you add an edge, you add a column to the matrix.
• When you remove a vertex, you need to remove its row, and all of the columns (edges) containing the vertex, from the matrix.
• Getting the edges means iterating over the columns and constructing the edges from the two values.
• To find neighbours, look for 1s in the vertex’s row, and in each such column look for the 2 value, which is the neighbour.
• To determine adjacency, find a column containing a 1 in the starting-point vertex’s row, and a 2 in the end-point’s row.
• For an undirected graph, you have one column per edge, and just the value 1 for "connected", so each column contains two 1s.
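
Here is a small sketch of the directed incidence matrix, again using a fixed-size array because the `matrix` type is not shown in this post. The `im_` names and the `IM_FROM`/`IM_TO` constants are invented for the example; the values 1 and 2 follow the convention used in the matrix above.

```c
#include <stddef.h>

#define IM_MAX_VERTICES 8 /* arbitrary capacities for this sketch */
#define IM_MAX_EDGES 16

enum { IM_NONE = 0, IM_FROM = 1, IM_TO = 2 };

typedef struct {
    size_t vertex_count;
    size_t edge_count;
    unsigned char cells[IM_MAX_VERTICES][IM_MAX_EDGES]; /* cells[vertex][edge] */
} im_graph;

/* Set up a graph with vertex_count vertices (clamped to capacity) and no edges. */
void im_init(im_graph *g, size_t vertex_count)
{
    size_t v, e;
    if (vertex_count > IM_MAX_VERTICES) {
        vertex_count = IM_MAX_VERTICES;
    }
    g->vertex_count = vertex_count;
    g->edge_count = 0;
    for (v = 0; v < IM_MAX_VERTICES; v++) {
        for (e = 0; e < IM_MAX_EDGES; e++) {
            g->cells[v][e] = IM_NONE;
        }
    }
}

/* Add <from, to> as a new column. Returns 0 if the matrix is full, an index
   is out of range, or from == to (a self-loop would need both marks in one cell). */
int im_add_edge(im_graph *g, size_t from, size_t to)
{
    if (g->edge_count >= IM_MAX_EDGES || from == to ||
        from >= g->vertex_count || to >= g->vertex_count) {
        return 0;
    }
    g->cells[from][g->edge_count] = IM_FROM;
    g->cells[to][g->edge_count] = IM_TO;
    g->edge_count++;
    return 1;
}

/* Adjacency: find a column with IM_FROM in from's row and IM_TO in to's row. */
int im_adjacent(const im_graph *g, size_t from, size_t to)
{
    size_t e;
    for (e = 0; e < g->edge_count; e++) {
        if (g->cells[from][e] == IM_FROM && g->cells[to][e] == IM_TO) {
            return 1;
        }
    }
    return 0;
}

/* Neighbours of v: for each column with IM_FROM in v's row, find the row
   holding IM_TO. out must have room for edge_count entries. */
size_t im_neighbours(const im_graph *g, size_t v, size_t *out)
{
    size_t e, row, n = 0;
    for (e = 0; e < g->edge_count; e++) {
        if (g->cells[v][e] != IM_FROM) {
            continue;
        }
        for (row = 0; row < g->vertex_count; row++) {
            if (g->cells[row][e] == IM_TO) {
                out[n++] = row;
                break;
            }
        }
    }
    return n;
}
```

Finding neighbours costs a scan of the vertex's row plus a scan of each matching column, which is noticeably more work than the single row scan of the adjacency matrix.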

### Incidence List: graph5

There is a set of vertices as in Adjacency List, but each vertex stores a set of the edges of which it is the starting-point, rather than a set of neighbours.

typedef struct {
    set *vertices;
} graph5;

typedef struct {
    set *edges;
} vertex_body;


For the graph shown in the introduction, the sets of edges would look like this:

A: {<A, B>, <A, D>}
B: {<B, C>}
C: {<C, B>}
D: {<D, A>, <D, C>, <D, E>}
E: {}

• Adding a vertex just means adding it to the vertex set.
• Adding an edge means adding it to its starting vertex’s edge set.
• Finding if two vertices are adjacent requires searching the first vertex’s edge set for an edge containing the second vertex as its to field.
• Getting the neighbours requires retrieving them from the pairs in the set of edges for the vertex.
• Getting the edge set requires enumerating each of the vertices’ edge sets in turn.
• You can store the edges in the graph object as well as in each vertex.
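
To round off, here is a comparable sketch for the incidence list. As before, fixed-size arrays replace the `set` type, vertices are identified by index, and the `il_` names are hypothetical. The difference from the adjacency list sketch is that each vertex holds complete edge objects rather than bare neighbour ids.

```c
#include <stddef.h>

#define IL_MAX_VERTICES 8 /* arbitrary capacities for this sketch */
#define IL_MAX_OUT 8

typedef struct {
    size_t from;
    size_t to;
} il_edge;

typedef struct {
    il_edge edges[IL_MAX_OUT]; /* edges that start at this vertex */
    size_t edge_count;
} il_vertex;

typedef struct {
    il_vertex vertices[IL_MAX_VERTICES];
    size_t vertex_count;
} il_graph;

/* Set up a graph with vertex_count vertices (clamped to capacity) and no edges. */
void il_init(il_graph *g, size_t vertex_count)
{
    size_t i;
    if (vertex_count > IL_MAX_VERTICES) {
        vertex_count = IL_MAX_VERTICES;
    }
    g->vertex_count = vertex_count;
    for (i = 0; i < vertex_count; i++) {
        g->vertices[i].edge_count = 0;
    }
}

/* Add <from, to> to the edge set of its starting vertex.
   Returns 1 on success, 0 if an index is out of range or the set is full. */
int il_add_edge(il_graph *g, size_t from, size_t to)
{
    il_vertex *v;
    if (from >= g->vertex_count || to >= g->vertex_count) {
        return 0;
    }
    v = &g->vertices[from];
    if (v->edge_count >= IL_MAX_OUT) {
        return 0;
    }
    v->edges[v->edge_count].from = from;
    v->edges[v->edge_count].to = to;
    v->edge_count++;
    return 1;
}

/* Neighbours of v: take the 'to' field of each edge stored at v.
   out must have room for IL_MAX_OUT entries; returns the number found. */
size_t il_neighbours(const il_graph *g, size_t v, size_t *out)
{
    size_t i;
    const il_vertex *vertex = &g->vertices[v];
    for (i = 0; i < vertex->edge_count; i++) {
        out[i] = vertex->edges[i].to;
    }
    return vertex->edge_count;
}
```

Since the edges already exist as objects, enumerating all of them only needs a walk over each vertex's array, with no construction step.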