Open source SQL engines

Contents

MySQL

The world’s most popular open source database

Language: C, C++

Connectors

PostgreSQL

The world’s most advanced open source database

Language: C

Connectors

SQLite

SQLite is a in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine

Language: C

Connectors

Apache Hive

The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage.
Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Language: Java

Connectors

MariaDB

One of the most popular database servers.
Made by the original developers of MySQL.
Guaranteed to stay open source.

Language: C

Connectors

Firebird

Firebird is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms.
Firebird offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers.
It has been used in production systems, under a variety of names, since 1981.

Language: C, C++

Connectors

HyperSQL

HSQLDB (HyperSQL DataBase) is the leading SQL relational database software written in Java.
It offers a small, fast multithreaded and transactional database engine with in-memory and disk-based tables and supports embedded and server modes.
It includes a powerful command line SQL tool and simple GUI query tools.

Language: Java

Connectors

  • JDBC

H2

The main features of H2 are:

  • Very fast, open source, JDBC API
  • Embedded and server modes; in-memory databases
  • Browser based Console application
  • Small footprint: around 1.5 MB jar file size

Language: Java

Connectors

Apache Derby

Apache Derby,
an Apache DB subproject,
is an open source relational database implemented entirely in Java
and available
under the Apache License, Version 2.0.
Some key advantages include:

  • Derby has a small footprint — about 2.6 megabytes for the base engine
    and embedded JDBC driver.
  • Derby is based on the Java, JDBC, and
    SQL
    standards.
  • Derby provides an embedded JDBC driver that lets you
    embed
    Derby in any Java-based solution.
  • Derby also supports the more familiar client/server mode with the
    Derby Network Client JDBC
    driver and Derby Network Server
    .
  • Derby is easy to install, deploy, and use.

Language: Java

Connectors

OpenLink Virtuoso

At core, Virtuoso is a high-performance object-relational SQL database.
As a database, it provides transactions, a smart SQL compiler, powerful stored-procedure language with optional Java and .Net server-side hosting, hot backup, SQL-99 support and more.
It has all major data-access interfaces, such as ODBC, JDBC, ADO .Net and OLE/DB.

Language: C++

  • Web server
  • SOAP
  • REST
  • WS-*
  • BPEL
  • WebDAV
  • SPARQL

Connectors

MemSQL

MemSQL is a high-performance, in-memory database that combines the horizontal scalability of distributed systems with the familiarity of SQL.

Language: C, C++

Apache Drill

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

Language: Java

Connectors

Percona Server

Is a free, fully compatible, enhanced, open source drop-in replacement for MySQL that provides superior performance, scalability and instrumentation.
With over 1,900,000 downloads, Percona Server’s self-tuning algorithms and support for extremely high-performance hardware delivers excellent performance and reliability.

Language: C, C++

Connectors

MonetDB

For high-performance applications in data mining, OLAP, GIS, XML Query, text and multimedia retrieval

Connectors

CUBRID

Object-relational database management system (DBMS) optimized for Web services

Langauge: C, C++

Connectors

Apache Phoenix

Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data.
Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets.
The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema.
Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Language: Java

Connectors

  • JDBC

Presto

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

Connectors

Apache Tajo

Language: Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop.
Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources.
By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities.

Connectors

EffiProz

C#

h2sharp

An ADO.NET wrapper for the H2 Database Engine written in C#

Language: Java

Connectors

  • ADO.NET

SmallSQL

SmallSQL is small, fast, embeddable, SQL 99 conformance, platform independent and
100% pure Java

Language: Java

Connectors

  • JDBC

SQL::Statement

The SQL::Statement module implements a pure Perl SQL parsing and execution engine.
While it by no means implements full ANSI standard, it does support many features including column and table aliases, built-in and user-defined functions, implicit and explicit joins, complex nested search conditions, and other features.

SQL::Statement is a small embeddable Database Management System (DBMS).
This means that it provides all of the services of a simple DBMS except that instead of a persistent storage mechanism, it has two things:
1) an in-memory storage mechanism that allows you to prepare, execute, and fetch from SQL statements using temporary tables and
2) a set of software sockets where any author can plug in any storage mechanism.

There are three main uses for SQL::Statement.
One or another (hopefully not all) may be irrelevant for your needs:
1) to access and manipulate data in CSV, XML, and other formats
2) to build your own DBD for a new data source
3) to parse and examine the structure of SQL statements.

Language: Perl

  • Pure Perl SQL parsing and execution engine
  • Acts as an abstract base class for RDBMS implementations

tinySQL

tinySQL is a SQL engine written in Java.
It uses dBase format files compatible with MS Excel.
tinySQL comes with a command line interpreter which you can use to create, drop, or update tables.

Language: Java

  • Java SQL engine
  • Uses dBase format files

Graph algorithms

Introduction

This is a list of graph algorithms with links to references and implementations.
The graph libraries included are igraph, NetworkX, and Boost Graph Library.

Contents

Generators

Deterministic

Star

Wikipedia: https://en.wikipedia.org/wiki/Star_%28graph_theory%29

Complete

Wikipedia: https://en.wikipedia.org/wiki/Complete_graph

Note that the igraph function can produce directed complete graphs, as well as those containing 1 or more self-loops.

Tree

Famous

Bull

Wikipedia: https://en.wikipedia.org/wiki/Bull_graph

Chvátal

Wikipedia: https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph

Cubic

Wikipedia: https://en.wikipedia.org/wiki/Cubic_graph

Diamond

Wikipedia: https://en.wikipedia.org/wiki/Diamond_graph

Dodecahedral

Wolfram MathWorld: http://mathworld.wolfram.com/DodecahedralGraph.html

Frucht

Wikipedia: https://en.wikipedia.org/wiki/Frucht_graph

Heawood

Wikipedia: https://en.wikipedia.org/wiki/Heawood_graph

House

Wolfram MathWorld: http://mathworld.wolfram.com/HouseGraph.html

Icosahedral

Wolfram MathWorld: http://mathworld.wolfram.com/IcosahedralGraph.html

Krackhardt Kite

Wolfram MathWorld: http://mathworld.wolfram.com/KrackhardtKite.html

Octahedral

Wolfram MathWorld: http://mathworld.wolfram.com/OctahedralGraph.html

Petersen

Wolfram MathWorld: http://mathworld.wolfram.com/PetersenGraph.html

Tetrahedral

Wolfram MathWorld: http://mathworld.wolfram.com/TetrahedralGraph.html

Tutte

Wikipedia: https://en.wikipedia.org/wiki/Tutte_graph

Random

Erdos-Rényi

Wikipedia: https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model

Watts-Strogatz

Wikipedia: https://en.wikipedia.org/wiki/Watts_and_Strogatz_model

Barabási–Albert

Wikipedia: https://en.wikipedia.org/wiki/Barab%C3%A1si%E2%80%93Albert_model

Traversal

Lecture notes: http://web.cse.ohio-state.edu/~gurari/course/cis680/cis680Ch14.html

Depth-first search

Wikipedia: https://en.wikipedia.org/wiki/Depth-first_search

Breadth-first search

Wikipedia: https://en.wikipedia.org/wiki/Breadth-first_search

Isomorphism

Wikipedia: https://en.wikipedia.org/wiki/Graph_isomorphism

Connected components

Wikipedia: https://en.wikipedia.org/wiki/Connectivity_%28graph_theory%29

Connectivity

Connectivity testing

Decomposition into connected components

Strong connectivity

Strong connectivity testing

Decomposition into strongly connected components

Weak connectivity

Weak connectivity testing

Decomposition into weakly connected components

Biconnectivity

Biconnected components

Articulation points

Cliques

Wikipedia: https://en.wikipedia.org/wiki/Clique_%28graph_theory%29

Cliques

Maximal cliques

Clique number

Number of maximal cliques

Flows and Cuts

Lecture notes: http://www.cs.princeton.edu/courses/archive/spr04/cos226/lectures/maxflow.4up.pdf

Maximum flow

Maximum flow value

Minimum cut

Minimum cut value

Shortest paths

Wikipedia: https://en.wikipedia.org/wiki/Shortest_path_problem

Shortest paths

Dijkstra

Bellman-Ford

Johnson

A*

Distance measures

Wikipedia: https://en.wikipedia.org/wiki/Distance_%28graph_theory%29

Diameter

Eccentricity

Radius

Centrality measures

Lecture notes: http://cs.brynmawr.edu/Courses/cs380/spring2013/section02/slides/05_Centrality.pdf

Closeness

Betweenness

Google PageRank

Paper: http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

Directed Acyclic Graphs (DAGs)

Wikipedia: https://en.wikipedia.org/wiki/Directed_acyclic_graph

DAG testing

Topological sort

Transitive closure

Bipartite graphs

Wolfram Mathworld: http://mathworld.wolfram.com/BipartiteGraph.html

Creation

Testing

Creation from adjacency matrix

Note that the igraph documentation calls this an incidence matrix.

Conversion to adjacency matrix

Note that the igraph documentation calls this an incidence matrix.

Projection

Cores

Paper: http://arxiv.org/abs/cs.DS/0310049

Core number

Chordal graphs

Wolfram Mathworld: http://mathworld.wolfram.com/ChordalGraph.html

Chordal testing

Maximum cardinality search

Spanning trees

Wikipedia: https://en.wikipedia.org/wiki/Spanning_tree

Note that in NetworkX they are called arborescences (http://networkx.github.io/documentation/networkx-1.10/reference/algorithms.tree.html#recognition-tests).

Minimum spanning tree

Clustering

Article: http://dollar.biz.uiowa.edu/~street/graphClustering.pdf

Transitivity

Clustering coefficient (local transitivity)

Average clustering (average local transitivity)

Degree sequences

http://mathworld.wolfram.com/DegreeSequence.html

Simple graph testing

Pseudograph testing

Spectral properties

Book chapter: http://www.cs.yale.edu/homes/spielman/PAPERS/SGTChapter.pdf

Laplacian matrix

Reading and writing graphs

Edge list

GraphML

The GraphML File Format: http://graphml.graphdrawing.org/

GML

GML: a portable Graph File Format: http://www.fim.uni-passau.de/fileadmin/files/lehrstuhl/brandenburg/projekte/gml/gml-technical-report.pdf

Pajek

Pajek / Pajek-XXL: http://mrvar.fdv.uni-lj.si/pajek/

GraphViz

GraphViz: http://www.graphviz.org/

Sorting algorithms

These are some sorting algorithms I’ve written in C++.

Contents

  • Bubble sort
  • Cocktail sort
  • Tree sort
  • Selection sort
  • Min-max sort
  • Example program

Bubble Sort

In a bubble sort, a series of passes is made over the data, and in each pass, every pair of adjacent elements is compared. If they are in the wrong order, they are swapped. The sort completes when no changes have been made in a pass. In each pass the maximum element is “bubbled” to the end, as it will be swapped with every succeeding element, and consequently becoming a part of the next comparison. The effect of this is that a sorted sublist is built up at the right of the input list. The end of the sublist to be sorted is shortened by one element after each pass to avoid examining the sorted sublist.

template <class RandomAccessIterator>
void bubble_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    bool finished = false;
    RandomAccessIterator it, end = last - 1;
    while (!finished) {
        bool changed = false;
        for (it = first; it < end; ++it) {
            if (*it > *(it + 1)) {
                std::swap(*it, *(it + 1));
                changed = true;
            }
        }
        if (changed) {
            --end;
        }
        else {
            finished = true;
        }
    }
}

Cocktail Sort

By implementing a bubble sort bidirectionally, with the direction of each pass alternating, both the smallest and largest elements of the unsorted range are put in their proper place on each pass. This has the advantage of moving small elements at the end of the range to their proper place much more quickly. The effect is that a sorted sublist is built up at each end of the input list, and the sublist still to be sorted is correspondingly reduced by one element at each end after each pass.

template <class RandomAccessIterator>
void cocktail_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    bool finished = false;
    RandomAccessIterator it, begin = first, end = last - 1;
    while (!finished) {
        // Forward pass
        bool changed = false;
        for (it = begin; it < end; ++it) {
            if (*it > *(it + 1)) {
                std::swap(*it, *(it + 1));
                changed = true;
            }
        }
        if (changed) {
            --end;
        }
        else {
            finished = true;
        }
        if (!finished) {
            // Reverse pass
            changed = false;
            for (it = end; it > begin; --it) {
                if (*it < *(it - 1)) {
                    std::swap(*it, *(it - 1));
                    changed = true;
                }
            }
            if (changed) {
                ++begin;
            }
            else {
                finished = true;
            }
        }
    }
}

Tree Sort

A binary tree is a sorted data structure, and a list can be sorted by inserting its elements into a binary tree, and then retrieving them in order.

template <class RandomAccessIterator>
void tree_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    treenode<RandomAccessIterator>* tree = treenode<RandomAccessIterator>::build(first, last);
    tree->write(&first);
    tree->destroy();
}

This sort requires auxilliary storage, and perhaps in C++ it should have a different interface in order to allow an Allocator to be supplied by the client.

This is the process of building up the tree:

    static treenode<RandomAccessIterator>* build(RandomAccessIterator first, RandomAccessIterator last)
    {
        treenode<RandomAccessIterator>* root = NULL;
        for (RandomAccessIterator it = first; it < last; ++it) {
            treenode<RandomAccessIterator>* node = new treenode<RandomAccessIterator>(*it);
            if (root == NULL) {
                root = node;
            }
            else {
                treenode<RandomAccessIterator>* current = root,* previous = NULL;
                bool less;
                while (current != NULL) {
                    less = node->value_ < current->value_;
                    previous = current;
                    current = less ? current->left_ : current->right_;
                }
                if (less) {
                    previous->left_ = node;
                }
                else {
                    previous->right_ = node;
                }
            }
        }
        return root;
    }

Once the tree has been assembled, it is traversed in order using a recursive function and its contents are stored in the iterator. Note that the iterator is passed by address; this is how you ensure that a value is shared across all stack frames in a recursive algorithm.

    void write(RandomAccessIterator* it) const
    {
        if (left_) {
            left_->write(it);
        }
        **it = value_;
        ++(*it);
        if (right_) {
            right_->write(it);
        }
    }

After the sorted data has been stored back to the iterator, the tree is recursively destroyed. This needs to be performed in postorder so that the subtrees are destroyed before the parent node.

    void destroy()
    {
        if (left_) {
            left_->destroy();
        }
        if (right_) {
            right_->destroy();
        }
        delete this;
    }

Selection Sort

In a selection sort the data is searched for the minimum (or maximum) element, and this is swapped with the element in its place at the end. As with a bubble sort, the end of the range to be sorted is adjusted after each pass as a sorted sublist is built up at the end. The construction of the sorted range is identical to that in a bubble sort, but it is performed much more efficiently because there are fewer swaps.

template <class RandomAccessIterator>
void selection_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    RandomAccessIterator begin = first;
    while (begin < last) {
        RandomAccessIterator it, min;
        bool started = false;
        for (it = begin; it < last; ++it) {
            if (!started || *it < *min) {
                min = it;
            }
            started = true;
        }
        std::swap(*min, *begin);
        ++begin;
    }
}

Min-Max Sort

One can perform a selection sort and simultaneously search for the minimum and maximum element and move them to their proper places in each pass. This has the effect of building up a sorted sublist at each end, with the unsorted part becoming smaller from each ends, as in a cocktail sort. Again, the construction of the sorted ranges is performed much more efficiently in this sort than a cocktail sort because there are fewer swaps.

template <class RandomAccessIterator>
void minmax_sort(RandomAccessIterator first, RandomAccessIterator last)
{
    RandomAccessIterator begin = first;
    RandomAccessIterator end = last - 1;
    while (begin < end) {
        RandomAccessIterator it, min, max;
        bool started = false;
        for (it = begin; it <= end; ++it) {
            if (!started) {
                min = it;
                max = it;
                started = true;
            }
            else {
                if (*it < *min) {
                    min = it;
                }
                else if (*it > *max) {
                    max = it;
                }
            }
        }
        if (max == begin && min == end) {
            // Swap min and max only
            std::swap(*max, *min);
        }
        else if (max == begin) {
            // Swap max and end first
            std::swap(*max, *end);
            std::swap(*min, *begin);
        }
        else {
            // Swap min and begin first
            std::swap(*min, *begin);
            std::swap(*max, *end);
        }
        ++begin;
        --end;
    }
}

Example Program

Below is an example program to exercise all of the sorting algorithms:

#include <vector>
#include <iostream>
#include <functional>
#include <iterator>

#include "bubble_sort.h"
#include "cocktail_sort.h"
#include "tree_sort.h"
#include "selection_sort.h"
#include "minmax_sort.h"

namespace
{

template <class ForwardIterator>
bool is_sorted(ForwardIterator first, ForwardIterator last)
{
    typedef typename std::iterator_traits<ForwardIterator>::value_type T;
    return std::adjacent_find(first, last, std::greater<T>()) == last;
}

};

int main()
{
    typedef std::vector<int>::iterator I;
    typedef void (*sortfn)(I, I);
    sortfn sorts[] = { bubble_sort<I>, cocktail_sort<I>, tree_sort<I>, selection_sort<I>, minmax_sort<I> };
    const size_t numsorts = sizeof(sorts) / sizeof(sortfn);
    std::vector<int> numbers;
    for (unsigned int i = 0; i < 100; i++) {
        numbers.push_back(rand());
    }
    std::copy(numbers.begin(), numbers.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << "\n";
    for (unsigned int s = 0; s < numsorts; s++) {
        std::vector<int> vec(numbers.begin(), numbers.end());
        sorts[s](vec.begin(), vec.end());
        std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << is_sorted(vec.begin(), vec.end()) << "\n";
    }

    return 0;
}

TCP Proxy

This is about as simple as a TCP proxy can be. It accepts one client connection at a time, but you can easily allow it to accept more by using the methods demonstrated in the server examples. You can use it as a starting point to write a more sophisticated proxy, that, for example, writes the transfers to a file, or modifies the data being exchanged.

The thing to remember when writing a proxy is that you need to call select before every call to read or recv, unless you know how many more bytes there are. This is because if there are no more bytes left, read and recv will block, which will cause the proxy to freeze. read and recv only return 0 when the peer has disconnected.

 /* 
 *  A simple TCP proxy
 *  by Martin Broadhurst (www.martinbroadhurst.com)
 *  Usage: tcpproxy local_host local_port remote_host remote_port
 */

#include <stdio.h>
#include <string.h> /* memset() */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <netdb.h>
#include <signal.h>

#define BACKLOG  10      /* Passed to listen() */
#define BUF_SIZE 4096    /* Buffer for  transfers */

unsigned int transfer(int from, int to)
{
    char buf[BUF_SIZE];
    unsigned int disconnected = 0;
    size_t bytes_read, bytes_written;
    bytes_read = read(from, buf, BUF_SIZE);
    if (bytes_read == 0) {
        disconnected = 1;
    }
    else {
        bytes_written = write(to, buf, bytes_read);
        if (bytes_written == -1) {
            disconnected = 1;
        }
    }
    return disconnected;
}

void handle(int client, const char *remote_host, const char *remote_port)
{
    struct addrinfo hints, *res;
    int server = -1;
    unsigned int disconnected = 0;
    fd_set set;
    unsigned int max_sock;

    /* Get the address info */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(remote_host, remote_port, &hints, &res) != 0) {
        perror("getaddrinfo");
        close(client);
        return;
    }

    /* Create the socket */
    server = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (server == -1) {
        perror("socket");
        close(client);
        return;
    }

    /* Connect to the host */
    if (connect(server, res->ai_addr, res->ai_addrlen) == -1) {
        perror("connect");
        close(client);
        return;
    }

    if (client > server) {
        max_sock = client;
    }
    else {
        max_sock = server;
    }

    /* Main transfer loop */
    while (!disconnected) {
        FD_ZERO(&set);
        FD_SET(client, &set);
        FD_SET(server, &set);
        if (select(max_sock + 1, &set, NULL, NULL, NULL) == -1) {
            perror("select");
            break;
        }
        if (FD_ISSET(client, &set)) {
            disconnected = transfer(client, server);
        }
        if (FD_ISSET(server, &set)) {
            disconnected = transfer(server, client);
        }
    }
    close(server);
    close(client);
}

int main(int argc, char **argv)
{
    int sock;
    struct addrinfo hints, *res;
    int reuseaddr = 1; /* True */
    const char *local_host, *local_port, *remote_host, *remote_port;

    /* Get the local and remote hosts and ports from the command line */
    if (argc < 5) {
        fprintf(stderr, "Usage: tcpproxy local_host local_port remote_host remote_port\n");
        return 1;
    }
    local_host = argv[1];
    local_port = argv[2];
    remote_host = argv[3];
    remote_port = argv[4];

    /* Get the address info */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(local_host, local_port, &hints, &res) != 0) {
        perror("getaddrinfo");
        return 1;
    }

    /* Create the socket */
    sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (sock == -1) {
        perror("socket");
        freeaddrinfo(res);
        return 1;
    }

    /* Enable the socket to reuse the address */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuseaddr, sizeof(int)) == -1) {
        perror("setsockopt");
        freeaddrinfo(res);
        return 1;
    }

    /* Bind to the address */
    if (bind(sock, res->ai_addr, res->ai_addrlen) == -1) {
        perror("bind");
        freeaddrinfo(res);
        return 1;
    }

    /* Listen */
    if (listen(sock, BACKLOG) == -1) {
        perror("listen");
        freeaddrinfo(res);
        return 1;
    }

    freeaddrinfo(res);

    /* Ignore broken pipe signal */
    signal(SIGPIPE, SIG_IGN);
 
    /* Main loop */
    while (1) {
        socklen_t size = sizeof(struct sockaddr_in);
        struct sockaddr_in their_addr;
        int newsock = accept(sock, (struct sockaddr*)&their_addr, &size);

        if (newsock == -1) {
            perror("accept");
        }
        else {
            printf("Got a connection from %s on port %d\n",
                    inet_ntoa(their_addr.sin_addr), htons(their_addr.sin_port));
            handle(newsock, remote_host, remote_port);
        }
    }

    close(sock);

    return 0;
}

TCP Proxy for Windows

This is the Windows version of TCP Proxy.

/* 
 *  A simple TCP proxy for Windows
 *  by Martin Broadhurst (www.martinbroadhurst.com)
 *  Link with Ws2_32.lib
 *  Usage: tcpproxy local_host local_port remote_host remote_port
 */

#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#endif

#define _WINSOCK_DEPRECATED_NO_WARNINGS /* inet_ntoa */

#include <stdio.h>
#include <windows.h>
#include <winsock2.h>
#include <ws2tcpip.h>

#define BACKLOG  10      /* Passed to listen() */
#define BUF_SIZE 4096    /* Buffer for  transfers */

unsigned int transfer(SOCKET from, SOCKET to)
{
	char buf[BUF_SIZE];
	unsigned int disconnected = 0;
	size_t bytes_read, bytes_written;
	bytes_read = recv(from, buf, BUF_SIZE, 0);
	if (bytes_read == 0) {
		disconnected = 1;
	}
	else {
		bytes_written = send(to, buf, bytes_read, 0);
		if (bytes_written == -1) {
			disconnected = 1;
		}
	}
	return disconnected;
}

void handle(SOCKET client, const char *remote_host, const char *remote_port)
{
	struct addrinfo hints, *res;
	SOCKET server = -1;
	unsigned int disconnected = 0;
	fd_set set;
	unsigned int max_sock;

	/* Get the address info */
	ZeroMemory(&hints, sizeof hints);
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(remote_host, remote_port, &hints, &res) != 0) {
		perror("getaddrinfo");
		closesocket(client);
		return;
	}

	/* Create the socket */
    server = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
	if (server == INVALID_SOCKET) {
		perror("socket");
		closesocket(client);
		return;
	}

	/* Connect to the host */
	if (connect(server, res->ai_addr, res->ai_addrlen) == -1) {
		perror("connect");
		closesocket(client);
		return;
    }

	if (client > server) {
		max_sock = client;
	}
	else {
		max_sock = server;
	}

	/* Main transfer loop */
	while (!disconnected) {
		FD_ZERO(&set);
		FD_SET(client, &set);
		FD_SET(server, &set);
		if (select(max_sock + 1, &set, NULL, NULL, NULL) == SOCKET_ERROR) {
			perror("select");
			break;
		}
		if (FD_ISSET(client, &set)) {
			disconnected = transfer(client, server);
		}
		if (FD_ISSET(server, &set)) {
			disconnected = transfer(server, client);
		}
	}
	if (server != -1) {
		closesocket(server);
	}
	closesocket(client);
}

int main(int argc, char **argv)
{
	WORD wVersion = MAKEWORD(2, 2);
	WSADATA wsaData;
	int iResult;
    SOCKET sock;
	struct addrinfo hints, *res;
	int reuseaddr = 1; /* True */
	const char *local_host, *local_port, *remote_host, *remote_port;

	/* Initialise Winsock */
	if (iResult = (WSAStartup(wVersion, &wsaData)) != 0) {
		fprintf(stderr, "WSAStartup failed: %d\n", iResult);
		return 1;
	}

	/* Get the local and remote hosts and ports from the command line */
	if (argc < 5) {
		fprintf(stderr, "Usage: tcpproxy local_host local_port remote_host remote_port\n");
		return 1;
	}
	local_host = argv[1];
	local_port = argv[2];
	remote_host = argv[3];
	remote_port = argv[4];

	/* Get the address info */
	ZeroMemory(&hints, sizeof hints);
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(local_host, local_port, &hints, &res) != 0) {
		perror("getaddrinfo");
		WSACleanup();
		return 1;
	}

	/* Create the socket */
    sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
	if (sock == INVALID_SOCKET) {
		perror("socket");
		WSACleanup();
		return 1;
	}

	/* Enable the socket to reuse the address */
	if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, (const char*)&reuseaddr, 
		sizeof(int)) == SOCKET_ERROR) {
		perror("setsockopt");
		WSACleanup();
		return 1;
	}

	/* Bind to the address */
    if (bind(sock, res->ai_addr, res->ai_addrlen) == SOCKET_ERROR) {
        perror("bind");
		WSACleanup();
        return 1;
    }

	/* Listen */
    if (listen(sock, BACKLOG) == SOCKET_ERROR) {
        perror("listen");
		WSACleanup();
        return 1;
    }

	freeaddrinfo(res);
 
	/* Main loop */
	while (1) {
		size_t size = sizeof(struct sockaddr_in);
		struct sockaddr_in their_addr;
		int newsock = accept(sock, (struct sockaddr*)&their_addr, &size);

		if (newsock == INVALID_SOCKET) {
			perror("accept");
		}
		else {
			printf("Got a connection from %s on port %d\n", 
					inet_ntoa(their_addr.sin_addr), htons(their_addr.sin_port));
			handle(newsock, remote_host, remote_port);
		}
	}

    closesocket(sock);
	WSACleanup();

    return 0;
}

K-Permutations

The k-permutations of a set are the permutations of the combinations of size k. They are also known as sequences without repetitions.

There are 24 3-permutations of the 4-set {0, 1, 2, 3}:

[0, 1, 2]
[0, 2, 1]
[1, 0, 2]
[1, 2, 0]
[2, 0, 1]
[2, 1, 0]
[0, 1, 3]
[0, 3, 1]
[1, 0, 3]
[1, 3, 0]
[3, 0, 1]
[3, 1, 0]
[0, 2, 3]
[0, 3, 2]
[2, 0, 3]
[2, 3, 0]
[3, 0, 2]
[3, 2, 0]
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]

The algorithm simply finds the next permutation of the array. If the array is at the last permutation, the next combination from the set is constructed.

Note that this algorithm does not produce the k-permutations in lexicographic order.

unsigned int next_k_permutation(unsigned int *ar, size_t n, unsigned int k)
{
	unsigned int result = next_permutation(ar, k);
	if (result == 0) {
		result = next_combination(ar, n, k);
	}
	return result;
}

Subsets of a multiset

This is the set of all subsets of a multiset, which are themselves multisets, i.e., the power set of a multiset.

For example, the power set of the multiset [a, b, b, c] consists of the sets:

()
(c)
(b)
(b, c)
(b, b)
(b, b, c)
(a)
(a, c)
(a, b)
(a, b, c)
(a, b, b)
(a, b, b, c)

The set of all subsets of a particular size are combinations of a multiset.

The multiset and its subsets are represented as a vector containing, for each element, the count of its occurrences in the multiset. For example, the multiset [a, b, b, c] is represented as [1, 2, 1]. This is similar to the characteristic vector used for power set, but with counts rather than boolean values.

The correspondences for the subsets are then as follows:

[0, 0, 0] = ()
[0, 0, 1] = (c)
[0, 1, 0] = (b)
[0, 1, 1] = (b, c)
[0, 2, 0] = (b, b)
[0, 2, 1] = (b, b, c)
[1, 0, 0] = (a)
[1, 0, 1] = (a, c)
[1, 1, 0] = (a, b)
[1, 1, 1] = (a, b, c)
[1, 2, 0] = (a, b, b)
[1, 2, 1] = (a, b, b, c)

The algorithm can then simply count from [0, 0, 0] to [1, 2, 1], using the values in the source multiset as the upper limit for the value of an element.

Note that this algorithm does not produce the subsets in lexicographic order.

unsigned int next_multiset_subset(const unsigned int *multiset, unsigned int *ar, size_t n)
{
	unsigned int changed = 0;
	int i;

	for (i = n - 1; i >= 0 && !changed; i--) {
		if (ar[i] < multiset[i]) {
			/* Increment */
			ar[i]++;
			changed = 1;
		}
		else {
			/* Roll over */
			ar[i] = 0;
		}
	}
	if (!changed) {
		/* Reset to first combination */
		for (i = 0; i < n; i++) {
			ar[i] = 0;
		}
	}
	return changed;
}

Here is an example program:

#include <stdio.h>

#include <multiset-subset.h>

static void print_list(const unsigned int *ar, size_t len, FILE *fptr)
{
	unsigned int i;
	fputc('(', fptr);
	for (i = 0; i < len; i++) {
		fprintf(fptr, "%d", ar[i]);
		if (i < len - 1) {
			fputs(", ", fptr);
		}
	}
	fputc(')', fptr);
}

int main(void)
{
	unsigned int multiset[] = {1, 2, 1};
	const size_t n = sizeof(multiset) / sizeof(unsigned int);
	unsigned int numbers[] = {0, 0, 0};

	do {
		print_list(numbers, n, stdout);
		putchar('\n');
	} while (next_multiset_subset(multiset, numbers, n));

	return 0;
}

Here is a second example that prints the elements of the multisets:

#include <stdio.h>

#include <multiset-subset.h>

static void print_array(const unsigned int *ar, size_t len, FILE *fptr)
{
	unsigned int i;
	fputc('[', fptr);
	for (i = 0; i < len; i++) {
		fprintf(fptr, "%d", ar[i]);
		if (i < len - 1) {
			fputs(", ", fptr);
		}
	}
	fputc(']', fptr);
}


static void print_multiset_subset(const unsigned int *ar, size_t len, 
		const void **elements, printfn print, FILE *fptr)
{
	unsigned int i, started = 0;
	fputc('(', fptr);
	for (i = 0; i < len; i++) {
		unsigned int j;
		for (j = 0; j < ar[i]; j++) {
			if (started) {
				fputs(", ", fptr);
			}
			print(elements[i], fptr);
			started = 1;
		}
	}
	fputc(')', fptr); 
}


int main(void)
{
	unsigned int multiset[] = {1, 2, 1};
	const size_t n = sizeof(multiset) / sizeof(unsigned int);
	char *elements[] = {"a", "b", "c"};
	unsigned int numbers[] = {0, 0, 0};

	do {
        print_array(numbers, n, stdout);
        printf(" = "); 
		print_multiset_subset(numbers, n, (void*)elements, (printfn)fputs, stdout);
		putchar('\n');
	} while (next_multiset_subset(multiset, numbers, n));

	return 0;
}

Combinations of a multiset

These are the combinations of k elements chosen from a set that can contain duplicates (a multiset).

For example, given the multiset {0, 1, 1, 2, 2, 2, 3}, the 4-combinations are:

[0, 1, 1, 2]
[0, 1, 1, 3]
[0, 1, 2, 2]
[0, 1, 2, 3]
[0, 2, 2, 2]
[0, 2, 2, 3]
[1, 1, 2, 2]
[1, 1, 2, 3]
[1, 2, 2, 2]
[1, 2, 2, 3]
[2, 2, 2, 3]

This algorithm produces the combinations in lexicographic order, with the elements in each combination in increasing order.

Begin with the first combination, which is the first k elements of the multiset ([0, 1, 1, 2] in the example above), and then at each step:

  1. Find the rightmost element that is less than the maximum value it can have (which is the element in the multiset that is the same distance from the right).
  2. Replace it with the first multiset element greater than it.
  3. Replace the remainder of the combination with the elements that follow the replacement in the multiset.
unsigned int next_multiset_combination(const unsigned int *multiset, unsigned int *ar, size_t n, unsigned int k)
{
	unsigned int finished = 0;
	unsigned int changed = 0;
	unsigned int i;

	for (i = k - 1; !finished && !changed; i--) {
		if (ar[i] < multiset[i + (n - k)]) {
			/* Find the successor */
			unsigned int j;
			for (j = 0; multiset[j] <= ar[i]; j++);
			/* Replace this element with it */
			ar[i] = multiset[j];
			if (i < k - 1) {
				/* Make the elements after it the same as this part of the multiset */
				unsigned int l;
				for (l = i + 1, j = j + 1; l < k; l++, j++) {
					ar[l] = multiset[j];
				}
			}
			changed = 1;
		}
		finished = i == 0;
	}
	if (!changed) {
		/* Reset to first combination */
		for (i = 0; i < k; i++) {
			ar[i] = multiset[i];
		}
	}
	return changed;
}

Here is an example program:

#include <stdio.h>

#include <multiset-combination.h>

static void print_array(const unsigned int *ar, size_t len, FILE *fptr)
{
	unsigned int i;
	fputc('[', fptr);
	for (i = 0; i < len; i++) {
		fprintf(fptr, "%d", ar[i]);
		if (i < len - 1) {
			fputs(", ", fptr);
		}
	}
	fputc(']', fptr);
}

int main(void)
{
	unsigned int multiset[] = {0, 1, 1, 2, 2, 2, 3};
	unsigned int n = sizeof(multiset) / sizeof(unsigned int);
	unsigned int numbers[] = {0, 1, 1, 2};
	const unsigned int k = sizeof(numbers) / sizeof(unsigned int);

	do {
		print_array(numbers, k, stdout);
		putchar('\n');
	} while (next_multiset_combination(multiset, numbers, n, k));

	return 0;
}

Here is a second example that prints the elements of the multisets:

#include <stdio.h>

#include <multiset-combination.h>

static void print_set(const unsigned int *ar, size_t len, const void **elements, 
		const char *brackets, printfn print, FILE *fptr)
{
	unsigned int i;
	fputc(brackets[0], fptr);
	for (i = 0; i < len; i++) {
		print(elements[ar[i]], fptr);
		if (i < len - 1) {
			fputs(", ", fptr);
		}
	}
	fputc(brackets[1], fptr); 
}

int main(void)
{
	unsigned int multiset[] = {0, 1, 1, 2, 2, 2, 3};
	char *elements[] = {"a", "b", "c", "d"};
	const size_t n = sizeof(multiset) / sizeof(unsigned int);
	unsigned int numbers[] = {0, 1, 1, 2};
	const size_t k = sizeof(numbers) / sizeof(unsigned int);

	do {
		print_set(numbers, k, (void*)elements, "[]", (printfn)fputs, stdout);
		putchar('\n');
	} while (next_multiset_combination(multiset, numbers, n, k));

	return 0;
}