Open source XQuery implementations

These are all of the open source XQuery implementations I’m aware of, as described in their own words.

BaseX

Java

BaseX is a very light-weight, high-performance and scalable XML Database engine and XPath/XQuery 3.0 Processor, including full support for the W3C Update and Full Text extensions. An interactive and user-friendly GUI frontend gives you great insight into your XML documents.

  • High-performance database storage with text, attribute, full-text and path indexes.
  • Efficient support of the W3C XPath/XQuery Recommendations and Full Text and Update Extensions.
  • One of the highest available compliance rates for all supported specifications.
  • Client/Server architecture, supporting ACID safe transactions, user management, logging.
  • Highly interactive visualizations, supporting very large XML documents.
  • The only realtime XQuery editor available, with syntax highlighting and error feedback.
  • Wide range of interfaces: REST/RESTXQ, WebDAV, XQJ, XML:DB; clients in many different languages

Berkeley DB XML

C

Oracle Berkeley DB XML is an open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB and inherits its rich features and attributes.

  • Built on top of Oracle Berkeley DB, inheriting all its rich features, such as transactions and replication
  • Provides a document parser, XML indexer, and XQuery engine on top of Oracle Berkeley DB to enable the fastest, most efficient retrieval of data
  • Applications perform database administration, eliminating the need for a DBA and allowing continuous, unattended operation
  • Lowers total cost of ownership with extreme performance to reduce hardware costs

eXist-db

Java

Schema-less Database

The high-performance native XML database engine stores textual or binary data and documents without requiring a database schema.

Rapid Prototyping

Upload your data and start writing code immediately.

Application Packages

eXist-db applications are packaged as single archive files that are installed directly in the database. Deployment, upgrading to new versions and distribution become a breeze.

Open

eXist-db is fully based upon Open Standards and Open Source, making it a future-proof and sustainable choice.

Browser-based IDE

A Browser-based IDE allows managing and editing all artifacts belonging to an application. Syntax-coloring, code-completion and error-checking help to get it right.

Forms Framework

Being a complete solution, eXist-db tightly integrates with XForms for complex form development.

Rich Stack of Libraries

Develop entire applications in XQuery using eXist-db’s rich set of libraries.

Community Driven

Being Open Source since 2001, eXist-db development has always been driven by the needs of a large user community.

Galax

OCaml

Galax is an open-source implementation of XQuery, the W3C XML Query Language. It includes several advanced extensions for XML updates, scripting, and distributed programming.

Galax comes with a state of the art compiler and optimizer. Most of Galax’s architecture is formally documented, making it ideal for users interested in teaching XQuery, in building new language extensions, or developing new optimizations.

Galax is implemented in O’Caml, a fast, modern, type-inferring, functional programming language descended from the ML (Meta Language) family.

Key Features.

  • XQuery 1.0 and XPath 2.0 support with 99.4% conformance.
  • XQuery type checking, modules, and XML schema import.
  • XQuery Updates, XQuery Scripting.
  • Web Services, and Distributed XQuery.

GCX

C++

The G(arbage) C(ollected) X(Query) engine is an in-memory XQuery engine, which is the first streaming XQuery engine that implements active garbage collection, a novel buffer management strategy in which both static and dynamic analysis are exploited. This technique actively purges main memory buffers at runtime based on the current status of query evaluation. In our paper, we show the various stages in evaluating composition-free XQuery with the GCX engine. Our technique has a significant impact on reducing both main memory consumption and query execution time, as can be seen in our experiments.
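The buffer-purging idea can be illustrated with a minimal Python sketch. This is not GCX's algorithm (which combines static and dynamic analysis of the query); it only shows the simpler, related habit of discarding already-consumed subtrees while streaming. The function name, tag names, and sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def stream_titles(source, record_tag="book", field_tag="title"):
    """Yield one field per record while parsing, purging consumed subtrees."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first "start" event carries the root element
    for event, elem in events:
        if event == "end" and elem.tag == record_tag:
            yield elem.findtext(field_tag)
            # The record has been consumed; detach the root's children
            # so the in-memory tree does not keep growing.
            root.clear()

# Hypothetical sample input.
SAMPLE = b"<library><book><title>A</title></book><book><title>B</title></book></library>"
titles = list(stream_titles(io.BytesIO(SAMPLE)))
```

Without the `root.clear()` call the results are the same, but every parsed record stays attached to the root until parsing ends.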

HXQ

Haskell

HXQ is a fast and space-efficient translator from XQuery (the standard query language for XML) to embedded Haskell code. The translation is based on Template Haskell. HXQ takes full advantage of Haskell’s lazy evaluation to keep in memory only those parts of XML data needed at each point of evaluation, thus performing stream-based evaluation for forward queries (queries that do not contain backward steps). This results in an implementation that is as fast and space-efficient as any stream-based implementation based on SAX filters or finite state machines. Furthermore, the coding is far simpler and more extensible since it is based on XML trees, rather than SAX events. Since HXQ uses lazy evaluation, you get the first results of non-blocking queries immediately, while non-streaming XQuery processors must first parse the entire input file and construct the whole XML tree in memory before they produce any output.

Finally, HXQ can store XML documents in a relational database (currently MySQL or SQLite), by shredding XML into relational tuples, and by translating XQueries over the shredded documents into optimized SQL queries. The mapping to relational tables is based on the document’s structural summary, which is derived from the document data rather than from a schema. It uses hybrid inlining to inline attributes and non-repeating elements into a single table, thus resulting in a compact relational schema. For each such mapping, HXQ synthesizes an XQuery that reconstructs the original XML document from the shredded data. This XQuery is fused with the user queries using partial evaluation techniques and parts of the resulting query are mapped to SQL queries using code folding rules so that all relevant predicates are promoted to SQL. This pushes most evaluation to the database query engine, thus resulting in fast execution over large data sets.
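To make the shredding idea concrete, here is a small Python sketch using the standard `sqlite3` module. It is an illustration under assumptions, not HXQ's implementation: the table layout is hard-coded here, whereas HXQ derives it from the document's structural summary, and the document, table, and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """<library>
  <book id="b1"><title>XQuery Basics</title><year>2004</year></book>
  <book id="b2"><title>Streaming XML</title><year>2009</year></book>
</library>"""

def shred(xml_text, conn):
    """Inline each book's attribute and non-repeating children into one row."""
    conn.execute("CREATE TABLE book (id TEXT, title TEXT, year INTEGER)")
    for book in ET.fromstring(xml_text).iter("book"):
        conn.execute(
            "INSERT INTO book VALUES (?, ?, ?)",
            (book.get("id"), book.findtext("title"), int(book.findtext("year"))),
        )

def titles_after(conn, year):
    # Rough SQL counterpart of:
    #   for $b in //book where $b/year > $year return $b/title/text()
    # The predicate runs inside the database engine.
    rows = conn.execute(
        "SELECT title FROM book WHERE year > ? ORDER BY id", (year,)
    )
    return [title for (title,) in rows]

conn = sqlite3.connect(":memory:")
shred(DOC, conn)
print(titles_after(conn, 2005))  # ['Streaming XML']
```

Because `id`, `title`, and `year` occur at most once per book, they fit in a single row; a repeating child element would need a table of its own.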

MXQuery

Java

MXQuery is a low-footprint, extensible Open-Source XQuery Engine implemented in Java.
Besides a high level of compliance with XQuery 1.0, XQuery Update Facility 1.0, XPath/XQuery Fulltext 1.0, it provides a wide coverage of upcoming W3C standards proposals (XQuery 3.0, Scripting), support for a wide range of Java Platforms (including mobile/embedded devices), cross-compilation to JavaScript (for browser usage) and support for data stream processing/CEP.

Nux

Java

Nux is an open-source Java toolkit making efficient and powerful XML processing easy. It is geared towards embedded use in high-throughput XML messaging middleware such as large-scale Peer-to-Peer infrastructures, message queues, publish-subscribe and matchmaking systems for Blogs/newsfeeds, text chat, data acquisition and distribution systems, application level routers, firewalls, classifiers, etc.

Have you ever tried to take advantage of a robust and natural commodity Java tool set for XML, XQuery, XPath, schema validation, binary XML, fuzzy fulltext similarity search and related technologies, yet were not ready to accept a significant performance penalty? Chances are most tool sets turned out not to be particularly robust and natural, that they incurred dramatic penalties when used in straightforward manners, and that their complex idiosyncrasies had a strong tendency to distract from the real job and use cases you wanted to get done in a timely manner.

Nux helps to avoid XML nightmares, enabling you to mix and match powerful main-memory XML tools in natural, straightforward, seamless, effective and standards compliant manners.

Nux reliably processes whatever data fits into main memory (even, say, 250 MB messages), but it is not an XML database system, and does not attempt to be one. Nux integrates best-of-breed components, containing extensions of the XOM, Saxon and Lucene open-source libraries.

OrientX

Java

OrientX is a native XML database system, developed at Renmin University of China.
The word ‘OrientX’ is an abbreviation of Original RUC IDKE Native XML.
The name is pronounced orient-X.
OrientX system stores XML data and preserves its tree structure.
It also allows users to retrieve XML data in the form of XPath/XQuery query language.

Patternist

C++

Patternist is an XPath 2.0, XQuery 1.0 and XSL-T 2.0 implementation, licensed under the GNU LGPL license.

Development priority is:

  • Conformance and interoperability
  • The HCI aspect, that it is user friendly and has good usability
  • Clean, compact implementation that compiles without warnings and is well documented

PHP XML Classes

PHP

A collection of classes and resources to process XML using PHP

Qexo

Java

Qexo is a partial implementation of the XML Query language. It achieves high performance because a query is compiled down to Java bytecodes using the Kawa framework.
Kawa also includes a proof-of-concept implementation of XSLT.

Rainbow

Java

As more and more data becomes available in XML format, a general purpose XML repository management system (XRMS) is needed. Rather than developing it from scratch, there is great interest in exploiting existing relational database technology as a backend engine to store, retrieve, and query XML data sets, due to its maturity and performance.

However, due to the mismatch between the complexity of the semi-structured XML data model and the simplicity of the flat relational model, there are many ways to store one document in a relational database, and a number of heuristic techniques are needed to overcome this mismatch. These include the data model mapping strategies, the transformation of XML queries into SQL queries, XML order handling, XML update propagation, XML query optimization, and XML indexing.

The Rainbow System we are proposing is based on a flexible strategy of mapping XML to the relational data model using generic XQuery loading statements, supports optimized XML order-sensitive query processing via an XML Query Algebra and XQuery to SQL translation, and serves as a solid yet scalable foundation for extended XML-based applications.

Specific next research goals within the larger Rainbow project that we are targeting include Update Query Processing through the XML virtual Views, Incremental XML view maintenance, XQuery Multiple Query Optimization, and XML Query Optimization by exploiting Materialized XML Views.

Sedna

C and C++

Sedna is a free native XML database which provides a full range of core database services – persistent storage, ACID transactions, security, indices, hot backup. Flexible XML processing facilities include W3C XQuery implementation, tight integration of XQuery with full-text search facilities and a node-level update language.

XBird

Java

XBird is a light-weight XQuery processor and database system written in Java. Here, light-weight means reasonably fast and embeddable.

Features

XBird introduces the following features:

  • XQuery Processor
  • Native XML Database Engine
  • Embeddable Database Engine
  • Distributed XQuery Processor
  • Support for HTML web page scraping

XBird is currently optimized for read-oriented workloads. It passes about 91% of the minimal conformance tests of the XQuery Test Suite.

Xidel

Pascal

Xidel is a command line tool to download and extract data from HTML/XML pages as well as JSON APIs.

xq2xml

Java and XSLT

This distribution consists of a set of XSLT2 stylesheets to manipulate XQuery expressions and express the query using other syntaxes. Currently stylesheets converting to XQueryX and XSLT are supplied. Also supplied are stylesheets to transform XQuery expressions: firstly an identity transform, and secondly a stylesheet to remove any use of axes not supported in systems that do not implement the full axis feature of XQuery.

As XSLT requires XML input, or at least an XML view of non-XML input, and XQuery does not use an XML syntax, the stylesheet distribution is augmented by a Java based parser that parses an XQuery expression (or in its most general form, a series of XQuery modules) and returns an XML document. This parser is a trivial (10 or so line) wrapper around the Java class provided by Scott Boag on behalf of the XQuery/XSLT working groups as part of the test parser applet distribution. (Scott has kindly included this functionality now in the parser distribution, so a separate wrapper is no longer needed).

The XQuery working group also provides an XQuery test suite, and this distribution contains (for xq2xqx) a set of test files converted to XQueryX syntax and (for xq2xsl) a set of test files in XSLT syntax, and a test report in the official test report syntax for xq2xsl once coupled to an XSLT2 execution system (assumed to be Saxon8, for this release) considered as a new implementation of XQuery. The stylesheets and auxiliary Java code to process the test suite are also provided.

XQEngine

Java

XQEngine is a full-text search engine for XML documents. Utilizing XQuery as its front-end query language, it lets you interrogate collections of XML documents for boolean combinations of keywords, much as Google and other search engines let you do for HTML. XQuery, however, provides much more powerful search capabilities than equivalent HTML-based engines, since its XPath component lets you specify constraints on attributes and element hierarchies, in addition to the specific word content you’re searching on. Refer to the W3C’s XML Query website to see what the W3C and other vendors are doing with XQuery and XPath.

XQEngine is a compact (roughly 300K) embeddable component written in Java. It’s not a standalone application and requires a reasonable amount of Java programming skill to use. It has a straightforward programming interface that makes that fairly easy to do. It should work well as a personal productivity tool on a single desktop, as part of a CD-based application, or on a server with low to moderate traffic.
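The combination described above, keyword search constrained by element structure, can be sketched in a few lines of Python. This is only an illustration of the general technique, not XQEngine's index format or API; every name below (`build_index`, `search_all`, the sample documents) is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(docs):
    """Map (element tag, lowercase word) -> set of document names."""
    index = defaultdict(set)
    for name, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            # Only the text directly inside the element (before any child).
            for word in re.findall(r"\w+", (elem.text or "").lower()):
                index[(elem.tag, word)].add(name)
    return index

def search_all(index, tag, words):
    """Documents in which every word occurs inside some <tag> element."""
    hits = [index.get((tag, w.lower()), set()) for w in words]
    return set.intersection(*hits) if hits else set()

# Hypothetical sample collection.
DOCS = {
    "a.xml": "<doc><title>XML search engine</title><body>fast</body></doc>",
    "b.xml": "<doc><title>Search tips</title><body>XML engine</body></doc>",
}
INDEX = build_index(DOCS)
title_hits = search_all(INDEX, "title", ["xml", "search"])
```

A plain keyword engine would match both documents for "xml" and "search"; restricting the match to `title` elements narrows it to the first one.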

XQP

Java

XQP is a dynamic and scalable architecture for indexing and querying a large distributed repository of schema-less XML data, implemented on top of a structured peer-to-peer (P2P) system (Pastry). Unlike other approaches, XQP can process most forms of XQuery extended with full-text search, even those queries that search for multiple documents that are related through join conditions. The indexing is based on both the text terms and the structural summary of a document. Given an XQuery, our system can find all the plausible structural summaries applicable to the query using one peer lookup only. Each structural summary is associated with a small, dynamically adapting sub-space of peers who share the inverted lists related to all the documents that conform to this particular structural summary. Peers may participate in multiple sub-spaces, while the size of each sub-space may grow and shrink dynamically, depending on the workload. To locate multiple documents that are related through join conditions, XQP uses value histograms distributed over the P2P network.
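For readers unfamiliar with the term, a structural summary is commonly taken to be the set of distinct root-to-element label paths in a document. Assuming that definition (XQP's exact one may differ), a minimal Python sketch looks like this; the function name and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def structural_summary(xml_text):
    """Return the sorted list of distinct root-to-element label paths."""
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return sorted(paths)

# Two documents with different content but the same shape
# share one summary, so they could share one index sub-space.
summary_1 = structural_summary("<a><b>x</b><b><c>y</c></b></a>")
summary_2 = structural_summary("<a><b><c>z</c></b></a>")
```

Both calls yield the paths `/a`, `/a/b`, and `/a/b/c`, which is why a summary can serve as a grouping key for documents.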

XQuare

Java

The XQuare project provides a set of Java components for extending J2EE platforms with XML-based, heterogeneous information integration capabilities, using the XQuery language. The XQuare components allow enterprise information stored in a large variety of data sources to be accessed in real-time and shared as uniform XML views, ready for further filtering, processing and publishing. Incoming XML data can also be stored directly in relational databases, without the need to transform it first into procedural API calls (such as JDBC) or Java objects.

The XQuare components are designed to be embedded into Java-based Web or application servers, and rely on the standard J2EE services for exchanging, processing and publishing XML information. By adding to those services a new configurable data integration service, able to collect in real-time business information in distributed, heterogeneous data sources, the XQuare components can help dramatically reduce development costs for applications such as enterprise portals, B2B information exchange automation, database publishing…

XQilla

C++

XQilla is an XQuery and XPath 2 library and command line utility written in C++, implemented on top of the Xerces-C library. It is made available under the terms of the Apache License v2.

Zorba

C++

Modern Query Processing

By providing a high level query language, Zorba allows you to process large amounts of data productively. For example, it supports complex SQL-like queries such as (outer-)joins, grouping, sorting, transformation, full-text, indexes, and declarative updates. Zorba brings powerful queries database users take for granted to NoSQL without giving up on scalability.

Designed for Flexible Data: JSON and XML

Our query language is not your grandma’s SQL. It supports novel concepts purposely designed for flexible, schema-less data. The language covers the entire spectrum, from the simplicity of JSON to the completeness of XML.

Advanced Data Processing Libraries

Zorba users have immediate access to advanced, out of the box, data processing libraries. Built-in functions empower queries to use advanced math processing, string matching, data cleaning, or full-text features. Beyond that, conversion (e.g. CSV, PDF, HTML, ZIP), communication (e.g. HTTP, SMTP, IMAP), and connector (e.g. SQLite, JDBC, Couchbase) libraries allow the user to productively process the Web’s data.

Pluggable Store

Zorba allows for plug-in alternative store implementations under the hood of the query processor. This enables users to seamlessly process data stored in different places: main memory, browser, disk, or cloud stores.