Data querying is a critical component of most applications. With the advent of rich, client-driven Ajax applications and document-oriented databases, new querying techniques are needed. Resource Query Language (RQL) defines a very simple but extensible query language specifically designed to work within URIs and to query collections of resources. The NoSQL movement is opening the way for a more modular approach to databases, separating modeling, validation, and querying concerns from storage concerns, but we need new querying approaches to match this more modern architectural design.

RQL

RQL simply consists of named operators that take a set of arguments. Syntactically, it follows the standard parenthesis-based call syntax used by most modern languages (JavaScript, Java, C, PHP, Python, etc.), with operators in prefix notation. Everything in RQL can be expressed as a set of these operators by nesting and sequencing them, and the syntax does not need to change when new operators are added. The only additional syntax in RQL is a set of comparison forms that are simply sugar, or shorthand, for basic RQL operators, and thus do not introduce any parsing or execution complexity. For example, to express a query for all the resources where the rating equals 5 and the price is less than 10:

and(eq(rating,5),lt(price,10))
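To illustrate how little machinery this call syntax needs, here is a minimal sketch of a recursive parser that turns such a query into a nested structure. The `parseCall` function and the `{name, args}` shape are assumptions made for this illustration; they are not the API of the actual RQL parser, and the shorthand comparison syntax is not handled.

```javascript
// Illustrative sketch only: parseCall and the {name, args} node shape are
// hypothetical and are not taken from the RQL library.
function parseCall(text) {
  let pos = 0;

  function parseTerm() {
    const start = pos;
    // Read characters up to the next delimiter: "(", ")" or ","
    while (pos < text.length && !"(),".includes(text[pos])) pos++;
    const token = decodeURIComponent(text.slice(start, pos));
    if (text[pos] !== "(") {
      // A plain value: numeric strings become numbers, anything else stays a string
      return token !== "" && !isNaN(token) ? Number(token) : token;
    }
    pos++; // skip "("
    const args = [];
    if (text[pos] === ")") {
      pos++; // operator with no arguments
      return { name: token, args };
    }
    while (true) {
      args.push(parseTerm());
      const ch = text[pos++];
      if (ch === ")") return { name: token, args };
      if (ch !== ",") {
        throw new SyntaxError("Expected ',' or ')' at position " + (pos - 1));
      }
    }
  }

  const result = parseTerm();
  if (pos !== text.length) {
    throw new SyntaxError("Unexpected input at position " + pos);
  }
  return result;
}

const tree = parseCall("and(eq(rating,5),lt(price,10))");
console.log(tree.name);         // "and"
console.log(tree.args[0].args); // [ 'rating', 5 ]
```

Because every operator shares the same shape, the parser never needs to know which operator names exist.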

RQL is specifically designed for the web, using URI-appropriate delimiters and URI character encoding for all text/string values. It provides a shorthand, or sugar, for certain operators that makes RQL a completely compatible superset of the default form encoding, application/x-www-form-urlencoded, as well as Feed Item Query Language (FIQL). The example above could alternatively be written in RQL as:

rating=5&price=lt=10

This syntax makes it easy to use RQL with existing forms and query processors. The documentation in the RQL git repository contains many more details about the defined operators and syntax. Let’s look at more of the motivation for RQL.

What’s wrong with SQL?

SQL has long been the query language of choice for applications, but a couple of factors have motivated alternative query languages. First, the NoSQL movement continues to gain ground, eroding the notion of using SQL as the basis for all database operations. Second, the explosion in Ajax applications has moved significant application logic to the browser, increasing the need for capable queries to be delivered from the browser to the server within URLs. SQL predates URLs by some 20 years, and does not fit well with them. SQL is not designed for document, graph, or object-oriented databases. It is also not merely awkward within URLs: passing query strings directly through to SQL is a serious security hole (known as SQL injection). SQL is particularly hazardous since retrieval and modifying operations can be combined in the same query. SQL’s syntax is also completely unlike that of modern programming languages. Trying to shore up SQL for URIs and modern databases with an SQL-derivative query language is an exercise in futility.

Again, RQL is designed to be URI friendly, leveraging URI encoding and designated delimiters for syntax that works perfectly in web requests. RQL also borrows from the familiar parenthesis-based call syntax of JavaScript, Python, C, C++, Java, PHP, etc. to give a highly composable and extensible query language.

What’s wrong with Map (and reduce) functions and key-value stores?

Using map and reduce functions for generating indexes is awesome! CouchDB’s use of mapping and reducing functions as the basis for highly scalable, incredibly flexible (Turing complete) indexed queries is brilliant, and is one of the main reasons for CouchDB’s enormous popularity. Map/Reduce functions are also an integral part of many other NoSQL databases like Riak, Hadoop, MongoDB, and more. Map/Reduce functions are somewhat low-level, however, and there are a lot of aspects to getting appropriate data out of databases besides just creating flexible indexes. Map/Reduce creates a great foundation to build on, but let’s look at some of the other useful tools for database querying. For many of these, SQL actually has constructs that we can learn from. It is quite beneficial to utilize a query language layer on top of a Map/Reduce layer.

SQL is a domain-specific language (DSL). DSLs are powerful for improving our productivity by providing a syntax that is especially well suited to the task at hand. We have seen the power of DSLs in other areas of programming. For example, CSS selector querying has revolutionized how we retrieve DOM nodes in the browser. This functionality is the core of most modern JavaScript libraries. Of course, it is still possible to retrieve nodes through purely programmatic APIs, but CSS selector querying makes life much easier. A query language plays the same role. It is possible to query a database through programmatic means, and, as I discussed in NoSQL architecture, doing so is very important for maximizing control, modularity, and efficiency. This does not negate the remarkable benefit of also being able to use a query language on top of a database.

RQL continues the tradition of providing a language specifically designed for the needs of querying data, but does so with a much simpler, easy-to-use syntax. RQL is also highly extensible, making it extremely easy to compose custom map-function-based indexes/views with other querying mechanisms. For example, we could have RQL translate queries to retrieve data from simple single-property indexes and from complex indexes:

price<1000&customProductEvaluationFunction(4)

SQL does a great job of handling massive permutations of queries and finding the most efficient usage of a fixed set of indexes to find results. With map-reduce alone, you create an index or view for every different type of query. With queries that can take many different forms, it is often infeasible to write a mapping function for every conceivable permutation. This approach simply doesn’t scale: developers often can’t create a large number of views, and it wouldn’t be efficient to keep a large number of views/indexes up to date. With SQL, the query execution engine can take queries in many forms, including multiple parameters, various constraining columns, and more, and find the most efficient execution path utilizing existing indexes. Query engine implementations have the freedom to make appropriate optimizations because the query language is decoupled from the indexes.

Unfettered query permutations are not without hazards. One of the advantages of key-value stores and mapping functions is the guarantee of O(log n) queries. SQL tends to make it far too easy to write extremely expensive queries that may not appear problematic until a database grows large enough to expose them. Because of this, RQL is designed to be complemented by an RQL templating form. This is essentially an application of URI templating to RQL, and allows one to define the set of acceptable RQL queries (without having to write out each individual form). See the RQL templates section below for more on this.

A semantically well-defined query language also serves to make querying more transparent. Interaction transparency is a key concept of REST, and allows intermediaries and components to participate in a meaningful way that can’t be achieved with opaque queries. Frameworks can provide client-side querying of cached or replicated data, proxies can understand queries, and queries can be generated by reusable code that can be used across many applications.
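As a rough illustration of what this transparency buys a client, the sketch below evaluates a query against locally cached data without contacting the server. The operator table, the `matches` and `filterCached` functions, and the `{name, args}` representation of a query are all assumptions made for this example; they are not part of any RQL implementation.

```javascript
// Illustrative sketch only: this operator table and the {name, args} query
// representation are hypothetical, not the RQL library's internals.
const operators = {
  and: (item, ...conditions) => conditions.every((c) => matches(item, c)),
  or: (item, ...conditions) => conditions.some((c) => matches(item, c)),
  eq: (item, property, value) => item[property] === value,
  lt: (item, property, value) => item[property] < value,
  gt: (item, property, value) => item[property] > value,
};

// Apply one query node to one item by looking up its operator by name
function matches(item, node) {
  const operator = operators[node.name];
  if (!operator) throw new Error("Unknown operator: " + node.name);
  return operator(item, ...node.args);
}

// Filter an already-cached array instead of asking the server
function filterCached(items, node) {
  return items.filter((item) => matches(item, node));
}

// Equivalent to and(eq(rating,5),lt(price,10)), written out by hand
const query = {
  name: "and",
  args: [
    { name: "eq", args: ["rating", 5] },
    { name: "lt", args: ["price", 10] },
  ],
};

const cachedProducts = [
  { price: 14.99, rating: 5 },
  { price: 5.99, rating: 3 },
  { price: 7.5, rating: 5 },
];

console.log(filterCached(cachedProducts, query)); // [ { price: 7.5, rating: 5 } ]
```

The same well-defined operators could just as easily be interpreted by a proxy or translated into another store's native query format.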

What’s wrong with JSONQuery?

With the increased popularity of JSON-based data representations, we have sought to provide a convenient syntax for querying JSON data by extending and improving JSONPath. This syntax is called JSONQuery. The JSONPath syntax that JSONQuery inherits is still not well-aligned with URL structures and is very difficult to extend due to the fact that each operator is based on a different syntax. Creating new operators thus requires modifications to the parsing engine.

RQL Templates

RQL templates provide a means for defining a query, or a constrained set of queries, and the variables that may be substituted into the query. When queries can be made from the web, we must deal with the challenge of untrusted users, and unrestricted querying capabilities typically make a server highly vulnerable to overload and resource exhaustion. With RQL templates, we can specify an RQL template (or set of templates) that we know can be efficiently processed by the server (typically in O(log n) time). This is also one of the benefits of map-reduce functions, but RQL templates provide more flexibility, still allowing various permutations with adjustable constraints.

One of the key concepts of the Map/Reduce approach is the emphasis on using information about expected query forms up front to generate customized indexes. Most of the work then only needs to be performed once per data change, rather than doing large amounts of work for each query (and most applications read and query data much more often than they write or change it). RQL templates not only serve to constrain queries; RQL in template form can also conveniently inform the creation of views/indexes and their map functions. One can auto-generate map functions and indexes based on RQL templates, fully leveraging the DSL approach. Here is an example of a template that would indicate that, based on the available queries, the price and rating properties should be indexed:

{&price,rating}

Because RQL is URI-based, templates can naturally be written using the standard URI templating syntax. For example, a simple RQL template might look like:

{&price,rating}&limit({count})

This indicates acceptable properties to search on (price and rating) and that the query must include a limit on the number of items returned (with the “count” variable).
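To make the constraint concrete, here is a minimal sketch of how a server might check an incoming query against what that template permits. The `isAllowedQuery` function is hypothetical and hard-codes the rules by hand rather than interpreting the template text; it also only understands the `name=value` and `name=op=value` shorthand, not the full RQL syntax.

```javascript
// Illustrative sketch only: isAllowedQuery is a hypothetical helper that
// hand-codes the rules of {&price,rating}&limit({count}); it does not
// interpret the template itself.
function isAllowedQuery(query, allowedProperties) {
  let hasLimit = false;
  for (const term of query.split("&")) {
    // Accept limit(count) or limit(count,start) with numeric arguments
    if (/^limit\(\d+(,\d+)?\)$/.test(term)) {
      hasLimit = true;
      continue;
    }
    // Every other term must be a comparison on an allowed property
    const property = decodeURIComponent(term.split("=")[0]);
    if (!term.includes("=") || !allowedProperties.includes(property)) {
      return false;
    }
  }
  // The template requires a limit, so a query without one is rejected
  return hasLimit;
}

const allowed = ["price", "rating"];
console.log(isAllowedQuery("rating=5&price=lt=10&limit(10)", allowed)); // true
console.log(isAllowedQuery("price=lt=10", allowed));                    // false
console.log(isAllowedQuery("name=widget&limit(10)", allowed));          // false
```

A query that passes this check only touches properties the server has chosen to support and always bounds the size of the result.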

While standard URI templating provides a good foundation for templating, RQL templates have some additional forms for greater expressibility. We can also use square brackets to indicate a fragment of a query that may or may not exist, or may occur a variable number of times. For example, we could indicate support for sorting, by allowing an optional sort operation:

{&price,rating}&limit({count})[&sort([+,-][price,rating])]?

With RQL templates, we can give a “menu” of possible queries that the server supports. This follows the fundamental hyperlink principle of REST, providing self-descriptive navigation to the user agent. Clients and users can easily discover what queries can be made against a server and the appropriate format.

RQL templates can also be used like parameterized prepared statements in SQL, where values are provided separately from the query. This can simplify query generation by automating the value encoding process, which can be a source of vulnerability if done improperly (this is why SQL injection is such a frequent vulnerability in web applications).
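A minimal sketch of this prepared-statement style of use follows. The `fillTemplate` function and the `status={status}` template are assumptions for illustration, and only simple `{name}` variables are handled, not the other URI template expression forms. Note that `encodeURIComponent` leaves parentheses unescaped, so the sketch escapes them explicitly, since they are delimiters in RQL.

```javascript
// Illustrative sketch only: fillTemplate is a hypothetical helper and only
// supports simple {name} variables.
function encodeValue(value) {
  // encodeURIComponent does not escape ! ' ( ) *, but parentheses are RQL
  // delimiters, so percent-encode those characters as well
  return encodeURIComponent(String(value)).replace(/[!'()*]/g, function (ch) {
    return "%" + ch.charCodeAt(0).toString(16).toUpperCase();
  });
}

function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, function (match, name) {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error("Missing value for template variable: " + name);
    }
    return encodeValue(values[name]);
  });
}

const filled = fillTemplate("status={status}&limit({count})", {
  status: "in review (draft)",
  count: 10,
});
console.log(filled); // status=in%20review%20%28draft%29&limit(10)
```

Because the values are always encoded on their way in, a value cannot close a parenthesis or start a new operator in the resulting query.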

Finally, a set of RQL templates can also be used to generate appropriate indexes. Indexed properties can be selected or map (and reduce) functions can be determined from templates.
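As a rough sketch of the simplest case, selecting indexed properties, the function below collects the property names listed in `{&...}` expressions across a set of templates. The `indexedProperties` name is an assumption for this example, and it deliberately ignores every other template form, such as `{count}` or the square-bracket fragments.

```javascript
// Illustrative sketch only: indexedProperties is a hypothetical helper that
// looks only at {&name,name} expressions and ignores all other template syntax.
function indexedProperties(templates) {
  const properties = new Set();
  for (const template of templates) {
    for (const match of template.matchAll(/\{&([^}]*)\}/g)) {
      for (const name of match[1].split(",")) {
        if (name) properties.add(name);
      }
    }
  }
  // Sorted so the result does not depend on template order
  return Array.from(properties).sort();
}

const toIndex = indexedProperties([
  "{&price,rating}&limit({count})",
  "{&price,category}",
]);
console.log(toIndex); // [ 'category', 'price', 'rating' ]
```

Each name in the result is a candidate for a single-property index; deriving full map and reduce functions would need considerably more than this.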

RQL templates are still a relatively new mechanism with RQL, and we will explore implementation possibilities in a later post.

Implementation

A JavaScript implementation of RQL is available on the RQL GitHub project. This version runs as a CommonJS module with async support on NodeJS, Dojo 1.6, RequireJS, Narwhal, and any other CommonJS platform (it is pure JavaScript), and is integrated into Persevere 2.0.

The most basic way to use the JavaScript implementation is to query a JavaScript array. To do this we prepare our query, and execute:

var query = require("rql/js-array").query;
// some sample data:
var products = [
  {price:14.99, rating: 5},
  {price:5.99, rating: 3}];

var underTen = query("price<10");
var productsUnderTen = underTen(products);
productsUnderTen.length; // -> 1

Using RQL with Persevere

RQL is the core query language for Persevere 2.0. Persevere runs on NodeJS and provides a JSON-oriented HTTP/REST interface to various data stores. This interface includes support for RQL. With Persevere, you can create a new data store either with the included data explorer or programmatically. The Persevere example wiki includes a Page model as an example store. We can easily query the store with RQL-based URLs. For example, to find the first 10 pages that have a status of “published”:

GET /Page/?status=published&limit(10)

In this example, we sort the pages by the author, and list the author and status, limiting to 10 items, starting at an offset of 10:

GET /Page/?sort(createdBy)&select(createdBy,status)&limit(10,10)

The composability of RQL gives Persevere powerful web-based querying of data stores. Persevere uses the RQL parser module to parse the queries and deliver them to the underlying data store (converting them to MongoDB queries, SQL queries, etc.).

The RQL implementation in Persevere also lets us generate queries using JavaScript chaining, allowing us to create queries in JavaScript with a similar look to URL-based queries. The previous two examples can be done in JavaScript. First we can filter and limit:

var Page = require("model/page").Page;
Page.query().eq("status","published").limit(10).forEach(function(page){
  // this is called for each item returned from the query
});

And we can sort, select, and limit:

var Page = require("model/page").Page;
Page.query().sort("createdBy").select("createdBy","status").limit(10,10).forEach(function(page){
  // process each item
});

The query is sent to the underlying data source (which can be in-memory, MongoDB, Redis, etc.) once an array method is called. Array methods include forEach, map, filter, and other iterative array methods.
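To show how chained calls can mirror the URL form, here is a minimal sketch of a chainable builder that only accumulates operators and serializes them on demand. The `QueryBuilder` class is hypothetical and is not Persevere's implementation; in particular, it produces a query string rather than sending anything to a data source.

```javascript
// Illustrative sketch only: QueryBuilder is a hypothetical class, not the
// chaining API shipped with RQL or Persevere.
class QueryBuilder {
  constructor(terms = []) {
    this.terms = terms;
  }
  // Serialize the accumulated operators into URL-style RQL
  toString() {
    return this.terms.join("&");
  }
}

function encodeArgument(value) {
  return encodeURIComponent(String(value));
}

// Each operator method returns a new builder, so chains never mutate a query
for (const name of ["eq", "lt", "sort", "select", "limit"]) {
  QueryBuilder.prototype[name] = function (...args) {
    const call = name + "(" + args.map(encodeArgument).join(",") + ")";
    return new QueryBuilder(this.terms.concat(call));
  };
}

const published = new QueryBuilder().eq("status", "published").limit(10);
console.log(published.toString()); // eq(status,published)&limit(10)

const listing = new QueryBuilder()
  .sort("createdBy")
  .select("createdBy", "status")
  .limit(10, 10);
console.log(listing.toString()); // sort(createdBy)&select(createdBy,status)&limit(10,10)
```

Nothing happens until the builder is serialized, which is the same idea as deferring execution until an array method is called.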

Using RQL with Dojo

We can also use RQL from Dojo. Download the RQL source files into your JavaScript directory and then you can make queries with RQL. A particularly powerful use case for RQL is as a query engine for Dojo object stores (the new store API introduced in 1.6). Replacing the query engine of an object store is as simple as setting the queryEngine to the RQL query executor:

define("my-module", ["dojo/store/Memory", "rql/js-array"],
function(Memory, jsArray){
  var memoryStore = new Memory({data:myData});
  memoryStore.queryEngine = jsArray.query;
  memoryStore.query("price<10&sort(rating)").forEach(function(product){
    // handle each product
  });
});

We can also use the JavaScript chaining API in Dojo through the rql/query module.

Adding Operators

RQL is designed to be extensible. The JavaScript implementation makes it easy to add new operators. We can simply add a new operator to the operators object exported from rql/js-array. Let’s imagine we want to add an operator that finds all products that are on sale for more than a given percentage off:

define("my-module", ["rql/js-array"],
function(jsArray){
  jsArray.operators.saleAt = function(percent){
    var result = [];
    for(var i = 0, length = this.length; i < length; i++){
      var item = this[i];
      // the discount percentage is measured relative to the regular price
      if((item.regularPrice - item.salePrice) / item.regularPrice * 100 > percent){
        result.push(item);
      }
    }
    return result;
  };
  var productsOnSaleForMoreThan20PercentOff = jsArray.query("saleAt(20)")(products);
});

Future implementation work will include a tool for generating map and reduce functions based on RQL queries, and templating tools.

RQL: A Modern Query Language

RQL is designed for modern application development. It is built for the web, ready for NoSQL, and highly extensible with simple syntax. This is a query language for next generation database interaction.