The Guide for Time Series Data Projects is out.

Blog

# Lab Notes: Implementing Non-Collapsable Subselects

For a while now, CrateDB has supported subselects in the `FROM` clause, as long as the subselect can be collapsed into as a single select.

In one of our recent batches of work, my team addressed this limitation and now CrateDB supports non-collapsable subselects.

In this post, I will introduce you to the problem in a little more detail, and show you the steps we took to solve it.

## Introduction

### Collapsable Subselects

Here’s an example of a very simple query using a subselect:

``````  SELECT column_a,
column_b
FROM (
SELECT *
FROM my_table
WHERE column_a > 10
) AS subselect_a
ORDER BY column_b
``````

This is functionally the same as:

``````  SELECT column_a,
column_b
FROM my_table
WHERE column_a > 10
ORDER BY column_b
``````

That’s because, in the original query, the inner select applies the `WHERE` operator and the outer select applies the `ORDER BY` operator. And these two operators can be merged and executed in a single step.

We do not have to produce an intermediate result set from the inner select and then apply the operations of the outer select.

These are the only sorts of subselects that CrateDB was able to handle before we started our work.

### Non-Collapsable Subselects

Let’s take a look at a more complex example of a subselect:

``````  SELECT *
FROM (
SELECT *
FROM my_table
ORDER BY column_a DESC
LIMIT 10
) AS subselect_a
ORDER BY column_b
LIMIT 5
``````

Here, the inner select applies `ORDER BY` and `LIMIT` operators, and the outer select applies `ORDER BY` and `LIMIT` operators. There is no way to merge these operators and execute them in a single step. Instead, we must first produce an intermediary result set by applying the operators from the inner select to the table, and then apply the operators from the outer select to the intermediary result set.

The first select operation looks like this:

``````  SELECT *
FROM my_table
ORDER BY column_a DESC
LIMIT 10
``````

If the result set produced by this query was a table called `subselect_a`, the final select operation would look like this:

``````  SELECT *
FROM subselect_a
ORDER BY column_b
LIMIT 5
``````

It is the combination of these two queries that is needed.

In the past CrateDB did not support any sort of subqueries that required an intermediary result set, so these sorts of queries would raise an error.

Here’s another example:

``````  SELECT *
FROM (
SELECT *
FROM (
SELECT column_a,
column_b
FROM my_table
WHERE column_c > 10
) AS subselect_a
ORDER BY column_a DESC
LIMIT 10
) AS subselect_b
ORDER BY column_b
LIMIT 5
``````

This can be partially collapsed, to this:

``````  SELECT *
FROM (
SELECT column_a,
column_b
FROM my_table
WHERE column_c > 10
ORDER BY column_a DESC
LIMIT 10
) AS subselect_a
ORDER BY column_b
LIMIT 5
``````

Here, the innermost select has been collapsed. But we cannot collapse it any further, for the same reasons as before.

Adding support for non-collapsable subselects (which includes partially non-collapsable subselects) was our primary goal for this batch of work. And in the next section, I will show you how we did it.

## How We Implemented Non-Collapsable Subselects

### Changes to Query Rewriting

To support non-collapsable subselects, we had to make some changes to our `SubselectRewriter`. This is the module responsible for collapsing subselects, per the examples in the previous section.

Previously, the `SubselectRewriter` would construct a nested hierarchy of objects, modeling the nested hierarchy of selects. `QueriedSelectRelation` objects are used to model selects that operate on the results of a select operation, and `QueriedDocTable` objects are used to model selects that operate directly on a table.

Take this select:

``````  SELECT column_a,
column_b
FROM (
SELECT *
FROM my_table
WHERE column_a > 10
) AS subselect_a
ORDER BY column_b
``````

This would be modelled with the following nested object tree:

The top level object is always a `QueriedSelectRelation`, because the outermost select of a nested collection of selects is always operating on the result set of another select. And the innermost select is always operating directly on a table.

Previously, the `SubselectRewriter` would recursively collapse and rewrite this nested hierarchy of objects until only the `QueriedDocTable` object was remaining. This corresponds to a single select operating directly on a table.

The final tree would look like this:

But let’s take this select:

``````  SELECT *
FROM (
SELECT *
FROM my_table
ORDER BY column_a DESC
LIMIT 10
) AS subselect_a
ORDER BY column_b
LIMIT 5
``````

We would model this with the following nested object tree:

But we cannot collapse it any further.

The `SubselectRewriter` collapses the object tree from the inside out. Every time it encounters a `QueriedSelectRelation` above the innermost `QueriedDocTable`, it replaces both of them with a new `QueriedDocTable`.

If it was unable to complete one of replacements, it would give up, and the end result would be a partially collapsed tree, with a `QueriedDocTable` on the inside and at least one uncollapsed `QueriedSelectRelation` object wrapped around it. CrateDB was unable to handle `QueriedSelectRelation` objects, so an error would be raised.

To add support for `QueriedSelectRelation` objects, we first had to change something about the way this collapsing was being handled.

In the past, we never bothered to update the parent object’s references to the fields of the child object. Because we assumed that the only useful end result would be a single collapsed `QueriedDocTable` table. So a hierarchy of object with broken references was fine. Because this was an error condition.

But now we want to support a hierarchy of objects, every time we collapse an object pair, we need to update the parent object’s references.

To collapse this, we need to replace the references to `xx` with references to `(x+x)`:

We can collapse again by replacing `x` with `(x+x) + (x+x)`:

Implementing this object rewriting was a major chunk of the work we undertook.

Implementing this object rewriting was a major chunk of the work we undertook.

### Changes to Data Fetching

The most complex part of the implementation though has to do with optimizing the fetch operations that take place during the execution of non-collapsable subselects.

#### What Is a Fetch Operation?

Query execution fetches row data in two steps:

1. CrateDB fetches the minimum amount of data necessary for doing distributed filtering and ordering on specific columns
2. Once CrateDB knows precisely which rows need returning to the client, it requests the rest of the data needed for those rows only

Consider this query:

``````  SELECT column_a,
column_b,
column_c
FROM my_table
WHERE column_a > 10
ORDER BY column_b
LIMIT 100
``````

This is what happens when CrateDB executes this query:

1. Each `my_table` shard is asked to produce a set of `_id`, `column_a`) tuples where `column_a` is greater than 10.
2. These tuples are then sent to the query handling node, which merges them together into a single set. This is the first fetch operation.
3. The merged set of tuples is then ordered.
4. The set of tuples is then limited to 100.
5. The 100 document IDs we now have are used to request the rest of the data for each row. This is the second fetch.

This approach avoids unnecessary disk operations and unnecessary network traffic, thereby improving the performance of both overall.

#### What We Changed

To support non-collapsable subqueries, we need to modify this behaviour in a way that continues to optimise for both disk and network performance.

So, let’s take this modified example:

``````  SELECT *
FROM (
SELECT column_a,
column_b,
column_c
FROM my_table
WHERE column_a > 10
ORDER BY column_b DESC
LIMIT 10
) AS subselect_a
ORDER BY column_a DESC
LIMIT 5
``````

The inner query, if it was a standalone query, would fetch a set of (`_id`, `column_a`) tuples, and then fetch the values of `column_b` and `column_c` as the final step.

As the inner query of a subselect, this process is modified so that the outer query is notified that there’s a pending fetch operation for the values of `column_b` and `column_c`.

Because `column_b` and `column_c` are also not required for the execution of the outer query, this fetch can be postponed. Whether or not this fetch needs to be done later depends on whether the outer query's final fetch would fetch this data.

Because `column_b` and `column_c` are also not required for the execution of the outer query, but only for the final response, this fetch can be postponed. Whether the fetch needs to be done immediately or not, depends on whether the outer query operates on those columns (e.g. uses them in the `WHERE` clause, `ORDER BY` clause, and so on). In some cases, the fetch can be ignored entirely, because the fetch that the outer query produces fetches all the necessary data.

### Group By on Subselects

A `GROUP BY` on a subselect is more complex than on a table.

Consider this non-collapsable subselect:

``````  SELECT column_a,
count(*)
FROM (
SELECT column_a
FROM my_table
ORDER BY column_a
LIMIT 100
) AS subselect_a
GROUP BY column_a
HAVING count(*) > 10
ORDER BY 2
LIMIT 10
``````

Let’s take a look at how CrateDB now handles this.

First of all, the inner select is planned as if it was an independent query.

If that plan only touches the node that is handling the client request, then the `GROUP BY` of the outer select will be handled with a non-distributed grouping plan. However, if the plan for the inner select touches additional nodes, the `GROUP BY` of the outer select will be handled with a distributed grouping plan.

In the case of non-distributed grouping plan, the grouping operation takes place on the handler node.

In the case of distributed grouping plan, the grouping operation takes place on all nodes that hold the data being operated on.

In both cases, the grouping operation also involves calculating aggregate functions like `min`, `max`, `avg`, and so on.

If the grouping operation was performed across multiple nodes, the handling node fetches that data and merges it together. This includes merging the results of aggregate functions.

`HAVING`, `ORDER BY`, and `LIMIT` clauses are performed the `GROUP BY` is done (which for distributed grouping plans means after the merge is complete).

### Group By on Joins

A `GROUP BY` on a join requires similar handling.

Consider this query:

``````    SELECT table_a.column_b
table_b.column_b,
count(*)
FROM FROM table_a
INNER JOIN table_b
ON table_a.column_a = table_b.column_a
GROUP BY table_a.column_c,
table_b.column_c
HAVING count(*) > 10
ORDER BY 3
LIMIT 10
``````

To handle this, we need to rewrite the query so that the join conditions are separated from the `GROUP BY` operation and any subsequent operations.

We can do that, like so:

``````   SELECT subselect_a.ta_column_b,
subselect_a.tb_column_b,
count(*)
FROM (
SELECT table_a.column_b AS ta_column_b,
table_b.column_b AS tb_column_b,
table_a.column_c AS ta_column_c,
table_b.column_c AS tb_column_c,
FROM table_a
INNER JOIN table_b
ON table_a.column_a = table_b.column_a
) AS subselect_a
GROUP BY subselect_a.ta_column_c,
subselect_a.tb_column_c
HAVING count(*) > 10
ORDER BY 3
LIMIT 10
``````

The inner select is planned as if it was a independent select and a distributed or non-distributed plan is produced. On top of this plan, the grouping plan is created along with any subsequent operations.

This whole process functions like it does with grouping on subselects, as detailed in the previous section.

### Implement DISTINCT for All Queries

The `DISTINCT` operator is implemented by actually adding a group by operation on top of the existing query. Example:

When you use a `DISTINCT` operation, like this:

``````SELECT DISTINCT column_a,
column_b,
column_c
FROM my_table
WHERE column_a > 10
ORDER BY 2
``````

Internally this is rewritten by adding a `GROUP BY` operation, like so:

``````  SELECT column_a,
column_b,
column_c
FROM my_table
WHERE column_a > 10
ORDER BY 2
GROUP BY column_a,
column_b,
column_c
``````

Previously doing work already covered in this post, it was not possible to use `DISTINCT` with queries that already had a `GROUP BY` clause.

Why?

Consider this query:

``````SELECT DISTINCT column_a,
max(column_b)
FROM my_table
GROUP BY column_a,
column_b
``````

Internally, this would be rewritten as:

``````  SELECT column_a,
max(column_b)
FROM (
SELECT DISTINCT column_a,
max(column_b)
FROM my_table
GROUP BY column_a,
column_b
) AS subselect_a
GROUP BY column_a,
max(column_b)
``````

This is a non-collapsable subselect, and so wasn’t supported.

With the addition of support for non-collapsable subselects and grouping on subselects, `DISTINCT` operations can now be rewritten internally and are thereby now supported for all queries.

## Wrap Up

In this batch of work we:

• Implemented nested subquery class attribute rewriting, finally ;)
• Rewrote how row fetching works so that subqueries could be supported without intermediary result sets prematurely fetching data (which would degrade disk and network performance, increasing infrastructure costs)
• Modified CrateDB’s grouping logic (which includes aggregate functions) so that subselects are supported
• Added support for grouping on joins, because now they can be internally rewritten as subselects
• Added support for DISTINCT for all queries, because the queries we rewrite them to internally are now also supported

This isn’t the end of our work on subselects.

In my next lab notes post I’m going to take a look at some additional subselects improvements we’ve been working on.

Stay tuned.