Join performance to the rescue

Written by Michael Kleen | 2023-07-24

Introduction

CrateDB is a SQL standard-compatible distributed database. This means each query executed on CrateDB goes through a query optimizer, which tries to find the best logical plan possible for a given query. In some situations, the query optimizer cannot generate the optimal execution plan for a given query. In cases when the performance of a query suffers from an inefficient plan, it is helpful to have more control over the query execution. In this post, we will show how the query optimizer in CrateDB can be fine-tuned for specific queries.

Problem description

Let's explore this together with a real-world example. Imagine the following tables:

Now assume the following use-case: We would like to send a special offer to our cloud customers, especially those running the big clusters. We need to know the total number of nodes each customer has in its clusters and all the necessary customer details to contact them. So we need all the customer's details from the customers table and the summary of all nodes each customer owns. Therefore we need to join customers with clusters and aggregate the number_of_nodes:

This query takes over 300 ms on my M1 Macbook with a single node setup with 150k rows for customers and 100k rows for clusters.

Investigation

Let's investigate and see if we can improve the performance. First, we will have a look into the execution plan using the EXPLAIN command:

Let’s dive into the details of the query plan to understand what is happening here. The query plan is a tree of operators that gives us insights into how this query is handled internally. The order of execution is handled from the bottom to the top. Thus we will also be looking into the operations backward:

The operators Collect[doc.customers] and Collect[doc.clusters] collect data from the tables customers and clusters.
The operator HashJoin[(customer_id = id)] performs a hash-join on the rows from the underlying collect operators with the join condition customer_id = id.
The Eval operator will pick the values [id, name, address, email, country, number_of_nodes, customer_id] from the underlying source and return them in the described order.
The operator GroupHashAggregate[id, name | sum(number_of_nodes)] aggregates the values of number_of_nodes for each row.
The operator OrderBy[sum(number_of_nodes) DESC] will apply to descend ordering on the aggregation value.
The final Eval operator will rearrange the order and naming of the result tuple.

So how can this query plan be improved? A common case where the optimizer may be wrong is the order in which tables are joined together. The optimizer usually places the smaller table on the right side of a hash-join because, in most cases, this improves performance, and exactly this happened here. However, one exception is when the data volume of the smaller table exceeds the larger table. Let's investigate further into this. We can gather more information about the two tables customers and clusters from sys.shards:

This indicates that the table customers with fewer rows has a larger data volume than clusters, and this means the chosen join-order by the optimizer is not ideal for a hash-join. There are statistics available in pg_catalog.pg_stats for more details:

Now we can see exactly how the data volume is distributed over the columns of the tables. So for these problematic cases, there is the option to disable join-reordering and rely exactly on the order of the tables as described in the query. The join-reordering for the different join operators CrateDB supports can be disabled using the following session settings:

Please be aware that these settings are experimental and may change in the future. We have in our query only a hash-join operator. Therefore, it is sufficient to disable the join-reordering only for those. Let’s give this a try and check the new execution plan:

As expected, the order of the Collect operators remains now exactly as described in the original query. To evaluate the performance of these two variants properly, we should run a benchmark. We can use the cr8 CrateDB command line tools for that purpose. The run-spec functionality of cr8 runs crate-benchmarks and helps to evaluate the results:

This looks good! Execution time for this query improved on average from 324ms to 180ms. So, preventing the optimizer from re-arranging the order of the joined tables improved the performance by 80%.

Summary

In this post, we covered how the performance of a query containing a join and an aggregation can be improved by fixating the join order. We usually recommend relying on the optimizer, but in certain cases, it’s worth investigating further. We also currently improving our query optimizer, moving to a cost-based approach to make the query optimization better.

Are you interested in CrateDB? Have a look at the Documentation. If you have questions, check out our CrateDB Community. We're happy to help!

Try CrateDB Cloud for free

View full post