Query Optimization 101¶

Essential principles for optimizing queries in CrateDB…

…while avoiding the most common pitfalls. The patterns are relevant to both the troubleshooting of slow queries and the proactive tuning of CrateDB deployments, and they show how small adjustments to filters, data transformations, and schemas can yield dramatic improvements in execution speed and resource utilization.

Early Filtering and Data Reduction¶

Minimize processed data early in queries to reduce overhead.

Do all filtering as soon as possible¶

Sometimes it may be tempting to define some VIEWs, some CTEs, do some JOINs, and only filter results at the end, but in this context the optimizer may lose track of how the fields we are filtering on relate to the indexes on the actual tables.

Whenever there is an opportunity to filter data immediately next to the FROM clause, try to narrow down results as early as possible.

See using common table expressions to speed up queries for an example.

Avoid `SELECT *`¶

CrateDB is a columnar database. The fewer columns you specify in a SELECT clause, the less data CrateDB needs to read from disk.

-- Avoid selecting all columns
SELECT *
FROM customers;

-- Instead, select explicitly the subset of columns you need
SELECT customerid, country
FROM customers;

Avoid large result sets¶

Be aware of the number of rows you are returning in a SELECT query. Analytical databases, such as CrateDB, excel at processing large data sets and returning small to medium-sized result sets. Serializing, transporting them over the network, and deserializing large result sets is expensive. When dealing with large result sets in the range of several hundred thousand records, consider whether your application needs the whole result set at once. Use cursors or LIMIT/OFFSET to fetch data in batches.

See also Fetching large result sets from CrateDB for examples.

Propagate LIMIT clauses when applicable¶

Similarly to the above, we may have for instance a LIMIT 10 at the end of the query and to get there it may have been sufficient to only pull 10 records (or some other number of records) at an earlier stage from some given table. If that is the case duplicate or move (depending on the specific query) the LIMIT clause to the relevant place.

In some cases, we may not know how many rows we need in the intermediate working sets but we know that there will be 10 records on the last day. Doing filtering early will help the optimizer and can protect the database from accidentally processing years of data. By not filtering early, the load on your cluster will increase tremendously.

So for instance instead of:

SELECT
  factory_metadata.factory_name,
  device_data.device_name,
  device_data.reading_value
FROM device_data
INNER JOIN factory_metadata ON device_data.factory_id = factory_metadata.factory_id
WHERE reading_time BETWEEN '2024-01-01' AND '2025-01-01'
LIMIT 10;

do:

WITH filtered_device_data AS (
   SELECT
         device_data.factory_id,
         device_data.device_name,
         device_data.reading_value
   FROM device_data
   WHERE
        /*
          We are sure one month of data is sufficient to find
          10 results and it may help with partition pruning
        */
         reading_time BETWEEN '2024-12-01' AND '2025-01-01'
   LIMIT 10
)
SELECT
  factory_metadata.factory_name,
  filtered_device_data.device_name,
  filtered_device_data.reading_value
FROM filtered_device_data
INNER JOIN factory_metadata ON filtered_device_data.factory_id = factory_metadata.factory_id;

Use filters with array expressions when filtering on the output of UNNEST¶

On denormalized data sets, you may observe records including columns storing arrays of objects.

You may want to unnest the array in a subquery or CTE and later filter on a property of the OBJECTs.

The next statement (in versions of CrateDB < 6.0.0) will result in every row in the table (not filtered with other conditions) being read and unnested, to check if it meets the criteria on field1.

SELECT *
FROM (
   SELECT UNNEST(my_array_of_objects) obj
   FROM my_table
)
WHERE obj['field1'] = 1;

However, CrateDB can do a lot better than this if we add an additional condition like this:

SELECT *
FROM (
   SELECT UNNEST(my_array_of_objects) obj
   FROM my_table
   WHERE 1 = ANY(my_array_of_objects['field1'])
) AS subquery
WHERE obj['field1'] = 1;

CrateDB leverages indexes to only unnest the relevant records from my_table which can make a huge difference.

Efficient Query Structure and Constructs¶

Optimize SQL logic by prioritizing efficient syntax and avoid redundant operations.

Only sort data when needed¶

Indexing in CrateDB is optimized to support filtering and aggregations without requiring expensive defragmentation operations, but it is not optimized for sorting.

Maintaining a sorted index would slow down ingestion, that is why other analytical database systems like Cassandra and Redshift make similar trade-offs.

This means that when an ORDER BY operation is requested, the whole dataset needs to be loaded into the main memory on the relevant cluster node to be sorted. For this reason, it is important to not request ORDER BY operations when not actually needed, and most importantly, not on tables of large cardinalities without aggregating records beforehand. On the other hand, of course it is no problem to sort a few thousand rows in the final stage of a SELECT operation, but we need to avoid requesting sort operations over millions of rows.

Consider leveraging filters and aggregations like max_by and min_by to limit the scope of ORDER BY operations, or avoid them altogether when possible.

So for instance instead of:

SELECT reading_time, reading_value
FROM device_data
WHERE reading_time BETWEEN '2024-01-01' AND '2025-01-01'
ORDER BY reading_time DESC
LIMIT 10;

use:

SELECT reading_time, reading_value
FROM device_data
WHERE reading_time BETWEEN '2024-12-20' AND '2025-01-01'
ORDER BY reading_time DESC
LIMIT 10;

Format output as a last step¶

In many cases, data may be stored in an efficient format, but we want to transform it to make it more human-readable in the output of the query. To accomodate such situations, we may use scalar functions such as date_format or timezone.

Sometimes queries apply these transformations in an intermediate step and later do further operations like filtering on the transformed values.

CrateDB’s query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. Based on the query scenario/situation, it is always aiming to use existing indexes on the original data for maximum efficiency.

However, there is always a chance that some particular clause in the query expression prevents the optimizer from selecting an optimal plan, ending up applying the transformation on thousands or millions of records that later would be discarded anyway. So, whenever it makes sense, we want to ensure these transformations are only applied after the database has already worked out the final result set to be sent back to the client.

So instead of:

WITH mydata AS (
  SELECT
        DATE_FORMAT(device_data.reading_time) AS formatted_reading_time,
        device_data.reading_value
  FROM device_data
)
SELECT *
FROM mydata
WHERE formatted_reading_time LIKE '2025%';

use:

SELECT
  DATE_FORMAT(device_data.reading_time) AS formatted_reading_time,
  device_data.reading_value
FROM device_data
WHERE device_data.reading_time BETWEEN '2025-01-01' AND '2026-01-01'

Replace CASE in expressions used for filtering, JOINs, grouping, etc¶

It is not always obvious to the optimizer what we may be trying to do with a CASE expression (see for instance Shortcut CASE evaluation Issue 16022).

If you are using CASE expression for “formatting” see the previous point about formatting output as late as possible, but if you are using a CASE expression as part of a filter of other operation consider replacing it with an equivalent expression, for instance:

SELECT SUM(a) as count_greater_than_10,...
FROM (
  SELECT CASE WHEN field1 > 10 THEN 1 ELSE 0 END
        , ...
  FROM mytable
  ...
) subquery
...;

can be rewritten as

SELECT COUNT(field1) FILTER (WHERE field1 > 10) as count_greater_than_10
FROM mytable;

And

SELECT *
FROM mytable
WHERE
  CASE
        WHEN $1 = 'ALL COUNTRIES' THEN true
        WHEN $1 = mytable.country AND $2 = 'ALL CITIES' THEN true
        ELSE $1 = mytable.country AND $2 = mytable.city
  END;

can be rewritten as

SELECT *
FROM mytable
WHERE ($1 = 'ALL COUNTRIES')
OR ($1 = mytable.country AND $2 = 'ALL CITIES')
OR ($1 = mytable.country AND $2 = mytable.city)

(the exact replacement expressions of course depend on the semantics of each case)

Use groupings instead of DISTINCT¶

(Reference: Issue 13818)

Instead of

SELECT DISTINCT country FROM customers;

use

SELECT country FROM customers GROUP BY country;

and instead of

SELECT COUNT(DISTINCT a) FROM t;

use

SELECT COUNT(a)
FROM (
        SELECT a
        FROM t
        GROUP BY a
) tmp;

Use subqueries instead of GROUP BY if the groups are already known¶

Consider the following query:

SELECT customerid, SUM(order_amount) AS total
FROM customer_orders
GROUP BY customerid;

This looks simple but to execute it CrateDB needs to keep the full result set in memory for all groups.

If we already know what the groups will be we can use correlated subqueries instead:

SELECT customerid,
  (SELECT SUM(order_amount)
   FROM customer_orders
   WHERE customer_orders.customerid = customers.customerid
  ) AS total
FROM customers;

Handling Large and Complex Queries¶

Strategies for breaking down complex operations on large datasets into manageable steps.

Batch operations¶

If you need to perform lots of UPDATEs or expensive INSERTs from SELECT, consider exploring different settings for the overload protection or thread pool sizing which can be used to fine tune the performance for these operations.

Otherwise, if you only need to run it once and performance is not critical, consider using small batches instead, where the operations are done on groups of records each time.

So for instance instead of doing:

UPDATE mytable SET field1 = field1 + 1;

consider a different approach such as:

for id in {1..100}; do
        crash -c "UPDATE mytable SET field1 = field1 + 1 WHERE customer_id = $id;"
done

Paginate on filters instead of results¶

For instance instead of

SELECT deviceid, AVG(field1)
FROM device_data
GROUP BY deviceid
LIMIT 1000 OFFSET 5000;

We can do something like

WITH devices AS (
  SELECT deviceid
  FROM devices
  LIMIT 5 OFFSET 25
)
SELECT deviceid, AVG(field1)
FROM device_data
WHERE device_data.deviceid IN (SELECT devices.deviceid FROM devices)
GROUP BY deviceid;

Use staging tables for intermediate results if you are doing a lot of JOINs¶

If you have many CTEs or VIEWs with a need to JOIN them, it can be benefical to query them individually, store intermediate results into dedicated tables, and then use these tables for JOINing.

While there is a cost in writing to disk and reading data back, the whole operation can benefit from indexing and from giving the optimizer more straightforward execution plans, to enable it optimizing for better parallel execution using multiple cluster nodes.

Schema and Function Optimization¶

Schema design and function usage to streamline performance.

Consider generated columns¶

If you frequently find yourself extracting information from fields and then using this extracted data on filters or aggregations, it can be good to consider doing this operation on ingestion with a generated column . In this way the value we need for filtering and aggregations can be indexed. This involves a trade-off between storage space and query performance, evaluate the frequency and execution times of these queries with the additional storage requirements of storing the generated value.

See Using regex comparisons and other features for inspection of logs for an example.

Be mindful of UDFs, leverage them in the right contexts, but only in the right contexts¶

When using user-defined functions (UDFs), two important details relevant for performance aspects need to be considered.

Once values are processed by an UDF, the database engine will load results into memory, and will not be able to leverage indexes on the underlying fields any longer. In this spirit, please apply the relevant general considerations about delaying formatting as much as possible.
UDFs run on a JavaScript virtual machine on a single thread, so they can have an impact on performance as relevant operations cannot be parallelized.

However, some operations may be more straightforward to do in JavaScript than SQL.

Filter and Expression Optimization¶

Expressions that improve filter efficiency and processing of specific data structures.

Avoid expression negation in filters¶

Positive filter expressions can directly leverage indexing. With negative expressions, the optimizer may be able to still use indexes, but this may not always happen and the optimizer might not rewrite the query optimally. Explicitly using positive conditions removes ambiguity and ensures the most efficient path is chosen.

So instead of:

SELECT
  customerid,
  status
FROM customers_table
WHERE NOT (customerid <= 2) AND NOT (status = 'inactive');

We can rewrite this as:

SELECT
  customerid,
  status
FROM customers_table
WHERE customerid > 3 AND status = 'active';

Use the special null_or_empty function with OBJECTs and ARRAYs when relevant¶

CrateDB has a special scalar function called null_or_empty , using this in filter conditions against OBJECTs and ARRAYs is much faster than using an IS NULL clause, if allowing empty objects and arrays is acceptable.

So instead of:

SELECT ...
FROM mytable
WHERE array_column IS NULL OR array_column = [];

We can rewrite this as:

SELECT ...
FROM mytable
WHERE null_or_empty(array_column);

Performance Analysis and Execution Plans¶

Review execution plans¶

If a query is slow but still completes in a certain amount of time, we can use EXPLAIN ANALYZE to get a detailed execution plan. The main thing to watch for on these is MatchAllDocsQuery and GenericFunctionQuery. These operations are full table scans, so you may want to review if that is expected in your query (you may actually intentionally be pulling all records from a table with a list of factory sites for instance) or if this is about a filter that is not being pushed down properly.

Query Optimization 101¶

Early Filtering and Data Reduction¶

Do all filtering as soon as possible¶

Avoid SELECT *¶

Avoid large result sets¶

Propagate LIMIT clauses when applicable¶

Use filters with array expressions when filtering on the output of UNNEST¶

Efficient Query Structure and Constructs¶

Only sort data when needed¶

Format output as a last step¶

Replace CASE in expressions used for filtering, JOINs, grouping, etc¶

Use groupings instead of DISTINCT¶

Use subqueries instead of GROUP BY if the groups are already known¶

Handling Large and Complex Queries¶

Batch operations¶

Paginate on filters instead of results¶

Use staging tables for intermediate results if you are doing a lot of JOINs¶

Schema and Function Optimization¶

Consider generated columns¶

Be mindful of UDFs, leverage them in the right contexts, but only in the right contexts¶

Filter and Expression Optimization¶

Avoid expression negation in filters¶

Use the special null_or_empty function with OBJECTs and ARRAYs when relevant¶

Performance Analysis and Execution Plans¶

Review execution plans¶

Avoid `SELECT *`¶