Live Stream on Jan 23rd: Unlocking Real Time Insights in the Renewable Energy Sector with CrateDB

Register now
Skip to content
Blog

CrateDB v5.1: Correlated Sub-Queries and Cursors

CrateDB v5.1 is now ready to use!
Our team just released CrateDB v5.1! We’ve added support for correlated sub-queries and cursors and continued to improve our SQL and PostgreSQL compatibility as well as the Administration UI.


Correlated sub-queries

CrateDB now supports correlated sub-queries at SELECT statements. The following example demonstrates its usage to retrieve the average value of the speed column beside the actual speed column value.

SELECT
id,
temperature,
(SELECT avg(temperature) AS average_temp
FROM devices
WHERE id = d.id)
FROM devices d 

Correlated sub-queries can be used inside the SELECT item list or as a WHERE clause expression. This is our initial implementation using a Correlated Join plan which isn’t as optimal for some queries as it will execute the sub-query for each row. See also our related documentation.

More details on this will be elaborated in a soon-to-come technical blog post about this topic. Stay tuned!

EXISTS expression

One common use case for a correlated sub-query is its usage in conjunction with the EXISTS expression, which was one of our main motivations to include it with CrateDB v5.1 as well. For example:

SELECT
  id,
  temperature
FROM devices d
WHERE EXISTS (SELECT 1 FROM plants WHERE id = d.plant_id)

Cursors

 

With CrateDB v5.1 we finally add support for using cursors to improve compatibility with many tools and clients but also general usage when looping/paging through large results sets. Compared to using OFFSET and LIMIT, fetching from a declared cursor will not result in a re-execution of the query. As such it will ensure consistency of the result set while also improving performance.

CrateDB v5.1 does not implement scrolling support for cursors yet, especially scrolling backward. We plan to follow up here soon, probably within the next feature release. (See the related issue to track the state of the missing scrolling support).

Grouping on columns of type ARRAY

In an early version of CrateDB, our framework wasn’t capable to group by a column of the type ARRAY. Due to our continued effort in refactoring and improving our stack, this limitation was nowadays a purely artificial one. Thus we’ve removed it and do finally support grouping on columns of any supported data type.

Throttle throughput of the ANALYZE execution

The ANALYZE statement is collecting statistics of the data stored inside CrateDB in order to optimize executions plans. Even we’ve implemented it to collect samples only and such avoid processing all of the data, this can put some load on the cluster. As CrateDB does execute this collection periodically (by default every 24h) without any user interaction, this additional load on the cluster is unexpected and may confuse. Several reports by users highlighted that the current implementation isn’t that optimal. Thus we decided to add a I/O throttling mechanism to lower any impact the cluster, the actual throttle value can be controlled by the user and is by default 40MB/s.

Besides these more prominent features, CrateDB v5.1 includes a lot of smaller improvements related to SQL and PostgreSQL compatibility and our administration UI as well as some breaking changes.

Need more information? Check all the details in the release notes.