CrateDB v5.4: SQL enhancements, performance and operational improvements

CrateDB v5.4 is now ready to use!

Our team just released CrateDB v5.4! This release contains many SQL enhancements, including new scalars and aggregations as well as type system and PostgreSQL compatibility improvements. Additionally, we added new functionality to help you with administrative and operational tasks, such as better control over query timeouts, join optimizations, and improved COPY FROM robustness against temporary network errors.

Better control of long-running queries

This version adds a statement_timeout setting, available both as a session setting and as a cluster setting, that defines a timeout for queries.

This is very useful to prevent long-running queries from blocking resources on a CrateDB cluster. One possible scenario is a complex query started by a third-party tool that the user does not fully control (e.g. BI tools). Another use case is a multi-tenant cluster where one wants to limit the resources each query can use, in this case its execution time.
Once the timeout is reached, the related query gets cancelled.
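As a minimal sketch, the setting could be applied as follows. The value format ('30s', '5m') and the use of SET GLOBAL PERSISTENT for the cluster-wide variant are assumptions based on how other CrateDB settings are configured; the release notes list the accepted values.

```sql
-- Session level: cancel any statement in this session that runs longer than 30 seconds.
-- The '30s' value format is an assumption; check the documentation for accepted units.
SET statement_timeout = '30s';

-- Cluster level: a default timeout for all sessions (assumed to be set like other cluster settings).
SET GLOBAL PERSISTENT statement_timeout = '5m';
```

A session-level value is a natural fit for connections opened by a BI tool, while the cluster-level value acts as a general safety net.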

Better control of storage consumption

Following a feature already available for text type columns, we added support for disabling the column store on all numeric, timestamp and timestamp with time zone data types.
Disabling the columnar store is interesting when storage cost is a main concern and aggregations on the affected columns aren't needed at all or are allowed to be slow. A column whose values are stored only in the row store will consume 30-50% less storage than one stored in both the row store and the columnar store.
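A short sketch of what this could look like, assuming the new data types use the same STORAGE WITH (columnstore = false) column option that already exists for text columns. The table and column names are hypothetical.

```sql
-- Hypothetical table; all names are illustrative only.
CREATE TABLE sensor_readings (
    sensor_id TEXT,
    ts TIMESTAMP WITH TIME ZONE,
    -- Fetched per row but never aggregated, so the column store is skipped.
    raw_value DOUBLE PRECISION STORAGE WITH (columnstore = false),
    received_at TIMESTAMP WITH TIME ZONE STORAGE WITH (columnstore = false)
);
```

Columns that are regularly aggregated or sorted on (here ts) keep the default, so only rarely analyzed columns trade query speed for storage savings.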

Better control on how JOIN statements are executed

A JOIN always consists of a pair of relations. Which relation ends up on the left side and which on the right is initially dictated by the submitted SQL statement. Logically, the left and right relations can be swapped, which may result in improved performance. For example, a NestedLoopJoin (the default, but also an expensive join algorithm) combines every row of one side with every row of the other. The implementation iterates over each row of the right side for each row of the left side, so the right-side iteration is repeated. Using the smaller table for the repeated iterations improves performance in many cases, especially when the rows cannot be held in memory and must be collected from disk on each iteration.
However, it turns out that this does not hold true in all cases. Until we have a smarter way to catch these cases, e.g. cost-based join-ordering optimization, the user can now turn these optimizations off.

SET optimizer_reorder_hash_join = false

SET optimizer_reorder_nested_loop_join = false
Be aware that these settings are declared as experimental and, as such, may change or even disappear in future releases.
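To see the effect on a concrete query, one option is to compare the plan with and without reordering. The sketch below uses hypothetical tables orders and customers, and it assumes the setting defaults to true; only the setting name itself comes from this release.

```sql
-- Hypothetical tables "orders" and "customers"; names are illustrative only.
SET optimizer_reorder_hash_join = false;

-- With reordering disabled, the join sides should follow the order written in the statement.
EXPLAIN
SELECT c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Re-enable the optimization for the rest of the session (assuming true is the default).
SET optimizer_reorder_hash_join = true;
```

Running the same EXPLAIN with both values shows whether the optimizer would have swapped the two sides.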

Better support of the INTERVAL data type

Using intervals is a very handy and common feature, especially when working with time-series data sets. Some useful scalar functions, such as the age function, even return a value of type INTERVAL.

As such, support for comparing intervals, ordering by intervals, and using intervals as arguments to aggregation functions is crucial for various use cases.
With CrateDB 5.4, we addressed this request to let you work on time-series data the way you'd expect.
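The following sketch illustrates the three capabilities on a hypothetical trips table with started_at and ended_at timestamp columns. It assumes that max is among the aggregations that accept intervals; the release notes give the exact list.

```sql
-- Hypothetical table "trips" with timestamp columns "started_at" and "ended_at".
SELECT vehicle_id,
       age(ended_at, started_at) AS duration
FROM trips
WHERE age(ended_at, started_at) > INTERVAL '1' HOUR  -- compare two intervals
ORDER BY duration DESC;                              -- order by an interval

-- Use an interval as the argument of an aggregation (max is assumed to be supported).
SELECT vehicle_id,
       max(age(ended_at, started_at)) AS longest_trip
FROM trips
GROUP BY vehicle_id;
```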

There is much more in CrateDB 5.4, including new scalars, aggregations and PostgreSQL compatibility improvements, as well as some breaking changes.

Need more information? Check all the details in the release notes.