CrateDB Blog | Development, integrations, IoT, & more

CrateDB v5.8.1 Release: Improved JOIN Performance, Cluster Scaling, and More

Written by Sebastian Utz | 2024-08-05

We are happy to announce that our next feature release of CrateDB v5.8.1 is out now.

What is it about?

CrateDB v5.8.1 brings JOIN performance improvements, cluster up-scaling improvements, new scalars, and new administrative functionality.

JOIN Performance Improvement

With CrateDB v5.8.1, two JOIN performance improvements will be released.

The first one relates to constant join conditions, for example: select * t1left join t2 on t1.id = t2.id and t1.id > 1. In this case, the t1.id > 1 can be pushed down to the collect operation of table t1 because the condition contains a constant value (1). Filtering out as many rows as possible directly at the collect operation results in fewer rows needed to be processed in the join operations. Additionally, most filters can utilize an INDEX which improves performance a lot. This optimization is enabled by default but can be disabled explicitly by SET optimizer_move_constant_join_conditions_beneath_join = false.
 
The second optimization relates to the LOOKUP-JOIN implementation we’ve released with CrateDB v5.7.0. This optimization will now also work with more complex queries e.g., when sub-queries or INNER EQUI-JOINs are used. The LOOKUP-JOIN optimization is still marked as experimental and can lead to large memory consumption, therefore, we decided to disable it by default (compared to CrateDB v5.7.0, where it was enabled by default). It can be activated by SET optimizer_equi_join_to_lookup_join = true.
 

Cluster Up-Scaling Improvements

Adding nodes to a cluster (up-scaling) to improve ingestion or querying throughput only works when the related tables have equal or more shards configured on the table creation. Otherwise, new nodes won’t have any shards allocated and thus won’t help much with ingestion or query workloads.

For background information: even though these nodes don’t hold any data, they can still improve the overall performance when used as the “connecting aka. handler” nodes as they work as the job dispatching and final result merging components.

In situations where the already configured shards are not enough to scale-up, it is possible to increase the number of shards of existing tables, but only if the number_of_routing_shards where explicitly set by the user on table creation.

This is changed now, with CrateDB v5.8.1 a calculated value for number_of_routing_shards is set by default, allowing the user to increase the number of shards later on even without knowing about this setting.

This is mostly important for non-partitioned tables, as with partitioned tables, the number of shards can be changed on the table and will be used for any new partition.

New Scalars

Besides the new scalars added by the CrateDB team, we also got some great open-source contributions for new scalars:

New Administrative Functionality

CrateDB v5.8.1 exposes more network statistics over the Connections JMX MBean. On top of the previously exposed open and total connections, the number of messages and bytes sent and received per protocol is now available. Additionally, the number of total connections on the transport protocol (which is used for internal inter-node communication) is now available at the sys.nodes connections column.

CrateDB 5.8.1 brings some more smaller changes. Please check out our full release notes for more details on that, especially the list of breaking changes