Our team just released CrateDB v5.3! This time we’ve focused on performance, we’ve improved ingestion performance by up to 30% and some correlated sub-query performance by up to 10x.
Besides that, we’ve continued our ongoing effort to improve CrateDB's Standard SQL and PostgreSQL compatibility.
Ingestion performance improvements
We’ve optimized the way how to process written values: we generated JSON documents out of each new insert and also while sending updates to the replica shards, as each row will be stored inside Lucene’s document store in this format.
On the shards themself, these JSON documents were parsed and re-generated again by the underlying Elasticsearch code we used. This was identified as a performance hotspot on profiling. As we continued to “own” the Elasticsearch code in recent years (instead of using a fork), we changed the related components to only generate the final document to store once.
If using 0 replicas, the performance improvement is ~10-15%. With every replica in place, the improvement will multiply as we avoid the parse-generate loop in each replica as well. So with 1 replica in place it may improve up to 30%. We will follow up here with a more detailed lab blog post coming up soon.
Correlated sub-query performance improvements
We’ve improved the performance of queries using a correlated sub-query inside the WHERE
clause in conjunction with a non-correlated filter clause by up to 10x.
This is achieved by adding a simple optimizer rule to push down any filter to the relation it belongs to. This way, Lucene’s indexes could be used instead of applying a filter later on after pulling the data out of Lucene, resulting in a dramatic performance improvement.
An example query hitting this improvement could be like:SELECT COUNT(*) FROM uservisits u WHERE "lCode" LIKE '%-EN' AND EXISTS (SELECT 1 FROM uservisits WHERE "cCode" = u."cCode")
In this statement, the query part "lCode" LIKE '%-EN'
can be translated into a Lucene query when executing the count
aggregation reducing the actual results going into the later evaluation of the CorrelatedJoin
operator, which executes the correlated sub-query filter part.
Besides these more prominent features, CrateDB 5.3 includes some smaller improvements related to SQL and PostgreSQL compatibility, as well as some breaking changes.
Need more information? Check all the details in the release notes.