Hey folks! CrateDB 3.1 (stable) has been released and is now faster and easier to use than ever.
DOWNLOAD IT NOW, while it's hot.
The complete list of changes can be found in the release notes. In this post, I'll give you a quick tour of the highlights.
Highlights
Faster Query Performance
Performance enhancements will mainly benefit applications that use GROUP BY clauses and those that access arrays. GROUP BY in combination with aggregations is a commonly seen pattern in CrateDB use cases. Arrays have been neglected a little bit in the past and were in need of some engineering love.
Various changes to memory utilization and Lucene query execution in CrateDB contributed to performance improvements for some types of query. I have listed them below, along with the results of some basic benchmarks to give you an idea of the performance improvement in 3.1. As with any sort of benchmark, your mileage may vary...
- Accessing array elements in WHERE clauses
  Sample query: SELECT a FROM array_access WHERE a[1] = 101
  This ran 200x faster in our tests.
- Performing GROUP BY aggregations
  Sample query: SELECT "cCode", count(*) FROM uservisits GROUP BY "cCode"
  This ran 6.7x faster in our tests.
  Sample query: SELECT avg("adRevenue") FROM uservisits GROUP BY "cCode"
  This ran 2.3x faster in our tests.
  Sample query: SELECT count(*) FROM (SELECT DISTINCT x FROM t) AS t -- x is of type long
  This ran 3x faster in our tests.
- WHERE NOT x = ANY queries
  We introduced a new scalar function, ignore3vl(), which eliminates the 3-valued logic overhead of null handling if null handling is not required in your WHERE NOT x = ANY query logic, yielding potentially faster query results.
  Without ignore3vl(): SELECT count(*) FROM t WHERE NOT 20 = any(a)
  With ignore3vl(): SELECT count(*) FROM t WHERE NOT ignore3vl(20 = any(a))
  Using ignore3vl() in the query ran 3.8x faster than without it in our tests.
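To see when null handling is "not required", it helps to model what three-valued logic does to a WHERE NOT x = ANY filter. The Python sketch below is only a model of the SQL semantics, not CrateDB code: None stands in for NULL, the helper names are made up for illustration, and the ignore3vl() model assumes the function treats NULL as FALSE, as the CrateDB documentation describes it.

```python
from typing import List, Optional

Row = Optional[List[Optional[int]]]


def eq_any(value: int, array: Row) -> Optional[bool]:
    """Model of SQL `value = ANY(array)`; None plays the role of NULL."""
    if array is None:
        return None                      # NULL array -> NULL
    if any(item == value for item in array if item is not None):
        return True                      # a match wins over any NULLs
    if any(item is None for item in array):
        return None                      # no match, but a NULL element -> NULL
    return False


def sql_not(x: Optional[bool]) -> Optional[bool]:
    """Model of SQL NOT: NOT NULL is still NULL."""
    return None if x is None else not x


def ignore3vl(x: Optional[bool]) -> bool:
    """Assumed behavior: collapse NULL to FALSE, leave TRUE/FALSE alone."""
    return bool(x)


# Hypothetical contents of column `a` in table `t`.
rows: List[Row] = [[1, 2], [20, 30], [1, None], None]

# WHERE keeps a row only when the condition is TRUE (not NULL, not FALSE).
with_3vl = [r for r in rows if sql_not(eq_any(20, r)) is True]
without_3vl = [r for r in rows if sql_not(ignore3vl(eq_any(20, r))) is True]

print(with_3vl)      # [[1, 2]]
print(without_3vl)   # [[1, 2], [1, None], None]
```

In this model the two filters agree whenever the column and its elements contain no NULLs, which is the case where swapping in ignore3vl() is safe. When NULLs are present, the rewritten query can return additional rows, so it is worth checking the data before making the change.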
Broader PostgreSQL Wire Protocol Compatibility
CrateDB has supported the PostgreSQL wire protocol since the CrateDB 1.0 release in 2016. In version 3.1, we made a few enhancements that increase CrateDB compatibility with PostgreSQL drivers (especially the Go driver):
- CrateDB does not support SQL transactions, but it now returns the expected responses to BEGIN and COMMIT statements.
- Timestamp columns are now encoded using int64, which increases compatibility with different Postgres clients processing time series and other timestamp data.
- The TimeZone parameter response will now be returned to connecting clients, which enables compatibility with Django PostgreSQL ORM, among others. Thanks to Robert Palmer for the contribution.
- Multi-query support in Simple Query Mode.
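For readers wondering what an int64 timestamp looks like on the wire: PostgreSQL's binary format (with integer datetimes) represents a timestamp as a signed 64-bit count of microseconds since 2000-01-01 00:00:00, sent in network byte order. The Python sketch below illustrates that representation under the assumption that this is the encoding the change refers to. It is not CrateDB's implementation, and the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# PostgreSQL counts timestamps from 2000-01-01, not from the Unix epoch.
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def encode_pg_timestamp(dt: datetime) -> bytes:
    """Pack a timezone-aware datetime as big-endian int64 microseconds."""
    micros = (dt - PG_EPOCH) // timedelta(microseconds=1)  # exact integer math
    return struct.pack("!q", micros)


def decode_pg_timestamp(payload: bytes) -> datetime:
    """Inverse of encode_pg_timestamp."""
    (micros,) = struct.unpack("!q", payload)
    return PG_EPOCH + timedelta(microseconds=micros)


# Round trip with an arbitrary sample value.
sample = datetime(2018, 10, 1, 12, 30, 0, 250000, tzinfo=timezone.utc)
wire = encode_pg_timestamp(sample)
print(len(wire))                            # 8
print(decode_pg_timestamp(wire) == sample)  # True
```

A fixed-width integer avoids the rounding differences that floating-point timestamps can introduce, which is one reason client drivers tend to expect it.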
New System Metrics Monitoring Capabilities
To improve monitoring and troubleshooting, especially for clusters running under heavy load, we now expose more metrics through the JMX and Prometheus exporter (HTTP) interfaces:
- Thread pool queues
- Cluster state version (rapid change may indicate issues)
- Circuit breaker statistics
- Extended job logging to filter jobs, and added a metrics table for stats of different types
Ease of Use Improvements
New administrative features that make CrateDB easier to use:
- Multi-line comments in SQL are now supported.
- EXPLAIN ANALYZE is now supported and reports the timing of the different phases of a query’s execution plan, including both CrateDB and Lucene phases. This can be used to optimize the performance of queries and gain a better understanding of the underlying structure of CrateDB.
- Detailed results reporting for COPY FROM, for easier debugging (successes and failures) of bulk data loads into CrateDB. This has been a long-awaited feature and will make it easier to identify problems.
Deprecating the Elasticsearch API
Because our codebase is diverging from Elasticsearch, and more and more features are being moved over to the CrateDB execution layer and subsequently exposed via SQL, we have decided to deprecate the Elasticsearch API. This means that support for the Elasticsearch API may be dropped entirely in a future release.
If you are using the Elasticsearch API at the moment, please let us know on GitHub.
Do you have any other questions? Get in touch with us via Slack or find us on GitHub. You won't get a t-shirt, but we still want to hear from you. :)