CrateDB Blog | Development, integrations, IoT, & more

CrateDB 3.1 (Stable) Available Now

Written by Andy Ellicott | 2018-10-11

Hey folks! CrateDB 3.1 (stable) has been released and is now faster and easier to use than ever.

DOWNLOAD IT NOW, while it's hot.

The complete list of changes can be found in the release notes. In this post, I'll give you a quick tour of the highlights.

Highlights

Faster Query Performance

Performance enhancements will mainly benefit applications that use GROUP BY clauses and those that access arrays. GROUP BY in combination with aggregations is a commonly seen pattern in CrateDB use cases. Arrays have been neglected a little bit in the past and were in need of some engineering love.

Various changes to memory utilization and Lucene query execution in CrateDB improved performance for some types of queries. I have listed them below, along with the results of some basic benchmarks to give you an idea of the performance improvement in 3.1. As with any sort of benchmark, your mileage may vary...

  • Accessing array elements in WHERE clauses

    Sample query:

    SELECT a FROM array_access WHERE a[1] = 101

    This ran 200x faster in our tests.

  • Performing GROUP BY aggregations

    Sample query:

    SELECT "cCode", count(*) FROM uservisits GROUP BY "cCode"

    This ran 6.7x faster in our tests.

    Sample query:

    SELECT avg("adRevenue") FROM uservisits GROUP BY "cCode"

    This ran 2.3x faster in our tests.

    Sample query:

    SELECT count(*) FROM (SELECT DISTINCT x FROM t) AS t  -- x is a LONG column

    This ran 3x faster in our tests.

  • WHERE NOT x = ANY queries

    We introduced a new scalar function, ignore3vl(), which removes the overhead of SQL's three-valued logic (TRUE, FALSE, and NULL). If your WHERE NOT x = ANY query does not need NULL handling, wrapping the comparison in ignore3vl() can make it run faster.

    Without ignore3vl():

    SELECT count(*) FROM t WHERE NOT 20 = any(a)

    With ignore3vl():

    SELECT count(*) FROM t WHERE NOT ignore3vl(20 = any(a))

    In our tests, the query using ignore3vl() ran 3.8x faster than the one without it.
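
Skipping NULL handling can change results as well as speed, so it helps to see exactly where the two WHERE clauses above differ. The Python sketch below models SQL's three-valued logic for NOT 20 = ANY(a), using None for NULL. It models the semantics only, not how CrateDB executes the query. It assumes ignore3vl() maps TRUE to TRUE and both FALSE and NULL to FALSE, which is our reading of the CrateDB documentation, and the table t and its rows are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# A SQL boolean under three-valued logic: True, False, or None (standing in for NULL).
SqlBool = Optional[bool]


def sql_eq(left: Optional[int], right: Optional[int]) -> SqlBool:
    """x = y is NULL if either side is NULL."""
    if left is None or right is None:
        return None
    return left == right


def sql_any_eq(value: Optional[int], array: Sequence[Optional[int]]) -> SqlBool:
    """value = ANY(array): TRUE on any match, else NULL if any comparison was NULL, else FALSE."""
    saw_null = False
    for element in array:
        result = sql_eq(value, element)
        if result is True:
            return True
        if result is None:
            saw_null = True
    return None if saw_null else False


def sql_not(operand: SqlBool) -> SqlBool:
    """NOT NULL is still NULL."""
    return None if operand is None else not operand


def ignore3vl(operand: SqlBool) -> bool:
    """Assumed semantics: TRUE -> TRUE, FALSE -> FALSE, NULL -> FALSE."""
    return operand is True


def count_where(rows: List[List[Optional[int]]], predicate: Callable) -> int:
    """WHERE keeps a row only when the predicate is TRUE (FALSE and NULL both drop it)."""
    return sum(1 for row in rows if predicate(row) is True)


# Hypothetical contents of column a in table t.
rows = [[1, 2], [20, 3], [4, None], []]

# SELECT count(*) FROM t WHERE NOT 20 = any(a)
without = count_where(rows, lambda a: sql_not(sql_any_eq(20, a)))
# SELECT count(*) FROM t WHERE NOT ignore3vl(20 = any(a))
with_ignore = count_where(rows, lambda a: sql_not(ignore3vl(sql_any_eq(20, a))))
print(without, with_ignore)  # prints "2 3"
```

The difference comes from the row [4, None]: with no match and a NULL element, 20 = ANY(a) is NULL, so NOT (...) is also NULL and the row is filtered out, while ignore3vl() turns that NULL into FALSE and the row is counted. If neither a nor its elements are ever NULL, both queries return the same count.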

Broader PostgreSQL Wire Protocol Compatibility

CrateDB has supported the PostgreSQL wire protocol since the CrateDB 1.0 release in 2016. In version 3.1, we made a few enhancements that increase CrateDB compatibility with PostgreSQL drivers (especially the Go driver):

  • CrateDB does not support SQL transactions, but it now returns the expected responses to BEGIN and COMMIT statements.

  • Timestamp columns are now encoded using int64, which improves compatibility with Postgres clients that process time series and other timestamp data.

  • CrateDB now reports the TimeZone parameter to connecting clients, which enables compatibility with the Django PostgreSQL ORM, among others. Thanks to Robert Palmer for the contribution.

  • Multiple statements can now be sent in a single query in Simple Query mode.
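
For readers working at the protocol level, the Python sketch below shows what two of these items look like on the wire, following the PostgreSQL frontend/backend protocol documentation: a Simple Query ('Q') message carrying several semicolon-separated statements, and a timestamp encoded as a big-endian int64 count of microseconds since 2000-01-01 UTC. The helper names are hypothetical and this is not CrateDB's implementation. It assumes the timestamp starts out as milliseconds since the Unix epoch, which is how CrateDB represents timestamps.

```python
import struct
from datetime import datetime, timezone

# PostgreSQL's binary timestamp format counts from 2000-01-01 00:00:00 UTC.
PG_EPOCH_MILLIS = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()) * 1000


def simple_query_message(sql: str) -> bytes:
    """Frame a frontend Query ('Q') message for the Simple Query protocol.

    Layout: the type byte b"Q", an int32 length that counts itself and the
    body (but not the type byte), then the SQL text terminated by a NUL byte.
    The SQL text may contain several statements separated by semicolons.
    """
    body = sql.encode("utf-8") + b"\x00"
    return b"Q" + struct.pack("!i", 4 + len(body)) + body


def encode_timestamp_int64(epoch_millis: int) -> bytes:
    """Encode milliseconds since the Unix epoch as a big-endian int64 holding
    microseconds since 2000-01-01 UTC (PostgreSQL's binary timestamp layout)."""
    micros = (epoch_millis - PG_EPOCH_MILLIS) * 1000
    return struct.pack("!q", micros)


# Hypothetical usage: one message carrying three statements.
msg = simple_query_message("BEGIN; SELECT 1; COMMIT;")
ts = encode_timestamp_int64(1539216000000)  # 2018-10-11 00:00:00 UTC
```

In practice your driver builds these messages for you; the sketch only illustrates the framing that drivers such as the Go driver expect the server to understand.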

New System Metrics Monitoring Capabilities

To improve monitoring and troubleshooting, especially for heavily loaded clusters, we now expose more metrics through the JMX and Prometheus exporter (HTTP) interfaces:

  • Thread pool queues

  • Cluster state version (rapid change may indicate issues)

  • Circuit breaker statistics

  • Extended job logging, which can now filter which jobs are logged, plus a new metrics table with statistics broken down by type
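
To illustrate the "rapid change" check for the cluster state version, here is a minimal Python sketch. It assumes you already collect the cluster state version from the JMX or Prometheus interface at regular intervals; the sample values and the threshold of one update per second are hypothetical, not CrateDB defaults or recommendations.

```python
from typing import List, Tuple

# Hypothetical alert threshold: cluster state updates per second we treat as "rapid".
MAX_UPDATES_PER_SECOND = 1.0


def state_change_rate(samples: List[Tuple[float, int]]) -> float:
    """Average cluster state version increments per second across the sampled window.

    samples: (unix_seconds, cluster_state_version) pairs, oldest first.
    """
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time window")
    return (v1 - v0) / (t1 - t0)


def is_churning(samples: List[Tuple[float, int]],
                threshold: float = MAX_UPDATES_PER_SECOND) -> bool:
    """Flag the cluster when its state version changes faster than the threshold."""
    return state_change_rate(samples) > threshold


# Hypothetical samples taken 30 seconds apart.
calm = [(0.0, 100), (30.0, 102), (60.0, 105)]
busy = [(0.0, 100), (30.0, 160), (60.0, 230)]
```

Comparing only the first and last samples keeps the sketch simple; a real alert would more likely use a sliding window, for example a Prometheus rate() over the exported metric.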

Ease of Use Improvements

New administrative features that make CrateDB easier to use:

  • Multi-line comments in SQL are now supported.

  • EXPLAIN ANALYZE is now supported and reports the timing of the different phases of a query’s execution plan, including both CrateDB and Lucene phases. Use it to optimize query performance and to better understand how CrateDB executes your queries.

  • COPY FROM now reports detailed results (successes and failures), which helps when debugging bulk data loads into CrateDB. This has been a long-awaited feature and makes it much easier to pinpoint problems.

Deprecating the Elasticsearch API

Because our codebase is diverging from Elasticsearch, and more and more features are moving to the CrateDB execution layer and being exposed via SQL, we have decided to deprecate the Elasticsearch API. This means that support for the Elasticsearch API may be dropped entirely in a future release.

If you are currently using the Elasticsearch API, please let us know on GitHub.

Do you have any other questions? Get in touch with us via Slack or find us on GitHub. You won't get a t-shirt, but we still want to hear from you. :)