Version 6.4.0 - Unreleased

Note

In development. 6.4.0 isn’t released yet. These are the release notes for the upcoming release.

Note

If you are upgrading a cluster, you must be running CrateDB 5.0.0 or higher before you upgrade to 6.4.0.

We recommend that you upgrade to the latest 6.3 release before moving to 6.4.0.

A rolling upgrade from >= 6.3.0 to 6.4.0 is supported. Before upgrading, you should back up your data.

Warning

Tables that were created before CrateDB 5.x will not function with 6.x and must be recreated before moving to 6.x.x.

You can recreate tables using COPY TO and COPY FROM or by inserting the data into a new table.

Table of contents

Breaking Changes

None

Deprecations

None

Changes

SQL Statements

None

SQL Standard and PostgreSQL Compatibility

Data Types

  • Added support for geo_point values in JSON array string format, for example "[14.988999953493476, 51.10299998894334]". This allows CSV files containing geo_point values in this format to be imported without transforming the source data first.

Scalar and Aggregation Functions

  • Added support for adding or subtracting intervals to values of type date.

  • Added support for TH and th template patterns to the to_char scalar function which can be used to add a ordinal suffix for numbers.

  • Added support for marking literal characters with double quotes in the to_char template pattern. Characters enclosed in "..." are treated as literal text and not interpreted as formatting tokens. Backslash escaping within double-quoted sections is also supported (e.g., \" for a literal ", \\ for a literal \).

  • Robert Palmer added the blake3 scalar function for computing BLAKE3 checksums of strings.

  • Bing O’Dowd added the largest triangle three buckets aggregation function for downsampling data using the Largest Triangle Three Buckets algorithm

Performance and Resilience Improvements

  • Improved the performance of SELECT strKeyColumn, count(*) FROM <tbl> GROUP BY 1 queries if they don’t contain any WHERE clause. In some of our benchmarks by up to 140%.

  • Improved the performance of queries on the sys.shards table. In some of our benchmarks by up to ~20% if the sequence number of translog stats were selected.

  • Added column pruning for scalar subqueries. This can help improve performance for sub-queries which intermediately selected columns not used in the results.

  • Improved the performance of the :ref:to_char <scalar-to_char> scalar. In some of our benchmarks by up to ~25%.

  • Improved the performance of bulk insert operations against tables that use user defined functions in a generated column which also has check constraints.

  • Increased the default queue size of the write search pool but lowered the amount of internal retries for INSERT INTO <table> VALUES statements in exchange. This should lower the load for bulk inserts into tables with many shards at the expensive of potentially higher memory load before requests get rejected if clients make single inserts with high concurrency/many connections.

  • Improved the performance of the min() and max() aggregations for columns of type NUMERIC.

  • Improved query planning for lookup joins under LIMIT by deferring collection of columns that are not needed during join execution until the final fetch phase.

Administration and Operations

  • Added session_id column to the sys.jobs and sys.jobs_log tables.

  • Added a state column to the sys.cluster table.

  • Added a remove-corrupted-data command to the crate-node CLI tool for disaster recovery of a corrupted shard which cannot be automatically recovered from a good copy (replica) or restored from a backup.

Client interfaces

  • Changed the HTTP interface to set parameter type hints which are guessed based on the args payload. This can help avoid having to add explicit casts within a SQL statement for parameter placeholders.

    For example, instead of SELECT percentile(height, ?::double precision) ... it is possible to use SELECT percentile(height, ?) if args contains a numeric value.

    A side effect of this change is that cast semantics can become stricter. This could surface latent issues if the context where the parameter is used requires a cast to a different type. Consider for example an INSERT INTO tbl (xs) VALUES (?), (?) where xs is of type text[] but the payload for xs is an object array.

    Previously this used a cast from undefined to text[] and succeeded, now it can lead to errors like:

    The types of the columns within VALUES lists must match.
    Found `text_array` and `object_array` at position: 1