Version 6.0.0¶
Released on 2025-07-17.
Warning
Do not use this version when upgrading from any previous version containing tables created before Version 5.5.0 as this may result in data loss!
If the cluster contains tables created before Version 5.5.0, after
upgrading to Version 6.0.0 certain actions on such tables like
deleting partitions, changing settings, rename, swap, etc. can lead to
corrupted table which causes all the data of the columns created
in versions before Version 5.5.0 to be shown as NULL. The bug
has been fixed in Version 6.0.6, so we highly recommend to avoid
upgrading to any earlier 6.0.x version.
Once already affected by the bug, existing data may be lost forever, while
new data (via INSERT or UPDATE) can be retrieved normally.
Note
If you are upgrading a cluster, you must be running CrateDB 5.0.0 or higher before you upgrade to 6.0.0.
We recommend that you upgrade to the latest 5.10 release before moving to 6.0.0.
A rolling upgrade from >= 5.10.1 to 6.0.0 is supported. Before upgrading, you should back up your data.
Important
It’s highly advisable, once the cluster is fully upgraded to Version 6.0.0, to create a new repository, on a different path than the repository used when the cluster was in 5.x version, and create a full backup of your data.
Warning
Tables that were created before CrateDB 5.x will not function with 6.x and must be recreated before moving to 6.x.x.
You can recreate tables using COPY TO and COPY FROM or by
inserting the data into a new table.
Table of contents
Breaking Changes¶
Updated Lucene from 9.12 to 10.2 which brings the following user facing changes:
Tables created in 4.x can’t be read anymore. If you’re running CrateDB 5.x and have any tables created in 4.x they need to be re-indexed before moving to CrateDB 6.0. Make sure there are no warnings in the admin console and see CrateDB table version compatibility scheme for more information.
The
dutch_kpandlovinstoken filters are no longer available. If users have tables which use these token filters for analysis, they will need to re-index with either plaindutchorenglishstemmers before upgrading.The
german2andgermantoken filters now use the same underlying stemmer, which always expandsä,öandütoae,oeanduerespectively. If users have tables which were using thegermanstemmer without a character filter that already did this, they will need to re-index after upgrading.Lucene 10.2 opens files with the
MADV_RANDOMadvice by default on Linux and Mac OS. If you experience an increase in IOPS and degraded performance, especially slow recovery times on node restarts, setCRATE_JAVA_OPTS=-Dorg.apache.lucene.store.defaultReadAdvice=NORMALto restore previous behavior.
Removed the deprecated
soft_deletes.enabledsetting forCREATE TABLE. The setting defaulted totruesince 4.3.0, was deprecated in 4.5.0 and soft deletes became mandatory in 5.0.0.When casting an expression of type OBJECT to an OBJECT, the inner types are merged instead of being replaced by the target type and the target column policy is used. The only exception is when the target column policy is set to STRICT. In this case, only the target inner type definition is used. See also the related documentation
Removed support for patterns on the left side argument of LIKE/ILIKE operators because it led to ambiguities when both sides contained possible pattern like strings e.g.:
SELECT 'a%' LIKE ANY(['a__']);
Updated the default
compressionof percentile aggregation from100.0to200.0which increases the accuracy of the approximations but with the cost of slightly more memory consumption and increased execution time, depending on the used data set. To regain existing behaviour, adjust thecompressionargument of the percentile aggregation accordingly.Made the behavior of the
DENYandGRANTstatements stricter. They now fail if there is a mismatch between thesecurable(VIEWorTABLE) and the actual relation type of the relation addressed by<ident>. Before a statement likeGRANT DQL ON TABLE actually_a_viewwould succeed but not do anything.Applied normalization for IP values so that for example
'::ffff:192.168.0.1'::IP, becomes'192.168.0.1'. Previously, this normalization was already applied for all the values inserted into a column of IP, but was not applied for literal values in an SQL query, and was also not applied to the values inserted into a column ofIPdata type, when this column was part of thePRIMARY KEYof the table. This resulted in wrong behavior when trying to filter on the table by it’sIP(PRIMARY KEY), as the value stored for the_idwould have been the un-normalized one, whereas the value for the table column would have been the normalized one. e.g.:CREATE TABLE tbl(a IP , PRIMARY KEY(a)); INSERT INTO tbl(a) VALUES ('::ffff:192.168.0.1'); REFRESH TABLE tbl; SELECT _id, a FROM tbl;
Would yield:
+--------------------+-------------+ | _id | a | +--------------------+-------------+ | ::ffff:192.168.0.1 | 192.168.0.1 | +--------------------+-------------+
So the query:
SELECT * FROM tbl WHERE a = '192.168.0.1'
would not return any results, as it will use the
_idto try and match theIPvalue in theWHEREclause. You can find more details about this mechanism here.Warning
Because of this change, users are advised to re-create tables which have an
IPcolumn asPRIMARY KEYor as part of thePRIMARY KEY. Since the stringIPvalues will be automatically normalized before stored as_id, if for example a value:::ffff:192.168.0.1is already stored on the table, after upgrading to Version 6.0.0, it will be possible to re-insert the value on the table, without any complaint from thePRIMARY KEYconstraint check, as the value will be stored, normalized, as192.168.0.1.Fixed an issue that caused queries filtering on partition(s) of a table for which the
PARTITION BYclause, contains a column ofBOOLEANtype to return no results, e.g. for a table:CREATE TABLE tbl(id BOOLEAN, PRIMARY KEY(id)) PARTITIONED BY (id); INSERT INTO tbl(id) VALUES (FALSE); REFRESH TABLE tbl;
The query:
SELECT * FROM tbl WHERE id = FALSE;
would return
0rows;Warning
Because of this fix, users should re-create tables which have a
BOOLEANcolumn in theirPARTITIONED BYclause.Added a statement_max_length setting to limit the length of allowed SQL statements. It defaults to
262144. Statements exceeding this limit are rejected because large statements consume a high amount of memory. Large statements are typically caused by inline values or using a large number ofVALUESclauses. Instead, statements should be written using parameter symbols (?) and inserting lots of values should be done using either bulk operations or aINSERT INTOin combination with aSELECT FROMandUNNEST.Changed the output column names of a query result, which correspond to function calls to return the function name. Previously, for a function call that can be pre-evaluated, without using any data from the underlying tables, the column name would be the evaluated value, e.g.:
cr> SELECT ((1+2)*3); +---+ | 9 | +---+ | 9 | +---+
With this change:
cr> SELECT ((1+2)*3); +---------------+ | ((1 + 2) * 3) | +---------------+ | 9 | +---------------+
Additionally, previously, function calls which are using table columns as arguments, would be returned as column names, using the complete function call, but with this change only the function name is used, so that it matches the behavior of PostgreSQL. e.g. Previously:
cr> SELECT avg(a) FROM tbl; +--------+ | avg(a) | +--------+ | 3.0 | +--------+
With this change:
cr> SELECT avg(a) FROM tbl; +-----+ | avg | +-----+ | 3.0 | +-----+
Note
When a function call which uses table columns as arguments is nested in an comparison or arithmetic expression, the output name of the column continues to expose the complete function call. e.g.:
cr> SELECT sqrt(a) > 1 FROM tbl; +---------------+ | (sqrt(a) > 1) | +---------------+ | TRUE | +---------------+
stddev(column) changed and is now an alias for stddev_samp(column), to match PostgreSQL behavior. In order to calculate the population standard deviation, stddev_pop(column) must be used.
current_cataloghas been added to the list of reserved keywords, because of the addition of CURRENT_CATALOG function which is called as a keyword without the(). The whole list of keywords can be found here.
Deprecations¶
Usage of _version has been deprecated completely. Please note that its usage for Optimistic Concurrency Control has already been deprecated since Version 4.0.0.
Changes¶
SQL Statements¶
COPY FROMnow assumes that the input files aregzipcompressed if all of the specified files end in.gz.
SQL Standard and PostgreSQL Compatibility¶
Added the starts_with scalar function, which returns true if a string starts with a given prefix.
Added support for lower case format patterns to the to_char scalar function.
Added the information_schema.applicable_roles, information_schema.enabled_roles, information_schema.administrable_role_authorizations and information_schema.role_table_grants tables.
Populated the
pg_index.indnkeyattscolumn with the number of key attributes in the primary key.
Data Types¶
Added support for dynamic mapping of nested arrays.
Improved error handling of missing keys when accessing elements of (nested) object type expressions to be consistent according to the defined Object column policy and the related error_on_unknown_object_key session setting.
Scalar and Aggregation Functions¶
Added CURRENT_CATALOG function which for CrateDB always returns
crate, to enhance compatibility with PostgreSQL JDBC driver.Improved performance of avg(column) function when operating on columns of NUMERIC type with enabled Column store.
Added support for NUMERIC type to the avg(DISTINCT column) function.
Added support for NUMERIC type to the following arithmetic scalar functions: POWER, DEGREES and RADIANS.
Added support for the array_overlap scalar function and the associated && operator.
Added support for the stddev_pop(column) function to compute the population standard deviation.
Added support for NUMERIC type to the stddev_pop(column) function.
Added support for the stddev_samp(column) function to compute the sample standard deviation.
Replaced t-digest algorithm used by percentile aggregation from
AVLTreeDigesttoMergingDigestto improve the consistency and accuracy of the result.
Performance and Resilience Improvements¶
Changed how scheduling and prioritization for read queries, in particular queries against sys.shards and sys.nodes work. This should help continue monitoring a cluster that is overloaded with too many concurrent queries.
Improved the logic to push down expressions within the
WHEREclause to the table to use index lookups instead of resulting in post-filtering when using virtual tables involving table functions and object columns. For example, the following case now gets optimized:SELECT * FROM ( SELECT *, unnest(document['arr']) FROM tbl ) t WHERE document['field1'] >=1;
Improved the performance for
SELECT COUNT(not_null_column) FROM tbl. It is now executed the same way asSELECT COUNT(*) FROM tbl.Improved the handling of temporary unavailable shards during read-only queries. There’s now a higher chance that the system can deal with the temporary failure without surfacing the error to the client.
Improved execution for queries with mixed implicit and explicit joins. Joins are now always executed in the original order of the query.
Improved the performance of the queries involving
= ALLarray operator.Improved the performance of shard recovery in certain cases, where a shard has become idle after 5 minutes of no write activity.
Increased the frequency of retention lease synchronization, which will make merges more likely to be able to remove recovery source.
Administration and Operations¶
Added
merge_idandfully_merged_docscolumns to the sys.segments table to give more information on ongoing merges and whether the recovery source is still present in merged segments.Added the sys.cluster_health table to provide information about the health of the whole cluster in comparison to the sys.health table which exposes health about each table only.
Added a insert_select_fail_fast session setting that allows partial failures of
INSERT FROM SELECTstatements.Added
last_write_beforecolumn to the sys.shards table, reporting a timestamp before which the last write operation to the shard has taken place.Re-enabled mapping.depth.limit setting to enforce a maximum nesting depth for object columns when adding new columns to a table. The default value is
100. The limit can be increased if necessary, but use with caution as deeply nested structures may lead to long execution times or stack overflow errors.