Version 4.0.0¶
Released on 2019/06/25.
Note
If you are upgrading a cluster, you must be running CrateDB 3.0.4 or higher before you upgrade to 4.0.0.
We recommend that you upgrade to the latest 3.3 release before moving to 4.0.0.
An upgrade to Version 4.0.0 requires a full restart upgrade.
When restarting, CrateDB will migrate indexes to a newer format. Depending on the amount of data, this may delay node start-up time.
Please consult the Upgrade Notes before upgrading.
Warning
Tables that were created prior to CrateDB 3.x will not function with 4.x and must be recreated before moving to 4.x.x.
You can recreate tables using COPY TO and COPY FROM or by
inserting the data into a new table.
Before upgrading, you should back up your data.
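The COPY TO and COPY FROM route mentioned above could be scripted roughly as follows. This is a minimal sketch, not an official migration tool: the helper `recreate_statements`, the table names, and the export directory are hypothetical. It assumes the new table has already been created with the same schema as the old one, and that the export directory is accessible on the nodes that write and read the files.

```python
from typing import List


def recreate_statements(old_table: str, new_table: str, export_dir: str) -> List[str]:
    """Build the SQL statements that copy all rows of a table into a new table.

    Assumption: `new_table` already exists with the same schema as `old_table`.
    """
    return [
        # Export the rows of the old table as files into export_dir.
        f"COPY {old_table} TO DIRECTORY '{export_dir}'",
        # Import every exported file into the new table.
        f"COPY {new_table} FROM 'file://{export_dir}/*'",
    ]


# Hypothetical table names and path, for illustration only.
statements = recreate_statements("doc.events", "doc.events_new", "/tmp/export")
for statement in statements:
    print(statement + ";")
```

The generated statements are meant to be run against the 3.x cluster before the upgrade, after which the old table can be dropped.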
Upgrade Notes¶
Discovery Changes¶
This version of CrateDB uses a new cluster coordination (discovery) implementation, which improves resiliency and master election times. A new voting mechanism is used when a node is added or removed, which lets the system automatically maintain an optimal level of fault tolerance, even during network partitions. This eliminates the need for the easily misconfigured minimum_master_nodes setting.
Additionally, a rare resiliency failure, recorded as "Repeated cluster partitions can cause cluster state updates to be lost", can no longer occur.
As a result, some discovery settings have been added, renamed, or removed.
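| Old Name | New Name |
|---|---|
| | cluster.initial_master_nodes (new, required on upgrade) |
| | |
| discovery.zen.ping.unicast.hosts | discovery.seed_hosts |
| discovery.zen.minimum_master_nodes | Removed |
| | Removed |
| | Removed |
| | Removed |
| | Removed |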
Caution
The cluster.initial_master_nodes setting must be set on production (non-loopback-bound) clusters when upgrading. See the setting documentation for details.
Note
Each discovery.seed_hosts entry accepts only a single port value. Port ranges, which were allowed but ignored in previous versions under the old setting name discovery.zen.ping.unicast.hosts, are now rejected.
Note
CrateDB will refuse to start when it encounters an unknown setting, such as the removed ones mentioned above. Make sure to adjust your crate.yml or command-line (CMD) arguments beforehand.
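The rules described in these notes can be sketched as a small settings-migration helper. This is an illustration only: the function `migrate_discovery_settings` is hypothetical, it covers only the settings named in this section, and it assumes the seed hosts are given as a list of "host:port" strings. The node names are made up.

```python
import re

# Only the renames and removals named in this section are modelled here.
RENAMED = {"discovery.zen.ping.unicast.hosts": "discovery.seed_hosts"}
REMOVED = {"discovery.zen.minimum_master_nodes"}
PORT_RANGE = re.compile(r":\d+-\d+$")


def migrate_discovery_settings(old, initial_master_nodes):
    """Translate a 3.x settings dict into a 4.0-style settings dict."""
    new = {}
    for key, value in old.items():
        if key in REMOVED:
            continue  # removed settings would stop 4.0 from starting
        new[RENAMED.get(key, key)] = value
    for host in new.get("discovery.seed_hosts", []):
        if PORT_RANGE.search(host):
            # Port ranges were ignored before and are rejected in 4.0.
            raise ValueError(f"port range not allowed in seed host: {host}")
    # New setting that is required on upgrade for production clusters.
    new["cluster.initial_master_nodes"] = list(initial_master_nodes)
    return new


old_settings = {
    "cluster.name": "demo",
    "discovery.zen.minimum_master_nodes": 2,
    "discovery.zen.ping.unicast.hosts": ["node1:4300", "node2:4300"],
}
new_settings = migrate_discovery_settings(old_settings, ["node1", "node2"])
print(new_settings)
```

Raising on a port range, rather than silently trimming it, mirrors the stricter behaviour described in the note above.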
Breaking Changes¶
General¶
Renamed CrateDB data types to the corresponding PostgreSQL data types.
| Current Name | New Name |
|---|---|
| short | smallint |
| long | bigint |
| float | real |
| double | double precision |
| byte | char |
| string | text |
| timestamp | timestamp with time zone |
See Data types for more detailed information. The old data type names are registered as aliases for backward compatibility.
Changed the ordering of columns to be based on their position in the CREATE TABLE statement. This was done to improve compatibility with PostgreSQL and will affect queries like SELECT * FROM or INSERT INTO <table> VALUES (...).
Changed the default Column policy on tables from dynamic to strict. Columns of type object still default to dynamic.
Removed the implicit soft limit of 10000 that was applied for clients using HTTP.
Dropped support for Java versions < 11.
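For tooling that inspects or generates schema definitions, the data type renames listed above can be captured in a lookup table. The sketch below is a hypothetical helper, not part of CrateDB or any client library; it only encodes the renames from these notes.

```python
# Old CrateDB type names mapped to the PostgreSQL-style names used from 4.0 on.
TYPE_RENAMES = {
    "short": "smallint",
    "long": "bigint",
    "float": "real",
    "double": "double precision",
    "byte": "char",
    "string": "text",
    "timestamp": "timestamp with time zone",
}


def to_new_type_name(name):
    """Return the 4.0 name for a type; other names are returned lowercased."""
    key = name.strip().lower()
    return TYPE_RENAMES.get(key, key)


print(to_new_type_name("LONG"))
print(to_new_type_name("ip"))
```

Because the old names remain registered as aliases, such a translation is optional for existing statements.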
Removed Settings¶
Removed the deprecated setting cluster.graceful_stop.reallocate.
Removed the deprecated http.enabled setting. HTTP is now always enabled and can no longer be disabled.
Removed the deprecated license.ident setting. Licenses must be set using the SET LICENSE statement.
Removed the deprecated license.enterprise setting. To use CrateDB without any enterprise features, use the community edition instead.
Removed the experimental enable_semijoin session setting. As this defaulted to false, this execution strategy can no longer be used.
Removed the possibility of configuring the AWS S3 repository client via the crate.yml configuration file and command line arguments. Please use the CREATE REPOSITORY statement parameters for this purpose.
Removed the HDFS repository setting concurrent_streams, as it is no longer supported.
Removed the zen1 related discovery settings mentioned in Discovery Changes.
System table changes¶
Changed the layout of the version column in the information_schema.tables and information_schema.table_partitions tables. The version is now displayed directly under created and upgraded. The cratedb and elasticsearch sub-categories have been removed.
Removed deprecated metrics from sys.nodes:
| Metric name |
|---|
| fs['disks']['reads'] |
| fs['disks']['bytes_read'] |
| fs['disks']['writes'] |
| fs['disks']['bytes_written'] |
| os['cpu']['system'] |
| os['cpu']['user'] |
| os['cpu']['idle'] |
| os['cpu']['stolen'] |
| process['cpu']['user'] |
| process['cpu']['system'] |
Renamed column information_schema.table_partitions.schema_name to table_schema.
Renamed information_schema.columns.user_defined_type_* columns to information_schema.columns.udt_* for SQL standard compatibility.
Changed type of column information_schema.columns.is_generated to STRING with value NEVER or ALWAYS for SQL standard compatibility.
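Client code that reads the version column may need to cope with both layouts while clusters are being upgraded. The following sketch assumes the column arrives in the client as a nested dictionary; the helper name and the sample version values are hypothetical.

```python
def created_version(version):
    """Return the 'created' version string from either column layout."""
    created = version.get("created")
    if isinstance(created, dict):
        # Pre-4.0 layout: the value sits under a 'cratedb' sub-key.
        return created.get("cratedb")
    # 4.0 layout: the value is stored directly under 'created'.
    return created


# Sample values are made up for illustration.
old_layout = {"created": {"cratedb": "3.3.4", "elasticsearch": "6.5.1"}, "upgraded": None}
new_layout = {"created": "4.0.0", "upgraded": None}
print(created_version(old_layout), created_version(new_layout))
```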
Removed Functionality¶
The Elasticsearch REST API has been removed.
Removed the deprecated ingest framework, including the MQTT endpoint.
Removed the HTTP pipelining functionality. We are not aware of any client using this functionality.
Removed the deprecated average duration and query frequency JMX metrics. The total counts and sum of durations as documented in QueryStats MBean should be used instead.
Removed the deprecated ON DUPLICATE KEY syntax of INSERT statements. Users can migrate to the ON CONFLICT syntax.
Removed the index thread-pool and the bulk alias for the write thread-pool. The JMX getBulk property of the ThreadPools bean has been renamed to getWrite.
Removed the deprecated nGram and edgeNGram token filters and the htmlStrip char filter. They are superseded by ngram, edge_ngram, and html_strip.
Removed the deprecated USR2 signal handling. Use ALTER CLUSTER DECOMMISSION instead. Be aware that the behavior of sending USR2 signals to a CrateDB process is now undefined and up to the JVM. In some cases it may still terminate the instance, but without a clean shutdown.
Deprecations¶
Deprecate the usage of the _version column for Optimistic Concurrency Control in favour of the _seq_no and _primary_term columns.
Deprecate the usage of the TIMESTAMP alias data type as a timestamp with time zone. Use TIMESTAMP WITH TIME ZONE or the TIMESTAMPTZ data type alias instead. In future CrateDB releases, the TIMESTAMP data type will be equivalent to the data type without time zone.
Marked SynonymFilter tokenizer as deprecated.
Marked LowerCase tokenizer as deprecated.
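The idea behind optimistic concurrency control with _seq_no and _primary_term can be illustrated with a toy in-memory model. This is not CrateDB's implementation: the `Row` class and `conditional_update` function are hypothetical, and the model simply assumes that every successful write advances the sequence number. In CrateDB itself the check would presumably be expressed as conditions on the two columns in the statement's WHERE clause.

```python
class Row:
    """A single record carrying the two values used for the concurrency check."""

    def __init__(self, value):
        self.value = value
        self.seq_no = 0
        self.primary_term = 1


def conditional_update(row, new_value, expected_seq_no, expected_primary_term):
    """Apply the update only if the row is unchanged since it was read."""
    if (row.seq_no, row.primary_term) != (expected_seq_no, expected_primary_term):
        return False  # stale read: the caller has to re-read and retry
    row.value = new_value
    row.seq_no += 1  # a successful write advances the sequence number
    return True


row = Row("draft")
seen = (row.seq_no, row.primary_term)  # both writers read the same state
first = conditional_update(row, "v1", *seen)
second = conditional_update(row, "v2", *seen)
print(first, second, row.value)
```

The second writer is rejected because the row changed after it was read, which is the conflict that the deprecated _version column was previously used to detect.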
Changes¶
SQL Standard and PostgreSQL compatibility improvements¶
Added support for using relation aliases with column aliases. Example: SELECT x, y from unnest([1], ['a']) as u(x, y)
Added support for column Default clause for CREATE TABLE.
Extended the support for window functions. The PARTITION BY definition and the CURRENT ROW -> UNBOUNDED FOLLOWING frame definitions are now supported.
Added the string_agg(column, delimiter) aggregation function.
Added support for SQL Standard Timestamp Format to the Dates and times.
Added the TIMESTAMP WITHOUT TIME ZONE data type.
Added the TIMESTAMPTZ alias for the TIMESTAMP WITH TIME ZONE data type.
Added support for the type ‘string’ cast operator, which is used to initialize a constant of an arbitrary type.
Added the pg_catalog.pg_get_userbyid() scalar function to enhance PostgreSQL compatibility.
Enabled scalar function evaluation when used in the query FROM clause in place of a relation.
Show the session setting description in the output of the SHOW ALL statement.
Added information for the internal PostgreSQL data type: name in pg_catalog.pg_type for improved PostgreSQL compatibility.
Added the pg_catalog.pg_settings table.
Added support for String literals with C-Style escapes.
Added trim scalar function that trims the (leading, trailing or both) set of characters from an input string.
Added string_to_array scalar function that splits an input string into an array of string elements using a separator and a null-string.
Added missing PostgreSQL type mapping for the array(ip) collection type.
Added current_setting system information scalar function that yields the current value of the setting.
Allow User-defined functions to be registered against the pg_catalog schema. This also extends CURRENT_SCHEMA to be addressable with pg_catalog included.
Added quote_ident scalar function that quotes a string if it is needed.
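To make the behaviour of two of the new string functions concrete, here is a rough Python model of string_to_array and string_agg. It assumes PostgreSQL-like semantics (null handling in particular) and covers only non-empty input text with a non-empty separator. The function bodies are illustrative and are not taken from CrateDB.

```python
def string_to_array(text, separator, null_string=None):
    """Split text on separator; elements equal to null_string become None."""
    parts = text.split(separator)
    return [None if part == null_string else part for part in parts]


def string_agg(values, delimiter):
    """Join the non-null values with delimiter; return None if there are none."""
    present = [value for value in values if value is not None]
    return delimiter.join(present) if present else None


elements = string_to_array("a,b,-,c", ",", "-")
joined = string_agg(["a", None, "b"], ", ")
print(elements, joined)
```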
Users and Access Control¶
Mask sensitive user account information in sys.repositories for repository types: azure, s3.
Restrict access to log entries in sys.jobs and sys.jobs_log to the current user. This doesn’t apply to superusers.
Added a new Administration Language (AL) privilege type which allows users to manage other users and use SET GLOBAL. See Privileges.
Repositories and Snapshots¶
Added support for the Azure Storage repositories.
Changed the default value of the fs repository type setting compress to true. See fs repository parameters.
Improved resiliency of the CREATE SNAPSHOT operation.
Performance and resiliency improvements¶
Exposed the _seq_no and _primary_term system columns which can be used for Optimistic Concurrency Control. By introducing _seq_no and _primary_term, the following resiliency issues were fixed:
Predicates like abs(x) = 1, which require a scalar function evaluation and cannot operate on table indices directly, are now candidates for the query cache. This can result in order of magnitude performance increases on subsequent queries.
Routing awareness attributes are now also taken into consideration for primary key lookups (queries like SELECT * FROM t WHERE pk = 1).
Changed the circuit breaker logic to measure the real heap usage instead of the memory reserved by child circuit breakers. This should reduce the chance of nodes running into an out of memory error.
Added a new optimization that allows predicates on top of views or sub-queries to run more efficiently in some cases.
Others¶
Added support for dynamic reloading of SSL certificates. See Configuring the Keystore.
Added minimum_index_compatibility_version and minimum_wire_compatibility_version to sys.version to expose the current state of the node’s index and wire protocol version as part of the sys.nodes table.
Upgraded to Lucene 8.0.0, and as part of this the BM25 scoring has changed. The order of the scores remains the same, but the values of the scores differ. Fulltext queries including _score filters may behave slightly differently.
Added a new _docid system column.
Added support for subscript expressions on an object column of a sub-relation. Examples: SELECT a['b'] FROM (SELECT a FROM t1) or SELECT a['b'] FROM my_view where my_view is defined as SELECT a FROM t1.