Version 3.0.0
Released on 2018/05/16.
Note
If you are upgrading a cluster, you must be running CrateDB 2.0.4 or higher before you upgrade to 3.0.0.
We recommend that you upgrade to the latest 2.3 release before moving to 3.0.0.
You cannot perform a rolling upgrade to this version. Any upgrade to this version will require a full restart upgrade.
When restarting, CrateDB will migrate indexes to a newer format. Depending on the amount of data, this may delay node start-up time.
Please consult the Upgrade Notes before upgrading.
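The version precondition in this note can be checked mechanically before a restart is scheduled. The following is a minimal sketch in Python. The helper names are hypothetical and not part of any CrateDB tooling, and the sketch assumes plain dotted version strings such as the value of version['number'] in sys.nodes. It covers only the version rule, not the table-format requirement.

```python
# Minimum release a cluster must run before upgrading to 3.0.0 (from the note above).
MIN_SOURCE_VERSION = (2, 0, 4)
TARGET_VERSION = (3, 0, 0)


def parse_version(text):
    """Turn a dotted version string such as '2.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def can_upgrade_to_3_0_0(current_version):
    """Return True if current_version satisfies the 2.0.4-or-higher rule."""
    return MIN_SOURCE_VERSION <= parse_version(current_version) < TARGET_VERSION
```

For example, can_upgrade_to_3_0_0("2.0.3") is False, so such a cluster would first need to move to a newer 2.x release.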
Warning
Tables that were created prior to upgrading to CrateDB 2.x will not function with 3.0 and must be recreated before moving to 3.0.x.
While still running a 2.x release, you can recreate a table by exporting it with COPY TO and importing the data into a new table with COPY FROM, or by inserting the data into a new table using INSERT with a query.
Before upgrading, you should back up your data.
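The two ways of recreating a table could be scripted roughly as follows. This is a sketch under assumptions: the function, table names, column list, and export directory are all hypothetical, the new table is assumed to exist already with the desired schema, and no identifier quoting or escaping is done. Check the generated statements against the COPY TO and COPY FROM reference for your 2.x release before running them.

```python
def recreate_statements(old_table, new_table, columns, export_dir):
    """Build SQL strings for both migration paths (export/import, or insert from query)."""
    column_list = ", ".join(columns)
    # Path 1: export the old table to a directory, then import into the new table.
    export_import = [
        f"COPY {old_table} TO DIRECTORY '{export_dir}'",
        f"COPY {new_table} FROM '{export_dir}/*'",
    ]
    # Path 2: copy rows directly with INSERT and a query.
    insert_from_query = (
        f"INSERT INTO {new_table} ({column_list}) "
        f"(SELECT {column_list} FROM {old_table})"
    )
    return export_import, insert_from_query
```

Only one of the two paths is needed for a given table. The export directory should contain nothing but the exported files, because the import uses a wildcard.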
Changelog
Breaking Changes
Dropped support for tables that were created with CrateDB prior to version 2.0. Tables that require upgrading are flagged by the cluster checks, and shown in the Admin UI, if you are running the latest 2.2 or 2.3 release. Tables must be upgraded before updating CrateDB to this version. This can be done by exporting the data with COPY TO and importing it into a new table with COPY FROM. Alternatively, you can use INSERT with a query.
Data paths as defined in path.data must not contain the cluster name as a folder. Data paths that are not compatible with this version are flagged by the node checks, and shown in the Admin UI, if you are running the latest 2.2 or 2.3 release.
The region setting for CREATE REPOSITORY has been removed. It is automatically inferred but can still be manually specified by using the endpoint setting.
Store level throttling settings indices.store.throttle.* have been removed.
The gateway recovery table setting recovery.initial_shards has been removed. Nodes will recover their unassigned local primary shards immediately after restart.
The discovery setting discovery.type has been removed. To enable EC2 discovery, the discovery.zen.hosts_provider setting must be set to ec2.
Dropped support for reading AWS credentials used for S3 and EC2 discovery from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as well as the Java system properties aws.accessKeyId and aws.secretKey.
EC2 cloud.aws.* settings have been renamed to discovery.ec2.*.
The setting that controls system call filters, bootstrap.seccomp, has been renamed to bootstrap.system_call_filter.
The columns number_of_shards, number_of_replicas, and self_referencing_column_name in information_schema.tables changed to return NULL for non-sharded tables.
Adapted queries in the Admin UI to be compatible with CrateDB 3.0 and greater.
For HTTP authentication, support was dropped for the X-User header, used to provide a username, which was deprecated in 2.3.0 in favour of the standard HTTP Authorization header.
The error_trace GET parameter of the HTTP endpoint only allows true and false in lower case. Other values are no longer allowed and will result in a parsing exception.
The _node column on sys.shards and sys.operations has been renamed to node, is now visible by default, and has been trimmed to only include node['id'] and node['name']. To get all node information, use a join query with sys.nodes.
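Clients that build the error_trace query parameter from user input may want to validate it before sending a request. The sketch below mirrors the rule as stated in the list above. It is a hypothetical client-side guard, not CrateDB's own parser, and the server reports its own parsing exception for invalid values.

```python
def parse_error_trace(raw_value):
    """Accept only the lower-case literals 'true' and 'false'."""
    if raw_value == "true":
        return True
    if raw_value == "false":
        return False
    # Anything else (for example 'TRUE', '1' or 'yes') is rejected.
    raise ValueError(f"invalid error_trace value: {raw_value!r}")
```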
Changes
CrateDB is now based on Elasticsearch 6.1.4 and Lucene 7.1.0.
Multiple Admin UI improvements.
Added a new tab for views in the Admin UI which lists available views and their properties.
Updated the bundled CrateDB Shell (crash) to version 0.24.0 which adds support for default schema for connections.
Added support in the PostgreSQL Wire Protocol’s SimpleQuery mode to process a query string which contains multiple queries delimited by semicolons.
Added support for the DEALLOCATE statement which is used by certain PostgreSQL Wire Protocol clients (e.g. libpq) to deallocate a prepared statement and release its resources.
Added support for ordering on analysed columns and partition columns.
Added support for views which can be created using the new CREATE VIEW statement and dropped using the DROP VIEW statement. Views are listed in information_schema.views and they show up in information_schema.tables as well as information_schema.columns.
Enterprise: Added the VIEW privilege class which can be used to grant/deny access to views.
Added support for INSERT INTO ... ON CONFLICT DO NOTHING. The statement ignores insert values which would cause duplicate keys.
Added support for the ON CONFLICT clause in insert statements. INSERT INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = val is identical to INSERT INTO ... ON DUPLICATE KEY UPDATE col = val. The special EXCLUDED table can be used to refer to the insert values: INSERT INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = EXCLUDED.col
DEPRECATED: The ON DUPLICATE KEY UPDATE clause has been deprecated in favor of the ON CONFLICT DO UPDATE SET clause.
Implemented the Block Hash Join algorithm which is now used for Equi-Joins.
Added new sys.health system information table to expose the health of all tables and table partitions.
Added new cluster.routing.allocation.disk.watermark.flood_stage setting, which controls the disk usage at which indices become read-only to prevent running out of disk space. There is also a new node check that indicates whether the threshold is exceeded.
Added a new bengali language analyzer and a bengali_normalization token filter.
Added max_token_length parameter to the whitespace tokenizer.
Added new tokenizers simple_pattern and simple_pattern_split which tokenize text for the fulltext index by a regular expression pattern.
Added support for CSV file inputs in COPY FROM statements. Input type is inferred using the file’s extension or can be set using the optional WITH clause and specifying the format.
Fully qualified column names including a schema name will no longer match on table aliases.
The default user if enterprise is disabled changed from null to crate. This causes entries in sys.jobs to show up with crate as username. Functions like user will also return crate if enterprise is enabled but the user module is not available.
Display the node information (name and id) of jobs in the sys.jobs table.
Changed the primary key constraints of the information schema tables table_constraints, referential_constraints, table_partitions, key_column_usage, columns, and tables to be SQL compliant.
Arrays can now contain mixed types if they’re safely convertible. JSON libraries tend to encode values like [0.0, 1.2] as [0, 1.2], which previously caused an error because of the strict type match that was enforced.
Implemented constraint_schema and table_schema in information_schema.key_column_usage correctly and documented the full table schema.
Statistics for jobs and operations are enabled by default. If you don’t need any statistics, please set stats.enabled to false.
Changed BEGIN and SET SESSION to no longer require DQL permissions on the CLUSTER level.
Added epoch argument to the EXTRACT function which returns the number of seconds since Jan 1, 1970. For example: extract(epoch from '1970-01-01T00:00:01') returns 1.0 seconds.
Enabled logging of JVM garbage collection times, which helps to debug memory pressure and garbage collection issues. GC log files are stored separately from the standard CrateDB logs and the files are log-rotated.
CrateDB will now by default create a heap dump in case of a crash caused by an out of memory error. This makes it necessary to account for the additional disk space requirements.
Implemented a Ready node status JMX metric expressing if the node is ready for processing SQL statements.
Implemented a NodeInfo JMX MBean to expose useful information (id, name) about the node.
Fixed path of log file name in rotation pattern in log4j2.properties. It now writes into the correct logging directory instead of the parent directory.
ALTER TABLE <name> OPEN will now wait for all shards to become active before returning to be consistent with the behaviour of other statements.
Added note about the newly available JMX HTTP Exporter to the monitoring documentation section.
The first argument (field) of the EXTRACT function has been limited to string literals and identifiers, as it was documented.
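The mixed-type array change in the list above is easiest to see from the client side. The sketch below uses Python's json module to show how a payload such as [0, 1.2] decodes into elements of two different types, and how they can be widened to one common type. The widen_numeric_array helper is hypothetical and only illustrates the idea of a safe conversion; it does not reproduce CrateDB's actual conversion rules.

```python
import json


def widen_numeric_array(values):
    """Convert a mixed int/float list to floats; leave any other list untouched."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric and any(isinstance(v, float) for v in values):
        return [float(v) for v in values]
    return values


# A JSON encoder that drops the trailing ".0" produces this payload.
decoded = json.loads("[0, 1.2]")
element_types = [type(v).__name__ for v in decoded]  # one int, one float
widened = widen_numeric_array(decoded)  # every element is now a float
```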
Upgrade Notes
Configuration Changes
There are a few configuration changes that you should be aware of before restarting the nodes.
Removed Settings
All store level throttle settings (under indices.store.throttle.*) have been removed, and should be removed from your node configuration.
Similarly, the recovery.initial_shards configuration option has been removed, and should also be removed from your configuration.
Renamed Settings
The discovery.type setting, which was previously used to specify whether a cluster should use DNS discovery or the EC2 API, has been removed. Configuring the use of the EC2 API has now been moved to the discovery.zen.hosts_provider setting.
The bootstrap.seccomp setting, which controls system call filters, has been renamed to bootstrap.system_call_filter.
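The removed and renamed settings described above can be applied to an existing configuration in one pass. The following is a minimal sketch, assuming the node configuration has already been loaded into a flat dictionary keyed by dotted setting names. The function name is hypothetical, and the old discovery.type value ec2 is an assumption. Any other discovery.type value is reported for manual review instead of being translated.

```python
REMOVED_KEYS = {"recovery.initial_shards"}
REMOVED_PREFIXES = ("indices.store.throttle.",)
RENAMED_KEYS = {"bootstrap.seccomp": "bootstrap.system_call_filter"}


def migrate_settings(settings):
    """Return (migrated, needs_review) for a flat dict of dotted setting names."""
    migrated = {}
    needs_review = []
    for key, value in settings.items():
        if key in REMOVED_KEYS or key.startswith(REMOVED_PREFIXES):
            continue  # removed in 3.0.0, so drop it
        if key == "discovery.type":
            if value == "ec2":  # assumed old value for EC2 discovery
                migrated["discovery.zen.hosts_provider"] = "ec2"
            else:
                needs_review.append(key)  # not covered by these notes
            continue
        # Apply a rename if one is known, otherwise keep the key as it is.
        migrated[RENAMED_KEYS.get(key, key)] = value
    return migrated, needs_review
```

The sketch does not handle the cloud.aws.* settings, because it is not clear from these notes that every key maps onto discovery.ec2.* by a simple prefix swap.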
Altered Settings
The path.data setting specifies the path or paths where the CrateDB node should store its table data and cluster metadata.
In CrateDB 3.0.0 and later, this path must not contain the cluster name as a directory. For example, if you have set cluster.name: abcdef, the setting path.data: /mnt/abcdef/data would be incompatible. Moving or renaming the directory, such as to /mnt/data, and altering your path.data setting accordingly will allow you to continue using the node’s data.
Data paths that are incompatible with 3.0.0 will be indicated visually in the Admin UI if you are running the latest 2.2.x or 2.3.x release.
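The path.data rule can also be checked in a script before the upgrade, for instance when the Admin UI is not at hand. The sketch below is a hypothetical helper that assumes POSIX-style paths and treats a path as incompatible when any of its directory components equals the cluster name. It approximates the rule as described here and may not match the node check exactly.

```python
from pathlib import PurePosixPath


def data_path_contains_cluster_name(data_path, cluster_name):
    """Return True if cluster_name appears as a directory component of data_path."""
    return cluster_name in PurePosixPath(data_path).parts
```

With cluster.name: abcdef, the helper returns True for /mnt/abcdef/data and False for /mnt/data. A directory that merely starts with the cluster name, such as /mnt/abcdefg/data, is not flagged.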
Other Changes
The CREATE REPOSITORY statement for creating backup repositories has been changed.
Previously, when using Amazon S3 for backup storage, bucket regions had to be configured explicitly. Bucket regions are now inferred automatically.
If you want to override this, you can use the endpoint parameter.
Previously, the X-User HTTP header could be used to provide a username. This header was deprecated in 2.3.0 and is no longer supported. Use the standard HTTP Authorization header instead.
The _node column in the sys.shards and sys.operations tables has been renamed to node.
Additionally, the node object now only includes the id and name of the node, i.e. node['id'] and node['name'].
To get the full node information, use node['id'] to join the sys.nodes table.
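For clients that previously sent the X-User header, the replacement is an Authorization header. The sketch below builds one using the HTTP Basic scheme with Python's standard library. The function name is hypothetical, and the choice of the Basic scheme is an assumption, since these notes only refer to the standard header. Whether a password is required depends on how authentication is configured on the cluster.

```python
import base64


def basic_auth_header(username, password=""):
    """Build an HTTP Basic Authorization header for the given credentials."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be merged into the headers of whichever HTTP client library is in use.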