Version 2.0.0¶
Released on 2017/05/16.
Warning
CrateDB 2.x versions prior to 2.0.4 (including this version) contain a critical bug that deletes blob data upon node shutdown. It is recommended not to install those versions.
Changelog¶
Breaking Changes¶
To accommodate user-defined functions, some new reserved keywords have been added to the CrateDB SQL dialect:
RETURNS, CALLED, REPLACE, FUNCTION, LANGUAGE, INPUT
The license.enterprise setting is set to true by default. This enables the CrateDB Enterprise Edition. Enabling this setting requires a valid enterprise license for production use.
If you disable this setting, CrateDB will run with the standard feature set.
All custom node.* style attributes must now be written as node.attr.* to distinguish them from attributes that CrateDB uses internally. Consult the node attribute docs for information.
The node.client setting has been removed.
The default value of the node.attr.max_local_storage_nodes node setting has been changed to 1 to prevent running multiple nodes on the same data path by default.
Previous versions of CrateDB defaulted to allowing up to 50 nodes running on the same data path. This was confusing when users accidentally started multiple nodes and concluded that they had lost data, because the second node starts with an empty directory.
Running multiple nodes on the same data path tends to be an exception, so this is a safer default.
Parsing support of time values has been changed:
The unit w representing weeks is no longer supported.
Fractional time values (e.g. 0.5s) are no longer supported. For example, when setting timeouts, 0.5s will be rejected and should instead be written as 500ms.
The already unused path.work node setting has been removed.
The node setting bootstrap.mlockall has been renamed to bootstrap.memory_lock.
The keyword_repeat and type_as_payload built-in token filters have been removed.
The classic built-in analyzer has been removed.
The shard balance related cluster settings cluster.routing.allocation.balance.primary and cluster.routing.allocation.balance.replica have been removed.
Some recovery related cluster settings have been removed or replaced:
The indices.recovery.concurrent_streams cluster setting is now superseded by cluster.routing.allocation.node_concurrent_recoveries.
The indices.recovery.activity_timeout cluster setting has been renamed to indices.recovery.recovery_activity_timeout.
The following recovery cluster settings have been removed:
indices.recovery.file_chunk_size
indices.recovery.translog_ops
indices.recovery.translog_size
indices.recovery.compress
Logging is now configured by log4j2.properties instead of logging.yml.
The plugin interface has changed: injecting classes on shard or index levels is no longer supported.
It’s no longer possible to run CrateDB as the Unix root user.
Some translog related table settings have been removed or replaced:
The index.translog.interval, translog.disable_flush and translog.flush_threshold_period table settings have been removed.
The index.translog.sync_interval table setting doesn’t accept a value less than 100ms, which prevents fsyncing too often if async durability is enabled. The special value 0 is no longer supported.
The index.translog.flush_threshold_ops table setting is not supported anymore. To control flushes based on transaction log growth, use index.translog.flush_threshold_size instead.
The COPY FROM statement now requires column names to be quoted in the JSON file being imported.
Queries on columns with INDEX OFF will now fail instead of always resulting in an empty result.
Configuration support using system properties has been dropped.
It’s no longer possible to use Hadoop 1.x as a repository for snapshots.
Changed default bind and publish address from 0.0.0.0 to the system loopback addresses, which will result in CrateDB listening only to local ports.
The discovery.ec2.ping_timeout setting has been removed and the discovery.zen.ping_timeout setting is now also used for EC2 discovery.
The monitor.jvm.gc.[old|young].[debug|info|warn] settings used to configure logging of garbage collection have been renamed (adding collector) to monitor.jvm.gc.collector.[old|young].[debug|info|warn].
Recovery timeout settings changes:
indices.recovery.retry_internal_action_timeout has been renamed to indices.recovery.internal_action_timeout
indices.recovery.retry_internal_long_action_timeout has been renamed to indices.recovery.internal_action_long_timeout
indices.recovery.retry_activity_timeout has been renamed to indices.recovery.recovery_activity_timeout
The thread pool settings prefix has been changed from threadpool to thread_pool. E.g.: thread_pool.<name>.type.
The cluster name is not part of the effective path where data is stored anymore.
The blobs data directory layout has changed.
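Because COPY FROM now requires quoted column names, it can help to check import files before loading them. The following is a minimal sketch, assuming the import file contains one JSON object per line; the helper name find_invalid_json_lines is hypothetical and not part of CrateDB. It relies on Python's strict JSON parser, which rejects unquoted keys.

```python
import json


def find_invalid_json_lines(lines):
    """Return (line_number, reason) pairs for lines that are not strict JSON objects.

    Hypothetical helper: strict JSON parsing rejects unquoted column names,
    which is the case the COPY FROM change is about.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


sample = [
    '{"id": 1, "name": "foo"}',  # quoted column names: accepted
    '{id: 2, name: "bar"}',      # unquoted column names: reported
]
for number, reason in find_invalid_json_lines(sample):
    print(f"line {number}: {reason}")
```

In practice the lines would come from an open file object rather than a list, and any reported line would need its column names quoted before the import.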
Changes¶
Extended the subselect support.
Added support for host based authentication (HBA).
Added support for renaming tables using the ALTER ... RENAME TO ... statement.
Added support for CREATE USER and DROP USER.
Added support for opening and closing a table or single partition.
Information on the state of tables/partitions is now exposed by a new column closed on the information_schema.tables and information_schema.table_partitions tables.
Added full support for DISTINCT on queries where GROUP BY is present.
UDC pings will send licence.ident if defined from now on.
Added support for GROUP BY in combination with subselect. E.g.:
SELECT x, COUNT(*) FROM (SELECT x FROM t LIMIT 1) AS tt GROUP BY x;
Implemented hash sum scalar functions (MD5, SHA1). Please see sha1.
Various Admin UI improvements.
Added support for GROUP BY on joins.
Added support for user-defined functions.
Added JavaScript language for user-defined functions.
Added cluster check and warning for unlicensed usage of CrateDB Enterprise.
Added built-in fingerprint, keep_types, min_hash and serbian_normalization token filters.
Added a fingerprint built-in analyzer.
Upgraded to Elasticsearch 5.0.2.
Improved performance of blob stats computation by calculating them in an incremental manner.
Optimized performance of negation queries on NOT NULL columns. E.g.:
SELECT * FROM t WHERE not_null_col != 10
Updated documentation to indicate that it’s not possible to use object, geo_point, geo_shape, or array in the ORDER BY clause.
Removed psql.enabled and psql.port settings from sys.cluster because they were wrongly exposed in this table.
Use the region of the EC2 instance for EC2 discovery when neither cloud.aws.ec2.endpoint nor cloud.aws.region is specified, or when they do not resolve to a valid service endpoint.
It is now possible to restore an empty partitioned table.
Added validation that ORDER BY symbols are included in the SELECT list when DISTINCT is used.
Fixes¶
Fixed an issue which could result in queries being stuck if the thread pools are exhausted.
Fixed an issue which caused failing sys.snapshot queries if the data.path of an existing fs repository was not configured anymore.
Fixed that sys.snapshot queries hung instead of throwing an error if something went wrong.
Upgrade Notes¶
Daemon User¶
You can no longer run CrateDB as the superuser on Unix-like systems. You should
create a new crate user for running the CrateDB daemon.
Logging¶
The logging.yml file has been removed. You must migrate your logging
configuration to the new log4j2.properties file.
System Properties¶
You can no longer use the JAVA_OPTIONS or CRATE_JAVA_OPTS environment
variables to pass configuration to CrateDB itself, for example:
JAVA_OPTIONS=-Dcluster.name=crate
Or:
CRATE_JAVA_OPTS=-Dcluster.name=crate
Instead, you must pass these settings as options to the CLI tools.
You can continue to use the JAVA_OPTIONS and CRATE_JAVA_OPTS
environment variables to set general JVM properties and CrateDB specific JVM
properties, respectively.
Configuration Changes¶
Many configuration settings and files have been renamed or removed. You must review the Breaking Changes section above and update your setup as necessary.
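As an illustration of such a review, the sketch below applies the pure renames and removals listed under Breaking Changes to a flat dictionary of settings. The function migrate_settings and its data structures are hypothetical, not a CrateDB tool. It deliberately leaves out changes that are not one-to-one renames, such as node.* attributes and indices.recovery.concurrent_streams, which still need manual attention.

```python
# Settings renamed one-to-one, taken from the Breaking Changes list.
EXACT_RENAMES = {
    "bootstrap.mlockall": "bootstrap.memory_lock",
    "indices.recovery.activity_timeout": "indices.recovery.recovery_activity_timeout",
    "indices.recovery.retry_internal_action_timeout": "indices.recovery.internal_action_timeout",
    "indices.recovery.retry_internal_long_action_timeout": "indices.recovery.internal_action_long_timeout",
    "indices.recovery.retry_activity_timeout": "indices.recovery.recovery_activity_timeout",
}

# Settings whose prefix changed.
PREFIX_RENAMES = [
    ("threadpool.", "thread_pool."),
    ("monitor.jvm.gc.old.", "monitor.jvm.gc.collector.old."),
    ("monitor.jvm.gc.young.", "monitor.jvm.gc.collector.young."),
]

# Settings removed without a replacement.
REMOVED = {
    "node.client",
    "path.work",
    "cluster.routing.allocation.balance.primary",
    "cluster.routing.allocation.balance.replica",
    "indices.recovery.file_chunk_size",
    "indices.recovery.translog_ops",
    "indices.recovery.translog_size",
    "indices.recovery.compress",
    "discovery.ec2.ping_timeout",
}


def migrate_settings(settings):
    """Return (migrated_settings, removed_keys) for a flat settings dict."""
    migrated = {}
    removed = []
    for key, value in settings.items():
        if key in REMOVED:
            removed.append(key)
            continue
        if key in EXACT_RENAMES:
            key = EXACT_RENAMES[key]
        else:
            for old_prefix, new_prefix in PREFIX_RENAMES:
                if key.startswith(old_prefix):
                    key = new_prefix + key[len(old_prefix):]
                    break
        migrated[key] = value
    return migrated, removed


old = {"bootstrap.mlockall": True, "threadpool.search.type": "fixed", "path.work": "/tmp/work"}
new, dropped = migrate_settings(old)
print(new)      # renamed keys with their original values
print(dropped)  # keys that no longer exist in 2.0.0
```

Keys that are already in the new form pass through unchanged, so running the function twice gives the same result.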
SQL Changes¶
Several breaking changes were made to CrateDB’s SQL. This includes changes to time parsing, syntax changes, and new reserved keywords. You must review the Breaking Changes section above and update your client code as necessary.
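For the time parsing change, client code that generates time values can normalize them before sending them to CrateDB. The following is a rough sketch under the assumption that values use the units ms, s, m, h, d or w; the function normalize_time_value is hypothetical. It rewrites week values as days and fractional values as whole milliseconds.

```python
import re
from decimal import Decimal

# Milliseconds per unit; weeks are converted to days first, so "w" is not needed here.
_MS_PER_UNIT = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_TIME_VALUE = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h|d|w)$")


def normalize_time_value(value):
    """Rewrite a time value so it uses neither weeks nor fractions."""
    match = _TIME_VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    amount, unit = Decimal(match.group(1)), match.group(2)
    if unit == "w":
        # Weeks are no longer supported: express them in days.
        amount, unit = amount * 7, "d"
    if amount == amount.to_integral_value():
        return f"{int(amount)}{unit}"
    # Fractions are no longer supported: fall back to whole milliseconds.
    millis = amount * _MS_PER_UNIT[unit]
    if millis != millis.to_integral_value():
        raise ValueError(f"cannot express {value!r} in whole milliseconds")
    return f"{int(millis)}ms"


print(normalize_time_value("0.5s"))  # 500ms
print(normalize_time_value("2w"))    # 14d
print(normalize_time_value("30s"))   # 30s
```

Decimal is used instead of float so that values such as 0.1s convert to exactly 100ms.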
Bind Address¶
The default bind address has been changed from 0.0.0.0 to the loopback
address (meaning it will only be accessible on localhost). See
Hosts for more.
If you want to keep the original behaviour (i.e. bind to every available network interface) you must add the following line to your Configuration file:
network.host: 0.0.0.0
Note
If you bind to a network reachable IP address, you must follow the instructions in the new bootstrap checks guide.
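To decide whether a configured network.host value falls under this note, a deployment script could test whether the address is a loopback address. This is only a sketch using Python's standard ipaddress module; the function is_network_reachable is hypothetical, it handles IP literals only, and it assumes that any non-loopback address counts as network reachable.

```python
import ipaddress


def is_network_reachable(host):
    """Return True if host is an IP literal that is not a loopback address."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Host names and special values are out of scope for this sketch.
        raise ValueError(f"{host!r} is not an IP address literal") from None
    return not address.is_loopback


for host in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
    print(host, is_network_reachable(host))
```

The two loopback addresses report False; 0.0.0.0 and 192.168.1.10 report True, so a node configured with either of them should be prepared according to the bootstrap checks guide.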
Heap Size¶
If you have previously set or configured CRATE_MIN_MEM or CRATE_MAX_MEM
in your startup scripts or environment, you must remove both, and replace them
with a single variable CRATE_HEAP_SIZE. The CRATE_HEAP_SIZE variable sets both the minimum and maximum memory to
allocate, and should be set to whatever your previous CRATE_MAX_MEM was set
to.
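The replacement described above can be sketched as a small transformation of an environment mapping. The function migrate_heap_env is hypothetical. Falling back to CRATE_MIN_MEM when CRATE_MAX_MEM is absent is an assumption of this sketch, not something these notes prescribe.

```python
def migrate_heap_env(env):
    """Return a copy of env that uses CRATE_HEAP_SIZE instead of the old variables."""
    migrated = dict(env)
    min_mem = migrated.pop("CRATE_MIN_MEM", None)
    max_mem = migrated.pop("CRATE_MAX_MEM", None)
    if "CRATE_HEAP_SIZE" not in migrated:
        # Prefer the old maximum; using the minimum as a fallback is an assumption.
        heap_size = max_mem if max_mem is not None else min_mem
        if heap_size is not None:
            migrated["CRATE_HEAP_SIZE"] = heap_size
    return migrated


old_env = {"CRATE_MIN_MEM": "2g", "CRATE_MAX_MEM": "4g"}
print(migrate_heap_env(old_env))  # {'CRATE_HEAP_SIZE': '4g'}
```

An existing CRATE_HEAP_SIZE value is left untouched, and the input mapping is not modified.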
Cluster name in path data¶
The computation of the effective data directory path has changed so that
the cluster name is no longer part of the path. In previous versions it was
$PATH_DATA_DIR/$CLUSTER_NAME/nodes/ and now it is
$PATH_DATA_DIR/nodes/. A fallback still accepts the old data
structure, but it will be removed in future versions of CrateDB. Once it is
removed, you must either move the data directory to the new location or
change the path.data setting to point to the old location by appending
the cluster name to it (e.g. /data/ becomes
/data/yourclustername). As a consequence, multiple clusters can no longer
share the exact same path.data directory.
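The second option, pointing path.data at the old location, can be sketched as follows. The function effective_path_data is hypothetical. Preferring the new layout when both layouts exist is an assumption of this sketch.

```python
from pathlib import Path


def effective_path_data(path_data, cluster_name):
    """Suggest a path.data value that lets 2.0.0 find data written by older versions."""
    base = Path(path_data)
    new_layout = base / "nodes"                # $PATH_DATA_DIR/nodes/
    old_layout = base / cluster_name / "nodes" # $PATH_DATA_DIR/$CLUSTER_NAME/nodes/
    if new_layout.is_dir():
        return base  # data is already in the new layout
    if old_layout.is_dir():
        return base / cluster_name  # append the cluster name, as described above
    return base  # no existing data found: keep the configured value


print(effective_path_data("/data", "yourclustername"))
```

Moving the contents of the old directory up one level instead achieves the same result without changing path.data.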
Boolean Data Type¶
Tables that were created with CrateDB version 0.54.x or earlier and
that contain a column of type BOOLEAN must be re-created to be able to
perform all supported operations on that column.