CrateDB users should upgrade to 5.9.9, 5.8.6, or 5.7.6 to avoid a potential data loss issue that may occur when the limit on the maximum number of shards per node has been changed.
Issue description and impact
Consider a statement that increases the number of shards of a partition:
ALTER TABLE t1 PARTITION (a = 1) SET (number_of_shards = ?);
The operation silently fails and drops the entire partition if both of the following are true:
- routing.allocation.total_shards_per_node has been changed from the default, and
- number_of_shards is higher than what can be allocated under the limit previously configured by the user.
As a result, all data in the partition is deleted from the disk.
For more details, see https://github.com/crate/crate/issues/17278
Affected versions
All actively supported versions are affected by this issue:
5.9.8 and lower
5.8.5 and lower
5.7.5 and lower
Versions older than 5.7 are no longer supported. While they have not been specifically tested, it is likely that older versions are also affected. Users still on these versions should upgrade to 5.9.9.
To check whether any table or partition has this setting set to a positive value, run the following query. It returns true in the affected column if at least one does:
WITH
  shards_per_node AS (
    SELECT settings['routing']['allocation']['total_shards_per_node'] AS shards_per_node
    FROM information_schema.tables
    UNION ALL
    SELECT settings['routing']['allocation']['total_shards_per_node'] AS shards_per_node
    FROM information_schema.table_partitions
  )
SELECT SUM(IF (shards_per_node > 0, 1, 0)) > 0 AS affected
FROM shards_per_node;
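To see which tables and partitions the check above would flag, a query along the following lines may help. This is a minimal sketch: it reuses the settings path from the query above and assumes the information_schema columns table_schema, table_name, and partition_ident.

```sql
-- Tables with a positive per-node shard limit.
SELECT table_schema,
       table_name,
       CAST(NULL AS TEXT) AS partition_ident,  -- placeholder: rows from tables carry no partition identifier
       settings['routing']['allocation']['total_shards_per_node'] AS total_shards_per_node
FROM information_schema.tables
WHERE settings['routing']['allocation']['total_shards_per_node'] > 0
UNION ALL
-- Individual partitions with a positive per-node shard limit.
SELECT table_schema,
       table_name,
       partition_ident,
       settings['routing']['allocation']['total_shards_per_node'] AS total_shards_per_node
FROM information_schema.table_partitions
WHERE settings['routing']['allocation']['total_shards_per_node'] > 0;
```

Any row returned names a table or partition whose number of shards should not be increased until the cluster has been upgraded.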
Workaround
We recommend taking the following steps to prevent the issue from occurring:
- Do not change routing.allocation.total_shards_per_node
- If the value has been changed, do not increase the number of shards
- Temporarily remove DDL permissions from all admins by issuing the following command for each table and admin user:
REVOKE DDL ON TABLE mytable FROM myadmin
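Because the revocation is meant to be temporary, the privilege would need to be granted again once the cluster runs a fixed version. A minimal sketch, reusing the placeholder names mytable and myadmin from the command above:

```sql
-- Restore the DDL privilege after upgrading to a fixed version.
-- mytable and myadmin are placeholders; substitute the actual table and user names.
GRANT DDL ON TABLE mytable TO myadmin;
```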
Remediation
Upgrading to CrateDB 5.9.9, 5.8.6, or 5.7.6 prevents this issue from occurring. There is no remediation if data loss has already occurred.
Please reach out to CrateDB Support if you have any questions or issues with performing the steps above.
Please be aware that CrateDB 5.9.9 is currently unavailable in CrateDB Cloud due to a compatibility issue with the official Docker image. We are working on resolving this as soon as possible.