Skip to content
Login
Try for free
Login
Try for free
Features

Sharding

Sharding involves splitting datasets into smaller units called shards, which are stored on individual nodes within the cluster. When nodes are added, removed, or data distribution becomes uneven, CrateDB automatically redistributes the shards to maintain balanced data distribution. If the number of shards is not defined, CrateDB sets a default value based on the number of nodes in the cluster.

Working with the right number of shards per table allows to optimize the performance of your application by leveraging the most of the existing hardware – ideally you have the same number of shards per table on every node as the number of available CPUs on each node. If the number of shards is not defined, CrateDB sets a default value based on the number of available CPUs on the nodes of the cluster. In very heavy read/write scenarios, advanced configuration options help to avoid any bottlenecks, for example by defining dedicated master-eligible nodes that don’t hold data, but only the cluster configuration. 

In addition to partitioning, you can seamlessly scale your cluster horizontally to cope with any growing application workload, by creating more shards for more recent partitions. For example, in a constantly growing application, the next weekly or monthly partition can be split in more shards than the previous one to distribute the workload across a larger number of nodes. 

The ideal shard size is between 3-70 GB, depending on your use case. Smaller shard sizes help reduce the restore time of nodes in case of automatic failovers and recoveries.

CrateDB-Shard-size
CrateDB-Sharding
CREATE TABLE t1 (
name STRING
) CLUSTERED INTO 3 SHARDS;

Product documentation

Sharding

Shard allocation filtering

Sharding guide

Additional resources

On-demand webinar

Time-series data: from raw data to fast analysis in only three steps.

Timestamp: 9:32 – 14:33

CrateDB at Berlin Buzzwords 2023

When milliseconds matter: maximizing query performance in CrateDB.

Timestamp:  8:33 – 9:12

Blog

Guide to sharding and partitioning best practices in CrateDB

Sharding and partitioning are very important concepts when it comes to system scaling. When defining your strategy, you should account upfront for any future growth, given the significant burden of moving data and restructuring the tables.

Read more
connection between partitions, shards, and the Lucene index.

Need help with sizing your cluster?