Live Webinar: From Rockset to CrateDB

Register now
Skip to content
Features

Replication

Replication in CrateDB allows users to replicate data across multiple nodes in a cluster. Data is replicated at the shard level, and replica shards automatically step in as primary shards if the primary one becomes unavailable due to failures or maintenance. Maintaining at least two replicas is recommended to ensure high availability of the CrateDB cluster.

Replication helps increase performance with parallel data query and data availability. Read requests are broken down and executed in parallel across multiple shards on multiple nodes, massively improving read performance.

CrateDB offers multiple configuration options to find the optimal balance between shards, partitions, and replications:

CrateDB ensures efficient data synchronization from the primary shard to all replica shards, leveraging the append-only characteristic of Lucene segments. When a node failure results in the loss of a primary shard, a replica shard is automatically promoted to primary status. This system enhances both fault tolerance and the capacity for potent query execution across multiple nodes.

Read operations are executed on primary shards and their replicas to increase query distribution and enhance performance. CrateDB randomly assigns a shard when routing an operation, with the option to configure this behavior. Write operations follow a synchronous process across all active replicas.

Replication
CREATE TABLE t1 (
name STRING
) CLUSTERED INTO 3 SHARDS
WITH (“number_of_replicas” = '1');

Product documentation

Replication

Additional resources

CrateDB at Berlin Buzzwords 2023

When milliseconds matter: maximizing query performance in CrateDB.

Timestamp:  9:55 – 10:35

Need help with data replication?