Skip to content
Features

Partitioning

In CrateDB, tables can be split up into partitions. Each partition consists of one or more shards. Partitioning is essential for maintaining large tables, improving specific SQL operation performance, and optimizing queries by reducing the number of records to be searched. CrateDB automatically optimizes partitions by splitting them into a specified number of shards upon partition creation. A typical use-case for partitioning is organizing time-series data by month, quarter, or year. As soon as a unique combination of the partition key is inserted, a new partition gets automatically created.

CrateDB's partitioning strategy involves treating each table partition as a separate table with its own shards. It allows to choose the right partitioning scheme based on the use case and growing demands of an application. For example, the number of shards can be changed on the partitioned table, which will then define how many shards will be used for the next partition creation. This way, you can start with few shards per partition and scale up the number of shards for later partitions once traffic and ingest rates increase during the lifetime of your application.

Queries that contain filters only identify the relevant partitions and significantly lower the execution time as excluded partitions don’t have to be processed. Also deleting data from a partitioned table is cheap, as full partitions can be dropped. Furthermore, partitions can be closed like tables – all operations on closed partitions and tables are ignored and therefore ideal for archiving data that must not yet be deleted, but should not influence the execution of other queries.

The incremental backup strategy of CrateDB also works at the partition level, i.e. partitions can be backed up and restored individually.

CrateDB Partitioning
CREATE TABLE t1 (
name STRING,
month TIMESTAMP
) CLUSTERED INTO 3 SHARDS
PARTITIONED BY (month);

INSERT INTO t1 (name, month) VALUES ( 
('foo', '2023-01-01'),
('bar', '2023-02-01')
);

On-demand webinar

Time-series data: from raw data to fast analysis in only three steps.

CrateDB at Berlin Buzzwords 2023

When milliseconds matter: maximizing query performance in CrateDB.

Timestamp:  9:13 – 9:55

CrateDB Workshop 2023

Modeling Data in CrateDB

Timestamp:  39:27–49:20

Additional resources

Interested in learning more?