In CrateDB, tables can be split up into partitions. Each partition consists of one or more shards. Partitioning is essential for maintaining large tables, improving the performance of specific SQL operations, and optimizing queries by reducing the number of records to be searched. When a partition is created, CrateDB automatically splits it into the configured number of shards. A typical use case for partitioning is organizing time-series data by month, quarter, or year. As soon as a row with a new combination of partition key values is inserted, a new partition is created automatically.
CrateDB's partitioning strategy treats each table partition as a separate table with its own shards. This lets you choose the right partitioning scheme for your use case and for the growing demands of an application. For example, the number of shards can be changed on the partitioned table, which then defines how many shards are used for the next partition that is created. This way, you can start with few shards per partition and scale up the number of shards for later partitions once traffic and ingest rates increase during the lifetime of your application.
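As a minimal sketch of changing the shard count for future partitions, assuming the table t1 from the example on this page and an arbitrary target of 6 shards (the value is only illustrative):

```sql
-- Existing partitions keep their current shard count;
-- partitions created after this statement use the new value.
ALTER TABLE t1 SET (number_of_shards = 6);
```

Partitions that already exist are not resharded by this statement, so the change only takes effect as new partition key values arrive.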
Queries that filter on the partition column only touch the relevant partitions, which significantly lowers execution time because excluded partitions don't have to be processed. Deleting data from a partitioned table is also cheap, as full partitions can be dropped. Furthermore, partitions can be closed like tables: operations skip closed partitions, which makes closing ideal for archiving data that must not be deleted yet but should not influence the execution of other queries.
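The statements below sketch these three operations, assuming the table t1 from the example on this page, partitioned by month. The timestamp literals are illustrative, and depending on the CrateDB version the PARTITION clause may expect the partition value as epoch milliseconds rather than a string.

```sql
-- A filter on the partition column restricts the query to one partition.
SELECT name FROM t1 WHERE month = '2023-01-01';

-- A delete that matches every row of a partition can drop the partition as a whole.
DELETE FROM t1 WHERE month = '2023-01-01';

-- Close a partition to archive it; queries on t1 then skip it.
ALTER TABLE t1 PARTITION (month = '2023-02-01') CLOSE;

-- Reopen the partition when the data is needed again.
ALTER TABLE t1 PARTITION (month = '2023-02-01') OPEN;
```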
The incremental backup strategy of CrateDB also works at the partition level, i.e. partitions can be backed up and restored individually.
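A hedged sketch of a partition-level backup and restore follows. It assumes the table t1 from the example on this page and a snapshot repository that has already been created with CREATE REPOSITORY. The names backup_repo and snapshot_2023_01 are hypothetical placeholders.

```sql
-- Back up only the January partition (repository and snapshot names are assumptions).
CREATE SNAPSHOT backup_repo.snapshot_2023_01
  TABLE t1 PARTITION (month = '2023-01-01')
  WITH (wait_for_completion = true);

-- Restore that single partition from the snapshot.
RESTORE SNAPSHOT backup_repo.snapshot_2023_01
  TABLE t1 PARTITION (month = '2023-01-01')
  WITH (wait_for_completion = true);
```

A restore generally expects that the partition does not currently exist in the cluster, so this is typically run after the partition has been dropped.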
CREATE TABLE t1 (
  name STRING,
  month TIMESTAMP
) CLUSTERED INTO 3 SHARDS
PARTITIONED BY (month);

-- Each distinct month value creates its own partition.
INSERT INTO t1 (name, month) VALUES
  ('foo', '2023-01-01'),
  ('bar', '2023-02-01');
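To check which partitions the inserts created, one option is to query the information_schema.table_partitions table. This is a small sketch that assumes the table is named t1 as in the example above. The column name "values" is quoted because it collides with an SQL keyword.

```sql
-- Lists one row per partition of t1, with its key values and shard count.
SELECT partition_ident, "values", number_of_shards
FROM information_schema.table_partitions
WHERE table_name = 't1'
ORDER BY partition_ident;
```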
Time-series data: from raw data to fast analysis in only three steps.
CrateDB at Berlin Buzzwords 2023
When milliseconds matter: maximizing query performance in CrateDB.
Timestamp: 9:13 – 9:55
CrateDB Workshop 2023
Modeling Data in CrateDB
Guide to sharding and partitioning best practices in CrateDB
Sharding and partitioning are key concepts for scaling a system. When defining your strategy, you should account upfront for future growth, because moving data and restructuring tables later is a significant burden.