Partitioning

Manage massive tables efficiently with smart data segmentation.

As data volumes grow, managing large tables efficiently becomes critical for both performance and maintainability.
CrateDB uses partitioning to divide tables into smaller, manageable segments — called partitions — each consisting of one or more shards. This approach improves query performance, simplifies data lifecycle management, and makes it easier to retain or archive historical data without impacting current workloads. Partitioning is particularly useful for time-series or event-driven data, where new records continuously arrive while older data becomes less frequently accessed.

How partitioning works

In CrateDB, each table partition behaves like a separate table under the hood, with its own shards and metadata.

  • Automatic creation: When data with a new partition key value (for example, a new month) is inserted, CrateDB automatically creates a corresponding partition.
  • Shard-based efficiency: Each partition is divided into a defined number of shards, which CrateDB uses to parallelize ingestion and queries.
  • Flexible configuration: You can adjust the number of shards for future partitions to match your workload as it grows.
For instance, you might start with fewer shards per partition during initial deployment, and increase the shard count later as data ingestion or query volume scales up, ensuring efficient use of hardware over time.
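Automatic partition creation can be observed directly. The following is a minimal sketch: the table `sensor_readings` and its columns are hypothetical names chosen for illustration, while `information_schema.table_partitions` is the CrateDB system table that lists partitions.

```sql
-- Hypothetical table, partitioned by a month column.
CREATE TABLE sensor_readings (
    sensor_id TEXT,
    reading DOUBLE PRECISION,
    month TIMESTAMP WITH TIME ZONE
) CLUSTERED INTO 3 SHARDS
PARTITIONED BY (month);

-- Two distinct month values, so two partitions are created on insert.
INSERT INTO sensor_readings (sensor_id, reading, month) VALUES
    ('s1', 21.5, '2023-01-01'),
    ('s1', 22.1, '2023-02-01');

-- List the partitions and their shard counts.
SELECT partition_ident, "values", number_of_shards
FROM information_schema.table_partitions
WHERE table_name = 'sensor_readings'
ORDER BY partition_ident;
```

Under these assumptions the query should return one row per month, each reporting the three shards configured at table creation.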

Benefits of partitioning

Partitioning is a key technique for keeping performance consistent as datasets expand.

  • Accelerated queries: When a query filters on the partition column, CrateDB identifies the relevant partitions and searches only those, which can substantially reduce query latency.
  • Faster deletes: Removing old data is fast; an entire partition can be dropped in one operation without affecting other data.
  • Archiving support: You can close partitions that no longer need to be queried but must remain stored for compliance or audit purposes. Closed partitions are skipped by queries but remain on disk and can be reopened later.
  • Independent backup and restore: CrateDB’s incremental snapshots can target individual partitions, enabling selective backup and recovery as needed.
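The operations above might look as follows for a table `t1` partitioned by a `month` column, as in the CrateDB Partitioning example on this page. This is a sketch, not a complete retention setup: the repository `my_repo` and the snapshot name are hypothetical, and the repository would have to be created beforehand with `CREATE REPOSITORY`.

```sql
-- Partition pruning: the filter on the partition column means only the
-- February partition needs to be searched.
SELECT name FROM t1 WHERE month = '2023-02-01';

-- Selective backup of one partition (my_repo is a hypothetical repository).
CREATE SNAPSHOT my_repo.t1_2023_02
    TABLE t1 PARTITION (month = '2023-02-01')
    WITH (wait_for_completion = true);

-- Retention: a delete that matches a whole partition drops that partition.
DELETE FROM t1 WHERE month = '2023-01-01';

-- Archiving: close a partition that must be kept but no longer queried.
-- 1675209600000 is 2023-02-01 00:00:00 UTC in epoch milliseconds.
ALTER TABLE t1 PARTITION (month = 1675209600000) CLOSE;

-- Reactivate it later if the data is needed again.
ALTER TABLE t1 PARTITION (month = 1675209600000) OPEN;
```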

Common partitioning strategies

The most common partitioning strategy in CrateDB is time-based partitioning, which organizes data by time intervals such as month, quarter, or year. Example use cases:

  • Time-series data: Store recent data in active partitions (e.g., current month) and archive older partitions for compliance.
  • IoT analytics: Organize sensor data by time to optimize ingestion and queries.
  • Log management: Partition by date to efficiently drop or archive old logs.
Partitioning can also be based on categorical values (for example, customer, region, or device type) depending on the access patterns and business logic.
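One way to implement time-based partitioning is to derive the partition key from a timestamp with a generated column, so that applications only write the event time. The sketch below assumes a hypothetical `device_logs` table; the column names are illustrative.

```sql
-- Hypothetical log table partitioned by month.
CREATE TABLE device_logs (
    ts TIMESTAMP WITH TIME ZONE,
    device_id TEXT,
    message TEXT,
    -- Partition key computed from ts; never written by the client.
    month TIMESTAMP WITH TIME ZONE
        GENERATED ALWAYS AS date_trunc('month', ts)
) PARTITIONED BY (month);

-- Rows land in the January and February partitions respectively.
INSERT INTO device_logs (ts, device_id, message) VALUES
    ('2023-01-15T10:00:00Z', 'dev-1', 'boot'),
    ('2023-02-03T08:30:00Z', 'dev-1', 'heartbeat');
```

Choosing `'month'` versus `'day'` or `'year'` in `date_trunc` sets the partition granularity, which is a trade-off between the number of partitions and the size of each one.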

Partitioning and sharding

Each partition contains one or more shards, which are distributed across cluster nodes.
Together, partitioning and sharding form the foundation of CrateDB’s scalability, combining logical data separation (partitions) with physical data distribution (shards).

As your dataset grows, you can increase the shard count for new partitions. Existing partitions keep their shard count, so the change takes effect without downtime or rewriting existing data.
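As an illustration, the shard count for future partitions of the `t1` table from the CrateDB Partitioning example on this page could be raised as sketched below. The value 6 is an arbitrary choice for the example, not a sizing recommendation.

```sql
-- Applies to partitions created from now on; existing partitions are unchanged.
ALTER TABLE t1 SET (number_of_shards = 6);

-- A new month value creates a partition with the new shard count.
INSERT INTO t1 (name, month) VALUES ('baz', '2023-03-01');

-- Compare shard counts across old and new partitions.
SELECT partition_ident, "values", number_of_shards
FROM information_schema.table_partitions
WHERE table_name = 't1'
ORDER BY partition_ident;
```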

Why partitioning matters

  • Faster queries through partition pruning.
  • Simplified data lifecycle with easy retention and deletion.
  • Scalable growth by adjusting partition and shard parameters over time.
  • Operational flexibility to archive, back up, or restore data by partition.
  • Cost efficiency through selective storage and retention strategies.
CrateDB Partitioning
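-- Table with 3 shards per partition, partitioned by the month column.
CREATE TABLE t1 (
    name STRING,
    month TIMESTAMP
) CLUSTERED INTO 3 SHARDS
PARTITIONED BY (month);

-- Each distinct month value creates its own partition.
INSERT INTO t1 (name, month) VALUES
    ('foo', '2023-01-01'),
    ('bar', '2023-02-01');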

CrateDB architecture guide

This comprehensive guide covers all the key concepts you need to know about CrateDB's architecture. It will help you gain a deeper understanding of what makes it performant, scalable, flexible and easy to use. Armed with this knowledge, you will be better equipped to make informed decisions about when to leverage CrateDB for your data projects. 

