Storage

In today’s data-driven world, how you store, manage, and access your data determines the speed of insights, the cost of infrastructure, and the resilience of your operations. CrateDB’s storage architecture is built from the ground up to deliver real-time performance, high-throughput ingestion, and efficient data retention, whether you’re running telemetry at the edge, logs in a data centre, or analytics in the cloud.

Behind the scenes, CrateDB combines columnar and row storage, automatic sharding and partitioning, replication and backup, and features such as compression and data tiering to give you control over data volume, access speed, and cost. This unified storage layer lets you scale your analytics without sacrificing operational simplicity or performance.

Columnar & row-based storage

CrateDB uses a hybrid storage engine that supports both row-oriented and column-oriented data layouts.

Row storage is optimized for fast inserts and transactional workloads.
Columnar storage delivers high-performance aggregations and analytical queries.

CrateDB automatically leverages the right structure for your workload, combining ingestion speed with analytical power.

Sharding

To scale horizontally, CrateDB divides each table into multiple shards, distributing them across cluster nodes.
This design enables:

Parallel query execution and data ingestion.
Automatic load balancing and fault isolation.
Seamless scalability: simply add nodes to increase capacity.

Sharding happens automatically, but you retain full control over shard count and allocation for advanced tuning.
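
To make the idea concrete, here is a minimal Python sketch of hash-based shard routing. It is a toy model, not CrateDB's actual routing or allocation logic: the function names, the CRC32 hash, the round-robin placement, and the node names are all assumptions chosen for illustration.

```python
import zlib
from collections import Counter


def shard_for(routing_key: str, num_shards: int) -> int:
    """Map a routing key to a shard id with a stable hash (illustrative only)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def place_shards(num_shards: int, nodes: list[str]) -> dict[int, str]:
    """Assign shards to nodes round-robin; a stand-in for real shard allocation."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


NUM_SHARDS = 6  # hypothetical shard count for one table
placement = place_shards(NUM_SHARDS, ["node-1", "node-2", "node-3"])

# Count how many of 10,000 hypothetical rows would land on each node.
rows_per_node = Counter(
    placement[shard_for(f"sensor-{i}", NUM_SHARDS)] for i in range(10_000)
)
print(rows_per_node)
```

Because every row maps to exactly one shard and shards are spread over the nodes, reads and writes can be handled by several nodes in parallel. Adding a node gives the cluster another place to put shards.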

Partitioning

Partitioning allows you to organize large tables based on time or value, improving manageability and performance.

Efficiently drop or archive old partitions to optimize retention.
Improve query performance on time-filtered datasets.
Enable data tiering by assigning partitions to different storage classes.

CrateDB makes partitioning simple and transparent, ensuring you can manage data lifecycles at scale.
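
The following Python sketch illustrates why partitioning helps with retention and time-filtered queries. It is a simplified in-memory model under stated assumptions (monthly partitions, the hypothetical `PartitionedTable` class), not a description of how CrateDB stores partitions.

```python
from collections import defaultdict
from datetime import date


def partition_key(day: date) -> str:
    """Derive a monthly partition key such as '2024-03' (assumed scheme)."""
    return f"{day.year:04d}-{day.month:02d}"


class PartitionedTable:
    """Toy table that groups rows by month."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, day: date, value: float) -> None:
        self.partitions[partition_key(day)].append((day, value))

    def drop_before(self, cutoff_key: str) -> list[str]:
        """Drop whole partitions older than cutoff_key; no row-by-row deletes."""
        old = sorted(key for key in self.partitions if key < cutoff_key)
        for key in old:
            del self.partitions[key]
        return old

    def scan(self, first_key: str, last_key: str) -> list[tuple]:
        """Read only the partitions inside the requested range."""
        return [
            row
            for key in sorted(self.partitions)
            if first_key <= key <= last_key
            for row in self.partitions[key]
        ]


table = PartitionedTable()
table.insert(date(2024, 1, 15), 20.5)
table.insert(date(2024, 2, 3), 21.0)
table.insert(date(2024, 3, 9), 19.8)

dropped = table.drop_before("2024-02")   # retention: remove January in one step
recent = table.scan("2024-03", "2024-03")  # time filter touches one partition
```

Dropping a partition removes a whole group of rows at once, and a query restricted to March never has to look at February's data.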

Replication

CrateDB protects your data through replication, maintaining multiple copies across nodes for high availability and fault tolerance.

If a node fails, a replica on another node takes over, so data stays available and analytics continue without interruption as long as a copy of each affected shard survives. Replication also supports maintenance operations and upgrades without downtime.
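
As a rough illustration of failover, the Python sketch below promotes a replica when the node holding a primary shard copy disappears. The `Shard` class and `fail_node` function are hypothetical names for this toy model; CrateDB's real recovery process is more involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shard:
    primary: Optional[str]                 # node holding the primary copy
    replicas: list[str] = field(default_factory=list)  # nodes holding replicas


def fail_node(shards: dict[int, Shard], failed: str) -> list[int]:
    """Remove a failed node and promote replicas where it held the primary.

    Returns the ids of shards that have no surviving copy.
    """
    lost = []
    for shard_id, shard in shards.items():
        shard.replicas = [node for node in shard.replicas if node != failed]
        if shard.primary == failed:
            if shard.replicas:
                shard.primary = shard.replicas.pop(0)  # promote first replica
            else:
                shard.primary = None
                lost.append(shard_id)
    return lost


# Hypothetical three-node layout with one replica per shard.
shards = {
    0: Shard("node-1", ["node-2"]),
    1: Shard("node-2", ["node-3"]),
    2: Shard("node-3", ["node-1"]),
}
lost = fail_node(shards, "node-2")
```

In this layout every shard still has a live primary after `node-2` fails, so `lost` is empty. With zero replicas configured, the same failure would leave a shard without any copy.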

Compression

CrateDB’s built-in columnar compression minimizes disk space while improving I/O efficiency, allowing you to store more data on the same hardware.
Compression happens automatically, enabling you to reduce costs and query faster, even with massive, long-term datasets.
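
To show why column-oriented data tends to compress well, here is a small Python sketch using run-length encoding. Run-length encoding is only an example technique chosen for illustration; this page does not say which codecs CrateDB uses, and the sample rows are made up.

```python
from itertools import groupby


def rle_encode(column: list) -> list[tuple]:
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs: list[tuple]) -> list:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical rows: (status, temperature)
rows = [("ok", 21), ("ok", 21), ("ok", 22), ("warn", 22), ("warn", 22)]

# Storing one column contiguously puts similar values next to each other.
status_column = [status for status, _ in rows]
encoded = rle_encode(status_column)
```

Five status values shrink to two pairs here. Less data on disk also means fewer bytes to read when a query scans that column.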

Consistency & durability

CrateDB provides atomic, durable writes at the row level while offering eventual consistency for distributed search operations, balancing reliability and speed.

Every write operation is atomic at the row level: a change to a row is either applied in full or not at all.

Data is persisted through write-ahead logging and replicated for durability across nodes.
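
The idea behind write-ahead logging can be sketched in a few lines of Python. This is a toy model under stated assumptions: the hypothetical `TinyStore` class keeps its log in a Python list, which stands in for an append-only file on disk, and it says nothing about CrateDB's on-disk format.

```python
import json


class TinyStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, log=None):
        # The list stands in for a durable, append-only log file.
        self.log = log if log is not None else []
        self.data = {}
        # Recovery: replay every logged write in order.
        for entry in self.log:
            self._apply(json.loads(entry))

    def _apply(self, record: dict) -> None:
        self.data[record["key"]] = record["value"]

    def put(self, key: str, value) -> None:
        record = {"key": key, "value": value}
        self.log.append(json.dumps(record))  # 1. record the intent first
        self._apply(record)                  # 2. then update in-memory state


store = TinyStore()
store.put("sensor-1", 20.5)
store.put("sensor-1", 21.0)
store.put("sensor-2", 19.8)

# Simulate a restart: in-memory state is gone, only the log survives.
recovered = TinyStore(log=list(store.log))
```

Because each write reaches the log before it changes any state, replaying the log after a crash rebuilds the same data. Copying the log to other nodes extends that protection beyond a single machine.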

Data tiering

Not all data needs to live in the same storage class. With Hot, Warm, and Cold tiers, CrateDB enables you to balance cost, performance, and retention.

Keep frequently accessed data in the Hot tier for instant queries, move older data to Warm or Cold tiers, and still query all of it using standard SQL.

Backup & restore

CrateDB’s snapshot-based backup system protects your data from loss or corruption.
You can create incremental backups, store them locally or in cloud repositories, and restore entire clusters, individual tables, or single partitions.

This ensures compliance with retention policies and keeps business continuity simple and reliable.

Why it matters

Real-time analytics at any scale: Handle continuous ingestion and complex queries with low latency.
Resilient by design: Built-in replication, durability, and backup support high availability and recoverability.
Cost-efficient growth: Compression and data tiering reduce infrastructure costs without losing access to historical data.
Operational simplicity: A single, unified storage engine eliminates the complexity of managing multiple systems or manual optimizations.