Storage
In today’s data-driven world, how you store, manage, and access your data determines the speed of insights, the cost of infrastructure, and the resilience of your operations. CrateDB’s storage architecture is built from the ground up to deliver real-time performance, high-throughput ingestion, and efficient data retention, whether you’re running telemetry at the edge, logs in a data centre, or analytics in the cloud.
Behind the scenes, CrateDB combines columnar and row storage, automatic sharding and partitioning, replication and backup, and features such as compression and data tiering to give you control over data volume, access speed, and cost. This unified storage layer lets you scale your analytics without sacrificing operational simplicity or performance.

Columnar & row-based storage
CrateDB uses a hybrid storage engine that supports both row-oriented and column-oriented data layouts.
- Row storage is optimized for fast inserts and for retrieving complete records.
- Columnar storage delivers high-performance aggregations and analytical queries.
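The difference between the two layouts can be illustrated with a minimal Python sketch. It is a conceptual model only, not CrateDB's on-disk format; the function name `to_columns` and the sample sensor readings are made up for illustration, and the sketch assumes every row has the same fields.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row layout: each record is kept together, so appending or fetching
# one complete record touches a single item.
rows = [
    {"sensor": "a", "temp": 20.5},
    {"sensor": "b", "temp": 22.0},
    {"sensor": "a", "temp": 19.5},
]

# Column layout: all values of one field are kept together, so an
# aggregation reads only the column it needs.
columns = to_columns(rows)
total_temp = sum(columns["temp"])
```

Summing `columns["temp"]` never touches the `sensor` values, which is the property that makes column-oriented layouts attractive for aggregations.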
Sharding
To scale horizontally, CrateDB divides each table into multiple shards, distributing them across cluster nodes.
This design enables:
- Parallel query execution and data ingestion.
- Automatic load balancing and fault isolation.
- Seamless scalability: simply add nodes to increase capacity.
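As a rough illustration of how rows might be spread across shards, the following Python sketch assumes hash-based routing on a key column. The names `shard_for` and `distribute`, the device ids, and the use of CRC32 are assumptions chosen for the example; this page does not specify the routing function CrateDB uses internally.

```python
import zlib
from collections import Counter


def shard_for(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard id in the range [0, num_shards)."""
    # CRC32 is a stand-in hash picked because it is deterministic across
    # runs; it is not claimed to be the hash CrateDB uses.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def distribute(routing_values, num_shards: int) -> Counter:
    """Count how many rows land on each shard."""
    return Counter(shard_for(value, num_shards) for value in routing_values)


# Hypothetical workload: 1000 devices routed into 6 shards.
counts = distribute([f"device-{i}" for i in range(1000)], num_shards=6)
```

Because the same routing value always maps to the same shard, a lookup by key can go straight to one shard, while scans and bulk ingestion can proceed on all shards in parallel.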
Partitioning
Partitioning allows you to organize large tables based on time or value, improving manageability and performance.
- Efficiently drop or archive old partitions to optimize retention.
- Improve query performance on time-filtered datasets.
- Enable data tiering by assigning partitions to different storage classes.
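The retention use case can be sketched in a few lines of Python. This is a simplified model that assumes monthly, time-based partitions; `month_partition`, `expired_partitions`, and the `keep_months` parameter are hypothetical names introduced for the example.

```python
from datetime import date


def month_partition(day: date) -> date:
    """Return the partition key for a row: the first day of its month."""
    return day.replace(day=1)


def expired_partitions(partitions, today: date, keep_months: int):
    """Return partition keys outside the retention window, oldest first.

    The window keeps the current month plus the `keep_months` months
    before it.
    """
    # Index months as year * 12 + month so the cutoff is plain arithmetic.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    return sorted(
        p for p in set(partitions) if p.year * 12 + (p.month - 1) < cutoff
    )


row_dates = [
    date(2024, 1, 5),
    date(2024, 1, 20),
    date(2024, 2, 10),
    date(2024, 3, 1),
    date(2024, 6, 2),
]
partitions = [month_partition(d) for d in row_dates]
to_drop = expired_partitions(partitions, today=date(2024, 6, 15), keep_months=3)
```

Dropping a whole partition removes all of its rows in one step, which is generally much cheaper than deleting the same rows one by one from an unpartitioned table.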
Replication
CrateDB protects your data through replication, maintaining multiple copies across nodes for high availability and fault tolerance.
If a node fails, a replica on another node takes over for the affected shards, so the data remains available and analytics can continue. Replication also allows maintenance operations and rolling upgrades without taking the cluster offline.
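The effect of losing a node can be modelled with a small Python sketch. It is a conceptual illustration under simplifying assumptions (a fixed mapping of shards to nodes, with the first entry treated as the primary copy); `fail_node` and `lost_shards` are hypothetical helpers, not CrateDB APIs.

```python
def fail_node(assignments, failed):
    """Return the shard assignments that remain after a node is lost.

    `assignments` maps shard id -> list of node names holding a copy;
    index 0 is treated as the primary.
    """
    survivors = {}
    for shard, nodes in assignments.items():
        # The first surviving copy acts as the primary. An empty list
        # means every copy of the shard lived on the failed node.
        survivors[shard] = [node for node in nodes if node != failed]
    return survivors


def lost_shards(assignments):
    """Return the ids of shards that have no surviving copy."""
    return sorted(shard for shard, nodes in assignments.items() if not nodes)


# Three shards, each with one primary and one replica on another node.
replicated = {0: ["n1", "n2"], 1: ["n2", "n3"], 2: ["n3", "n1"]}
after_failure = fail_node(replicated, "n2")

# The same failure without replicas leaves shard 1 unavailable.
unreplicated = {0: ["n1"], 1: ["n2"], 2: ["n3"]}
```

In the replicated layout every shard still has a copy after `n2` fails, whereas the unreplicated layout loses the shard that lived on that node.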
Compression
Data is compressed automatically on disk, which reduces storage costs and the amount of data queries have to read, even with massive, long-term datasets.
Consistency & durability
CrateDB provides atomic, consistent, and durable writes, while distributed search operations are eventually consistent. This combination balances reliability and speed.
Every write operation is atomic at the row level, ensuring that changes are either fully committed or rolled back.
Data is persisted through write-ahead logging and replicated for durability across nodes.
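The idea behind write-ahead logging can be shown with a toy key-value store in Python. This is a generic sketch of the technique, not CrateDB's implementation: `TinyStore` and its JSON-lines log format are invented for the example, and handling of a partially written final record after a crash is left out for brevity.

```python
import json
import os


class TinyStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from the log after a restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value
```

Because a change reaches the log before it is applied, a new `TinyStore` opened on the same log file reconstructs every acknowledged write by replaying the log in order.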
Data tiering
Not all data needs to live in the same storage class. With Hot, Warm, and Cold tiers, CrateDB enables you to balance cost, performance, and retention.
Keep frequently accessed data in the Hot tier for instant queries, move older data to Warm or Cold tiers, and still query all of it using standard SQL.
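A tiering policy of this kind can be expressed as a simple age-based rule. The Python sketch below is illustrative only: the function `tier_for` and the 30-day and 180-day thresholds are assumptions made for the example, not CrateDB defaults.

```python
from datetime import date


def tier_for(partition_start: date, today: date,
             hot_days: int = 30, warm_days: int = 180) -> str:
    """Pick a storage tier for a partition based on its age in days."""
    age_days = (today - partition_start).days
    if age_days <= hot_days:
        return "hot"
    if age_days <= warm_days:
        return "warm"
    return "cold"


today = date(2024, 6, 30)
# Hypothetical monthly partitions, keyed by their first day.
placement = {
    start: tier_for(start, today)
    for start in (date(2024, 6, 1), date(2024, 3, 1), date(2023, 6, 1))
}
```

A scheduled job could evaluate such a rule for each partition and move the partitions whose tier has changed, while queries keep addressing the table as a whole.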
Backup & restore
CrateDB’s snapshot-based backup system protects your data from loss or corruption.
You can create incremental backups, store them locally or in cloud repositories, and restore entire clusters, individual tables, or single partitions.
This ensures compliance with retention policies and keeps business continuity simple and reliable.
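The principle behind incremental snapshots is that each backup stores only the data the repository does not already hold. The following Python sketch models that idea with content hashes; it is a generic illustration, not CrateDB's snapshot format, and the names `incremental_snapshot`, `restore`, and the segment names are hypothetical.

```python
import hashlib


def incremental_snapshot(files, repository):
    """Store only content the repository does not hold yet.

    `files` maps file name -> bytes. `repository` maps content digest ->
    bytes and is updated in place. Returns (manifest, uploaded), where
    the manifest maps file name -> digest and `uploaded` counts the new
    blobs written by this snapshot.
    """
    manifest = {}
    uploaded = 0
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in repository:
            repository[digest] = content
            uploaded += 1
        manifest[name] = digest
    return manifest, uploaded


def restore(manifest, repository):
    """Rebuild the files recorded in a snapshot manifest."""
    return {name: repository[digest] for name, digest in manifest.items()}


repository = {}
first, first_uploaded = incremental_snapshot(
    {"seg1": b"aaa", "seg2": b"bbb"}, repository
)
second, second_uploaded = incremental_snapshot(
    {"seg1": b"aaa", "seg2": b"bbb", "seg3": b"ccc"}, repository
)
```

The second snapshot writes only the new segment, yet either manifest can be restored on its own because each one lists every file it needs.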
Why it matters
- Real-time analytics at any scale: Handle continuous ingestion and complex queries with low latency.
- Resilient by design: Built-in replication, durability, and backup support high availability and recoverability.
- Cost-efficient growth: Compression and data tiering reduce infrastructure costs without losing access to historical data.
- Operational simplicity: A single, unified storage engine eliminates the complexity of managing multiple systems or manual optimizations.