Distributed Database
CrateDB is a distributed database: data is stored across multiple nodes in a network (see also shared-nothing architecture). In a CrateDB cluster, data is distributed evenly through automatic rebalancing, and the distributed SQL query engine runs aggregations, JOINs, sub-selects, and ad-hoc queries at in-memory speed. CrateDB also integrates native full-text search, so you can store and query structured and unstructured data together. As a result, you no longer need separate SQL and search databases to manage tabular and non-tabular data.
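
To make the idea of even data distribution concrete, here is a minimal Python sketch of hash-based sharding. It is an illustration under stated assumptions, not CrateDB's actual implementation: the CRC32 hash, the round-robin placement, the shard count, and the node names are all hypothetical choices made for this example.

```python
import zlib
from collections import Counter

NUM_SHARDS = 6                          # hypothetical shard count for one table
NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard using a stable hash (CRC32, for illustration only)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def assign_shards(num_shards: int, nodes: list[str]) -> dict[int, str]:
    """Place shards on nodes round-robin so every node holds a similar number of shards."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


# Initial placement on three nodes.
placement = assign_shards(NUM_SHARDS, NODES)
shards_per_node = Counter(placement.values())

# Simulated rebalancing: a fourth node joins and shards are placed again.
rebalanced = assign_shards(NUM_SHARDS, NODES + ["node-4"])
rebalanced_per_node = Counter(rebalanced.values())

# A row is located by hashing its routing value, then looking up the shard's node.
row_shard = shard_for("sensor-42")
row_node = placement[row_shard]
```

Because the row-to-shard mapping depends only on the routing value and the shard count, adding a node in this sketch moves whole shards between nodes without changing which shard a row belongs to.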

Benefits of a distributed database
- Performance and availability
- Cost-effectiveness
- Scalability
- Fault tolerance
- Data consistency
- Flexibility
Distributed SQL queries
CrateDB uses ANSI SQL for querying and manipulating data. This reduces the learning curve and lets users focus on query logic rather than on the details of a distributed system or a proprietary query language. Users can also write user-defined functions to manipulate data.
SQL statements are translated into a series of processing steps that are optimized for efficiency. Execution is guided by a logical plan and a physical plan, which together describe how data is retrieved from the distributed nodes. The execution layer distributes these plans across the nodes for parallel processing, so query execution scales with the cluster.
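
The parallel processing described above can be pictured as a scatter-gather aggregation: each node computes a partial result over its local rows, and the node that received the query merges those partials. The Python sketch below illustrates that general technique for an AVG grouped by sensor. It is not CrateDB's execution engine; the node names, the sample rows, and every function name are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each list holds the rows stored on one node.
SHARDS = {
    "node-1": [{"sensor": "a", "temp": 20.0}, {"sensor": "b", "temp": 30.0}],
    "node-2": [{"sensor": "a", "temp": 22.0}],
    "node-3": [{"sensor": "b", "temp": 34.0}, {"sensor": "a", "temp": 24.0}],
}


def partial_aggregate(rows):
    """Runs per node: compute (sum, count) per sensor over local rows only."""
    partial = {}
    for row in rows:
        total, count = partial.get(row["sensor"], (0.0, 0))
        partial[row["sensor"]] = (total + row["temp"], count + 1)
    return partial


def merge_partials(partials):
    """Runs on the node handling the query: combine partials, then finalize AVG."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            merged_total, merged_count = merged.get(key, (0.0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


def distributed_avg(shards):
    """Scatter the partial aggregation to all nodes in parallel, then gather and merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, shards.values()))
    return merge_partials(partials)


# Roughly equivalent to: SELECT sensor, AVG(temp) FROM readings GROUP BY sensor
result = distributed_avg(SHARDS)
```

Sending (sum, count) pairs instead of raw rows keeps the data exchanged between nodes small, which is the main reason to split an aggregation into a partial step and a merge step.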
Additional resources
On-demand Workshop 2023
Introduction to CrateDB and its Architecture
Timestamp: 14:01–16:40
CrateDB at Berlin Buzzwords 2023
When milliseconds matter: maximizing query performance in CrateDB.
Timestamp: 1:00–1:28
Blog
Distributed query execution in CrateDB: What you need to know
Learn how CrateDB generates execution plans and how optimizations influence the order of operators.
