Distributed database architecture of CrateDB

CrateDB is a distributed database: data is stored across multiple nodes in a network (see also shared-nothing architecture). In a CrateDB cluster, automatic rebalancing keeps data evenly distributed, and the distributed SQL query engine performs aggregations, JOINs, sub-selects, and ad-hoc queries at in-memory speed. CrateDB also integrates native full-text search, so you can store and query structured and unstructured data together. As a result, you no longer need separate SQL and search databases to manage tabular and non-tabular data.
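To make the idea of spreading data over a cluster concrete, here is a minimal sketch of hash-based shard routing. It is an illustration under stated assumptions, not CrateDB's actual code: the shard count, the `shard_for` and `distribute` helpers, and the use of SHA-256 as the hash are all hypothetical choices made for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 6  # hypothetical shard count for a single table


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing key to a shard number using a stable hash (illustrative)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then wrap to the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards: int = NUM_SHARDS):
    """Count how many keys land on each shard."""
    counts = Counter(shard_for(key, num_shards) for key in keys)
    return [counts.get(shard, 0) for shard in range(num_shards)]


# Route 6,000 hypothetical sensor IDs and inspect the spread across shards.
shard_counts = distribute(f"sensor-{i}" for i in range(6000))
print(shard_counts)
```

Because the hash is stable, the same key always maps to the same shard, and a large set of distinct keys spreads roughly evenly. Moving shards between nodes as the cluster grows or shrinks is a separate step that this sketch does not model.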

Distributed SQL queries

CrateDB uses native SQL for querying and manipulating data, which reduces the learning curve and lets users focus on query logic rather than on the details of a distributed system or a proprietary query language. A key feature of CrateDB is its ability to handle large volumes of concurrent reads and writes efficiently, which is crucial in a distributed system.

Users can also write user-defined functions to manipulate data. Each SQL statement is translated into a series of processing steps that are optimized for efficiency: the engine builds a logical plan and a physical plan that guide data retrieval from the distributed nodes, and the execution layer distributes these plans across nodes for parallel processing.
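The idea of distributing a plan across nodes can be sketched with a generic scatter-gather aggregation. This is a simplified illustration, not CrateDB's execution engine: the in-memory `NODE_DATA`, the helper names, and the use of threads to stand in for cluster nodes are all assumptions. The sketch computes the equivalent of an `AVG(...) GROUP BY` by having each node produce partial sums and counts, which a coordinating step then merges.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (sensor, temperature) rows held by three separate nodes.
NODE_DATA = [
    [("a", 20.0), ("b", 30.0)],
    [("a", 22.0), ("b", 34.0), ("a", 24.0)],
    [("b", 32.0)],
]


def partial_aggregate(rows):
    """Runs on each node: compute a (sum, count) pair per group."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: tuple(pair) for key, pair in acc.items()}


def merge(partials):
    """Runs on the coordinating node: combine partials into final averages."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (part_sum, part_count) in partial.items():
            total[key][0] += part_sum
            total[key][1] += part_count
    return {key: s / c for key, (s, c) in total.items()}


def distributed_avg(node_data):
    """Scatter the partial step to all nodes in parallel, then gather and merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, node_data))
    return merge(partials)


result = distributed_avg(NODE_DATA)
print(result)
```

Shipping per-group sums and counts instead of raw rows keeps the amount of data exchanged between nodes small. An average cannot be merged from per-node averages alone, which is why the partial step returns both values.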

CrateDB's query engine was designed from the outset to optimize data throughput and query performance, especially as the number of concurrent operations grows. This keeps query execution effective and scalable in a distributed database environment, so users can extract insights from, and act on, very large datasets quickly and efficiently.

The engine combines advanced indexing techniques, real-time data ingestion, and real-time querying to deliver consistently high performance.