Features
Distributed Reads and Writes
CrateDB's approach to distributed reads and writes is crucial for its high performance. By leveraging advanced mechanisms such as columnar storage, efficient indexing per datatype, full-text indexes, and similarity search, CrateDB is able to achieve exceptional performance.
Distributed Reads
- Focus on distribution: CrateDB's design prioritizes distribution, leading to operations being split across shards and their replicas. This strategy accelerates aggregations by utilizing available hardware on individual nodes and distributing queries across all nodes.
- Aggregation approach: CrateDB's approach to aggregation involves four steps - collect, reshuffle, aggregate, and merge. The unique distribution layer in CrateDB reshuffles intermediate responses between nodes to speed up processing.
- Hash and distribute: Each node hashes the returned values and distributes each row to other nodes based on the hash value. This ensures that no other node has data with the same values, optimizing the execution of aggregations in a distributed environment.
Distributed Writes
- Sharding mechanism: CrateDB employs a sharding mechanism that distributes data across multiple nodes in a cluster. This optimally distributes the workload across available CPUs and enables parallel write operations.
- Lucene index: Each shard in CrateDB is associated with a Lucene index. When new data is written, it first goes to an in-memory buffer. Then upon reaching a specific size, this data is flushed and appended to a new Lucene segment on disk.
- Segment merging: CrateDB employs periodic segment merging as a background operation to optimize storage and performance. Merging smaller segments into larger ones reduces the overhead of managing multiple segments and enhances read efficiency.
- Replication: In addition to sharding, CrateDB replicates data at the shard level across various nodes, ensuring data availability and fault tolerance. Only when the data is written to the primary shard and replicated to the required number of replica shards, as per the defined replication factor, are write operations deemed successful.
CrateDB at Berlin Buzzwords 2023
When milliseconds matter: maximizing query performance in CrateDB.Timestamp: 1:00 – 1:28
CrateDB Architecture Guide
This comprehensive guide covers all the key concepts you need to know about CrateDB's architecture. It will help you gain a deeper understanding of what makes it performant, scalable, flexible and easy to use. Armed with this knowledge, you will be better equipped to make informed decisions about when to leverage CrateDB for your data projects.