Use of the Lucene engine in CrateDB for enhanced storage and indexing

CrateDB's fully distributed query engine is built on top of Apache Lucene®. The Lucene engine underpins CrateDB's core infrastructure for storage and indexing.

CrateDB uses Lucene to store tabular data in shards that are distributed evenly across the cluster and written in an append-only fashion. Lucene's indexes speed up SQL queries and add full-text and geospatial search, while the sharded storage model enables easy scaling and dynamic schemas.

In CrateDB, every table is sharded, meaning that tables are divided and distributed across the cluster nodes. Each shard in CrateDB is a Lucene index broken down into segments, which are physically stored in a directory accessible to the node that manages the shards. The append-only nature of these segments ensures data immutability on disk, simplifying tasks like data replication, data recovery, and shard synchronization.
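
The ideas in the paragraph above can be sketched in a few lines of Python. This is a toy model for illustration only, not CrateDB or Lucene code: the names Segment, Shard, shard_for and missing_segments are hypothetical, the CRC32-modulo routing is an assumed stand-in for the real routing function, and real Lucene also merges segments in the background, which this sketch ignores.

```python
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """An immutable batch of rows, standing in for a Lucene segment."""
    rows: tuple


@dataclass
class Shard:
    """A toy shard: an ordered list of immutable segments plus a write buffer."""
    segments: list = field(default_factory=list)
    buffer: list = field(default_factory=list)

    def add(self, row):
        # New rows are only buffered; no existing segment is touched.
        self.buffer.append(row)

    def refresh(self):
        # Freeze the buffered rows into a new segment appended at the end.
        if self.buffer:
            self.segments.append(Segment(tuple(self.buffer)))
            self.buffer = []


def shard_for(routing_value: str, num_shards: int) -> int:
    # Assumed routing rule: a stable hash of the routing value modulo
    # the number of shards, so rows spread roughly evenly.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def missing_segments(primary: Shard, replica: Shard) -> list:
    # Simplifying assumption: the replica holds a prefix of the primary's
    # segments. Since segments never change, only the tail must be copied.
    return primary.segments[len(replica.segments):]
```

In this model, synchronizing a replica means copying only the segments returned by missing_segments, because segments that were already copied can never have changed. That is the intuition behind why append-only segments simplify replication, recovery, and shard synchronization.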

White Paper

CrateDB: Architecture Guide

The unique architecture of CrateDB allows it to prioritize scalability, performance, and cost-efficiency at the same time, giving organizations across industries the ability to access the power of their data.

Blog

Indexing and Storage in CrateDB

In this article series, we look at CrateDB from different perspectives. We start from the bottom of CrateDB architecture and gradually move up to higher layers, presenting the most important aspects of CrateDB internals.

Blog

Guide to write operations in CrateDB

In this article, we go through the basic concepts of Lucene, such as segments and the refresh and flush operations, and introduce the translog, which guarantees that write operations are persisted to disk.