
Lucene Engine

CrateDB's fully distributed query engine is built on top of Apache Lucene®. The Lucene engine underpins CrateDB's core infrastructure for storage and indexing.

CrateDB uses Lucene to distribute tabular data evenly across the cluster in append-only shards. Lucene also extends SQL with full-text search and geospatial search, and supports easy scaling and dynamic schemas.
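To make "evenly distribute" concrete, here is a minimal sketch of hash-based routing, in which each row goes to the shard given by a hash of its routing value modulo the shard count. The helper names `shard_for` and `distribution`, the `sensor-N` keys, and the use of `zlib.crc32` are all assumptions made for illustration; this is not CrateDB's actual hash function or routing code.

```python
import zlib
from collections import Counter


def shard_for(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard id (hypothetical helper).

    zlib.crc32 is only a stand-in for a stable hash; it is an
    assumption for this sketch, not what CrateDB uses internally.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def distribution(keys, num_shards):
    """Count how many keys land on each shard, indexed by shard id."""
    counts = Counter(shard_for(key, num_shards) for key in keys)
    return [counts.get(shard_id, 0) for shard_id in range(num_shards)]


if __name__ == "__main__":
    # Hypothetical routing values, e.g. primary keys of a sensor table.
    keys = [f"sensor-{i}" for i in range(10_000)]
    print(distribution(keys, num_shards=6))
```

Because the hash is deterministic, the same routing value always maps to the same shard, so a lookup by key only needs to visit one shard.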

In CrateDB, every table is sharded, meaning that tables are divided and distributed across the cluster nodes. Each shard in CrateDB is a Lucene index broken down into segments, which are physically stored in a directory accessible to the node that manages the shards. The append-only nature of these segments ensures data immutability on disk, simplifying tasks like data replication, data recovery, and shard synchronization.
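The append-only behaviour described above can be illustrated with a toy model. In this sketch a shard buffers writes in memory, a refresh turns the buffer into a new immutable segment, and a merge compacts the segments into one. The `Segment` and `Shard` classes and the tombstone handling are assumptions made for illustration; they do not mirror Lucene's real data structures or on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """An immutable batch of (doc_id, payload) pairs (hypothetical model)."""
    docs: tuple


class Shard:
    """Toy append-only shard: segments are added or merged, never edited."""

    def __init__(self):
        self.segments = []  # oldest segment first
        self.buffer = {}    # writes not yet visible to reads

    def write(self, doc_id, payload):
        self.buffer[doc_id] = payload

    def delete(self, doc_id):
        # A delete is recorded as a tombstone; older segments stay untouched.
        self.buffer[doc_id] = None

    def refresh(self):
        """Freeze buffered writes into a new immutable segment."""
        if self.buffer:
            self.segments.append(Segment(tuple(self.buffer.items())))
            self.buffer = {}

    def get(self, doc_id):
        # The newest segment that mentions the document wins.
        for segment in reversed(self.segments):
            for seg_doc_id, payload in segment.docs:
                if seg_doc_id == doc_id:
                    return payload
        return None

    def merge(self):
        """Compact all segments into one, dropping stale versions and tombstones."""
        latest = {}
        for segment in self.segments:
            for doc_id, payload in segment.docs:
                latest[doc_id] = payload
        live = {d: p for d, p in latest.items() if p is not None}
        self.segments = [Segment(tuple(live.items()))] if live else []


if __name__ == "__main__":
    shard = Shard()
    shard.write("a", 1)
    shard.refresh()
    shard.write("a", 2)   # an update adds a new version in a new segment
    shard.refresh()
    print(len(shard.segments), shard.get("a"))
```

Since existing segments are never modified in this model, copying a shard to another node amounts to copying whole segments, which is the property that simplifies replication and recovery.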


Product documentation

Storage and consistency

Additional resources

White Paper

The unique architecture of CrateDB allows it to prioritize scalability, performance and cost-efficiency at the same time, giving the industry the ability to access the power of their data.


November 12, 2021
In this article series, we look at CrateDB from different perspectives. We start from the bottom of CrateDB architecture and gradually move up to higher layers, presenting the most important aspects of CrateDB internals. 


December 16, 2022
In this article, we go through the basic concepts of Lucene, such as segments and the refresh and flush operations, and introduce the translog, which guarantees that write operations are persisted to disk.

On-demand Workshop 2023

Introduction to CrateDB and its Architecture 
