
Lucene Engine

CrateDB's fully distributed query engine is built on top of Apache Lucene®. The Lucene engine underpins CrateDB's core storage and indexing infrastructure.

CrateDB uses Lucene to store tabular data in append-only shards that are distributed evenly across the cluster. Lucene enhances SQL queries with high-performance full-text and geospatial search, and it makes scaling and dynamic schemas easy to work with.
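To make "evenly distributed shards" concrete, here is a minimal sketch of hash-based routing. It assumes each row is routed by hashing a routing value (for example the column named in a table's `CLUSTERED BY` clause) modulo the number of shards. SHA-256 is a stand-in hash and the round-robin node assignment is a hypothetical simplification; CrateDB's actual routing hash and shard allocator differ in detail.

```python
import hashlib
from collections import Counter


def shard_for(routing_key: str, num_shards: int) -> int:
    """Map a routing value to a shard id (stand-in hash, not CrateDB's own)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_shards(num_shards: int, nodes: list[str]) -> dict[int, str]:
    """Spread shard ids over nodes round-robin (hypothetical allocation policy)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


if __name__ == "__main__":
    num_shards = 6
    nodes = ["node-1", "node-2", "node-3"]
    allocation = assign_shards(num_shards, nodes)
    # Route 10,000 hypothetical rows and count how many land on each shard.
    counts = Counter(shard_for(f"sensor-{i}", num_shards) for i in range(10_000))
    for shard in sorted(counts):
        print(f"shard {shard} on {allocation[shard]}: {counts[shard]} rows")
```

Because the same routing value always hashes to the same shard, a lookup by that value only needs to touch one shard, while a well-mixed hash keeps the row counts per shard roughly equal.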

In CrateDB, every table is sharded: each table is divided into shards that are distributed across the cluster nodes. Each shard is a Lucene index broken down into segments, which are physically stored in a directory accessible to the node that manages the shard. Because these segments are append-only, data on disk is immutable, which simplifies tasks such as data replication, data recovery, and shard synchronization.
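The following is a simplified model, not CrateDB or Lucene code, of why immutable segments make synchronization cheap. A shard only ever adds new segments (named `_0`, `_1`, ... as Lucene does), and a merge writes a brand-new segment before dropping the old ones. Since a segment with a given name never changes, a replica can catch up by copying just the segments it lacks and deleting the ones the primary has merged away. The class and function names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """A write-once batch of documents; frozen so it cannot be modified after creation."""
    name: str
    docs: tuple[str, ...]


@dataclass
class Shard:
    """A shard modeled as a set of immutable segments keyed by name."""
    segments: dict[str, Segment] = field(default_factory=dict)
    _next_id: int = 0

    def write_batch(self, docs: list[str]) -> Segment:
        # Always append a new segment; existing segments are never rewritten.
        seg = Segment(name=f"_{self._next_id}", docs=tuple(docs))
        self._next_id += 1
        self.segments[seg.name] = seg
        return seg

    def merge(self, names: list[str]) -> Segment:
        # Merging also creates a new segment, then drops the originals.
        docs = [d for n in names for d in self.segments[n].docs]
        merged = self.write_batch(docs)
        for n in names:
            del self.segments[n]
        return merged


def sync_replica(primary: Shard, replica: Shard) -> tuple[list[str], list[str]]:
    """Bring a replica up to date by comparing segment names only."""
    stale = [n for n in replica.segments if n not in primary.segments]
    for n in stale:
        del replica.segments[n]
    missing = [n for n in primary.segments if n not in replica.segments]
    for n in missing:
        # Safe to share: identical names guarantee identical, immutable content.
        replica.segments[n] = primary.segments[n]
    return missing, stale


if __name__ == "__main__":
    primary, replica = Shard(), Shard()
    primary.write_batch(["a", "b"])
    print(sync_replica(primary, replica))  # first sync copies _0
    primary.write_batch(["c"])
    print(sync_replica(primary, replica))  # only the new segment _1 is copied
    primary.merge(["_0", "_1"])
    print(sync_replica(primary, replica))  # copies merged _2, removes _0 and _1
```

No segment contents are ever compared: the name alone identifies the data, which is the property the append-only design buys.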


Product documentation

Storage and consistency

Additional resources

White Paper

CrateDB: Technical overview

CrateDB's unique architecture lets it prioritize scalability, performance, and cost-efficiency at the same time, giving organizations access to the full power of their data.


Indexing and Storage in CrateDB

In this article series, we look at CrateDB from different perspectives. We start from the bottom of CrateDB architecture and gradually move up to higher layers, presenting the most important aspects of CrateDB internals. 


Guide to write operations in CrateDB

In this article, we go through basic Lucene concepts such as segments and the refresh and flush operations, and introduce the translog, which guarantees that write operations are persisted to disk.
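As a rough illustration of how these pieces fit together, here is a toy model of a shard's write path under simplifying assumptions; `ToyShard`, its file names, and its JSON formats are hypothetical and are not CrateDB's implementation. Each write is appended to a translog and fsynced before it is acknowledged. A refresh turns buffered documents into a new searchable segment, and a flush durably commits all segments and then truncates the translog. After a crash, the shard reloads the last commit and replays the translog.

```python
import json
import os
import tempfile


class ToyShard:
    """Hypothetical model of the translog / refresh / flush cycle."""

    def __init__(self, data_dir: str) -> None:
        self.translog_path = os.path.join(data_dir, "translog.jsonl")
        self.commit_path = os.path.join(data_dir, "commit.json")
        self.buffer: list[dict] = []          # indexed, not yet searchable
        self.segments: list[list[dict]] = []  # searchable segments
        # Recovery: load the last durable commit, then replay the translog.
        if os.path.exists(self.commit_path):
            with open(self.commit_path) as f:
                self.segments = json.load(f)
        if os.path.exists(self.translog_path):
            with open(self.translog_path) as f:
                self.buffer = [json.loads(line) for line in f if line.strip()]

    def index(self, doc: dict) -> None:
        # Persist the operation to the translog before acknowledging it.
        with open(self.translog_path, "a") as f:
            f.write(json.dumps(doc) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.buffer.append(doc)

    def refresh(self) -> None:
        # Buffered docs become a new segment and are now visible to search.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        # Commit all segments durably; only then is the translog safe to clear.
        self.refresh()
        tmp = self.commit_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.segments, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.commit_path)
        open(self.translog_path, "w").close()  # truncate the translog

    def search(self, key: str, value) -> list[dict]:
        # Only refreshed segments are searched; the buffer is invisible.
        return [d for seg in self.segments for d in seg if d.get(key) == value]


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()
    shard = ToyShard(data_dir)
    shard.index({"id": 1})
    print(shard.search("id", 1))  # empty: not refreshed yet
    shard.refresh()
    print(shard.search("id", 1))  # visible after refresh
    recovered = ToyShard(data_dir)  # simulate a crash before any flush
    print(recovered.buffer)  # the write survives via translog replay
```

The key design point is ordering: the translog is cleared only after the commit has been fsynced, so every acknowledged write is always recoverable from one or the other.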

On-demand Workshop 2023

Introduction to CrateDB and its Architecture 
