Hybrid Index

CrateDB indexes all columns by default, putting lightning-fast query responses at your fingertips.

Overview

CrateDB, like Lucene, Elasticsearch, and Rockset, indexes all fields of stored documents by default, so that queries on any field can be answered from an index.

About

By default, CrateDB indexes all data in every field, and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.

The ability to use the per-field data structures to assemble and return search results is what makes CrateDB so fast.
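To make the idea of per-field data structures more concrete, here is a minimal Python sketch. It is a toy model, not CrateDB or Lucene code: the sample documents, field names, and helper functions are hypothetical, and a sorted list with binary search stands in for a BKD tree. It shows how a text condition and a numeric range condition can each be answered from their own index, with the final result assembled by intersecting the two sets of document ids.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: {"message": "disk almost full", "temperature": 71.5},
    2: {"message": "disk replaced", "temperature": 40.0},
    3: {"message": "fan speed low", "temperature": 82.0},
}


def build_inverted_index(docs, field):
    """Map each term of a text field to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc[field].lower().split():
            index[term].add(doc_id)
    return index


def build_sorted_index(docs, field):
    """Sorted (value, doc_id) pairs: a simplified stand-in for a BKD tree."""
    return sorted((doc[field], doc_id) for doc_id, doc in docs.items())


def range_lookup(sorted_index, low, high):
    """Return the ids of documents whose value lies in [low, high]."""
    values = [value for value, _ in sorted_index]
    start = bisect_left(values, low)
    end = bisect_right(values, high)
    return {doc_id for _, doc_id in sorted_index[start:end]}


message_index = build_inverted_index(DOCS, "message")
temperature_index = build_sorted_index(DOCS, "temperature")

# Roughly: message contains 'disk' AND temperature >= 70
matches = message_index["disk"] & range_lookup(temperature_index, 70.0, float("inf"))
print(sorted(matches))  # prints [1]
```

Neither lookup scans the documents themselves; each condition is resolved against the structure built for its field, and only the id sets are combined.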

Details

For a quick refresher on the technologies behind CrateDB's storage engine, we refer you to upstream documentation and articles about Lucene and Elasticsearch.

See also an article by Rockset, which describes the same indexing approach and presents it as a unique invention.

On disk, CrateDB stores data in Lucene indexes. By default, all fields are indexed, whether nested or not, but indexing can be turned off selectively for individual columns.
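As an illustration of turning indexing off selectively, the following Python sketch assembles a `CREATE TABLE` statement as a string. The table name, column names, and helper functions are hypothetical. The `INDEX OFF` and `INDEX USING fulltext WITH (analyzer = ...)` column clauses reflect CrateDB's `CREATE TABLE` syntax as we understand it, so please verify them against the reference manual for your version before relying on them.

```python
def column_clause(name, sql_type, indexed=True, fulltext_analyzer=None):
    """Build one column definition; helper and parameter names are illustrative."""
    clause = f"{name} {sql_type}"
    if not indexed:
        # Assumed CrateDB syntax for disabling the index on a column.
        return clause + " INDEX OFF"
    if fulltext_analyzer is not None:
        # Assumed CrateDB syntax for a fulltext index with a named analyzer.
        return clause + f" INDEX USING fulltext WITH (analyzer = '{fulltext_analyzer}')"
    # No clause at all: the column gets the default index.
    return clause


def create_table_statement(table, columns):
    """Join column definitions into a CREATE TABLE statement."""
    body = ",\n    ".join(columns)
    return f"CREATE TABLE {table} (\n    {body}\n)"


ddl = create_table_statement(
    "sensor_log",  # hypothetical table
    [
        column_clause("ts", "TIMESTAMP WITH TIME ZONE"),
        column_clause("sensor_id", "TEXT"),
        column_clause("raw_payload", "TEXT", indexed=False),
        column_clause("note", "TEXT", fulltext_analyzer="english"),
    ],
)
print(ddl)
```

The resulting string can be submitted through any CrateDB client. Columns without an explicit index clause, such as `ts` and `sensor_id` here, keep the default behaviour of being indexed.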

Reference Manual

Index Types
Index Everything
Fast Query Execution

Usage

Handling data types in the most efficient way, for maximum usability, is built into CrateDB. You automatically leverage its indexing data structures by submitting SQL queries to the execution engine.

Learn

Articles about CrateDB’s uniqueness as an “index everything by default” database, with insights into the technologies behind it and comparisons with solutions from other vendors.

Blog: Indexing and Storage in CrateDB

Learn about the fundamentals of the CrateDB storage layer, looking at the three main Lucene structures that are used within CrateDB: Inverted Indexes for text values, BKD-trees for numeric values, and Doc Values.

Fundamentals
Converged Indexing Deep Dive

Blog: Time Series Benchmark on CrateDB and MongoDB

When using CrateDB, it’s like you’ve stumbled into an alternative reality where Elastic is a proper database. [1]

– Henrik Ingo, Nyrkiö Oy, independent database consultant, MongoDB

The article discusses the idea of indexing all columns so that all queries are equally fast, which makes fully ad hoc, exploratory querying possible.

I knew that Rockset had developed a service where they would index every column by default, based on their innovative LSM indexing structure, making such a revolutionary idea even possible. CrateDB is now the second product I’ve heard of offering this feature – and now with Rockset being acquired and shutting down […]

The article also covers benchmarking CrateDB against MongoDB using the Distributed Systems Infrastructure (DSI) benchmark framework and the TimescaleDB Time Series Benchmark Suite (TSBS).

Benchmark
Converged Indexing Query Performance

Note

This page is currently under construction. It only includes the most basic essentials, and needs expansion. For example, the “Synopsis” section is missing completely, and the “Usage” section is a bit thin.