Skip to content
Features

Indexing, Columnar Storage, and Aggregations 

CrateDB uses Lucene as a storage engine enabling fast search queries and aggregations. By default, everything gets indexed using various index methods (full-text, geospatial, BKD-trees) for each data type and stored in a column store, also referred to as doc values, to improve query speed for aggregations across large data sets.

Columnar Storage

In CrateDB, each value term is stored in a column store by default, alongside storing the row data as-is and indexing each value automatically. A column store provides various advantages such as improving global aggregations and groupings and enabling the possibility of ordering as data for one column is packed at one place.

CREATE TABLE t1 (
  id INTEGER,
  url TEXT INDEX OFF STORAGE WITH (columnstore = false)
);

Indexing

CrateDB supports various indexing methods for different data types. All columns are indexed by default, although one can disable or adjust the indexing on a column level. Text columns are using a plain inverted index by default that doesn’t analyze the text. One can also define a full-text index with custom analyzers. Numeric values, including geopoints, are indexed using BKD-trees. The efficient indexing through Lucene provides fast search across billions of records.

CrateDB-Inverted-Index

Object Storage and Indexing

CrateDB dynamically maps each sub-column to a data type and stores primitive object properties as sub-columns using the above mention column store. Each sub-column is also indexed by default. This dynamic mapping can be disabled by the user.

// obj JSON
{
   "MachineID" : "drill001",
   "Sensors" : [
     { "name" : "temp1", "value": 21.5 },
     { "name" : "temp2", "value": 20.8 },
     { "name" : "accel", "value": 3.2 }
  ],
   "Events" : { "type" : "info"}
}
// find all Machines that emit an INFO event and provide a temperature value

SELECT obj['MachineID'] as "machine"
FROM my_devices
WHERE obj['Events']['type'] = 'info'
AND 'temp1' = ANY(obj['Sensors']['name']);

Aggregations

Aggregation functions (e.g., count (*), GROUP BY) in CrateDB are used to calculate a single summary value for one or more columns of data. Aggregate functions can be used to compute results across all rows or to return a single summary value for each group of column values.

Product documentation

Fulltext Indices

Column Store

Column Policy

Aggregations

Additional resources

CrateDB at Big Data Conference Europe 2022

Not all Time-Series are Equal – Challenges of Storing and Analyzing Industrial Data

Timestamp:  24:06 – 27:00

Blog

Indexing and Storage in CrateDB

In this article series, we look from the bottom of CrateDB architecture and gradually move up to higher layers, focusing on different CrateDB internals.

Read more
CREATE TABLE documents_b (title TEXT, author OBJECT(DYNAMIC) AS (name TEXT, birthday TIMESTAMP WITH TIME ZONE), INDEX author_title_ft USING FULLTEXT(title, author['name']));

Handling Dynamic Objects in CrateDB

Learn the fundamentals of objects in CrateDB and how to use different object types depending on your use case.

Read more
A CrateDB CREATE TABLE query

Need to know more about indexing or aggregations?