Indexing, Columnar Storage, and Aggregations
CrateDB uses Lucene as a storage engine enabling fast search queries and aggregations. By default, everything gets indexed using various index methods (full-text, geospatial, BKD-trees) for each data type and stored in a column store, also referred to as doc values, to improve query speed for aggregations across large data sets.
Columnar Storage
In CrateDB, each value term is stored in a column store by default, alongside storing the row data as-is and indexing each value automatically. A column store provides various advantages such as improving global aggregations and groupings and enabling the possibility of ordering as data for one column is packed at one place.
CREATE TABLE t1 ( id INTEGER, url TEXT INDEX OFF STORAGE WITH (columnstore = false) );
Indexing
CrateDB supports various indexing methods for different data types. All columns are indexed by default, although one can disable or adjust the indexing on a column level. Text columns are using a plain inverted index by default that doesn’t analyze the text. One can also define a full-text index with custom analyzers. Numeric values, including geopoints, are indexed using BKD-trees. The efficient indexing through Lucene provides fast search across billions of records.

Object Storage and Indexing
CrateDB dynamically maps each sub-column to a data type and stores primitive object properties as sub-columns using the above mention column store. Each sub-column is also indexed by default. This dynamic mapping can be disabled by the user.
// obj JSON { "MachineID" : "drill001", "Sensors" : [ { "name" : "temp1", "value": 21.5 }, { "name" : "temp2", "value": 20.8 }, { "name" : "accel", "value": 3.2 } ], "Events" : { "type" : "info"} }
// find all Machines that emit an INFO event and provide a temperature value SELECT obj['MachineID'] as "machine" FROM my_devices WHERE obj['Events']['type'] = 'info' AND 'temp1' = ANY(obj['Sensors']['name']);
Additional resources
CrateDB at Big Data Conference Europe 2022
Not all Time-Series are Equal – Challenges of Storing and Analyzing Industrial Data
Timestamp: 24:06 – 27:00
Blog
Indexing and Storage in CrateDB
In this article series, we look from the bottom of CrateDB architecture and gradually move up to higher layers, focusing on different CrateDB internals.
![CREATE TABLE documents_b (title TEXT, author OBJECT(DYNAMIC) AS (name TEXT, birthday TIMESTAMP WITH TIME ZONE), INDEX author_title_ft USING FULLTEXT(title, author['name']));](https://crate.io/hs-fs/hubfs/Screenshot%202021-11-12%20at%2013.52.30.png?width=600&height=292&name=Screenshot%202021-11-12%20at%2013.52.30.png)
Handling Dynamic Objects in CrateDB
Learn the fundamentals of objects in CrateDB and how to use different object types depending on your use case.
