Hybrid Index
CrateDB indexes all columns by default, putting lightning-fast query responses at your fingertips.
Overview
CrateDB, like Lucene, Elasticsearch, and Rockset, indexes all fields of stored documents by default, so queries on any field can be served from an index.
About
By default, CrateDB indexes all data in every field, and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.
The ability to use the per-field data structures to assemble and return search results is what makes CrateDB so fast.
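To make the idea of an inverted index more concrete, here is a minimal sketch in Python. It is a toy illustration only, not how Lucene or CrateDB implement it: the function names `build_inverted_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer: lowercase and split on whitespace (an assumption,
        # real analyzers do much more).
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect posting lists instead of scanning the documents.
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "CrateDB indexes all columns",
    2: "Lucene stores inverted indices",
    3: "CrateDB stores data in Lucene",
}
index = build_inverted_index(docs)
print(search(index, "cratedb lucene"))
```

The point of the sketch is that a lookup touches only the posting lists of the query terms, never the full documents. Production systems add compression, scoring, and on-disk segment formats on top of this basic idea.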
Details
For a quick refresher on the technologies behind CrateDB's storage engine, see the upstream documentation and articles about Lucene and Elasticsearch.
See also an article by Rockset, which describes the same powerful indexing approach and presents it as a unique invention.
On disk, CrateDB stores data in Lucene indexes. By default, all fields are indexed, nested or not, but indexing can be turned off selectively.
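As a rough sketch of what turning indexing off selectively can look like, the following Python snippet composes a `CREATE TABLE` statement that uses the `INDEX OFF` and `INDEX USING FULLTEXT` column clauses from CrateDB's SQL dialect. The helpers `column_def` and `create_table`, the table name, and the column names are hypothetical and exist only for this illustration. The snippet only builds the statement text and does not connect to a database.

```python
def column_def(name, sql_type, index_clause=None):
    """Return one column definition, optionally followed by an index clause."""
    parts = [f'"{name}"', sql_type]
    if index_clause is not None:
        parts.append(index_clause)
    return " ".join(parts)


def create_table(table, columns):
    """Compose a CREATE TABLE statement from (name, type[, index clause]) tuples."""
    body = ",\n  ".join(column_def(*column) for column in columns)
    return f'CREATE TABLE "{table}" (\n  {body}\n)'


# Hypothetical schema: columns without a clause keep the default index.
ddl = create_table(
    "sensor_readings",
    [
        ("ts", "TIMESTAMP WITH TIME ZONE"),
        ("device_id", "TEXT"),
        ("payload", "TEXT", "INDEX OFF"),            # no index for this column
        ("note", "TEXT", "INDEX USING FULLTEXT"),    # analyzed full-text index
    ],
)
print(ddl)
```

Columns without an explicit clause keep the default behavior described above. Whether disabling an index is worthwhile depends on the workload, so treat the schema as an assumption rather than a recommendation.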
Reference Manual
Index Types
Index Everything
Fast Query Execution
Usage
Efficient handling of each data type is built into CrateDB. You use its index data structures automatically whenever you submit SQL queries to the execution engine.
Learn
Articles about CrateDB’s uniqueness as an “index everything by default” database, with insights into the underlying technologies and comparisons with solutions from other vendors.
Blog: Time Series Benchmark on CrateDB and MongoDB
When using CrateDB, it’s like you’ve stumbled into an alternative reality where Elastic is a proper database. [1]
– Henrik Ingo, Nyrkiö Oy, independent database consultant, MongoDB
About the revolutionary idea to index all columns, in order to make all queries equally fast, unlocking completely ad hoc exploratory querying.
I knew that Rockset had developed a service where they would index every column by default, based on their innovative LSM indexing structure, making such a revolutionary idea even possible. CrateDB is now the second product I’ve heard of offering this feature – and now with Rockset being acquired and shutting down […]
Also about benchmarking CrateDB against MongoDB using the Distributed Systems Infrastructure (DSI) benchmark framework and the TimescaleDB Time Series Benchmark Suite (TSBS).
Benchmark
Converged Indexing
Query Performance
Note
This page is currently under construction. It only includes the most basic essentials, and needs expansion. For example, the “Synopsis” section is missing completely, and the “Usage” section is a bit thin.