Skip to content
Data

Automatic Indexing

Instant access to fresh data, No manual tuning required.

CrateDB automatically indexes every piece of data as it’s ingested, ensuring your queries stay fast, even as your datasets grow by billions of records. There’s no need to define indexes manually, manage schema changes, or tune performance. With automatic indexing, CrateDB combines the simplicity of SQL with the speed of modern search and analytics, giving you real-time results without operational overhead.

Why automatic indexing matters

When ingesting data, it’s not always clear which queries you’ll need in the future.
Use cases evolve, new business requirements emerge, and query patterns change over time.

CrateDB removes the complexity of anticipating index strategies. By automatically indexing every attribute by default, it ensures fast query responses for any query type, even when your data and workloads evolve.

  • Instantly searchable data: New records are indexed within seconds of ingestion.
  • Zero manual configuration: Indexing happens automatically, with smart defaults for every data type.
  • Real-time analytics: Query freshly ingested data without delay.
  • Operational simplicity: No DBA maintenance or reindexing required.

 

cr-quote-image

How automatic indexing works

CrateDB’s indexing subsystem is built on top of the Lucene search engine, deeply integrated into its distributed SQL architecture.

As data is inserted or updated:

  1. CrateDB automatically indexes each field, including nested JSON, text, vectors, and geospatial data.
  2. Indexes are distributed across nodes and shards for parallel query execution.
  3. Queries use these indexes transparently, combining search, filtering, and aggregation logic in one SQL operation.

This design delivers the best of both worlds, the expressiveness of SQL and the performance of a search engine.

CrateDB applies specialized indexing techniques tailored to each data type, ensuring optimal performance and flexibility across all workloads: 

  • Inverted indexes for text values
  • Block K-D Trees (BKD) for numeric, date, and geospatial values
  • HNSW (Hierarchical Navigable Small World) graphs for vector data
cr-quote-image

Inverted indexes for text values

Used for efficient search and filtering on text columns, enabling fast matches, wildcards, and regular expressions.

  • Text columns use a plain inverted index by default (no text analysis).
  • You can define full-text indexes with custom analyzers for advanced linguistic search.

CrateDB supports:

  • 30+ languages, 11 analyzers, 15 tokenizers, and 35+ token filters,
  • Plus full customization for analyzers and tokenizers, enabling features like fuzzy search, phrase queries, and attribute boosting.

CrateDB-Inverted-Index

cr-quote-image

Block K-D Trees (BKD) for numeric, date, and geospatial values

Highly efficient, IO-optimized indexes for structured and location-based data.

  • Numeric values (including geopoints) are indexed using BKD-trees, which combine on-disk block structures with a lightweight in-memory binary tree for quick block lookups.
  • This hybrid design delivers excellent query and update performance, even after millions of updates or large data movements.

 

cr-quote-image

HNSW (Hierarchical Navigable Small World) graphs for vector data

For high-dimensional vectors used in AI and semantic search, CrateDB uses HNSW-based approximate nearest neighbor (ANN) indexing.
This enables fast similarity search using SQL’s KNN_MATCH() — ideal for embeddings, recommendations, and hybrid vector/text search.
cr-quote-image

Built for every data model

CrateDB indexes all supported data types automatically — from structured tables to complex objects:

Data type Indexing benefit
Relational Fast lookups and aggregations on standard SQL tables
Time-series Automatic time-based indexing for efficient rollups and trend analysis
JSON / Object Each nested key-value pair indexed individually for flexible, ad-hoc queries
Full-text Rich search capabilities with analyzers, scoring, and tokenization
Geospatial Spatial queries using within(), distance(), and intersects()
Vector Efficient approximate nearest neighbor (ANN) queries via KNN_MATCH()

With CrateDB, any data you ingest is immediately queryable — regardless of format or structure.

cr-quote-image

Real-time queries on fresh data

CrateDB’s real-time ingestion pipeline and automatic indexing work hand in hand.
As soon as new events, logs, or sensor readings are written, they’re available for

  • Aggregations
  • Search queries
  • Vector similarity
  • Geospatial filters
  • JOINs across multiple data sources
There’s no waiting for batch jobs or refresh intervals — new data is live within seconds.
cr-quote-image

Advantages of automatic indexing

Challenge CrateDB solution
Manual index design Fully automated, adaptive indexing
Delayed query access to new data Real-time indexing on ingest
Performance degradation at scale Distributed indexing across all nodes
Complex schema evolution Dynamic field handling and nested object indexing
Separate search infrastructure Integrated Lucene-based engine inside the database

Automatic indexing turns CrateDB into a truly operational database, equally suited for analytics, search, and AI workloads.

cr-quote-image

Example: query instantly after ingest

// obj JSON
{
   "MachineID" : "drill001",
   "Sensors" : [
     { "name" : "temp1", "value": 21.5 },
     { "name" : "temp2", "value": 20.8 },
     { "name" : "accel", "value": 3.2 }
  ],
   "Events" : { "type" : "info"}
}
// find all Machines that emit an INFO event and provide a temperature value

SELECT obj['MachineID'] as "machine"
FROM my_devices
WHERE obj['Events']['type'] = 'info'
AND 'temp1' = ANY(obj['Sensors']['name']);

CrateDB architecture guide

This comprehensive guide covers all the key concepts you need to know about CrateDB's architecture. It will help you gain a deeper understanding of what makes it performant, scalable, flexible and easy to use. Armed with this knowledge, you will be better equipped to make informed decisions about when to leverage CrateDB for your data projects. 

CrateDB-Architecture-Guide-Cover

Additional resources

Want to learn more?