Full-text search in CrateDB
The standard architecture for full-text search is Elasticsearch or OpenSearch running alongside your primary database; two systems to operate, data duplicated across both, and a synchronization pipeline that lags behind every write. Every schema change requires coordinating two systems. Every query that needs both search relevance and structured analytics requires results to be fetched separately and merged in application code.
CrateDB's full-text search engine is built on Apache Lucene and runs inside the same distributed SQL engine as your analytics. You define a full-text index at table creation time and query it with standard SQL. No separate cluster, no synchronization, no dual-write.
Full-text search examples
All use the standard SQL MATCH function.
No Elasticsearch DSL, no separate query language, no additional client library.
/* Create a full-text index across multiple columns with a language-specific analyzer
at table creation time. CrateDB handles tokenization automatically.
The index is available for querying from the moment the first document is inserted. */
/* The table stores data about netflix movies and series. The composite fulltext
* index is defined on two columns with the analyzer based on English language. */
CREATE TABLE "netflix_catalog" (
"show_id" TEXT PRIMARY KEY,
"type" TEXT,
"title" TEXT,
"director" TEXT,
"cast" ARRAY(TEXT),
"country" TEXT,
"date_added" TIMESTAMP,
"release_year" TEXT,
"rating" TEXT,
"duration" TEXT,
"listed_in" ARRAY(TEXT),
"description" TEXT,
INDEX title_director_description_ft using FULLTEXT (title, director, description)
WITH (analyzer = 'english')
);
/* Weight specific fields more heavily in relevance scoring using the caret notation.
Title matches score higher than description matches.
This is standard BM25 relevance ranking, the same algorithm Elasticsearch uses. */
/* The query retuns data from the "netflix_catalog" table, specifically searching
* for movies where the 'title', 'director', or 'description' fields best match the
* term 'Friday', with a higher emphasis on matches in the title, and orders the results
* by their relevance score in descending order. */
SELECT show_id, title, director, country, release_year, rating, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'title^2 Friday') USING best_fields
AND type='Movie'
ORDER BY _score DESC;
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
| show_id | title | director | country | release_year | rating | _score |
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
| s1674 | Black Friday | Anurag Kashyap | India | 2004 | TV-MA | 5.6455536 |
| s6805 | Friday the 13th | Marcus Nispel | United States | 2009 | R | 3.226806 |
| s1038 | Tuesdays & Fridays | Taranveer Singh | India | 2021 | TV-14 | 3.1089375 |
| s7494 | Monster High: Friday Night Frights | Dustin McKenzie | United States | 2013 | TV-Y7 | 3.0620003 |
| s3226 | Little Singham: Mahabali | Prakash Satam | NULL | 2019 | TV-Y7 | 3.002901 |
| s8233 | The Bye Bye Man | Stacy Title | United States, China | 2017 | PG-13 | 2.9638999 |
| s8225 | The Brawler | Ken Kushner | United States | 2019 | TV-MA | 2.8108454 |
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
/* Apply per-column boost factors to tune relevance across multiple fields in a single query.
Useful when different fields have different importance for a given search context. */
/* This query searches for records where the combined fields of title, director, and
* description match the phrase 'mission impossible', with a different weights given to
* title and description. The results are ordered by their relevance score in desc order. */
SELECT show_id, type, title, director, country, rating, description, _score
FROM "netflix_catalog"
WHERE MATCH((title_director_description_ft, title 1.5, description 1), 'mission impossible')
ORDER BY _score DESC;
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| show_id | type | title | director | country | rating | description | _score |
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| s1000 | Movie | Stowaway | Joe Penna | Germany, United States | TV-MA | A three-person crew on a mission to Mars faces an impossible choice when an unplanned passenger jeopardizes the lives of everyone on board. | 5.0469 |
| s1768 | Movie | The Impossible | J.A. Bayona | Spain, Thailand, United States | PG-13 | Tracking one family's harrowing experiences, this gripping drama depicts the chaos generated by the massive 2004 tsunami in Southeast Asia. | 3.1496503 |
| s7873 | TV Show | Rishta.com | NULL | India | TV-14 | Partners at an Indian matrimonial agency face endlessly challenging and often impossible demands as they help clients make the perfect match. | 3.131972 |
| s3242 | TV Show | Nailed It! Holiday! | NULL | United States | TV-PG | It's the "Nailed It!" holiday special you've been waiting for, with missing ingredients, impossible asks and desserts that look delightfully sad. | 3.1265335 |
| s1032 | Movie | Into the Beat | Stefan Westerwelle | Germany | TV-14 | A teen ballerina discovers hip-hop by chance and is faced with an impossible choice: Does she follow her parents' footsteps... or her newfound passion? | 2.9678147 |
| s1583 | Movie | Triple 9 | John Hillcoat | United States | R | A group of dirty Atlanta cops blackmailed by the Russian mob plan a seemingly impossible heist that sets off an explosive chain reaction of violence. | 2.9678147 |
| s848 | TV Show | Mad for Each Other | NULL | NULL | TV-MA | Bothered to realize they are next-door neighbors and share a psychiatrist, a man and a woman find it's impossible to stay out of each other's way. | 2.9385366 |
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
/* Search for specific phrases and filter results by minimum relevance score.
Only documents scoring above the threshold are returned,
giving precise control over result quality. */
/* The query searches records where the combined title, director, and description
* fields phrase-prefix match 'award winning' with a relevance score of at least 10,
* and orders the results by their relevance score in descending order. */
SELECT show_id, type, title, country, rating, duration, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'award winning') USING phrase_prefix
AND _score >= 10
ORDER BY _score DESC;
+---------+----------+-----------------------------------------+-----------------+--------+------------+-----------+
| show_id | type | title | country | rating | duration | _score |
| ------- | -------- | ----------------------------------------| --------------- | ------ | ---------- | --------- |
| s2423 | TV Show | Queer Eye | United States | TV-14 | 5 Seasons | 56.835224 |
| s5930 | Movie | Print the Legend | United States | TV-14 | 100 min | 56.835224 |
| s849 | TV Show | Master of None | United States | TV-MA | 3 Seasons | 54.720444 |
| s7369 | TV Show | Mad Men | United States | TV-14 | 7 Seasons | 53.722008 |
| s4556 | Movie | The Meaning of Monty Python | United Kingdom | TV-MA | 60 min | 51.933487 |
| s6342 | TV Show | Blue Planet II | United Kingdom | TV-G | 1 Season | 51.933487 |
| s4162 | Movie | Surga Yang Tak Dirindukan 2 | Indonesia | TV-14 | 119 min | 50.937637 |
| s5186 | TV Show | Alias Grace | Canada | TV-MA | 1 Season | 41.05056 |
| s4184 | TV Show | The World's Most Extraordinary Homes | United Kingdom | TV-G | 3 Seasons | 39.506954 |
| s6111 | Movie | Alejandro Sanz: What I Was Is What I Am | Spain | TV-MA | 102 min | 38.77893 |
+---------+----------+-----------------------------------------+-----------------+--------+------------+-----------+
/* Handle spelling variations and typos using configurable fuzziness parameters.
CrateDB returns relevant results even when search terms contain errors,
using the same fuzzy matching algorithm as Elasticsearch. */
/* This query uses the composite index to match the term 'scence fction'
* (with a likely typo in 'science fiction'), allowing for some fuzziness in matching,
* using the English analyzer, requiring at least one term to match. */
SELECT show_id, title, country, description, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'scence fction')
USING best_fields
WITH (
fuzziness=1,
analyzer='english',
minimum_should_match=1,
operator='or'
)
ORDER BY _score DESC;
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| show_id | title | country | description | _score |
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| s1604 | Alien Worlds | United Kingdom | Applying the laws of life on Earth to the rest of the galaxy, this series blends science fact and fiction to imagine alien life on other planets. | 4.052918 |
| s2589 | Oh Yuck | Canada | Using a concoction of cartoons, comedy and live action, Dr. Yuck and his eccentric lab mates investigate the science behind the planet's ickiest things. | 3.8219523 |
| s7848 | Red vs. Blue | United States | This parody of first-person shooter games, military life and science-fiction films centers on a civil war fought in the middle of a desolate canyon. | 3.8219523 |
| s3054 | Border Security: America's Front Line | Canada | This reality series chronicles the behind-the-scenes action as U.S. Customs and Border Protection agents safeguard the nation's crossings and ports. | 3.5455647 |
| s5870 | Turbo FAST | United States | Join Turbo and his posse, the Fast Action Stunt Team. Filled with comedy, action and fun, this series amps it up to the extreme. | 2.695207 |
| s3574 | Our Planet - Behind The Scenes | United Kingdom | Years spent recording footage of creatures from every corner of the globe is bound to produce a bit of drama. Here's a behind-the-scenes look. | 2.5541513 |
| s8016 | Sid the Science Kid | United States | Armed with a healthy sense of humor and the help of his teacher, friends and family, curious kid Sid tackles questions youngsters have about science. | 2.5541513 |
| s4917 | Bill Nye: Science Guy | United States | The dynamic, bow-tied host behind the young adult science show faces climate-change skeptics as he demonstrates the importance of scientific evidence. | 2.49332 |
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
Full-text search with real-time indexing
CrateDB's full-text search engine is built on Apache Lucene, the same indexing library that powers Elasticsearch. Define a full-text index with your preferred analyzer at table creation time and CrateDB automatically tokenizes content on ingestion. New documents become searchable within milliseconds of arrival. No separate indexing job, no batch delay, no refresh interval to tune.
Advanced search expressions
CrateDB supports the full range of Lucene query expressions through standard SQL: boolean logic for combining conditions, phrase and proximity search for positional matching, wildcard and prefix search for partial terms, fuzzy matching for spelling variations, and range queries for score-bounded retrieval. These expressions work on any indexed field including nested JSON attributes, without custom query languages or specialized client libraries.
Search and analytics in one query
Full-text search results can be filtered, aggregated, and joined with structured data in the same SQL statement. You can search across logs, count results by dimension, correlate matches with time-series metrics, and combine full-text relevance with vector similarity in a single query. Teams running Elasticsearch alongside an analytical database to achieve this maintain two systems. CrateDB does it in one.
Distributed search at scale
Full-text queries execute in parallel across all cluster nodes simultaneously. Indexing is distributed too. Documents are tokenized and indexed across the cluster as they arrive, with no central indexing bottleneck. Performance stays consistent as the dataset grows from millions to billions of documents.
CrateDB handles full-text search alongside time-series, JSON, vector, geospatial, and relational data in the same engine. No separate pipelines.
Full-text search in production
"I'm glad it's SQL behind those charts. If I had to use Elasticsearch to answer new questions, we wouldn't be nearly as responsive to new requirements."
Joe Hacobian
Infrastructure Engineer
Digital Domain
Unlocking the Power of Semantic Search
Unlocking the Power of Semantic Search
Learn how CrateDB integrates full-text search, geospatial data, and vector similarity for hybrid search using SQL, faster, more contextually relevant results without multiple systems.
Talk: CrateDB — a database or a search engine?
Talk: CrateDB — a database or a search engine?
Workshop: Modeling data in CrateDB
Workshop: Modeling data in CrateDB
Go deeper on full-text search in CrateDB
Documentation
Documentation
Related use cases
Page
Workshop
Related pages
Page
FAQ
Full-text search indexes the entire textual content of documents, articles, or records, making all text fields searchable. This allows users to perform comprehensive searches across large volumes of unstructured or semi-structured text. CrateDB enhances this functionality with its high-performance, scalable architecture, ensuring rapid indexing and efficient retrieval even with substantial data volumes.
Full-text search techniques include boolean logic, wildcard searches, phrase searches, proximity searches, and fuzzy searches, which enhance the depth and accuracy of search results. CrateDB leverages these capabilities to provide robust full-text search functionality with real-time indexing and efficient query performance.
Full-text search databases allow users to search for words throughout the entire text body of documents, enhancing research depth and precision. Examples of databases optimized for full-text search include CrateDB, MongoDB, Elasticsearch, and PostgreSQL. CrateDB's distributed shared-nothing architecture is built on top of Apache Lucene and excels in performing efficient distributed full-text searches across diverse datasets, ensuring scalability to meet growing search demands.
Both CrateDB and Elasticsearch are built on Apache Lucene and support the same core full-text capabilities — BM25 relevance scoring, analyzers, fuzzy matching, boolean queries, and phrase search. The key difference is architecture. Elasticsearch is a dedicated search engine that requires a separate operational database for analytical workloads. CrateDB is a distributed analytical database that includes full-text search as a native capability. Teams that need full-text search alongside real-time analytics, time-series queries, vector search, or structured SQL analytics can do all of that in CrateDB without synchronizing data between systems. Teams whose entire workload is full-text search on documents may find Elasticsearch's richer search-specific features more appropriate.
Yes. CrateDB supports hybrid queries that combine full-text MATCH with vector knn_match, structured SQL filters, and aggregations in a single statement. This is useful for search applications where both keyword relevance and semantic similarity matter — for example, finding documents that contain specific terms and are semantically similar to a reference document. No application-side result merging is required.