Full-text Data
CrateDB makes it easy to index, store, and retrieve massive volumes of unstructured, semi-structured, and structured data. Its built-in search engine delivers fast, accurate, and scalable full-text search, directly inside a distributed SQL database.
Full-text search with SQL
Hyper-fast. Queries in milliseconds.
/* The table stores data about netflix movies and series. The composite fulltext
* index is defined on two columns with the analyzer based on English language. */
CREATE TABLE "netflix_catalog" (
"show_id" TEXT PRIMARY KEY,
"type" TEXT,
"title" TEXT,
"director" TEXT,
"cast" ARRAY(TEXT),
"country" TEXT,
"date_added" TIMESTAMP,
"release_year" TEXT,
"rating" TEXT,
"duration" TEXT,
"listed_in" ARRAY(TEXT),
"description" TEXT,
INDEX title_director_description_ft using FULLTEXT (title, director, description)
WITH (analyzer = 'english')
);
/* The query retuns data from the "netflix_catalog" table, specifically searching
* for movies where the 'title', 'director', or 'description' fields best match the
* term 'Friday', with a higher emphasis on matches in the title, and orders the results
* by their relevance score in descending order. */
SELECT show_id, title, director, country, release_year, rating, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'title^2 Friday') USING best_fields
AND type='Movie'
ORDER BY _score DESC;
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
| show_id | title | director | country | release_year | rating | _score |
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
| s1674 | Black Friday | Anurag Kashyap | India | 2004 | TV-MA | 5.6455536 |
| s6805 | Friday the 13th | Marcus Nispel | United States | 2009 | R | 3.226806 |
| s1038 | Tuesdays & Fridays | Taranveer Singh | India | 2021 | TV-14 | 3.1089375 |
| s7494 | Monster High: Friday Night Frights | Dustin McKenzie | United States | 2013 | TV-Y7 | 3.0620003 |
| s3226 | Little Singham: Mahabali | Prakash Satam | NULL | 2019 | TV-Y7 | 3.002901 |
| s8233 | The Bye Bye Man | Stacy Title | United States, China | 2017 | PG-13 | 2.9638999 |
| s8225 | The Brawler | Ken Kushner | United States | 2019 | TV-MA | 2.8108454 |
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
/* This query searches for records where the combined fields of title, director, and
* description match the phrase 'mission impossible', with a different weights given to
* title and description. The results are ordered by their relevance score in desc order. */
SELECT show_id, type, title, director, country, rating, description, _score
FROM "netflix_catalog"
WHERE MATCH((title_director_description_ft, title 1.5, description 1), 'mission impossible')
ORDER BY _score DESC;
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| show_id | type | title | director | country | rating | description | _score |
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| s1000 | Movie | Stowaway | Joe Penna | Germany, United States | TV-MA | A three-person crew on a mission to Mars faces an impossible choice when an unplanned passenger jeopardizes the lives of everyone on board. | 5.0469 |
| s1768 | Movie | The Impossible | J.A. Bayona | Spain, Thailand, United States | PG-13 | Tracking one family's harrowing experiences, this gripping drama depicts the chaos generated by the massive 2004 tsunami in Southeast Asia. | 3.1496503 |
| s7873 | TV Show | Rishta.com | NULL | India | TV-14 | Partners at an Indian matrimonial agency face endlessly challenging and often impossible demands as they help clients make the perfect match. | 3.131972 |
| s3242 | TV Show | Nailed It! Holiday! | NULL | United States | TV-PG | It's the "Nailed It!" holiday special you've been waiting for, with missing ingredients, impossible asks and desserts that look delightfully sad. | 3.1265335 |
| s1032 | Movie | Into the Beat | Stefan Westerwelle | Germany | TV-14 | A teen ballerina discovers hip-hop by chance and is faced with an impossible choice: Does she follow her parents' footsteps... or her newfound passion? | 2.9678147 |
| s1583 | Movie | Triple 9 | John Hillcoat | United States | R | A group of dirty Atlanta cops blackmailed by the Russian mob plan a seemingly impossible heist that sets off an explosive chain reaction of violence. | 2.9678147 |
| s848 | TV Show | Mad for Each Other | NULL | NULL | TV-MA | Bothered to realize they are next-door neighbors and share a psychiatrist, a man and a woman find it's impossible to stay out of each other's way. | 2.9385366 |
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
/* The query searches records where the combined title, director, and description
* fields phrase-prefix match 'award winning' with a relevance score of at least 10,
* and orders the results by their relevance score in descending order. */
SELECT show_id, type, title, country, rating, duration, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'award winning') USING phrase_prefix
AND _score >= 10
ORDER BY _score DESC;
+---------+----------+-----------------------------------------+-----------------+--------+------------+-----------+
| show_id | type | title | country | rating | duration | _score |
| ------- | -------- | ----------------------------------------| --------------- | ------ | ---------- | --------- |
| s2423 | TV Show | Queer Eye | United States | TV-14 | 5 Seasons | 56.835224 |
| s5930 | Movie | Print the Legend | United States | TV-14 | 100 min | 56.835224 |
| s849 | TV Show | Master of None | United States | TV-MA | 3 Seasons | 54.720444 |
| s7369 | TV Show | Mad Men | United States | TV-14 | 7 Seasons | 53.722008 |
| s4556 | Movie | The Meaning of Monty Python | United Kingdom | TV-MA | 60 min | 51.933487 |
| s6342 | TV Show | Blue Planet II | United Kingdom | TV-G | 1 Season | 51.933487 |
| s4162 | Movie | Surga Yang Tak Dirindukan 2 | Indonesia | TV-14 | 119 min | 50.937637 |
| s5186 | TV Show | Alias Grace | Canada | TV-MA | 1 Season | 41.05056 |
| s4184 | TV Show | The World's Most Extraordinary Homes | United Kingdom | TV-G | 3 Seasons | 39.506954 |
| s6111 | Movie | Alejandro Sanz: What I Was Is What I Am | Spain | TV-MA | 102 min | 38.77893 |
+---------+----------+-----------------------------------------+-----------------+--------+------------+-----------+
/* This query uses the composite index to match the term 'scence fction'
* (with a likely typo in 'science fiction'), allowing for some fuzziness in matching,
* using the English analyzer, requiring at least one term to match. */
SELECT show_id, title, country, description, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'scence fction')
USING best_fields
WITH (
fuzziness=1,
analyzer='english',
minimum_should_match=1,
operator='or'
)
ORDER BY _score DESC;
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| show_id | title | country | description | _score |
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| s1604 | Alien Worlds | United Kingdom | Applying the laws of life on Earth to the rest of the galaxy, this series blends science fact and fiction to imagine alien life on other planets. | 4.052918 |
| s2589 | Oh Yuck | Canada | Using a concoction of cartoons, comedy and live action, Dr. Yuck and his eccentric lab mates investigate the science behind the planet's ickiest things. | 3.8219523 |
| s7848 | Red vs. Blue | United States | This parody of first-person shooter games, military life and science-fiction films centers on a civil war fought in the middle of a desolate canyon. | 3.8219523 |
| s3054 | Border Security: America's Front Line | Canada | This reality series chronicles the behind-the-scenes action as U.S. Customs and Border Protection agents safeguard the nation's crossings and ports. | 3.5455647 |
| s5870 | Turbo FAST | United States | Join Turbo and his posse, the Fast Action Stunt Team. Filled with comedy, action and fun, this series amps it up to the extreme. | 2.695207 |
| s3574 | Our Planet - Behind The Scenes | United Kingdom | Years spent recording footage of creatures from every corner of the globe is bound to produce a bit of drama. Here's a behind-the-scenes look. | 2.5541513 |
| s8016 | Sid the Science Kid | United States | Armed with a healthy sense of humor and the help of his teacher, friends and family, curious kid Sid tackles questions youngsters have about science. | 2.5541513 |
| s4917 | Bill Nye: Science Guy | United States | The dynamic, bow-tied host behind the young adult science show faces climate-change skeptics as he demonstrates the importance of scientific evidence. | 2.49332 |
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
Full-text search with real-time indexing
CrateDB provides robust full-text search capabilities powered by Apache Lucene®, enabling millisecond-fast retrieval across documents, logs, messages, product descriptions, and any text-heavy dataset.
Define a full-text index with your preferred analyzer at table creation, and CrateDB automatically tokenizes content for optimized search. Combined with real-time ingestion, new data becomes searchable within seconds, supporting continuously updated applications.
Advanced search expressions
Go far beyond simple keyword matching. CrateDB supports advanced search techniques, including:
- Boolean logic
- Phrase and proximity search
- Wildcard and prefix search
- Fuzzy matching for spelling variations
- Range queries and compound queries
Distributed, scalable, and high-performance search
CrateDB’s shared-nothing, distributed architecture allows search workloads to scale horizontally without performance degradation.
- Massive parallel indexing
- Distributed execution of full-text queries
- Balanced resource usage across all cluster nodes
Real-time analytics meets full-text search
Search and analytics don’t have to live in separate systems. CrateDB unifies both:
- High-throughput ingestion
- Real-time indexing
- SQL-based aggregations on search results
- Hybrid search (full-text + structured filters + vector search)
Resilient search with high availability
Keep search operations online at all times. CrateDB provides:
- Built-in replication
- Automatic failover
- Self-balancing cluster behavior
Simple integration with SQL and PostgreSQL ecosystem
CrateDB integrates seamlessly with existing systems thanks to:
- Native SQL
- PostgreSQL Wire Protocol
- Broad driver and language support
- HTTP endpoint
- Extensible plugin ecosystem
Connect your applications, analytics tools, or AI pipelines and combine search with analytical queries to extract richer insights from your data.
View a sample list of integrations >
Unlocking the Power of Semantic Search
Unlocking the Power of Semantic Search
Unlock the power of semantic search by watching this insightful webinar where Simon Prickett, Senior Product Evangelist CrateDB, highlights CrateDB's ability to integrate various data types (text, geospatial, vectors) for hybrid search using SQL, enabling faster, more contextually relevant results.
CrateDB workshop
CrateDB workshop
Timestamp: 25:43–39:27
CrateDB: A database or a search engine? Both!
CrateDB: A database or a search engine? Both!
Curious to learn more?
Additional resources
Documentation
Documentation
Blog
Workshop
FAQ
Full-text search indexes the entire textual content of documents, articles, or records, making all text fields searchable. This allows users to perform comprehensive searches across large volumes of unstructured or semi-structured text. CrateDB enhances this functionality with its high-performance, scalable architecture, ensuring rapid indexing and efficient retrieval even with substantial data volumes.
Full-text search techniques include boolean logic, wildcard searches, phrase searches, proximity searches, and fuzzy searches, which enhance the depth and accuracy of search results. CrateDB leverages these capabilities to provide robust full-text search functionality with real-time indexing and efficient query performance.
Full-text search databases allow users to search for words throughout the entire text body of documents, enhancing research depth and precision. Examples of databases optimized for full-text search include CrateDB, MongoDB, Elasticsearch, and PostgreSQL. CrateDB's distributed shared-nothing architecture is built on top of Apache Lucene and excels in performing efficient distributed full-text searches across diverse datasets, ensuring scalability to meet growing search demands.