Full-text Data
CrateDB efficiently indexes, stores, and retrieves high volumes of unstructured, semi-structured, and structured data, enabling advanced and fast searches across diverse data types, including full-text data.
Full-text search with SQL
Hyper-fast. Queries in milliseconds.
/* The table stores data about netflix movies and series. The composite fulltext
* index is defined on two columns with the analyzer based on English language. */
CREATE TABLE "netflix_catalog" (
"show_id" TEXT PRIMARY KEY,
"type" TEXT,
"title" TEXT,
"director" TEXT,
"cast" ARRAY(TEXT),
"country" TEXT,
"date_added" TIMESTAMP,
"release_year" TEXT,
"rating" TEXT,
"duration" TEXT,
"listed_in" ARRAY(TEXT),
"description" TEXT,
INDEX title_director_description_ft using FULLTEXT (title, director, description)
WITH (analyzer = 'english')
);
/* The query retuns data from the "netflix_catalog" table, specifically searching
* for movies where the 'title', 'director', or 'description' fields best match the
* term 'Friday', with a higher emphasis on matches in the title, and orders the results
* by their relevance score in descending order. */
SELECT show_id, title, director, country, release_year, rating, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'title^2 Friday') USING best_fields
AND type='Movie'
ORDER BY _score DESC;
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
| show_id | title | director | country | release_year | rating | _score |
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
| s1674 | Black Friday | Anurag Kashyap | India | 2004 | TV-MA | 5.6455536 |
| s6805 | Friday the 13th | Marcus Nispel | United States | 2009 | R | 3.226806 |
| s1038 | Tuesdays & Fridays | Taranveer Singh | India | 2021 | TV-14 | 3.1089375 |
| s7494 | Monster High: Friday Night Frights | Dustin McKenzie | United States | 2013 | TV-Y7 | 3.0620003 |
| s3226 | Little Singham: Mahabali | Prakash Satam | NULL | 2019 | TV-Y7 | 3.002901 |
| s8233 | The Bye Bye Man | Stacy Title | United States, China | 2017 | PG-13 | 2.9638999 |
| s8225 | The Brawler | Ken Kushner | United States | 2019 | TV-MA | 2.8108454 |
+---------+------------------------------------+-------------------+----------------------+--------------+--------+-----------+
/* This query searches for records where the combined fields of title, director, and
* description match the phrase 'mission impossible', with a different weights given to
* title and description. The results are ordered by their relevance score in desc order. */
SELECT show_id, type, title, director, country, rating, description, _score
FROM "netflix_catalog"
WHERE MATCH((title_director_description_ft, title 1.5, description 1), 'mission impossible')
ORDER BY _score DESC;
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| show_id | type | title | director | country | rating | description | _score |
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| s1000 | Movie | Stowaway | Joe Penna | Germany, United States | TV-MA | A three-person crew on a mission to Mars faces an impossible choice when an unplanned passenger jeopardizes the lives of everyone on board. | 5.0469 |
| s1768 | Movie | The Impossible | J.A. Bayona | Spain, Thailand, United States | PG-13 | Tracking one family's harrowing experiences, this gripping drama depicts the chaos generated by the massive 2004 tsunami in Southeast Asia. | 3.1496503 |
| s7873 | TV Show | Rishta.com | NULL | India | TV-14 | Partners at an Indian matrimonial agency face endlessly challenging and often impossible demands as they help clients make the perfect match. | 3.131972 |
| s3242 | TV Show | Nailed It! Holiday! | NULL | United States | TV-PG | It's the "Nailed It!" holiday special you've been waiting for, with missing ingredients, impossible asks and desserts that look delightfully sad. | 3.1265335 |
| s1032 | Movie | Into the Beat | Stefan Westerwelle | Germany | TV-14 | A teen ballerina discovers hip-hop by chance and is faced with an impossible choice: Does she follow her parents' footsteps... or her newfound passion? | 2.9678147 |
| s1583 | Movie | Triple 9 | John Hillcoat | United States | R | A group of dirty Atlanta cops blackmailed by the Russian mob plan a seemingly impossible heist that sets off an explosive chain reaction of violence. | 2.9678147 |
| s848 | TV Show | Mad for Each Other | NULL | NULL | TV-MA | Bothered to realize they are next-door neighbors and share a psychiatrist, a man and a woman find it's impossible to stay out of each other's way. | 2.9385366 |
+---------+----------+------------------------+--------------------+---------------------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
/* The query searches records where the combined title, director, and description
* fields phrase-prefix match 'award winning' with a relevance score of at least 10,
* and orders the results by their relevance score in descending order. */
SELECT show_id, type, title, country, rating, duration, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'award winning') USING phrase_prefix
AND _score >= 10
ORDER BY _score DESC;
+---------+----------+-----------------------------------------+-----------------+--------+------------+-----------+
| show_id | type | title | country | rating | duration | _score |
| ------- | -------- | ----------------------------------------| --------------- | ------ | ---------- | --------- |
| s2423 | TV Show | Queer Eye | United States | TV-14 | 5 Seasons | 56.835224 |
| s5930 | Movie | Print the Legend | United States | TV-14 | 100 min | 56.835224 |
| s849 | TV Show | Master of None | United States | TV-MA | 3 Seasons | 54.720444 |
| s7369 | TV Show | Mad Men | United States | TV-14 | 7 Seasons | 53.722008 |
| s4556 | Movie | The Meaning of Monty Python | United Kingdom | TV-MA | 60 min | 51.933487 |
| s6342 | TV Show | Blue Planet II | United Kingdom | TV-G | 1 Season | 51.933487 |
| s4162 | Movie | Surga Yang Tak Dirindukan 2 | Indonesia | TV-14 | 119 min | 50.937637 |
| s5186 | TV Show | Alias Grace | Canada | TV-MA | 1 Season | 41.05056 |
| s4184 | TV Show | The World's Most Extraordinary Homes | United Kingdom | TV-G | 3 Seasons | 39.506954 |
| s6111 | Movie | Alejandro Sanz: What I Was Is What I Am | Spain | TV-MA | 102 min | 38.77893 |
+---------+----------+-----------------------------------------+-----------------+--------+------------+-----------+
/* This query uses the composite index to match the term 'scence fction'
* (with a likely typo in 'science fiction'), allowing for some fuzziness in matching,
* using the English analyzer, requiring at least one term to match. */
SELECT show_id, title, country, description, _score
FROM "netflix_catalog"
WHERE MATCH(title_director_description_ft, 'scence fction')
USING best_fields
WITH (
fuzziness=1,
analyzer='english',
minimum_should_match=1,
operator='or'
)
ORDER BY _score DESC;
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| show_id | title | country | description | _score |
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| s1604 | Alien Worlds | United Kingdom | Applying the laws of life on Earth to the rest of the galaxy, this series blends science fact and fiction to imagine alien life on other planets. | 4.052918 |
| s2589 | Oh Yuck | Canada | Using a concoction of cartoons, comedy and live action, Dr. Yuck and his eccentric lab mates investigate the science behind the planet's ickiest things. | 3.8219523 |
| s7848 | Red vs. Blue | United States | This parody of first-person shooter games, military life and science-fiction films centers on a civil war fought in the middle of a desolate canyon. | 3.8219523 |
| s3054 | Border Security: America's Front Line | Canada | This reality series chronicles the behind-the-scenes action as U.S. Customs and Border Protection agents safeguard the nation's crossings and ports. | 3.5455647 |
| s5870 | Turbo FAST | United States | Join Turbo and his posse, the Fast Action Stunt Team. Filled with comedy, action and fun, this series amps it up to the extreme. | 2.695207 |
| s3574 | Our Planet - Behind The Scenes | United Kingdom | Years spent recording footage of creatures from every corner of the globe is bound to produce a bit of drama. Here's a behind-the-scenes look. | 2.5541513 |
| s8016 | Sid the Science Kid | United States | Armed with a healthy sense of humor and the help of his teacher, friends and family, curious kid Sid tackles questions youngsters have about science. | 2.5541513 |
| s4917 | Bill Nye: Science Guy | United States | The dynamic, bow-tied host behind the young adult science show faces climate-change skeptics as he demonstrates the importance of scientific evidence. | 2.49332 |
+---------+---------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
Full-text search and real-time indexing
Perform quick and accurate searches across various data types, including text, documents, and other unstructured data with robust full-text search capabilities and real-time indexing. CrateDB's fully distributed query engine is built on top of Apache Lucene®. To use full-text search, a full-text index with an analyzer must be created when setting up the column, allowing the text to be broken down into searchable tokens. Once this is done, full-text search queries can be run based on the index.
Support for complex querying
Perform complex search queries effortlessly using CrateDB's support for advanced techniques, including Boolean logic, wildcard searches, phrase searches, proximity searches, and fuzzy search capabilities. Experience precise and versatile querying options that accommodate variations, ensuring accurate results across diverse search requirements.
Scalable, high-performance, and distributed search engine
Scale your search operations effortlessly with CrateDB's high-performance and scalable architecture, ensuring rapid indexing and retrieval of search results even when dealing with high volumes of data. Its distributed shared-nothing architecture enables executing distributed full-text searches efficiently across diverse datasets and horizontal scaling to accommodate growing search requirements.
Real-time analytics and data ingestion
Utilize real-time data ingestion and analysis, enabling continuous updates and immediate availability of new data in search results.
High availability and fault tolerance
Ensure high availability of search functionalities with a fault-tolerant design. CrateDB’s built-in mechanisms for automatic failover and replication prevent downtime and ensure uninterrupted search operations.
Easy and seamless integration
Integrate with various programming languages, frameworks, and tools with native SQL and the PostgreSQL Wire Protocol. Seamlessly integrate CrateDB with various analytical tools, merging search capabilities with analytical processing for comprehensive insights into search results and data patterns. Customize and expand search functionalities using CrateDB's extensible plugin ecosystem, integrating additional modules or third-party plugins as per specific search requirements.
View a sample list of integrations >
Unlocking the Power of Semantic Search
Unlocking the Power of Semantic Search
Unlock the power of semantic search by watching this insightful webinar where Simon Prickett, Senior Product Evangelist CrateDB, highlights CrateDB's ability to integrate various data types (text, geospatial, vectors) for hybrid search using SQL, enabling faster, more contextually relevant results.
CrateDB workshop
CrateDB workshop
Timestamp: 25:43–39:27
CrateDB: A database or a search engine? Both!
CrateDB: A database or a search engine? Both!
Curious to learn more?
Other resources on full-text search
Documentation
Documentation
Blog
Workshop
FAQ
Full-text search indexes the entire textual content of documents, articles, or records, making all text fields searchable. This allows users to perform comprehensive searches across large volumes of unstructured or semi-structured text. CrateDB enhances this functionality with its high-performance, scalable architecture, ensuring rapid indexing and efficient retrieval even with substantial data volumes.
Full-text search techniques include boolean logic, wildcard searches, phrase searches, proximity searches, and fuzzy searches, which enhance the depth and accuracy of search results. CrateDB leverages these capabilities to provide robust full-text search functionality with real-time indexing and efficient query performance.
Full-text search databases allow users to search for words throughout the entire text body of documents, enhancing research depth and precision. Examples of databases optimized for full-text search include CrateDB, MongoDB, Elasticsearch, and PostgreSQL. CrateDB's distributed shared-nothing architecture is built on top of Apache Lucene and excels in performing efficient distributed full-text searches across diverse datasets, ensuring scalability to meet growing search demands.