This exercise requires a CrateDB cluster with the course dataset loaded. If you didn't create your cluster earlier in the course, complete these steps before going further.
It’s your turn to try out CrateDB’s full-text search capabilities using the Chicago community areas data.
Full-Text Queries
Each community area has a text field containing a few paragraphs of text describing the area’s features and history. In our schema, this field is named “description” and it’s inside the “details” object.
Try running the following query to view the description for each area:
SELECT name, details['description'] AS description from community_areas;
When the community_areas table was created, we told CrateDB to perform full-text indexing on it:
…
details OBJECT(STRICT) AS (
description TEXT INDEX USING fulltext WITH (analyzer='english'),
population BIGINT
), …
Let’s use the MATCH predicate to try some queries against the index that was created. Recall that, when using MATCH, we can select _score to retrieve a score for each result. This score is a measure of the quality of the match compared to other matching rows.
Execute this query to search for community areas containing either the term “railroad” or the term “tracks”:
SELECT name, _score, details['description'] AS description
FROM community_areas
WHERE MATCH(details['description'], 'railroad tracks')
ORDER BY _score DESC;
Take a moment to look at the results returned by this query. Note that Greater Grand Crossing and Burnside are the community areas with the highest scores.
What if we wanted to search for the specific phrase “railroad tracks”, rather than either of the terms “railroad” or “track”? Try this modified query which does just that:
SELECT name, _score, details['description'] AS description
FROM community_areas
WHERE MATCH(details['description'], 'railroad tracks') USING phrase
ORDER BY _score DESC;
Examine the results returned by this query. How do they differ from those returned by the previous query?
Before moving on, scroll down through the results to find the result for Greater Grand Crossing. The description for this community area contains the phrase “railroad company’s tracks”. Let’s try a third version of this query that considers this a match for the phrase “railroad tracks”.
SELECT name, _score, details['description'] AS description
FROM community_areas
WHERE MATCH(details['description'], 'railroad tracks') USING phrase WITH (slop=1)
ORDER BY _score DESC;
Adding a “slop” of 1 boosts Greater Grand Crossing’s score. Slop is a measure of proximity when searching for phrases. Specifying a slop of 1 tells CrateDB that another word can exist between “railroad” and “tracks”, and that this should still be considered a phrase match.
Mixing Full-Text Queries with Other Data Types
Because full text queries are written as SQL queries in CrateDB, we can mix in data stored in other formats in other columns in the table, perform aggregations and even joins with other tables.
Try executing this query which returns information about community areas whose descriptions match both of the terms “railroad” and “historic” in any order, and which have a population of over 50,000 people:
SELECT name, _score, details['population'] AS population, details['description'] AS description
FROM community_areas
WHERE MATCH(details['description'], 'railroad historic') using best_fields with (operator='and')
AND details['population'] > 50000
ORDER BY _score DESC;
Try Your Own Queries
Take a moment to try some queries of your own. Refer to the documentation for the MATCH predicate to understand the full range of options available.