Live Stream on Jan 23rd: Unlocking Real Time Insights in the Renewable Energy Sector with CrateDB

Register now
Skip to content
Resources > Academy > CrateDB Fundamentals

Hands-on: Document Data

Login or sign up for free CrateDB Academy: CrateDB Fundamentals

This exercise requires a CrateDB cluster with the course dataset loaded.  If you didn't create your cluster earlier in the course, complete these steps before going further.


In this hands-on, you’ll add a new table to your CrateDB cluster, import some data into it and try out some queries that return data from objects and arrays.

Load Library Data

We’re going to add data about Chicago’s public libraries. First, open your cluster’s console and run the following SQL statement that creates a new table:

CREATE TABLE libraries (
   name TEXT,
   location OBJECT(DYNAMIC) AS (
       address TEXT,
       zipcode TEXT,
       communityarea INTEGER,
       position GEO_POINT
   ),
   hours ARRAY(TEXT),
   phone TEXT,
   website TEXT
);

Now, load the data by executing this SQL statement:

COPY libraries 
FROM 'https://github.com/crate/cratedb-datasets/raw/main/academy/chicago-data/chicago_libraries.json'
RETURN SUMMARY;

Check that CrateDB loaded 81 records with 0 errors.

Querying the Data - Objects

The West Town community area is area number 24 in our dataset. Run the following query to see the details of libraries located in this community:

SELECT * FROM libraries WHERE location['communityarea'] = 24;

CrateDB should return 3 results. Explore the results in the console, expanding the location and hours fields to view their contents.

When creating the table, we chose a dynamic object for the location column. This means that we can add further fields or nested objects inside location, and CrateDB will update the schema and index the values accordingly. You’ll learn more about this in the indexing module later in this course.

Let’s try adding an object inside location for one of the libraries. We’ll store details of some of the amenities provided.

Run the following query to update one of the libraries:

UPDATE libraries SET location['amenities'] = '{
   'wifi': true,
   'photocopier': true,
   'computers': 6,
   'meetingrooms': 3,
   'cafe': false,
   'restrooms': true,
   'accessible': true
}'::object where name = 'Walker';

Now, retrieve the updated record for Walker:

SELECT * FROM libraries WHERE name='Walker';

Explore the query result to see that Walker now contains a nested object inside location.

How do we select values from nested objects? This query returns the number of meeting rooms at Walker:

SELECT location['amenities']['meetingrooms'] as num_rooms 
FROM libraries
WHERE name='Walker';

Only Walker has this extra data… so, what happens if we ask CrateDB to retrieve this value for other libraries? Try this query:

SELECT name, location['amenities']['meetingrooms'] as num_rooms 
FROM libraries
WHERE name in ('Austin', 'Beverly', 'Walker');

CrateDB returns null values for those rows that don’t have the extra data.

Querying the Data - Arrays

The hours column contains an array of text values, with the first entry representing the library opening hours for Monday and the last one representing the hours for Sunday.

Try this query, which returns the opening hours for Avalon library:

SELECT hours FROM libraries WHERE name='Avalon';

Arrays are 1 indexed. To return just the hours for Tuesday, run this query:

SELECT hours[2] AS tuesday_hours FROM libraries WHERE name='Avalon';

You’ll see that Avalon is open between 10 and 5 on Tuesday.

Finally, try this query which returns all libraries that close at least one day of the week:

SELECT name, hours FROM libraries WHERE 'CLOSED' IN (hours);

 

Take this course for free