Live Stream on Jan 23rd: Unlocking Real Time Insights in the Renewable Energy Sector with CrateDB

Register now
Skip to content
Resources > Academy > CrateDB Fundamentals

Hands-on: Aggregating and Grouping Data

Login or sign up for free CrateDB Academy: CrateDB Fundamentals

This exercise requires a CrateDB cluster with the course dataset loaded.  If you didn't create your cluster earlier in the course, complete these steps before going further.


Now you’ve seen how aggregations and grouping work in CrateDB, here’s an opportunity to put that learning into practice…

Basic Aggregations

We’ll begin with some basic aggregations using the 311 call data. This query returns the number of calls relating to graffiti removal in Englewood (community area 68):

SELECT count(*) AS num_graffiti_complaints
FROM three_eleven_calls
WHERE srtype = 'Graffiti Removal Request'
AND locationdetails['communityarea'] = 68;

This query returns the average time taken to complete a Graffiti Removal Request in Englewood:

SELECT avg(age(closeddate, createddate)) AS avg_duration 
FROM three_eleven_calls
WHERE status = 'Completed'
AND srtype='Graffiti Removal Request'
AND isduplicate=false
AND locationdetails['communityarea'] = 68;

Finally, let’s switch to our taxis data. The column modelyear stores the model year of each taxi in the fleet. The Ford Crown Victoria is a classic American taxi cab. Run this query to find the model year for the oldest Crown Victoria in the fleet:

SELECT MIN(modelyear) FROM taxis 
WHERE make='FORD' AND model='CROWN VICTORIA';

Aggregations with Grouping

Now let’s try some queries using GROUP BY clauses. This query summarizes the data in the 311 calls table, returning the number of calls made for each type of issue:

SELECT srtype, count(*) AS num_calls
FROM three_eleven_calls
GROUP BY srtype
ORDER BY num_calls DESC;

Run this query to discover the most common issues affecting citizens of Chicago.

We can also include other clauses in these queries. Here, we’re asking for a breakdown of Chicago’s taxi fleet minus all the Toyota vehicles:

SELECT make, model, count(*) as how_many 
FROM taxis WHERE make != 'TOYOTA'
GROUP BY make, model ORDER BY how_many DESC;

Try Some Queries of Your Own

Take some time to try out some queries of your own using any of the tables in the data set. For example, you might want to calculate the average population for Chicago’s community areas, or look at which hour of the day most 311 calls are placed.

You should reference the CrateDB documentation for aggregate functions and aggregations and grouping when writing your own queries.

Take this course for free