Live Stream: Turbocharge your aggregations, search & AI models & get real-time insights

Register now
Skip to content
Resources > Academy > CrateDB Fundamentals

Querying the Sample Data

Login or sign up for free CrateDB Academy: CrateDB Fundamentals

Now you've created your own CrateDB cluster and loaded the course data. It's time to run some simple SQL queries and play around with the data on your own.

Here in the CrateDB Cloud Console I've got a query that I'll paste in and this is going to query the three_eleven_calls table. So that table of reports from citizens about issues with infrastructure in the city and what we're going to do is ask it to summarise the number of calls where the type features the word 'pothole' by day of week. In this table, the day of week is in the createddayofweek field and the weekdays start with Sunday, which is day one.

I'll run the query here and we can see that the most potholes 1440 are reported on Tuesdays and the least are reported on days one and seven, Saturday and Sunday over the weekend. Which I guess makes sense as people aren't travelling so much on the weekend because they're not going to work.

Next, let's try a query that uses our taxis and taxi rides data. I want to know which taxi operated the taxi ride or the trip that had the highest total cost. So here I'm selecting some fields from the taxis table and I want the vehicleid from the taxis table to match the vehicleid from the taxi_rides table where the totalcost field has the highest value and for this we use CrateDB's max_by. This returns the value of its first parameter, so vehicleid at the point at which the 2nd parameter, totalcost, was the highest.

When I run this query, we'll see that Vehicle 2417 performed the trip with the highest overall total cost, and it was a white Toyota Camry. Finally, let's try a query that uses our community_areas table. Recall that in the community_areas table, we have details of each of the 77 neighbourhoods that make up the city of Chicago, including their boundaries, population, and historical information.

There are two airports in Chicago, O'Hare and Midway. Imagine that an airline needs to move an aircraft from one of these airports to another, and it follows a path as shown on the map to the right. What we're going to do now is paste a query into the console here and ask CrateDB which community areas with a population of more than 30,000 people does this flight pass over? So here I'm selecting the name and then the population which is inside an object in the community_areas table. So we're using details['population'] to access that. And then what I want to do is match records whose boundaries field which is a geo polygon of the boundaries of the neighbourhood intersect with the flight. And the flight here is represented by a line string in Well Known Text format and that contains all of the lat long points of the flight. And then finally, we want population to be greater than 30,000 and I've asked CrateDB to order the results set by population.

So what we're going to get when we run this query is the neighbourhoods or community areas that this flight passes over that have more than 30,000 people ordered by population descending. So the community with the greatest population first. Let's run the query now and we can see that Uptown is an area that this flight passes over. It has the highest population and down here Greater Grand Crossing has the lowest population of those with more than 30,000.

I would encourage you to go and play with the data set yourself and refer to the CrateDB documentation whilst doing so. It's good to get familiar with this data, run some queries and explore it for yourself.

Take this course for free