It’s time to get up and running with your first CrateDB Cluster and to load the dataset for this course into your database.
You’ll need to choose between running your cluster in the cloud, or locally with Docker. The cloud is the preferred option for this course. Instructions for both options are provided below.
Once you have a cluster up and running, enter the commands in the “Loading the Course Dataset” section to create the tables and load the data that you’ll need for the course.
If you need help, don’t forget that you can reach out to us by posting in the Academy forum at the online CrateDB Community.
Option 1: Create a Cluster in the Cloud
To create a cluster in the cloud, navigate to https://console.cratedb.cloud/ and follow these instructions:
- Create a new account on CrateDB cloud. You can use your existing Google, Microsoft or GitHub IDs, or choose your own username and password.
- Having logged in, you’ll be asked to provide a name for your organization.
- The next step is to deploy a cluster. Choose a cloud provider and region then select the CRFREE option.
- Give your cluster a name and press “Deploy”.
- Make sure to note down or download your connection details: the username, password, host and port. Don’t share these!
- A green icon appears on screen when your cluster is fully deployed.
- Once you see this, click “Console” from the menu to the left of the screen to launch the console. You’ll need this to load the dataset for the course.
Option 2: Create a Cluster with Docker
To create a cluster locally with Docker, you’ll need to first install Docker Desktop. Once Docker is running on your machine, you can pull the CrateDB container and start a cluster with the following command:
docker run --publish 4200:4200 --publish 5432:5432 crate -Cdiscovery.type=single-node
- Verify that your cluster is running by pointing your browser at http://localhost:4200.
- You should see the CrateDB Admin User Interface.
- Click on the console icon in the menu on the left to launch the console. You’ll need this to load the dataset for the course.
Loading the Dataset
The data for this course is derived from open data about the City of Chicago. To participate in the hands-on segments of the course and to be successful in the final exam, you’ll need to load the data into your cluster and run queries against it.
You’ll create a table in CrateDB to store data from each file in the dataset, then load the data into it. This process is the same whether your CrateDB cluster is in the cloud or running locally with Docker.
Community Areas
The community_areas table stores data about the 77 community areas that make up Chicago. Paste this SQL command into the CrateDB console to create the table:
CREATE TABLE community_areas (
areanumber INTEGER PRIMARY KEY,
name TEXT,
details OBJECT(STRICT) AS (
description TEXT INDEX USING fulltext,
population BIGINT
),
boundaries GEO_SHAPE INDEX USING geohash WITH (PRECISION='1m', DISTANCE_ERROR_PCT=0.025)
);
When you submit (execute) the query, CrateDB should respond with CREATE TABLE OK.
Now, load the data into the table by pasting this SQL command into the console and running it:
COPY community_areas
FROM 'https://github.com/crate/cratedb-datasets/raw/main/academy/chicago-data/chicago_community_areas.json'
RETURN SUMMARY;
Check the output of the command to ensure that 77 records were loaded with 0 errors.
Weather Data
The beach_weather table stores data from Chicago lake shore weather stations. Create the table with this SQL command:
CREATE TABLE beach_weather (
measurementid TEXT PRIMARY KEY,
station TEXT,
ts TIMESTAMP WITHOUT TIME ZONE,
airtemp DOUBLE PRECISION,
wetbulbtemp DOUBLE PRECISION,
humidity SMALLINT,
rainintensity DOUBLE PRECISION,
intervalrain DOUBLE PRECISION,
totalrain DOUBLE PRECISION,
precipitationtype SMALLINT,
winddirection SMALLINT,
windspeed DOUBLE PRECISION,
maxwindspeed DOUBLE PRECISION,
pressure DOUBLE PRECISION,
solarradiation SMALLINT,
batterylife DOUBLE PRECISION
);
Then load the data like this:
COPY beach_weather
FROM 'https://media.githubusercontent.com/media/crate/cratedb-datasets/main/academy/chicago-data/beach_weather_station_data.csv'
WITH (empty_string_as_null=true)
RETURN SUMMARY;
Check the output of the command to verify that 174098 records were loaded with 0 errors.
311 Call Data
The three_eleven_calls table stores details about service requests placed with the Chicago 311 non-emergency issue reporting service.
Create the table for this data using this SQL command:
CREATE TABLE three_eleven_calls (
srnumber TEXT,
srtype TEXT,
srshortcode TEXT,
createddept TEXT,
ownerdept TEXT,
status TEXT,
origin TEXT,
createddate TIMESTAMP,
lastmodifieddate TIMESTAMP,
closeddate TIMESTAMP,
week GENERATED ALWAYS AS date_trunc('week', createddate),
isduplicate BOOLEAN,
createdhour SMALLINT,
createddayofweek SMALLINT,
createdmonth SMALLINT,
locationdetails OBJECT(DYNAMIC) AS (
streetaddress TEXT,
city TEXT,
state TEXT,
zipcode TEXT,
streetnumber TEXT,
streetdirection TEXT,
streetname TEXT,
streettype TEXT,
communityarea SMALLINT,
ward SMALLINT,
policesector SMALLINT,
policedistrict SMALLINT,
policebeat SMALLINT,
precinct SMALLINT,
latitude DOUBLE PRECISION,
longitude DOUBLE PRECISION,
location GEO_POINT
)
) PARTITIONED BY (week);
Then load the data like this:
COPY three_eleven_calls
FROM 'https://github.com/crate/cratedb-datasets/raw/main/academy/chicago-data/311_records_apr_2024.json.gz'
WITH (compression='gzip')
RETURN SUMMARY;
Check the output of the command to verify that 174092 records were loaded with 0 errors.
Taxi Fleet Data
The taxis table stores details of the vehicles that make up Chicago’s taxi fleet. Create the table with this SQL command:
CREATE TABLE taxis (
vehicleid INTEGER,
status TEXT,
make TEXT,
model TEXT,
modelyear INTEGER,
color TEXT,
fuel TEXT,
wheelchairaccessible BOOLEAN,
operator TEXT,
zipcode TEXT,
affiliation TEXT,
medallion TEXT
);
Then populate it with:
COPY taxis
FROM 'https://media.githubusercontent.com/media/crate/cratedb-datasets/main/academy/chicago-data/taxi_details.csv'
RETURN SUMMARY;
Check the output of the command to verify that 2655 records were loaded with 0 errors.
Taxi Rides Data
The taxi_rides table stores details of taxi trips taken within Chicago in April 2024. Create the table like this:
CREATE TABLE taxi_rides (
tripid TEXT,
vehicleid INTEGER,
starttime TIMESTAMP,
endtime TIMESTAMP,
duration INTEGER,
miles DOUBLE PRECISION,
pickupcommunityarea SMALLINT,
dropoffcommunityarea SMALLINT,
fare DOUBLE PRECISION,
tips DOUBLE PRECISION,
tolls DOUBLE PRECISION,
extras DOUBLE PRECISION,
totalcost DOUBLE PRECISION,
paymenttype TEXT,
company TEXT,
pickupcentroidlatitude DOUBLE PRECISION,
pickupcentroidlongitude DOUBLE PRECISION,
pickupcentroidlocation GEO_POINT,
dropoffcentroidlatitude DOUBLE PRECISION,
dropoffcentroidlongitude DOUBLE PRECISION,
dropoffcentroidlocation GEO_POINT
);
Then populate it with:
COPY taxi_rides
FROM 'https://github.com/crate/cratedb-datasets/raw/main/academy/chicago-data/taxi_rides_apr_2024.json.gz'
WITH (compression='gzip')
RETURN SUMMARY;
Check the output of the command to verify that 245117 records were loaded with 0 errors.
Optional: Install the Crash CLI
The Crash CLI runs on your local machine. It provides another way to connect to your CrateDB cluster and run SQL commands. You don’t need to use it for this course, but if you’d like to try it out then follow the installation instructions here.
You’re Ready!
Congratulations! You’ve created your own CrateDB cluster and loaded the course dataset. You’re ready to learn more about CrateDB.