Try CrateDB Live
- 1. Run CrateDB
- 2. Choose Scenario
- 3. Get Ready
- 4. Import Data
- 5. Explore Queries
- 6. More Queries
- 7. Connect
- 8. Next Steps
Sensor Dataset – 260k readings, 5 metrics
In this step, we load a sample IoT dataset into CrateDB using the COPY FROM statement.
The dataset is provided as a JSON file hosted in S3:https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json
It is derived from a combination of several publicly available climate datasets published by Copernicus Climate Data Store.
CrateDB can ingest data directly from HTTP(S) endpoints, so there is no need to download the file locally.
1. Create the target table:
Before importing the data, create a table that reflects the structure of the dataset.
CREATE TABLE IF NOT EXISTS demo.climate_data ( measurement_time TIMESTAMP WITHOUT TIME ZONE, geo_location GEO_POINT, data OBJECT(DYNAMIC) AS ( temperature DOUBLE PRECISION, pressure DOUBLE PRECISION, u10 DOUBLE PRECISION, v10 DOUBLE PRECISION, latitude DOUBLE PRECISION, longitude DOUBLE PRECISION ) );
2. Understanding the schema:
- measurement_time stores the time of the observation
geo_locationis a GEO_POINT used for geospatial queriesdatais a dynamic object that groups all sensor readings:- temperature, pressure
- wind components (u10, v10)
- raw latitude and longitude values
Using an OBJECT(DYNAMIC)column allows you to group related measurements while still querying them using standard SQL.
3. Import the data using COPY FROM:
Run the following statement to load the data:
COPY demo.climate_data FROM 'https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json' WITH ( format = 'json', header = true ) return summary;
Note that if you are using a Crate Cloud free tier account, this will take a few seconds, as we use a shared environment with limited resources. For information on the performance aspect of loading lots of data, see here.
4. Validate the data:
Check that the data has been loaded:
SELECT COUNT(*) FROM demo.climate_data;
Preview a few rows:
SELECT * FROM demo.climate_data LIMIT 10;
You can also query nested fields directly:
SELECT measurement_time, data['temperature'], data['pressure'] FROM demo.climate_data LIMIT 10;
Notes and considerations
- CrateDB distributes ingestion across nodes automatically in a cluster
- The
OBJECT(DYNAMIC)column allows additional fields to be added later without schema changes - Storing both geo_location and raw latitude/longitude enables flexible geospatial queries
Summary
In this step, you:
- Created a table with a structured + semi-structured schema
- Imported JSON data directly from an external source
- Verified that both top-level and nested fields were correctly ingested
This dataset will be used in the next sections to explore querying, analytics, and geospatial operations in CrateDB.