Skip to content
Explore

Try CrateDB Live

Scenario: Real-time IoT Analytics

Sensor Dataset – 260k readings, 5 metrics

In this step, we load a sample IoT dataset into CrateDB using the COPY FROM statement.

The dataset is provided as a JSON file hosted in S3:
https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json

 It is derived from a combination of several publicly available climate datasets published by Copernicus Climate Data Store.

CrateDB can ingest data directly from HTTP(S) endpoints, so there is no need to download the file locally.

 

1. Create the target table:

Before importing the data, create a table that reflects the structure of the dataset.

CREATE TABLE IF NOT EXISTS demo.climate_data (
   measurement_time TIMESTAMP WITHOUT TIME ZONE,
   geo_location GEO_POINT,
   data OBJECT(DYNAMIC) AS (
      temperature DOUBLE PRECISION,
      pressure DOUBLE PRECISION,
      u10 DOUBLE PRECISION,
      v10 DOUBLE PRECISION,
      latitude DOUBLE PRECISION,
      longitude DOUBLE PRECISION
   )
);

2. Understanding the schema:

  • measurement_time stores the time of the observation
  • geo_location is a GEO_POINT used for geospatial queries
  • data is a dynamic object that groups all sensor readings:
    • temperature, pressure
    • wind components (u10, v10)
    • raw latitude and longitude values

Using an OBJECT(DYNAMIC)column allows you to group related measurements while still querying them using standard SQL.

3. Import the data using COPY FROM:

Run the following statement to load the data:

COPY demo.climate_data
FROM 'https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json'
WITH (
    format = 'json',
    header = true
) return summary;

Note that if you are using a Crate Cloud free tier account, this will take a few seconds, as we use a shared environment with limited resources. For information on the performance aspect of loading lots of data, see here.

4. Validate the data:

Check that the data has been loaded:

SELECT COUNT(*) FROM demo.climate_data;

Preview a few rows:

SELECT * FROM demo.climate_data LIMIT 10; 

You can also query nested fields directly:

SELECT
    measurement_time,
    data['temperature'],
    data['pressure']
FROM demo.climate_data
LIMIT 10;

Notes and considerations

  • CrateDB distributes ingestion across nodes automatically in a cluster
  • The OBJECT(DYNAMIC)column allows additional fields to be added later without schema changes
  • Storing both geo_location and raw latitude/longitude enables flexible geospatial queries

Summary

In this step, you:

  • Created a table with a structured + semi-structured schema
  • Imported JSON data directly from an external source
  • Verified that both top-level and nested fields were correctly ingested

This dataset will be used in the next sections to explore querying, analytics, and geospatial operations in CrateDB.