Try CrateDB Hands-On: Import Data

Sensor Dataset – 260k readings, 5 metrics

In this step, we load the sample weather dataset into three tables in CrateDB using the COPY FROM statements.

The dataset is provided as JSON files hosted in S3:

https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json
https://guided-path.s3.us-east-1.amazonaws.com/geo_points.json
https://guided-path.s3.us-east-1.amazonaws.com/german_regions.json

It is derived from a combination of several publicly available climate datasets published by Copernicus Climate Data Store.

CrateDB can ingest data directly from HTTP(S) endpoints, so there is no need to download the file locally.

1. Create the target tables:

Before importing the data, create the tables that reflect the structure of the dataset.

The first table is climate_data. It stores around 264,990 raw sensor readings from multiple locations:

CREATE TABLE IF NOT EXISTS demo.climate_data (
   measurement_time TIMESTAMP WITHOUT TIME ZONE,
   geo_location GEO_POINT,
   data OBJECT(DYNAMIC) AS (
      temperature DOUBLE PRECISION,
      pressure DOUBLE PRECISION,
      u10 DOUBLE PRECISION,
      v10 DOUBLE PRECISION,
      latitude DOUBLE PRECISION,
      longitude DOUBLE PRECISION
   )
);

CREATE TABLE IF NOT EXISTS  demo.geo_points (
    latitude     DOUBLE PRECISION,
    longitude    DOUBLE PRECISION,
    geo_location GEO_POINT,
    nearest_town TEXT,
    PRIMARY KEY (latitude, longitude)
);

CREATE TABLE IF NOT EXISTS demo.german_regions (
    region_name        TEXT PRIMARY KEY,
    geo_coords         GEO_SHAPE,
    tourism_info       TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
    transportation     TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
    economics          TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
    introduced_species TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
    embedding          FLOAT_VECTOR(1536)
);

For each region we store:

region_name, such as 'Berlin'
geo_coords, a shape that defines the outline of region in around 100 points.
tourism_info, transportation, economics and introduced_species. These each contain around 2,000 words of text on the subject in question for each region. Used for text search.
embedding, a vector created by looking at the above fields. Used for vector search.

2. Understanding the schema.

The schema is a hierarchy:

Germany is divided into german_regions.
Each region contains 1 or more geo_points. Note we don't have a traditional foreign key relationship. We use the WITHIN operator instead.
The lowest level is climate_data. A row is a reading, for a geo_point, for a specific time.

3. Import the data using COPY FROM:

Run the following statements to load the data:

Notes and considerations

CrateDB distributes ingestion across nodes automatically in a cluster
The OBJECT(DYNAMIC) column allows additional fields to be added later without schema changes
Storing both geo_location and raw latitude/longitude enables flexible geospatial queries

Summary

In this step, you:

Created tables with structured + semi-structured schemas
Imported JSON data directly from an external source
Verified that both top-level and nested fields were correctly ingested

This dataset will be used in the next sections to explore querying, analytics, and geospatial operations in CrateDB.

Try CrateDB Live: Import Data

Sensor Dataset – 260k readings, 5 metrics

1. Create the target tables:

2. Understanding the schema.

3. Import the data using COPY FROM:

4. Validate the data:

Notes and considerations

Summary

Product

Developers

Company

Community