Try CrateDB Live
- 1. Run CrateDB
- 2. Choose Scenario
- 3. Get Ready
- 4. Import Data
- 5. Explore Queries
- 6. More Queries
- 7. Connect
- 8. Next Steps
Dataset
In this step, we load a sample IoT dataset into CrateDB using the COPY FROM statement.
The dataset is provided as a JSON file hosted:https://cdn.crate.io/downloads/datasets/cratedb-datasets/academy/chicago-data/chicago_community_areas_with_vectors.json
CrateDB can ingest data directly from HTTP(S) endpoints, so there is no need to download the file locally.
1. Create the target table:
Before importing the data, create a table that reflects the structure of the dataset.
CREATE TABLE demo.community_areas ( areanumber INTEGER PRIMARY KEY, name TEXT, details OBJECT(DYNAMIC) AS ( description TEXT INDEX USING fulltext WITH (analyzer = 'english'), description_vec FLOAT_VECTOR(2048), population BIGINT ), boundaries GEO_SHAPE );
2. Understanding the schema:
areanumberis the unique identifier of each area and therefore our primary key.nameis the human-readable name of the area.detailsis a dynamic object that groups additional attributes.
descriptionis a text field elaborating on the particularities of the respective community area. Text columns are indexed by default using a plain index. We need to opt-in to a full-text index by enabling it explicitly.description_vecis a 2048-dimensional vector generated from description. Vectors have been pre-computed during generation of the data set.
boundariesis aGEO_SHAPErepresenting the border of the area.
Using an OBJECT(DYNAMIC) column allows you to group related measurements while still querying them using standard SQL.
3. Import the data using COPY FROM:
Run the following statement to load the data:
COPY demo.community_areas FROM 'https://cdn.crate.io/downloads/datasets/cratedb-datasets/academy/chicago-data/chicago_community_areas_with_vectors.json' RETURN SUMMARY;
The success_count column of the returned result will indicate the number of imported rows.
4. Validate the data:
Check that the data has been loaded:
SELECT COUNT(*) FROM demo.community_areas;
Preview a few rows:
SELECT * FROM demo.community_areas LIMIT 10;
You can also query nested fields directly:
SELECT areanumber, name, details['description'], details['description_vec'] FROM demo.community_areas LIMIT 10;
Notes and considerations
- CrateDB distributes ingestion across nodes automatically in a cluster
- The
OBJECT(DYNAMIC)column allows additional fields to be added later without upfront schema changes - Even without specifying indexes, all columns are indexed by default
Summary
In this step, you:
- Created a table with a structured + semi-structured schema
- Imported JSON data directly from an external source
- Verified that both top-level and nested fields were correctly ingested
This dataset will be used in the next sections to explore queries and search in CrateDB.