Try CrateDB Hands-On: More Queries | Guided Paths for Data Engineers

SELECT
  min(longitude(geo_location)) AS min_longitude,
  max(longitude(geo_location)) AS max_longitude,
  min(latitude(geo_location)) AS min_latitude,
  max(latitude(geo_location)) AS max_latitude
FROM
  demo.climate_data;

SELECT
  d.measurement_time AS time,
  latitude(d.geo_location) AS latitude,
  longitude(d.geo_location) AS longitude,
  d.data['temperature'] - 273.15 AS temperature,
  gp.nearest_town
FROM
  demo.climate_data d,
  demo.german_regions r,
  demo.geo_points gp
WHERE WITHIN(d.geo_location, r.geo_coords)
  AND gp.geo_location = d.geo_location
  AND r.region_name = 'Bayern'
  AND d.measurement_time = (SELECT max(d2.measurement_time) FROM demo.climate_data d2);

SELECT
  geo_location,
  distance(geo_location, 'POINT(13.4 52.5)') AS metres,
  min(data['temperature']) - 273.15 AS min_temp,
  max(data['temperature']) - 273.15 AS max_temp
FROM demo.climate_data
GROUP BY geo_location, metres
ORDER BY metres ASC
LIMIT 50;
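
For intuition about what the metres column represents, here is a minimal Python sketch that approximates the distance between two longitude/latitude pairs with the haversine formula. It assumes a spherical Earth with a mean radius of 6,371,008.8 m. The function name haversine_metres and the second point (approximate coordinates of Munich) are illustrative and not part of the tutorial, and the value CrateDB's distance() returns may differ slightly.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in metres (spherical model)


def haversine_metres(lon1, lat1, lon2, lat2):
    """Approximate great-circle distance in metres between two points.

    Arguments follow the WKT order used in the query above: longitude first.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Reference point from the query above: POINT(13.4 52.5)
berlin = (13.4, 52.5)
# Illustrative second point (assumption): approximate coordinates of Munich
munich = (11.58, 48.14)

print(round(haversine_metres(*berlin, *munich) / 1000), "km")
```

Note the argument order: as in the WKT string 'POINT(13.4 52.5)', longitude comes before latitude.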

Notice that the same database handles relational queries, geospatial queries, and full-text search.

D. Vector Search

CrateDB supports semantic ("vector") search natively via the FLOAT_VECTOR datatype and the KNN_MATCH operator. Unlike full text search, which matches on words, vector search matches on meaning. Two phrases with no words in common can still score highly if they describe the same concept.
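
To make "matching on meaning" concrete, the following toy sketch ranks a few made-up vectors by cosine similarity to a query vector. It is an illustration only: the three-dimensional vectors, the labels, and the choice of cosine similarity are assumptions for this example and do not describe how CrateDB scores matches internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, items, k):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_embeddings = {
    "alpine hiking": [0.9, 0.1, 0.0],
    "mountain trails": [0.8, 0.2, 0.1],
    "harbour logistics": [0.0, 0.2, 0.9],
}

query_vector = [0.85, 0.15, 0.05]  # stands in for an embedded search phrase
for name, score in nearest(query_vector, toy_embeddings, k=2):
    print(f"{name}: {score:.3f}")
```

The two hiking-related labels share no words, yet their vectors point in nearly the same direction, so both rank above the unrelated one.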

In our demo.german_regions table the embedding column is declared as FLOAT_VECTOR(1536). Each row's embedding is a 1536-dimensional vector generated by OpenAI's text-embedding-3-small model, summarising the four text fields (tourism_info, transportation, economics, introduced_species) for that region.
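
A minimal sketch of how a client could search this column with KNN_MATCH. Several things here are assumptions rather than the tutorial's own tooling: the function name knn_search is hypothetical, the cursor is any DB-API style cursor that accepts ? placeholders, and the query embedding is expected to come from the same model that produced the stored vectors. The region_name column is taken from the join query earlier on this page.

```python
EMBEDDING_DIMENSIONS = 1536  # matches FLOAT_VECTOR(1536) on the embedding column


def build_knn_sql(k):
    """Build the search statement; k is validated and inlined as an integer."""
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    return (
        "SELECT region_name, _score "
        "FROM demo.german_regions "
        f"WHERE KNN_MATCH(embedding, ?, {k}) "
        "ORDER BY _score DESC "
        f"LIMIT {k}"
    )


def knn_search(cursor, query_embedding, k=5):
    """Run the search on a DB-API style cursor (hypothetical helper)."""
    vector = list(query_embedding)
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(vector)}"
        )
    # The query vector is passed as a bound parameter, not pasted into the SQL.
    cursor.execute(build_knn_sql(k), (vector,))
    return cursor.fetchall()
```

Checking the vector length up front gives a clearer error than sending a vector of the wrong size to the database.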

To search this column you need two files: a small Python helper, cratedb_knn_search.py, and a requirements.txt. Download both and save them next to your .env.

Then install the required packages:

Rank  Region                          Score
----  ------------------------------  ------
#1    Bayern                          0.4088
#2    Rheinland-Pfalz                 0.3928
#3    Baden-Württemberg               0.3819
#4    Saarland                        0.3800
#5    Hessen                          0.3731

INSERT INTO demo.climate_data (measurement_time, geo_location, data) VALUES (123, [8.78831111111111, 54.903], {"longitude" = 8.78831111111111, "latitude" = 54.903, "temperature" = 16.868310546875023, "u10" = 4.472952365875244, "v10" = -1.3958832025527954, "pressure" = 102426.1015625});

This insert uses the existing schema. What if we want to add a new 'humidity' field to our dataset? Do we need to recreate the table?

In CrateDB we can define object columns (i.e. JSON) as one of three different types:

OBJECT(STRICT) – The schema for the object is fixed
OBJECT(DYNAMIC) – The schema is dynamic, and new attributes can be added
OBJECT(IGNORED) – The JSON is stored as-is: the data types of new attributes are not enforced and those attributes are not indexed

See the CrateDB documentation on object columns for more detailed information.
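
The difference between the three policies can be sketched with a toy model in Python. This is only an illustration of the behaviour described above, not CrateDB's implementation; the function name insert_object and the dict-based schema are hypothetical.

```python
def insert_object(schema, stored_rows, record, policy):
    """Toy model of how each policy treats attributes missing from the schema.

    schema:      dict mapping attribute name -> Python type
    stored_rows: list collecting the accepted records
    """
    unknown = [key for key in record if key not in schema]
    if policy == "STRICT":
        if unknown:
            raise ValueError(f"unknown attributes rejected: {unknown}")
    elif policy == "DYNAMIC":
        for key in unknown:
            schema[key] = type(record[key])  # new attribute joins the schema
    elif policy == "IGNORED":
        pass  # unknown attributes are stored, but the schema stays untouched
    else:
        raise ValueError(f"unsupported policy: {policy}")
    stored_rows.append(dict(record))
    return schema


record = {"temperature": 16.87, "humidity": 43.4}
for policy in ("STRICT", "DYNAMIC", "IGNORED"):
    schema, rows = {"temperature": float}, []
    try:
        insert_object(schema, rows, record, policy)
        print(policy, "-> schema keys:", sorted(schema), "| rows stored:", len(rows))
    except ValueError as error:
        print(policy, "->", error)
```

In this model only DYNAMIC both accepts the record and extends the schema, which matches the behaviour the following inserts rely on.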

When you created this table, you set the data column to be DYNAMIC, so we can add additional attributes easily, and these attributes will be indexed.

Add a new key to the JSON object without modifying the schema:

INSERT INTO demo.climate_data (measurement_time, geo_location, data) VALUES (123, [8.78831111111111, 54.903], {"humidity" = 43.4, "longitude" = 8.78831111111111, "latitude" = 54.903, "temperature" = 16.868310546875023, "u10" = 4.472952365875244, "v10" = -1.3958832025527954, "pressure" = 102426.1015625});
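
Once the row is written, the new key can be read back like any other attribute of the data column. The following is a minimal sketch, assuming a DB-API style cursor connected to your cluster; the function name fetch_humidity_rows is hypothetical. REFRESH TABLE is issued first so that the freshly inserted row is visible to the SELECT.

```python
REFRESH_SQL = "REFRESH TABLE demo.climate_data"

HUMIDITY_SQL = (
    "SELECT measurement_time, data['humidity'] AS humidity "
    "FROM demo.climate_data "
    "WHERE data['humidity'] IS NOT NULL"
)


def fetch_humidity_rows(cursor):
    """Return all rows that carry the new 'humidity' key (hypothetical helper)."""
    cursor.execute(REFRESH_SQL)   # make recent inserts visible to queries
    cursor.execute(HUMIDITY_SQL)  # rows inserted before the key existed are filtered out
    return cursor.fetchall()
```

Rows inserted without the humidity key return NULL for data['humidity'], which is why the sketch filters on IS NOT NULL.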

B. Geospatial Queries

B1. How to extract latitude & longitude from a GEO_POINT

B2. Distance calculations:

C. Full text search Queries – access Apache Lucene using SQL

D. Vector Search

E. Queries – JSON object model

E1. Insert a new value into the dataset:

E2. Now try to query the new key: