Skip to content
Explore

Try CrateDB Live

Scenario: Real-time IoT Analytics

A. Generic Queries

Try some queries to see the flexibility in the data model:

A1. What is the time range?

SELECT
  min(measurement_time) AS start_ts,
  max(measurement_time) AS end_ts
FROM
  demo.climate_data;

Timestamps in CrateDB are always stored without a time-zone. If the field is defined WITH a time-zone, then on ingest the timestamp is adjusted accordingly to UTC. In this data, it’s defined without a time-zone. It’s defined as the number of milliseconds since the Unix epoch (1970-01-01T00:00:00Z).

For a detailed guide to all the available datatypes see this page of the documentation.  

Note: The CrateDB console will display both the number of milliseconds and the human readable date and time.

A2. Distribution of temperature

SELECT
  min(data['temperature']) AS min_t,
  avg(data['temperature']) AS avg_t,
  max(data['temperature']) AS max_t
FROM
  demo.climate_data;

This is a straightforward aggregation query. It’s running a query across all data in the table and returning the minimum, maximum and average temperatures.

Notice the AS, which is an alias. Normally fields in an aggregate will be unnamed expressions, but with recent versions of CrateDB it will generate an alias automatically. However, it’s still recommended to give a name.

A3. Average temperature per day

SELECT
  date_trunc('day', measurement_time) AS day,
  min(data['temperature']) AS min_temp,
  max(data['temperature']) AS max_temp
FROM
  demo.climate_data
GROUP BY
  date_trunc('day', measurement_time)
ORDER BY
  date_trunc('day', measurement_time);

This query introduces a new function - date_trunc. This returns a timestamp value truncated to a specific precision, in this case to precision of a day and effectively ignores the HH:MM:SS of any timestamp. This gives us a per-day aggregate figure.

To get information on more functions like this, read this documentation page.

A4. Ten hottest points

SELECT
  measurement_time,
  geo_location,
  data['temperature'] AS temp
FROM
  demo.climate_data
ORDER BY
  temp DESC
LIMIT
  10;

To give us the top 10 hottest points, we use ORDER BY (to order by temperature in descending order) and LIMIT to only return 10 rows.

The SELECT part of the query returns us the timestamp (i.e. when), the location (as a GEO_POINT co-ordinate, which is explained later) and the temperature from the data JSON object.

A5. Are we missing any data points?

SELECT
  count(*) FILTER (WHERE data['temperature'] IS NULL) AS missing_temp,
  count(*) FILTER (WHERE data['pressure'] IS NULL) AS missing_pressure,
  count(*) FILTER (WHERE geo_location IS NULL) AS missing_geo
FROM 
  demo.climate_data;

As you can see, nothing is missing in the included data. How does this query work?

FILTER is a window function. It performs a calculation across a set of related rows defined by an expression (or multiple expressions). In this case we are COUNTing the number of rows where one of data['temperature'] or data['pressure'] or geo_location is NULL (i.e. undefined).

Window functions are very powerful, and it’s very much recommended to read this webpage to learn more about their capabilities.

Let’s add a point with a missing temperature and re-run:

INSERT INTO demo.climate_data (measurement_time, geo_location, data) VALUES (1754956800000, [8.78831111111111, 54.903], {"longitude" = 8.78831111111111, "latitude" = 54.903, "u10" = 4.472952365875244, "v10" = -1.3958832025527954, "pressure" = 102426.1015625});

If you now re-run the missing data point query, it will show:

missing_temp	missing_pressure	missing_geo
1		        0			        0

Now let’s update the record with a valid temperature:

UPDATE demo.climate_data
SET
  data['temperature'] = 16.8
WHERE
  measurement_time = 1754956800000
  AND geo_location = [8.78831111111111, 54.903];

Re-run the missing data point query, and you’ll see it now shows zero again, i.e. nothing is missing.

A6. Wind speed calculations:

This query will calculate min/max and avg wind speeds (in metres per second) using Pythagoras’ theorem to calculate the diagonal (hypotenuse) value:

SELECT
  min(sqrt(power(data['u10'], 2) + power(data['v10'], 2))) AS min_ws,
  avg(sqrt(power(data['u10'], 2) + power(data['v10'], 2))) AS avg_ws,
  max(sqrt(power(data['u10'], 2) + power(data['v10'], 2))) AS max_ws
FROM demo.climate_data;

The power function returns the field to the specified power, 2 in this case, i.e. it squares the field value. The sqrt functions returns the square root.

These are part of the CrateDB scalar functions, and more information is available here.

A7. Can we improve this query?

The mathematical expression is being calculated 3 times and this lacks elegance. We can use a CTE (common table expression) to improve this:

WITH wind_speed AS (
    SELECT
        sqrt(power(data['u10'], 2) + power(data['v10'], 2)) AS ws
    FROM demo.climate_data
)
SELECT
    MIN(ws) AS min_ws,
    AVG(ws) AS avg_ws,
    MAX(ws) AS max_ws
FROM wind_speed;

Now we are only calculating once. For more information on common table expressions, see this documentation page.

A8. What are the top 10 windiest observations?

SELECT
  measurement_time,
  geo_location,
  sqrt(power(data['u10'], 2) + power(data['v10'], 2)) AS wind_speed
FROM
  demo.climate_data
ORDER BY
  wind_speed DESC
LIMIT 10;

This uses the wind speed calculation as detailed in the previous query.

A9. Wind direction calculated using trigonometry

This query will show the wind direction in degrees.

SELECT
  date_trunc('hour', measurement_time) AS hour,
  avg(sqrt(power(data['u10'], 2) + power(data['v10'], 2))) AS avg_speed,
  avg(
    mod(degrees(atan2(data['u10'], data['v10'])) + 360, 360)
  ) AS avg_dir_deg
FROM demo.climate_data
GROUP BY 1
ORDER BY 1;

The function MOD has been introduced here, this refers to the modulus operation (mod is a shortened alias to modulus). This function returns the remainder of a divide operation; in this case we use it so that the angle in degrees is always between 0 and 360, i.e. an angle of 370 would return 10.

The other new function is ATAN2, which is a trigonometry function. All these are covered in the scalar functions guide.