Analyzing Device Readings with Metadata Integration¶
CrateDB is highly regarded as an optimal database solution for managing time series data thanks to its unique blend of features. It is particularly effective when you need to combine time series data with metadata, for instance, in scenarios where data like sensor readings or log entries, need to be augmented with additional context for more insightful analysis.
About
CrateDB supports effective time series analysis with enhanced features for fast aggregations.
Rich data types for storing structured nested data (OBJECT) alongside time series data.
A rich set of built-in functions for aggregations.
Relational JOIN operations.
Common table expressions (CTEs).
Data
This tutorial illustrates how to effectively query time series data with metadata, in order to conduct comprehensive data analysis.
It uses a time series dataset that includes telemetry readings from appliances, such as battery, CPU, and memory information, as well as metadata information like manufacturer, model, and firmware version.
Creating the Tables¶
CrateDB uses SQL, the most popular query language for database management. To store the device readings and the device info data, define two tables with columns tailored to the datasets.
To get started, let’s use a time series dataset that captures various device readings, such as battery, CPU, and memory information. Each record includes:
- ts:
Timestamp when each reading was taken.
- device_id:
Identifier of the device.
- battery:
Object containing battery level, status, and temperature.
- cpu:
Object containing average CPU loads over the last 1, 5, and 15 minutes.
- memory:
Object containing information about the device’s free and used memory.
The second dataset in this tutorial contains metadata information about various devices. Each record includes:
- device_id:
Identifier of the device.
- api_version:
Version of the API that the device supports.
- manufacturer:
Name of the manufacturer of the device.
- model:
Model name of the device.
- os_name:
Name of the operating system running on the device.
Create the tables using the CREATE TABLE
command:
CREATE TABLE IF NOT EXISTS doc.devices_readings (
"ts" TIMESTAMP WITH TIME ZONE,
"device_id" TEXT,
"battery" OBJECT(DYNAMIC) AS (
"level" BIGINT,
"status" TEXT,
"temperature" DOUBLE PRECISION
),
"cpu" OBJECT(DYNAMIC) AS (
"avg_1min" DOUBLE PRECISION,
"avg_5min" DOUBLE PRECISION,
"avg_15min" DOUBLE PRECISION
),
"memory" OBJECT(DYNAMIC) AS (
"free" BIGINT,
"used" BIGINT
)
);
CREATE TABLE IF NOT EXISTS doc.devices_info (
"device_id" TEXT,
"api_version" TEXT,
"manufacturer" TEXT,
"model" TEXT,
"os_name" TEXT
);
Using objects in the devices_readings
dataset allows for the structured and efficient organization of complex, nested data, enhancing both data integrity and flexibility.
Inserting Data¶
Now, insert the data using the COPY FROM
SQL statement.
COPY doc.devices_info
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/devices_info.json.gz'
WITH (compression='gzip', empty_string_as_null=true)
RETURN SUMMARY;
COPY doc.devices_readings
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/devices_readings.json.gz'
WITH (compression='gzip', empty_string_as_null=true)
RETURN SUMMARY;
Time Series Analysis with Metadata¶
JOIN Operations
To illustrate JOIN
operations, the first query retrieves the 30 rows of combined data from two tables, devices.readings
and devices.info
, based on a matching device_id
in both. It effectively merges the detailed readings and corresponding device information, providing a comprehensive view of each device’s status and metrics.
SELECT *
FROM devices.readings r
JOIN devices.info i ON r.device_id = i.device_id
LIMIT 30;
Aggregate Values
The next query illustrates the calculation of summaries for aggregate values. In particular, it finds average battery levels (avg_battery_level
) for each day and shows the result in an ascending order.
SELECT date_trunc('day', ts) AS "day", AVG(battery['level']) AS avg_battery_level
FROM doc.devices_readings
GROUP BY "day"
ORDER BY "day";
Rolling Averages and Window Functions
Rolling averages are crucial in time series analysis because they help smooth out short-term fluctuations and reveal underlying trends by averaging data points over a specified period. This approach is particularly effective in mitigating the impact of outliers and noise in the data, allowing for a clearer understanding of the true patterns in the time series.
The following example illustrates the average (AVG
), minimum (MIN
), and maximum (MAX
) battery temperature over a window of the last 100 temperature readings (ROWS BETWEEN 100 PRECEDING AND CURRENT ROW
). The window is defined in descending order by timestamp (ts
) and can be adapted to support different use cases.
SELECT r.device_id,
AVG(battery['temperature']) OVER w AS "last 100 temperatures",
MIN(battery['temperature']) OVER w AS "min temperature",
MAX(battery['temperature']) OVER w AS "max temperature"
FROM doc.devices_readings r
JOIN doc.devices_info i ON r.device_id = i.device_id
WINDOW w AS (ORDER BY "ts" DESC ROWS BETWEEN 100 PRECEDING AND CURRENT ROW);
Most Recent Observation
The next query shows how to extract the most recent reading for each device of
the mustang model. The query selects the latest timestamp (MAX(r.ts)
),
which represents the most recent reading time, and the corresponding latest
readings for battery, CPU, and memory. It uses MAX_BY
for each respective
component, using the timestamp as the determining factor.
These results are grouped by device_id
, manufacturer
, and model
to ensure
that the latest readings for each unique device are included. This query is
particularly useful for monitoring the most current status of specific devices
in a fleet.
SELECT
MAX(r.ts) as time,
r.device_id,
MAX_BY(r.battery, r.ts) as battery,
MAX_BY(r.cpu, r.ts) as cpu,
MAX_BY(r.memory, r.ts) as memory,
i.manufacturer,
i.model
FROM
devices_readings r
JOIN
devices_info i ON r.device_id = i.device_id
WHERE
i.model = 'mustang'
GROUP BY
r.device_id, i.manufacturer, i.model;
Common Table Expressions (CTEs)
Finally, we illustrate the use of Common Table Expressions (CTEs) on behalf of a complex query to aggregate and analyze device readings and metadata information. The query relies on three CTEs to temporarily capture data.
- max_timestamp:
Find the most recent timestamp (
MAX(ts)
) in thedoc.devices_readings
table. This CTE is used to focus the analysis on recent data.- device_readings_agg:
Calculate the average battery level and temperature for each device, but only for readings taken within the last week, as defined by
r.ts >= m.max_ts - INTERVAL '1 week'
.- device_model_info:
Select details from the
doc.devices_info
table, specifically thedevice_id
,manufacturer
,model
, andapi_version
, but only for devices with an API version between 21 and 25.
The main SELECT
statement joins the device_readings_agg
and device_model_info
CTEs, and aggregates data to provide the average battery level and temperature
for each combination of manufacturer, model, and API version.
It also provides the number of readings (COUNT(*)
) for each grouping.
The query aims to provide a detailed analysis of the battery performance (both level and temperature) for devices with specific API versions, while focusing only on recent data. It allows for a better understanding of how different models and manufacturers are performing in terms of battery efficiency within a specified API range and time frame.
WITH
max_timestamp AS (
SELECT MAX(ts) AS max_ts
FROM doc.devices_readings
),
device_readings_agg AS (
SELECT
r.device_id,
AVG(r.battery['level']) AS avg_battery_level,
AVG(r.battery['temperature']) AS avg_battery_temperature
FROM
devices_readings r, max_timestamp m
WHERE
r.ts >= m.max_ts - INTERVAL '1 week'
GROUP BY
r.device_id
),
device_model_info AS (
SELECT
device_id,
manufacturer,
model,
api_version
FROM
devices_info
WHERE
api_version BETWEEN 21 AND 25
)
SELECT
info.manufacturer,
info.model,
info.api_version,
AVG(read.avg_battery_level) AS model_avg_battery_level,
AVG(read.avg_battery_temperature) AS model_avg_battery_temperature,
COUNT(*) AS readings_count
FROM
device_readings_agg read
JOIN
device_model_info info
ON
read.device_id = info.device_id
GROUP BY
info.manufacturer,
info.model,
info.api_version
ORDER BY
model_avg_battery_level DESC;
Conclusion
This tutorial has guided you through the process of querying and analyzing time series data with CrateDB, demonstrating how to effectively merge device metrics with relevant metadata.
These techniques and queries are important for unlocking deeper insights into device performance, equipping you with the skills needed to harness the full potential of time series data in real-world applications.