In this blog post, we are going to look at the differences between the query syntax of InfluxDB 2.0 (Flux) and CrateDB (SQL). This query comparison is done using a public time series dataset consisting of around 70K rows of weather station measurements. The data includes timestamped records with various meteorological parameters such as temperature, humidity, wind speed, wind direction, precipitation, pressure, and general weather conditions.
InfluxDB Terminology and Data Types
First, let’s review some InfluxDB terminology. InfluxDB stores time series data in buckets and measurements, while CrateDB stores data in schemas and tables.
- A bucket is a named location with a retention policy where time series data is stored.
- A series is a logical grouping of data defined by a shared measurement, tag set, and field key.
- A measurement is like an SQL database table.
- A tag is like an indexed column in an SQL database.
- A field is like an un-indexed column in an SQL database.
- A point is like an SQL row.
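To make this mapping concrete, here is a small sketch of how a single weather point could be represented on both sides; the concrete values are illustrative and not taken from the dataset:

```sql
-- InfluxDB line protocol: measurement,tag_set field_set timestamp
--   weather_data,location=Berlin temperature=21.5,humidity=48.0 1693382400000000000
-- In CrateDB the measurement becomes a table, the tag and the fields become
-- regular columns, and the point becomes an ordinary row:
INSERT INTO demo.weather_data ("time", location, temperature, humidity)
VALUES ('2023-08-30T08:00:00Z', 'Berlin', 21.5, 48.0);
```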
Now, let's have a look at the supported data types in both InfluxDB and CrateDB. We can see that CrateDB covers all InfluxDB data types, offers richer object and array handling, and adds native geospatial and vector types.
| Data type | InfluxDB | CrateDB |
|---|---|---|
| Integer | Integer (64-bit signed) | Byte (8-bit), Short (16-bit), Integer (32-bit), Long (64-bit) |
| Float | Float (64-bit IEEE-754) | Float (32-bit IEEE-754), Double (64-bit IEEE-754) |
| String | String (Unicode) | Varchar or Text (variable-length Unicode) |
| Boolean | Boolean (true or false) | Boolean (true or false) |
| Timestamp | Timestamp (nanosecond-precision Unix timestamp; does not natively include time zone information) | Timestamp (millisecond precision), Timestamp with time zone |
| Array | Array (homogeneous arrays of any supported type) | Array (arrays of any other data type) |
| Object | Object (key-value pairs where keys are strings and values can be any supported type) | Object (a collection of key-value pairs; an object can contain any other type, including nested child objects) |
| Geospatial | Not supported natively | Geo_point (stores a point as latitude and longitude), Geo_shape (stores more complex shapes such as polygons and lines) |
| Vector | Not supported natively | Float_vector (stores dense vectors of float values of fixed length) |
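To illustrate, a CrateDB table for data of this shape could be declared roughly as follows. This is only a sketch: the table name and the metadata columns (position, readings, attributes, embedding) are hypothetical additions to show the geospatial, array, object, and vector types, and FLOAT_VECTOR requires a recent CrateDB version:

```sql
CREATE TABLE IF NOT EXISTS demo.weather_data_example (
    "time"      TIMESTAMP WITH TIME ZONE,   -- millisecond precision, time zone aware
    location    TEXT,                       -- comparable to an InfluxDB tag
    temperature DOUBLE PRECISION,           -- comparable to InfluxDB fields
    humidity    DOUBLE PRECISION,
    position    GEO_POINT,                  -- station latitude/longitude (hypothetical column)
    readings    ARRAY(DOUBLE PRECISION),    -- homogeneous array of values (hypothetical column)
    attributes  OBJECT(DYNAMIC) AS (        -- nested key-value metadata (hypothetical column)
        vendor TEXT,
        model  TEXT
    ),
    embedding   FLOAT_VECTOR(3)             -- fixed-length dense vector (hypothetical column)
);
```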
Query Examples
Now, let's compare Flux and SQL queries for typical time-series scenarios.
Aggregations:
Example: Computing the average temperature for each location.
This query type helps analyze temperature trends across different locations in environmental monitoring systems, smart cities, or agriculture.
Flux:
from(bucket: "demo-bucket") |> range(start: -100y) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> group(columns: ["location"]) |> mean()
SQL:
SELECT location, AVG(temperature) FROM "demo"."weather_data" GROUP BY location ORDER BY location LIMIT 100;
Transformation:
Example: Converting temperature from Celsius to Fahrenheit.
This type of transformation query is useful when integrating data with systems or reports that require different units or formats, ensuring consistency and compatibility across applications.
Flux:
from(bucket: "demo-bucket") |> range(start: -100y) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> group(columns: ["location"]) |> mean() |> duplicate(column: "_value", as: "_value_celsius") |> map(fn: (r) => ({ r with _value_fahrenheit: (r._value * 9.0 / 5.0) + 32.0 })) |> keep(columns: ["_time", "location", "_value_celsius", "_value_fahrenheit"]) |> yield(name: "average_temperature_celsius_and_fahrenheit")
SQL:
SELECT location, AVG(temperature) AS average_temperature_celsius, (AVG(temperature) * (9.0 / 5.0) + 32.0) AS average_temperature_fahrenheit FROM demo.weather_data WHERE time >= CURRENT_TIMESTAMP - INTERVAL '100 years' GROUP BY location ORDER BY 1 LIMIT 100;
Windowing:
Example: Computing the average temperature in 1-hour windows.
This query type is useful for analyzing temperature trends over time in manageable chunks, allowing for detailed temporal insights in monitoring and reporting applications.
Flux:
from(bucket: "demo-bucket") |> range(start: 2023-08-30T00:00:00Z, stop: 2023-08-30T23:59:59Z) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> aggregate Window(every: 1h, fn: mean, create Empty: false) |> map(fn: (r) => ({ r with window start: time(v: int(v: r._time) - int(v: r._time) % 3600000000000), windowed: time(v: int(v: r._time) - int(v: r._time) % 3600000000000 + 3600000000000), mean_temperature: r._value })) |> keep(columns: ["window_start", "window_end", "mean_temperature", "location"]) |> group() |> sort(columns: ["window_start", "location"]) |> limit(n: 10) |> yield(name: "mean_temperature_per_hour")
SQL:
SELECT to_char(date_trunc('hour', time at time zone 'UTC'), 'YYYY-MM-DD HH24:MI:SS') AS window_start, to_char(date_trunc('hour', time at time zone 'UTC') + INTERVAL '1 hour', 'YYYY-MM-DD HH24:MI:SS') AS window_end, AVG(temperature) AS mean_temperature, location FROM demo.weather_data WHERE time BETWEEN '2023-08-30 00:00:00' AND '2023-08-30 23:59:59' GROUP BY window_start, window_end, location ORDER BY window_start, location LIMIT 10;
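On recent CrateDB versions, the same hourly bucketing can also be expressed with DATE_BIN, which bins timestamps into fixed-size intervals relative to an origin. The following is a sketch and assumes your CrateDB version ships DATE_BIN:

```sql
-- Bucket timestamps into 1-hour bins; the origin only anchors the bin boundaries.
SELECT date_bin('1 hour'::INTERVAL, time, '2023-01-01T00:00:00Z'::TIMESTAMP) AS window_start,
       AVG(temperature) AS mean_temperature,
       location
FROM demo.weather_data
WHERE time BETWEEN '2023-08-30 00:00:00' AND '2023-08-30 23:59:59'
GROUP BY window_start, location
ORDER BY window_start, location
LIMIT 10;
```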
Interpolation:
Example: Interpolating temperature values using linear interpolation.
This interpolation is useful for ensuring data continuity and accuracy in analyses. It's worth noting that the formatting and sorting here differ slightly between Flux and CrateDB: InfluxDB stores timestamps with nanosecond precision, while CrateDB uses millisecond precision, offering different levels of accuracy for different needs.
Flux:
from(bucket: "demo-bucket") |> range(start: -1y) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> aggregate Window(every: 1h, fn: mean, createEmpty: true) |> duplicate(column: "_value", as: "original_value") |> fill(column: "_value", usePrevious: true) |> map(fn: (r) => ({ r with interpolated_value: r._value, _value: r.original_value })) |> yield(name: "original_and_interpolated_temperature")
SQL:
WITH hourly_data AS ( SELECT date_trunc('hour', time) AS window_start, AVG(temperature) AS original_value, location FROM demo.weather_data WHERE time >= CURRENT_TIMESTAMP - INTERVAL '1 year' GROUP BY window_start, location ), filled_data AS ( SELECT window_start, location, original_value, COALESCE(original_value, LAG(original_value) OVER (PARTITION BY location ORDER BY window_start)) AS interpolated_previous_value, COALESCE(original_value, LEAD(original_value) OVER (PARTITION BY location ORDER BY window_start)) AS interpolated_next_value FROM hourly_data ) SELECT window_start AS timestamp, location, original_value AS temperature_celsius, interpolated_previous_value AS interpolated_previous_temperature, interpolated_next_value AS interpolated_next_temperature FROM filled_data where original_value is NULL LIMIT 10;
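The queries above only carry the closest previous and next values into the gap. For gaps that span a single bucket, a crude linear estimate can be obtained by averaging those two neighbours. The following sketch follows the same structure as the CrateDB query above; the CTE and column names are illustrative:

```sql
WITH hourly_data AS (
    SELECT date_trunc('hour', time) AS window_start,
           AVG(temperature) AS original_value,
           location
    FROM demo.weather_data
    WHERE time >= CURRENT_TIMESTAMP - INTERVAL '1 year'
    GROUP BY window_start, location
),
neighbours AS (
    SELECT window_start,
           location,
           original_value,
           LAG(original_value)  OVER (PARTITION BY location ORDER BY window_start) AS prev_value,
           LEAD(original_value) OVER (PARTITION BY location ORDER BY window_start) AS next_value
    FROM hourly_data
)
SELECT window_start,
       location,
       -- midpoint of the surrounding hourly means; only meaningful for
       -- gaps that span a single bucket
       (prev_value + next_value) / 2.0 AS interpolated_temperature
FROM neighbours
WHERE original_value IS NULL
LIMIT 10;
```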
Downsampling:
Example: Computing the max temperature per week for each location in the last 12 months.
This query type is useful for long-term trend analysis and reporting, allowing you to focus on significant temperature peaks and reduce data volume while preserving critical information about extreme weather conditions.
Flux:
from(bucket: "demo-bucket") |> range(start: -1y) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> aggregateWindow(every: 7d, fn: max, createEmpty: false) |> map(fn: (r) => ({ r with window_max_time: r._time })) |> keep(columns: ["_start", "_stop", "window_max_time", "_value", "location", "_field", "_measurement"]) |> yield(name: "weekly_max_temperature")
SQL:
WITH weekly_data AS ( SELECT date_trunc('week', time) AS window_start, date_trunc('week', time) + INTERVAL '7 days' AS window_end, MAX(temperature) AS max_temperature, MAX(time) AS window_max_time, location FROM demo.weather_data WHERE time >= CURRENT_TIMESTAMP - INTERVAL '1 year' GROUP BY window_start, window_end, location ) SELECT window_start AS _start, window_end AS _stop, window_max_time, max_temperature AS _value, location, 'temperature' AS _field, 'weather_data' AS _measurement FROM weekly_data ORDER BY window_start, location LIMIT 100;
Grouping multiple columns:
Example: Computing the average temperature and humidity for each location.
This query helps gain comprehensive insights into environmental conditions at different locations, which can be applied in climate studies, smart city management, and agricultural monitoring.
Flux:
from(bucket: "demo-bucket") |> range(start: -1y) |> filter(fn: (r) => r._measurement == "weather_data" and (r._field == "temperature" or r._field == "humidity")) |> group(columns: ["location", "_field"]) |> mean()
SQL:
SELECT location, AVG(temperature) AS average_temperature, AVG(humidity) AS average_humidity FROM demo.weather_data WHERE time >= CURRENT_TIMESTAMP - INTERVAL '1 year' GROUP BY location ORDER BY location;
Filtering and Conditional Logic:
Example: Getting all temperature readings above 30 degrees Celsius over the past 11 months.
This type of query is useful for identifying periods of high temperature, which can be crucial for heatwave analysis, monitoring climate change impacts, and managing energy consumption in response to extreme weather conditions.
Flux:
from(bucket: "demo-bucket") |> range(start: -11mo) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature" and r._value > 30.0)
SQL:
SELECT time, location, temperature FROM demo.weather_data WHERE time >= CURRENT_TIMESTAMP - INTERVAL '11 months' AND temperature > 30.0 ORDER BY location;
Combining Queries:
Example: Computing both the average and maximum temperature for each location.
Flux:
temperature_mean = from(bucket: "demo-bucket") |> range(start: -1y) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> group(columns: ["location"]) |> mean() temperature_max = from(bucket: "demo-bucket") |> range(start: -1y) |> filter(fn: (r) => r._measurement == "weather_data" and r._field == "temperature") |> group(columns: ["location"]) |> max() join(tables: {mean: temperature_mean, max: temperature_max}, on: ["location"])
SQL:
SELECT location, AVG(temperature) AS average_temperature, MAX(temperature) AS max_temperature FROM demo.weather_data WHERE time >= CURRENT_TIMESTAMP - INTERVAL '1 year' GROUP BY location ORDER BY location
Moving Average:
Example: Computing the average of the last 5 temperature readings at each timestamp (a moving average).
This type of query is helpful for smoothing temperature data to reveal longer-term trends, which can be valuable in time-series analysis for fields such as climate monitoring, finance, and performance metrics.
Flux:
from(bucket: "demo-bucket") |> range(start: -1y) |> filter(fn: (r) => r._measurement == "weather_data" and (r._field == "temperature" or r._field == "humidity")) |> group(columns: ["location", "_field"]) |> mean()
SQL:
SELECT to_char(time, 'DD-MM-yyyy HH24:MI:SS'), location, temperature, AVG(temperature) OVER ( PARTITION BY location ORDER BY time ROWS BETWEEN 4 PRECEDING AND CURRENT ROW ) AS moving_average_temperature FROM demo.weather_data WHERE location = 'Berlin' AND time BETWEEN '2023-08-30 00:00:00' AND '2023-08-30 23:59:59' ORDER BY time DESC;