Advanced Querying¶
About all the advanced querying features of CrateDB, unifying data types and query characteristics. Mix full-text search with time series aspects, and run powerful aggregations or other kinds of complex queries on your data.
CrateDB supports effective time series analysis with fast aggregations, relational features for JOIN operations, and a rich set of built-in functions.
At a Glance¶
Analyzing Device Readings
Effectively query measurement readings using enhanced features for time series data.
Run aggregations with gap filling / interpolation, using common table expressions (CTEs) and LAG / LEAD window functions.
Find maximum values using the MAX_BY aggregate function, returning the value from one column based on the maximum or minimum value of another column within a group.
WITH OrderedData AS (
SELECT timestamp,
location,
temperature,
LAG(temperature, 1) IGNORE NULLS OVER w AS prev_temp,
LEAD(temperature, 1) IGNORE NULLS OVER w AS next_temp
FROM weather_data
WINDOW w AS (PARTITION BY location ORDER BY timestamp)
)
SELECT
timestamp,
location,
temperature,
COALESCE(temperature, (prev_temp + next_temp) / 2)
AS interpolated_temperature
FROM OrderedData
ORDER BY location, timestamp;
Aggregation
CTE
Gap Filling
Interpolation
Window Functions
Time Series
SQL
Time Bucketing
Based on sensor data, this query calculates:
time-buckets of 10 seconds
different aggregations per time-bucket and host group
SELECT
FLOOR(EXTRACT(epoch FROM m.timestamp) / 10) * 10 AS period,
h.host_group,
MIN(m.fields['usage_user']) AS "min",
AVG(m.fields['usage_user']) AS "avg",
MAX(m.fields['usage_user']) AS "max"
FROM telegraf.metrics m
LEFT JOIN telegraf.hosts h ON h.host_name = m.tags['host']
WHERE tags['cpu'] = 'cpu-total'
AND m.timestamp > NOW() - '150 seconds'::INTERVAL
GROUP BY 1, 2
ORDER BY 1 DESC;
Aggregation
Grouping
Time Bucketing
Time Intervals
Time Series
SQL
Aggregations¶
Fast aggregations, even with complex queries.
Bulk Operations¶
You can use the bulk operations interface feature to perform many inserts in a single operation. See also bulk operations for INSERTs. The advantages are:
Significantly less internal network traffic than executing each insert statement individually.
Even though you’re executing multiple insert statements, the bulk query only needs to be parsed, planned, and executed once.
CTEs¶
HyperLogLog¶
HyperLogLog is an efficient approximate cardinality estimation algorithm.
CrateDB’s hyperloglog_distinct aggregate function calculates an approximate count of distinct non-null values using the HyperLogLog++ algorithm. See also Introducing: HyperLogLog.
LOCF / NOCB¶
https://community.cratedb.com/t/interpolating-missing-time-series-values/1010
LTTB¶
https://community.cratedb.com/t/advanced-downsampling-with-the-lttb-algorithm/1287
Maximum/Minimum Values¶
Analyzing Device Readings with Metadata Integration Time Series: Analyzing Weather Data
Search¶
CrateDB provides capabilities for full-text search, vector search, and hybrid search, all based on vanilla SQL.
Time Bucketing¶
https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009
UNNEST¶
Window Functions¶
Note
This page is currently under construction. It only includes a few pointers to advanced use cases, which need expansion. It is also not in the same shape as the other pages in this section.
See also
Features: Relational / JOINs
Domains: Telemetry Data Store • Raw-Data Analytics • Industrial Data • Time Series Data • Machine Learning
Product: Relational Database • Indexing, Columnar Storage, and Aggregations
See also
Features: Relational / JOINs
Domains: Telemetry Data Store • Raw-Data Analytics • Industrial Data • Time Series Data • Machine Learning
Product: Relational Database • Indexing, Columnar Storage, and Aggregations