Advanced Querying

About all the advanced querying features of CrateDB, unifying data types and query characteristics. Mix full-text search with time series aspects, and run powerful aggregations or other kinds of complex queries on your data.

CrateDB supports effective time series analysis with fast aggregations, relational features for JOIN operations, and a rich set of built-in functions.

At a Glance

Analyzing Device Readings

Effectively query measurement readings using enhanced features for time series data.

Run aggregations with gap filling / interpolation, using common table expressions (CTEs) and LAG / LEAD window functions.

Find maximum values using the MAX_BY aggregate function, returning the value from one column based on the maximum or minimum value of another column within a group.

WITH OrderedData AS (
  SELECT timestamp,
    location,
    temperature,
    LAG(temperature, 1) IGNORE NULLS OVER w AS prev_temp,
    LEAD(temperature, 1) IGNORE NULLS OVER w AS next_temp
  FROM weather_data
  WINDOW w AS (PARTITION BY location ORDER BY timestamp)
)
SELECT
  timestamp,
  location,
  temperature,
  COALESCE(temperature, (prev_temp + next_temp) / 2)
    AS interpolated_temperature
FROM OrderedData
ORDER BY location, timestamp;

Navigate to Tutorial

Aggregation
CTE
Gap Filling
Interpolation
Window Functions

Time Series
SQL

Time Bucketing

Based on sensor data, this query calculates:

  • time-buckets of 10 seconds

  • different aggregations per time-bucket and host group

SELECT
  FLOOR(EXTRACT(epoch FROM m.timestamp) / 10) * 10 AS period,
  h.host_group,
  MIN(m.fields['usage_user']) AS "min",
  AVG(m.fields['usage_user']) AS "avg",
  MAX(m.fields['usage_user']) AS "max"
FROM telegraf.metrics m
LEFT JOIN telegraf.hosts h ON h.host_name = m.tags['host']
WHERE tags['cpu'] = 'cpu-total'
  AND m.timestamp > NOW() - '150 seconds'::INTERVAL
GROUP BY 1, 2
ORDER BY 1 DESC;

Aggregation
Grouping
Time Bucketing
Time Intervals

Time Series
SQL

Aggregations

Fast aggregations, even with complex queries.

Bulk Operations

You can use the bulk operations interface feature to perform many inserts in a single operation. See also bulk operations for INSERTs. The advantages are:

  • Significantly less internal network traffic than executing each insert statement individually.

  • Even though you’re executing multiple insert statements, the bulk query only needs to be parsed, planned, and executed once.

CTEs

Time Series: Analyzing Weather Data

HyperLogLog

HyperLogLog is an efficient approximate cardinality estimation algorithm.

CrateDB’s hyperloglog_distinct aggregate function calculates an approximate count of distinct non-null values using the HyperLogLog++ algorithm. See also Introducing: HyperLogLog.

LOCF / NOCB

https://community.cratedb.com/t/interpolating-missing-time-series-values/1010

LTTB

https://community.cratedb.com/t/advanced-downsampling-with-the-lttb-algorithm/1287

Maximum/Minimum Values

Analyzing Device Readings with Metadata Integration Time Series: Analyzing Weather Data

Time Bucketing

https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009

UNNEST

Window Functions

Note

This page is currently under construction. It only includes a few pointers to advanced use cases, which need expansion. It is also not in the same shape as the other pages in this section.