Advanced Querying¶

About all the advanced querying features of CrateDB, unifying data types and query characteristics. Mix full-text search with time series aspects, and run powerful aggregations or other kinds of complex queries on your data.

CrateDB supports effective time series analysis with fast aggregations, relational features for JOIN operations, and a rich set of built-in functions.

At a Glance¶

Analyze Device Readings

Effectively query measurement readings using enhanced features for time series data.

Run aggregations with gap filling / interpolation, using common table expressions (CTEs) and LAG / LEAD window functions.

Find maximum values using the MAX_BY aggregate function, returning the value from one column based on the maximum or minimum value of another column within a group.

WITH OrderedData AS (
  SELECT timestamp,
    location,
    temperature,
    LAG(temperature, 1) IGNORE NULLS OVER w AS prev_temp,
    LEAD(temperature, 1) IGNORE NULLS OVER w AS next_temp
  FROM weather_data
  WINDOW w AS (PARTITION BY location ORDER BY timestamp)
)
SELECT
  timestamp,
  location,
  temperature,
  COALESCE(temperature, (prev_temp + next_temp) / 2)
    AS interpolated_temperature
FROM OrderedData
ORDER BY location, timestamp;

Aggregation
CTE
Gap Filling
Interpolation
Window Functions

Time Series
SQL

Time Bucketing

Based on sensor data, this query calculates:

time-buckets of 10 seconds
different aggregations per time-bucket and host group

SELECT
  FLOOR(EXTRACT(epoch FROM m.timestamp) / 10) * 10 AS period,
  h.host_group,
  MIN(m.fields['usage_user']) AS "min",
  AVG(m.fields['usage_user']) AS "avg",
  MAX(m.fields['usage_user']) AS "max"
FROM telegraf.metrics m
LEFT JOIN telegraf.hosts h ON h.host_name = m.tags['host']
WHERE tags['cpu'] = 'cpu-total'
  AND m.timestamp > NOW() - '150 seconds'::INTERVAL
GROUP BY 1, 2
ORDER BY 1 DESC;

Aggregation
Grouping
Time Bucketing
Time Intervals

Time Series
SQL

Aggregations¶

Fast aggregations, even with complex queries.

Bulk Operations¶

You can use the bulk operations interface feature to perform many inserts in a single operation. See also bulk operations for INSERTs. The advantages are:

Significantly less internal network traffic than executing each insert statement individually.
Even though you’re executing multiple insert statements, the bulk query only needs to be parsed, planned, and executed once.

CTEs¶

Time Series: Analyze Weather Data

HyperLogLog¶

HyperLogLog is an efficient approximate cardinality estimation algorithm.

CrateDB’s hyperloglog_distinct aggregate function calculates an approximate count of distinct non-null values using the HyperLogLog++ algorithm. See also Introducing: HyperLogLog.

LOCF / NOCB¶

https://community.cratedb.com/t/interpolating-missing-time-series-values/1010

LTTB¶

https://community.cratedb.com/t/advanced-downsampling-with-the-lttb-algorithm/1287

Maximum/Minimum Values¶

Search¶

CrateDB provides capabilities for full-text search, vector search, and hybrid search, all based on vanilla SQL.

Text-to-SQL¶

Natural language to SQL conversions using adapters and frameworks.

Text-to-SQL

Time Bucketing¶

https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009

UNNEST¶

Window Functions¶

Note

This page is currently under construction. It only includes a few pointers to advanced use cases, which need expansion. It is also not in the same shape as the other pages in this section.