Aggregations¶

High-performance aggregations on massive volumes of data using SQL.

Introduction

Whether you are monitoring sensor networks, analyzing customer behavior, or powering dashboards, CrateDB’s distributed engine, columnar storage, and native support for structured and semi-structured data make aggregations blazingly fast, even across billions of rows.

Use CrateDB when you need to

Aggregate over high-ingestion datasets (millions of records per hour)
Analyze real-time metrics across structured, JSON, or time series fields
Build dynamic dashboards and run interactive ad-hoc analytics
Combine aggregations with full-text, geospatial, or vector filters

Benefits of using CrateDB for aggregations

Feature	Benefit
Distributed SQL engine	Parallel execution across nodes ensures linear scalability
Columnar storage	Reads only relevant columns for faster aggregations
Real-time ingestion	Query freshly ingested data without delay
Aggregations on any data type	Structured, JSON, full-text, geospatial, or vector
Smart indexing	Built-in indexing and configuration options that can boost performance

Supported Aggregation Functions¶

CrateDB supports a rich set of SQL-92-compliant and extended functions for aggregation, including:

COUNT(), SUM(), AVG(), MIN(), MAX()
STDDEV(), PERCENTILE(), VARIANCE(), TOPK()
HYPERLOGLOG_DISTINCT()
Windowed and conditional aggregations via OVER(...) and FILTER (WHERE ...) clauses

To learn about the full set of functions, please visit the reference documentation at Aggregate functions. See also Window functions.

Common Aggregation Patterns¶

Count & Grouping

SELECT city, COUNT(*) AS trip_count
FROM trips
GROUP BY city
ORDER BY trip_count DESC
LIMIT 10;

Time-Based Aggregation

SELECT DATE_TRUNC('day', timestamp) AS day, AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY day
ORDER BY day ASC;

Statistical Summaries

SELECT
  MIN(response_time),
  MAX(response_time),
  AVG(response_time),
  STDDEV_POP(response_time)
FROM logs
WHERE timestamp >= now() - INTERVAL '1 day';

Nested / Object Field Aggregation

SELECT payload['device']['os'], COUNT(*) AS count
FROM events
GROUP BY payload['device']['os'];

Statistics: Example using PERCENTILE for tail latency.

SELECT PERCENTILE(response_time, 0.95) AS p95
FROM api_logs
WHERE endpoint = '/checkout';

Real-World Examples¶

Industrial IoT: Monitor and aggregate sensor readings from thousands of devices in real time.

SELECT device_id, MAX(temperature) AS max_temp
FROM readings
WHERE timestamp >= now() - INTERVAL '1 hour'
GROUP BY device_id;

E-Commerce Analytics: Aggregate customer orders across dimensions like product, region, or time.

SELECT product_id, SUM(quantity) AS units_sold
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY product_id
ORDER BY units_sold DESC
LIMIT 20;

Fleet Monitoring: Aggregate location and status data of vehicles in motion.

SELECT status, COUNT(*)
FROM vehicle_tracking
WHERE updated_at >= now() - INTERVAL '10 minutes'
GROUP BY status;

Visualization & BI Tools¶

CrateDB integrates seamlessly with:

Grafana:: Build real-time dashboards with time series aggregations
Apache Superset:: Explore multidimensional data visually
Tableau, Power BI, Metabase:: Connect via PostgreSQL wire protocol

These tools submit SQL queries to CrateDB, which returns pre-aggregated or real-time data efficiently. To learn about the full set of integrations, please visit the documentation at Business Intelligence and Data Visualization.

Aggregations¶

Supported Aggregation Functions¶

Common Aggregation Patterns¶

Real-World Examples¶

Visualization & BI Tools¶

See also¶