Common Table Expressions (CTEs)
With Common Table Expressions (CTEs), CrateDB lets you break down complex SQL logic into smaller, reusable steps, all while maintaining the speed and scalability of its distributed SQL engine.
Write clean, maintainable queries. Analyze data in stages. And power real-time insights from multiple data sources, all in a single statement.
What is a Common Table Expression?
A Common Table Expression (CTE) is a temporary, named result set defined within your SQL query using the WITH clause.
It acts like a query within a query, letting you simplify complex analytics or reuse sub-results without repeating code.
WITH recent_events AS ( SELECT device_id, temperature, ts FROM sensor_data WHERE ts > now() - INTERVAL '5 minutes' ) SELECT device_id, AVG(temperature) FROM recent_events GROUP BY device_id;
CrateDB executes CTEs inside its distributed query planner, meaning they are parallelized and optimized automatically, even for large-scale, real-time datasets.
Why use CTEs in CrateDB?
- Readable SQL: Organize multi-step queries into logical building blocks that are easier to understand and maintain.
- Reusable logic: Reference CTEs multiple times in the same query, avoiding redundant subqueries.
- Optimized execution: CrateDB’s distributed engine automatically parallelizes and optimizes each CTE.
- Ideal for real-time analytics: Chain aggregations, transformations, and filters on streaming data, all executed in milliseconds.
- Multi-model support: Use CTEs across time series, JSON, text search, geospatial, and vector data, all in one SQL flow.
Perfect for Complex Analytics
CTEs shine when you need to:
- Build multi-step aggregations (e.g., compute rolling averages, then rank by performance).
- Combine data transformations with joins, window functions, or subselects.
- Filter and enrich time series or event streams in stages.
- Simplify machine learning feature pipelines by chaining intermediate calculations.
- Write self-documenting SQL for dashboards, APIs, and embedded analytics.
CTE Example
In the example below, CrateDB computes hourly averages and flags anomalies, all in a single SQL query that’s easy to read, maintain, and execute in real time.
WITH recent_data AS ( SELECT device_id, ts, power_usage FROM energy_readings WHERE ts > now() - INTERVAL '1 hour' ), avg_usage AS ( SELECT device_id, AVG(power_usage) AS avg_power FROM recent_data GROUP BY device_id ) SELECT d.device_id, d.ts, d.power_usage, a.avg_power FROM recent_data d JOIN avg_usage a USING (device_id) WHERE d.power_usage > a.avg_power * 1.2;
How CrateDB Makes CTEs Better
| Traditional SQL Engines | CrateDB’s Distributed CTEs |
|---|---|
| Execute sequentially | Execute in parallel across nodes |
| Struggle with large datasets | Scale automatically with cluster size |
| Designed for static tables | Work seamlessly with live, streaming data |
| Limited to relational joins | Combine time series, JSON, vector, and full-text data |
Combine with Other SQL Features
CTEs integrate seamlessly with CrateDB’s other real-time query capabilities:
- Aggregations for instant metrics at scale
- Window functions for running totals and time-based logic
- Hybrid search using MATCH and KNN_MATCH
- User-defined functions (UDFs) for custom business logic
Why Developers Like It
- Familiar syntax (WITH ... AS (...)) ; no proprietary extensions
- Inline, modular logic for complex pipelines
- Fully compatible with BI tools, PostgreSQL drivers, and ORM frameworks
- Automatic optimization across distributed nodes