Relational data in CrateDB
Standard SQL with joins, aggregations, CTEs, and window functions. On a distributed database that scales horizontally without sacrificing query expressiveness.
Traditional relational databases give you SQL but not scale. As data volumes grow, you either add expensive hardware, accept slower query times, or introduce a separate analytical layer that must be kept synchronized with the operational database, e.g. a data warehouse, a read replica, or a columnar store.
NoSQL databases give you scale but not SQL, forcing teams to pre-aggregate, build custom query layers, or accept limited analytical expressiveness.
CrateDB gives you both. Full SQL expressiveness including joins, aggregations, window functions, and CTEs runs on a distributed architecture that scales horizontally by adding nodes. No manual sharding, no read replicas to manage, no separate analytical layer to synchronize.
Query examples
Standard SQL. Queries executing in parallel across a distributed cluster, not on a single node.
/* Define a relational table with a composite primary key,
OBJECT columns for nested data, and explicit sharding configuration.
CrateDB's DDL is standard SQL with distributed-system extensions
for shard count and replication. */
CREATE TABLE IF NOT EXISTS "doc"."devices_readings" (
"ts" TIMESTAMP WITH TIME ZONE,
"device_id" TEXT,
"battery" OBJECT(DYNAMIC) AS (
"level" BIGINT,
"status" TEXT,
"temperature" DOUBLE PRECISION
),
"cpu" OBJECT(DYNAMIC) AS (
"avg_1min" DOUBLE PRECISION,
"avg_5min" DOUBLE PRECISION,
"avg_15min" DOUBLE PRECISION
)
PRIMARY KEY (ts, device_id)
)
CLUSTERED INTO 6 SHARDS
/* A standard JOIN between a readings table and a device info table.
CrateDB distributes this join across nodes automatically
using its distributed query engine.
No query hints, no manual optimization, no denormalization required. */
SELECT r.device_id,
r.ts,
r.battery['level'],
r.battery['status'],
i.manufacturer,
i.os_name
FROM devices_readings r
JOIN devices_info i ON r.device_id = i.device_id
LIMIT 10;
+------------+---------------+------------------+-------------------+--------------+---------+
| device_id | ts | battery['level'] | battery['status'] | manufacturer | os_name |
+------------+---------------+------------------+-------------------+--------------+---------+
| demo000025 | 1479920640000 | 26 | discharging | iobeam | 6.0.1 |
| demo000025 | 1480209300000 | 52 | discharging | iobeam | 6.0.1 |
| demo000025 | 1479663270000 | 53 | charging | iobeam | 6.0.1 |
| demo000062 | 1479461430000 | 14 | discharging | iobeam | 6.0.1 |
| demo000067 | 1480606380000 | 42 | discharging | iobeam | 5.0.0 |
| demo000081 | 1479387000000 | 85 | discharging | iobeam | 6.0.1 |
| demo000081 | 1480166250000 | 50 | discharging | iobeam | 6.0.1 |
| demo000081 | 1479369300000 | 35 | discharging | iobeam | 6.0.1 |
| demo000087 | 1479885600000 | 72 | discharging | iobeam | 5.0.0 |
| demo000087 | 1480322430000 | 65 | discharging | iobeam | 5.0.0 |
+------------+---------------+------------------+-------------------+--------------+---------+
/* Window functions for rolling average, minimum, and maximum battery temperature
over the last 100 readings per device.
The same syntax as PostgreSQL, executing in parallel across the cluster. */
SELECT r.device_id,
AVG(battery['temperature']) OVER w AS "last 100 temperatures",
MIN(battery['temperature']) OVER w AS "min temperature",
MAX(battery['temperature']) OVER w AS "max temperature"
FROM devices_readings r
JOIN devices_info i ON r.device_id = i.device_id
WINDOW w AS (ORDER BY "ts" DESC ROWS BETWEEN 100 PRECEDING AND CURRENT ROW);
+------------+-----------------------+-----------------+-----------------+
| device_id | last 100 temperatures | min temperature | max temperature |
+------------+-----------------------+-----------------+-----------------+
| demo000230 | 88.9 | 88.9 | 88.9 |
| demo000263 | 89.55000000000001 | 88.9 | 90.2 |
| demo000836 | 90.36666666666667 | 88.9 | 92.0 |
| demo001960 | 90.65 | 88.9 | 92.0 |
| demo001780 | 90.1 | 87.9 | 92.0 |
| demo001556 | 90.18333333333334 | 87.9 | 92.0 |
| demo001851 | 90.42857142857143 | 87.9 | 92.0 |
| demo001609 | 90.075 | 87.6 | 92.0 |
| demo002840 | 90.43333333333334 | 87.6 | 93.3 |
| demo002666 | 90.76 | 87.6 | 93.7 |
+------------+-----------------------+-----------------+-----------------+
/* A time-bounded aggregation across joined tables with a recent data filter.
This is the pattern behind operational dashboards — fresh data,
filtered by time window, enriched with device metadata at query time. */
SELECT FLOOR(EXTRACT(epoch FROM m.timestamp) / 10) * 10 AS period,
h.host_group,
MIN(m.fields['usage_user']) AS "min",
AVG(m.fields['usage_user']) AS "avg",
MAX(m.fields['usage_user']) AS "max"
FROM telegraf.metrics m
LEFT JOIN telegraf.hosts h ON h.host_name = m.tags['host']
WHERE tags['cpu'] = 'cpu-total'
AND m.timestamp > NOW() - '150 seconds'::INTERVAL
GROUP BY 1, 2
ORDER BY 1 DESC;
+------------+---------+-------+--------------+
| device_id | battery | cpu | manufacturer |
+------------+---------+-------+--------------+
| demo002053 | 96.6 | 6.17 | iobeam |
| demo002380 | 91.7 | 8.71 | iobeam |
| demo000391 | 96.3 | 21.0 | iobeam |
| demo000472 | 90.4 | 8.23 | iobeam |
| demo000157 | 93.9 | 4.17 | iobeam |
| demo000307 | 89.2 | 6.05 | iobeam |
| demo002227 | 88.9 | 5.24 | iobeam |
| demo000703 | 93.2 | 4.17 | iobeam |
| demo001792 | 89.3 | 4.18 | iobeam |
| demo001780 | 87.9 | 4.95 | iobeam |
+------------+---------+-------+--------------+
/* A three-stage CTE aggregating average battery level
and temperature across device models and API versions over the last week.
This query would timeout or require pre-aggregation in most time-series databases.
In CrateDB it runs on fresh data without pre-processing. */
WITH
max_timestamp AS (
SELECT MAX(ts) AS max_ts
FROM doc.devices_readings
),
device_readings_agg AS (
SELECT
r.device_id,
AVG(r.battery['level']) AS avg_battery_level,
AVG(r.battery['temperature']) AS avg_battery_temperature
FROM devices_readings r, max_timestamp m
WHERE r.ts >= m.max_ts - INTERVAL '1 week'
GROUP BY r.device_id
),
device_model_info AS (
SELECT
device_id,
manufacturer,
model,
api_version
FROM devices_info
WHERE api_version BETWEEN 21 AND 25
)
SELECT
info.manufacturer,
info.model,
info.api_version,
AVG(read.avg_battery_level) AS model_avg_battery_level,
AVG(read.avg_battery_temperature) AS model_avg_battery_temperature,
COUNT(*) AS readings_count
FROM device_readings_agg read
JOIN device_model_info info
ON read.device_id = info.device_id
GROUP BY
info.manufacturer,
info.model,
info.api_version
ORDER BY
model_avg_battery_level DESC;
+--------------+---------+-------------+-------------------------+-------------------------------+----------------+
| manufacturer | model | api_version | model_avg_battery_level | model_avg_battery_temperature | readings_count |
+--------------+---------+-------------+-------------------------+-------------------------------+----------------+
| iobeam | focus | 22 | 51.812515496614985 | 90.95264431126454 | 37 |
| iobeam | pinto | 23 | 51.320575821166265 | 90.94536040662497 | 29 |
| iobeam | mustang | 21 | 51.12593768085889 | 90.84655787872552 | 57 |
| iobeam | mustang | 22 | 50.92397024012975 | 90.8132448654412 | 29 |
| iobeam | focus | 23 | 50.3493936780697 | 90.86938401742282 | 27 |
| iobeam | mustang | 23 | 50.226325680521306 | 90.82969886537008 | 28 |
| iobeam | pinto | 21 | 50.11437819309262 | 90.81538121980917 | 60 |
| iobeam | focus | 21 | 49.60331350481256 | 90.87625045994797 | 59 |
| iobeam | pinto | 22 | 49.54924258513566 | 90.84554594103246 | 29 |
+--------------+---------+-------------+-------------------------+-------------------------------+----------------+
User story: migrating from MySQL to CrateDB
Jeff Nappi, ClearVoice, discovered CrateDB on Hacker News and benchmarked it against their existing MySQL setup. Aggregate queries that took 20 minutes on MySQL ran in milliseconds on CrateDB. Watch the full story.
CrateDB is not a replacement for PostgreSQL or MySQL in transactional workloads. It is built for high-volume analytical and operational workloads where horizontal scalability and ingestion throughput matter more than strict ACID guarantees.
| Traditional RDBMS | CrateDB | |
| High data ingestion | ||
| High data versatility | ||
| Real-time performance | ||
| Simple scalability | ||
| SQL access | ||
| ACID, stored procedures, foreign keys, strong consistency*, referential integrity, triggers |
* CrateDB does not support ACID transactions, while offering atomic operations at the row level and opting for eventual consistency. The choice of eventual consistency prioritizes high-availability and partition tolerance, i.e. resiliency to most hardware and network failures. In combination with Multi Version Concurrency Control, the storage engine can handle high-volume concurrent reads and writes. Atomic operations and durability guarantees on record level ensure that writes to documents either succeed entirely or have no effect, queries by primary key always return the latest results.
High scalability
CrateDB handles relational data across distributed nodes, accommodating workloads that exceed the capacity of single-node relational databases without manual sharding or vertical scaling. Add nodes as data volumes grow and the cluster rebalances automatically while staying live.
Fast read performance
CrateDB's columnar storage format and parallel query execution deliver fast analytical reads across large relational datasets. Aggregations, joins, and window functions that take minutes on a traditional relational database typically run in milliseconds on CrateDB because queries execute simultaneously across all nodes rather than sequentially on one.
Multiversion concurrency control
CrateDB uses multiversion concurrency control to handle simultaneous reads and writes without blocking. Read queries never wait for write operations to complete. This is the same concurrency model as PostgreSQL, enabling high query concurrency on analytical workloads without connection queuing or lock contention.
CrateDB handles relational data alongside time-series, JSON, vector, geospatial, and full-text data in the same engine. No separate pipelines.
How CrateDB compares
Compared to traditional relational databases
Traditional relational databases offer SQL familiarity, strong consistency, and mature tooling. They become expensive and operationally complex to scale beyond a single node: read replicas add lag, manual sharding adds maintenance overhead, and analytical queries compete with transactional workloads for resources.
CrateDB adds horizontal scalability and distributed execution while keeping SQL as the primary interface. The trade-off is that full ACID transactions, foreign keys, stored procedures, and referential integrity are not available.
Teams whose primary workload is high-volume analytical queries on continuously arriving data will find CrateDB significantly faster and cheaper to operate at scale than a vertically scaled relational database.
Compared to NoSQL databases
NoSQL databases scale well and handle flexible schemas but sacrifice query expressiveness. Complex joins, aggregations, and ad-hoc analytical queries require external tooling, pre-processing pipelines, or custom application code.
CrateDB keeps SQL as the primary interface, reducing the need for specialized engineering expertise and simplifying integration with existing BI tools, data science notebooks, and visualization platforms that expect a SQL endpoint.
Compared to running both
Teams running a relational database alongside a NoSQL or analytical database face data duplication, synchronization complexity, schema divergence, and doubled operational overhead. Every change to the source schema requires a coordinated change in the analytical layer.
CrateDB handles structured, semi-structured, and unstructured data in one system, eliminating the synchronization layer and the engineering overhead that comes with it.
"I started looking at CrateDB and was impressed by the quality of the code. Switching from MySQL to CrateDB took a couple of days. It was very convenient to integrate CrateDB into our existing base even though it was written for a different database. The fact that CrateDB uses SQL lowers the barrier to entry when using distributed search. And on top of that, with CrateDB you can replace MongoDB and Elastisearch with one scalable package."
Jeff Nappi
Director of Engineering
ClearVoice
Go deeper on relational data in CrateDB
Documentation
Technical deep dives
Conference
Related pages
FAQ
A relational data format organizes information into tables consisting of rows and columns, with each row representing a record and each column representing an attribute. Each row is uniquely identified by a primary key. This model allows for easy data management through SQL queries and supports complex relationships and data integrity. CrateDB enhances this model by supporting SQL queries and relational data models with JOINs for not only structured data, but also JSON data.
Relational databases store highly structured data organized into tables with predefined schemas. Non-relational databases, on the other hand, store unstructured or semi-structured data, providing flexibility in handling diverse data types. CrateDB excels by offering rapid access to both relational and non-relational data with a strong focus on fast-read operations, making it perfect for real-time analytics and search applications.
Relational data is typically stored in structured tables with predefined schemas, making it easier to organize and query. This structured approach supports complex queries and relationships between different data sets. In CrateDB, relational records can be stored either in tabular format or in JSON documents, allowing for dynamic schema changes and providing flexibility in handling evolving data structures.
Relational databases are commonly used for managing structured data and support complex queries, transactions, and data integrity. Popular examples include Oracle, MySQL, and PostgreSQL. CrateDB combines the strengths of relational databases with the flexibility of NoSQL, offering scalable solutions without compromising performance.
CrateDB does not support full ACID transactions across multiple rows or tables. It offers atomic operations at the row level and eventual consistency across the cluster, prioritizing high availability and partition tolerance over strict transactional guarantees. For workloads requiring multi-row transactions with rollback support — financial ledgers, order management systems, inventory with strict consistency requirements — a traditional relational database like PostgreSQL is the more appropriate choice. For workloads where high-volume ingestion, horizontal scalability, and fast analytical queries matter more than strict transactional consistency — operational analytics, IoT data, log analysis, real-time dashboards — CrateDB's consistency model is well suited.