Relational Data
CrateDB supports SQL queries and relational data models with JOINs for any type of data, both structured, semi-structured and unstructured.
Examples of SQL queries
Hyper-fast. Queries in milliseconds.
/* Table that stores device reading information.
* Composite primary key defined on two columns. */
CREATE TABLE IF NOT EXISTS "doc"."devices_readings" (
"ts" TIMESTAMP WITH TIME ZONE,
"device_id" TEXT,
"battery" OBJECT(DYNAMIC) AS (
"level" BIGINT,
"status" TEXT,
"temperature" DOUBLE PRECISION
),
"cpu" OBJECT(DYNAMIC) AS (
"avg_1min" DOUBLE PRECISION,
"avg_5min" DOUBLE PRECISION,
"avg_15min" DOUBLE PRECISION
)
PRIMARY KEY (ts, device_id)
)
CLUSTERED INTO 6 SHARDS
/* The query returns 10 records from two joined tables:
* devices_readings and devices_info. */
SELECT r.device_id,
r.ts,
r.battery['level'],
r.battery['status'],
i.manufacturer,
i.os_name
FROM devices_readings r
JOIN devices_info i ON r.device_id = i.device_id
LIMIT 10;
+------------+---------------+------------------+-------------------+--------------+---------+
| device_id | ts | battery['level'] | battery['status'] | manufacturer | os_name |
+------------+---------------+------------------+-------------------+--------------+---------+
| demo000025 | 1479920640000 | 26 | discharging | iobeam | 6.0.1 |
| demo000025 | 1480209300000 | 52 | discharging | iobeam | 6.0.1 |
| demo000025 | 1479663270000 | 53 | charging | iobeam | 6.0.1 |
| demo000062 | 1479461430000 | 14 | discharging | iobeam | 6.0.1 |
| demo000067 | 1480606380000 | 42 | discharging | iobeam | 5.0.0 |
| demo000081 | 1479387000000 | 85 | discharging | iobeam | 6.0.1 |
| demo000081 | 1480166250000 | 50 | discharging | iobeam | 6.0.1 |
| demo000081 | 1479369300000 | 35 | discharging | iobeam | 6.0.1 |
| demo000087 | 1479885600000 | 72 | discharging | iobeam | 5.0.0 |
| demo000087 | 1480322430000 | 65 | discharging | iobeam | 5.0.0 |
+------------+---------------+------------------+-------------------+--------------+---------+
/* The query returns for each device_id the average, minimum and maximum temperature
* over the last 100 temperature readings. */
SELECT r.device_id,
AVG(battery['temperature']) OVER w AS "last 100 temperatures",
MIN(battery['temperature']) OVER w AS "min temperature",
MAX(battery['temperature']) OVER w AS "max temperature"
FROM devices_readings r
JOIN devices_info i ON r.device_id = i.device_id
WINDOW w AS (ORDER BY "ts" DESC ROWS BETWEEN 100 PRECEDING AND CURRENT ROW);
+------------+-----------------------+-----------------+-----------------+
| device_id | last 100 temperatures | min temperature | max temperature |
+------------+-----------------------+-----------------+-----------------+
| demo000230 | 88.9 | 88.9 | 88.9 |
| demo000263 | 89.55000000000001 | 88.9 | 90.2 |
| demo000836 | 90.36666666666667 | 88.9 | 92.0 |
| demo001960 | 90.65 | 88.9 | 92.0 |
| demo001780 | 90.1 | 87.9 | 92.0 |
| demo001556 | 90.18333333333334 | 87.9 | 92.0 |
| demo001851 | 90.42857142857143 | 87.9 | 92.0 |
| demo001609 | 90.075 | 87.6 | 92.0 |
| demo002840 | 90.43333333333334 | 87.6 | 93.3 |
| demo002666 | 90.76 | 87.6 | 93.7 |
+------------+-----------------------+-----------------+-----------------+
/*
* Based on sensor data, this query calculates:
* - time-buckets of 10 seconds
* - different aggregations per time-bucket and host group
*/
SELECT FLOOR(EXTRACT(epoch FROM m.timestamp) / 10) * 10 AS period,
h.host_group,
MIN(m.fields['usage_user']) AS "min",
AVG(m.fields['usage_user']) AS "avg",
MAX(m.fields['usage_user']) AS "max"
FROM telegraf.metrics m
LEFT JOIN telegraf.hosts h ON h.host_name = m.tags['host']
WHERE tags['cpu'] = 'cpu-total'
AND m.timestamp > NOW() - '150 seconds'::INTERVAL
GROUP BY 1, 2
ORDER BY 1 DESC;
+------------+---------+-------+--------------+
| device_id | battery | cpu | manufacturer |
+------------+---------+-------+--------------+
| demo002053 | 96.6 | 6.17 | iobeam |
| demo002380 | 91.7 | 8.71 | iobeam |
| demo000391 | 96.3 | 21.0 | iobeam |
| demo000472 | 90.4 | 8.23 | iobeam |
| demo000157 | 93.9 | 4.17 | iobeam |
| demo000307 | 89.2 | 6.05 | iobeam |
| demo002227 | 88.9 | 5.24 | iobeam |
| demo000703 | 93.2 | 4.17 | iobeam |
| demo001792 | 89.3 | 4.18 | iobeam |
| demo001780 | 87.9 | 4.95 | iobeam |
+------------+---------+-------+--------------+
/* The query provides a detailed analysis of the battery performance
* (both level and temperature) for devices with specific API versions,
* while focusing only on recent data. */
WITH
max_timestamp AS (
SELECT MAX(ts) AS max_ts
FROM doc.devices_readings
),
device_readings_agg AS (
SELECT
r.device_id,
AVG(r.battery['level']) AS avg_battery_level,
AVG(r.battery['temperature']) AS avg_battery_temperature
FROM devices_readings r, max_timestamp m
WHERE r.ts >= m.max_ts - INTERVAL '1 week'
GROUP BY r.device_id
),
device_model_info AS (
SELECT
device_id,
manufacturer,
model,
api_version
FROM devices_info
WHERE api_version BETWEEN 21 AND 25
)
SELECT
info.manufacturer,
info.model,
info.api_version,
AVG(read.avg_battery_level) AS model_avg_battery_level,
AVG(read.avg_battery_temperature) AS model_avg_battery_temperature,
COUNT(*) AS readings_count
FROM device_readings_agg read
JOIN device_model_info info
ON read.device_id = info.device_id
GROUP BY
info.manufacturer,
info.model,
info.api_version
ORDER BY
model_avg_battery_level DESC;
+--------------+---------+-------------+-------------------------+-------------------------------+----------------+
| manufacturer | model | api_version | model_avg_battery_level | model_avg_battery_temperature | readings_count |
+--------------+---------+-------------+-------------------------+-------------------------------+----------------+
| iobeam | focus | 22 | 51.812515496614985 | 90.95264431126454 | 37 |
| iobeam | pinto | 23 | 51.320575821166265 | 90.94536040662497 | 29 |
| iobeam | mustang | 21 | 51.12593768085889 | 90.84655787872552 | 57 |
| iobeam | mustang | 22 | 50.92397024012975 | 90.8132448654412 | 29 |
| iobeam | focus | 23 | 50.3493936780697 | 90.86938401742282 | 27 |
| iobeam | mustang | 23 | 50.226325680521306 | 90.82969886537008 | 28 |
| iobeam | pinto | 21 | 50.11437819309262 | 90.81538121980917 | 60 |
| iobeam | focus | 21 | 49.60331350481256 | 90.87625045994797 | 59 |
| iobeam | pinto | 22 | 49.54924258513566 | 90.84554594103246 | 29 |
+--------------+---------+-------------+-------------------------+-------------------------------+----------------+
Traditional RDBMS | CrateDB | |
High data ingestion | ||
High data versatility | ||
Real-time performance | ||
Simple scalability | ||
SQL access | ||
ACID, stored procedures, foreign keys, strong consistency*, referential integrity, triggers |
* CrateDB does not support ACID transactions, while offering atomic operations at the row level and opting for eventual consistency. The choice of eventual consistency prioritizes high-availability and partition tolerance, i.e. resiliency to most hardware and network failures. In combination with Multi Version Concurrency Control, the storage engine can handle high-volume concurrent reads and writes. Atomic operations and durability guarantees on record level ensure that writes to documents either succeed entirely or have no effect, queries by primary key always return the latest results.
High scalability
Easily handle relational data with CrateDB’s high scalability, supported by its distributed architecture. It accommodates small-scale and enterprise-level relational data requirements without compromising speed or performance.
Rapid read capabilities
Access relational data rapidly with CrateDB’s strong focus on fast-read operations. Its storage and retrieval mechanisms expedite data retrieval, making it an ideal choice for applications prioritizing read speed.
Multiversion concurrency control
Experience consistent and efficient data access with multiversion concurrency control. CrateDB utilizes multiversion concurrency control to enable simultaneous data read and write operations without blocking, improving query performance and scalability.
Multi-model approach
Use one database for all structured, semi-structured, and unstructured data: tables, JSON, time series, vectors, geospatial data, full-text and binary objects. Create hybrid data models comprised of structured and semi-structured data - all in one place.
Data enrichment
Enrich relational data with JSON. CrateDB seamlessly integrates JSON data alongside traditional relational data through its flexible schema, allowing for dynamic and nested structures to be stored and queried efficiently. Combine relational and JSON data to facilitate a comprehensive analysis and harness the advantages of structured and semi-structured data models within a single database environment.
Storage as JSON
Store relational records as JSON documents. Relational records stored as JSON can change structure on the fly, providing greater flexibility in handling evolving data structures and simplifying storage of large datasets.
ClearVoice migrates from MySQL to CrateDB
Jeff Nappi discovered CrateDB in Hacker News and decided to benchmark its performance against his existing MySQL setup. Learn how their aggregate querying went from 20 minutes to milliseconds after migrating to CrateDB.
Curious to learn more?
While CrateDB focuses on relational data handling, it doesn't neglect the significance of NoSQL. It emphasizes the tradeoffs between relational and non-relational databases, balancing the strengths of relational models with distributed capabilities.
Compared to relational databases
Pros: traditional relational databases offer SQL accessibility, high data availability and easy integration.
Cons: they are expensive to scale, unable to process the vast amount of incoming data and unable to perform complex queries with quick response times.
Advantage of CrateDB: CrateDB is easy to scale and offers no impact of volume on performance.
Compared to NoSQL databases
Pros: NoSQL databases offer scalability, flexibility and distributed architecture.
Cons: they generate high costs, require complex infrastructures (with intensive planning and administration to operate correctly) and need specific engineering expertise.
Advantage of CrateDB: CrateDB is very simple to use with SQL, offers easy integration with 3rd party solutions and is much more cost efficient.
Compared to having both SQL and NoSQL databases
Pros: combining both types of databases might go around some limitations.
Cons: this approach is highly costly, needs complex data synchronization to run in parallel, requires complex setups and has unnecessarily large cloud footprints. Not even mentioning the complexity of managing multiple database schemas, designing cross-database queries and the resulting slow performance.
Advantage of CrateDB: CrateDB offers the benefits of both with no additional investment.
Other resources on relational data
conference
Workshop
FAQ
A relational data format organizes information into tables consisting of rows and columns, with each row representing a record and each column representing an attribute. Each row is uniquely identified by a primary key. This model allows for easy data management through SQL queries and supports complex relationships and data integrity. CrateDB enhances this model by supporting SQL queries and relational data models with JOINs for not only structured data, but also JSON data.
Relational databases store highly structured data organized into tables with predefined schemas. Non-relational databases, on the other hand, store unstructured or semi-structured data, providing flexibility in handling diverse data types. CrateDB excels by offering rapid access to both relational and non-relational data with a strong focus on fast-read operations, making it perfect for real-time analytics and search applications.
Relational data is typically stored in structured tables with predefined schemas, making it easier to organize and query. This structured approach supports complex queries and relationships between different data sets. In CrateDB, relational records can be stored either in tabular format or in JSON documents, allowing for dynamic schema changes and providing flexibility in handling evolving data structures.
Relational databases are commonly used for managing structured data and support complex queries, transactions, and data integrity. Popular examples include Oracle, MySQL, and PostgreSQL. CrateDB combines the strengths of relational databases with the flexibility of NoSQL, offering scalable solutions without compromising performance.