Relational / JOINs¶
Overview
CrateDB implements the relational concept of joining tables.
About
When selecting data from CrateDB, you can join one or more relations (tables) to combine columns into one result set.
Details
Joins are essential operations in relational databases. They create a link between rows based on common values and allow the meaningful combination of these rows.
CrateDB was designed to support distributed joins effectively from the very beginning.
Reference Manual
Related
SQL JOIN
Blog: Support for Joins on Multi-Row Sub-Selects
Joining on virtual tables is a crucial feature for many users, and it is especially useful when doing analytics on your data. A virtual table is what we call the result set returned by a subquery.
Being able to use sub-selects as virtual tables for the lifetime of a query is very useful because it means that you can slice and dice your data multiple ways without having to alter your source data, or store duplicate versions of it.
Example
SELECT *
FROM (SELECT *
FROM table_1
ORDER BY column_a
LIMIT 100)
AS virtual_table_1,
INNER JOIN (SELECT *
FROM (SELECT *
FROM table_2
ORDER BY column_b
LIMIT 100)
AS virtual_table_2
GROUP BY column_c)
AS virtual_table_3
ON virtual_table_1.column_a =
virtual_table_3.column_a
Lab Notes
Join Feature Sub-Selects
2018
Blog: How We Made Joins 23 Thousand Times Faster
Introduces you to the nested-loop join, equi-join, sorted merge vs. hash join, and the block hash join algorithms, and the advancement into a distributed block hash join algorithm.
This blog post illustrates the implementation of the distributed block hash join algorithm to support users who want to run joins on large tables for their analytics needs.
Lab Notes
CrateDB Internals Join Algorithms
2018
Blog: Fine-tuning the query optimizer in CrateDB
In cases where you need it, the query optimizer, which tries to find the best logical plan possible for a given query, can be fine-tuned for specific queries, in order to yield better performance.
By default, the effective join order is based on table statistics collected
into the pg_catalog.pg_stats
system table.
CrateDB offers the option to disable join-reordering and rely exactly on the order of the tables as described in the query, which is helpful to get more control over the query execution.
Example
SET optimizer_reorder_hash_join = false;
SET optimizer_reorder_nested_loop_join = false;
Tuning Tipps
Join Performance Performance Settings
2023
See also
Features: Advanced Querying
Domains: Telemetry Data Store • Raw-Data Analytics • Industrial Data • Time Series Data • Machine Learning
Product: Relational Database • Indexing, Columnar Storage, and Aggregations