Document Store¶
Overview
Learn how to efficiently store JSON documents or other structured data, also nested, using CrateDB’s OBJECT and ARRAY container data types, and how to query this data with ease.
CrateDB combines the advantages of typical SQL databases and strict schemas with the dynamic properties of NoSQL databases. While traditional object-relational databases allow you to store and process JSON data only opaquely, CrateDB handles objects as first-level citizens.
About
This feature allows users to access object properties in the same manner as columns in a table, including full-text indexing and aggregation capabilities.
Even when using dynamic objects, i.e. when working without a strict object schema, all attributes are indexed by default, and can be queried efficiently.
Storing documents in CrateDB provides the same convenience like the document-oriented storage layers of Lotus Notes / Domino, CouchDB, MongoDB, or PostgreSQL’s JSON(B) types.
Details
CrateDB uses Lucene as a storage layer, so it inherits its concepts about storage entities and units, in the same spirit as Elasticsearch.
- Document:
In Lucene, the Document is a fundamental entity, as it is the unit of search and index. An index consists of one or more Documents.
- Field:
A Document consists of one or more Fields. A Field is simply a name-value pair.
While Elasticsearch uses a query DSL based on JSON, in CrateDB, you can work with Lucene Documents using SQL.
Reference Manual
Related
JSON Container Document Object Array Nested Indexed
Synopsis¶
Store and query structured data, in this case blending capabilities of InfluxDB and MongoDB, but with much more headroom for other features, and using SQL instead of proprietary query languages.
DDL
CREATE TABLE reading (
"timestamp" TIMESTAMP,
"tags" OBJECT(DYNAMIC),
"fields" OBJECT(DYNAMIC)
);
DML
INSERT INTO reading (
timestamp,
tags,
fields
) VALUES (
'2024-03-02T23:42:42',
{
"sensor_id" = '4834CC52',
"site_id" = 23
},
{
"temperature" = 42.42,
"humidity" = 84.84
}
);
DQL
SELECT
*
FROM
reading
WHERE
tags['sensor_id'] =
'4834CC52';
Result
+---------------+------------------------------------------+-------------------------------------------+
| timestamp | tags | fields |
+---------------+------------------------------------------+-------------------------------------------+
| 1709422962000 | {"sensor_id": "4834CC52", "site_id": 23} | {"humidity": 84.84, "temperature": 42.42} |
+---------------+------------------------------------------+-------------------------------------------+
SELECT 1 row in set (0.058 sec)
Usage¶
Working with structured data and container data types in CrateDB.
Object Column Strictness
For columns of type OBJECT, CrateDB supports different policies about the behaviour with undefined attributes, namely STRICT, DYNAMIC, and IGNORED, see Object column policy.
- STRICT:
Reject any sub-column that is not defined upfront.
- DYNAMIC:
INSERT operations may dynamically add new sub-columns to the object definition. This is the default setting.
- IGNORED:
Also means DYNAMIC, but dynamically added sub-columns do not cause a schema update, and the new values will not be indexed. Because IGNORED columns are not recorded in the schema, you can insert mixed types into them. For example, one row may insert an integer and the next row may insert an object. Objects with a STRICT or DYNAMIC column policy do not allow this.
Querying DYNAMIC OBJECTs
To support querying DYNAMIC OBJECTs using SQL, where keys may not exist within an OBJECT, CrateDB provides the error_on_unknown_object_key session setting. It controls the behaviour when querying unknown object keys to dynamic objects.
By default, CrateDB will raise an error if any of the queried object keys are
unknown. When adjusting this setting to false
, it will return NULL
as the
value of the corresponding key.
Example using SET error_on_unknown_object_key = false;
cr> CREATE TABLE testdrive (item OBJECT(DYNAMIC));
CREATE OK, 1 row affected (0.563 sec)
cr> SELECT item['unknown'] FROM testdrive;
ColumnUnknownException[Column item['unknown'] unknown]
cr> SET error_on_unknown_object_key = false;
SET OK, 0 rows affected (0.001 sec)
cr> SELECT item['unknown'] FROM testdrive;
+-----------------+
| item['unknown'] |
+-----------------+
+-----------------+
SELECT 0 rows in set (0.051 sec)
Learn¶
Written tutorials and video guides about working with CrateDB’s container data types.
Articles
Tutorials
The Basics of CrateDB Objects
Learn the basics of CrateDB Objects. This tutorial is also available as video Getting Started with CrateDB Objects.
Fundamentals
Docker
OBJECT
SQL
Querying Nested Structured Data
Today’s data management tasks need to handle multi-structured and nested data from different data sources. CrateDB’s dynamic OBJECT data type allows you to store and analyze complex and nested data efficiently.
In this tutorial, we will explore how to leverage this feature in marketing data analysis, along with the use of generated columns, to parse and manage URLs.
Fundamentals
OBJECT
SQL
Videos
Getting Started with CrateDB Objects
In this video, you will learn the basics of CrateDB Objects. It illustrates a simple use case to demonstrate how CrateDB Objects can add clarity to data models.
The talk gives an overview of the column policies for CrateDB OBJECTs: dynamic, strict, and ignored. It also provides examples of how these policies affect INSERT statements. Last but not least, it outlines how to insert, update, and delete records with OBJECT data types.
Fundamentals
OBJECT
SQL
Ingesting and Querying JSON Documents with SQL
Learn how to unleash the power of nested data with CrateDB on behalf of an IoT use case, and a marketing analytics use case, using deeply nested data.
Fundamentals
OBJECT
SQL
Dynamic Schemas and Indexing Objects
Learn more about OBJECTs from the perspective of dynamic schema evolution and about OBJECT indexing.
Fundamentals
OBJECT
SCHEMA
See also
Product: Multi-model Database • JSON Database • Dynamic Database Schemas • Nested Data Structure • Relational Database