New release: CrateDB 6.2

A new feature release of CrateDB, version 6.2, has been released today. This release focuses on PostgreSQL compatibility and operational maturity: featuring CREATE/DROP SCHEMA, UUID datatype, performance and memory optimizations for high workloads scenario and improved metadata gathering.

The release notes can be found here.

SQL and PostgreSQL compatibility

Explicit schemas

While CrateDB implicitly creates schemas, they can now be managed explicitly with the new SQL statements: CREATE SCHEMA/DROP SCHEMA.

Schemas created with a CREATE TABLE statement are implicitly deleted once the last remaining table is dropped. Schemas created with CREATE SCHEMA can only be dropped with DROP SCHEMA.

UUID Datatype

Previously, UUIDs (or any non-integer id) needed to be stored in TEXT; they can now be stored in their own efficient binary implementation, heavily reducing storage footprint.

An UUID, which has 36 characters in text (base16 with dashes) will now take 16 bytes instead of 36. To insert an UUID it is expected to be in base16-dashes encoding.

INSERT INTO t VALUES ('2205e7cc-5357-46d7-a5a8-4dc2a1f20e41');

Anything not conforming to the spec will fail.

Lifted column name restriction

It is now possible to use an underscore (_) as the first character but it is recommended to use that sparingly to avoid possible clashes if CrateDB adds new system columns later.

Because of the restriction lift, prior to version 6.2, write operations on internal and system columns would be ignored and from 6.2, these write operations will cause an error.

INSERT INTO
  tbl (x, _id, _version, _seq_no)
VALUES
  (1, 100, 200, 300);
-- InvalidColumnNameException ["_id" Cannot write to system column]

Improved SQL compatibility of EXISTS subqueries

Now subqueries in EXISTS predicates can return more than one column, in this example: c.customer_id and c.country.

SELECT
  *
FROM
  orders o
WHERE
  EXISTS (
    SELECT
      c.customer_id,  --< first column
      c.country       --< second column
    FROM
      customers c
    WHERE
      c.customer_id = o.customer_id
      AND c.country = 'AT'
  )

Several 3rd party tools – including the PostgreSQL driver for R – can issue subqueries with multiple columns. The change enhances SQL compatibility and makes it possible to reuse subqueries without rewrites.

Postgres Wire Protocol 3.2 compatibility

With PostgreSQL 18, a new version of the PG Wire Protocol was released. CrateDB 6.2 can negotiate with clients to lower the PG Wire Protocol version to a supported version.

Stability and performance

hyperloglog_distinct more efficient

The memory footprint of the hyperloglog_distinct aggregate function has been improved. Benefitting queries where the function is used together with GROUP BY and many unique keys.

SQL Parser optimized

The memory footprint of the SQL parser has been optimised for cases with hundreds of thousands of unique SQL statements where the internal cache can use a significant amount of memory.

Faster views with subqueries

A new filter pushdown optimisation was added for views that include a scalar subquery in its definition. Queries on these views can now take advantage of the search index.

Consider this view:

CREATE VIEW v1 AS
SELECT
  *
FROM
  t1
WHERE
  x in (select a from t2)

A query like:

SELECT * from v1 where y = 10

Will perform faster since the filter y = 10 is no longer its own separate step; the filter is now pushed down to the collect phase, where it’ll be executed in Lucene, leveraging the index.

We can visualise this by comparing the query plans.

b(20)

Administration

New system information columns

To be able to get better insights into workloads, many new system information columns were added:

The total_affected_row_count has been added to the QueryStats JMX MBean and the sys.jobs_metrics table. This is useful in cases where statements are batched together.

The columns is_master and roles have been added to the sys.nodes tables. These can also be exported through the NodeInfo JMX MXBean in Prometheus format by using the jmx_exporter version 1.2.3.
The columns refresh_stats and merge_stats have been added to sys.shards table.
The column_details[‘oid’] has been added to information_schema.columns

Alter table timeout

It is now possible to specify a timeout when modifying a table using a WITH clause.

Some operations, like changing the number of shards, especially on large tables, may require a long time to finish, where the default timeout of 30s (or 60s depending on the operation) is not enough. Supporting user-defined timeouts make such operation succeed where they previously failed with a timeout exception.

The following example changes the timeout to 10 minutes (600 seconds)

ALTER TABLE
  tbl PARTITION (p = 1)
SET
  (number_of_shards = 2) WITH (timeout = '600s')

Tcp fallback

It is now possible to configure TCP fallback for SRV DNS

Improved HTTP error codes

Some error codes were improved; two are breaking changes:

5030 -> 5005 and 4100 -> 40910.

See the list of errors for more details.

New release: CrateDB 6.2

SQL and PostgreSQL compatibility

Explicit schemas

UUID Datatype

Lifted column name restriction

Improved SQL compatibility of EXISTS subqueries

Postgres Wire Protocol 3.2 compatibility

Stability and performance

hyperloglog_distinct more efficient

SQL Parser optimized

Faster views with subqueries

Administration

New system information columns

Alter table timeout

Tcp fallback

Improved HTTP error codes

Company

Ecosystem

Contact

New release: CrateDB 6.2

SQL and PostgreSQL compatibility

Explicit schemas

UUID Datatype

Lifted column name restriction

Improved SQL compatibility of EXISTS subqueries

Postgres Wire Protocol 3.2 compatibility

Stability and performance

hyperloglog_distinct more efficient

SQL Parser optimized

Faster views with subqueries

Administration

New system information columns

Alter table timeout

Tcp fallback

Improved HTTP error codes

Latest Posts

Why Time Series Data Without Context Is Just Noise

Why Most JSON Databases Fail at Real-Time Analytics

Distributed Databases for Real-Time Analytics: Architecture and Tradeoffs