Logical replication

Table of contents

Logical replication is a method of data replication across multiple clusters. CrateDB uses a publish and subscribe model where subscribers pull data from the publications of the publisher they subscribed to.

Replicated tables on a subscriber can again be published further to other clusters and thus chaining subscriptions is possible.

Note

A replicated index on a subscriber is read-only.

Logical replication is useful for the following use cases:

  • Consolidating data from multiple clusters into a single one for aggregated reports.

  • Ensure high availability if one cluster becomes unavailable.

  • Replicating between different compatible versions of CrateDB.

Publication

A publication is the upstream side of logical replication and it’s created on the cluster which acts as a data source.

Each table can be added to multiple publications if needed. Publications can only contain tables. All operation types (INSERT, UPDATE, DELETE and schema changes) are replicated.

Every publication can have multiple subscribers.

A publication is created using the CREATE PUBLICATION command. The individual tables can be added or removed dynamically using ALTER PUBLICATION. Publications can be removed using the DROP PUBLICATION command.

Caution

The publishing cluster must have soft_deletes.enabled set to true so that a subscribing cluster can catch up with all changes made during replication pauses caused by network issues or explicitly done by a user.

Also, soft_deletes.retention_lease.period should be greater than or equal to replication.logical.reads_poll_duration.

Subscription

A subscription is the downstream side of logical replication. A subscription defines the connection to another database and set of publications to which it wants to subscribe. By default, the subscription creation triggers the replication process on the subscriber cluster. The subscriber cluster behaves in the same way as any other CrateDB cluster and can be used as a publisher for other clusters by defining its own publications.

A cluster can have multiple subscriptions. It is also possible for a cluster to have both subscriptions and publications. A cluster cannot subscribe to locally already existing tables, therefore it is not possible to setup a bi-directional replication (both sides subscribing to ALL TABLES leads to a cluster trying to replicate its own tables from another cluster). However, two clusters still can cross-subscribe to each other if one cluster subscribes to locally non-existing tables of another cluster and vice versa.

A subscription is added using the CREATE SUBSCRIPTION command and can be removed using the DROP SUBSCRIPTION command. A subscription starts replicating on its creation and stops on its removal (if no failure happen in-between).

Published tables must not exist on the subscriber. A cluster cannot subscribe to a table on another cluster if it exists already on its side, therefore it’s not possible to drop and re-create a subscription without starting from scratch i.e removing all replicated tables.

Only regular tables (including partitions) may be the target of a replication. For example, you can not replicate system tables or views.

The tables are matched between the publisher and the subscriber using the fully qualified table name. Replication to differently-named tables on the subscriber is not supported.

Security

To create, alter or drop a publication, a user must have the AL privilege on the cluster. Only the owner (the user who created the publication) or a superuser is allowed to ALTER or DROP a publication. To add tables to a publication, the user must have DQL, DML, and DDL privileges on the table. When a user creates a publication that publishes all tables automatically, only those tables where the user has DQL, DML, and DDL privileges will be published. The user a subscriber uses to connect to the publisher must have DQL privileges on the published tables. Tables, included into a publication but not available for a subscriber due to lack of DQL privilege, will not be replicated.

To create or drop a subscription, a user must have the AL privilege on the cluster. Only the owner (the user who created the subscription) or a superuser is allowed to DROP a subscription.

Caution

A network setup that allows the two clusters to communicate is a pre-requisite for a working publication/subscription setup. See HBA.

Monitoring

All publications are listed in the pg_publication table. More details for a publication are available in the pg_publication_tables table. It lists the replicated tables for a specific publication.

All subscriptions are listed in the pg_subscription table. More details for a subscription are available in the pg_subscription_rel table. The table contains detailed information about the replication state per table, including error messages if there was an error.