`CREATE REPOSITORY`¶

You can use the CREATE REPOSITORY statement to register a new repository that you can use to create, manage, and restore snapshots.

Synopsis ¶

CREATE REPOSITORY repository_name TYPE type
[ WITH (parameter_name [= value] [, ...]) ]

Description ¶

The CREATE REPOSITORY statement creates a repository with a repository name and repository type. You can configure the different types of repositories WITH additional parameters.
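As a minimal sketch of the synopsis, the following statement registers a repository of type fs. The repository name my_backups and the path are hypothetical, and the path is assumed to be allowed by the path.repo setting.

```sql
-- Hypothetical name and path; the path must be permitted by path.repo.
CREATE REPOSITORY my_backups TYPE fs
WITH (location = '/mnt/backups/cratedb');
```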

Note

If the back-end data storage (specific to the repository type) already contains CrateDB snapshots, they will become available to the cluster.

Parameters ¶

repository_name: The name of the repository to register.

type: The repository type.

Caution

You cannot change any repository parameters after creating the repository (including parameters set by the WITH clause).

To use new parameters for an existing repository, you must first drop the repository using the DROP REPOSITORY statement and then recreate it with a new CREATE REPOSITORY statement.

When you drop a repository, CrateDB deletes the corresponding record from sys.repositories but does not delete any snapshots from the corresponding backend data storage. If you create a new repository using the same backend data storage, any existing snapshots will become available again.
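The drop-and-recreate procedure described above might look like the following sketch. The repository name my_backups and its path are hypothetical, and the column names in the final query (name, type, settings) are an assumption about the sys.repositories table that you should verify against your CrateDB version.

```sql
-- 1. Remove the repository record. Snapshot files stay in the backend storage.
DROP REPOSITORY my_backups;

-- 2. Register the repository again with the new parameter values.
CREATE REPOSITORY my_backups TYPE fs
WITH (location = '/mnt/backups/cratedb', compress = false);

-- 3. Check what is now registered (column names assumed, see above).
SELECT name, type, settings
FROM sys.repositories
WHERE name = 'my_backups';
```

Because the new repository points at the same location, snapshots taken before the drop should become available again.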

Clauses ¶

`WITH`¶

You can use the WITH clause to specify one or more repository parameter values:

[ WITH (parameter_name [= value] [, ...]) ]

Parameters¶

The following parameters apply to all repository types:

max_restore_bytes_per_sec

The maximum rate (bytes per second) at which a single CrateDB node will read snapshot data from this repository.

Default: 40mb

max_snapshot_bytes_per_sec

The maximum rate (bytes per second) at which a single CrateDB node will write snapshot data to this repository.

Default: 40mb

All other parameters (see the next section) are specific to the repository type.
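For illustration, the two throttling parameters can be combined with any repository type. The sketch below uses an fs repository with a hypothetical name and path, and the rate values are arbitrary examples rather than recommendations.

```sql
-- Hypothetical repository: slower snapshot writes, faster restore reads.
CREATE REPOSITORY throttled_backups TYPE fs
WITH (
  location = '/mnt/backups/cratedb',
  max_snapshot_bytes_per_sec = '20mb',
  max_restore_bytes_per_sec = '80mb'
);
```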

Types ¶

CrateDB includes built-in support for the following types:

fs
s3
azure
gcs
url

CrateDB can support additional types via plugins.

`fs`¶

An fs repository stores snapshots on the local file system. If a cluster has multiple nodes, you must use a shared data storage volume mounted locally on all master nodes and data nodes.

Note

To create fs repositories, you must configure the list of allowed file system paths using the path.repo setting.

Parameters¶

location

Type: text

Required

An absolute or relative path to the directory where CrateDB will store snapshots. If the path is relative, CrateDB will append it to the first entry in the path.repo setting.

Windows UNC paths are allowed if the server name and shares are specified and backslashes are escaped.

The path must be allowed by the path.repo setting.

compress

Type: boolean

Default: true

Whether CrateDB should compress the metadata part of the snapshot or not.

CrateDB does not compress the actual table data.

chunk_size: Type: bigint or text

Default: null

Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
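A sketch of an fs repository that sets all three parameters is shown below. The repository name and the relative location nightly are hypothetical. It assumes that path.repo is configured, so CrateDB resolves the relative location against the first path.repo entry.

```sql
-- Relative location: resolved against the first entry in path.repo.
CREATE REPOSITORY fs_backups TYPE fs
WITH (
  location = 'nightly',
  compress = true,     -- compresses snapshot metadata only, not table data
  chunk_size = '1g'    -- split snapshot files larger than 1g into chunks
);
```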

`s3`¶

An s3 repository stores snapshots on the Amazon Simple Storage Service (Amazon S3).

Note

If you are using Amazon S3 in conjunction with IAM roles, the access_key and secret_key parameters must be left undefined.

Additionally, make sure to attach the IAM role to each EC2 instance that will run a CrateDB master node or data node. The attached IAM role will provide the necessary credentials when required.

Parameters¶

access_key: Type: text

Required: false

Access key used for authentication against Amazon Web Services (AWS).

Note

CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.

secret_key: Type: text

Required: false

The secret key used for authentication against AWS.

Note

CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.

endpoint: Type: text

Default: The default AWS API endpoint

The AWS API endpoint to use.

Tip

You can specify a regional endpoint to force the use of a specific AWS region.

protocol: Type: text

Values: http, https

Default: https

Protocol to use.

bucket

Type: text

Name of the Amazon S3 bucket used for storing snapshots.

If the bucket does not yet exist, CrateDB will attempt to create a new bucket on Amazon S3.

base_path

Type: text

Default: root directory

The bucket path to use for snapshots.

The path is relative, so the base_path value must not start with a / character.

compress

Type: boolean

Default: true

Whether CrateDB should compress the metadata part of the snapshot or not.

CrateDB does not compress the actual table data.

chunk_size: Type: bigint or text

Default: null

Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.

readonly: Type: boolean

Default: false

If true, the repository is read-only.

server_side_encryption: Type: boolean

Default: false

If true, files are server-side encrypted by AWS using the AES256 algorithm.

buffer_size

Type: text

Default: 5mb

Minimum: 5mb

If a chunk is smaller than buffer_size, CrateDB will upload the chunk with a single request.

If a chunk exceeds buffer_size, CrateDB will split the chunk into multiple parts of buffer_size length and upload them separately.

max_retries: Type: integer

Default: 3

The number of retries in case of errors.

use_throttle_retries: Type: boolean

Default: true

Whether CrateDB should throttle retries (i.e., should back off).

canned_acl: Type: text

Values: private, public-read, public-read-write, authenticated-read, log-delivery-write, bucket-owner-read, or bucket-owner-full-control

Default: private

CrateDB applies the specified canned ACL to the new buckets and objects it creates.

storage_class: Type: text

Values: standard, reduced_redundancy or standard_ia

Default: standard

The S3 storage class used for objects stored in the repository. This only affects the S3 storage class used for newly created objects in the repository.

use_path_style_access: Type: boolean

Default: false

Whether CrateDB should use path style access. Useful for some S3-compatible providers.
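The following sketch shows two ways to register an s3 repository using the parameters above: one with explicit credentials and one that relies on an attached IAM role. Bucket names, paths, and repository names are hypothetical, and the credential values are placeholders.

```sql
-- Variant 1: explicit credentials (placeholders, not real keys).
CREATE REPOSITORY s3_backups TYPE s3
WITH (
  access_key = '<ACCESS_KEY>',
  secret_key = '<SECRET_KEY>',
  bucket = 'my-cratedb-snapshots',
  base_path = 'prod/cluster-a',      -- relative, so no leading slash
  storage_class = 'standard_ia',
  server_side_encryption = true
);

-- Variant 2: IAM role attached to the EC2 instances.
-- access_key and secret_key are left undefined.
CREATE REPOSITORY s3_backups_iam TYPE s3
WITH (
  bucket = 'my-cratedb-snapshots',
  base_path = 'prod/cluster-b'
);
```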

`azure`¶

An azure repository stores snapshots on the Azure Blob storage service.

Parameters¶

account: Type: text

The Azure Storage account name.

Note

CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.

key: Type: text

The Azure Storage account secret key.

Note

CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.

endpoint: Type: text

The Azure Storage account endpoint.

Tip

You can use an endpoint to access Azure Storage instances served on private endpoints.

Note

endpoint cannot be used in combination with endpoint_suffix.

secondary_endpoint: Type: text

The Azure Storage account secondary endpoint.

Note

secondary_endpoint cannot be used if endpoint is not specified.

endpoint_suffix: Type: text

Default: core.windows.net

The Azure Storage account endpoint suffix.

Tip

You can use an endpoint suffix to force the use of a specific Azure service region.

container: Type: text

Default: crate-snapshots

The blob container name.

Note

You must create the container before creating the repository.

base_path: Type: text

Default: root directory

The container path to use for snapshots.

compress

Type: boolean

Default: true

Whether CrateDB should compress the metadata part of the snapshot or not.

CrateDB does not compress the actual table data.

chunk_size: Type: bigint or text

Default: 256mb

Maximum: 256mb

Minimum: 1b

Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.

readonly: Type: boolean

Default: false

If true, the repository is read-only.

location_mode: Type: text

Values: primary_only, secondary_only, primary_then_secondary, secondary_then_primary

Default: primary_only

The location mode for storing blob data.

Note

If you set location_mode to secondary_only, readonly will be forced to true.

max_retries: Type: integer

Default: 3

The number of retries (in the case of failures) before considering the snapshot to be failed.

timeout: Type: text

Default: 30s

The client side timeout for any single request to Azure.

proxy_type: Type: text

Values: http, socks, or direct

Default: direct

The type of proxy to use when connecting to Azure.

proxy_host: Type: text

The hostname of the proxy.

proxy_port: Type: integer

Default: 0

The port number of the proxy.
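An azure repository could be registered along the lines of the sketch below. The repository name and base path are hypothetical, the account and key values are placeholders, and the container is assumed to exist already.

```sql
-- The container must be created in Azure before running this statement.
CREATE REPOSITORY azure_backups TYPE azure
WITH (
  account = '<STORAGE_ACCOUNT>',
  key = '<ACCOUNT_KEY>',
  container = 'crate-snapshots',
  base_path = 'prod/cluster-a',
  max_retries = 5,
  timeout = '60s'
);
```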

`gcs`¶

A gcs repository stores snapshots on the Google Cloud Storage service.

Parameters¶

bucket: Type: text

Required

Name of the Google Cloud Storage bucket used for storing snapshots. The bucket must already exist before the repository is created.

private_key_id: Type: text

Required

The private key ID for the Google service account, taken from the JSON Google service account credentials.

Note

This parameter will be masked (shown as [xxxxx]) when you query the sys.repositories table.

private_key: Type: text

Required

The private key in PKCS 8 format for the Google service account, taken from the JSON Google service account credentials.

Note

This parameter will be masked (shown as [xxxxx]) when you query the sys.repositories table.

client_id: Type: text

Required

The client ID for the Google service account, taken from the JSON Google service account credentials.

Note

This parameter will be masked (shown as [xxxxx]) when you query the sys.repositories table.

client_email: Type: text

Required

The client email for the Google service account, taken from the JSON Google service account credentials.

Note

This parameter will be masked (shown as [xxxxx]) when you query the sys.repositories table.

base_path: Type: text

Default: root directory

The bucket path to use for snapshots.

compress: Type: boolean

Default: true

Whether CrateDB should compress the metadata part of the snapshot or not.

chunk_size: Type: bigint or text

Default: null

Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB uses the default value of 5 terabytes. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.

connect_timeout: Type: text

Default: 0

Defines the timeout for establishing a connection to the Google Cloud Storage service. The value should specify the unit. For example, a value of 5s specifies a 5-second timeout. A value of -1 corresponds to an infinite timeout. The default value of 0 means that the default of 20s from the Google Cloud Storage library is used.

read_timeout: Type: text

Default: 0

Defines the timeout for reading data from an established connection. The value should specify the unit. For example, a value of 5s specifies a 5-second timeout. A value of -1 corresponds to an infinite timeout. The default value of 0 means that the default of 20s from the Google Cloud Storage library is used.

endpoint: Type: text

Required: false

The root URL of the endpoint used to connect to an alternative storage provider.

token_uri: Type: text

Required: false

The OAuth token URI used to connect to an alternative OAuth provider.
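As a sketch, a gcs repository with the required service account parameters and explicit timeouts might be registered as follows. The repository name, bucket name, and base path are hypothetical, the credential values are placeholders to be filled from the JSON service account credentials, and the bucket is assumed to exist already.

```sql
-- The bucket must exist before the repository is created.
CREATE REPOSITORY gcs_backups TYPE gcs
WITH (
  bucket = 'my-cratedb-snapshots',
  private_key_id = '<PRIVATE_KEY_ID>',
  private_key = '<PKCS8_PRIVATE_KEY>',
  client_id = '<CLIENT_ID>',
  client_email = '<CLIENT_EMAIL>',
  base_path = 'prod/cluster-a',
  connect_timeout = '5s',   -- include the unit
  read_timeout = '30s'
);
```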

`url`¶

A url repository provides read-only access to an fs repository via one of the supported network access protocols.

You can use a url repository to restore snapshots.

Parameters¶

url: Type: text

The root URL of the fs repository.

Note

The URL must match one of the URLs configured by the repositories.url.allowed_urls setting.
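A url repository could be registered as in the sketch below. The repository name and the URL are hypothetical. The example assumes that the URL is covered by the repositories.url.allowed_urls setting and that it points at the root of an existing fs repository.

```sql
-- Read-only access to an existing fs repository served over the network.
CREATE REPOSITORY readonly_backups TYPE url
WITH (url = 'https://snapshots.example.com/cratedb/');
```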
